Code smells are signs in a program that point to deeper problems in the overall application design. In other words, a code smell is an indicator of something done wrong, or something that could be improved, usually by refactoring. Code smells don’t always affect an application’s performance, but they can massively increase the difficulty of maintenance later on. In this series of posts, I’ll be looking at a few code smells: what are they, what are their symptoms, and how do we deal with them?
Duplicate code is a nice easy one to start off with. Often, we need code that does a task similar to one that existing code already performs. Rather than doing it properly and refactoring the existing code into its own method, we copy the code to where we need it and change it about a bit to work in our current scenario. This is not a good thing!
Later, when we need to make fundamental changes to the way our functionality works, we have multiple places where we need to make that change. What happens if we miss one? This kind of insidious error can be very difficult to track down and sporadic in its appearance. That said, there are several automated tools for a variety of languages designed to help track down and eliminate duplicate code.
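To give a feel for what such a tool does, here is a minimal sketch of my own, not how any particular tool works. It only finds copies that are identical once indentation and blank lines are ignored, whereas real detectors typically compare tokens or syntax trees. The function name `find_duplicate_blocks`, the `window` parameter and the sample snippet are all hypothetical.

```python
from collections import defaultdict


def find_duplicate_blocks(source, window=3):
    """Map each repeated run of `window` non-blank lines to its start lines."""
    # Strip indentation and skip blank lines, keeping 1-based line numbers.
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        chunk = lines[i:i + window]
        block = "\n".join(text for _, text in chunk)
        seen[block].append(chunk[0][0])
    # Keep only the blocks that occur more than once.
    return {block: starts for block, starts in seen.items() if len(starts) > 1}


# Hypothetical snippet containing one copy-pasted loop.
sample = "\n".join([
    "total = 0",
    "for p in prices:",
    "    total += p",
    "print(total)",
    "total = 0",
    "for p in prices:",
    "    total += p",
])

for block, starts in find_duplicate_blocks(sample).items():
    print("duplicate block starting at lines", starts)
    print(block)
```

A sliding window over lines is about the crudest approach possible, but it shows the basic idea: normalise the code, then look for runs that occur more than once.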
There are three categories of duplicate code:
- Character Identical – Every character in the code is the same. This is the simplest and most obvious instance of copy and paste programming.
- Token Identical – The code is the same token for token, apart from minor variations (for instance, an unnecessary token has been removed).
- Functionally Identical – Often the hardest one to spot. This is where two pieces of code perform the same basic function (e.g. looping over an array of integers) over different data.
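The three categories can be easier to tell apart side by side. The following is only an illustrative sketch: every function name and piece of data in it is made up for this post, and the verbatim copy has a different function name purely so that all four can live in one file.

```python
def sum_prices_original(prices):
    total = 0
    for p in prices:
        total = (total + p)
    return total


# Character identical: the body is a verbatim copy of the original.
def sum_prices_copy(prices):
    total = 0
    for p in prices:
        total = (total + p)
    return total


# Token identical with a small variation: the redundant parentheses are gone.
def sum_prices_near_copy(prices):
    total = 0
    for p in prices:
        total = total + p
    return total


# Functionally identical: a different shape over different data, same job.
def sum_scores(scores):
    i = 0
    result = 0
    while i < len(scores):
        result += scores[i]
        i += 1
    return result
```

All four functions return the same value for the same input, which is exactly why the last one is so easy to miss: nothing about its text says "I am a copy".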
Duplicate code causes several problems:
- It makes it much harder to comprehend the purpose of the code. Taking the array example above, reading two loops and verifying that they both calculate the sum of an integer array is much harder than reading two calls to a method named “sumIntArray”.
- It can lead to the Long Method code smell (more on this later in the series).
- It can cause strange behaviour in your application (where you have updated one instance of the duplicate but not another).
- It increases the size of the source code, taking up more disk space and increasing the number of lines you have to wade through.
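The "updated one instance but not another" problem might look something like the sketch below. The two functions, the new requirement (skip the first element) and the data are all hypothetical.

```python
def report_total(values):
    # Copy 1: updated to skip the first element (the hypothetical new rule).
    total = 0
    for v in values[1:]:
        total += v
    return total


def invoice_total(values):
    # Copy 2: missed during the update, so it still sums every element.
    total = 0
    for v in values:
        total += v
    return total


readings = [100, 1, 2]
print(report_total(readings))   # 3
print(invoice_total(readings))  # 103
```

Both functions were "the same" last week. Now they quietly disagree, and nothing in either one hints that the other exists.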
So how do we deal with this?
A good principle to follow here is the “Don’t Repeat Yourself” principle (or “Single Source of Truth”). These principles suggest that you should have a single point where a given operation is performed (in practice, SSoT is extended to include data too: for example, a user’s data should be stored in a single object, and only references to this object should be stored elsewhere).
Using the example above, any time we need to sum an int array, we should call the sumIntArray method. If we later decide that we want to sum all but the first int in the array, we change this single method and we can be confident that the change is applied across our entire application. We’ve also saved ourselves some typing and increased the readability of our code.
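As a rough sketch of what that looks like in Python, `sum_int_array` below is the Python spelling of sumIntArray. The two callers, `average` and `exceeds_budget`, are hypothetical and exist only to show several call sites sharing one implementation.

```python
def sum_int_array(values):
    """The single place in the code base where an int list is summed."""
    total = 0
    for v in values:
        total += v
    return total


def average(values):
    # Reuses the shared helper instead of repeating the loop.
    return sum_int_array(values) / len(values)


def exceeds_budget(costs, budget):
    # A second caller: any change to sum_int_array reaches it automatically.
    return sum_int_array(costs) > budget


print(average([2, 4]))               # 3.0
print(exceeds_budget([5, 6], 10))    # True
```

Changing the loop in `sum_int_array` to iterate over `values[1:]` would apply the "all but the first" rule to both callers at once. In real Python code you would normally reach for the built-in `sum` rather than writing the loop by hand; the explicit version is here only to mirror the example.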
I’ve heard some programmers use a rule called the “Rule of 3”. This is a general guideline that states:
“Code may be copied once, but when the same code appears three times it should be extracted into a new procedure.”
Following this rule gives a nice balance between eliminating this code smell and going overboard by extracting every little thing into its own method.