Picture a city skyline, where skyscrapers rise over time. In the urban sprawl, new buildings are erected to meet the demands of population growth, contributing to the intricate and dynamic nature of the city.
Software evolves similarly. As the project grows, so do user requirements. Development teams must continue to build new features and functionalities on top of and around existing ones to meet these demands. As a result, your code becomes increasingly complex.
Just as city planners must balance architectural innovation against navigable streets and other living standards, software developers face a similar dilemma. Without strategic planning and thoughtful design, software can quickly become a convoluted labyrinth, difficult for developers to traverse and maintain.
Code complexity is almost unavoidable in modern software development. But code complexity doesn’t automatically equate with problems and poor code quality—not if your team can measure and reduce it effectively.
What is Code Complexity?
Code complexity refers to the intricacy and sophistication of a software program. It’s defined by how challenging it is to understand, modify, and maintain the code.
The definition itself is somewhat relative. For example, a senior and junior developer can analyze the same code and have different opinions on its complexity. To make things more objective, specific aspects of code are used to define its complexity. Factors such as the number of lines of code, loops, nested structures, and conditional statements are all considered.
Here’s an example:
def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
        return calculate_fibonacci(n - 1) + calculate_fibonacci(n - 2)
This simple recursive function calculates Fibonacci numbers. It is only a few lines long and contains a single condition, yet its recursive nature makes it more complex than it looks. As n increases, the number of recursive calls grows exponentially, making the function inefficient and its behavior harder to reason about.
Managing such complexity often involves refactoring the code to use more efficient algorithms or iterative approaches. So, the more unwieldy your code becomes and the more effort it takes to maintain and debug, the more complex it’s considered.
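As a minimal sketch of the iterative approach mentioned above, the same calculation can be written with a loop. The function name calculate_fibonacci_iterative is hypothetical, and the sketch assumes n is a non-negative integer.

```python
def calculate_fibonacci_iterative(n):
    """Return the n-th Fibonacci number using a loop instead of recursion."""
    previous, current = 0, 1
    # Each pass advances the pair one step along the sequence.
    for _ in range(n):
        previous, current = current, previous + current
    return previous


print(calculate_fibonacci_iterative(10))  # prints 55
```

The loop runs n times, so the work grows linearly with n instead of exponentially, and there is no call stack to trace when debugging.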
In large projects, codebase size alone is often enough to make debugging and maintenance difficult. The job becomes even more arduous as the code grows more complex, for example when changes introduce multiple dependencies or add more conditional paths through the code.
What Causes Code Complexity?
Dev team leads shouldn’t blame developers for rising code complexity. It’s part of the job. Even if every line of code your developers add is perfectly formatted, simple, and efficient, complexity increases over time due to sheer volume.
Sometimes, the leading cause of increased complexity relates to a design decision made when planning the project before a single line of code was written.
The reasons you’re seeing an increase in code complexity can be diverse. However, some of the most common causes include:
- Poor code clarity and readability, which can become a bottleneck in the production process.
- Poor documentation can compel engineers to revisit tasks, leading to accidental complexity. Good documentation can prevent code overlap and duplication and provides insight into the rationale behind code segments.
- Problematic architectural decisions made at the project's inception or during pivotal moments in development can have a negative impact on future code writing style, testability, and overall structure. Some examples of architectural decisions that could result in increased code complexity include:
  - Opting for a monolithic architecture, where the entire application is built as one, tightly integrated unit.
  - Choosing a framework that is either overly complex for the project's needs or rigid and challenging to customize.
  - Neglecting to consider scalability during architectural planning.
- Poor resource allocation, like assigning tasks to engineers lacking the expertise to perform them properly.
- Incorporating legacy code into a modern system can introduce complexities due to outdated practices, different coding styles, and potential incompatibilities.
- Skipping or inadequately performing code reviews can result in the accumulation of suboptimal code.
- Feature creep—continuously adding new features without a well-defined plan.
- Poor version control practices, such as infrequent commits, inadequate branching strategies, or inconsistent commit messages.
- Managing dependencies poorly, including outdated libraries or overly complex dependency trees.
Each of these factors, when left unaddressed, has the potential to contribute to the complexity of a codebase. The first step to proactively managing these aspects is learning to measure their impact properly.
How to Measure Code Complexity
While not every team needs to follow the same formula for keeping track of the growing complexity of their code, there are a variety of common metrics that development teams track to measure code complexity.
Cyclomatic Complexity
In 1976, Thomas McCabe proposed a metric for calculating code complexity called Cyclomatic Complexity, defining it as “a quantitative measure of the number of linearly independent paths through a program’s source code” that’s “computed using the control flow graph of the program.”
A control flow graph is like a map that helps developers understand the flow of a computer program or a set of instructions. It’s used to visualize how the program's instructions are executed, helping developers analyze and understand the different paths the program can take based on conditions, loops, and other control structures.
Imagine you have a recipe to bake a cake. Each step in the recipe is like a statement in a computer program. A control flow graph acts as a diagram that shows the order in which you follow the steps in the recipe. Each step is represented by a node (point) in the graph, and arrows connect the nodes to show the sequence of steps.
When calculating Cyclomatic Complexity, you draw a control flow graph and apply the following formula:
M (cyclomatic complexity) = E (number of edges) − N (number of nodes) + 2P (where P is the number of connected components). For a single function or program, P is 1.
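To make the arithmetic concrete, here is a small sketch that applies McCabe's formula in its usual form, M = E − N + 2P, to a hand-written control flow graph. The graph and the function name cyclomatic_complexity are hypothetical, chosen to represent a function with a single if/else.

```python
def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's formula: M = E - N + 2P."""
    return edges - nodes + 2 * components


# Hypothetical control flow graph for a function with one if/else branch.
if_else_edges = [
    ("entry", "condition"),
    ("condition", "then"),
    ("condition", "else"),
    ("then", "exit"),
    ("else", "exit"),
]
# Collect the distinct nodes that appear at either end of an edge.
if_else_nodes = {node for edge in if_else_edges for node in edge}

print(cyclomatic_complexity(len(if_else_edges), len(if_else_nodes)))  # prints 2
```

With five edges, five nodes, and one connected component, the result is 2, which matches the two possible routes through an if/else.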
The fewer the paths through a piece of code, and the less complex those paths are, the lower the Cyclomatic Complexity of the code. To demonstrate the metric, let’s use three somewhat arbitrary Go code examples.
Example 1:
func main() { fmt.Println("1 + 1 =", 1+1) }
As there’s only one path through the function, it has a Cyclomatic Complexity score of 1, which we can find by running a tool like gocyclo on it.
Example 2:
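package main

import (
    "fmt"
    "time"
)

func main() {
    year, month, day := time.Now().Date()
    // Three sub-conditions joined by && make up the single if/else branch.
    if month == time.November && day == 10 && year == 2018 {
        fmt.Println("Happy Go day!")
    } else {
        fmt.Println("The current month is", month)
    }
}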
In this example, we’re retrieving the current year, month, and day. With this information, we check if the current date is the 10th of November 2018 with an if/else condition.
If it is, the code prints “Happy Go day!” to the console. If it isn’t, it prints “The current month is” and the current month's name. The code example is made more complicated as the if/else condition is composed of three sub-conditions. Given that, it has a higher complexity score of 4.
Example 3:
package main

import (
    "fmt"
    "time"
)

func main() {
    // Only the month is needed, so the year and day are discarded.
    _, month, _ := time.Now().Date()
    switch month {
    case time.January:
        fmt.Println("The current month is January.")
    case time.February:
        fmt.Println("The current month is February.")
    case time.March:
        fmt.Println("The current month is March.")
    case time.April:
        fmt.Println("The current month is April.")
    case time.May:
        fmt.Println("The current month is May.")
    default:
        fmt.Println("The current month is unknown.")
    }
}
In this example, we’re printing out the current month based on the value of the month retrieved from the call to time.Now().Date(). There are six paths through the function: one for each of the five case statements and one for the default.
Its reported Cyclomatic Complexity is 7, because the count starts from a base of 1 and adds one for each of the six clauses, assuming the default is counted like a case. If we’d accounted for all twelve months of the year, along with a default, its score would be 14. That happens because gocyclo uses the following calculation rules:
1 is the base complexity of a function
+1 for each ‘if’, ‘for’, ‘case’, ‘&&’ or ‘||’
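As a rough illustration of how such counting rules work, the sketch below applies the same idea to Python source using the standard ast module. It is not gocyclo and not a full implementation: the function name approximate_cyclomatic_complexity is hypothetical, it treats the whole snippet as one function, and it only counts if, for, while, and the boolean operators and/or.

```python
import ast


def approximate_cyclomatic_complexity(source):
    """Apply the counting rules above to a snippet of Python source."""
    complexity = 1  # base complexity
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While)):
            complexity += 1  # one extra path per branch or loop
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" is one BoolOp with three values: two operators.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = """
def greet(year, month, day):
    if month == 11 and day == 10 and year == 2018:
        return "Happy Go day!"
    return "The current month is " + str(month)
"""

print(approximate_cyclomatic_complexity(SAMPLE))  # prints 4
```

The sample mirrors the second Go example: one if plus two and operators on top of the base of 1 gives the same score of 4.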
Using these three examples, we can see that by having a standard metric for calculating code complexity, we can quickly assess how complex a piece of code is. We can also see how different complex sections of code compare. However, Cyclomatic Complexity is not enough on its own.
Halstead Volume
Halstead Volume is a metric that analyzes the code's structure and the vocabulary used to gauge its complexity. The formula for calculating Halstead Volume is:
V (Halstead Volume) = N (program length) × log2(n), where n is the program vocabulary.
Program length refers to a computer program's total number of operators and operands. It encompasses both the unique operators and operands as well as their repetitions.
The more lines of code or instructions a program contains, the higher its program length.
Vocabulary size is the count of unique operators and operands in a program. It represents the diversity of the code's components. In Halstead Volume, the larger the vocabulary size, the more varied and potentially complex the code is considered to be.
Unique operators and operands are the distinct building blocks of a program. Operators are symbols that perform operations, and operands are the entities on which the operations are performed. Unique operators and operands contribute to the overall vocabulary size, and their variety influences the program's complexity.
In summary, program length reflects the total number of code elements, vocabulary size measures the diversity of these elements, and unique operators and operands represent the distinct building blocks that make up a program. Halstead Volume combines these factors to assess the complexity of a software system, providing insight into the effort required to understand and maintain it.
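The following sketch works through the formula for one hand-tokenized statement. The split into operators and operands is done by hand here and is an assumption; real tools use language-specific tokenizers and may classify tokens differently. The function name halstead_volume is hypothetical, and the sketch assumes at least one token is supplied.

```python
import math


def halstead_volume(operators, operands):
    """V = N * log2(n) for lists of operator and operand tokens."""
    program_length = len(operators) + len(operands)         # N: every occurrence
    vocabulary = len(set(operators)) + len(set(operands))   # n: distinct tokens
    return program_length * math.log2(vocabulary)


# Hand-tokenized statement: z = x + y * x
operators = ["=", "+", "*"]
operands = ["z", "x", "y", "x"]

print(round(halstead_volume(operators, operands), 2))  # prints 18.09
```

Here the program length is 7 (three operators plus four operands) and the vocabulary is 6 (x appears twice but counts once), so the volume is 7 × log2(6).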
Lines of Executable Code (LOC)
Lines of Executable Code (LOC) is a straightforward metric that counts the total number of lines in a program's source code that contribute to the actual execution of the software. It includes lines of code containing instructions, statements, and expressions, but typically excludes comments and blank lines.
LOC is a basic measure of the program's size and can be used to estimate development effort and maintenance requirements. However, it's essential to note that LOC alone doesn't provide a complete picture of code quality or complexity.
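A minimal sketch of such a counter is shown below. It assumes whole-line comments start with a single prefix (# by default) and does not handle block comments or multi-line strings, which real tools treat according to each language's rules. The function name count_executable_lines is hypothetical.

```python
def count_executable_lines(source, comment_prefix="#"):
    """Count lines that are neither blank nor whole-line comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


SAMPLE = """
# Compute a total.
total = 0

for value in [1, 2, 3]:
    total += value  # trailing comments still count as code
"""

print(count_executable_lines(SAMPLE))  # prints 3
```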
Coupling and Depth of Inheritance
Coupling is the degree of interdependence between different modules or components in a software system. Depth of Inheritance measures the number of levels in a class hierarchy.
High coupling can indicate a tight connection between different parts of the code, making it more challenging to modify or maintain. Similarly, a deep inheritance hierarchy may increase complexity and potential difficulties in understanding and extending the software.
Monitoring and managing coupling and depth of inheritance are crucial for creating modular, maintainable, and scalable software architectures. Techniques like loose coupling and limiting inheritance depth contribute to more flexible and understandable codebases.
Loose coupling is a design principle in software development that promotes minimal dependency between different modules or components within a system. In a loosely coupled system, changes to one module have minimal impact on other modules, and each module operates independently.
Limiting inheritance depth is a practice that involves controlling the number of levels or layers in a class hierarchy. In object-oriented programming, classes are organized into hierarchies where subclasses inherit properties and behaviors from their parent classes.
However, deep hierarchies can lead to increased complexity and reduced code maintainability.
By limiting the depth of inheritance, developers aim to create simpler and more understandable class structures.
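To illustrate how depth of inheritance can be measured, the sketch below walks a class's parents in Python. The class names are hypothetical, and the convention of counting a class that inherits directly from object as depth 1 is an assumption; some tools start counting from 0.

```python
def inheritance_depth(cls):
    """Number of levels between cls and Python's root class, object."""
    if cls is object:
        return 0
    # With multiple inheritance, follow the deepest parent chain.
    return 1 + max(inheritance_depth(base) for base in cls.__bases__)


class Shape:
    pass


class Polygon(Shape):
    pass


class Rectangle(Polygon):
    pass


class Square(Rectangle):
    pass


print(inheritance_depth(Shape))   # prints 1
print(inheritance_depth(Square))  # prints 4
```

A team that limits inheritance depth might flag Square here and consider whether composition would express the same relationship with fewer layers.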
Maintainability Index
The Maintainability Index is used to gauge the ease with which a computer program can be maintained and modified. The index is calculated via a formula that considers the Halstead Volume, Cyclomatic Complexity, and lines of code. Some variants also factor in the percentage of comments. The commonly cited formula is as follows:
Maintainability Index = 171 - 5.2 * ln(Halstead Volume) - 0.23 * (Cyclomatic Complexity) - 16.2 * ln(Lines of Code)
A higher Maintainability Index indicates a more maintainable codebase. This metric assists software developers and managers in assessing the potential challenges and costs associated with maintaining and evolving a given software system.
By considering factors like code volume, complexity, and size, the Maintainability Index provides insights to support decisions to improve a software project's long-term sustainability.
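The formula above translates directly into a few lines of code. In this sketch the function name maintainability_index and the input values for the two modules are hypothetical, picked only to show the direction in which the index moves.

```python
import math


def maintainability_index(halstead_volume, cyclomatic_complexity, lines_of_code):
    """171 - 5.2 * ln(V) - 0.23 * CC - 16.2 * ln(LOC)."""
    return (
        171
        - 5.2 * math.log(halstead_volume)   # math.log is the natural logarithm
        - 0.23 * cyclomatic_complexity
        - 16.2 * math.log(lines_of_code)
    )


# Hypothetical measurements for a small module and a large one.
small_module = maintainability_index(250, 4, 30)
large_module = maintainability_index(8000, 35, 600)

print(round(small_module, 1), round(large_module, 1))
```

The small module scores far higher than the large one. Because two of the three terms are logarithmic, the index falls quickly as a small file grows and more slowly once it is already large.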
Cognitive Complexity
Cognitive Complexity evaluates the understandability and readability of source code by considering the cognitive effort required for a human to comprehend it. It goes beyond traditional metrics like lines of code or cyclomatic complexity and considers factors such as nested control structures and logical operators.
Nested control structures refer to the situation where one or more control flow statements (such as if, while, for, etc.) are placed inside another. For example, having an if statement inside another if statement creates a nested structure. The depth of nesting increases with each additional layer of control structures. While some nesting is inevitable and necessary, excessively deep nesting can make code more complex and harder to understand.
Logical operators are symbols or words used to combine or modify logical statements. The most common logical operators are AND (&&), OR (||), and NOT (!). They are often used in conditional statements to create more complex conditions by combining simpler ones.
In the context of cognitive complexity, nested control structures and complex logical expressions contribute to the overall difficulty of understanding code. Minimizing unnecessary nesting and simplifying logical conditions can improve code readability and make it more maintainable.
The Cognitive Complexity score is calculated by assigning weights to various programming constructs and their nesting levels. For example, simple constructs like a single if statement might have a lower weight, while complex nested conditions or loops would contribute to a higher cognitive complexity score.
The goal of measuring cognitive complexity is to identify code sections that might be challenging for developers to understand, leading to potential maintenance issues.
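The weighting idea can be sketched with a deliberately simplified score: each if, for, or while adds 1 plus its current nesting depth. This is an assumption-laden illustration, not the scoring used by any particular tool. It ignores logical operators, and it treats a Python elif as a nested if. The function name nesting_weighted_score is hypothetical.

```python
import ast

NESTING_NODES = (ast.If, ast.For, ast.While)


def nesting_weighted_score(source):
    """Score each if/for/while as 1 plus its nesting depth (simplified)."""

    def score(node, depth):
        total = 0
        for child in ast.iter_child_nodes(node):
            if isinstance(child, NESTING_NODES):
                total += 1 + depth  # deeper structures cost more
                total += score(child, depth + 1)
            else:
                total += score(child, depth)
        return total

    return score(ast.parse(source), 0)


FLAT = """
if month == 11 and day == 10 and year == 2018:
    output = "Happy Go day!"
"""

NESTED = """
if month == 11:
    if day == 10:
        if year == 2018:
            output = "Happy Go day!"
"""

print(nesting_weighted_score(FLAT))    # prints 1
print(nesting_weighted_score(NESTED))  # prints 6
```

Both snippets make the same decision, but the nested version scores 1 + 2 + 3 = 6 because each inner condition has to be read in the context of the ones around it.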
Rework Ratio
The rework ratio is a software development metric that measures the amount of rework or corrective work done on a project in relation to the total effort expended. It provides insights into the efficiency and quality of the development process. The formula for calculating the rework ratio is as follows:
Rework Ratio = (Effort on Rework / Total Effort) × 100
Here, "Effort on Rework" represents the time and resources spent on fixing defects, addressing issues, or making changes after an initial development phase, and "Total Effort" is the cumulative effort invested in the entire project.
A high Rework Ratio suggests that a significant portion of the project's effort is dedicated to correcting errors or making adjustments, indicating potential inefficiencies or issues in the development process.
Monitoring the rework ratio allows project managers and teams to identify areas for improvement, enhance quality control measures, and ultimately reduce the need for extensive rework, leading to more efficient and successful software development projects.
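As a quick worked example, the sketch below computes the ratio as rework effort taken as a percentage of total effort. The function name rework_ratio and the hour figures are hypothetical.

```python
def rework_ratio(rework_effort, total_effort):
    """Rework effort as a percentage of total effort."""
    if total_effort <= 0:
        raise ValueError("total_effort must be positive")
    return rework_effort / total_effort * 100


# Hypothetical sprint: 30 of 120 logged hours went to fixing defects.
print(rework_ratio(30, 120))  # prints 25.0
```

Both inputs need to use the same unit, whether that is hours, story points, or changed lines, for the percentage to be meaningful.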
Switch Statement and Logic Condition Complexity
A switch statement is a control flow structure that allows a variable or expression to be tested against multiple values. It provides a concise way to handle multiple cases or conditions.
The switch statement typically involves a series of cases, each containing specific code to be executed if the tested variable matches a particular value. It can be a cleaner alternative to a series of nested if-else statements, especially when dealing with multiple mutually exclusive conditions.
Logic condition complexity refers to the level of intricacy and difficulty in understanding the logical conditions present in a piece of code. This complexity can arise from the use of multiple conditions, logical operators (such as AND, OR), and nested structures.
Efficient use of switch statements and managing logic condition complexity are crucial aspects of writing clear and maintainable code in various programming languages.
In this code example, we’re taking the second Go example from earlier and splitting the compound if condition into three nested conditions: one for each of the original sub-conditions.
package main

import (
    "fmt"
    "time"
)

func main() {
    year, month, day := time.Now().Date()
    // Default message, overwritten only if all three nested checks pass.
    output := fmt.Sprintf("The current month is %s", month)
    if month == time.November {
        if day == 10 {
            if year == 2018 {
                output = "Happy Go day!"
            }
        }
    }
    fmt.Println(output)
}
Now let’s build on this by considering the following three questions:
- What if we had, as we do above, multiple, complex if conditions?
- What if we had multiple if conditions and the code in the body of each one was complex?
- Would the code be easier or harder to understand?
Generally speaking, the greater the number of nested conditions and the higher the level of complexity within those conditions, the higher the complexity of the code.
Software Developer Skill Level
What about the skill level of the developer? Have a look at the C version of the second Go example below.
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    time_t t = time(NULL);
    struct tm tm = *localtime(&t);
    const char *months[12] = {"January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"};

    /* tm_year counts years since 1900, so add 1900 before comparing. */
    if (tm.tm_year + 1900 == 2018 &&
        strncmp(months[tm.tm_mon], "November", strlen(months[tm.tm_mon])) == 0 &&
        tm.tm_mday == 10)
    {
        printf("Happy C Day!\n");
    } else {
        printf("The current month is %s.\n", months[tm.tm_mon]);
    }
    return 0;
}
Technically, it does what the other examples do. However, it requires more code to achieve the same outcome. To be fair, if the coder had greater familiarity with C, the code might be no longer than the Go example.
However, let’s say this is the minimum required to achieve the same outcome. Given the more verbose nature of C’s syntax compared to Go, it’s harder to understand if you compare the two.
What’s more, if you had no prior experience with C, despite a comparatively similar Cyclomatic Complexity score, what would your perception be? You would probably consider the code more complicated regardless of their similar scores, right?
That’s why the developer plays an essential role in defining code complexity and your team’s ability to measure it accordingly.
Benefits of Measuring Code Complexity
Measuring code complexity is an important aspect of software development that offers numerous benefits, including:
- Better Code Quality: Measuring code complexity allows developers to identify potential problems early in development. This, in turn, helps to improve code quality by ensuring that issues are addressed before the code is deployed in production.
- Improved Maintainability: Code complexity directly relates to the ease with which code can be maintained and updated. By measuring code complexity, developers can identify areas of the code that are difficult to maintain and improve. This can help reduce the cost of maintaining the code over time.
- Reduced Bugs: Complex code is more prone to errors and bugs. By measuring code complexity, developers can identify areas of the code that are likely to cause problems and focus their testing efforts accordingly. This can reduce the number of bugs and improve the overall reliability of the code.
- Faster Development: Measuring code complexity can also help to speed up development. By identifying areas of the code that are particularly complex, developers can focus on simplifying these areas, which can reduce the amount of time required to write, test, and deploy the code.
- Improved Collaboration: Code complexity metrics can identify areas of the code that are particularly complex and require additional expertise. This can help improve collaboration between team members, as developers with specific skills can be brought in to help with particularly complex code areas.
- Better Code Documentation: By measuring code complexity, developers can identify areas of the code that are particularly complex and require additional documentation. This can help ensure the code is well-documented and easier to understand for other developers needing to work with it in the future.
- Better Tests: Knowing how many independent paths exist through a piece of code tells you how many paths need to be tested. That gives you the minimum number of tests required to cover the code properly, which in turn supports a higher level of code coverage. Automated code review tools are still underutilized by teams: according to our State of Software Quality survey, over 40% of teams still conduct unit and frontend testing manually.
- Lower Costs: When the risk of potential defects is reduced, fewer defects need to be found and removed. As a result, the maintenance cost is also reduced.
- Learning: Helping developers learn and grow is also a benefit of understanding why their code is considered complex.
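To illustrate the "Better Tests" point from the list above, here is a sketch of a function with a Cyclomatic Complexity of 3 and the three tests that exercise each independent path once. The function shipping_cost, its pricing rules, and the test names are all hypothetical.

```python
def shipping_cost(weight_kg, express):
    """Hypothetical function with three independent paths."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if express:
        return 10 + 2 * weight_kg
    return 5 + weight_kg


# Complexity is 3 (base 1 plus two ifs), so at least three tests are needed.
def test_rejects_non_positive_weight():
    try:
        shipping_cost(0, express=False)
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_express():
    assert shipping_cost(2, express=True) == 14


def test_standard():
    assert shipping_cost(2, express=False) == 7


for test in (test_rejects_non_positive_weight, test_express, test_standard):
    test()
```

Three tests is a lower bound rather than a guarantee: covering every path once does not by itself check boundary values or unusual inputs.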
How to Reduce Complexity of Code
Creating code for a feature-rich application or software is never simple. The complexity of code is a significant consideration in this process, making it crucial for developers to exercise caution during coding, testing, or upgrades to prevent unnecessary intricacies.
To simplify and reduce code complexity, consider the following strategies:
- Create Clear Requirements: Clear and well-documented requirements form the foundation of any software project, preventing feature creep and unnecessary complexities. Establishing a clear roadmap from the outset helps maintain project focus and clarity. Make following adopted coding standards an integral part of your requirements.
- Prioritize Modular Design: Breaking down software into smaller, self-contained modules facilitates better complexity management. Each module can focus on specific functionality, enhancing understanding, development, and maintenance. This approach promotes code reusability and collaboration among developers.
- Prioritize Minimalism: Introduce dependencies and side effects only when necessary, as they contribute to system complexity. Libraries, although useful, can add complexity and potential issues. Minimize their usage, or make sure each one is clearly visible and justified in the codebase.
- Get Rid of Useless Code: Identify and remove unnecessary code elements, such as unused classes, single-implementation interfaces, and redundant design patterns, as they contribute to code complexity.
- Improve Documentation: Well-documented code, APIs, and architecture choices provide insights for software structure and functionality. Good documentation aids in reducing complexity and enhancing performance, particularly for developers joining the project later.
Embrace Automated Code Analysis to Reduce Code Complexity
Managing code complexity is an ongoing challenge for developers striving for efficient, maintainable software. While the strategies outlined in this post provide valuable guidance, leveraging code analysis tools like Codacy can further enhance your efforts.
Codacy offers automated insights into code quality, helping identify areas of improvement and potential complexity pitfalls. Our platform uses Cyclomatic Complexity to identify files with complex methods in your repository. We also offer a long list of static analysis tools and code security analysis tools to improve your code quality, coverage, and security holistically.
Sign up for a free 14-day Codacy trial today to see how it works.