TL;DR: open source projects spend on average 15.27% of their closed issues on paying technical debt. That is almost a day per week.
How much time should you invest in refactoring and cleaning your code during your sprints?
I wanted to write about a question that we face internally and that we get asked all the time: how much of a priority is technical debt? In other words: how much time should you and your team dedicate to paying back technical debt and investing in code cleanup?
Give it too little importance and the project collapses under its own unmaintainability and requires a rebuild. Give it too much importance and you don’t ship any features and the business doesn’t grow.
There must be a tradeoff. The question is: how much of your time should you dedicate to cleaning code?
To try to answer this question, let’s look at open source code. Open source is believed to have better code quality than proprietary code (according to Coverity). One of the reasons might be that “sunlight is the best bleach”.
Just as important, best practices that start in open source end up permeating the closed source industry.
Here’s how we’re going to answer this:
We extracted the opened and closed issues from GitHub open source projects for the month of October 2015. The data is publicly available and queryable via the GitHub Archive.
We then separated all the issues that had a tag focused on technical debt (like “refactoring” or “cleanup”).
We found a total of 4031 technical debt issues opened or closed during October 2015 corresponding to 1025 open source projects.
We discarded all the projects that don’t label their issues, that didn’t have technical debt labels, or that were too young (fewer than 10 issues created). This eliminated thousands of issues (and a few hundred projects) that could give a different picture. However, with this sampling, we get a clearer view of truly active open source contributions.
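The selection steps above can be sketched roughly as follows. This is only an illustration, not the script used for the analysis: the record layout (dicts with `repo` and `labels` keys), the label set, and the function names are all assumptions.

```python
from collections import defaultdict

# Hypothetical label set: the post only names "refactoring" and "cleanup" as examples.
TECH_DEBT_LABELS = {"refactoring", "cleanup"}
MIN_ISSUES = 10  # projects with fewer issues are treated as too young


def is_tech_debt(issue):
    """True if any label on the issue is a technical debt label."""
    return any(label.lower() in TECH_DEBT_LABELS for label in issue["labels"])


def select_projects(issues):
    """Group issues by repo and keep only repos that pass the three filters."""
    by_repo = defaultdict(list)
    for issue in issues:
        by_repo[issue["repo"]].append(issue)

    selected = {}
    for repo, repo_issues in by_repo.items():
        if len(repo_issues) < MIN_ISSUES:
            continue  # too young
        if not any(issue["labels"] for issue in repo_issues):
            continue  # project does not label its issues
        if not any(is_tech_debt(issue) for issue in repo_issues):
            continue  # no technical debt labels
        selected[repo] = repo_issues
    return selected
```

The filters are kept as separate checks so that each discard rule from the text maps to one line, even though a project without any labels would also fail the technical debt check.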
With this in mind, we can now start tinkering with the data.
Let’s introduce a ratio to help us make sense of the data:
Tech Debt Issues % = (Number of closed technical debt issues / Total number of closed issues) × 100
With this ratio, we can see the percentage of tech debt issues closed and understand how open source projects prioritized their work regarding technical debt during the month of October 2015.
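As a minimal sketch, the ratio could be computed per project like this. The issue layout (a dict with a `labels` list) and the default label names are assumptions made for illustration.

```python
def tech_debt_ratio(closed_issues, tech_debt_labels=("refactoring", "cleanup")):
    """Percentage of closed issues that carry a technical debt label."""
    if not closed_issues:
        return 0.0  # avoid dividing by zero for projects with no closed issues
    wanted = {name.lower() for name in tech_debt_labels}
    debt = sum(
        1
        for issue in closed_issues
        if any(label.lower() in wanted for label in issue["labels"])
    )
    return 100.0 * debt / len(closed_issues)
```

For example, a project that closed four issues in the month, two of them labeled “cleanup” or “refactoring”, would get a ratio of 50%.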
[Chart: technical debt issue % for popular open source projects]
As the chart shows, some of the most popular open source projects invest a considerable amount of time in technical debt issues.
We don’t know if this is a good thing or a bad thing (it’s not really the point of this exercise to say that). What is important is to understand that technical debt is a priority for some of the most popular open source projects.
We see projects spending a third (33%) of their time on technical debt. A third of a week is a considerable investment.
We then turned our focus to the remaining projects. Here’s what we found:
On average, 15.27% of closed issues are about technical debt.
If closed issues are a fair proxy for time, that is about 0.76 days of a five-day work week: almost a day of the week dedicated to technical debt.
This is important because it tells us how much time we should probably be investing as well. If open source is the driver of best practices, and there is evidence that it has greater code quality than the rest of the industry, then it is wise to try to copy its methods.
45% of projects have at least 10% of their closed issues labeled as technical debt
16% of projects have at least 30%
4.5% have more than 50%
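The summary figures above can be derived from the per-project ratios along these lines. This is a sketch under assumptions: the post does not say whether the 15.27% is an average of per-project ratios or a pooled figure (this version averages per project), and the five-day work week used for the “day per week” conversion is also an assumption.

```python
from statistics import mean


def summarize(ratios, work_days=5):
    """Summarize per-project technical debt percentages (values from 0 to 100)."""
    n = len(ratios)
    average = mean(ratios)
    return {
        "average_pct": average,
        # Treats the share of closed issues as a share of working time.
        "days_per_week": average / 100 * work_days,
        "share_at_least_10": 100 * sum(r >= 10 for r in ratios) / n,
        "share_at_least_30": 100 * sum(r >= 30 for r in ratios) / n,
        "share_over_50": 100 * sum(r > 50 for r in ratios) / n,
    }
```

With an average of 15.27%, the `days_per_week` value comes out at roughly 0.76.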
We need to take into consideration different project realities, programming languages, contribution sizes, ages, etc. And this leads us to a different question that is also very pertinent.
An important question is: why? What are the reasons behind prioritizing technical debt over other issues?
This doesn’t take the value out of observing how much time projects are investing in technical debt.
We won’t be able to answer this completely, as it is complex. But let’s look at two factors we could extract from the data.
Does the time of the month influence the priority of technical debt?
Short answer: no.
We extracted the day of the month the issues were opened and closed. The objective was to see if the day of the month or of the week had any influence on the priority of technical debt issues.
The charts below use a logarithmic scale.
When we observe the closing issue schedule, we clearly see weeks appearing in the sawtooth-like chart.
However, when we look at the ratio of closed tech debt issues, no clear pattern emerges. It’s quite even throughout the month. Hence there is no clear indication that time influences how people prioritize their technical debt.
We did the same analysis for opened issues and found similar results.
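A per-day version of the ratio, as used in this comparison, might look like the sketch below. The field names `closed_at` (an ISO 8601 timestamp without a timezone suffix) and `is_tech_debt` are hypothetical, chosen only for this example.

```python
from collections import Counter
from datetime import datetime


def daily_tech_debt_ratio(closed_issues):
    """Map day of month to the percentage of issues closed that day that are tech debt."""
    total = Counter()
    debt = Counter()
    for issue in closed_issues:
        day = datetime.fromisoformat(issue["closed_at"]).day
        total[day] += 1
        if issue["is_tech_debt"]:
            debt[day] += 1
    # Only days with at least one closed issue appear in the result.
    return {day: 100.0 * debt[day] / total[day] for day in sorted(total)}
```

Plotting these values against the day of the month is one way to check whether the weekly rhythm visible in the raw counts also shows up in the ratio.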
Maybe a more interesting analysis of time would be to look at a full year of technical debt for open source projects.
Does activity (total number of issues) predict priority on technical debt?
Short answer: no.
We then plotted the ratio of closed technical debt issues against the total number of issues created in each project. The objective was to see if more activity (more issues created) had any influence on the focus on improving code quality.
In the chart on the left, we see, surprisingly, that it’s almost the opposite. As a project has more issues, the ratio of closed technical debt issues appears to go down.
However, this can be easily explained by the sheer number of issues closed/opened by the projects with more activity.
Because of the high dispersion, the correlation is very weak, so few conclusions can be drawn.
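One way to put a number on “very weak” is a correlation coefficient between issues created and the tech debt ratio. The post does not say which measure was used, so the Pearson coefficient below is an assumption, and the function name is made up for this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (spread_x * spread_y)
```

Here `xs` would be the number of issues created per project and `ys` the matching tech debt percentages. A value close to 0 indicates little linear relationship, and a slightly negative value would match the downward trend described above.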
A more interesting analysis would be to look at project age and understand whether it has any impact on the ratio.
We wanted to answer: how much time should we invest in technical debt?
We tried to answer by looking at open source contributions, namely the ratio of closed tech debt issues vs total closed issues per project.
We found, in our sampling, that projects are spending on average 15.27% of their tickets on technical debt. This means almost a day per week dedicated to code quality.
We looked at two possible reasons why but, although both were interesting, neither was a good predictor of the ratio.
What does it mean for us? The next time you’re assigning your technical debt tickets for your sprint, at least 15% can be a good estimate.
Until next time!
Edit: We just published an ebook: “The Ultimate Guide to Code Review” based on a survey of 680+ developers. Enjoy!