
Code review comments: should 20% be about style and best practices?


Code review comments on style and best practices take up at least 20% of the time developers spend reviewing code [1]. However, is this the optimal proportion of review time to spend on this area?

If you are not currently dedicating time to code review at all, you should be: code reviews have become an integral part of the modern development workflow [2],[3].

But code review can be ineffective.
Your ability to spot defects (bugs) drops once you review more than 200 lines of code per hour [4].
The same paper makes the intuition clear: if you're reviewing 50 lines of code, you will find many issues; if you're reviewing 1M lines of code, most issues will slip past you.

Another way code reviews can be ineffective is by being overly concerned with style and best practices in your review comments.

After many conversations with engineers, I suspect we are spending too much time on style, formatting and best practices.

So this is our question:

What percentage of comments are about code style, formatting issues and best practices?

Process

To find interesting data points, I followed this process:

  1. Download a month’s worth of GitHub’s open source activity
  2. Extract all pull request comments
  3. Identify some patterns in a smaller sample
  4. Try to count those patterns in the whole data set

So, in essence, we’re searching for patterns in GitHub’s activity.

GitHub’s public activity is available for analysis through githubarchive.org.

I downloaded the March activity and extracted all the pull request comments made on open source projects, 273,416 in total.
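A minimal sketch of the extraction step, assuming each GH Archive file is a gzipped stream with one JSON event per line, and that "pull request comments" means events of type `PullRequestReviewCommentEvent`, whose text sits in `payload.comment.body`. The function name `pr_comment_bodies` is hypothetical, and the original analysis may also have included other kinds of comment events:

```python
import gzip
import json
from collections.abc import Iterator

# Assumption: GitHub Events API schema as recorded by GH Archive
# (top-level "type", comment text under payload.comment.body).
PR_COMMENT_EVENT = "PullRequestReviewCommentEvent"


def pr_comment_bodies(path: str) -> Iterator[str]:
    """Yield the body of every pull request review comment in one
    gzipped GH Archive file (one JSON event per line)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            if event.get("type") != PR_COMMENT_EVENT:
                continue  # pushes, issues, stars, etc. are ignored
            payload = event.get("payload") or {}
            comment = payload.get("comment") or {}
            body = comment.get("body")
            if body:
                yield body
```

Running it over every hourly file for the month and concatenating the results gives the raw list of comments.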

Next, I looked for the most frequently recurring phrases in a subset of shorter comments. I filtered the data set by length and compared every string against every other string using the Jaccard index.
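The pairwise comparison can be sketched as follows. This assumes the Jaccard index is computed over sets of lower-cased word tokens and that two comments count as similar at a hypothetical threshold of 0.5; the tokenization, the threshold and the function names are illustrative, not the author's:

```python
import re
from itertools import combinations


def tokens(text: str) -> frozenset[str]:
    """Lower-cased word tokens (assumption: words, not characters or n-grams)."""
    return frozenset(re.findall(r"\w+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_counts(comments: list[str], threshold: float = 0.5) -> list[tuple[str, int]]:
    """For each comment, count the other comments at least `threshold` similar
    to it, comparing every pair once (O(N^2)); most-repeated patterns first."""
    sets = [tokens(c) for c in comments]
    counts = [0] * len(comments)
    for i, j in combinations(range(len(comments)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            counts[i] += 1
            counts[j] += 1
    # sorted() is stable, so ties keep their original order
    return sorted(zip(comments, counts), key=lambda pair: pair[1], reverse=True)


sample = ["missing space here", "Missing space here.", "use snake case", "Missing space"]
for comment, n in similar_counts(sample):
    print(n, comment)
```

Each of the N(N-1)/2 pairs is compared once, which is why the input had to be kept small.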

Data

Here are the emerging patterns from the selection:

[Table: list of comments and the number of times similar comments appeared in the sampled data set]

This is interesting because it shows formatting issues at the top.

This is a very limited view, because I filtered the data set heavily to get results quickly. Comparing every comment with every other comment is O(N²), and I didn’t want to wait for roughly 74B comparisons. So I kept only the comments between 20 and 30 characters long.
Shorter comments may also be more likely to target styling and formatting issues.
So, tempting as it is, we shouldn’t draw any conclusions from this table.
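The 74B figure is N² for N = 273,416 comments, that is, comparing every comment with every comment; counting each unordered pair only once halves it to about 37B, which is still far too many. A quick check of that arithmetic, together with a length filter that assumes the 20 and 30 character bounds are inclusive (`comparisons` and `short_comments` are hypothetical helpers):

```python
def comparisons(n: int) -> tuple[int, int]:
    """Return (N^2 ordered comparisons, N(N-1)/2 unique unordered pairs)."""
    return n * n, n * (n - 1) // 2


def short_comments(comments: list[str], min_len: int = 20, max_len: int = 30) -> list[str]:
    """Keep comments whose length in characters is within [min_len, max_len]
    (inclusive bounds are an assumption)."""
    return [c for c in comments if min_len <= len(c) <= max_len]


full, unique = comparisons(273_416)
print(f"{full:,} ordered, {unique:,} unique pairs")
# 74,756,309,056 ordered, 37,378,017,820 unique pairs
```

Because the cost grows with the square of the input, shrinking the comment list tenfold cuts the number of comparisons roughly a hundredfold.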

However, this table gave us an interesting enough picture to proceed.

[Table targeting smaller comments]

Certain words in these comments reveal the nature of the comment.

By counting the words that signal an intent to address style, formatting or best practices, we can get a better sense of how many of these comments exist.

So we selected expressions from the previous findings and counted the number of comments that contained them.
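The counting step might look like this sketch. The expressions in `STYLE_EXPRESSIONS` are hypothetical placeholders, since the selected list is not reproduced in this text; matching is assumed to be case-insensitive on whole words, and each comment is counted once no matter how many expressions it contains:

```python
import re

# Hypothetical expressions; the article's actual selection is not listed here.
STYLE_EXPRESSIONS = ["indentation", "whitespace", "typo", "naming", "trailing space"]


def matches_any(comment: str, expressions: list[str]) -> bool:
    """True if the comment contains at least one expression as a whole word
    (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(e) + r"\b", comment, re.IGNORECASE)
        for e in expressions
    )


def matched_share(comments: list[str], expressions: list[str]) -> float:
    """Percentage of comments with at least one match; each comment counts once."""
    if not comments:
        return 0.0
    matched = sum(1 for c in comments if matches_any(c, expressions))
    return 100.0 * matched / len(comments)


sample = [
    "Fix indentation",
    "Typo in docstring",
    "This leaks memory",
    "Rename for naming consistency",
    "Looks good",
]
print(matched_share(sample, STYLE_EXPRESSIONS))  # 60.0
```

Adding expressions can only keep the share the same or raise it, so the result is a lower bound for whatever the keyword list misses.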

The comments with at least one match amount to 20% of the total number of comments.

Limitations

There are limitations to this analysis.

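A selected word can appear more than once in a comment, and a comment can contain several of the selected words, so some comments may be counted more than once. Given the nature of the words analyzed, I think this is a good enough approximation.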
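To get a feel for how far the approximation can drift, this sketch contrasts two tallies on the same input: the total number of keyword occurrences, where one comment can contribute several times, and the number of distinct comments with at least one match. The keywords and helper names are hypothetical:

```python
import re


def _pattern(expression: str) -> re.Pattern[str]:
    """Case-insensitive whole-word pattern for one keyword or phrase."""
    return re.compile(r"\b" + re.escape(expression) + r"\b", re.IGNORECASE)


def occurrence_total(comments: list[str], expressions: list[str]) -> int:
    """Every keyword occurrence counts, so one comment may be counted several times."""
    patterns = [_pattern(e) for e in expressions]
    return sum(len(p.findall(c)) for c in comments for p in patterns)


def distinct_matches(comments: list[str], expressions: list[str]) -> int:
    """Each comment counts once if it contains at least one keyword."""
    patterns = [_pattern(e) for e in expressions]
    return sum(1 for c in comments if any(p.search(c) for p in patterns))


sample = ["typo typo and indentation", "indentation", "looks fine"]
keywords = ["typo", "indentation"]  # hypothetical keywords
print(occurrence_total(sample, keywords), distinct_matches(sample, keywords))  # 4 2
```

If the two tallies are close on a sample, the approximation holds for that sample.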

I stopped adding words once the total reached the round number of 20%, but the real share could be much higher: many best-practice comments don’t contain any of the selected keywords.

Conclusion

I wanted to study GitHub pull request comments and how many of them relate to styling issues.

Finding that 20% of these comments relate to styling and best practices is good evidence that we’re concerned about the way our code looks.

In my opinion, we should move towards complete automation and reduce the time invested in enforcing these rules by hand.




References

[1] http://www.quora.com/How-much-per-day-or-week-do-engineers-spend-doing-code-review-at-companies-such-as-Google-Facebook-GitHub-Twitter-Foursquare-etc
[2] http://blog.codinghorror.com/code-reviews-just-do-it/
[3] http://blogs.atlassian.com/2014/03/every-team-needs-kick-ass-code-reviews/
[4] http://www.pitt.edu/~ckemerer/PSP_Data.pdf


