
Code review comments: should 20% be about style and best practices?


Reviewing code takes up at least 20% of software development time [1], and part of that time goes to comments on style and best practices. Is that the best use of review time?

If you are not currently dedicating time to code review, you should be: code reviews have become an integral part of the modern development workflow [2], [3].

But code review can be ineffective.
Your ability to spot defects (bugs) drops once you review more than 200 lines of code per hour [4].
The same paper makes the point obvious: if you’re reviewing 50 lines of code you will find many issues; if you’re reviewing 1M lines of code, you will most likely miss them.

Code reviews can also be ineffective when the comments are overly concerned with style and best practices.

After many conversations with engineers, I have a feeling that we are spending too much time on style, formatting and best practices.

So this is our question:

What percentage of comments are about code style, formatting issues and best practices?

Process

To find interesting data points, I followed this process:

  1. Download a month’s worth of GitHub’s open source activity
  2. Extract all pull request comments
  3. Identify some patterns on a smaller sample
  4. Try to count those patterns in the whole data set

So in essence, we’re searching for patterns in GitHub’s activity.

GitHub’s public activity data is available for analysis through githubarchive.org.

I downloaded the activity for March and extracted all the pull request comments made on open source projects. The result is a data set of 273,416 pull request comments.
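As a rough illustration of the extraction step, here is a minimal sketch. It assumes the archive files are gzip-compressed, newline-delimited JSON events and that review comments appear as `PullRequestReviewCommentEvent` entries with the text under `payload.comment.body`. The function names are hypothetical, and the extraction actually used for this post may have differed.

```python
import gzip
import json


def extract_pr_comments(lines):
    """Yield the body of every pull request review comment in an event stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Assumed event type name for comments left on pull request diffs.
        if event.get("type") != "PullRequestReviewCommentEvent":
            continue
        comment = (event.get("payload") or {}).get("comment") or {}
        body = comment.get("body")
        if body:
            yield body


def comments_from_archive(path):
    """Read one gzip-compressed archive file of newline-delimited JSON events."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(extract_pr_comments(handle))


# Small in-memory example standing in for a downloaded archive file.
events = [
    json.dumps({"type": "PushEvent", "payload": {}}),
    json.dumps({
        "type": "PullRequestReviewCommentEvent",
        "payload": {"comment": {"body": "Please fix the indentation"}},
    }),
]
print(list(extract_pr_comments(events)))
```

Running `comments_from_archive` over every hourly file for the month and concatenating the results would give the full list of comments.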

Then, I tried to find the most frequently recurring phrases in a subset of shorter comments. I filtered the data set by comment length and compared every string against every other string using the Jaccard index.
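The comparison can be sketched as follows. This is only an illustration: it assumes each comment is treated as a set of lowercase words, and the 0.5 similarity threshold and the function names are hypothetical, since the post does not say how the strings were tokenized or what cut-off was used.

```python
from collections import Counter
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard index of the lowercase word sets of two comments."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty comments are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def similar_counts(comments, threshold=0.5):
    """For each comment, count how many other comments are similar to it."""
    counts = Counter()
    # Every pair is compared once, so the work grows quadratically.
    for a, b in combinations(comments, 2):
        if jaccard(a, b) >= threshold:
            counts[a] += 1
            counts[b] += 1
    return counts


sample = [
    "please fix the indentation",
    "fix the indentation please",
    "missing newline at end of file",
    "this needs a unit test",
]
print(similar_counts(sample).most_common(2))
```

Comments with the highest counts are the ones whose wording recurs most often in the sample.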

Data

Here are the emerging patterns from the selection:

[Figure: list of comments and the number of times similar comments appeared in the sampled data set]

This is interesting because it shows formatting issues at the top.

This is a very limited view, because I filtered the data set heavily to get results quickly. Comparing every comment against every other is O(N²), and I didn’t want to wait for 74B comparisons. So the data set was filtered to keep only strings between 20 and 30 characters long.
Shorter comments may also be more likely to target styling and formatting issues.
So, while tempting, we shouldn’t draw any conclusions from this table.
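The 74B figure follows directly from the size of the data set, as the short sketch below shows. The `filter_by_length` helper is a hypothetical name, and treating the 20 and 30 character bounds as inclusive is an assumption.

```python
TOTAL_COMMENTS = 273_416  # pull request comments in the data set

# Comparing every comment against every comment is N * N operations.
all_pairs = TOTAL_COMMENTS ** 2
# Skipping self-comparisons and mirrored pairs would only halve that.
unique_pairs = TOTAL_COMMENTS * (TOTAL_COMMENTS - 1) // 2

print(f"{all_pairs:,}")     # 74,756,309,056
print(f"{unique_pairs:,}")  # 37,378,017,820


def filter_by_length(comments, min_len=20, max_len=30):
    """Keep only comments whose length is within the given bounds (inclusive)."""
    return [c for c in comments if min_len <= len(c) <= max_len]
```

Even the halved count is in the tens of billions, which is why the comparison was only run on a filtered subset.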

However, this table gave us an interesting enough picture to proceed.

Table targeting smaller comments

Certain words in these comments reveal the nature of the comment.

By counting the words that signal an intention to refer to style, format or best practices, we can get a better idea of how many of these comments exist.

So we selected expressions from the previous findings and counted the number of comments that contained them.

The number of comments with matches amounts to 20% of the total number of comments.
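A minimal sketch of this kind of count is shown below. The expressions in `STYLE_EXPRESSIONS` are hypothetical placeholders, since the post does not list the ones that were actually used, and the sample comments are made up for illustration.

```python
import re

# Hypothetical expressions; the ones used for the post are not listed.
STYLE_EXPRESSIONS = ["indent", "whitespace", "naming", "newline", "style", "convention"]

# One case-insensitive pattern that matches any of the expressions.
PATTERN = re.compile(
    "|".join(re.escape(expr) for expr in STYLE_EXPRESSIONS), re.IGNORECASE
)


def style_share(comments):
    """Fraction of comments containing at least one style-related expression."""
    if not comments:
        return 0.0
    # Each comment counts once, however many expressions it contains.
    matches = sum(1 for comment in comments if PATTERN.search(comment))
    return matches / len(comments)


sample = [
    "Please fix the indentation here",
    "This breaks when the list is empty",
    "Can we add a test for the error path?",
    "Why is this call synchronous?",
    "LGTM",
]
print(f"{style_share(sample):.0%}")  # 20%
```

Counting comments rather than word occurrences keeps a comment that repeats a keyword from being counted twice.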

Limitations

There are limitations to this analysis.

Any word can appear more than once in a comment. Given the nature of the words analyzed, I think the count is still a good enough approximation.

I stopped counting words after reaching the round number of 20 percent, but the real figure could be much higher. Many comments about best practices cannot be identified by the keywords they contain.

Conclusion

I wanted to study GitHub pull request comments and find out how many of them are related to styling issues.

Finding that 20% of these comments relate to styling and best practices is good evidence that we’re concerned about the way our code looks.

In my opinion, we should move towards complete automation of these checks and reduce the time invested in enforcing these rules.




References

1: http://www.quora.com/How-much-per-day-or-week-do-engineers-spend-doing-code-review-at-companies-such-as-Google-Facebook-GitHub-Twitter-Foursquare-etc
2: http://blog.codinghorror.com/code-reviews-just-do-it/
3: http://blogs.atlassian.com/2014/03/every-team-needs-kick-ass-code-reviews/
4: http://www.pitt.edu/~ckemerer/PSP_Data.pdf

