Optimizing code in Scala

This is a blog post from our Code Reading Wednesdays series at Codacy (http://www.codacy.com): we make code reviews easier and automatic.

I’ve been busy at Codacy making things run faster.
We’ve been hunting down performance bottlenecks for some time now, as more and more users register on our platform.
This is a short recap of my experience running different tools and how I ended up using first-order logic to make things faster.

Troubleshooting performance in Scala

There are only a limited number of profiling tools for Scala.
The big problem with using profiling tools designed for Java on Scala code is that you lose some of Scala’s abstractions. In other words, there is no one-to-one mapping between what the tools report and the code you wrote, which makes it hard to see what is actually taking too long.
Furthermore, I wanted to be able to measure execution times, memory use, number of threads and actor load distributions.
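
Before reaching for a full profiler, some of these numbers can be read straight from the JVM. The following is a minimal sketch, not what we run at Codacy: the names QuickMetrics, Sample and measure are made up for illustration, and it only covers wall-clock time, heap use and live thread count (actor load distribution is not covered).

```scala
import java.lang.management.ManagementFactory

object QuickMetrics {
  // Illustrative snapshot type; the field names are assumptions, not a real API.
  final case class Sample(elapsedMillis: Double, usedHeapBytes: Long, liveThreads: Int)

  // Runs `block` once, then reports wall-clock time for the run
  // plus the heap in use and the live thread count right after it.
  def measure[A](block: => A): (A, Sample) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMillis = (System.nanoTime() - start) / 1e6
    val usedHeap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed()
    val threads = ManagementFactory.getThreadMXBean().getThreadCount()
    (result, Sample(elapsedMillis, usedHeap, threads))
  }
}
```

A single run like this is noisy (JIT warm-up, garbage collection), so treat it as a rough first look rather than a benchmark.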

These are the two tools that I found useful and that I see myself using in the near future to understand what’s going on internally. Furthermore, you can use both for free right now.

Other tools are missing from this list (like the Typesafe Console), but we will certainly look into them in the near future.

Note that these were not run in production; they ran locally on my machine.

VisualVM

[Image: VisualVM screenshot]

VisualVM is a great choice for profiling a JVM application. It’s a visual tool that integrates several command-line JDK tools and lightweight profiling capabilities.
The snapshot above shows the tool running on our code (executing some heavy code analysis).

VisualVM allows you to take samples of your app’s memory and CPU usage and then navigate through them. There’s also tight integration with the JVM (e.g. you can trigger a garbage collection whenever you want), which is good for low-level monitoring.
However, as I mentioned previously, the ability to match Scala’s abstractions is of great value, and unfortunately VisualVM does not yet support this.
In this sense, the next tool proved to be much more interesting.

Takipi

[Image: Takipi screenshot, 2013-12-11]

Takipi is a product designed for performance monitoring and profiling in Scala. Above is a screenshot of Takipi running on our code.

Unlike VisualVM, Takipi runs as a service: you install a Java agent that plugs into your JVM and sends information to their servers. Although this sounds complicated, the installation was really painless: you execute a one-line bash script and you’re ready. We were all surprised at how quick Takipi was to set up.
Takipi lets you see the Exceptions and Errors that were caught. The interface is really pleasant as well.
What I liked most is that Takipi provides good support for Scala. This means that when Exceptions happen, you see them exploding in your own code instead of in an intermediary Java representation between Scala and the JVM.
Overall this is a great tool, and it is free for up to 2 servers for Scala.

Real scenario optimization

I wanted to share a real-world case using the tools described above.

We caught a case where filtering a large number of classes generated by our parsing mechanisms was taking a long time to process.
We identified it by seeing too many calls to the same method.

At the root of the problem was a function that served as the predicate for a filtering mechanism, checking the elements of sequences (Seqs) for validity.
Because we used a lot of exists and forall, the Scala collection methods that take predicates, we discovered that our predicate function actually mapped beautifully to First Order Logic.
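
To make the mapping concrete, here is a small sketch of how exists and forall line up with the quantifiers ∃ and ∀. It is not our real predicate: ParsedClass, its fields and the validity rules are invented purely for illustration.

```scala
object PredicateSketch {
  // Hypothetical stand-in for a class produced by a parser; not Codacy's real model.
  final case class ParsedClass(name: String, methods: Seq[String])

  // ∃ m ∈ methods : m starts with "test"
  def hasTestMethod(c: ParsedClass): Boolean =
    c.methods.exists(m => m.startsWith("test"))

  // ∀ m ∈ methods : m is non-empty
  def allMethodsNamed(c: ParsedClass): Boolean =
    c.methods.forall(m => m.nonEmpty)

  // (∀ m : m is non-empty) ∧ ¬(∃ m : m starts with "test")
  def isValid(c: ParsedClass): Boolean =
    allMethodsNamed(c) && !hasTestMethod(c)
}
```

A predicate like this plugs directly into a filter, for example classes.filter(PredicateSketch.isValid), which is why it ends up being called once per element of a large sequence.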

[Image: our predicate function written as a first-order logic formula]

Above is the direct translation of our function to FOL. Kinda awesome, right?

Regarding the performance fine-tuning, it turned out that there were many more cases for which this predicate was not true.

Because we used exists, which only stops early when it finds a match, we were effectively traversing the entire sequence in those cases.

So we hypothesised that, since we were seeing many more cases where that condition is not met (hence triggering a full scan of the sequence), switching the exists for a forall would improve performance in the long run.
This, however, had implications for the way the predicate was being used.
Instead of looking for falses in huge truth lists, we started looking for trues in huge truth lists (big thanks to Workaphobia for reviewing this and providing this reasoning).
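
The short-circuit behaviour behind this reasoning can be checked with a small experiment. This is only a sketch under assumptions: countCalls and isNegative are hypothetical helpers, and the lists are toy data, not our parsed classes.

```scala
object ShortCircuitSketch {
  // Runs `check` with a counting wrapper around `p` and returns the
  // boolean result together with the number of elements inspected.
  def countCalls[A](xs: Seq[A], p: A => Boolean)(
      check: (Seq[A], A => Boolean) => Boolean): (Boolean, Int) = {
    var calls = 0
    val counted: A => Boolean = a => { calls += 1; p(a) }
    val result = check(xs, counted)
    (result, calls)
  }

  // Toy predicate used only for this illustration.
  val isNegative: Int => Boolean = _ < 0
}
```

With List(1, 2, 3, 4, 5), exists(isNegative) inspects all five elements before answering false, while forall(isNegative) stops at the first element. Keep in mind that a purely mechanical rewrite of xs.exists(p) into !xs.forall(x => !p(x)) inspects exactly the same elements, so any gain has to come from reshaping the predicate so that the common case is the one that stops early.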

After running our long-running and load testing analysis on the change, we recorded the following performance improvements:

[Image: performance results after the change]

It’s not every day you get to use your old CS lessons.

Rodrigo (@rtfpessoa)

