Saturday 28 July 2012

Routes to Faster Java Compilation

My recent post comparing the compilation speed of the ecj batch compiler and javac hints at the topic that I've been starting to explore in my copious free time: how to get a faster build out of a Java project. This matters to me a lot, since I spend the majority of my day (when I'm not answering emails) coding in Java for an OSS project. Here's where I'm at now:

  1. Develop as much as possible in the IDE, including running tests. If there's some way to avoid doing a command line build, then avoiding it allows faster progress. On the Selenium project there's a reasonable amount of effort invested to make this possible. It's mostly successful, but not as fast as I'd like because...
  2. Minimize the dependencies in your code base. The IDE-based test runs I do often need to shell out to build things like Firefox extensions. That kills the performance of the build and makes doing end-to-end testing far less pleasant.
  3. Delete dead code. The less there is to compile, the better.
  4. Evaluate the tool chain. Past assumptions don't always hold true, so revisit them from time to time.
  5. Going straight from Java source to a JAR is almost always the right approach, particularly if you're dealing with a large number of files. Running "stat" on each Java file, each class file and the output JAR is often less efficient than running "stat" on just the Java files and the output JAR. However...
  6. Use an SSD. Compiling code is an exercise in small and random reads and writes. A spinning platter disk isn't the best choice for this. I need to rerun my checks to see if the statting of class files is now worth the extra effort.
  7. Get more memory. Your OS will cache things pretty aggressively, but hitting swap will murder your build times.
  8. More memory and an SSD offset many of the disadvantages of a slower CPU, particularly in a single-threaded build.
  9. Build in parallel. If you don't, then make sure the clock speed of your CPU is as high as possible.
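
The check described in point 5 (stat only the Java files and the output JAR) can be sketched roughly as below. This is a minimal illustration, not how rake or any particular build tool does it; the class and method names are made up for the example, and it assumes Java 8 or later for the stream API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class UpToDateCheck {
    /**
     * Returns true if the JAR is missing, or if any .java file under
     * sourceRoot was modified after the JAR was last written.
     * Class files are never looked at.
     */
    public static boolean needsRebuild(Path sourceRoot, Path jar) throws IOException {
        if (!Files.exists(jar)) {
            return true;
        }
        long jarTime = Files.getLastModifiedTime(jar).toMillis();
        try (Stream<Path> files = Files.walk(sourceRoot)) {
            return files
                .filter(p -> p.toString().endsWith(".java"))
                .anyMatch(p -> lastModified(p) > jarTime);
        }
    }

    // Wraps the checked exception so the method can be used inside a lambda.
    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With one output per target, that is one stat per source file plus one for the JAR, which is where the saving over tracking every class file comes from.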

Now, I'd love to say that I do all of these things, but I can't. In particular, I don't build in parallel, which is a real waste of 7 of the cores in my personal machine (or even more for my machine at work). The main reason is that the build tool I use (rake) has fairly blunt support for running in parallel, and the layers of abstraction wrapped around it make it harder to figure out when to use multitasks.
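
For comparison, here is roughly what building independent targets in parallel looks like when written directly in Java. It is only a sketch: it is not rake's multitask, the names are hypothetical, and it assumes the targets passed in have no dependencies on each other, which is exactly the hard part to work out in a real build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTargets {
    /**
     * Runs every target on a thread pool sized to the number of cores
     * and returns the results in the same order as the input list.
     */
    public static <T> List<T> buildAll(List<Callable<T>> targets) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every target has finished or failed.
            for (Future<T> future : pool.invokeAll(targets)) {
                // get() rethrows a target's failure as an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```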

The thing that sticks with me the most, however, is that compilation speeds are fastest when the compiler only has a few files to deal with. With Java, that leads to having lots of small targets rather than one massive glob of the entire file system. Or it would, if whether a JAR had changed were determined by something other than the file's last modified time. There's an additional wrinkle: how do you stop a small change in a method (say, adding a logging statement) from causing a complete recompilation of everything that depends on the JAR that contains that method? Hmmm... I wonder....
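
One possible "something other than the last modified time" is a hash of what is inside the JAR. The sketch below is an assumption about how that might look, not a description of any existing tool: the class name is invented, and it hashes entry names and contents (sorted by name) rather than the raw JAR bytes, because the archive itself records per-entry timestamps that change on every rebuild.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarFingerprint {
    /** Returns a hex SHA-256 over the names and bytes of every file entry in the JAR. */
    public static String fingerprint(Path jar) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            List<JarEntry> entries = Collections.list(jarFile.entries());
            // Sort so the order in which entries were written does not matter.
            entries.sort(Comparator.comparing(JarEntry::getName));
            byte[] buffer = new byte[8192];
            for (JarEntry entry : entries) {
                if (entry.isDirectory()) {
                    continue;
                }
                digest.update(entry.getName().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between name and content
                try (InputStream in = jarFile.getInputStream(entry)) {
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        digest.update(buffer, 0, read);
                    }
                }
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

A dependent target would then be rebuilt only when the fingerprint of a JAR it depends on differs from the one recorded at its last build. This does nothing for the additional wrinkle: adding a logging statement changes the class bytes, so the fingerprint changes too.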