Tuesday, 13 March 2018

A Month in Selenium: February

January was a quiet month for Selenium hacking, but it laid the groundwork for February's efforts. These largely centred around code cleanup in the Grid server, and migrating the project to make better use of our own abstractions over JSON and HTTP.

Why do we have our own abstractions for these incredibly common tasks? There are two main reasons. The first is that we'd like the freedom to choose our underlying implementation for these things without needing to extensively rework our own first-party code. The second is that third-party libraries offer generalised APIs that have to meet the needs of all users, whereas we have very specific needs and may have to work around some of the sharp edges (for example, in the Java code, lots of classes that need to be serialised to JSON have a toJson method that Gson knows absolutely nothing about). This is typically done by writing adapters.
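As a rough illustration of the adapter idea (not the actual Selenium code), here is a minimal sketch using only the JDK. The `JsonOutput` interface, the `ToJsonAwareOutput` adapter and the `Cookie` class are all hypothetical names invented for this example. The adapter looks for a public no-argument `toJson` method and serialises whatever that returns, which is the kind of project-specific convention a general-purpose JSON library would not know about.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class JsonAdapterSketch {

  /** Hypothetical first-party facade: callers never see the JSON library behind it. */
  interface JsonOutput {
    String write(Object value);
  }

  /** Hypothetical adapter that honours a public no-arg toJson() method. */
  static class ToJsonAwareOutput implements JsonOutput {
    @Override
    public String write(Object value) {
      if (value == null) {
        return "null";
      }
      if (value instanceof Number || value instanceof Boolean) {
        return value.toString();
      }
      if (value instanceof String) {
        return quote((String) value);
      }
      if (value instanceof Map) {
        return ((Map<?, ?>) value).entrySet().stream()
            .map(e -> quote(String.valueOf(e.getKey())) + ":" + write(e.getValue()))
            .collect(Collectors.joining(",", "{", "}"));
      }
      Object converted = callToJson(value);
      // Fall back to the string form when there is no usable toJson() method.
      return converted != null ? write(converted) : quote(value.toString());
    }

    /** Returns the result of value.toJson(), or null if it cannot be called. */
    private static Object callToJson(Object value) {
      try {
        Method method = value.getClass().getMethod("toJson");
        method.setAccessible(true);
        return method.invoke(value);
      } catch (ReflectiveOperationException e) {
        return null;
      }
    }

    private static String quote(String raw) {
      return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
  }

  /** Example domain class that describes its own JSON shape. */
  static class Cookie {
    private final String name;
    private final String value;

    Cookie(String name, String value) {
      this.name = name;
      this.value = value;
    }

    public Map<String, Object> toJson() {
      Map<String, Object> json = new LinkedHashMap<>();
      json.put("name", name);
      json.put("value", value);
      return json;
    }
  }

  public static void main(String[] args) {
    JsonOutput out = new ToJsonAwareOutput();
    System.out.println(out.write(new Cookie("session", "abc123")));
  }
}
```

Because callers only ever talk to `JsonOutput`, the implementation behind it could be swapped for a different JSON library without touching them.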

We started using the Apache HttpClient by default as it's the HTTP library used by HtmlUnit, which we used to ship as part of the core Selenium distribution. In keeping with the other drivers out there, the HtmlUnit team now work on the HtmlUnitDriver, so it's no longer kept in the main project source repo. The interesting thing is that since we made the choice a long, long time ago to use the HttpClient, the HTTP standard has moved forward. HTTP/2 is now a thing. HTTP/2 support is coming as part of HttpClient 5. In order to take advantage of the new options and capabilities, we'd have to rework our existing abstractions anyway, so why not take a look around for something else to use? Better yet, if we use an HTTP library that isn't a dependency of one of our dependencies, we're less likely to end up with clashing versions.

One of the reasons that Java has a terrible reputation for startup speed is that people have massively bloated classpaths. As it stands, the Selenium standalone server weighs in at a portly 24MB. The Apache HttpClient accounts for about 1.4MB of this total, before we do the update. After the update, the beta of version 5 would be a touch under 1MB. In comparison, OkHttp (which already supports HTTP/2) with its dependencies is approximately 500KB. In other words, OkHttp is smaller, already supports HTTP/2, and isn't a dependency of one of our dependencies.

So, we switched the project to use OkHttp instead of the Apache HttpClient.

Within the client code, making this change was relatively trivial. The problem was that the server-side code had leaked Apache's APIs into its own code. Before we could replace the Apache HttpClient, we first needed to replace all those usages. That was made somewhat harder by the fact that those APIs are exposed as part of the public APIs of various classes that other libraries extend.
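To make the "leak" concrete, here is a small sketch of what code looks like when it depends only on a first-party HTTP abstraction. Every name in it (`Client`, `Request`, `Response`, `StatusChecker`, the `/status` path and the port) is an assumption made for illustration and not the project's real API. The point is that nothing here mentions Apache or OkHttp, so either could sit behind `Client`.

```java
public class HttpAbstractionSketch {

  /** Hypothetical first-party request type. */
  static final class Request {
    final String method;
    final String uri;

    Request(String method, String uri) {
      this.method = method;
      this.uri = uri;
    }
  }

  /** Hypothetical first-party response type. */
  static final class Response {
    final int status;
    final String body;

    Response(int status, String body) {
      this.status = status;
      this.body = body;
    }
  }

  /** The only HTTP type the rest of the code is allowed to see. */
  interface Client {
    Response execute(Request request);
  }

  /** Example server-side class: no third-party HTTP types in its signature. */
  static class StatusChecker {
    private final Client client;

    StatusChecker(Client client) {
      this.client = client;
    }

    boolean isUp(String baseUrl) {
      Response response = client.execute(new Request("GET", baseUrl + "/status"));
      return response.status == 200;
    }
  }

  public static void main(String[] args) {
    // A fake client stands in for a real adapter over an HTTP library.
    Client fake = request ->
        new Response(request.uri.endsWith("/status") ? 200 : 404, "{}");
    System.out.println(new StatusChecker(fake).isUp("http://localhost:4444"));
  }
}
```

If `StatusChecker` instead accepted a library-specific client in its constructor, every subclass in every downstream project would be tied to that library, which is the situation described above.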

Fortunately, we have a process for deprecating and deleting APIs. First of all, we mark the methods to be deleted as "deprecated" for at least one release. And then we delete them. Of course, if you're going to deprecate a method, you really should provide an alternative and migrate as many uses as can be found to use the replacements. The bulk of my work this month was spent making these changes.
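In Java, that deprecate-then-delete pattern might look something like the sketch below. The class names (`LegacyClient`, `Client`, `Proxy`) are made up for the example and stand in for a third-party type, its first-party replacement, and a public class that other libraries extend. The old constructor stays for a release, marked `@Deprecated`, and simply delegates to the new one.

```java
public class DeprecationSketch {

  /** Stands in for a third-party type we no longer want in public signatures. */
  static final class LegacyClient {
    final String name;

    LegacyClient(String name) {
      this.name = name;
    }
  }

  /** Hypothetical first-party replacement. */
  interface Client {
    String describe();
  }

  /** A public class whose constructor used to expose the third-party type. */
  static class Proxy {
    private final Client client;

    /** The replacement: depends only on the first-party interface. */
    Proxy(Client client) {
      this.client = client;
    }

    /**
     * @deprecated Use {@link #Proxy(Client)} instead. Kept for at least one
     *     release so that existing callers get a compiler warning first.
     */
    @Deprecated
    Proxy(LegacyClient legacy) {
      this(adapt(legacy));
    }

    /** Wraps the old type so the deprecated path shares the new code. */
    private static Client adapt(LegacyClient legacy) {
      return () -> "legacy:" + legacy.name;
    }

    String describe() {
      return client.describe();
    }
  }

  @SuppressWarnings("deprecation")
  public static void main(String[] args) {
    System.out.println(new Proxy(new LegacyClient("apache")).describe());
    System.out.println(new Proxy(() -> "first-party").describe());
  }
}
```

Once callers have migrated and a release has shipped with the warning in place, the deprecated constructor and the `adapt` helper can be deleted together.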

Of course, we needed to do a release, so we lined up 3.9 to start the process. In order to do the release, we needed to actually build it. There had been reports of some issues building the release artefacts on Windows. To resolve this, I had to update our fork of Buck to pull in the latest changes from Facebook, and then to try and work around those issues. Naturally, the Buck developers aren't aware of our fork, so merging in their changes was a somewhat time-consuming affair. Once that was done, I wrote what I thought was a fix and pushed a new version of our fork of Buck.

It didn't work. Oh well.

The final step in doing a release is trying to get our CI builds green. These take an incredible amount of time to run, and I wondered whether we could speed them up. Travis has support for caching, so it would be nice to use that. My attempts to use caching were foiled because the cache takes into account environment variables, which we use to separate our builds. There's a bug open in the Travis tracker to allow us to name builds, which would have allowed us to work around this, but it's still open. Ho hum. As a workaround, I wrote a simple wrapper around Buck that we can call on our CI servers. This makes better use of Buck's ability to parallelise work automatically, and it has helped bring our build times down. Hurrah!