Rocket Powered

Many Months in Selenium: to November

2018-11-26T16:36:00.002+00:00

Well, it's been a long time since I sat down and wrote a post about the adventures in Selenium-land. Time for an update!

Since I last wrote, the work of updating the JSON handling code in the java tree has been completed, and it appears to be stable. However, it would be a terrible waste of time if that was all that we had done, and fortunately it's not :)

The big thing is that we've now finished the 3.x release cycle, and we're getting ready for 4.0. Someone foolishly let me pick version numbers, so the last few releases of 3 tended to get ever closer to π. Judging by some of the bug reports, the initial jump, from 3.14 to 3.141 appears to have confused some folk, but now that we've reached 3.141.59 I think the point has been made (and, just maybe, the joke is wearing thin)

The 4.0 release is going to be a lot of fun. My main focus has been the new Selenium Grid, which features a more modern design, for use with things such as AWS and Kubernetes (and docker compose). Of course, maintaining a simple user experience has been high on our list of goals, so people used to the existing approach of "hub" and "node" will continue to be able to run the system like that. The biggest change is that under-the-covers, the new standalone server and the new Grid are exactly the same software, which is a huge change from the current approach where we have two not-terribly-well-integrated codebases in the same binary.

Another big change is that we're exploring the move from using Buck to build much of Selenium to Bazel. This hasn't been something I've been keen on doing, since I used to be the tech lead on Buck, and I think it has a number of useful properties. Despite this, Bazel has a comparatively huge amount of community support, and that means that people wanting to hack on the project have a smaller learning curve to climb.

A Month in Selenium: March

2018-03-31T17:56:00.001+01:00

The month from February to March has been a fun one. At the beginning of March, I attended SauceCon, and gave a keynote on "Lessons From a Decade in Selenium". While the original talk had focused on milestones such as when we first started shipping code, or when we switched to Git, or when someone joined the project, as I sat in the airport waiting to fly, I realised that this was an incredibly dull talk; surely the point of keynote is to give people something to think about and consider?

This explains why I was busy rewriting the entire thing at 12km above the ground in a metal tube zipping along at a smidge over 900km/h.

In the end, I spoke about what makes working on Selenium so rewarding, focusing on the themes of "Joy", "Serendipity", "Thankfulness", "Community", "Growth", and "Striving". I've yet to see the official feedback, but I believe that the talk was well received, as people kept returning to the main themes throughout the conference.

SauceCon itself was a lot of fun. We were lucky to have some of the Selenium committers (old and new), and also supporters of the project from companies such as Sauce Labs itself, and Applitools (who are providing almost all the effort going into the new Selenium IDE) In addition, the Appium developers were well represented too. It was great to be surrounded by so many people who have spent so much time pouring energy into Open Source Software, and to catch up with some of my favourite people. There's a lovely photo of Jonathan Lipps and myself in matching bowling shirts, which I'm happy to see he tweeted.

Since we had so many people in the same place, we decided to release Selenium 3.10. The main highlights of this release were behind the scenes for most users, as we focused on the continued clean up of the internal of Grid, and the continued use of our own abstractions to handle HTTP and JSON. Having said this, there were user-supplied patches, notably moving us from Selenium's own "Duration" class to the one that ships with the JRE. Deleting code is a lot of fun.

One reason for shipping 3.10 was to lay the groundwork for a terrible dad-joke: releasing Selenium 3.11 on the 11th March (3/11 in US date format). Jim Evans and I had noticed that 3.11 was also one of the most famous of the Windows releases, so we decided to lean into the joke, and shipped "Selenium for Workgroups" as well in March. The Selenium server even reports this to users. 3.12 won't have this feature.

In a bid to help our Windows developers ship the Selenium jars, I merged a ton of upstream changes to our fork of Buck, and then spent some time attempting to resolve the issue where zips created on Windows create unreadable directories when unpacked. My fix doesn't resolve the issue, so I filed an issue with the upstream Buck project in the hope that they'd fix it for me. If I get some free time, I'll try this as well.

I've also been working on replacing GSON within our tree (though on my local machine). By the end of the month of work, I had a forked version of Selenium that didn't use GSON at all for outputting JSON. Sadly, I was a little over-ambitious when attempting to finish the work by also deserialising from JSON to proper types. It turns out that there's a bunch of code in Grid that relies on the current semantics of GSON to function. Stepping back, it looks like most of this is because GSON isn't aware of our own types, and it should be relatively easy to replace some of this. At least I know I should be working on next….

Well, that, and a new way of starting sessions that allow users to properly make use of all the features that the W3C New Session command offer.

A Month in Selenium: February

2018-03-13T18:53:00.001+00:00

January was a quiet month for Selenium hacking, but it laid the groundwork for February's efforts. These largely centred around code cleanup in the Grid server, and migrating the project to make better use of our own abstractions over JSON and HTTP.

Why do we have our own abstractions for these incredibly common tasks? There are two main reasons. The first is that we'd like freedom to be able to choose our underlying implementation for these things, without needing to extensively rework our own first-party code. The second is that third party libraries offer generalised APIs that need to meet the needs of all users, whereas we have very specific needs met by these APIs and may need to work around some of the sharp edges (for example, in the java code, lots of classes that need to be serialised to JSON have a toJson method that GSON knows absolutely nothing about). This is typically done by writing adapters.

We started using the Apache HttpClient by default as it's the HTTP library used by HtmlUnit, which we used to ship as part of the core Selenium distribution. In keeping with the other drivers out there, the HtmlUnit team now work on the HtmlUnitDriver, so it's no longer kept in the main project source repo. The interesting thing is that since we made the choice a long, long time ago to use the HttpClient, the HTTP standard has moved forward. HTTP/2 is now a thing. HTTP/2 support is coming as part of HttpClient 5. In order to take advantage of the new options and capabilities, we'd have to rework our existing abstractions anyway, so why not take a look around for something else to use? Better yet, if we use an HTTP library that isn't a dependency of one of our dependencies, we're less likely to end up with clashing versions.

One of the reasons that Java has a terrible reputation for start up speed is because people have massively bloated classpaths. As it stands, the Selenium standalone server weighs in at a portly 24MB. The Apache HttpClient weighs in at about 1.4MB of this total, before we do the update. After the update, the beta of 5 is a touch under 1MB. In comparison, OkHttp (which already supports HTTP/2) with its dependencies is approximately 500kb. In other words, OkHttp is smaller, already supports HTTP/2, and isn't a dependency of our dependency.

So, we switched the project to use OkHttp instead of the Apache HttpClient.

Within the client code, making this change was relatively trivial. The problem is that the server-side code had leaked Apache's APIs into the code. Before we can replace the Apache HttpClient, we need to first of all replace all those usages. That's made somewhat harder by the fact that it's exposed as part of the public APIs of various classes that other libraries extend.

Fortunately, we have a process for deprecating and deleting APIs. First of all, we mark the methods to be deleted as "deprecated" for at least one release. And then we delete them. Of course, if you're going to deprecate a method, you really should provide an alternative and migrate as many uses as can be found to use the replacements. A bulk of my work this month was spent making these changes.

Of course, we needed to do a release, so we lined up 3.9 to start the process. In order to do the release, we needed to actually build it. There had been reports of some issues building the release artefacts on Windows. To resolve this, I had to update our fork of Buck to pull in the latest changes from Facebook, and then to try and work around those issues. Naturally, the Buck developers aren't aware of our fork, so merging in their changes was a somewhat time-consuming affair. Once that I was done, I wrote what I thought was a fix and pushed a new version of our fork of Buck.

I didn't work. Oh well.

The final step in doing a release is trying to get our CI builds green. These take an incredible amount of time to run, and I wondered whether we could speed them up. Travis has support for caching, so it would be nice to use that. My attempts to use caching were foiled because the cache takes into account environment variables, which we use to separate our builds. There's a bug open in the Travis tracker to allow us to name builds, which would have allowed us to work around this, but it's still open. Ho hum. As a work around, I wrote a simple wrapper around Buck that we can call within our CI servers. This makes better use of Buck's ability to parallelise work automatically, and this has helped bring our build times down. Hurrah!

Two Months in Selenium - November and December

2018-01-21T18:19:00.000+00:00

You may have noticed a distinct lack of an update last month. It's because I was focused on client work, Christmas, and the New Year, and took some time away from the keyboard. But I'm back now!

The W3C WebDriver spec is now at the stage where we need to demonstrate multiple compatible implementations. Realistically, this means that we need two passes for each test in our test suite. The browser vendors are working hard to get things working, and progress is being made. There's not been a huge amount for me to do here, so this is more of a waiting game than anything else from my perspective. Having said that, I'm on the hook for some sections in Level 2, so I should really sit down and write those (and the matching tests)

The main thing I've been focused on has been the Selenium Grid. There are a couple of things that we really need to solve with Grid. The first is that the code is complex and hard to deal with. When we originally released it, it took a huge amount of work to review the code for thread-safety and to debug many of the issues. That code has not become easier to reason about, which makes it harder to foster Open Source contributions.

Of course, that'd be fine if we didn't care about making any changes, but we do. When Grid came into being, it was normal to have a physical server for each node in the grid. If you were lucky, you might have a massive server with VMWare running on it, which you'd cycle virtual machines on to keep the Grid healthy. The world has changed. Docker is now A Thing, and there are multiple "Selenium as a Service" (SaaS) cloud providers.

There are some projects out there that implement some of the functionality of Grid. For example, selenoid makes use of Docker, but it doesn't use the W3C dialect of the webdriver protocol, which means it doesn't do protocol conversion, and it doesn't natively support cloud providers. Zalenium builds on top of Grid, and provides support for Docker and SaaS, but they've had to work within the existing architecture, and there are obvious rough edges.

Finally, we've wanted the selenium server to be a "Grid of one". If you go into the code of the server, you'll see that there are two fairly separate trees that live side-by-side. When you start the server, it picks one and then goes with it. It'd made things like supporting the W3C protocol harder than it should be, and it's not an elegant way to run things.

As a solution to this, it seems obvious that there should only be one code path. The problem is that the standalone server is too simplistic about how it assigns work, and (as discussed) the Grid code is too complex. Over the past few releases, I've been landing code to help resolve this:

The pass through mode: this makes the server proxy requests without doing an parsing or changes unless necessary.
The ActiveSession abstraction has been added. This makes adding new types of provider (SaaS, Docker) far easier to write.
A "new session pipeline" has been added, and this is being used to handle things like multiple versions of the webdriver protocol.

The most recent thing I've been working on has been a new scheduler. This will be rolled into the new session pipeline, and is composed of a number of pieces:

The "Scheduler": this is responsible for queuing new session requests, handling retries, and waiting until nodes become available. This class is thread-safe and designed to be the main entry point.
The "Distributor", which is solely responsible for ranking and ordering available hosts, and the sessions that can run on them.
The "Host" abstraction, which represents a physical place where sessions can be run. Each of these has a number of....
The "Session Factory", which is responsible for creating a new session.

The scheduler will sit within the new session pipeline. For the standalone server, we just add a single host. For the grid, we can add an arbitrary number of hosts (and therefore session factories)

As well as the new scheduler, we're preparing the 3.9 release. It should be out next week, if everything goes according to plan. :)

The Selenium Server & Creating New Sessions

2018-01-15T11:18:00.001+00:00

I've had the pleasure of being a co-editor of the W3C's WebDriver spec, as well as the original author and one of the current maintainers of Selenium's Java bindings, and one of the main authors of the current Selenium Server, particularly the pieces to do with implementing the W3C spec. So, as one of the few people on the planet who knows how all the pieces fit together, and why they fit together that way, I thought it might be helpful to explain how and why the Selenium Server handles a request to create a new session.

For this discussion, I'll use the terminology from the spec. A "remote end" refers to the Selenium Server, and a "local end" are the language bindings you're probably familiar with --- there's some in all the major programming languages, and about a million of them in the JS space too.

First of all, it's advisable for the local end to send a single request for a new session that includes the expected payloads for both the W3C and the JSON Wire Protocol dialects at the same time.

Consider the case where you just send the w3c payload ({"capabilities": {"browserName": "chrome"}}). In this case, a w3c server would correctly attempt to start a chrome session. However, a server that only obeys the JSON Wire Protocol will see an empty payload, in which case it's free to do whatever it wants.

Sending just the JSON Wire Protcol payload ({"desiredCapabilities": {"browserName": "firefox"}}) will create a firefox session in a server that understands the JSON Wire Protocol, but will cause a "no session created" error in a W3C compliant server (since that expects at least {"capabilities": {}} to be set).

So, we have the expected, legal behaviour of the remote end layed out.

For historical reasons, most bindings only accept a "desired capabilities" hash as the argument when creating a new driver instance. Converting the old-style payloads to legal W3C ones is a non-trivial exercise (for example, {"firefox_profile": "sdfgh"} is now {"moz:firefoxOptions": {"profile": "sdfgh"}}, but what happens if both are set? Also "platform" has become "platformName", but do the values match? Probably only at the OS family level, according to the note in the spec)

Most local end bindings get this mapping wrong, but the user doesn't care why their session isn't as they'd expected it to be, they just know it's not right. What to do? What, my friends, do we do?

The answer is to be generous about what we receive from the user and attempt to do what they want. Knowing that most local ends have at least a few problems converting the old format to the new format, the selenium server creates an ordered list of capabilities, putting the OSS ones at the front of the list to ensure maximum compatability.

So, now you know.

Why Use a Monorepo?

2018-01-15T10:56:00.000+00:00

A monorepo helps reduce the cost of software development. It does this in three different ways: by being simpler to use, by providing better discoverability, and by allowing atomicity of updates. Taking each of these in turn….

Simplicity

In the ideal world, all you’d need to do is clone your software repository, do a build, make an edit, put up a pull request, and then repeat the last three steps endlessly. Your CI system would watch the repository (possibly a directory or two within it), and kick off builds as necessary. Anything more is adding overhead and cost to the process.

That overheard starts being introduced when multiple separate codebases need to be coordinated in some way. Perhaps there’s a protocol definition file that needs to be shared by more than one project. Perhaps there’s utility code that’s shared between more than one project.

In many organisations developers may not have the ability to set up a repo on demand, so there’s a time and political cost in creating one. Then there’s the ongoing cost of maintaining them, backing them up, and so on. Especially if data is being duplicated between repositories, the aggregate total space used by these repos will also be larger.

Multiple repositories are not necessarily “simple”.

One straw man solution to the problems of coordination is to copy all required dependencies into your own repo, but then we’ve a huge pile of duplicated work that opens up the possibility of parallel but incompatible changes being made at the same time.

A better solution is to build binary artefacts that are stored in some central location, and grab those when required. Bad experiences with storing binaries in the VCS make many people shy of just checking in the artefacts, so this storage solution seems attractive. But the alternatives introduce complexity. Where previously we only had to worry about maintaining the uptime of the source control system, there’s now the additional cost of maintaining this binary datastore, and ensuring its uptime too. Worse, in order to preserve historical builds, the binary datastore needs to be immutable after a write. In my experience, rather than being a directory served using nginx or similar, people turn to commercial solutions even when free alternatives are available. The cost of building and running this infrastructure raises the total cost of development.

Another area where monorepos bring simplicity is when a package or library needs to be extracted from existing code. This process is simple in a monorepo: just create the new directories, possibly after asking permission from someone, and check in. Every other user receives that change with their next update, without needing to re-run tooling to ensure that their patchwork clients are up to date. Outside of a monorepo, the process can be more painful, especially if a new repository is needed for the freshly extracted code.

Identifying every place that is impacted by such a code change is also easy in a monorepo, even if you’re not using a graph-based build tool such as bazel or buck, but doing something like “maven in a monorepo”. The graph-based build tools typically have a mechanism to query the build graph, but if the tree is one place and you don’t have code-insight tools, then even “grep” can get you quite far.

There are arguments about monorepos stressing source control software, requiring different tool chains, or not being compatible with CI systems. I addressed those concerns in an earlier post, but the TL;DR is “modern DVCS systems can cope with the large repos, you don’t need to change how you build code, and your CI pipelines can be left essentially ‘as is’.”

Discoverability

One of the ways that monorepos drive down the cost of software development is by reducing duplication of effort.

It’s a truism that the best code is the code that is never written. Every line of code that’s written imposes an ongoing cost of maintenance that needs to be paid until the code is retired from production (at the very earliest!). Fortunately, a good software engineer is a lazy software engineer --- if they’re aware of a library or utility that can be used, they’ll use that.

In order to function properly, a monorepo needs to be structured to ease discoverability of reusable components, as covered in the post about organising a monorepo. One of the key supporting mechanisms is to separate the tree into functional areas. However, just because a monorepo is structured to aid discoverability, it doesn’t do anything to prevent “spaghetti dependencies” from appearing. What it does do is help surface these dependencies, which would exist in any case, without fancy additional tooling.

Naturally, a monorepo isn’t the only way of solving the problem of discovering code. Good code insight tooling can fill the same role (go Kythe!), as do central directories where people can find the code repositories that house useful shared code. Even hearsay and guesswork can suffice; after all, the Java world has coped with Maven Central for an incredibly long time.

Discovering code has other benefits. As a concrete example, it becomes possible to accurately scope the size of refactorings to APIs within an organisation: simply traverse the graph of everything impacted by a change, and make the change. What used to be a finger-in-the-air guess, or would require coordination across multiple repositories, becomes a far simpler exercise to measure. To actually perform the change? Well, there’s still politics to deal with. Nothing stops that.

Being able to identify all the locations that are impacted by any change makes CI infrastructure easier to write and maintain. After all, we use CI to answer the questions “is our code safe to deploy? And if not, why is it not safe?” In a monorepo, the graph of dependencies is easier to discover, and that graph can (and should!) be used to drive minimally-sized but provably correct builds, running all necessary build and test and not a single thing more. Needless to say, this means that less work is done by the CI servers, meaninging tighter feedback loops, and faster progress. Do you need a monorepo to build this graph? Of course not. Is building that infrastructure to replicate this something you’ve time to do? Probably not.

There is also nothing about using a monorepo that precludes putting useful metadata into the tree at appropriate points. Individual parts of the tree can include license information (particularly when importing third party dependencies), or READMEs that provide human-readable information about the purpose of a directory or package, and where to go for help. However, the need for some of this metadata (“how do I get the dependencies?”, “what’s the purpose of this package?”) can be significantly reduced by structuring the monorepo in a meaningful way.

Atomicity

Occasionally there are components that need to be shared between different parts of the system. Examples include IDL files, protobuf definitions, and other items that can be used to generate code, or must exist as a shared component between client and server.

Now, there’s reams to be written about how to actually manage updating message definitions in a world where there might be more than one version of that protocol in the wild, and having a monorepo doesn’t prevent you from needing to follow those rules and suggestions. What a monorepo allows is a definitive answer to the question of where these shared items should be. Traditionally, the answer has been:

In one of the clients
In the server
In a central location, referenced by everything
Gadzooks, we’ll copy the damn thing everywhere

Needless to say, the last approach is remarkably painful, since all changes to the definitions need to tracked across all repositories. In the first two cases, you may end up with unwanted dependencies on either client or server-side code. So the sensible thing to do is to store the shared item in a different repository. This will lead you to the horror of juggling multiple repositories, or, if you’re lucky, taking a dependency on a pre-built binary that someone else is responsible for building.

Interesting things happen when the shared item needs to be updated. Who is responsible for propagating the changes? Without a requirement to update, teams seldom update dependencies, so there’s out-of-band communication that needs to happen to enforce updates.

Using a monorepo resolves the problem. There’s one place to store the definition, everyone can depend on it as necessary, and updates happen atomically across the entire codebase (though it may take a long time for those changes to be reflected in production) The same logic applies to making small refactorings — the problem is easy to scope, and completion can be done by an individual working alone.

Summary

Monorepos can reduce the cost of software development. They’re not a silver bullet, and they require an organisation to practice at least a minimal level of collective code ownership. The approach worked well at Google and Facebook because those companies fostered an attitude that the codebase was a shared resource, that anyone could contribute to and improve.

For a company which prevents people from viewing everything and having a global view of the source tree, for whatever reason (commercial? Social? Internally competing teams?) a monorepo is a non-starter. That’s a pity, because there are considerable cost savings to be made as more and more share a monorepo. It’s also possible to implement a monorepo where almost everything is public, with parts selected pieces being made available as pre-compiled binaries or otherwise encrypted for most individuals.

Monorepos help reduce the cost of software over the lifetime of the code by simplifying the path to efficient CI, lowering the overhead of ensuring changes are propagated to dependent projects, and by reducing the effort required to extract new packages and components. As Will Robertson pointed out, they can also help reduce the cost of developing development support tooling by providing a single-point “API” to the VCS tool and the source code itself.

Complementary practices

Monorepos solve a whole host of problems, but, just as with any technical solution, there are tradeoffs to be made. Simply cargo-culting what Google, Facebook, or other public early adopters of the pattern have done won’t necessarily lead you to success. On the flip side of the coin, sticking with “business as usual” within a monorepo may not work either.

Although complex branching strategies might work in a monorepo, the sheer number of moving pieces means that the opportunity for merge conflicts increases dramatically. One of the practices that should be most strongly considered is adopting Trunk Based Development. This also suggests that developers work on short-lived feature branches, hiding work in progress behind feature flags.

Software development is a social activity. Merging many small commits without describing the logical change going in makes the shared resource of the repo’s logs harder to understand. This leads to a model that is less common than it used to be --- squashing the individual steps that lead to a logical change to a single commit, which describes that logical change. This makes using the commit logs a useful resource too. Code review tools such as Phabricator help make this process simpler.

Most importantly: stop and think. It is unlikely your company is Google, Facebook, Twitter, Uber, or one of the other high-profile large companies that have already adopted monorepos (but if you’re reading from one of those places, “Hi!”). A monorepo makes a lot of sense, but simply aping the big beasts and hoping for the best won’t lead to happiness. Consider the advantages to your organisation for each step of the path towards a monorepo, and take those steps with your eyes open.

Thanks

Thank you to Nathan Fisher, Josh Graham, Paul Hammant, Felipe Lima, Dan North, Will Robertson, and Chris Stevenson for the suggestions and feedback while I was writing this post.

Organising a Monorepo

2017-11-22T15:39:00.000+00:00

How should a monorepo be organised? It only takes a moment to come up with many competing models, but the main ones to consider are “by language”, “by project”, “by functional area”, and “nix style”. Of course, it’s entirely possible to blend these approaches together. As an example, my preference is “primarily language-based, secondarily by functional area”, but perhaps I should explain the options.

Language-based monorepos

These repos contain a top-level directory per language. For languages that are typically organised into parallel test and source trees (I’m looking at you, Java) there might be two top-level directories.

Within the language specific tree, code is structured in a way that is unsurprising to “native speakers” of that language. For Java, that means a package structure based on fully-qualified domain names. For many other languages, it makes sense to have a directory per project or library.

Third party dependencies can either be stored within the language-specific directories, or in a separate top-level directory, segmented in the same language specific way.

This approach works well when there aren’t too many languages in play. Organisation standards, such as those present in Google, may limit the number of languages. Once the number of languages becomes too many, it becomes hard to determine where to start looking for the code you may depend on.

Project-based monorepos

One drawback with a language-based monorepo is that it’s increasingly common to use more than one language per project. Rather than spreading code across multiple locations, it’s nice to co-locate everything needed for a particular project in the same directory, with common code being stored “elsewhere”. In this model, therefore, there are multiple top-level directories representing each project.

The advantage with this approach is that creating a sparse checkout is incredibly simple: just clone the top-level directory that contains the project, et voila! Job done! It also makes removing dead code simple --- just delete the project directory once it’s no longer needed, and everything is gone. This same advantage means that it’s easy to export a cell as an Open Source project.

The disadvantage with project-based monorepos is that the top level can quickly become bloated as more and more projects are added. Worse, there's the question of what to do when projects are mostly retired, or have been refactored to mostly slivers of their former glory.

Functional area-based monorepos

A key advantage of monorepos is “discoverability”. It’s possible to organise a monorepo to enhance this, by grouping code into functional areas. For example, there might be a directory for “crypto” related code, another for “testing”, another for “networking” and so on. Now, when someone is looking for something they just need to consider the role it fulfills, and look at the tree to identify the target to depend on.

One way to make this approach fail miserably is to make extensive use of code names. “Loki” may seem like a cool project name (it’s not), but I’ll be damned if I can tell what it actually does without asking someone. Being developers, we need snazzy code names at all times, and by all means organise teams around those, but the output of those projects should be named in as a vanilla a way as possible: the output of “loki” may be a “man in the middle ssl proxy”, so stick that in “networking/ssl/proxy”. Your source tree should be painted beige --- the least exciting colour in the world.

Another problem with the functional area-based monorepos is that considerable thought has to be put into their initial structure. Moving code around is possible (and possible atomically), but as the repo grows larger the structure tends to ossify, and considerable social pressure needs to be overcome to make those changes.

Nix-style monorepos

Nix is damn cool, and offers many capabilities that are desirable for a monorepo being run in a low-discipline (or high-individuality) engineering environment, incapable of managing to keep to only using (close to a) single version of each dependency. Specifically, a nix-based monorepo actively supports multiple versions of dependencies, with projects depending on specific versions, and making this clear in their build files.

This differs from a regular monorepo with a few alternate versions of dependencies that are particularly taxing to get onto a single version (*cough* ICU *cough*) because multiple versions of things are actively encouraged, and dependencies need to be more actively managed.

There are serious maintainability concerns when using the nix-style monorepo, especially for components that need to be shared between multiple projects. Clean up of unused cells, mechanisms for migrating projects as dependencies update, and stable and fast constraint solving all need to be in place. Without those, a nix-style monorepo will rapidly become an ungovernable mess.

The maintainability issue is enough to make this a particularly poor choice. Consider this the “anti-pattern” of monorepo organisation.

Blended monorepos

It’s unlikely that any monorepo would be purely organised along a single one of these lines; a hybrid approach is typically simpler to work with. These “blended monorepos” attempt to address the weaknesses of each approach with the strengths of another.

As an example, project-based monorepos rapidly have a cluttered top-level directory. However, by splitting by functional area, or language and then functional area, the top-level becomes less cluttered and simultaneously easier to navigate.

For projects or dependencies that are primarily in one language, but with support libraries for other languages, take a case-by-case approach. For something like MySQL, it may make sense to just shovel everything into “c/database/mysql”, since the java library (for example) isn’t particularly large. For other tools, it may make more sense to separate the trees and stitch everything together using the build tool.

Third party dependencies

There is an interesting discussion to be had about where and how to store third party code. Do you take binary dependencies, or pull in the source? Do you store the third party code in a separate third party directory, or alongside first party code? Do you store the dependencies in your repository at all, or push them to something like a Maven artifact repository.

The temptation when checking in the source is that it becomes very easy to accidentally start maintaining a fork of whichever dependency it is. After all, you find a bug, and it’s sooo easy to fix it in place and then forget (or not be allowed) to upstream the fix. The advantage of checking in the source is that you can build from source, allowing you to optimise it as along with the rest of the build. Depending on your build tool, it may be possible to only rely on those parts of the library that are actually necessary for your project.

Checking in the binary artifacts has the disadvantage that source control tools are seldom optimised for storing binaries, so any changes will cause the overall size of the repository to grow (though not a snapshot at a single point in time) The advantage is that build times can be significantly shorter (as all that needs to be done is link the dependency in)

Binary dependencies pulled from third parties can be significantly easier to update. Tools such as maven, nuget, and cocoapods can describe a graph of dependencies, and these graphs can be reified by committing them to your monorepo (giving you stable, repeatable historical builds) or left where they lie and pulled in at build time. As one of the reviewers of this post pointed out, this requires the community the binaries are being pulled from to be well managed: releases must not be overwritten (which can be verified by simple hash checks), and snapshots should be avoided.

Putting labels on these, there are in-tree dependencies and externally managed dependencies, and both come in source and binary flavours.

Thanks

My thanks to Nathan Fisher, Josh Graham, Will Robertson, and Chris Stevenson for their feedback while writing this post. Some of the side conversations are worth a post all of their own!

Tooling for Monorepos

2017-11-20T14:43:00.000+00:00

One argument against monorepos is that you need special tooling to make them work. This argument commonly gets presented in a variety of ways, but the most frequent boil down to:

Code size: a single repo would be too big for our source control system!
Requirement for specialised tooling: we're happy with what we have!
Reduces the ability of teams to move fast and independently
Politics and fiefdoms

Let’s take each of these in turn.

Code size

Most teams these days are using some form of DVCS, with git being the most popular. Git was designed for use with the Linux kernel, so initially scaled nicely for that use-case, but started to get painful after that. That means that we start with some pretty generous limits: a fresh clone of linux repo at depth 1 takes just shy of 1GB of code spread between in over 60K files (here’s how they make it work!). Even without modifying stock git, Facebook was able to get their git repo up to 54GB (admittedly, with only 8GB of code). MS have scaled Git to the entire Windows codebase: that’s 300GB spread between 3.5M files and hundreds of branches. Their git extensions are now coming to GitHub and non-Windows platforms.

Which is good news! Your source control system of choice can cope with the amount of code a monorepo contains. Hurrah!

But how long does that take to check out? I’ll be honest, checking out a repo that’s 1GB large can take a while. If that is, you check out the whole 1GB. Git, Mercurial, Perforce, and Subversion support “sparse” working copies, where you only clone those directories you need. The sparse checkout declarations can either be declared in files stored in source control, or they can computed. They likely follow cell boundaries within the monorepo. It should be clear that in the ideal case, the end result is a working copy exactly the same size as a hand-crafted repository containing just what’s needed, and nothing more. As a developer moves from project to project, or area to area, they can expand or contract their current clone to exactly match their needs.

So your checkouts don’t necessarily get larger. They may even get smaller.

But, what if you do have everything checked out? Your source control tool needs to know which files have changed. As the size of the repository grows, the slower these operations become, impacting developer performance. Except both Git and Mercurial have support for filesystem watching daemons (notably “watchman”) These allow file checking operations to scale linearly with the number of files changed, rather than with the number of files in the repository (I’d hope that even those using a “normal” large checkout would consider using this)

So everything is fine with the raw tooling. But what about your IDE?

I mean, yeah, if you’ve checked out the entire source tree, surely your IDE will grind to a halt? First of all, don’t do that --- use a sparse clone --- but if you insist on doing it, update your tooling. Facebook spent a chunk of resources to help make IntelliJ more efficient when dealing with large projects, and upstreamed those changes to Jetbrains, who accepted the patches. It was possible to pull in the source code for every Facebook Android app at the same time in IntelliJ. You may have a lot of code, but it’s unlikely to be that much. Other editors can also happily work with large source trees.

So, code size isn’t the problem you might imagine it is.

Requirement for specialised tooling

Quite often when people talk about monorepos, they also talk about the exotic tooling they use, from custom build systems, tricked-out source control servers, and custom CI infrastructure. Perhaps a giant company has the time and resources to build that, but you’re too busy doing your own work.

Except a monorepo doesn’t require you to do any of those things. Want to use a recursive build tool you’re already familiar with? Go ahead. Paul Hammant has done some interesting work demonstrating how it’s possible to use maven (and, by extension, gradle and make) in a monorepo.

Switching to a build tool such as buck or bazel does make using a monorepo simpler, because these tools provide mechanisms to query the build graph, and can be simply configured to mark various parts of the tree as being visible or not to particular rules, but using one of these isn’t required. One nice thing? You don’t need to write buck or bazel yourself --- they’re both already out there and available for you to use.

Similarly, if you’re comfy with jenkins or travis, continue using them. Admittedly, you’ll need to configure the CI builds to watch not just a repo, but a subdirectory within a repo, but that’s not too hard to do. If you’re using a graph-based build tool, then you can even use jenkins or buildbot to identify the minimal set of items to rebuild and test, but, again, there’s no need to do that. Just keep on trucking the way you do now.

Reduces the ability of teams to move fast and independently

Having a repository per-project or per-team allows them to operate entirely independently of one another. Except that’s not true unless you’re writing every single line of code yourself. It’s likely you have at least a few first and third party dependencies. At some point, those dependencies really should be updated. Having your own repo means that you can pick the timing, but it also means you have to do the work.

Monorepos naturally lead people to minimising the number of versions of third party dependencies towards one, if only to avoid nasty diamond dependency issues, but there’s no technical reason why there can’t be more than one version of a library in the tree. Of course, only a narcissist would check in a library without making an effort to remove the old versions. There are a pile of ways to do this, but my preferred way is to say that the person wanting the update manages the update, and asks for help from teams that are impacted by the change. I’ll cover the process in a later post. No matter how it’s done, the effect of having a single atomic change amortises the cost of the change over all the repos, reducing the cost of software development across the entire organisation by front loading the cost of making the change.

But perhaps it’s not the dependencies you enjoy freedom on. Perhaps it’s the choice of language and tooling? There’s no reason a properly organised monorepo can’t support multiple languages (pioneers such as Google and Facebook have mixed language repos) Reducing the number of choices may be an organisation-level goal, in order to allow individuals to cycle quickly and easily between teams (which is why we have code style guidelines, right?), but there’s nothing about using a monorepo that prevents you from using many different tool chains.

As a concrete example of this, consider Mozilla. They’re a remote-first, distributed team of iconoclasts and lovely folks (the two aren’t mutually exclusive :) ) Mozilla-central houses a huge amount of code, from the browser, through extensions, to testing tools, and a subset of the web-platform-tests. A host of different languages are used within that tree, including Python, C/C++, Rust, Javascript, Java, and Go, and I’m sure there are others too. Each team has picked what’s most appropriate and run with those.

Politics and fiefdoms

There’s no getting away from politics and fiefdoms. Sorry folks. Uber have stated that one of the reasons they prefer many separate repositories is to help reduce the amount of politics. However, hiding from things is seldom the best way to deal with them, and the technical benefits of using a monorepo can be compelling, as Uber have found.

If an organisation enthusiastically embraces the concept of collective code ownership, it’s possible to avoid anything other than purely social constructs to prevent ego being bruised and fiefdoms being encroached on. The only gateways to contribution become those technical gateways placed to ensure code quality, such as code review.

Sadly, not many companies embrace collective code ownership to that extent. The next logical step is apply something like GitHub’s “code owners”, where owners are notified of changes before they are committed (ideally. Using post-commit hooks for after the fact notification isn’t as efficient) A step further along, and OWNERS files (as seen in Chromium’s source tree) list individuals and team aliases that are required to give permission to land code.

If there is really strong ownership of code, then your source control system may be able to help. For example, perforce allows protection levels to be set for individual directories within a tree, and pre-commit hooks can be used for a similar purpose with other source control systems.

Getting the most of a monorepo

Having said that you don't need to change much to start using a monorepo, there are patterns that allow one to be used efficiently. These suggestions can also be applied to any large code repositories: after all, as Chris Stevenson said “any sufficiently complicated developer workspace contains an ad-hoc, informally specified, bug-ridden implementation of half a monorepo”

Although it’s entirely possible to use recursive build tools with a monorepo (early versions of Google’s still used make), moving to a graph-based build tool is one of the best ways to take advantage of a monorepo.

The first reason is simply logistical. The two major graph-based build tools (Buck and Bazel) both support the concept of “visibility”. This makes it possible to segment the tree, marking public-facing APIs as such, whilst allowing teams to limit who can see the implementations. Who can depend on a particular target is defined by the target itself, not by its consumers, preventing uncontrolled growth in access to internal details. An OOP developer is already familiar with the concept of visibility, and the same ideas apply, scaled out to the entire tree of code.

The second reason is practical. The graph-based build tools frequently have a query language that can be used to quickly identify targets given certain criteria. One of those criteria might be “given this file has changed, identify the targets that need to be rebuilt”. This simplifies the process of building a sensible, scalable CI system from building blocks such as buildbot or GoCD.

Another pattern that’s important for any repository that has many developers hacking on it simultaneously is having a mechanism to serialise commits to the tree. Facebook have spoken about this publicly, and do so with their own tooling, but something like gerrit, or even a continuous build could handle this. Within a monorepo, this tooling doesn’t need to be in place from the very beginning, and may never be needed, but be aware that it eases the problem of commits not being able to land in areas of high churn.

A final piece in the tooling puzzle is to have a continuous build tool that’s capable of watching individual directories rather than the entire repository. Alternatively, using a graph-based build tool allows a continuous build that watches the entire repository to at least target the minimal set of targets that need rebuilding. Of course, it’s entirely possible to place the continuous build before the tooling that serialises the commits, so you always have a green HEAD of master….

Thanks

My thanks to Nathan Fisher, Josh Graham, Paul Hammant, Will Robertson, and Chris Stevenson for their feedback and comments while writing and editing this post. Without their help, this would have rambled across many thousands of words.

Some Useful Monorepo Definitions

2017-11-19T23:14:00.001+00:00

The concept of a monorepo seems so self-evident that there is little need to define it. Just co-locate all your code in one place, and you’re done, right?

The problem is that this doesn’t capture lots of the nuance of the term. After all, if all you have is a single project, then, by this definition, you have a monorepo. While technically correct (the best kind of correct!) this doesn’t feel right. There has to be more to it than that.

Monorepo

Summary: A monorepo represents the body of code and supporting digital assets owned by an organisation. Within that body of code, it’s possible to draw logical boundaries around certain areas, either shared libraries, individual projects, or other groupings.

Discussion:

Previously, I’ve written that a monorepo is “a unified source code repository used by an organisation to host as much of its code as possible.” That does the job, but I think it falls short of succinctly describing the goals of a monorepo in favour of an implementation of the pattern. Oh well, exploration of an idea is an iterative process, with each iteration being able to use the insights from previous iterations. Let’s iterate again!

Cell

Summary: A cell is an atomic unit representing a single logical piece within the monorepo.

Discussion:

When we were working on Buck, we struggled for a long time to come up with the best name for the logical areas with the monorepo. Initially, they were formed from the individual repositories we were coalescing into the monorepo. However, “repository” was an overloaded term, and so one we wanted to avoid. Similarly, “module” already has established meaning in some of languages we wanted to support.

In the end, we settled on using a biological metaphor. Because a monorepo represents a body of code, and these logical groupings represent the atomic units that the monorepo is constructed from, we called them cells. In many organisations, pre-monorepo, a cell represents a single repository.

Because of this mapping to a conceptual repository, a cell is a great candidate for Open Sourcing. Should this happen, it’s entirely possible that there needs to be some tooling to map file structure from the shape used within the monorepo to the shape expected by the OSS library. Ideally, that tooling would allow code to be both imported and exported to and from the monorepo, rather than only allowing a push in a single direction.

Projected Monorepo

Summary: A set of repositories presented as if they were a monorepo, typically via additional tooling.

Discussion:

Monorepos may be classified by the way that the code within is organised, but there is another approach: the projected monorepo. This isn’t a monorepo in the (umm…) traditional sense, where all the code is in the same code repository, but something that acts as if it were a monorepo through external tooling. An example would be the Android Open Source project, which uses “repo” to stitch together multiple separate repositories into something that acts as a single cohesive whole. To a lesser extent, things like git submodules also fulfill the same role of creating projected monorepos.

In a projected monorepo, it is clear where the cells lie --- they’re the individual repositories that are being stitched together to form the new whole.

Target

Summary: The individual units addressable by the build tool, which are used to declare dependencies.

Discussion:

Within a monorepo there are targets. These are units that are addressable by the build tool, and are also typically used to declare dependencies. They typically have concrete outputs, such as libraries or binaries. Targets are human-readable, and are most commonly given as a path within the repository.

A cell is typically composed of many targets. As an example, perhaps a cell consists of a single library. There might be targets within that cell would allow the library to be built, the tests for that library to built, and (perhaps) another to allow those tests to be run.

Graph-based build tool

Summary: A build tool designed for use within a monorepo where build files are located throughout the source tree and used in a non-recursive manner.

Discussion:

It's common to use a graph-based build tool with monorepos. These are tools that are natively designed for a monorepo, and operate on the directed acyclic graph of dependencies between targets. They typically provide the ability to build polyglot projects, and the ability to query the build graph. The two major examples are Google’s bazel and Facebook’s buck. Both of these tools can trace their user-facing design to Google’s “Blaze” build tool.

Admittedly, behind the scenes almost every build tool makes use of basic graph theory in order to work: after all, most tools to a topological sort of targets in order to work their magic, and they frequently have commands that allow that graph to be queried. The major difference between these other tools and what I’m terming a “graph-based build tool” is the use of build files throughout the tree that are used in a non-recursive way. This encourages the creation of relatively small compilation units.

Hopefully these terms, and the various ways of organising a monorepo, give us a common language to discuss monorepos in a meaningful way.

Thanks

My thanks to Kent Beck, Nathan Fisher, Josh Graham, Paul Hammant, Will Robertson, and Chris Stevenson for their comments and feedback while writing this post. The conversations have definitely helped clarify and improve this post.

A Month in Selenium - October

2017-11-19T20:58:00.002+00:00

Another month, another update, you lucky people. The highlights:

W3C TPAC.
I attended the W3C's TPAC meeting in California. This is the main get-together for many of the "working groups" that are working on standards as part of the W3C. It's also where the Browser Tools and Testing Working Group met to discuss progress on the WebDriver spec.

Good news! Once we clean up the implementation report, we're ready to move to "Proposed Recommendation", which is the last step before becoming a standard (or "Recommendation" in W3C parlance).

More good news! The "Level 2" version of WebDriver will have a new logging infrastructure added. This will make it easier for you (yes, you!) to figure out where failures have occurred. Better insight should lead to more stable tests.

Even more good news! Some of the folks from Sauce Labs attended the face-to-face meeting. They help bring an additional perspective to the design and use cases of the protocol. Until now, the group has been mostly composed of browser implementors and people from the Selenium project. The more people involved with the spec, the better it's going to be.

The minutes for this face-to-face session are available, as are the minutes for the other face-to-face sessions.

Hacking on Selenium
Last month, we were closing in on the Selenium 3.7 release. This month we shipped 3.7.0 and then, because of a small oversight where we missed a jar file in the downloadable artefacts, 3.7.1. There are some nice changes in there. As mentioned last month, one of the areas of focus has been improving how we handle the New Session command when dealing with a local end that might speak both the W3C and JSON Wire Protocol dialects of the webdriver protocol. One of the things that the spec says we're meant to do is pass though additional top-level fields in the new session payload. 3.7.1 now does this (hurrah!)

One of the nice things from the work in 3.7 is that we've laid the groundwork for a clean up of the Selenium Grid code. As part of that, we restored a behaviour where a Grid Node, configured with a path for the Firefox or Chrome binary would have this path injected into any capabilities when starting a session. Making the nodes even more configurable is something that's on the road map for a later release.

More next month!

A Month in Selenium - September

2017-10-18T10:46:00.001+01:00

I realise that this blog has been pretty quiet. Part of the reason for that is that I'm terrible at sitting down and just writing. What I really need is an incentive. That incentive arrived this month in the form of the Selenium Fellowship, which takes the form of a stipend to fund work hacking on Selenium. Part of the agreement is a monthly blog post. So, you all have the Software Freedom Conservancy to thank :)

So, what contributions have I been making to the Selenium project this month?

There are two major highlights. The first of these is Selenium Conf, which was in Berlin. I gave the State of the Union keynote (so called because the first one was an update of how the merger of the Selenium and WebDriver projects was going) Over the past few Selenium Conferences, the theme has slowly been building that Open Source Software depends on people to move it forward. This time, the message was far starker, as I counted the number of people who contribute to key parts of the project --- for some pieces, we depend on one person alone. I also covered the various moving pieces in the project, using Kent Beck's "3X" model as a framework to hold the talk together.

As well as being part of the show at SeConf, I also had the pleasure of helping out Jim Evans in the "Fix a Bug, Become a Committer" workshop. He did a great job explaining how the pieces fit together, and by the end of the workshop, we had everyone building Selenium and running tests in their IDEs of choice (provided that choice wasn't "Eclipse"), which is a testament to the hard work he'd put into preparing the session. It did highlight that the "getting started" docs probably need a bit of a polish to become usable. I was also invited to do a Q&A with the folks in the "Selenium Grid" workshop, where I broke from theme to talk about the role of QA in a team. Thanks for being patient, everyone!

In terms of code, as I write this, I've landed 57 commits since September 17th. Part of this was to help shape the 3.6 release. For Java, the theme of this release was the slow deprecation of the amorphous blob of data that is "DesiredCapabilities" to the more strongly-typed "*Options" classes (eg. FirefoxOptions, ChromeOptions, etc). The idea behind the original WebDriver APIs was to lead people in the right direction: if they could hit the "autocomplete" keyboard combination in their IDE of choice, then they'd be able to figure out what to do next. The strong typing is a continuation of this concept, and is something that all the main contributors are fans of.

One implementation detail we made in the Java tree is that each of the Options classes are also Capabilities. I made this choice for two reasons. The first is philosophical. We don't know ahead of time what new features will land in browsers (headless running for Chrome and Firefox are examples), so we'll always need an "escape hatch", to allow people to set additional settings and capabilities we're not aware of. The second is pragmatic. The internals of Selenium's java code is set up to deal with Capabilities, and people extending the framework have been dealing with them as an implicit contract of the code.

In the wild, there are two major, and one very minor, "dialects" of the JSON-based protocol spoken by the various implementations. The first is the original "JSON Wire Protocol", and the second is the version of that protocol that has been standardised as part of the W3C "WebDriver" specification. We took pains when standardising to make sure that a JSON Wire Protocol response is almost always a valid W3C response (technical note: because all values are returned as a JSON Object with a "value" entry, which contains the return value), but there are two areas where the dialects diverge wildly.

One area is around the "Advanced User Interactions" APIs. The end point offered by the W3C spec is significantly more flexible and nifty than the original version in the Selenium project, but it is also a lot more complex to implement.

The other area is around "New Session", which is command used to create a new Selenium session. The JSON Wire Protocol demands that the user place the set of features that they're interested in using into a "desiredCapabilities" JSON blob. This was originally designed as part of a "resource acquisition is initialisation" pattern --- you'd load up the blob with everything you might want (a chrome profile, an equivalent firefox profile, the proxy you'd like to use) mashing together items that theoretically only belonged to one browser into a single unit. The remote end was then to do a "best effort" attempt to meet those requirements, and then report back what it had provided. The local end (the driver code) was then to test whether or not the returned driver was suitable for whatever it was that users wanted to do. Which is why they were called desired capabilities --- you made a wish, and then could look to see if it came true. If nothing matched, it was legit for a selenium implementation to just start up any driver and give you that.

The W3C protocol is a lot more structured. It provides for an ordered series of matches that can be made, with capabilities that must be present in all cases. For our example above, the proxy would be used for any driver, and then there'd be an ordered set of possible matches for chrome and then firefox (or vice versa). Each driver provider gets a chance to fulfill that request, and if it can, then we use that driver. If nothing matches, then we fail to initialise the session and return an exception to the users.

The more structured data used by the W3C New Session command is sent in a different key in the JSON blob, and this is by design. In theory, it's possible to map a JSON Wire Protocol "New Session" payload to the W3C one, and to map the W3C structure to something close to the JSON Wire Protocol payload. Sadly, this process is complex and error prone, and there are language bindings that have been released that get this wrong to one degree or another (and, indeed, some that don't even make the effort) All this means that the Selenium Server has to try and discern the user's intent from the blob of data sent across the wire. Getting this right, and flexible, has been the focus of the forthcoming 3.7 release. It's fiddly work, but it'll be worth it in the end.

Another common problem we see is that some servers out there speak the W3C protocol natively (eg. IEDriverServer, geckodriver, the Selenium Server) and others don't yet (eg. safaridriver, chromedriver, and services such as Sauce Labs). A big part of the 3.5 release was the "pass through" mode, which means that if the Selenium Server detects that both ends speak the same "dialect" of the wire protocol, it'll just shuttle data backwards and forwards. However, if it detects that the two ends don't speak the same protocol, it'll do "protocol conversion", mapping JSON Wire Protocol calls to and from W3C ones. This has been made easier by the fact that the W3C spec is congruent with the JSON Wire Protocol -- the two have identical end points for many commands.

But not all commands. The main ones that have been causing grief have been the advanced user interaction commands, particularly when a local end speaks the JSON Wire Protocol, and the remote end speaks the W3C one. Just such this situation arises for users of some cloud-based Selenium servers, and its been a constant source of questions from users. To help address this, I've landed some code that does emulation of the common JSON Wire Protocol advanced user interaction commands (things like "moveTo"). Hopefully this will address the majority of headaches that people are experiencing using this new functionality.

Let's see what the next month brings. Hopefully, we'll ship 3.7 :)

The Poetry of Code

2016-06-03T09:37:00.000+01:00

Write a poem about a sunrise. Perhaps you'll leap straight in, and start writing freeform verse. Perhaps you'll choose a style; a haiku, or a limerick? Something using iambic pentameter or rhyming couplets? Your choice of approach places tangible constraints on how you express yourself.

What aspect of the sunrise will you write about? The sun itself, or the environment it rises over? Maybe there's a seascape to be evoked, or mountains. Maybe a city?

Now ask a friend to write a poem about a sunrise. I promise you, it won't be the same. To the outside observer watching you work, both of you will look alike --- scratching words on a page with a pen --- but the results are wildly different.

You both work alone. Your art is your own. It's wonderful.

Write a program to sort some numbers. Perhaps you'll leap straight in, and start writing freeform code. Perhaps you'll choose a style; Object Orientation perhaps, or a functional approach? Something using Java or Python? Your choice of approach places tangible constraints on how you express yourself.

What algorithm will you choose to write? A bubble sort, or a quick sort? Maybe a shell sort to be implemented, or a sleep sort? Maybe there's some other approach?

Now ask a friend to write a program to sort some numbers. I promise you, it won't be the same. To the outside observer watching you work, both you will look alike --- typing words on a keyboard --- but the results are wildly different.

You seldom work alone. Your art is a collaborative exercise. It's wonderful.

Sometimes I'm an Idiot

2015-12-04T15:00:00.001+00:00

Recently, I've had a few too many things on my plate to deal with, and have been flirting with burn out, so it's time to take stock and stop being an idiot. In order to stop doing something, one must understand the things that are being done. In light of that, let me enumerate some of the ways in which I am an idiot:

Recently, for only the second time this year, I went to one of the many social events organised by work. They're a great chance to hang out with people I'd not normally see. I've been an idiot for prioritizing sitting at a keyboard over spending time getting to know other people and getting a better view of the company I work for. More broadly, I spend too much time at work, and it doesn't bring happiness. Don’t you be an idiot too. Go and talk to someone.
I've still got about half my annual leave to take even though it’s now December. I've been an idiot for not prioritizing resting and looking after myself. I'm in touching distance of finishing a big project, and once that's done I'll be taking all my remaining holiday. You should also take your leave. No project has failed because someone went on holiday.
For months, I was "too busy at work" to go to the doctor about a nagging pain in my foot. After it became so chronic walking to the office every morning was painful, I finally caved and went to seek help. It's going to take months to sort this shit out. If I'd taken the time to go to the doctor sooner I'd be better already, and things wouldn't be as painful or complicated as they are. I was an idiot for not prioritizing my own health. How can I work when I'm sick? If you're feeling rubbish, or you need treatment, go and get it. Then, once you’ve done that, come back and be busy.
Switching off from work is a must. Drawing a line between the office and home is vital. I've been an idiot for looking at work stuff after hours, when I can't really do anything about it, and never properly disconnecting. Facebook have an app called @Work and a recently-launched @Work Chat app, and I use these extensively. Your work may use a different email or calendar server you use for your personal life. When someone who's not an idiot leaves the offices, they mute work-related conversations, calendars and emails. Not doing that is a great way to burn out. Don't be an idiot.
At both Google and Facebook, I've had regular international travel in order to talk to people and collaborate with teams. This has meant I've not felt able to book myself into after-work courses even though I've wanted to. I am an idiot for letting work stop me from improving myself and my life. As a concrete example, my girlfriend was Turkish (she’s still Turkish. Figure the rest out for yourself). I thought it would be nice to learn the language so that we could go on holiday, visit her family, or just chat at home (and --- hey! --- learning a new language is always fun). I tried a combination of Rosetta Stone and text books, but I always let work get on top of me, so I never put in the work required. I knew I needed to take a structured class, and I knew that would need to be in the evening. I never signed up, because I was “too busy”. Now I deeply regret that, and kick myself routinely for being an idiot. Yesterday, I finally signed up for a Turkish class even though it’s a ten week course, and even though we’re no longer going out. At least I can still enjoy learning. I really am an idiot for not doing this already.

The main lesson I’m (finally!) learning is that I’ve been an idiot because I’ve let work dominate my life. Neither Facebook nor Google made me work these hours, or worry this much, or stress about all the things that I do and have done. I let work do that to me. I’ve had enough of being an idiot.

I have a horrible feeling that once I’m well rested, de-stressed, and feel like there’s more to life than a constant grind of work, I’ll be better at my job, and happier too. I'll have no way of dealing with not feeling like shit, but it'll be fun to find out. I don’t know whether this will be the case for sure, but doing the same-old, same-old isn’t working well.

I’ll keep you posted.

Monorepo --- One Source Code Repository to Rule Them All

2015-04-21T12:11:00.003+01:00

What is a monorepo? It's a unified source code repository used by an organisation to host as much of its code as possible.

This is the pattern followed by companies such as Google, Facebook and the BBC, and is the way that I prefer to structure large scale code, as discussed in my post ruminating on codebases I have known.

I'm not entirely sure where I first heard the term but I like the way it demonstrates an intentional approach, rather than being the result of happenstance, so I use it when I can.

Of course, you need some tooling around a monorepo of any significant size. At some point, there should be more posts about that, but for now, take a look at the video of how Facebook handles this from F8 in 2015.

Android: Forking Java by Mistake

2015-02-26T21:46:00.001+00:00

Java has been forked, and Google is the reason. Allow me to explain.

Back in the days of Cupcake and Donut, when Android was new and shiny, one of the things that made it attractive to developers was that they could use a language they were familiar with on this new platform. That language was Java, and of course the version used was a modern one. The most recent version of Java at the time was version 6.

Of course, Java moved forward. 6 was end-of-lifed in February, 2013, and version 7 is now the oldest version supported by Oracle. Java 7's end of public updates is looming, coming as it does in April 2015. Java 7 introduced a bunch of new APIs and useful features. Some of these, such as better generics inference, are syntactic sugar and provided by the compiler, but some of these (notably "try-with-resources") need support from the runtime. That support only appeared in Android KitKat.

Versions of Android before KitKat still account for just under 60% of the market according to Google's own dashboards. That means that someone who wants to target as much of the Android market as possible has a choice to make: use Java 7's fancy new features, or stick with Java 6. Android's toolchain supports taking Java 7 bytecode, so all the syntactic sugar provided by the compiler is available, but you can't use the new features. Things are only going to get worse as Java 8 gets wider adoption --- features such as lambdas look like they're going to be widely used, especially as the functional paradigm becomes more widespread.

Java has a vast collection of OSS and commercial libraries out there for Doing Useful Things. If an app chooses to target Java 6, every library it depends on must also make that same choice.

This means that the Java market is now forking. Server-side Java is forging ahead, and the libraries that it uses are increasingly starting to use modern Java features, unless enough of their users ask for Android compatibility.

So, what can Google do to keep the world moving forward? How can developers use a more modern Java whilst still being usable by the largest part of the Android market.

The simplest thing would be to release a shim that developers can optionally load on pre-KitKat Android. Oracle's lawsuit about API infringement may make this a deeply undesirable route for them to follow.

Alternatively, individual developers can pack any required classes and APIs into their own apps. That seems like an error-prone way of doing things --- it's way too easy to accidentally use these new APIs by accident, and someone who only tests on a recent device will miss the versioning problem.

Finally, I guess some benevolent third party could create the required shim, but getting widespread usage may be difficult, and it's hardly ideal.

Once Java 8 features come into widespread use (or the use of invokedynamic gets more traction), the situation won't be so simple. I seriously hope the big brains running Android are getting ahead of this problem --- we'll need platform support to solve this problem properly, and we'll need it soon.

Until that support arrives, Java has been forked, with two family trees each with Java 6 as their common ancestor.

Pioneers and Settlers

2014-10-01T13:27:00.001+01:00

I enjoyed breaking the world up into two kinds of developers so much, I thought I'd do it again.

The problems that we ask developers to solve are many and varied, but they all contain some mixture of the known and the unknown (sorry to come over all Rumsfeld this early in the post). A new project, the kind of which the company has never undertaken before, is riddled with the unknown to start with, whereas rewriting the legacy system in another language with a team who know the system and both the old and new languages is a pretty solid slab of known nastiness.

The kind of person you need for each type of problem differs. I used to think of them as "Starters" and "Finishers", but those terms are anodyne and lack the opportunity to be grossly misinterpreted. I call them "Pioneers" and "Settlers" now.

A pioneer is the kind of person you want to tackle a problem rife with the unknown. They're undaunted by the lack of a map, and positively enjoy the uncertainty. They tend to operate on their own, or in very small groups, and explore a problem with gusto. There's a strong chance code will be written, discarded, and written again, many times over. They might stand at a whiteboard and argue about design, covering it with boxes, arrows and misleading labels, before deciding the best thing to do is to build both approaches and see which one works.

But they get the job done. And once they know that the unknowns have been worked on, reduced to a manageable level and understood, they lose interest. The thing that drives them is taking a challenge that no-one else has overcome and showing that it's not really _that_ hard.

A pioneer is just the kind of person you want to get a project off the ground. And then you probably want them off the team. They've served their purpose, and now they're going to look for trouble.

What you want after the original skeleton is in place are Settlers. They take the rough trial laid out, and they make it habitable, maintainable and a Nice Place To Be. The problems that they solve tend not to be the "what the heck are we trying to do?" ones, but the "how the heck are we going to make this work?" ones. They're qualitatively different types of question, with very different challenges.

It's entirely possible that the pioneer, hotshot iconoclast that they were, has blazed a trial through the least pleasant route. They were only interested in getting things working, not doing it the best way possible. The settlers may be forced to throw out everything that the pioneer has done, leaving just the faint whiff of the original scheme in the air. I suspect that they do this more than most teams would like to admit.

It is vital to note that the challenges and design question a settler faces are no less taxing than those faced by pioneers, they're just different.

The pioneer finds "what" satisfying, and the process of solving "how" an anathema once there's working code. A settler finds iterating on the "how" a deeply rewarding experience, but the "what" may not hold much interest at all.

Of course, it depends on the project to determine what the correct mix of pioneers to settlers is. Sometimes, you just need a small team of pioneers. Sometimes you need a room full of settlers. Sometimes you need to start with pioneers, and then replace them with settlers. It's okay. The types of problems that need to be solved on a project vary over time. If they didn't, software development wouldn't be this much fun.

If you ever work with me, the chances are that you'll find I like the pioneer work an awful lot. I like to go shooting off into the darkness, meandering with glee into rough edges and emerging, triumphant and bleeding into the light having shown that, yes, yes it is possible to do something crazy. It takes a force of will for me to be a settler, and I rapidly get uneasy and unhappy when I try it.

Cavemen and Plainsmen

2014-09-26T15:37:00.000+01:00

It is, of course, a gross overgeneralisation to say that there are two kinds of developers in the world. In this post I plan to grossly overgeneralise by discussing the two kinds of developers that are out there in the world.

Software developers like to write code, but the way that a developer likes to write code can vary wildly. Take, for example, the caveman. When given a problem, the caveman likes to retreat into his cave. There, in the dark, and away from the prying eyes of the people around him, they can take the rough stone of the basic problem and fashion it into a thing of beauty. The art of creation take a long time, so for those of us watching from the outside, it's as if the developer has ceased to exist.

Finally, the caveman emerges from the gloom, clutching the gorgeous artifact you've been waiting to see for so long. Or rather, they sometimes emerge from the gloom clutching a precious artifact. Sometimes, in the dark, they drop the rough stone they're working on and accidentally pick up a coprolite. This fossilised turd has now been thoroughly polished and shaped. It's perfect, but there's no mistaking that it's a turd.

Oh well. At least while it was happening the manager could relax and rest easy: although they had no idea what their caveman was working on, it was clear that they working on something because they were in the cave.

Contrast this with the plainsman.

Give a plainsman a problem, and they will gleefully leap about, showing it to everyone and anyone. It's amazing to watch one of these in action. Ideas fizz about them, and new approaches to tackle the problem are tried and discarded. You'll know that a plainsman is working on a problem because you'll see it. They might be vocal, they might be chatty on groups. Who knows? But you'll see and feel the heat of creation.

Eventually, the plainsman will come off the savannah and show you the end product. It may be the glorious artifact you hoped for, or, as with the caveman, they may have become distracted and hand over some sort of steampunk turd.

Of course, in the process of doing this, they may well have given their manager a few scares and worrisome moments --- progress may have appeared slow, or a deadend may have been investigated for too long. Their manager may well be ever-so-slightly balder than before. Stress does that.

Taking a step back, and only looking at the starting point and the end point, the two kinds of developer are identical. They're both given a problem, and they both sometimes solve it, and sometimes they royally screw it up. It turns out that they both probably follow the same techniques and processes to figure out how to build the software they're crafting. It's just that the caveman keeps this quiet, and doesn't like people to know how things are going until they're done, whereas a plainsman has never managed to figure out how to turn off the noise, or has consciously dialed up the volume.

Managers may actually prefer a caveman. If the end result is going to be the same anyway, it'd be nice to have a quiet life, free of stress, so that the manager can get on with the important work of whatever it is that managers do (Gantt Charts? Going to meetings? Browsing the web? Who knows --- they're a mystery to me, much like cats)

The manager is wrong, of course. It's infinitely preferable to work with a plainsman.

The reason is that it's a rare developer who has to work entirely alone and isolated. They tend to work in teams (as an aside, what is the collective noun for developers? A confusion? A multi-faceted-opinion?) Within that team, there's normally someone who needs the code that our hypothetical developer is working on. Being able to see progress allows others to prioritise their own work. And that moment, where the idea is dropped, and the turd picked up? That moment may not go unobserved in a group.

Managers know this too. I poke fun at them because I can. Not cats, though. Never poke fun at a cat.

Now, although I present this in black-and-white terms, it should go without saying (though I'll say it anyway) that it's a rare developer indeed who sits at one or other of these extremes. You can spot a caveman by the feedback from their peers. Things like "needs to work on communication", or "where the did this highly polished turd come from?" A plainsman might have feedback saying that they're noisy.

If you ever work with me, I'm a plainsman if you're within earshot. Ask anyone who's worked with me, that's quite a distance. However, if you're not on the IRC channel I'm on and out of earshot, I'm a pretty effective caveman. Which means that I should never be left on my own. Or fed after midnight. No. Hang on. That's Gremlins, right?

"I'm not trapped in here with them. They're trapped in here with me."

Update: I really like the terms "caveman" and "plainsman", mainly because I find that they're ones that people remember easily, and which fit the premise of the analogy well, but I'm aware that they're not gender-neutral. Suggestions for replacements have been "morlocks" and "eloi", or "troglodytes" and "herders", but both of those cast the caveman in a pretty negative light, which isn't really what I want to say. "Cavedweller" and "plains-dweller" are probably the best alternatives.

A Ranty and Dogmatic Troll Masquerading as Coding Guidelines

2014-01-24T13:55:00.000+00:00

This document represents a series of guidelines for writing code that will swiftly pass code review with my team. It doesn't attempt to be fair. It doesn't attempt to listen to your opinion. It does present a series of guidelines. You may chose to ignore those guidelines (after all, if they had to be obeyed, we'd have called them "laws") We may choose to point back to these when reviewing your code. We'll attempt to avoid being (passive) aggressive when we do so.

Hugs and cuddles.

Test first

We're working on a framework for writing automated tests. It's probably a good idea for us to lead by example and write some tests. It hasn't been proven — but it's a scientific fact — that writing tests after the fact is as boring as boring can be. So we write the tests first. This has the added advantage that we can think about the kind of functionality we want independent of how it's implemented.

KISS, YAGNI

Or "Keep It Stupidly Simple, You Ain't Gonna Need It". When writing new code, just write the code you need. Don't attempt to write some über-generic, ultra-lightweight, whizz-bang sub-framework for handling every conceivable edge case when you only need to do "this thing" once. It's another Scientific Fact that engineers are bloody awful at spotting patterns before the patterns emerge. Wait until the third or even fourth time you repeat something before attempting to extract a common API.

A good book for this? "Refactoring to Patterns".

Prefer composition over inheritance

Let's admit this up front: Java's a deeply flawed language. If I had to write this crap in vi or emacs, I'd be sitting in a corner, rocking backwards and forwards, weeping gently and pulling at my hair. Fortunately, we have IDEs, so It's All Okay. One area where Java is flawed is that it's really easy to inherit from a base class, and really clumsy to do proper composition. Nevertheless, as a rule of thumb: inheritance of interfaces is cool, subclassing a concrete base class is not.

Now, there will be times where it'd be Really Useful to be able to share some common functionality between things that appear to be related. For some reason, this always seems to happen with base test classes. It's a weird fetish, and it's one we should be disabused of. There are better ways to handle this, possibly through JUnit's Rules, or by extracting the common functionality into its own class and just newing it up on demand.

Point is: if you're using inheritance to share some common functionality between otherwise unrelated classes (and "it's a test" doesn't make them related) you're not doing inheritance right. You are, in fact, Doing It Wrong.

Role-based interfaces #ftw

Role-based interfaces: we like them, we use them, and we encourage other people to use them. Yes, this will mean that your classes might implement a lot of interfaces. That means that they collaborate with a lot of other classes. That tells you something. It probably tells you your class has too many roles or responsibilities.

Remember, kids, inside every fat class are at least two classes waiting to climb out.

Expose Collaborators

We take the use of Dependency Injection as an article of faith. This does not mean that we whole-heartedly embrace the need for a "DI Container", but it does mean that we're huge fans of exposing the collaborators of a class via its constructor (because "constructor-based DI")

Having said that, "new" is not a dirty word. It's totally pukka to use it, particularly for those handy utility classes we mentioned above. Having said that….

Utility Classes: Just Say No

A utility class represents a severe failure of imagination. They tend to become dumping grounds for only loosely related methods, which are typically static. If you have one of these, a fun exercise is to decompose it into what I like to refer to as "Objects" and use those instead. If you're having trouble with the traditional "noun-based" approach to identifying classes, try a bit of London School TDD and see what shakes loose.

Singletons? Static Methods? Also No

Singletons (in the traditional "implemented as a static field in a class" sense, not in the "ideally we'd only have one of these" sense) destroy our ability to have fun and write tests that can run in parallel, slashing our potential productivity. Also, it leads people to start using the Service Locator pattern instead of Dependency Injection, and we take DI as an article of faith (see above), mainly because it facilitates TDD by making collaborators clear, like we (also) said above.

Use strong types where possible

We're using a high ceremony language. Might as well embrace that properly. We dislike Stringly-typed code, and we like Tiny Types. Why do we like them? Because they allow our code to express intent as clearly as possible, and we can do things like "hang behaviour" off them as the need arises.

BTW, this means that we really should never be returning "WebElement" from a Page Object. Return a class that models the thing the user would expect to be returned, even if that leads to a class with nothing but a constructor.

Use the most abstract type that conveys intent for variables and fields

The most abstract type explains to the reader what you do. The concrete type is how you're going to do it. You should be able to change your mind about "how" without needing to change the "what". The most abstract type that conveys intent for a variable ("Map" instead of "HashMap", "WebDriver" instead of "DroidDriver")

I've been asked whether a method should return an ImmutableSet or just a Set when it returns a set of things that both sorted and immutable. Redundant as this question may seem, I'll still have a bash at answering it. Return the ImmutableSet, as that conveys the intent of the return value. If the caller doesn't care about the mutability of the set, they can assign it to a Set. Everyone's happy.

Use the right naming convention

The naming convention in our codebase is a hangover from the Android coding style, which was created by people who wrote C++ for a living (which also explains why there are so many static methods in the framework too) We don't write C++ for a living, and using a foreign language's coding conventions in Java code makes you look like a clown and an arsehole.

However, it appears we're perfectly happy to present ourselves as people who find it hard to get dressed in the morning without hurting ourselves. Consequently, when writing code that's just for us, use the coding standard of the rest of the codebase and choke back the waves of nausea. If you're writing code that we'll put in front of a public who believe we're competent engineers (that is, OSS) use the Google coding standard (effectively Oracle's, but with a two space indent)

Use the Ubiquitous Language, Luke

If everyone calls it a "Self Aggrandizsing Wattle", don't name the class "AndroidPoweredMultiflexViewPort". We want people to find the classes we write, and we want them to understand how they relate to other classes in the system. Using the names that people call things as class names is Totally Cool.

Also, if you've not done so, go and grab a copy of Domain Driven Design and attempt to wade through as much of it as you can bear. Then skip to the end and read the bit about Anti-Corruption Layers. That bit's good.

Naming Things

Design Patterns are a means of communication, not blueprints. Similarly, the thing that makes classes interesting isn't what pattern they happen to implement, it's the role that they play in our system. Leave the pattern name off the class name, ok? The exception to this is the "Builder" pattern. Everyone expects the "Builder of Thing" to be called "ThingBuilder". We might as well go with the flow on this one and buck our own contrarian ways.

Similarly, every concrete class is an implementation of something, so using the postfix "Impl" (presumably if you're too lazy to name something properly, you're too lazy to type "Implementation") as a class name is a Dumb Thing To Do. Name the class for the particular thing that makes it interesting within the system, or prefix the name with "Default" if there is genuinely nothing interesting about it. Try and avoid naming the class around some obscure implementation detail that no-one using the class cares about.

BTW, it's acceptable to append the name of the interface being implemented to the class name, but it's better to try and name the class for the role it plays in the system.

Keep It SOLID

Single Responsibility Principle
Open/Closed Principle
Liskov Substitution Principle
Interface Segregation Principle
Dependency Inversion Principle

Presumably you're a Software Engineer. If any of those are hard to understand, then I'd suggest using your favourite search engine to read up on them.

Document in Proper English That Which Needs Documenting

It's a safe assumption that anyone actually reading your code is a professional software developer. Telling them stuff that they can see just by reading your code in javadocs and comments is Not Helping, so don't do it. Use comments to explain the reasoning behind particular design decisions, or to alert people to odd corner cases that might actually need explanation.

Consider replacing one line comments of a block of code with a sensibly named method containing that block of code. After all, when it comes to maintaining this shit, only the most dedicated of developers will actually update the code and the docs.

It's cool to use correct grammar and punctuation. Please do so, and try and end sentences with a period cretaceous

On the Naming of Tests

2013-11-04T13:32:00.000+00:00

I'd thought that this was part of the automated testing canon already, but apparently not, so a quick note on the naming of tests appears to be in order. Well, how I think tests should be named. :)

When using an xUnit-style framework, the common pattern is to test class Foo in another class called FooTest. Within this test class, there are several methods. The principle I like to follow is that if you took the name of the test class, stripped off the "Test" postfix, and then listed the names of the tests as bullet points, you'd end up with a list of roles and responsibilities of the class under test. You'd end up with something like:

Foo
  * Should eat cheese
  * Should not consider cake as cheese
  * Should handle null cheese by throwing a SpecificException

And so on.

Put another way, if someone were to delete the class under test and the bodies of the tests, could they recreate something functionally identical to the class under test using just the test names?

Time Keeps Ticking

2013-09-09T14:15:00.001+01:00

A note so that I never forget again: the time used by a ZipEntry instance in Java appears to keep ticking.


ZipEntry entry = new ZipEntry("foo");
long expected = System.currentTimeMillis();
entry.setTime(expected);
Thread.sleep(3000);
long seen = entry.getTime()

// This fails
assertEquals(expected, seen);

Update: it turns out that the problem turns out to be that DOS timestamps only store seconds with a precision of 2 seconds. The above could be reduced to:


ZipEntry entry = new ZipEntry("foo");

// Note: we set the seconds to an odd number
long expected = Calendar.getInstance()
    .set(2013, SEPTEMBER, 10, 12, 14, 1)
    .getTimeInMillis();
entry.setTime(expected);
long seen = entry.getTime()

// This fails
assertEquals(expected, seen);

On Managing Time and Direct Emails About Selenium

2013-07-22T08:57:00.000+01:00

I tend to get a few requests a day asking for help with a problem with Selenium. I almost never reply to these. It's not (just!) because I'm an evil minded, grumpy so-and-so, but because it's not a great use of anyone's time.

I am just one person. I have a full time job, and family and friends that I like to spend time with. I work on Selenium as a volunteer, and that means fitting it in where I can. Fortunately, work are understanding about this, and support my role in the project, which means I do far more than if I only had the occasional evening free. Still, it does mean that I prioritise my time spent on the project. This is how I do so:

Writing code. That comes first.
Spend time on the IRC channel. I often just lurk here, but it's a handy way to talk to the core development team and keep an eye on what's going on.
Answer emails to the selenium-developers group. This is where we run the project and hold design discussions. It's not a good place to ask for help, unless that help is about implementing Selenium itself.
I scan the webdriver and selenium-users lists, answering questions where I can and provided I have time.

Now, the nice thing with this ordering is that the further down the list you go, the more people there are who are able to help you with your issue. Put another way: asking for help in the user lists means that you're far more likely to get the help you want. It also means that if someone else runs into the same problem, Google can come to their rescue. A private email doesn't have that benefit.

I know that may be frustrating for you. I know that it seems to make sense to contact prominent people on the project directly. I understand your particular issue is urgent and important to you. I really do, and that's why I don't answer your emails.

Being Part of a Distributed Team

2013-07-19T13:07:00.002+01:00

At work I'm part of a distributed team. Two colleagues and I are based in London, and the rest of the team is in Menlo Park, California. It's been reminding me of some lessons that I've learnt over the years about working as part of a team that's spread across time zones, and I thought it might be nice to share some of them. Without further ado:

1. Attempt to get code reviewed by the person most familiar with the area but also by someone who's on the same site as you. This suggests that singletons on a site of their own are a suboptimal thing. Which leads to....

2. ... be tolerant of clean up diffs. If the local reviewer approves a change in the code review tool the chances are high it'll be landed. The author of the code is responsible for not being a clown: if there are fundamental design decisions that are still unresolved then landing the change, even if the local reviewer is ok with it, isn't a winning move.

Code is a plastic thing and we have source control. We can fix things up. Taking advantage of that is a Good Thing.

One useful pattern I've seen is to check in a failing but ignored test. It a lovely way of moving things forward without jamming the works, and leaves a clear trail of intent.

Another example of being good at this: the code reviews which get approved with a list of nits and changes to make before landing. I think that shows great trust in your team, and I like it. It goes without saying that if you're using a system like Gerrit (update: which submits the code when it sees that a diff has been approved), then this "ok but please fix" approach won't work as well :)

3. Time zones are evil but we must live with them. A great habit for the Brits to get into is to have done code reviews for USian teammates by 3pm BST. For those teammates in America, reviewing code by the British first thing when you get to the office in the morning is an extremely helpful thing do. If you're working with colleagues in Australia, India, China or elsewhere on that side of the globe, be aware of when they come into the office and have your code reviews done by then.

The reasoning is clear: it gives everyone as much time as possible to turn code around and get it reviewed again, as this is when the team's hours overlap. That, in turn, helps the team as a whole move fast.

4. Bear in mind the hierarchy of communication:

nothing -> email group -> email -> IM -> VC -> in person

The further to the right you are, the lower the latency and the higher the bandwidth. The higher the bandwidth, the quicker misunderstandings can be resolved and design choices made.

Corollary 1: if a review is dragging on, hop on to VC, Skype or a Google Hangout, or go and chat to people.

Observation: the further to the left you are, the more asynchronous communication becomes. If time isn't of the essence, then head left.

5. There's nothing like being in the same place. Travel is bad because it means you're away from family, your regular routines and all the things you love about home. Travel is great because you get to go to new places, hang out with the locals and get to see how other offices work. Although it's probably more financially astute to make the smaller part of the team travel to the larger, it's also unfair to expect the travel to always be done by one part of the team. It turns out that a sense of fairness is the thing that will keep the spirits up in the team and keep everything ticking along nicely.

I'm sure there's more that needs to be covered. Things like scheduling meetings and alternating times so that people don't always need to stay late or get up stupidly early, or having a useful glossary of ambiguous terms ("let's table this discussion") and other issues and hiccups, but I've been writing for a while now and this is getting long. So I'll stop.

The UNIX Philosophy, WebDriver and HTTP Status Codes

2013-06-24T21:11:00.000+01:00

The UNIX philosophy can be described in many ways (and the Wikipedia page has plenty), but I've always admired its practical application in the wealth of shell commands available to a user. Rather than having a single command that Does Everything, the UNIX shell is a place of small commands focused on doing one thing well, yet which are easy to link together.

For example, I recently needed to compare the contents of two JAR files and remove class files that were duplicated from one of those jars. I ended up generating the list of shared files via:

comm -12 <(jar tf first.jar | sort | uniq) <(jar tf second.jar | sort | uniq) | grep -v META

I doubt very much whether the authors of any of those tools thought that this is what I'd be doing, yet because the tools are carefully focused and are easy to chain together, this is a trivial thing to do.

How, you may ask, does this apply to Selenium? And specifically issue 141? For those of you who can't be bothered to read the incredibly long list of comments on that issue (now at over 100), this is the one about being able to get HTTP status codes from the WebDriver API. The comments are split between those saying that this functionality doesn't belong in the API, and those who (occasionally very vociferously) claim that it does.

From a philosophical perspective, the WebDriver API is attempting to model a user interacting with their browser. We attempt to limit the APIs we offer to just those that meet this need, only allowing ourselves to extend it to those very clear cases where the browser is the Source of Truth about a particular thing (such as with cookies), or where there's no rational way to cleanly offer a facility (such as executing Javascript --- incidentally, something that I spent a lot of time keeping out of the API)

HTTP status codes don't fall into either category. The browser isn't the the source of truth about these codes, as that's the originating web server. The user may not be aware of them either; a 404 from a .js file? That'd most likely go unnoticed. A 500 from even the main page? That may be returned as a 200 by some app servers in certain configurations.

So that leaves our users out to dry, right? Well, it would if it wasn't for the UNIX Philosophy. You see, it's ridiculously simple to hook up a proxy that will capture this information for the user if you can't obtain the information by instrumenting the server. You can do it like this:

// Explain where your proxy lives
Proxy proxy = new Proxy();
proxy.setHttpProxy("your_proxy:8080");

// Now tell the webdriver instance about it
DesiredCapabilities caps = new DesiredCapabilities();
caps.setCapability(CapabilityType.PROXY, proxy);

WebDriver driver = new RemoteWebDriver(caps);

That's 5 lines of code in enormously verbose Java.

Separating the concerns of "browser automation" from "logging network" traffic allows the Selenium developers (most of whom are not paid to work on Selenium) to focus on the problem of driving the browser. It means that they're not working on writing their own HTTP proxy, which is a sufficiently taxing tax that there are many projects out there working to write something solid and stable.

Great options for users looking for a powerful and capable proxy include Fiddler and Charles. Another option is the BrowserMob Proxy, which started as being a fork from the original Selenium RC codebase (Oh! The irony!) but has since matured and grown. This is amazingly simple to integrate with a WebDriver instance, as shown in their docs. For brevity, the integration can be done like so:

ProxyServer server = new ProxyServer(4444);
server.start();

// get the Selenium proxy object
Proxy proxy = server.seleniumProxy();

Following the UNIX approach, we make it easy to use a proxy with the WebDriver API. That means that we're not implementing an API for getting HTTP status codes in the Selenium project not only because it's out of scope, but there are already people doing a great job of offering that capability elsewhere.

Why I Care About Automated Testing

2013-05-22T09:43:00.002+01:00

I was reading a blog the other day that highlighted the fact that I've only got a limited number of keystrokes to use up in my working life. I can use those keystrokes on anything: email, writing code, futzing around with the git command line, facebook status messages, posts on Plus, anything....

That got me thinking about why I think that testing is an important part of software development: not an afterthought, but something that's as vital as considering API design or how to structure methods. It's because I only have a certain number of keystrokes. I'd rather spend those working on new features and moving the bits of the world I care about forward rather than fixing bugs or chasing down regressions.

It's undeniable that writing a test and the code itself takes more time. I'm having to write more code. I'm burning keystrokes. But each of those tests may be helping to prevent regressions, or providing insight into the structure and usage of my code. And that additional insight, and those prevented regressions, mean that cumulatively I have more time to spend hacking on features, and that's what I love to do.

So that's why I care about automated tests. That's also why I think you should care too.

Speaking Engagements

2013-03-26T10:26:00.003+00:00

One of the things that I love most about working in tech is the chances I get to speak in public and share some of the knowledge I've somehow managed to accumulate (even, sometimes, about work that I've actually done myself!)

For someone who enjoys speaking in public, I find it hard to actually promote the fact that I'll be in places, but some friends have recently asked when and where I'll be, so without further ado here are my next confirmed appearances:

7th-10th April: DroidCon, Berlin
23rd-24th April: GTAC, NY
2nd May: Facebook Mobile Developer Conference (a brief stint in London!)
10th-12th June: Selenium Conf (tickets on sale!) (It looks like it'll be great fun!)

For DroidCon, GTAC and the Facebook MobDevConf, I'll be talking about various aspects of my day job at Facebook, though each conference, because of their different focuses, will get to hear about different parts of it!

Selenium Conf is something that I always look forward to. It's a great chance to meet the people who are using the tool that has meant so much to my professional career, and it's also a fantastic opportunity to meet up with the rest of the selenium developer team and buy them a steak dinner. We've yet to have a veggie join us as a core team member, but we'll figure out a suitable way to reward their efforts too one day! If you're interested in becoming that first vegetarian contributor, then you can always start by contributing some code!

I've also recently become Facebook's W3C AC representative, and will be attending the TPAC in November in China.