Monday 24 June 2013

The UNIX Philosophy, WebDriver and HTTP Status Codes

The UNIX philosophy can be described in many ways (and the Wikipedia page has plenty), but I've always admired its practical application in the wealth of shell commands available to a user. Rather than having a single command that Does Everything, the UNIX shell is a place of small commands focused on doing one thing well, yet which are easy to link together.

For example, I recently needed to compare the contents of two JAR files and remove from one of them the class files that were duplicated in the other. I ended up generating the list of shared files via:

comm -12 <(jar tf first.jar | sort | uniq) <(jar tf second.jar | sort | uniq) | grep -v META

I doubt very much whether the authors of any of those tools thought that this is what I'd be doing, yet because the tools are carefully focused and are easy to chain together, this is a trivial thing to do.

How, you may ask, does this apply to Selenium? And specifically issue 141? For those of you who can't be bothered to read the incredibly long list of comments on that issue (now at over 100), this is the one about being able to get HTTP status codes from the WebDriver API. The comments are split between those saying that this functionality doesn't belong in the API, and those who (occasionally very vociferously) claim that it does. 

From a philosophical perspective, the WebDriver API is attempting to model a user interacting with their browser. We attempt to limit the APIs we offer to just those that meet this need, extending them only in those very clear cases where the browser is the Source of Truth about a particular thing (such as with cookies), or where there's no other rational way to cleanly offer a facility (such as executing JavaScript, which, incidentally, is something that I spent a lot of time keeping out of the API).

HTTP status codes don't fall into either category. The browser isn't the source of truth about these codes; that's the originating web server. The user may not be aware of them either. A 404 from a .js file? That'd most likely go unnoticed. A 500 from even the main page? That may be returned as a 200 by some app servers in certain configurations.

So that leaves our users out to dry, right? Well, it would if it weren't for the UNIX Philosophy. You see, if you can't obtain the information by instrumenting the server, it's ridiculously simple to hook up a proxy that will capture it for the user. You can do it like this:

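// Explain where your proxy lives
// (the address below is a placeholder; use wherever your proxy is listening)
Proxy proxy = new Proxy();
proxy.setHttpProxy("localhost:8080");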

// Now tell the webdriver instance about it
DesiredCapabilities caps = new DesiredCapabilities();
caps.setCapability(CapabilityType.PROXY, proxy);

WebDriver driver = new RemoteWebDriver(caps);

That's 5 lines of code in enormously verbose Java.

Separating the concerns of "browser automation" from "logging network traffic" allows the Selenium developers (most of whom are not paid to work on Selenium) to focus on the problem of driving the browser. It means that they're not working on writing their own HTTP proxy, which is a sufficiently taxing task that there are many projects out there working to write something solid and stable.
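
To make that separation concrete, here is a rough sketch of the "logging network traffic" half, written with nothing but the JDK (Java 11 or later is assumed). It is not how any of the real proxies work, and it is a simple pass-through recorder rather than a true forward proxy of the kind a browser would be configured to use. The class name StatusRecorder, the seen map and the /missing.js path are all made up for illustration, and a plain HttpClient stands in for the browser.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StatusRecorder {
    // Hypothetical log: request path -> status code the origin server returned.
    static final Map<String, Integer> seen = new ConcurrentHashMap<>();

    static HttpClient newClient() {
        // Stick to HTTP/1.1, which is all the JDK's built-in server speaks.
        return HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
    }

    // A toy origin server: 200 for "/", 404 for everything else.
    static HttpServer startOrigin() throws Exception {
        HttpServer origin = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        origin.createContext("/", exchange -> {
            boolean known = exchange.getRequestURI().getPath().equals("/");
            byte[] body = (known ? "ok" : "missing").getBytes();
            exchange.sendResponseHeaders(known ? 200 : 404, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        origin.start();
        return origin;
    }

    // The recorder forwards each request to the origin, notes the status code,
    // and relays the response unchanged.
    static HttpServer startRecorder(int originPort) throws Exception {
        HttpClient upstreamClient = newClient();
        HttpServer recorder = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        recorder.createContext("/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            try {
                HttpResponse<byte[]> upstream = upstreamClient.send(
                        HttpRequest.newBuilder(
                                URI.create("http://127.0.0.1:" + originPort + path)).build(),
                        HttpResponse.BodyHandlers.ofByteArray());
                seen.put(path, upstream.statusCode());
                byte[] body = upstream.body();
                exchange.sendResponseHeaders(upstream.statusCode(), body.length);
                exchange.getResponseBody().write(body);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exchange.sendResponseHeaders(502, -1);
            } finally {
                exchange.close();
            }
        });
        recorder.start();
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        HttpServer origin = startOrigin();
        HttpServer recorder = startRecorder(origin.getAddress().getPort());

        // Stand-in for the browser: it only ever talks to the recorder.
        HttpClient browserStandIn = newClient();
        String base = "http://127.0.0.1:" + recorder.getAddress().getPort();
        for (String path : new String[] {"/", "/missing.js"}) {
            browserStandIn.send(
                    HttpRequest.newBuilder(URI.create(base + path)).build(),
                    HttpResponse.BodyHandlers.discarding());
        }

        System.out.println(seen);
        recorder.stop(0);
        origin.stop(0);
    }
}
```

The client in the middle of main never has to know or care about status codes; the recorder collects them on the side. Even this toy ignores request methods, headers, HTTPS and streaming, which hints at why a solid proxy is best left to the projects dedicated to writing one.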

Great options for users looking for a powerful and capable proxy include Fiddler and Charles. Another option is the BrowserMob Proxy, which started as a fork of the original Selenium RC codebase (Oh! The irony!) but has since matured and grown. It is amazingly simple to integrate with a WebDriver instance, as shown in their docs. In brief, the integration looks like this:

// start the proxy on port 4444
ProxyServer server = new ProxyServer(4444);
server.start();

// get the Selenium proxy object
Proxy proxy = server.seleniumProxy();

Following the UNIX approach, we make it easy to use a proxy with the WebDriver API. That means that we're not implementing an API for getting HTTP status codes in the Selenium project, not only because it's out of scope, but also because there are already people doing a great job of offering that capability elsewhere.