Your code needs GUTs

Note: This is another "from the archives" essay that I started two years ago.  It was inspired by this talk by Kevlin Henney as well as the general low quality of the unit test suite in the product I was working on at the time.  If you're browsing YouTube, I generally recommend any of Kevlin Henney's talks.  He's a bit like Uncle Bob Martin - insightful and entertaining with a deep well of experience to back it up.

Does your code have GUTs?  It should.  If you're a professional programmer, you really should have GUTs.  And by "GUTs", I mean "Good Unit Tests".

Unless you've been living under a rock for the last decade or two (or you're just new), the industry has decided that unit tests are a thing all "Serious Engineers" need to write.  For years I've been hearing about how you need to have unit tests, and if you don't you're not a real programmer.  So I decided to get with the program and start writing unit tests.

And you know what?  I failed miserably.  It was a serious train wreck.

Why did I fail?  Because I listened to those blog posts and podcast episodes that just tell you to write unit tests, and then I went out and just tried to write unit tests.  My problem was that I didn't know what a good unit test looked like.  So I wrote tests, but they were brittle, hard to understand, hard to maintain, and not even especially useful.  The result was that after a while, I concluded that these "unit test" people didn't know what they were talking about and had just latched onto the latest fad.  And then every year or two I would try again and reach the same conclusion.

So the real question is not "do you have unit tests?", but "do you have good unit tests?"  Because, as I alluded to above, bad unit tests are not necessarily better than no unit tests.  A bad unit test suite will break all the time and not necessarily give you any useful information, which makes maintaining it more costly than just dropping it.  

What do GUTs look like?

Good unit tests have three main properties:

  1. Readability
  2. Maintainability
  3. Durability

What does this mean?  Well, readability means just what it sounds like - that you can read and understand the test cases.  Without being an expert in the system under test, you should be able to look at a test case and be able to figure out what behavior it's testing.

Likewise, maintainability means the same thing as it does for production code - that you should be able to update and expand the test suite without undue effort or pain.

Durability simply means that your tests should not break willy-nilly.  Obviously there are lots of potential changes you can make to code that would break your tests, but the tests should not break unless they really need to.  So a durable test suite should not start failing because internal implementation details of the code under test were changed.

Guidelines for writing tests

So how do you write tests that have those three properties?  Well, I have a few guidelines and common practices that I use.  At work, I routinely find myself suggesting these things during code reviews.  Some people might disagree, but I've found these to be helpful, and I'll try to justify my opinions.  Hopefully you'll find them useful.

Note: for the sake of simplicity and consistency, examples will be of a blogging system written in PHP, but these principles and ideas are by no means specific to such a system.  I just use that because I happen to have plenty of examples at hand so that I don't have to come up with fake ones.

Test names

Naming your tests is an important, but often-overlooked aspect of test writing.  Your test cases should have descriptive names.  And by "descriptive", I mean it should tell you three things:

  1. What behavior is being tested.
  2. What the conditions of the test are.
  3. What the expected outcome is.

If you use the "generate test cases" feature of your IDE, or something like that, you might end up with test names like testSaveEntry or testPublishEntry.  These are bad names.  For one thing, they only tell you the name of the function they're testing.  For another, they guide you into a convention of a one-to-one mapping of test classes and methods to production code classes and methods.  This is limiting and unnecessary.  You should have as many test classes and methods as you need.  I sometimes have an entire test class just to test one production method.  I don't recommend that as a general rule, but there are cases where it makes sense.

When in doubt, I recommend choosing a test naming convention.  I often use "test<name of method under test>_When<conditions of test>_<expected result>".  So, for example, if I was writing a test to check if the publishEntry() method throws an exception when the database update fails, I might name it testPublishEntry_WhenDbUpdateFails_Throws.  Or if I wanted to test that a text sanitizing function HTML encodes any angle brackets that it finds in the input, I might call it testSanitize_WhenDataContainsHtml_ReturnsDataWithTagsEscaped.  Obviously you can use a different convention, or no convention at all, but the point is that the test name tells you everything you need to know about what is being tested.
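
To make that concrete, here's a rough sketch of what tests following that convention might look like, assuming a reasonably recent PHPUnit.  The class and scenarios are invented for illustration, with PHP's built-in htmlspecialchars() standing in for a real sanitize() function so the example actually runs:

use PHPUnit\Framework\TestCase;

class SanitizerTest extends TestCase
{
    // test<method under test>_When<conditions of test>_<expected result>
    public function testSanitize_WhenDataContainsHtml_ReturnsDataWithTagsEscaped()
    {
        $this->assertSame('&lt;b&gt;hi&lt;/b&gt;', htmlspecialchars('<b>hi</b>'));
    }

    public function testSanitize_WhenDataIsPlainText_ReturnsDataUnchanged()
    {
        $this->assertSame('just plain text', htmlspecialchars('just plain text'));
    }
}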

One thing to note here: the above names are kind of long.  I know - that's OK.  Remember that these are test names.  They're not methods that somebody is going to be calling in other code.  Nobody is ever going to have to type that name again.  So don't get hung up on the length.

Also of note, you should think about what the expected behavior is.  Things like testSaveEntryWorks are not enlightening.  What does "work" mean?  The same goes for things like testSaveEntryMakesExpectedDatabaseCalls.  OK...what are the expected database calls?  If you can't come up with something specific, that probably means you need to think more about your test, maybe even break it into multiple tests.

A good guideline to keep in mind is that it should be possible to write a pretty-printer that can read the names of all your test methods and print out a low-level specification for your system.  Ideally, you should be able to figure out how everything works just by looking at the test names.
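
If you're wondering what such a pretty-printer might look like, here's a quick, hypothetical sketch.  It reflects over a test class, grabs every method whose name starts with "test", and turns the underscore-separated pieces of the name back into a rough English sentence:

// Hypothetical sketch: turn test method names into a low-level specification.
function printSpecification($testClass)
{
    $reflection = new ReflectionClass($testClass);
    foreach ($reflection->getMethods() as $method) {
        $name = $method->getName();
        if (strpos($name, 'test') !== 0) {
            continue;
        }
        // e.g. testPublishEntry_WhenDbUpdateFails_Throws
        //        -> "Publish Entry, When Db Update Fails, Throws"
        $parts = explode('_', substr($name, strlen('test')));
        $words = array_map(function ($part) {
            return preg_replace('/(?<=[a-z0-9])([A-Z])/', ' $1', $part);
        }, $parts);
        echo '- ' . implode(', ', $words) . PHP_EOL;
    }
}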

Granularity

Big tests are bad.  Why?  Because the bigger the test is, the more things can break it.  This is the same reason that big functions and methods are bad - the more they do, the more opportunity there is for something to go wrong.  So if you can, it's better to keep things small.  Just as each function should do one thing, and do it well, each test should test one thing, and test it well.

And when I say "one thing", I really mean one thing.  Ideally, each test should have one, and only one, reason to fail.  The goal is that when a test fails, it should be pretty obvious what went wrong.  So if the only thing a test asserts is that X = Y, and it fails, then you can be pretty sure X isn't equal to Y.

On the other hand, if you're asserting a bunch of different things, it's harder to pinpoint the problem.  Furthermore, since most test frameworks will fail the test at the first failed assertion, you can end up with one failure masking another, i.e. the first of several assertions fails, so you fix it, and then the next assertion in that test fails, etc.  

So if you need to check a bunch of things, then write a bunch of tests.  Don't try to cram them all into one test.
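
As a small, hypothetical sketch of what that looks like in practice (the buildSlug() function is invented just so the tests have something to run against), each test below checks a single behavior and has only one reason to fail:

use PHPUnit\Framework\TestCase;

// A deliberately tiny function to test; in real code this would live in
// your production classes, not next to the tests.
function buildSlug($title)
{
    return strtolower(preg_replace('/[^A-Za-z0-9]+/', '-', trim($title)));
}

class SlugTest extends TestCase
{
    // One reason to fail: spaces aren't converted to hyphens.
    public function testBuildSlug_WhenTitleHasSpaces_JoinsWordsWithHyphens()
    {
        $this->assertSame('hello-world', buildSlug('hello world'));
    }

    // One reason to fail: the result isn't lower-cased.
    public function testBuildSlug_WhenTitleHasUppercase_ReturnsLowercase()
    {
        $this->assertSame('gutcheck', buildSlug('GUTCheck'));
    }
}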

Don't assert too much, but always assert something

A corollary to the rule that a test should only test one thing is that a test should test at least one thing.  In code reviews for junior developers, I sometimes see tests that don't actually contain any assertions.  They have a bunch of setup code that configures mocks to make a method run, but no explicit assertions or call expectations on the mocks.  All such a test really asserts is "this method doesn't throw an exception."

Needless to say, that's not a very strong assertion.  Heck, I can write methods that don't throw an exception all day long.  If the test fails, all you have to do is delete all the code in the method you're testing and there you go - it won't throw anymore.  Granted, your application won't do anything, but at least the test suite will pass.

So always make sure you're making some explicit assertion.  Otherwise, there's no point in writing the test.
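
Here's a hedged sketch of the difference.  The Mailer interface and Notifier class are invented stand-ins; the first test just exercises the code and asserts nothing, while the second declares an explicit call expectation on the mock:

use PHPUnit\Framework\TestCase;

// Invented stand-ins so the sketch is self-contained.
interface Mailer
{
    public function send($to, $body);
}

class Notifier
{
    private $mailer;

    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }

    public function notify($to)
    {
        $this->mailer->send($to, 'Your entry was published.');
    }
}

class NotifierTest extends TestCase
{
    // Weak: this only proves notify() doesn't throw.  Delete the body of
    // notify() and this test still "passes".
    public function testNotify_NoAssertions()
    {
        $notifier = new Notifier($this->createMock(Mailer::class));
        $notifier->notify('user@example.com');
    }

    // Better: an explicit expectation that the mail actually gets sent.
    public function testNotify_WhenCalled_SendsOneMailToRecipient()
    {
        $mailer = $this->createMock(Mailer::class);
        $mailer->expects($this->once())
               ->method('send')
               ->with('user@example.com', $this->anything());

        $notifier = new Notifier($mailer);
        $notifier->notify('user@example.com');
    }
}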

Readability

Ideally, your tests should be a low-level specification - they should be clear and detailed enough that you could hand the tests to a developer and they could reverse engineer every part of your system based on that.  You probably won't actually accomplish that, but that's the goal we're shooting for in terms of coverage and clarity.

So given that, it's obviously very important that tests be readable.  A test that's hard to read isn't doing its job of communicating the intent of the system.  But what, exactly, does "readable" mean?

In my view, it's a combination of things.  Obviously, the action of the test should be clear.  You should be able to look at the test and easily be able to see what it's doing without having to search around through other code.  But you also want the test to be simple enough that you can easily understand it and not miss the forest for the trees.

One simple strategy to address this is to adopt a test structure convention.  This is typically an "arrange-act-assert" layout, where you group the setup code, the action of the code under test, and the assertions into their own sections, so it's clear where each one starts and ends.  Where people sometimes get into trouble with this is when using mocking frameworks that require you to declare expectations and assertions on mocks up-front, before they are called (you usually have to do "arrange-assert-act" in that case, but it's the same idea).  What I often see is people declare a mock, declare some assertions on it, then declare another mock with its assertions farther down the test method, and so on.  This makes the test harder to read, because you have to hunt through all the setup code to figure out which mocks have assertions on them.  There isn't a single place you can look to see the expected result.
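
As a minimal sketch (the test subject here is just PHP's date() function, purely for illustration), the labeled sections make the shape of the test obvious at a glance.  If you're stuck with up-front mock expectations, the same idea applies: keep all of them together in one block instead of scattering them through the method.

use PHPUnit\Framework\TestCase;

class DateFormattingTest extends TestCase
{
    public function testFormatDate_WhenGivenTimestamp_ReturnsIsoDate()
    {
        // Arrange
        $timestamp = mktime(0, 0, 0, 7, 4, 2020);

        // Act
        $result = date('Y-m-d', $timestamp);

        // Assert
        $this->assertSame('2020-07-04', $result);
    }
}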

Another good strategy is the judicious use of utility functions.  This is especially helpful if you have lots of similar tests with a substantial amount of setup code, which is not an uncommon case.  For example, if you have a method that calls a remote web service and you want a bunch of test methods that check all the different error conditions and return values, the mock object setup for many of those tests is probably very similar.  In fact, it might be identical except for one or two values.  You could handle that by copy-and-pasting the setup code and changing what you need, but then you end up with a sea of almost identical code and it becomes harder to see what the differences between the tests are.  The solution is to encapsulate some of that setup code in a utility method that you can call from each of the tests.  This lets you configure your mocks in one place and keep the extraneous details out of the test body.

The one thing to watch out for with utility methods is that they should be as simple as possible.  When I can, I try to make them take no parameters and just give them a name that describes how they configure my mocks.  Remember that all the setup doesn't have to be encapsulated in a single function  - you can call multiple utility functions that configure different things.  In cases where it makes sense for the utility methods to take parameters, I try to limit how many they take, so as to keep the scope of what the function does clear.  And if you find yourself putting conditionals or loops in your utility functions, you might want to think about your approach.  Remember, the more logic there is in your tests, the more likely it is that your tests will have bugs, which is the last thing you want.
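
Here's a hedged sketch of that pattern, using an invented WeatherService client and WeatherReport class.  The named, parameterless helpers keep the mock configuration in one place, and the test bodies read as little more than "given this service, I expect this summary":

use PHPUnit\Framework\TestCase;

// Invented stand-ins so the sketch is self-contained.
interface WeatherService
{
    public function currentTemperature($city);
}

class WeatherReport
{
    private $service;

    public function __construct(WeatherService $service)
    {
        $this->service = $service;
    }

    public function summary($city)
    {
        try {
            return $city . ': ' . $this->service->currentTemperature($city) . ' degrees';
        } catch (RuntimeException $e) {
            return $city . ': temperature unavailable';
        }
    }
}

class WeatherReportTest extends TestCase
{
    public function testSummary_WhenServiceIsAvailable_IncludesTemperature()
    {
        $report = new WeatherReport($this->givenAServiceThatReturnsSeventyDegrees());
        $this->assertSame('Syracuse: 70 degrees', $report->summary('Syracuse'));
    }

    public function testSummary_WhenServiceIsDown_ReportsUnavailable()
    {
        $report = new WeatherReport($this->givenAServiceThatIsDown());
        $this->assertSame('Syracuse: temperature unavailable', $report->summary('Syracuse'));
    }

    // No parameters; the names say exactly how each mock is configured.
    private function givenAServiceThatReturnsSeventyDegrees()
    {
        $mock = $this->createMock(WeatherService::class);
        $mock->method('currentTemperature')->willReturn(70);
        return $mock;
    }

    private function givenAServiceThatIsDown()
    {
        $mock = $this->createMock(WeatherService::class);
        $mock->method('currentTemperature')
             ->willThrowException(new RuntimeException('service unavailable'));
        return $mock;
    }
}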

Conclusion

These are just some of the techniques and considerations that I employ when writing unit tests.  They've been working for me so far, but your mileage may vary.

My main goal here is just to give some ideas to people who are new to this unit testing thing.  Remember, writing unit tests is a skill that you have to learn.  This is especially the case when doing test-driven development.  Despite what you may have been led to believe, it's not a simple and obvious thing that you can just do cold.  So be patient, try different things, and take your time.  It's the journey, not the destination, that's the point.

Fluent-ish interfaces

This evening I was reading a post on Jason Gorman's blog about fluent assertions in unit tests and it made me smile.  Jason was updating some slides for a workshop and decided to illustrate fluent assertions by putting the "classical" assertion right next to the fluent version.  After doing that, he started to doubt the text on the slide that claimed the fluent version is easier to read than the classical one.

That made me feel good, because I've often wondered the same thing.  I've heard the claim that fluent assertions are easier to read many times - for instance, they said as much in a Certified Scrum Developer class I took last year.  Apparently that's just "the direction the industry is going."  I've followed Jason's blog for a while and he seems like a pretty smart guy, so seeing him doubt the received wisdom gives me a little validation that it's not just me.

Fluent assertions are...sort of fine, I guess.  I mean, they work.  They're not terrible.  I never really thought they were any easier to read than the old-fashioned assertions, though.  

From what I can tell, the claim is generally that fluent interfaces are easier to read because they naturally read more like a sentence.  And that's definitely true - they undeniably read more like a sentence.  Just look at some of Jason's examples (classical above, fluent below):

Assert.AreEqual(50, 50);
Assert.That(50, Is.EqualTo(50));

Assert.IsTrue(true);
Assert.That(true);

Assert.AreNotSame(payer, payee);
Assert.That(payer, Is.Not.SameAs(payee));

Assert.Contains(1, new ArrayList() {1, 2, 3});
Assert.That(new ArrayList() {1, 2, 3}, Contains.Item(1));

It's undeniable that "assert that 50 is equal to 50" is much closer to a real sentence than "assert are equal of 50 and 50" (or however you would read the classical version).  However, it would behoove us to remember that we're reading code here, not English prose.  We can't just assume that making your code read like an English sentence makes it more readable.  They're different things.

My theory is that we tend to think differently when we're looking at code than we do when we're reading text.  The fluent version might look closer to a sentence, but that's really just on the surface.  It's only more natural if you mentally strip out all the punctuation.  But that's not how we read code, because in code you can't just strip out the punctuation.  The punctuation is important.  Likewise, the order of things is important, and not necessarily the same as in English.

When I look at the fluent assertion, I don't just see "assert that 50 is equal to 50", I see all the pieces.  I mentally separate the "assert" from the "that", because "that" is a method, which usually means it contains the more immediately applicable information.  Likewise with "is equal to".  And, of course, I recognize Is.EqualTo(50) as a method call, which means that I have to mentally substitute the result of that into the second parameter.  But wait a minute - equal to what?  There's only one parameter to EqualTo, which doesn't make any sense.  Oh, wait, that's NUnit magic.  So we pass 50 and the result of EqualTo to That...what the heck does "that" mean anyway?  Wait - magic again.  We have to read that backwards.  So the actual assertion comes at the end and the "that" just ties things together.  OK, I've got it now.  So...what were we testing again?

OK, maybe that's a little silly, but you get the idea.  The point is that code is written in a certain way, and it's not the way English is written.  When you're reading code, you get into a certain mindset and you can't easily just switch to a different one because the next line just happens to call a fluent interface.  The old-fashioned Assert.AreEqual(50, 50) might not be sexy or natural looking, but it's short, simple, and perfectly clear to any developer worth his salt.  Why switch to an interface that's wordier and less aligned with how the rest of our code is written?  If it ain't broke, don't fix it.  

JavaScript unit testing

Today I watched a very interesting talk on good and bad unit testing practices by Roy Osherove, entitled "Unit Testing Good Practices & Horrible Mistakes in JS".  It really was an exceptional talk, with some good, concrete advice and examples.  I highly recommend it.

I actually recognized a couple of the examples in the talk.  The big one was OpenLayers.  I still remember being horrified by their test suite when I was using that library at my last job.  We had a number of extensions to OpenLayers and I was thinking about adding some tests into their test suite.  Then I ran the tests and saw a bunch of failures.  Assuming our code must be bad, I tried a fresh copy of the OpenLayers release.  A bunch of tests still failed.  And that's when I gave up and decided to just pretend their tests didn't exist.

If nothing else, the talk provided some nice validation that my JavaScript unit testing practices aren't completely wrong and stupid.  I've been writing unit tests for my JavaScript as part of my standard process for the last six months or so, and I was actually watching the video waiting for the hammer to drop, for Roy to describe an anti-pattern that I'd been blissfully propagating.  I was pleased and somewhat surprised when that moment didn't come.

Quickie TDD with Jasmine and Komodo

I'm currently on my annual "this time I'm really going to start doing test-driven development (or at least something close to it)" kick.  And it seems to be going pretty well, so hopefully this time it will actually stick. 

As I said, I do this every year, and testing usually ends up falling by the wayside.  Granted, this is partly just due to a lack of discipline and commitment on my part.  Getting started with a new practice takes effort, and sometimes it's hard to justify that effort when "I've been doing just fine without it for years."  But there's also the matter of the type of projects I work on.  Most of the projects I've worked on have not had an established test suite or a culture of testing.  Whether it's a work-related project or an old personal project from before I heard the gospel of TDD, the norm has been a sizable, years-old code base that has few if any tests and isn't really designed to be testable in the first place.

Getting into TDD with a project like that can be daunting.  Picture PHP code littered with static method calls, system calls, and direct class instantiations at multiple levels.  "You need to inject a mock object?  Yeah, good luck with that."  The classes may be well-defined, but there's not much compartmentalization of responsibilities within them, so inserting test doubles is not always straightforward.  That leaves you in the unenviable position of either having to rewrite a bunch of existing code to make it unit-testable or set up some real test data and turn the unit test into an integration test.  The first option is obviously preferable, but can be much more risky, especially since you don't already have tests to validate the behavior of the code you need to change.  And while the second approach is certainly better than no tests at all, integration tests are slow to run, cumbersome to set up, and much more prone to breakage when things change.  Faced with a mess like that, it doesn't seem that unreasonable to say, "You know what?  I've got things I need to get done.  I'll get back to those tests later."  But, of course, you never actually do.
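
To make that concrete, here's a hedged sketch with invented class names.  The first version creates its own database connection inside the method, so there's no seam for a test double; the second takes the dependency through its constructor, which is what makes injecting a mock possible:

// Hard to test: the dependency is constructed inside the method, so a unit
// test can't substitute a mock without hitting a real database.
class EntryPublisher
{
    public function publish($id)
    {
        $db = new MysqlDatabase('localhost', 'blog');   // hard-wired (hypothetical class)
        $db->update('entries', array('published' => 1), $id);
    }
}

// Testable: the dependency comes in through the constructor, so a test can
// pass in a mock that implements the same interface.
interface Database
{
    public function update($table, array $fields, $id);
}

class InjectableEntryPublisher
{
    private $db;

    public function __construct(Database $db)
    {
        $this->db = $db;
    }

    public function publish($id)
    {
        $this->db->update('entries', array('published' => 1), $id);
    }
}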

This time, I'm doing things a little differently.  For starters, I'm not writing new tests for old code.  I'm just doing the new code.  I'll get to the old code when/if I get around to refactoring it.  That means that I don't have to worry about untestable code, which makes the entire enterprise about a thousand times simpler.  I'm also not working with PHP code for the time being.  I'm trying to do TDD on two projects — one for work that's in JavaScript and a personal one in Python.  For Python, I'm using unittest and mock which, so far, I'm finding to be less of a pain in the neck than PHPUnit.

For the JavaScript project, I'm using Jasmine, which brings me to the title of this post.  Since I'm trying to do TDD (or rather BDD), I quickly tired of alt+TABbing to a browser window and then hitting ctrl+R to reload the page with the test runner.  Sure, it's not a big deal, but it's just one more thing that gets in the way.  I wanted to do what I could do in Python and PHP, which is just hit a hotkey and have Komodo run the test suite right in the same window. 

Turns out, that was actually pretty easy to set up.  I just banged out a quick macro that opens up the Jasmine test runner HTML file in a browser tab or refreshes it if it's already opened.  I bound that to a hotkey and I'm good to go.  Sure, it doesn't use the Komodo IDE testing framework, but that's not the end of the world — I just move it to a split pane and it works out pretty well.  I even added some CSS to the spec runner to make it match my Komodo color scheme.

Here's a screenshot of the side-by-side runner and some (unrelated but public) code:

And here's the macro itself:
(function() {
    // URI of the Jasmine spec runner HTML file (real path elided).
    var uri = 'file:///path/to/test_runner.html',
        view = ko.views.manager.getViewForURI(uri);
    if (view === null) {
        // Not open yet - open the runner in a browser view.
        ko.open.URI(uri, 'browser');
    } else if (view.reload) {
        // Already open - refresh it to re-run the specs.
        view.reload();
    } else {
        // Fallback for views that don't expose reload().
        view.viewPreview();
    }
})();

Better IE testing

It seems that Microsoft has decided to be a bit nicer about testing old versions of Internet Explorer. I just found out about modern.IE, which is an official Microsoft site with various tools for testing your web pages in IE.

The really nice thing about this is that they now provide virtual machines for multiple platforms. That means that you can now get your IE VM as a VMware image, or a VirtualBox image, or one of a few other options.

When I was using Microsoft's IE testing VMs a couple of years ago, they were only offered in one format - Virtual PC. Sure, if you were running Windows and using Virtual PC, that was great. For everyone else, it was anywhere from a pain in the butt to completely useless. This is a much better arrangement and a welcome change. Nice work, Microsoft!

The @depends annotation in PHPUnit 3.4

OK, I'm not crazy! At least, not for the reason I thought.

I was playing with adding some dependencies to my Selenium PHPUnit tests. See, PHPUnit 3.4 has this handy little test dependency feature, the @depends annotation. It's basically a PHPDoc comment that names another test. You add it to a test and, if the named dependency hasn't passed (or hasn't run yet), PHPUnit skips the dependent test.
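
For reference, here's a minimal example of the annotation in action. It's essentially the stack example from the PHPUnit documentation, where the dependent test also receives the return value of the test it depends on:

class StackTest extends PHPUnit_Framework_TestCase
{
    public function testPush()
    {
        $stack = array();
        array_push($stack, 'foo');
        $this->assertEquals('foo', $stack[count($stack) - 1]);
        return $stack;
    }

    /**
     * @depends testPush
     */
    public function testPop(array $stack)
    {
        $this->assertEquals('foo', array_pop($stack));
        $this->assertEquals(0, count($stack));
    }
}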

Well, I'm working on Windows. And every time I added an @depends to a test, PHPUnit would skip it. Even if the dependency had run! Even on the example from the documentation! I thought I was losing my mind.

However, it turns out that it's a known bug in PHPUnit. Basically just a line ending issue. The bug ticket has a 2 character patch that fixes it, or you can just save your files with UNIX line endings. Go figure.

Update: Wow, talk about weird timing. It turns out this issue was fixed in PHPUnit 3.4.1, which was released the day after I posted this.

A little Selenium and PHPUnit oddity

Here's a weird little "feature" I came across the other day. It seems that Selenium cares about the whitespace in the rendered page, not the whitespace in the HTML source. I suppose this shouldn't be that surprising, but I just hadn't thought to account for it.

Here's what happened: I was writing a Selenium test case using PHPUnit's Selenium extension and I kept getting a failure on a simple assertTextPresent() call. I checked the page manually, and the text was definitely there. I selected the text and right-clicked to try the assertion in Selenium IDE, and it worked there. Then I tried copying the expected string out of my PHP test case and checking that in the IDE, and that's where it failed.

The only difference: one extra space in the PHP string. The real kicker is that that space is present in the page source. But when I stopped to think about it, it made perfect sense. Selenium interacts with the page through the DOM, and multiple spaces get rendered as one in HTML. The source is irrelevant - it's the rendered page that counts. I'll have to remember that for next time.
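
If it comes up again, a simple workaround (a hedged sketch, assuming the same PHPUnit Selenium extension and its assertTextPresent() call) is to collapse runs of whitespace in the expected string before asserting, so the comparison matches what the browser actually renders:

// Inside a test method: normalize the expected text copied from the source,
// since the rendered DOM folds multiple spaces into one.
$expected = 'Some text  copied from   the page source';
$this->assertTextPresent(preg_replace('/\s+/', ' ', $expected));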