Your code needs GUTs

Note: This is another "from the archives" essay that I started two years ago.  It was inspired by this talk by Kevlin Henney as well as the general low quality of the unit test suite in the product I was working on at the time.  If you're browsing YouTube, I generally recommend any of Kevlin Henney's talks.  He's a bit like Uncle Bob Martin - insightful and entertaining with a deep well of experience to back it up.

Does your code have GUTs?  It should.  If you're a professional programmer, you really should have GUTs.  And by "GUTs", I mean "Good Unit Tests".

Unless you've been living under a rock for the last decade or two (or you're just new), the industry has decided that unit tests are a thing all "Serious Engineers" need to write.  For years I've been hearing about how you need to have unit tests, and if you don't you're not a real programmer.  So I decided to get with the program and start writing unit tests.

And you know what?  I failed miserably.  It was a serious train wreck.

Why did I fail?  Because I listened to those blog posts and podcast episodes that just tell you to write unit tests, and then I went out and just tried to write unit tests.  My problem was that I didn't know what a good unit test looked like.  So I wrote tests, but they were brittle, hard to understand, hard to maintain, and not even especially useful.  The result was that after a while, I concluded that these "unit test" people didn't know what they were talking about and had just latched onto the latest fad.  And then every year or two I would try again and reach the same conclusion.

So the real question is not "do you have unit tests?", but "do you have good unit tests?"  Because, as I alluded to above, bad unit tests are not necessarily better than no unit tests.  A bad unit test suite will break all the time and not necessarily give you any useful information, which makes maintaining it more costly than just dropping it.  

What do GUTs look like?

Good unit tests have three main properties:

  1. Readability
  2. Maintainability
  3. Durability

What does this mean?  Well, readability means just what it sounds like - that you can read and understand the test cases.  Without being an expert in the system under test, you should be able to look at a test case and be able to figure out what behavior it's testing.

Likewise, maintainability means the same thing as it does for production code - that you should be able to update and expand the test suite without undue effort or pain.

Durability simply means that your tests should not break willy-nilly.  Obviously there are lots of potential changes you can make to code that would break your tests, but the tests should not break unless they really need to.  So a durable test suite should not start failing because internal implementation details of the code under test were changed.

Guidelines for writing tests

So how do you write tests that have those three properties?  Well, I have a few guidelines and common practices that I use.  At work, I routinely find myself suggesting these things during code reviews.  Some people might disagree, but I've found these to be helpful and I'll try to justify my opinions.  Hopefully you'll find them useful.

Note: for the sake of simplicity and consistency, examples will be of a blogging system written in PHP, but these principles and ideas are by no means specific to such a system.  I just use that because I happen to have plenty of examples at hand so that I don't have to come up with fake ones.

Test names

Naming your tests is an important, but often-overlooked aspect of test writing.  Your test cases should have descriptive names.  And by "descriptive", I mean it should tell you three things:

  1. What behavior is being tested.
  2. What the conditions of the test are.
  3. What the expected outcome is.

If you use the "generate test cases" feature of your IDE, or something like that, you might end up with test names like testSaveEntry or testPublishEntry.  These are bad names.  For one thing, they only tell you the name of the function they're testing.  For another, they guide you into a convention of a one-to-one mapping of test classes and methods to production code classes and methods.  This is limiting and unnecessary.  You should have as many test classes and methods as you need.  I sometimes have an entire test class just to test one production method.  I don't recommend that as a general rule, but there are cases where it makes sense.

When in doubt, I recommend choosing a test naming convention.  I often use "test<name of method under test>_When<conditions of test>_<expected result>".  So, for example, if I was writing a test to check if the publishEntry() method throws an exception when the database update fails, I might name it testPublishEntry_WhenDbUpdateFails_Throws.  Or if I wanted to test that a text sanitizing function HTML encodes any angle brackets that it finds in the input, I might call it testSanitize_WhenDataContainsHtml_ReturnsDataWithTagsEscaped.  Obviously you can use a different convention, or no convention at all, but the point is that the test name tells you everything you need to know about what is being tested.
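To make that a bit more concrete, here's a rough sketch of what that convention looks like in PHPUnit.  The EntryPublisher and Sanitizer classes are made up for the sake of the example, and the test bodies are elided because the names are the whole point here.

<?php
use PHPUnit\Framework\TestCase;

// Made-up test classes for an imaginary EntryPublisher and Sanitizer; the
// bodies are elided because the naming convention is the whole point here.
class EntryPublisherTest extends TestCase
{
    public function testPublishEntry_WhenDbUpdateFails_Throws(): void
    {
        // arrange a DB mock that fails, call publishEntry(), expect an exception
    }

    public function testPublishEntry_WhenEntryIsDraft_SetsPublishDate(): void
    {
        // ...
    }
}

class SanitizerTest extends TestCase
{
    public function testSanitize_WhenDataContainsHtml_ReturnsDataWithTagsEscaped(): void
    {
        // e.g. '<b>' in the input should come back as '&lt;b&gt;'
    }
}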

One thing to note here: the above names are kind of long.  I know - that's OK.  Remember that these are test names.  They're not methods that somebody is going to be calling in other code.  Nobody is ever going to have to type that name again.  So don't get hung up on the length.

Also of note: think about what, specifically, the expected behavior is.  Things like testSaveEntryWorks are not enlightening.  What does "work" mean?  The same goes for things like testSaveEntryMakesExpectedDatabaseCalls.  OK...what are the expected database calls?  If you can't come up with something specific, that probably means you need to think more about your test, maybe even break it into multiple tests.

A good guideline to keep in mind is that it should be possible to write a pretty-printer that can read the names of all your test methods and print out a low-level specification for your system.  Ideally, you should be able to figure out how everything works just by looking at the test names.

Granularity

Big tests are bad.  Why?  Because the bigger the test is, the more things can break it.  This is the same reason that big functions and methods are bad - the more they do, the more opportunity there is for something to go wrong.  So if you can, it's better to keep things small.  Just as each function should do one thing, and do it well, each test should test one thing, and test it well.

And when I say "one thing", I really mean one thing.  Ideally, each test should have one, and only one, reason to fail.  The goal is that when a test fails, it should be pretty obvious what went wrong.  So if the only thing a test asserts is that X = Y, and it fails, then you can be pretty sure X isn't equal to Y.

On the other hand, if you're asserting a bunch of different things, it's harder to pinpoint the problem.  Furthermore, since most test frameworks will fail the test at the first failed assertion, you can end up with one failure masking another, i.e. the first of several assertions fails, so you fix it, and then the next assertion in that test fails, etc.  

So if you need to check a bunch of things, then write a bunch of tests.  Don't try to cram them all into one test.
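For example, here's a toy sketch that shows one coarse test next to a set of focused ones.  The Slug class is invented just to keep the example self-contained; it's not from any real codebase.

<?php
use PHPUnit\Framework\TestCase;

// Toy production class, included only so the sketch is self-contained.
class Slug
{
    public static function fromTitle(string $title): string
    {
        return strtolower(trim(preg_replace('/[^A-Za-z0-9]+/', '-', $title), '-'));
    }
}

class SlugTest extends TestCase
{
    // Too coarse: three unrelated reasons to fail hidden in a single test.
    public function testFromTitleWorks(): void
    {
        $this->assertSame('hello-world', Slug::fromTitle('Hello World'));
        $this->assertSame('50-off', Slug::fromTitle('50% off!'));
        $this->assertSame('guts', Slug::fromTitle('GUTs'));
    }

    // Better: each of these has exactly one reason to fail.
    public function testFromTitle_WhenTitleHasSpaces_ReplacesThemWithDashes(): void
    {
        $this->assertSame('hello-world', Slug::fromTitle('Hello World'));
    }

    public function testFromTitle_WhenTitleHasPunctuation_StripsIt(): void
    {
        $this->assertSame('50-off', Slug::fromTitle('50% off!'));
    }

    public function testFromTitle_WhenTitleIsMixedCase_LowercasesIt(): void
    {
        $this->assertSame('guts', Slug::fromTitle('GUTs'));
    }
}

When the coarse test fails, you get to go digging; when one of the focused tests fails, the name alone tells you which behavior broke.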

Don't assert too much, but always assert something

A corollary to the rule that a test should only test one thing is that a test should test at least one thing.  In code reviews for junior developers, I sometimes see tests that don't actually contain any assertions.  They have a bunch of setup code that configures mocks to make a method run, but no explicit assertions or call expectations on mocks.  All such a test really asserts is "this method doesn't throw an exception."

Needless to say, that's not a very strong assertion.  Heck, I can write methods that don't throw an exception all day long.  If the test fails, all you have to do is delete all the code in the method you're testing and there you go - it won't throw anymore.  Granted, your application won't do anything, but at least the test suite will pass.

So always make sure you're making some explicit assertion.  Otherwise, there's no point in writing the test.
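Here's a sketch of what I mean, using a made-up EntryPublisher and EntryMapper.  The first test has no assertions at all, so it only proves the method doesn't blow up; the second pins the behavior down with an explicit call expectation on the mock.

<?php
use PHPUnit\Framework\TestCase;

// Made-up collaborators, declared inline so the sketch stands alone.
interface EntryMapper
{
    public function update(array $entry): bool;
}

class EntryPublisher
{
    public function __construct(private EntryMapper $mapper) {}

    public function publishEntry(array $entry): void
    {
        $entry['status'] = 'published';
        $this->mapper->update($entry);
    }
}

class EntryPublisherTest extends TestCase
{
    // Weak: no assertions, no expectations.  Deleting the body of
    // publishEntry() would not make this fail.
    public function testPublishEntryDoesNotThrow(): void
    {
        $mapper = $this->createMock(EntryMapper::class);
        $publisher = new EntryPublisher($mapper);

        $publisher->publishEntry(['id' => 1, 'status' => 'draft']);
    }

    // Better: an explicit call expectation on the mock pins the behavior down.
    public function testPublishEntry_WhenCalled_SavesEntryWithPublishedStatus(): void
    {
        $mapper = $this->createMock(EntryMapper::class);
        $mapper->expects($this->once())
               ->method('update')
               ->with(['id' => 1, 'status' => 'published']);

        $publisher = new EntryPublisher($mapper);
        $publisher->publishEntry(['id' => 1, 'status' => 'draft']);
    }
}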

Readability

Ideally, your tests should be a low-level specification - they should be clear and detailed enough that you could hand the tests to a developer and they could reverse engineer every part of your system based on that.  You probably won't actually accomplish that, but that's the goal we're shooting for in terms of coverage and clarity.

So given that, it's obviously very important that tests be readable.  A test that's hard to read isn't doing its job of communicating the intent of the system.  But what, exactly, does "readable" mean?

In my view, it's a combination of things.  Obviously, the action of the test should be clear.  You should be able to look at the test and easily be able to see what it's doing without having to search around through other code.  But you also want the test to be simple enough that you can easily understand it and not miss the forest for the trees.

One simple strategy to address this is to adopt a test structure convention.  This is typically an "arrange-act-assert" layout, where you group all of the setup code, the action of the code-under-test, and the assertions into their own sections, so it's clear where each one starts and ends.  Where people sometimes get into trouble with this is when using mocking frameworks that require you to declare expectations and assertions on mocks up-front before they are called (you usually have to do "arrange-assert-act" in that case, but it's the same idea).  What I often see is people declaring a mock, declaring an assertion on it, then declaring another mock with its assertions farther down the test method, and so on.  This makes the test harder to read because you have to hunt through all the setup code to determine which mocks have assertions on them.  There isn't a single place you can look at to see the expected result.
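Here's a sketch of what a well-grouped test might look like with PHPUnit mocks; the CommentService and its collaborators are invented for the example.  Note that the mock expectations all live together in the arrange section, so there's one place to look for the expected calls.

<?php
use PHPUnit\Framework\TestCase;

// Made-up collaborators for a comment-posting method, defined inline so the
// layout sketch stands alone.
interface CommentMapper
{
    public function insert(array $comment): int;
}

interface Mailer
{
    public function send(string $to, string $subject): void;
}

class CommentService
{
    public function __construct(
        private CommentMapper $mapper,
        private Mailer $mailer
    ) {}

    public function postComment(array $comment, string $notifyEmail): int
    {
        $id = $this->mapper->insert($comment);
        $this->mailer->send($notifyEmail, 'New comment on your entry');
        return $id;
    }
}

class CommentServiceTest extends TestCase
{
    public function testPostComment_WhenInsertSucceeds_MailsAuthorAndReturnsId(): void
    {
        // Arrange: all mock setup *and* mock expectations grouped in one place.
        $mapper = $this->createMock(CommentMapper::class);
        $mapper->method('insert')->willReturn(42);

        $mailer = $this->createMock(Mailer::class);
        $mailer->expects($this->once())
               ->method('send')
               ->with('author@example.com', 'New comment on your entry');

        $service = new CommentService($mapper, $mailer);

        // Act: exercise the method under test.
        $id = $service->postComment(['body' => 'Nice post!'], 'author@example.com');

        // Assert: check the return value.
        $this->assertSame(42, $id);
    }
}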

Another good strategy is the judicious use of utility functions.  This is especially the case if you have lots of similar tests with a substantial amount of setup code, which is not an uncommon situation.  So, for example, if you have a method that calls a remote web service, you might want a bunch of different test methods that check all the different error conditions and return values.  The mock object setup for many of those tests is probably very similar.  In fact, it might be the same except for one or two values.  You could handle that by just doing a copy-and-paste of the setup code and changing what you need, but then you end up with a sea of almost identical code and it becomes harder to see what the differences between the tests are.  The solution is to encapsulate some of that setup code in a utility method that you can just call from each of the tests.  This lets you configure your mocks in one place and keeps all those extraneous details out of the test body.

The one thing to watch out for with utility methods is that they should be as simple as possible.  When I can, I try to make them take no parameters and just give them a name that describes how they configure my mocks.  Remember that all the setup doesn't have to be encapsulated in a single function  - you can call multiple utility functions that configure different things.  In cases where it makes sense for the utility methods to take parameters, I try to limit how many they take, so as to keep the scope of what the function does clear.  And if you find yourself putting conditionals or loops in your utility functions, you might want to think about your approach.  Remember, the more logic there is in your tests, the more likely it is that your tests will have bugs, which is the last thing you want.
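Putting that together, here's a sketch using a made-up PingSender that talks to a remote ping service.  The no-parameter helpers say in their names how they configure the mock, and the test bodies stay short.

<?php
use PHPUnit\Framework\TestCase;

// Made-up client for a remote ping service, wrapped by the code under test.
interface PingService
{
    public function notify(string $permalink): string;
}

class PingSender
{
    public function __construct(private PingService $service) {}

    public function send(string $permalink): bool
    {
        try {
            return $this->service->notify($permalink) === 'OK';
        } catch (\RuntimeException $e) {
            return false;
        }
    }
}

class PingSenderTest extends TestCase
{
    private $service;  // holds the configured PingService mock

    // No-parameter helpers whose names say how they configure the mock, so the
    // wiring details stay out of the test bodies.
    private function givenThePingServiceAccepts(): void
    {
        $this->service = $this->createMock(PingService::class);
        $this->service->method('notify')->willReturn('OK');
    }

    private function givenThePingServiceIsDown(): void
    {
        $this->service = $this->createMock(PingService::class);
        $this->service->method('notify')
                      ->willThrowException(new \RuntimeException('timeout'));
    }

    public function testSend_WhenServiceAccepts_ReturnsTrue(): void
    {
        $this->givenThePingServiceAccepts();

        $sender = new PingSender($this->service);

        $this->assertTrue($sender->send('http://example.com/entry/1'));
    }

    public function testSend_WhenServiceIsDown_ReturnsFalse(): void
    {
        $this->givenThePingServiceIsDown();

        $sender = new PingSender($this->service);

        $this->assertFalse($sender->send('http://example.com/entry/1'));
    }
}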

Conclusion

These are just some of the techniques and considerations that I employ when writing unit tests.  They've been working for me so far, but your mileage may vary.

My main goal here is just to give some ideas to people who are new to this unit testing thing.  Remember, writing unit tests is a skill that you have to learn.  This is especially the case when doing test-driven development.  Despite what you may have been led to believe, it's not a simple and obvious thing that you can just do cold.  So be patient, try different things, and take your time.  It's the journey, not the destination, that's the point.
