LBlog Refactoring Step 1: Publishing

The first part of my LnBlog refactoring is kind of a large one: changing the publication logic.  I'll start by giving an overview of the old design, discussing the new design, and then looking at some of the actual project data.

History

LnBlog is a very old system.  I started it back in 2005 and did most of the work on it over the next two years.  It was very much an educational project - it was my first PHP project (in PHP 4, no less) and only my third web-based project.  So I really didn't know what I was doing.

As you might expect, the original design was very simple.  In the first version, "publishing an entry" really just meant creating a directory and writing a few files to the disk.  As the system grew more features, more and more steps were added to that process.  The result was a tangle of code where all of the logic lived in the "controller" (originally it was a server-page) with some of the actual persistence logic encapsulated in the domain objects.  So, roughly speaking, the BlogEntry object knew how to write it's metadata file to the appropriate location, but it didn't know how to properly handle uploaded files, notifications, or really anything else.  

Originally, LnBlog used a server-page model, with each URL being a separate file and code being shared by including common configuration files and function libraries all over the place.  Shortly prior to starting this project, I had consolidated the logic from all those pages into a single, massive WebPages class, with one public method for each endpoint - a monolithic controller class, for all intents and purposes.  This still isn't great, but it gave me a common entry point to funnel requests through, which means I can better control common setup code, handle routing more easily, and generally not have the pain of dealing with a dozen or more endpoints.

Anyway, this WebPages class served up and edited entries by directly manipulating domain objects such as BlogEntry, Blog, BlogComment, etc.  The domain objects knew how to save themselves to disk, and a few other things, but the majority of the logic was right in the WebPages class.  This worked fine for the main use-case of creating an entry from the "edit entry" page, but it made things very awkward for the less-used locations, such as the Metaweblog API endpoint or scheduled publication of drafts.

Furthermore, the publication logic in the WebPages class was very hairy.  All types of publication flowed through a single endpoint and used a common code path.  So there were flags and conditions all over the place, and it was very difficult to tell what needed to be updated when making changes.  Bugs were rampant and since there was no test automation to speak of, testing any changes was extremely laborious and error prone.

The New Design

There were two main goals for this refactoring:

  1. Create a new Publisher class to encapsulate the publication logic.  The idea was to have a single entity that is responsible for managing the publication state of blog entries.  Given a BlogEntry object, it would know how to publish it as a regular entry or a static article, unpublish it, handle updating or deleting it, etc.  This would give us a single entity that could own all the steps involved in publishing.
  2. Create unit tests around the publication process.  The logic around the entire process was more complicated than you'd think and the old code was poorly structured, so it broke with disturbing regularity.  Hence I wanted some automated checking to reduce the risk of bugs.

So the design is actually pretty straight-forward: create a "publisher" class, give it operations for each of the things we do with blog entries (create, update, delete, etc.), copy the existing logic for those cases into the corresponding methods, update the endpoints to use this new class, and call it a day.  

So it was mostly just a reorganization - there wasn't any significant new logic that needed to be written.  Simple, right?  What could go wrong?

Results

While I was happy with the result, this project turned out to be a much larger undertaking than I'd assumed.  I knew it was going to be a relatively large task, but I was off by a factor of more than two.  Below is a table summarizing the project statistics and comparing them to the original estimates (from PSP data I captured using Process Dashboard). 

Planned and actual project statistics
  Planed Actual
Total Hours  29:01 64:00
LOC added and modified  946  3138
LOC/Hour  32.6 49.0
Total Defects  82.1  69.0
Defects/KLOC 86.8 21.7

When I originally did the conceptual design and estimate, I had planned for relatively modest changes to the Article, BlogEntry, and WebPages classes and the creators.php file.  I also planned for new Publisher and BlogUpdater classes, as well as associated tests and some tests for the WebPages class.  This came to 29 hours and 946 new or changed lines of code across nine source files.  Definitely a big job when you consider I'm working in increments of two hours or less per day, whenever I get around to it.

In actuality, the changes were much larger in scope.  I ended up changing 27 additional files I hadn't considered, and ended up creating two other new utility classes (although I did ditch the BlogUpdater class - I no longer even remember what it was supposed to do).  The resulting 3138 lines of code took me 64 hours spread over five months.

Effects of Testing

I did test-driven development when working on this project.  I've found TDD to be useful in the past and it was very helpful to me here.  It was also very effective in meeting my second goal of building a test suite around the publication logic.  PHPUnit reports statement coverage for the Publisher tests at 97.52% and close to 100% coverage for the tested methods in the WebPages class (I only wrote tests for the endpoint that handles creating and publishing entries).

More importantly, using TDD also helped me to untangle the logic of the publication process.  And it turns out there was actually a lot to it.  I ended up generating about 2000 lines of test code over the course of this project.  It turns out that the design and unit test phase occupied 65% of the total project time - about 41 hours.  Having a comprehensive test suite was immensely helpful when I was rebuilding the publication logic across weeks and months.  It allowed me to have an easy check on my changes without having to load all of the code I worked on three weeks ago back into my brain.

Sadly, the code was not such that I could easily write tests against the existing code.  In fact, many of the additional changes came from having to break dependencies in the existing code to make it possible to unit test.  Luckily, most of them were not hard to break, e.g. using an existing file system abstraction layer, but the work still adds up.  It would have been very nice to have an existing test suite to prevent regressions in the rewrite.  Unfortunately, even integration tests would have been awkward, and even if I could have written them, it would have been very hard to get the level f coverage I'd need to be confident in the refactor.

Conclusion

In terms of the results, this project worked out pretty well.  It didn't really go according to plan, but I got what I was looking for in the end - a better publication design and a good test suite.  However, it was a long, slow slog.  Maybe that was too big a slice of work to do all at once.  Perhaps a more iterative approach could have kept things moving at a reasonable pace.  I'll have to try that on the next big project.

You can reply to this entry by leaving a comment below. You can send TrackBack pings to this URL. This entry accepts Pingbacks from other blogs. You can follow comments on this entry by subscribing to the RSS feed.

Add your comments #

A comment body is required. No HTML code allowed. URLs starting with http:// or ftp:// will be automatically converted to hyperlinks.