Birthday book acquisitions

This is my more-than-slightly belated birthday post.

This year, my birthday fell on the last day of our vacation on Cape Cod.  We've been there several times, so we've seen most of the attractions we're interested in at least once.  Thus we had a nice, restful, laid-back week.

I'm not usually much for shopping, but one of my traditional activities on the cape is to go shopping for used books.  (Actually, I do that pretty much everywhere.)  However, this year I was very disappointed to find that my favorite used book shop on Main St. in downtown Hyannis had closed! I'd shopped there every time we've been to the cape and they had a really nice selection.  I define a "nice selection" at a used book store as one that includes a wide range of unusual books on a wide variety of topics (such as this one, which I bought at that shop two years ago and spent my vacation week reading), as opposed to used book stores that carry mostly mass-market paperbacks, usually in the mystery and romance genres.  

Fortunately, I was able to find another good used book shop - Parnassus Book Services in Yarmouth.  We always stay in Hyannis, so it's not as convenient as the other one was, but they've got a great selection.  In fact, it's a bit daunting - the shelves literally go up to the ceiling and they're completely packed.

Yet even with the abbreviated browsing time I had when shopping with a small child, I still managed to find several interesting volumes, including:

  1. Islam by Karen Armstrong
  2. A History of the Vikings by Gwyn Jones
  3. The Book of J by Bloom and Rosenberg
  4. Dinosaur Lives by John Horner
  5. Aristophanes' The Frogs
  6. Exodus: The True Story by Ian Wilson 
  7. Discovering Dinosaurs by Norell, Gaffney, and Dingus
  8. Fundamentals of Electronic Data Processing: An Introduction to Business Computer Programming, edited by Rice and Friedman

My birthday book haul

As you can see, my book tastes are a bit eclectic.  Two themes that are obvious are my interest in studying religion and dinosaurs.  The former is a legacy of my BA in Philosophy, while the latter is a result of being the father of a six-year-old boy - I loved dinosaurs when I was young too, and my son's interest in them re-sparked mine.

I couldn't resist the one on the bottom, though: Fundamentals of Electronic Data Processing: An Introduction to Business Computer Programming.  It's from the 1960's, so basically a piece of computing history.  I've got a couple of other computing books of similar vintage sitting on my shelf, though this is certainly the oldest.  One of these days I'll have to read them all the way through and write up some reviews, like I did with my copy of The Mythical Man-Month.  It's always interesting to see just how much has stayed the same despite the radical changes in technology.

I give up - switching to GitHub

Well, I officially give up.  I'm switching to GitHub.

If you read back through this blog, you might get the idea that I'm a bit of a contrarian.  I'm generally not the type to jump on the latest popular thing.  I'd rather go my own way and do what I think is best than go along with the crowd.  But at the same time, I know a lost cause when I see it and I can recognize when it's time to cut my losses.

For many years, I ran my own Mercurial repository on my web host, including the web viewer interface, as well as my own issue tracker (originally MantisBT, more recently The Bug Genie).  However, I've reached the point where I can't justify doing that anymore.  So I'm giving up and switching over to GitHub like everybody else.

I take no real pleasure in this.  I've been using Git professionally for many years, but I've never been a big fan of it.  I mean, I can't say it's bad - it's not.  But I think it's hard to use and more complicated than it needs to be.  As a comment I once saw put it, Git "isn't a revision control system, it's more of a workflow tool that you can use to do version control."  And I still think the only reason Git got popular is because it was created by programming celebrity Linus Torvalds.  If it had been created by Joe Nobody, I suspect it would be in the same boat as Bazaar today.

That said, at this point it's clear that Git has won the distributed VCS war, and done so decisively.  Everything supports Git, and nothing supports Mercurial.  Heck, even BitBucket, the original cloud Mercurial host, is now dropping Mercurial support.  For me, that was kind of the final nail in the coffin.  

That's not the only reason for my switch, though.  There are a bunch of smaller things that have been adding up over time:

  • There's just more tool support for Git.  These days, if a development tool has any VCS integration, it's for Git.  Mercurial is left out in the cold.
  • While running my own Mercurial and bug tracker installations isn't a huge maintenance burden, it is a burden.  Every now and then they break because of my host changing some configuration, or they need to be upgraded.  These days my time is scarce and it's no longer fun or interesting to do that work.
  • There are some niggling bugs in my existing environment.  The one that really annoys me is that my last Mercurial upgrade broke the script that integrates it with The Bug Genie.  I could probably fix it if I really wanted to, but the script is larger than you'd expect and it's not enough of an annoyance to dedicate the time it would take to become familiar with it.
  • My web host now provides support for Git hosting.  So I can still have my own repo on my own hosting (in addition to GitHub) without having to do any extra work.
  • Honestly, at this point I've got more experience with Git than Mercurial, to the point that I find myself trying to run Git commands in my Mercurial repos.  So by using Mercurial at home I'm kind of fighting my own instincts, which is counterproductive.

So there you have it.  I'm currently in the process of converting all my Mercurial repos to Git.  After that, I'll look at moving my issue tracking into GitHub.  In the long run, it's gonna be less work to just go with the flow.

LnBlog Refactoring Step 3: Uploads and drafts

It's time for the third, slightly shorter, installment of my ongoing series on refactoring my blogging software.  In the first part, I discussed reworking how post publication was done and in the second part I talked about reworking things to add Webmention support.  This time, we're going to talk about two mini-projects to improve the UI for editing posts.

This improvement is, I'm slightly sad to say, pretty boring.  It basically involves fixing a "bug" that's really an artifact of some very old design choices.  These choices led to the existing implementation behaving in unexpected ways when the workflow changed.

The Problem

Originally LnBlog was pretty basic and written almost entirely in HTML and PHP, i.e. there was no JavaScript to speak of.  You wrote posts either in raw HTML in a text area box, using "auto-markup", which just automatically linkified things, or using "LBCode", which is my own bastardized version of the BBCode markup that used to be popular on web forums.  I had implemented some plugins to support WYSIWYG post editors, but I didn't really use them and they didn't get much love.

The old LnBlog post editor

Well, I eventually got tired of writing in LBCode and switched to composing all my posts using the TinyMCE plugin.  That is now the standard way to compose your posts in LnBlog.  The problem is that the existing workflow wasn't really designed for WYSIWYG composition.

In the old model, the idea was that you could compose your entire post on the entry editing page, hit "publish", and it would all be submitted to the server in one go.  There's also a "review" button which renders your post as it would appear when published and a "save draft" button to save your work for later.  These also assume that submitting the post is an all-or-nothing operation.  So if you got part way done with your post and decided you didn't like it, you could just leave the page and nothing would be saved to the server.

At this point it is also worth noting how LnBlog stores its data.  Everything is file-based and entries are self-contained.  That means that each entry has a directory and that directory contains all the post data, comments, and uploaded files that belong to that entry.
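To make the "self-contained entry" idea concrete, here's a rough sketch in Python (LnBlog itself is PHP, and the file names here are my guesses for illustration, not LnBlog's actual layout):

```python
# Hypothetical sketch of a file-based, self-contained entry layout:
# each entry gets one directory holding the post body, its comments,
# and any uploaded files.  Names are invented for illustration.
from pathlib import Path

def entry_paths(blog_root: str, entry_id: str) -> dict:
    """Return the locations of an entry's data, assuming one
    self-contained directory per entry."""
    entry_dir = Path(blog_root) / "entries" / entry_id
    return {
        "body": entry_dir / "entry.xml",     # post content and metadata
        "comments": entry_dir / "comments",  # one file per comment
        "uploads": entry_dir,                # uploads live alongside the post
    }

paths = entry_paths("/var/www/blog", "2019/08/my-post")
```

The nice property of this scheme is that copying or deleting one directory copies or deletes everything about that entry, which matters for the draft-to-published move discussed below.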

What's the problem with this?  Well, to have meaningful WYSIWYG editing, you need to be able to do things like upload a file and then see it in the post editor.  In the old workflow, you'd have to write your post, insert an image tag with the file name of your picture (which would not render), add your picture as an upload, save the entry (either by saving the draft or using the "preview", which would trigger a save if you had uploads), and then go back to editing your post.  This was an unacceptably clunky workflow.

On top of this, there was a further problem.  Even after you previewed your post, it still wouldn't render correctly in the WYSIWYG editor.  That's because the relative URLs were inconsistent.  The uploaded files got stored in a special, segregated draft directory, but the post editor page itself was not relative to that directory, so TinyMCE didn't have the right path to render it.  And you can't use an absolute URL because the URL will change after the post is published.
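The mismatch is easy to see with a small Python illustration (the URLs here are made up): a relative image src in the post body resolves against whatever page it's rendered on, not against the draft's storage directory.

```python
# Illustration of the relative-URL problem.  A relative src like
# "pic.jpg" resolves relative to the page it is rendered on, so the
# editor page and the draft directory produce different results.
from urllib.parse import urljoin

editor_page = "https://example.com/admin/edit.php"  # where the WYSIWYG editor runs
draft_dir = "https://example.com/drafts/123/"       # where the upload actually lives

in_editor = urljoin(editor_page, "pic.jpg")   # resolves under /admin/ - broken image
when_right = urljoin(draft_dir, "pic.jpg")    # resolves under the draft dir - correct
```

And since the entry's directory moves when it's published, an absolute URL would break in the other direction, which is why making the editor URL relative to the draft directory (below) is the fix.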

So there were two semi-related tasks to fix this.  The first was to introduce a better upload mechanism.  The old one was just a regular <input type="file"> box, which worked but wasn't especially user-friendly.  The second one was to fix things such that TinyMCE could consistently render the correct URL for any files we uploaded.

The solution - Design

The actual solution to this problem was not so much in the code as it was in changing the design.  The first part was simple: fix the clunky old upload process by introducing a more modern JavaScript widget to do the uploads.  So after looking at some alternatives, I decided to implement Dropzone.js as the standard upload mechanism.

The new, more modern LnBlog post editor.

The second part involved changing the workflow for writing and publishing posts.  The result was a somewhat simpler and more consistent workflow that reduces the number of branches in the code.  In the old workflow, you had the following possible cases when submitting a post to the server:

  1. New post being published (nothing saved yet).
  2. New post being saved as a draft (nothing saved yet).
  3. Existing draft post being published.
  4. Existing draft post being saved.
  5. New (not yet saved) post being previewed with attached files.
  6. Existing draft post being previewed with attached files.

This is kind of a lot of cases.  Too many, in fact.  Publishing and saving were slightly different depending on whether or not the entry already existed, and then there were the preview cases.  Those were necessary because previewing an entry with new attachments required extra processing - after all, if you attached an image, you'd want to see it.  So this complexity was a minor problem in and of itself.

So the solution was to change the workflow such that all of these are no longer special cases.  I did this by simply issuing the decree that all draft entries shall always already exist.  In other words, just create a new draft when we first open the new post editor.  This does two things for us:

  1. It allows us to solve the "relative URL" problem because now we can make the draft editing URL always relative to the draft storage directory.
  2. It eliminates some of those special cases.  If the draft always exists, then "publish new post" and "publish existing draft" are effectively the same operation.  When combined with the modern upload widget, this also eliminates the need for the special "preview" cases.
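As a rough sketch of the idea (in Python rather than LnBlog's PHP, with invented names): once the draft is created when the editor opens, the save/publish handler only ever deals with an entry that already exists, so the "new vs. existing" branches disappear.

```python
# Sketch of the simplified workflow.  All names are invented for
# illustration; the point is the shape of the control flow, not the API.

class Draft:
    def __init__(self, draft_id):
        self.draft_id = draft_id
        self.content = ""
        self.published = False

class Store:
    """Stand-in for the file-based entry storage."""
    def __init__(self):
        self._next_id = 0

    def create_draft(self):
        self._next_id += 1
        return Draft(self._next_id)

def open_editor(store):
    # The decree: all draft entries always already exist.
    return store.create_draft()

def save(draft, content, publish=False):
    # One code path: no "new vs. existing" split, and no special
    # preview-with-attachments cases, since uploads already live in
    # the draft's directory.
    draft.content = content
    draft.published = publish or draft.published
    return draft

draft = open_editor(Store())
save(draft, "Hello, world", publish=True)
```

With this shape, "publish new post" and "publish existing draft" really are the same call, which is exactly the case-collapsing described above.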

The implementation - Results

I won't get into the actual implementation details of these tasks because, frankly, they're not very interesting.  There aren't any good lessons or generalizations to take from the code - it's mostly just adapting the idiosyncratic stuff that was already there.

The implementation was also small and went fairly smoothly.  The upload widget was actually the hard part - there were a bunch of minor issues in the process of integrating that.  There were some issues with the other part as well, but less serious.  Much of it was just integration issues that weren't necessarily expected and would have been hard to foresee.  You know, the kind of thing you expect from legacy code.  Here's some stats from Process Dashboard:

Project                        File Upload   Draft always exists
Hours to complete (planned)    4:13          3:00
Hours to complete (actual)     7:49          5:23
LOC changed/added (planned)    210           135
LOC changed/added (actual)     141           182
Defects/KLOC (found in test)   42.6          27.5
Defects/KLOC (total)           81.5          44.0

As you can see, my estimates here were not great.  The upload part involved more trial and error with Dropzone.js than I had expected and ended up with more bugs.  The draft workflow change went better, but I ended up spending more time on the design than I initially anticipated.  However, these tasks both had a lot of unknowns, so I didn't really expect the estimates to be that accurate.

Take Away

The interesting thing about this project was not so much what needed to be done but why it needed to be done. 

Editing posts is obviously a fundamental function of a blog, and it's one that I originally wrote way back in 2005.  It's worth remembering that the web was a very different place back then.  Internet Explorer was still the leading web browser; PHP 5 was still brand new; it wasn't yet considered "safe" to just use JavaScript for everything (because, hey, people might not have JavaScript enabled); internet speeds were still pretty slow; and browsing on mobile devices was just starting to become feasible.  In that world, a lot of the design decisions I made at the time seemed pretty reasonable.

But, of course, the web evolved.  The modern web makes it much easier for the file upload workflow to be asynchronous, which offers a much nicer user experience.  By ditching some of the biases and assumptions of the old post editor, I was more easily able to update the interface.

One of the interesting things to note here is that changing the post editing workflow was easier than the alternatives.  Keeping the old workflow was by no means impossible.  I kicked around several ideas that didn't involve changing it.  However, most of those had other limitations or complications and I eventually decided that they would ultimately be more work.  

This is something that comes up with some regularity when working with an older code-base.  It often happens that the assumptions baked into the architecture don't age well as the world around the application progresses.  Thus, when you need to finally "fix" that aspect of the app, you end up having to do a bit of cost-benefit analysis.  Is it better to re-vamp this part of the application?  Or should you shim in the new features in a kinda-hacky-but-it-works sort of way?

While as developers, our first instinct is usually to do the "real" fix and replace the old thing, the "correct" answer is seldom so straight-forward.  In this case, the "real" fix was relatively small and straight-forward.  But in other cases, the old assumptions are smeared through the entire application and trying to remove them becomes a nightmare.  It might take weeks or months to make a relatively simple change, and then weeks or months after that to deal with all the unforeseen fallout of that change.  Is that worth the effort?  It probably depends on what the "real" fix buys you.

I had a project at work once that was a great example of that.  On the surface, the request was a simple "I want to be able to update this field", where the field in question was data that was generally but not necessarily static. In most systems, this would be as simple as adding a UI to edit that field and having it update the datastore.  But in this case, that field was used internally as the unique identifier and was used that way across a number of different systems.  So this assumption was everywhere.  Everybody knew this was a terrible design, but it had been that way for a decade and was such a huge pain to fix that we had been putting it off for years.  When we finally bit the bullet and did it right, unraveling the baked-in assumptions about this piece of data took an entire team over a month.  At an extremely conservative estimate, that's well over $25,000 to fix "make this field updatable".  That's a pretty hefty price tag for something that seems so trivial.

The point is, old applications tend to have lots of weird, esoteric design decisions and implementation-specific issues that constrain them.  Sometimes removing these constraints is simple and straight-forward.  Sometimes it's not.  And without full context, it's often hard to tell which it will be.  So whenever possible, try to have pity on the future maintenance programmer who will be working on your system and anticipate those kinds of issues.  After all, that programmer might be you.

Spam filters suck

Author's note: Here's another little rant that's been sitting in my drafts folder for years. Twelve years, to be precise - I created this on March 28, 2007. That was toward the end of my "government IT guy" days.

I'd forgotten how much of a pain the internet filtering was. These days, I hardly think about it. The government job was the last time I worked anyplace that even tried to filter the web. And e-mail filtering hasn't been something I've worried about in a long time either. These days, the filtering is more likely to be too lax than anything else. And if something does get incorrectly filtered, you generally just go to your junk mail folder to find it. No need for the rigamarole of going back and forth with the IT people. It's nice to know that at least some things get better.

I'm really starting to hate spam filters. Specifically, our spam filters at work. And our web filters. In fact, pretty much all the filters we have here. Even the water filters suck. (Actually, I don't think there are any water filters, which, if you'd tasted the municipal water, you would agree is a problem.)

I asked a vendor to send me a quote last week. I didn't get it, so I called and asked him to send it again. I checked with one of our network people, and she tells me it apparently didn't get through our first level of filters. So she white-listed the sender's domain and I asked the guy to send it again. It still didn't get through.

As I've mentioned before, our web filters also block Wikipedia.

On opinions and holding them

Strong Opinions Loosely Held Might be the Worst Idea in Tech.  I think the name of that article pretty much says it all.

This is essentially something I've thought since I heard about the concept of "strong opinions loosely held".  I can see how it could work in certain specific cases, or be mindfully applied as a technique for refining particular ideas.  However, that only works when everyone in the conversation agrees that that's the game they're playing, and that's not usually how I've seen the principle presented anyway.  Rather, it's usually described in terms more like "be confident in your opinions, but change your mind if you're wrong."  And that's fine, as far as it goes.  But it's far from clear that this works out well as a general principle.

To me, "strong opinions loosely held" always seemed kind of like an excuse to be a jerk.  Fight tooth and nail for your way, and if someone proves you wrong, oh well, you weren't that attached to the idea anyway.  It seems to fly in the face of Hume's dictum that "a wise man proportions his belief to the evidence."  Why fight for an idea you don't care that much about?  If you're not sure you're right, why not just weigh the pros and cons of your idea against other options?

I suppose the thing that bothers me the most about it is that "strong opinions loosely held" just glorifies the Toxic Certainty Syndrome, as the article's author calls it, which already permeates the entire tech industry.  Too often, discussions turn into a game of "who's the smartest person in the room?"  Because, naturally, in a discussion between logical, intelligent STEM nerds, the best idea will obviously be the one that comes out on top (or so the self-serving narrative goes).  But in reality, nerds are just like any group, and getting people to do things your way is orthogonal to actually having good ideas.  So these conversations just as often degrade into "who's the biggest loud-mouth jerk in the room?"

I'm not sure how I feel about the article's specific "solution" of prefacing your assertions with a confidence level, but I do empathize with the idea.  My own approach is usually to just follow the "don't be a jerk" rule.  In other words, don't push hard for something you don't really believe in or aren't sure about, don't act more sure about a position than you are, and be honest about how much evidence or conviction you actually have.  It's like I learned in my Philosophy 101 class in college - to get to the truth, you should practice the principles of honesty and charity in argument.  Our industry already has enough toxic behavior as it is.  Don't make it worse by contributing to Toxic Certainty Syndrome.