Amazon MP3

Author's note: Welcome to another edition of "From the Archives", where I post some crappy, half-finished thing that's been sitting in my drafts folder for the last 10 years!

This article is one I started on September 14, 2008, and apparently made some edits to on October 4, 2016 (don't ask me what they were). It's about Amazon Music, or, as it used to be called, Amazon MP3. Nobody cares about MP3 or any other specific format anymore, though. In fact, at this point I think most people just stream music through some app that doesn't even tell them what format it's in.

But back in the 2000's, it was all about MP3s. The only other format going was AAC, and only because that's what iTunes used. People would download these files and copy them to dedicated music playing devices. Yes kids, that was a thing. I had a bunch of those devices. They were fine at the time, but it's important to realize that this was in the days before high-speed data was ubiquitous and phones had tens of gigabytes of usable storage available.

Anyway, Amazon MP3 has since become Amazon Music and now focuses more on streaming than downloading. Fortunately, you can still download MP3s from Amazon Music; you now just have to do it through their desktop app. It's not too bad, but I actually don't like it as much as the download experience of the old AmazonMP3 version of the service. The app isn't really focused on that, and they keep changing the interface.
And yes, I do still care about downloading the music I buy - that's why I have a Plex instance. I like to feel like I have some measure of control over digital products I buy, even if much of it is an illusion these days.

But anyway, that's enough from the present. Back to 2008. Enjoy!

I've really gotten to like Amazon's MP3 download service. I've bought a number of songs and albums through it in the last couple of months, and it's quite nice. In fact, it's what a music download service should be.

The big win for Amazon, of course, is selection. They might not have everything, but they come damn close. Nearly every other download service I've seen over the years had a limited selection. That's great if you're into discovering new artists in particular genres, but they never had the mainstream stuff.

The other main selling point for Amazon is price. You can buy individual songs for $0.99 or $0.89 (just as cheap as iTunes) and entire albums at a discount. No subscriptions or other commitments required.

Beyond those obvious selling points, the service is actually very well designed. For starters, it's web-friendly, which already puts it ahead of iTunes in my book. The searching and browsing works well and they have the usual Amazon suggestions and reviews. There's a nice little Flash app for song previews and Amazon's trademark one-click purchasing. It even works well in Opera for Linux, which is notorious for questionable Flash support.

The one non-web-friendly thing about AmazonMP3 is the download app. Instead of an actual MP3, you download a .amz file, which is handed off to this download app. It queues up the files for download and drops them in appropriately organized folders. Apparently it can also import them into iTunes and WMP. That's about it, though. It's invoked by the browser as the file handler for .amz files and, really, that's the only way you'd ever run it. I mean, other than downloading files, it really doesn't do anything.

[Screenshot: the Amazon MP3 downloader app]

On the up side, the download app is widely supported and fairly innocuous. It's available for Windows, Mac, Debian, Ubuntu, Fedora, and openSUSE, so Linux people aren't left out in the cold. It's a small program, too. The Ubuntu package is a grand total of 772KB uncompressed. Hardly the massive 60MB iTunes package.

I don't get NFTs

I only recently learned about the existence of NFTs as a "thing".  If you are also unfamiliar, NFT stands for "Non-Fungible Token".  CNN has an explanation here, but it's essentially a unique digital token, stored on a blockchain, that represents some digital asset.  Apparently NFTs have recently become the latest hot trend, with some of them selling for hundreds of thousands of dollars.

However, this Slashdot post points out that there are some potential issues with NFTs.  Basically, NFTs don't actually store the asset you're purchasing.  So if you buy an NFT for an image, the image doesn't actually live on the blockchain.  You've only bought a reference to the image, which is actually hosted someplace else.  So if whatever server or service is hosting that image goes away, you will no longer be able to access it, regardless of owning the NFT.

So I guess what I'm confused about is: what's the point?  I mean, what are you really getting when you buy an NFT?  In a theoretical sense, I can see how an NFT is better than "conventional" digital assets in that it's not tied to a particular service.  Your ownership of that item is recorded in your crypto wallet, which is independent of any particular service and can be used with multiple NFT marketplaces.  And that's a good thing.

But when you look at it functionally, there's not really much difference.  The actual asset still exists completely independent of the blockchain, so it's not like a physical asset - there might only be one token, but you can still infinitely duplicate the asset.  And as far as I can tell, buying an NFT doesn't actually mean you're purchasing the copyright for the asset.  So you're just buying a copy and there's nothing to stop anyone from making other copies.  And because the asset isn't stored on the blockchain, if you want to ensure you always have access then you need to download a copy of it.  So...how is this different from buying a service-specific digital asset?

It seems like the point is less the actual asset and more that NFTs are the same sort of thing as Bitcoin - just a different way to speculate on blockchain stuff.  Especially when you're talking about something like spending $2.5 million for Jack Dorsey's first tweet, it's hard to see any other rational explanation.  But even for less absurd cases, it's not clear to me what the practical benefit is.  The main reason that blockchain "works" for cryptocurrency is because the thing on the blockchain is the thing you're transferring.  As soon as you introduce a disconnect between the thing being traded and the thing on the blockchain, it seems like you lose a lot of the benefit of the blockchain.

Refactoring LnBlog

Author's note:  Happy new year!  I thought I'd start the year off with another old post that's been sitting in my drafts folder since March 24, 2013.  This time, though, I'm going to provide more inline commentary.  As usual, the interjections will be italicized and in parentheses.

You see, this post is on LnBlog, and specifically what is (or was) wrong with it.  If you don't know, LnBlog is the software that runs this website - I wrote it as a "teach yourself PHP" project starting back around 2005.  I've been improving it, on and off, ever since.  So in this post, I'm going to show you what I thought of it back in 2013 and then discuss what I think now and what has changed.  Hopefully it will be somewhat enlightening.  Enjoy!


The year is (relatively) new and it's time for some reflection. In this case, reflection on past code - namely LnBlog, the software that runs this site.

I've come a long way from LnBlog, which was my first "teach yourself PHP" project. I've now been doing full-time professional PHP development since 2007 and can reasonably claim to have some expertise in it. And looking back, while the LnBlog codebase is surprisingly not horrifying for someone who had a whopping two months of web development experience going into it, it's still a mess. So it's time to start slowly refactoring it. And who knows? Blogging my thought process might be useful or interesting to others.

(Back to now: I actually did blog some of this stuff, but not until 2017 or so.  And I still agree with that initial assessment.  The code had plenty of problems then and it still does.  If I were starting fresh today, I'd probably do almost everything differently.  But on the other hand, I've seen much worse in much newer code.  And in the last three years or so I've been making slow and steady improvements.)

The Issues

There are a lot of things about LnBlog that need changing. A few of them are functional, but it's mostly maintenance issues. By that I mean that the code is not amenable to change: it's not well organized, it's too hard to understand, and it's too difficult to update. So let's go over a few of the obvious difficulties.

1. The plugin system

I have to face it - the plugin system is an unholy mess. The entire design is poorly thought out. It's built on the premise that a "plugin" will be a single PHP file, which makes things...painful. Any plugin with significant functionality or a decent amount of markup starts to get messy very quickly. The "single file" limitation makes adding styles and JavaScript ugly as well.

On the up side, the event-driven aspect works reasonably well. The code for it is a bit nasty, but it works. The main problem is that there aren't really enough extension points. It needs a bit more granularity, I think. Or perhaps it just needs to be better organized.

(Back to now: I still agree with most of this, except perhaps the thing about extension points.  So far, the only place where that's been a real problem is when it comes to inserting markup mid-page.  But yeah, the whole "a plugin is one file" thing was ill-conceived.  The good news is that it's totally fixable - I just need to figure out some design conventions around splitting things out, which hasn't been a priority so far.)
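
To make the "event-driven" part concrete for anyone who hasn't seen the pattern, here's a generic sketch of the idea - this is not LnBlog's actual plugin API, just an illustration of plugins registering callbacks against named extension points:

    <?php
    // Generic event/extension-point sketch (illustrative only, not LnBlog's API).
    class EventBus
    {
        private $handlers = [];

        // A plugin registers a callback for a named extension point.
        public function on($event, callable $handler)
        {
            $this->handlers[$event][] = $handler;
        }

        // The core fires the event wherever that extension point lives.
        public function fire($event, $payload = null)
        {
            foreach ($this->handlers[$event] ?? [] as $handler) {
                $handler($payload);
            }
        }
    }

    // A "plugin" is then just code that subscribes to the events it cares about.
    $events = new EventBus();
    $events->on('entry.published', function ($entry) {
        error_log('New entry published: ' . $entry['title']);
    });
    $events->fire('entry.published', ['title' => 'Hello world']);

The "not enough extension points" complaint is then just a question of how many places in the core actually fire an event.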

2. The templating system

This one is also an unholy mess. The idea isn't bad - allow any file in a theme to be overridden. However, I tried to abstract the template files too much. The files are too big and contain too much logic. Also, the simple template library I'm using is more a hindrance than a help. I'd be better off just ditching it.

I've also been thinking of getting rid of the translation support. Let's face it - I'm the only person using this software. And I'm only fluent in one language. Granted, the translation markers don't cause any harm, but they don't really do anything for me either, and accounting for them in JS is a bit of a pain.

(Back to now: The only thing I still agree with here is that the existing templates are a mess.  But that has nothing to do with the template system - I just did a bad job of implementing the template logic.  I'm working on fixing that - for instance, I added some Jinja-like block functionality to the template library.  I had considered re-writing the templates in Twig or something, but it quickly became obvious that that would be a huge amount of work, that it would be difficult to do in a piece-wise fashion, and it's not clear that the payoff would be worth it.  Likewise with the translation markers - taking them out would be a bunch of work for almost zero payoff and the JS thing isn't really that big a deal.  Besides, if I ever changed my mind again it's WAY more work to put them back in.)

3. The UI sucks

Yeah, my client-side skills have come a long way since I built LnBlog. The UI is very Web 1.0. The JavaScript is poorly written, the style sheets are a mess, the markup is badly done, and it's generally "serviceable" at best.

As I realized the other day, the style sheets and markup are probably the worst part. Trying to update them is difficult at best, which is exactly the opposite of what you want in a theme system. In retrospect, my idea to have themes replace files wholesale rather than override them seems misguided - the files are too fragmented. When it comes to the style sheets and JavaScript, this also hurts performance, because there are a lot of files and everything is loaded in the page head.

(Back to now: This is pretty much still accurate.  I've been slowly improving the UI, but it's still not looking particularly "modern".  That's not such a big deal, but the templates and CSS are still a pain-point.  Really, what I need to do is rework the theme system so that I can easily make lighter-weight themes, i.e. I should be able to just create one override CSS file and call it good.  I have the framework for that in place, but I have yet to actually go through the existing themes and make that work.)
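
For anyone wondering what "replace files wholesale" versus a lighter-weight override looks like in practice, here's a rough sketch of the fallback idea. The directory layout and function name are made up for illustration; this is not LnBlog's actual implementation:

    <?php
    // Resolve a theme resource, falling back to the default theme.
    // (Illustrative only - paths and names are not LnBlog's real ones.)
    function find_theme_file($file, $theme, $themes_root = 'themes')
    {
        $candidates = [
            "$themes_root/$theme/$file",    // theme-specific override, if it exists
            "$themes_root/default/$file",   // otherwise fall back to the default theme
        ];
        foreach ($candidates as $path) {
            if (file_exists($path)) {
                return $path;
            }
        }
        throw new RuntimeException("No theme provides $file");
    }

With that kind of fallback, a lightweight theme only ships the files it actually overrides - say, a single theme.css - and inherits everything else from the default.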

4. Too much compatibility

When I first started writing LnBlog, I had a really crappy shared web hosting account. And by "really crappy", I mean it offered no database server and had safe-mode and the various other half-baked PHP "security measures" enabled by default. So I actually built LnBlog to be maximally compatible with such an environment.

These days, you can get decent hosting pretty cheap. So unless you can't afford to pay anything, there's no need to settle for such crappy hosting. And again, let's be honest here - I don't even know anyone other than me who's using this software. So supporting such crappy, hypothetical configurations is a waste of my time.

In addition, I put an absolutely ridiculous number of configuration settings into LnBlog. The main config file is extensively documented and comes to over 700 lines. That's completely nuts and a pain to deal with. Many of the settings are pointless - hardly anyone would ever want to override them - and most of the rest could be handled through a GUI rather than by editing a file.

(Back to now: This is also still true.  I've been looking at redoing the config system, but that's another one of those things that is a big change because it has tendrils all through the code.  I have been moving some stuff out of the main blogconfig.php file, and I've been avoiding adding to it, but there's still a lot there.  For the most part, it's not a huge issue, since most of the things you would want to configure are through the UI, but still....)

5. No real controller structure

I knew nothing of MVC or design patterns when I first wrote LnBlog. As a result, the "glue" code is in the form of old-style procedural pages. They're messy, poorly organized, and hard to maintain. A more modern approach would make things much easier to deal with.

(Back to now: The old "pages" are dead in all but name.  A handful of them still exist, but they're three-liners that just delegate to a controller class.  The bad news is that it's pretty much just two monolithic controller classes with all the old logic dumped into them.  So that sucks.  But they have dependency injection and some unit test coverage, so this is still an improvement.  And I've at least got a little routing groundwork laid so that I could start breaking off pieces of functionality into other classes in the future.)

The Problem

While I'd like to fix all this stuff in one shot, there are three big problems here:

  1. That's a lot of stuff, both in terms of the number of tasks and the amount of code involved.
  2. I no longer have the kind of free time I did when I first wrote this.
  3. I'm actually using this software.

Of course, the first two are really two sides of the same coin.  LnBlog isn't huge, but it isn't tiny either - the codebase is upwards of 20,000 lines.  That wouldn't be a big deal if I were working on it as my full-time job, but this is a side-project and I can devote maybe a couple of hours a day to it, sometimes.  So major surgery is pretty much out.  And the third factor means that I need to be careful about breaking changes - not only do I not want to break my own website, but I also want to avoid having to do a lot of migration work, because writing migration scripts is not my idea of a fun way to spend my free time.

(Back to now: This is always a problem with open-source and side projects.  Nothing has changed here except, perhaps, my development process.  After that year I spent learning about the Personal Software Process, I started using some of those methods for my personal projects.  The main change was that, when making any kind of big change or feature addition, I actually follow a semi-formal process with requirements, design, and review phases.  It sounds kind of silly for a personal project, but it's actually extremely useful.  The main benefit is just in having my thoughts documented.  Since I might go a week or more between coding sessions on any particular feature, it's insanely helpful to have documentation to refer back to.  That way I don't have to remember or waste time figuring things out again.  And by having design- and code-review phases as part of my development process, I have a built-in reminder to go back and check that I actually implemented all those things I documented.  Having the whole thing written out just makes it much easier when you have long gaps in between work sessions.)


General commentary from the present: So as you can see from the above comments, I've fixed or am fixing a lot of the things that bothered me about LnBlog eight years ago.  In the last two or three years I've put a lot of work into this project again.  Part of it is because I actually use it and want it to be better, but part of it is also "sharpening the saw".  I've been using LnBlog as an exercise in building my development skills.  It's not just coding new features, like the flurry of development in the first two years or so that I worked on LnBlog; it's cleaning up my past messes, adding quality assurance (in the form of tests and static analysis), updating the documentation, and figuring out how to balance responsible project management with limited resources.  It's an exercise in managing legacy code.

To me, this is a useful and important thing to practice.  As a professional developer, you will have to deal with legacy code.  In my day job, I've had to deal with code that was written by our CEO 10+ years ago when he started the company.  Software is a weird combination of things that live a week and things that live forever, and there's seldom any good way to tell which group the code will be in when you're writing it.  So while it's important to know how to write code correctly the first time, it's also important to know how to deal with the reality of the code you have.  And no, "let's rewrite it" is not dealing with reality.  And when you have a code-base that's 15 years old, that you're actively using, and that you originally wrote, it's a great opportunity to experiment and build your skills in terms of modernizing legacy code.

And that's just what I'm doing.  Slowly but surely, LnBlog is getting better.  I've implemented a bunch of new features, and in the process I've worked on my design and analysis skills, both at a product level and at a technical level.  I've fixed a bunch of bugs, which makes my life easier.  I've implemented additional tests and static analysis, which also makes my life easier by finding bugs faster and giving me more confidence in my code.  I've improved the design of the system, which again makes my life easier because I can now do more with less effort.  Sure, there's still plenty to do, but I've made lots of progress, and things are only getting better.

Blogging APIs

Author's note: It's officially the holiday season and I'm feeling lazy. So guess what - it's another episode of "From the Archives"! That's right, it's time for more of the series where I trot out something that's been sitting in my drafts folder for ten years because I can't muster the energy to write something new.

This article is from way back on March 18, 2007. It actually appears to be finished, so I'm not sure why I never published it. Perhaps I was filled with crippling self-doubt that my analysis was actually stupid and people would make fun of me, so I never hit the publish button. That was actually a thing I did for many years. These days, I'm more casual about it. One of the benefits of getting older is that you're more experienced and able to take a more objective view of the quality of your work. Another is that you have a wider perspective on what really matters in life. Another way to put that is "you don't care as much what other people think."

Anyway, this article is about my attempts at implementing the various blogging APIs in LnBlog. This included the Blogger, MetaWeblog, and MovableType APIs. I guess that seemed like a good idea at the time. In retrospect, it was a nice educational exercise, but not especially useful. I mean, the idea of a generic API that third-party clients can use to post to your blog is great. But in practice, it doesn't seem like that's something many people actually need. I certainly never found a good use-case for it. Maybe that's why the APIs never really got fleshed out.

But that's enough out of present-day me. Let's hear from 2007 me. Enjoy!

Last month, I finally got around to doing some actual testing on the MetaWeblog and Blogger API implementations for LnBlog. By that, I mean that rather than testing it with my own code, I actually installed a few free blogging clients. I learned a few interesting lessons from this.

The Clients

I tested four blogging clients. The first is Deepest Sender 0.7.9, a Firefox extension that supports LiveJournal, Blogger, WordPress, MSN Spaces, and generic MetaWeblog blogs. The second is KBlogger 0.6.2, a simple KDE panel applet that supports the MetaWeblog and Blogger APIs. Third is QTM 0.4.0, a C++/Qt4 application that supports the Blogger, MetaWeblog, and MovableType APIs. Last is BloGTK 1.1, a Python/GTK+ application that also supports Blogger, MW, and MT.

My results were mixed. KBlogger worked basically as advertised. In fact, it's the only one of the clients that seemed to understand the APIs in the same way that I do. The only problem is that it's a bit short on features.

BloGTK seemed to work pretty well. However, it worked best when set to use the MovableType API. When using the MetaWeblog API, I had problems editing posts. It also has a few weird little bugs, such as things getting messed up when switching accounts.

While it has a nice interface, QTM simply would not work with LnBlog. For the record, this is not my fault but rather due to the fact that this version of QTM did not correctly implement the APIs. When sending calls to the server, it sent blog IDs, post IDs, category IDs, etc. as integers, whereas the specification calls for them to be strings. While the difference may be academic for some servers, LnBlog really does use strings as IDs, so the requests raise errors in the XML-RPC library. (Note: this seems to have been corrected in CVS.)

And as for Deepest Sender, I just can't get it to work as advertised. I don't know why. It can post entries, but editing them results in a hung edit window, the "active blog" box is shrunk down to an unrecognizable control, and I have yet to even see a category selection box.

Server Problems

The first problem I encountered with LnBlog's implementation of the Blogger 1.0 and MetaWeblog APIs was my silly assumption that, just because they are two separate API specifications, I could implement them separately. And so, that's exactly what I did: I wrote one PHP script to implement the Blogger 1.0 API and a different one to implement the MetaWeblog API.

Oh, what a fool I was!

While that attitude made perfect sense when looking just at the specs, it just doesn't work that way in practice. Of the four clients I tested, KBlogger was the only one that worked when the MetaWeblog server didn't implement the Blogger 1.0 API at the same URI. The others all blithely assumed that the same URI would implement the Blogger, MetaWeblog, and MovableType APIs. I guess few people ever stopped to consider that a server might have independent implementations of the different APIs. Or perhaps it's just that KBlogger is designed to work with servers that only understand Blogger 1.0, while the others assume MetaWeblog support. It's hard to tell.

Placing the blame

After going back to look at the specs, I believe much of the blame for this situation rests with the MetaWeblog API specification itself. The problem is that it's just a bad specification. In fact, it's really more a sketch of a specification than an actual spec. It's too vague, too confusing, and leaves too much open to interpretation.

For instance, take the metaWeblog.getCategories method. According to the specification, this method returns a struct, with each member being a struct with the description, HTML URL, and RSS URL for each category.

For non-programmers, "struct" is short for "structure," and is simply a set of key/value pairs. In this case, having member structs with keys for the description and URLs makes perfect sense.

However, putting all of these in a struct doesn't make sense. The spec says that the category structs are to be returned in a struct, but says nothing about the key names of this container struct. But the entire point of having a struct is to associate key names with values. A struct with no particular key names is meaningless. It's like writing a book where the index lists words, but not page numbers - if you can't pick the word (key) you want and go straight to the corresponding page (value), then the entire exercise is pointless.
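
To illustrate, here's a sketch of a plausible response, written as the PHP array an XML-RPC library would serialize. This is my own illustration, not output from any particular server; the member names are my rendering of the description/HTML URL/RSS URL fields the spec calls for, while the outer keys are pure guesswork, because the spec never says what they should be:

    <?php
    // A plausible metaWeblog.getCategories response.  The outer keys
    // ("General", "Tech") are just one guess at what a server might use.
    $response = [
        'General' => [
            'description' => 'General',
            'htmlUrl'     => 'http://example.com/blog/category/general',
            'rssUrl'      => 'http://example.com/blog/category/general/feed.xml',
        ],
        'Tech' => [
            'description' => 'Tech',
            'htmlUrl'     => 'http://example.com/blog/category/tech',
            'rssUrl'      => 'http://example.com/blog/category/tech/feed.xml',
        ],
    ];
    // A plain array (no keys at all) would arguably make more sense here,
    // which is exactly the ambiguity the spec leaves open.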

Another shortcoming of the API is that it does not clearly specify a way to identify blog posts. For example, the API includes the metaWeblog.editPost and metaWeblog.getPost methods, both of which take a post ID. It also includes a metaWeblog.getRecentPosts method to get an array of the most recent posts for a blog. You would think that you could call getRecentPosts, let the user pick a post, edit it, and then call editPost to commit the changes to the server. But you can't.

That's because the API does not specify how to get the post ID. The metaWeblog.getPost and metaWeblog.getRecentPosts methods return a struct and an array of structs respectively, and the spec states that the members of these post structs are the members of RSS items. But there is no mention of where the post ID comes in. RSS certainly has no concept of a post ID, so it's not clear which member should be used for that purpose. Presumably, this is why the MovableType extensions to MetaWeblog include a postId field in the post structs.

Of course, RSS does provide a GUID (Globally Unique Identifier) field, which seems a natural fit for the post ID. The problem is that the RSS spec does not require that this field be present. It could also be argued that the GUID has a meaning distinct from a blog post ID. But either way, if the MetaWeblog spec meant that the GUID should be the post ID, then it should have said so.

Judging from the MetaWeblog spec, the only place clients can count on getting a post ID is from the return value of metaWeblog.newPost. That's fine if the client can assume it is making all the posts to a blog, but it is insufficient if there is also, say, a web interface. If your blogging client can only edit posts it created, you've just cut its usefulness in half.
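
Here's a sketch of what a single item from metaWeblog.getRecentPosts might look like (again, my own illustration, not output from a real server). The members are RSS item fields, as the spec says, and nothing in them is explicitly designated as the post ID; the last member is the MovableType-style extension mentioned above, without which a client has to guess whether to send back the guid, the link, or something else:

    <?php
    // One item from a hypothetical metaWeblog.getRecentPosts response.
    $post = [
        'title'       => 'My latest entry',
        'link'        => 'http://example.com/blog/2007/03/my-latest-entry',
        'description' => '<p>The body of the post...</p>',
        'pubDate'     => 'Sun, 18 Mar 2007 12:00:00 GMT',
        'guid'        => 'http://example.com/blog/2007/03/my-latest-entry',
        // Not part of the base spec - the MovableType-style post ID extension.
        'postid'      => '2007/03/my-latest-entry',
    ];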

Missing Links

The MetaWeblog API depends heavily on the Blogger 1.0 API. By itself, it is missing too much to be truly useful for the development of rich blogging clients. If nothing else, this is clear from the absence of something resembling blogger.getUsersBlogs.

Actually, that's not entirely fair. There was an RFC to add the Blogger methods to MetaWeblog, so the spec has been amended to correct this shortcoming. Or has it? I actually only learned about this by reading the code for the WordPress MetaWeblog implementation. The "official" MetaWeblog spec doesn't actually mention this or contain a link to the new RFC. That seems rather odd considering that the spec does contain notes regarding other updates. So has the spec been amended, superseded, or was this just a "Request for Comments" that was never actually adopted?

Bottom Line for Implementers

So what does all this mean for those enterprising individuals who want to try their hand at writing blogging software? It means you've got an uphill battle.

Don't get me wrong - it's not that implementing the various specifications is difficult. The APIs are actually pretty simple. The problem is that you can't trust them.

For starters, if you want to write a server that is compatible with existing rich blogging clients, you will have to implement the Blogger 1.0, MetaWeblog, and MovableType APIs, and you will have to do it all at the same URI. This isn't so much a problem as an inconvenience: you can't simply work from a single specification, but have to jump back and forth between three of them just to get a workable MetaWeblog implementation.

If you're writing a client, things are just as annoying. As previously mentioned, there's the post ID problem to deal with. Handling that is not difficult, but you have to rely on the good will of the server to send you a sensible post struct, since it is not required to.

If you want to support MovableType, there's also their brain-damaged category handling to deal with. Rather than using MetaWeblog categories, MT has separate mt.getPostCategories and mt.setPostCategories methods, which deal with category IDs rather than textual categories. Again, this is not hard to deal with, but it means you have to implement category handling twice - once for MT, and once for servers that use MW categories. But on the up side, at least MT gives you an explicit postId field.

Conclusion

All in all, the old blogging APIs suck. They're imprecise, lacking in features, and tend not to be so platform-agnostic. I think they can be best described as "just barely adequate."

I have yet to look at the Atom API. I'm hoping it will turn out to be better, but I'm not going to hold out a lot of hope. At the very least, I suppose it can't be any worse than the old APIs.

Disappearing knowledge

I saw an interesting article on Slashdot recently about the vanishing of online scientific journals.  The short version is that some people looked at online open-access academic journals and found that, over the last decade or so, a whole bunch of them have essentially disappeared.  Presumably the organizations running them either went out of business or just decided to discontinue them.  And nobody backed them up.

In case it's not already obvious, this is a bad thing.  Academic journals are supposed to be where we publish new advances in human knowledge and understanding.  Of course, not every journal article is a leap forward for humankind.  In fact, the majority of them are either tedious crap that nobody cares about, of questionable research quality, or otherwise not really that great.  And since we're talking about open-access journals, rather than top-tier ones like Nature, lower-quality work is probably over-represented in them.  So in reality, this is probably not a tragedy for the accumulated wisdom of mankind.  But still, there might have been some good stuff in there that was lost, so it's not good.

To me, this underscores just how transient our digital world is.  We talk about how nothing is ever really deleted from the internet, but that's not even remotely true.  Sure, things that go viral and are copied everywhere will live for a very long time, but an awful lot of content is really just published in one place.  If you're lucky, it might get backed up by the Internet Archive or Google's cache, but for the most part, if that publisher goes away, the content is just gone.

For some content, this is a real tragedy.  Fundamentally, content on the Internet isn't that different from offline content.  Whether it's published on a blog or in a printed magazine, a good article is still a good article.  A touching personal story is no more touching for being recorded on vinyl as opposed to existing as an MP3 file.  I know there's a lot of garbage on the web, but there's also a lot of stuff that has genuine value and meaning to people, and a lot of it is not the super-popular things that get copied everywhere.  It seems a shame for it to just vanish without a trace after a few short years.

I sometimes wonder what anthropologists 5000 years from now will find of our civilization.  We already know that good quality paper can last for centuries.  How long will our digital records last?  And if the media lasts 5000 years, what about the data it contains?  Will anthropologists actually be able to access it?  Or are they going to have to reverse-engineer our current filesystem, document, and media formats?  Maybe in 5000 years figuring out the MPEG-4 format from a binary blob on an optical disk will be child's play to the average social science major, who knows?  Or maybe the only thing they'll end up with is the archival-quality print media from our libraries.  But then again, given what the social media landscape looks like, maybe that's just as well....

Adding bookmarklets to mobile Chrome

Author's note: I started the draft of this article way back on July 1, 2013. Sadly, it's still pretty relevant.

So I built myself a nice little web-based bookmarking app. I wanted something that would both give me some insight into how I use my bookmarks and also save me from worrying about syncing bookmarks between multiple browsers on multiple devices. And since I've regained my distrust of "the Cloud" with the demise of Google Reader, I decided to write my own. (Note from the future: Yes, I know that was seven years ago, but I still don't really trust cloud services.) If you're interested, check out the GitHub page. Maybe one day I'll make a real, official release of it. I call it Lnto, with "ln" being the UNIX "link" command and a tie-in to LnBlog, the software that runs this site. (For LnBlog, the tie-in was to "ln" for the natural logarithm, i.e. "natural bLOG". Get it? I swear I thought it was funny at the time.)

One must-have feature for such an app is a "bookmark this page" feature. With native browser bookmarks, this is built in. With a web app...not so much. So the solution is to either write an extension for every browser I want to support (which is a lot of work), or just write a bookmarklet - a little piece of JavaScript that you can bookmark and run with a single click. Since this is a personal project that I'm doing in my limited free time, the latter seemed like the obvious choice.
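
For reference, a bookmarklet for this kind of thing is just a javascript: URL that sends the current page's address and title off to the app. Something like the snippet below, where the /add endpoint and parameter names are made up for illustration (they'd be whatever the app actually exposes), and the whole thing gets collapsed to one line when you save it as a bookmark:

    javascript:(function () {
        /* Hypothetical endpoint and parameters - adjust for the real app. */
        location.href = 'https://mysite.example/lnto/add'
            + '?url=' + encodeURIComponent(location.href)
            + '&title=' + encodeURIComponent(document.title);
    })();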

There's just one problem here - mobile browsers. In addition to my laptop and my desktop, I have an Android phone and a Kindle Fire that I want to support. And while the actual bookmarklet code works just fine on all of those devices, actually bookmarking it isn't quite so easy. Because they're mobile browsers, you can't just drag the link to the toolbar as you would on the desktop.

Until recently, Firefox Mobile handled this well. (Author's note: We're back to the current time now, not 2013.) It would allow you to bookmark a bookmarklet like a normal bookmark. You just had to press the link and select a "bookmark this link" item from the menu. Then you could just bring up the bookmark screen when you were on a page and it would run the bookmarklet. However, with the updates for Firefox Mobile 81, that doesn't work anymore - the javascript: URL scheme doesn't seem to get executed when you invoke the bookmark. And other browsers don't seem to support bookmarking the bookmarklet in the first place. This link suggests that it's possible using bookmark syncing, but I'm not sure if that still works and I don't really want to turn on syncing anyway.

What I eventually did was just create a page that I can paste a URL into and it will do the same thing as the bookmarklet. It's not great, but it's serviceable. At some point, maybe I'll get around to creating an Android app. Then I'll have some native integration options to work with.

What Is RSS and Why Should I Care?

Author's Note: This entry from my archives was written on March 18, 2007 and has been sitting in my drafts folder ever since.  Not sure why I didn't publish it at the time.  I think I was going to add more, but never got around to it.  At any rate, this was back when RSS feeds were a trendy, new-ish thing, and this article was supposed to be a less technical discussion of what they are and why they're good.

These days, of course, RSS is passé, and when people refer to a "feed", it's usually coming from Facebook, Twitter, Instagram, or whatever vendor-locked service the kids are using this week.  I find this sad.  The idea of the open web was so promising, but not that much really came of it.  Instead of being spoon-fed our information and entertainment by big media companies via broadcast and print media, we're now spoon-fed our information and entertainment via the internet by big tech companies.  And this time, the content is not selected by gate-keeping editors, but by AI algorithms that are tuned to feed us whatever will keep us clicking, with little to no regard for whether it's true, useful, or even remotely good for us.

For the record, I still use RSS feeds all the time.  I use the Tiny Tiny RSS aggregator, which is quite nice, to read the various blogs and news that I'm interested in following.  I have accounts with a few of the big social media platforms, but I rarely ever read them and never post anything.  I find them to be a huge time-sink and not especially conducive to good mental health, and so better off avoided.  Of course, your mileage may vary, but just keep in mind that you don't need to look at these sites - if anything truly important happens, someone will tell you about it.  I mean, unless you're a shut-in with no friends or family.  In that case, maybe social media is a good thing for you.  

At any rate, these were my thoughts in 2007.  Perhaps they'll be interesting or enlightening.  Or perhaps entertaining in their naivete.  Enjoy!


If you frequent tech sites or weblogs, you've probably seen the RSS feed icon or the XML feed icon.  You may also have seen other icons or text links referring to XML feeds, RSS, or even podcasts.  In fact, if you're using a web browser other than Internet Explorer, you may have seen one of these icons pop up in the address bar or status bar.  In this article, I will try to explain, in layman's terms, what RSS is and why it is so useful and important to the future of the internet.

What is RSS?

RSS stands for Really Simple Syndication.  As this name suggests, it is a syndication format.  

By "syndication," we mean essentially the same thing as when we talk about syndicated television shows.  A syndicated TV show is one that is shown on multiple independent channels at the same time, as opposed to being exclusive to a single network.  So for example, a syndicated show in the United States might be broadcast by NBC, Fox, the Sci-Fi channel, and USA all at the same time.

RSS works the same way.  An RSS file, usually referred to as a feed, contains a list of recent updates to a site.  The site operators publish this file on the web site and allow other people to subscribe to it, which is really just a fancy way of saying they automatically download it on a regular basis.  These people can then "republish" the information, either by incorporating it into their own sites or simply reading it into a desktop application.  The idea is that if the site's operators update the RSS feed every time they update the site, anyone who subscribes to it will automatically get the next "episode" the next time he downloads the file.

But what would I do with it?

If you are not already familiar with RSS, you may be wondering why anyone would bother with this.  After all, if you just want to read a site's updates, isn't it just as easy to go to its home page like you've been doing for years?

You wouldn't be wrong to think that.  If you're talking about just one site, with one set of updates to track, then RSS doesn't make any sense.  It would just be a different way of doing the same thing.

The beauty of RSS is that, unlike a web page, you can easily write a program to break up the information and organize it in a useful way.  For example, you can have a script on your web site that takes RSS news feeds from CNN, the BBC, and others and puts them all together in a news ticker on your home page.  You can also have programs such as RSS aggregators, which pull together news items from multiple sites and display them together so that you can browse them quickly and easily.  

I will discuss some other uses of RSS, including the trendiest of them all, Podcasting, later in this article.  (Note from the future: I never actually did that.)  But before that, we need to cover why RSS is useful and separate the fact from the hype.

A brief technical digression

What makes RSS so useful and so widely applicable is that it is a standard format.  It is an application of XML, the eXtensible Markup Language, which is an industry standard markup language for use with structured information.  I won't bore you with a description of XML, but the upshot of this is that RSS files all contain a certain set of standard information which is always marked with the same standard tags.  This means that a program can easily go through the file and pick out particular pieces of information, like the title of a particular news item, without having to pay any attention to what the title actually says or how it is formatted for display.  And because RSS is based on XML, there is already a wide array of programming tools that can be used to create and manipulate the files.
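
To make that concrete, here's a tiny example.  The feed below is made up, but the element names are fixed by the RSS format, which is what lets a few lines of code pull the titles and links out of anybody's feed:

    <?php
    // A minimal, made-up RSS 2.0 feed, followed by the handful of lines
    // it takes to pull the items back out of it.
    $rss = <<<XML
    <?xml version="1.0"?>
    <rss version="2.0">
      <channel>
        <title>Example News</title>
        <link>http://example.com/</link>
        <description>Recent updates to example.com</description>
        <item>
          <title>First story</title>
          <link>http://example.com/stories/1</link>
          <pubDate>Sun, 18 Mar 2007 12:00:00 GMT</pubDate>
        </item>
        <item>
          <title>Second story</title>
          <link>http://example.com/stories/2</link>
          <pubDate>Sun, 18 Mar 2007 15:30:00 GMT</pubDate>
        </item>
      </channel>
    </rss>
    XML;

    $feed = simplexml_load_string($rss);
    foreach ($feed->channel->item as $item) {
        // The same loop works for any RSS feed, no matter who publishes it.
        echo $item->title, ' - ', $item->link, "\n";
    }

An aggregator is essentially that same loop run over many feed URLs, with the results merged and sorted by date.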

This is in stark contrast to web pages.  Although HTML, the markup language used to build web pages, has a standard set of tags, there is no standard for how a page is structured.  So while there are fixed ways of defining lists, tables, and paragraphs in HTML, there is no agreed upon way to say, "This is a news item, this is its title, and this is the link to its page in the archives."  (Note from the future: With the advent of HTML 5 this is no longer technically true.  However, semantic markup is not universally or consistently applied, so it's still close enough.)  So while a human being can easily look at a page full of items and determine where one ends and the next begins, there is no simple and general way for a computer to do that.  Because everyone is free to pick how they want to denote those things, a program would have to analyze the content of the page and figure out what the author of each page was thinking.  Needless to say, this kind of mind-reading is not something computers are particularly good at.

Getting past the hype

You know what the computing industry is best at producing?  It's not software, hardware, or anything else you can buy in the local office supply store.  It's hype.  And to really appreciate how good RSS is, you have to get past the hype generated by breathless pundits who seem to think it will cure cancer, feed the starving, and bring peace to the world.  (Note from the future: Nobody gives two hoots about RSS or XML anymore.  Hype has a very limited life span.)

From a technical standpoint, there is absolutely nothing revolutionary about RSS.  It's just a standard way of formatting a text file.  You could even create an RSS file in Windows Notepad if you really wanted to.  

And when you think about it, using RSS feeds is just a fancy name for putting a text file on your web site and then letting people download it and mess around with the information it contains.  How is that revolutionary?  We could do that 20 years ago.  

However, it is important to remember that "revolutionary" and "innovative" are not the same as "good."  RSS is good because it provides a standard way of transmitting certain kinds of information in a way that puts all the power in the hands of the consumer, not the publisher.  It's not the technology itself that's revolutionary, but rather the way people have chosen to apply it.

The real reason RSS and the whole "Web 2.0" thing are important is not because of the technology, but because of the paradigm.  That paradigm is: the user is king.  In an era where Hollywood and the music industry are trying to tell you when and where you're allowed to use the music and movies you paid for, open standards put that power in your hands.  (Note from the future: People have forgotten about that aspect, so now "Web 2.0" just means "the site uses AJAX."  Also, substitute "big tech companies" for "Hollywood.")

Old people and legacy support

Lauren Weinstein had an interesting post earlier this year discussing software developers' attitudes toward the elderly.  His main point is that developers tend not to think at all about the issues that older people have when working with computers.  These include things like reluctance to or difficulty with learning new programs or ways of working; old hardware which they can't afford to upgrade; isolation and lack of access to help; and physical limitations, such as poor eyesight or reduced manual dexterity.

Of course, this is obviously not true of all developers (like Lauren, for example), but if we apply it to the general zeitgeist of the community, at least as you see it online, then there does seem to be something to this.  As a group, developers are very focused on "the coming thing", as Brisco County Jr. would say.  We all want to be ahead of the curve, working with the cool new technology that's going to take over the world.  We want to be on greenfield projects that are setting the standard for how to do things.  That's why otherwise intelligent programmers do or suggest crazy things like rewriting their conventional LAMP-based site in Go and ReactJS.  Of course, it's long been established that rewriting from scratch is almost always stupid and wasteful, but the fact is that while PHP might pay the bills, it isn't cool.

Of course, it isn't just because they want to be cool that developers like newer technologies.  There are plenty of other reasons.  Intellectual curiosity, for one.  Many of us got into this line of work because we enjoy learning new things, and there are always interesting new technologies coming out to learn.  Learning old things can be interesting as well, but there are a few problems with that:

  1. Older technologies are less marketable.  Learning new tech takes a lot of time and effort, and if the tech is already on the way out, the odds of seeing a return on that investment of time, whether financial or just in terms of re-using that knowledge, are significantly lower.
  2. Older tech involves more grunt work.  In other words, older programming technologies tend to work at a lower level.  Not always, but the trend is to increasing levels of abstraction.  That means that it will likely take more effort and/or code to do the same thing that you might get more or less for free with newer tech.
  3. The problems are less fun.  This particularly applies to things like "supporting Internet Explorer", which Lauren mentions.  When you have to support both the old stuff and the new stuff, you generally have lots of problems with platform-specific quirks, things that are supposed to be compatible but really aren't, and just generally trying to work around limitations of the older tech.  These are the kind of problems that can be difficult, but not in a good way.  They're less like "build a better mousetrap" and more like "find a needle in this haystack".

So, in general, developers aren't usually super enthusiastic about working with or supporting old tech.  It's not really as bad as some people make it sound, but it's not really where most of us want to be.

Another factor is the way websites are developed.  The ideal is that you'd have somebody who is trained and experienced in designing user experiences and who is capable of considering all the use-cases and evaluating the site based on them.  That person could communicate that information to the designers and developers, who could incorporate it into their work and produce sites that are easy to use, compatible with assistive technologies, degrade gracefully when using less capable hardware or software, etc.  The reality is that this rarely happens.  In my experience:

  1. Many teams (at least the ones I have experience with) have no UX designer.  If you're lucky, you'll have a graphic designer who has some knowledge or awareness of UX concerns.  More likely, it will be left up to the developers, who are typically not experts.  And if you're very unlucky, you'll have to work with a graphic designer who is fixated on pixel-perfect fidelity to the design and is completely indifferent to the user experience.
  2. Most developers are on the young side.  (There are plenty of older developers out there, but the field has been growing for years and the new people coming in are almost all young.)  They're also able-bodied, so they don't really have any conception of the physical challenges that older people can have.  And it's hard to design for a limitation that you didn't think of and don't really understand.
  3. While it's easy in principle, progressive enhancement and graceful degradation can be very tricky to actually pull off.  The main reason is that it's extremely easy to accidentally introduce a change that doesn't play well with some browser, doesn't display properly at some resolution, or what have you.
  4. And let's not forget testing.  Even if you can build a site with progressive enhancement, proper accessibility, and attention to the needs of less technical users with little support available, you still need to test it.  And the more considerations, use-cases, and supported configurations you have, the more your testing space expands.  That makes it much harder and more time-consuming to make sure that all these things are actually present and working as intended for all users.

So what am I trying to say here?  I do agree with Lauren that supporting elderly users, disabled users, etc. is an important thing.  It's a thing that, as an industry, we should do.  But it's hard.  And expensive (at least compared to the way most shops work now).  That's not an excuse for not doing it - more like an explanation.

Every shop needs to find a balance between supporting a diversity of users and doing what they need to do within a reasonable budget of time and money.  While it's important to think about security and support new standards, I think that in recent years the industry has probably been a little too quick to abandon old, but still widely used technologies.  If nothing else, we should at least think more about our target user base and whether we're actually serving them or ourselves by introducing $COOL_JS_FRAMEWORK or dropping support for Internet Explorer.  I'm sure that in many (but not all) cases, dropping the old stuff is the right choice, but that shouldn't be the default assumption.

Holy crap, Let's Encrypt is super easy!

Well, I just set up Let's Encrypt on my home server for the first time.  When I was finished, my first thought was, "Damn, that was awesome!  Why didn't I set that up a long time ago?"

If you're not familiar with Let's Encrypt, it's a non-profit project of the Internet Security Research Group that provides website operators with free SSL certificates.  The idea is to make it easy for everyone to have SSL properly enabled for their website, as opposed to the old days when you had to either buy an SSL certificate or use a self-signed one that browsers would complain about.

I didn't really know much about Let's Encrypt until recently, other than the fact that they provide free SSL certs which are actually trusted by browsers.  And really, that was all I needed to know to be interested.  So I decided to try it out on my home server.  I was already using their certs on this website, but that was a slightly different situation: my web host integrated Let's Encrypt into their control panel, so all I had to do to set up a cert for one of my subdomains was click a button.  Super convenient, but not really any learning process there.

It turns out that setting up my home server to use the Let's Encrypt certs was pretty painless.  The recommended method is to use certbot, which is a tool developed by the EFF.  It basically automates the entire process of setting up the certificate.  Seriously - the entire process.  It's actually way easier to set up a Let's Encrypt cert with certbot than it is to make your own self-signed cert.  You just need to run a command, answer a couple of questions, and it will get the certs for each of your sites, install them, and keep them updated.  The only catch is that you need root shell access and your web server has to be accessible via port 80 (for verification purposes).
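
For example, on an Ubuntu box running Apache, the whole thing boils down to a couple of commands (adjust the package names and plugin flag for other distros and web servers):

    # Install certbot and its Apache plugin (Ubuntu package names).
    sudo apt install certbot python3-certbot-apache

    # Request and install certificates; certbot asks a few questions
    # (email address, which domains, whether to redirect HTTP to HTTPS).
    sudo certbot --apache

    # Renewal happens automatically, but you can sanity-check it:
    sudo certbot renew --dry-run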

Compared to the old self-signed cert I was using, this is way easier.  You don't have to generate any keys, or create a CSR (Certificate Signing Request), or edit your server config files.  Running certbot takes care of everything for you.  So if you haven't tried Let's Encrypt and you're running a site that could use some SSL, I definitely recommend it.

LnBlog Refactoring Step 3: Uploads and drafts

It's time for the third, slightly shorter, installment of my ongoing series on refactoring my blogging software.  In the first part, I discussed reworking how post publication was done and in the second part I talked about reworking things to add Webmention support.  This time, we're going to talk about two mini-projects to improve the UI for editing posts.

This improvement is, I'm slightly sad to say, pretty boring.  It basically involves fixing a "bug" that's really an artifact of some very old design choices.  These choices led to the existing implementation behaving in unexpected ways when the workflow changed.

The Problem

Originally LnBlog was pretty basic and written almost entirely in HTML and PHP, i.e. there was no JavaScript to speak of.  You wrote posts either in raw HTML in a text area box, using "auto-markup", which just automatically linkified things, or using "LBCode", which is my own bastardized version of the BBCode markup that used to be popular on web forums.  I had implemented some plugins to support WYSIWYG post editors, but I didn't really use them and they didn't get much love.

The old LnBlog post editor

Well, I eventually got tired of writing in LBCode and switched to composing all my posts using the TinyMCE plugin.  That is now the standard way to compose your posts in LnBlog.  The problem is that the existing workflow wasn't really designed for WYSIWYG composition.

In the old model, the idea was that you could compose your entire post on the entry editing page, hit "publish", and it would all be submitted to the server in one go.  There's also a "review" button which renders your post as it would appear when published and a "save draft" button to save your work for later.  These also assume that submitting the post is an all-or-nothing operation.  So if you got part way done with your post and decided you didn't like it, you could just leave the page and nothing would be saved to the server.

At this point it is also worth noting how LnBlog stores its data.  Everything is file-based and entries are self-contained.  That means that each entry has a directory, and that directory contains all the post data, comments, and uploaded files that belong to that entry.

What's the problem with this?  Well, to have meaningful WYSIWYG editing, you need to be able to do things like upload a file and then see it in the post editor.  In the old workflow, you'd have to write your post, insert an image tag with the file name of your picture (which would not render), add your picture as an upload, save the entry (either by saving the draft or using the "preview", which would trigger a save if you had uploads), and then go back to editing your post.  This was an unacceptably clunky workflow.

On top of this, there was a further problem.  Even after you previewed your post, it still wouldn't render correctly in the WYSIWYG editor.  That's because the relative URLs were inconsistent.  The uploaded files got stored in a special, segregated draft directory, but the post editor page itself was not relative to that directory, so TinyMCE didn't have the right path to render it.  And you can't use an absolute URL because the URL will change after the post is published.

So there were two semi-related tasks to fix this.  The first was to introduce a better upload mechanism.  The old one was just a regular <input type="file"> box, which worked but wasn't especially user-friendly.  The second one was to fix things such that TinyMCE could consistently render the correct URL for any files we uploaded.

The solution - Design

The actual solution to this problem was not so much in the code as it was in changing the design.  The first part was simple: fix the clunky old upload process by introducing a more modern JavaScript widget to do the uploads.  So after looking at some alternatives, I decided to implement Dropzone.js as the standard upload mechanism.

The new, more modern LnBlog post editor.
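
For what it's worth, the server side of an asynchronous upload like this doesn't have to be complicated.  Here's a rough sketch of the kind of endpoint a widget like Dropzone.js would POST files to.  To be clear, this is illustrative only - the parameter names, the draft directory layout, and the JSON responses are all assumptions, not LnBlog's actual upload handler.

<?php
// Hypothetical upload endpoint for an async widget like Dropzone.js.
// It accepts one POSTed file and drops it into the draft entry's directory
// so the editor can immediately reference it with a relative URL.
header('Content-Type: application/json');

$draftId  = isset($_POST['draftid']) ? basename($_POST['draftid']) : '';
$draftDir = __DIR__ . '/drafts/' . $draftId;

if ($draftId === '' || !is_dir($draftDir) || !isset($_FILES['file'])) {
    http_response_code(400);
    echo json_encode(array('error' => 'Missing draft ID or file'));
    exit;
}

$upload = $_FILES['file'];
if ($upload['error'] !== UPLOAD_ERR_OK) {
    http_response_code(500);
    echo json_encode(array('error' => 'Upload failed with code ' . $upload['error']));
    exit;
}

// Sanitize the name and move the file in next to the draft's data files.
$name = basename($upload['name']);
if (!move_uploaded_file($upload['tmp_name'], $draftDir . '/' . $name)) {
    http_response_code(500);
    echo json_encode(array('error' => 'Could not save file'));
    exit;
}

// Dropzone only needs a 2xx response; returning the name lets the editor
// build a relative <img> reference right away.
echo json_encode(array('file' => $name));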

The second part involved changing the workflow for writing and publishing posts.  The result was a somewhat simpler and more consistent workflow that reduces the number of branches in the code.  In the old workflow, you had the following possible cases when submitting a post to the server:

  1. New post being published (nothing saved yet).
  2. New post being saved as a draft (nothing saved yet).
  3. Existing draft post being published.
  4. Existing draft post being saved.
  5. New (not yet saved) post being previewed with attached files.
  6. Existing draft post being previewed with attached files.

This is kind of a lot of cases.  Too many, in fact.  Publishing and saving were slightly different depending on whether or not the entry already existed, and then there were the preview cases.  Those were necessary because previewing an entry with new attachments required extra processing - after all, if you attached an image, you'd want to see it.  So this complexity was a minor problem in and of itself.

So the solution was to change the workflow such that all of these are no longer special cases.  I did this by simply issuing the decree that all draft entries shall always already exist.  In other words, just create a new draft when we first open the new post editor.  This does two things for us:

  1. It allows us to solve the "relative URL" problem because now we can make the draft editing URL always relative to the draft storage directory.
  2. It eliminates some of those special cases.  If the draft always exists, then "publish new post" and "publish existing draft" are effectively the same operation.  When combined with the modern upload widget, this also eliminates the need for the special "preview" cases.
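
To make that concrete, here's a minimal sketch of what "the draft always exists" can look like in code.  The DraftStore class and everything in it are invented for illustration - this is the general idea, not the actual LnBlog implementation.

<?php
// Sketch: opening the "new post" editor immediately creates a draft on disk,
// so the editor is always working against a real entry directory.
// DraftStore is an invented name for illustration, not an LnBlog class.
class DraftStore
{
    private $draftRoot;

    public function __construct($draftRoot)
    {
        $this->draftRoot = $draftRoot;
    }

    // Create an empty draft directory and return its ID.
    public function createDraft()
    {
        $id  = date('Y-m-d_His') . '_' . bin2hex(random_bytes(3));
        $dir = $this->draftRoot . '/' . $id;
        mkdir($dir, 0755, true);
        file_put_contents($dir . '/entry.json', json_encode(array(
            'created' => date('c'),
            'status'  => 'draft',
            'body'    => '',
        )));
        return $id;
    }
}

// "New post" becomes a redirect to "edit this existing draft", so the editor
// URL is always relative to the draft's own directory.
function newEntryAction(DraftStore $drafts)
{
    $draftId = $drafts->createDraft();
    header('Location: drafts/' . rawurlencode($draftId) . '/edit');
    exit;
}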

The implementation - Results

I won't get into the actual implementation details of these tasks because, frankly, they're not very interesting.  There aren't any good lessons or generalizations to take from the code - it's mostly just adapting the idiosyncratic stuff that was already there.

The implementation was also small and went fairly smoothly.  The upload widget was actually the hard part - there were a bunch of minor issues in the process of integrating that.  There were some issues with the other part as well, but they were less serious.  Much of it was just integration problems that weren't necessarily expected and would have been hard to foresee.  You know, the kind of thing you expect from legacy code.  Here are some stats from Process Dashboard:

Project                         File Upload   Draft always exists
Hours to complete (planned):    4:13          3:00
Hours to complete (actual):     7:49          5:23
LOC changed/added (planned):    210           135
LOC changed/added (actual):     141           182
Defects/KLOC (found in test):   42.6          27.5
Defects/KLOC (total):           81.5          44.0

As you can see, my estimates here were not great.  The upload part involved more trial and error with Dropzone.js than I had expected and ended up with more bugs.  The draft workflow change went better, but I ended up spending more time on the design than I initially anticipated.  However, these tasks both had a lot of unknowns, so I didn't really expect the estimates to be that accurate.

Take Away

The interesting thing about this project was not so much what needed to be done but why it needed to be done. 

Editing posts is obviously a fundamental function of a blog, and it's one that I originally wrote way back in 2005.  It's worth remembering that the web was a very different place back then.  Internet Explorer was still the leading web browser; PHP 5 was still brand new; it wasn't yet considered "safe" to just use JavaScript for everything (because, hey, people might not have JavaScript enabled); internet speeds were still pretty slow; and browsing on mobile devices was just starting to become feasible.  In that world, a lot of the design decisions I made at the time seemed pretty reasonable.

But, of course, the web evolved.  The modern web makes it much easier for the file upload workflow to be asynchronous, which offers a much nicer user experience.  By ditching some of the biases and assumptions of the old post editor, I was more easily able to update the interface.

One of the interesting things to note here is that changing the post editing workflow was easier than the alternatives.  Keeping the old workflow was by no means impossible.  I kicked around several ideas that didn't involve changing it.  However, most of those had other limitations or complications and I eventually decided that they would ultimately be more work.  

This is something that comes up with some regularity when working with an older code-base.  It often happens that the assumptions baked into the architecture don't age well as the world around the application progresses.  Thus, when you need to finally "fix" that aspect of the app, you end up having to do a bit of cost-benefit analysis.  Is it better to re-vamp this part of the application?  Or should you shim in the new features in a kinda-hacky-but-it-works sort of way?

While our first instinct as developers is usually to do the "real" fix and replace the old thing, the "correct" answer is seldom so straight-forward.  In this case, the "real" fix was relatively small and straight-forward.  But in other cases, the old assumptions are smeared through the entire application and trying to remove them becomes a nightmare.  It might take weeks or months to make a relatively simple change, and then weeks or months after that to deal with all the unforeseen fallout of that change.  Is that worth the effort?  It probably depends on what the "real" fix buys you.

I had a project at work once that was a great example of that.  On the surface, the request was a simple "I want to be able to update this field", where the field in question was data that was generally but not necessarily static. In most systems, this would be as simple as adding a UI to edit that field and having it update the datastore.  But in this case, that field was used internally as the unique identifier and was used that way across a number of different systems.  So this assumption was everywhere.  Everybody knew this was a terrible design, but it had been that way for a decade and was such a huge pain to fix that we had been putting it off for years.  When we finally bit the bullet and did it right, unraveling the baked-in assumptions about this piece of data took an entire team over a month.  At an extremely conservative estimate, that's well over $25,000 to fix "make this field updatable".  That's a pretty hefty price tag for something that seems so trivial.

The point is, old applications tend to have lots of weird, esoteric design decisions and implementation-specific issues that constrain them.  Sometimes removing these constraints is simple and straight-forward.  Sometimes it's not.  And without full context, it's often hard to tell which it will be.  So whenever possible, try to have pity on the future maintenance programmer who will be working on your system and anticipate those kinds of issues.  After all, that programmer might be you.

Spam filters suck

Author's note: Here's another little rant that's been sitting in my drafts folder for years. Twelve years, to be precise - I created this on March 28, 2007. That was toward the end of my "government IT guy" days.

I'd forgotten how much of a pain the internet filtering was. These days, I hardly think about it. The government job was the last time I worked anyplace that even tried to filter the web. And e-mail filtering hasn't been something I've worried about in a long time either. These days, the filtering is more likely to be too lax than anything else. And if something does get incorrectly filtered, you generally just go to your junk mail folder to find it. No need for the rigamarole of going back and forth with the IT people. It's nice to know that at least some things get better.

I'm really starting to hate spam filters. Specifically, our spam filters at work. And our web filters. In fact, pretty much all the filters we have here. Even the water filters suck. (Actually, I don't think there are any water filters, which, if you'd tasted the municipal water, you would agree is a problem.)

I asked a vendor to send me a quote last week. I didn't get it, so I called and asked him to send it again. I checked with one of our network people, and she tells me it apparently didn't get through our first level of filters. So she white-listed the sender's domain and I asked the guy to send it again. It still didn't get through.

As I've mentioned before, our web filters also block Wikipedia.

At least it's a resume builder

Note: This is a short entry that's been sitting in my drafts folder since March 2010, i.e. from half a career ago. My "new" job at the time was with an online advertising startup. It was my first and only early-stage startup experience. In retrospect, it was useful because it exposed me to a lot of new things, in terms of not only technology, but people, processes, and ways of approaching software development (looking at things from the QA perspective was particularly eye-opening). It was not, however, enjoyable. I've also worked for later-stage startups and found that much more enjoyable. Sure, you don't get nearly as much equity when you come in later, but there's also less craziness. (And let's face it, most of the time the stock options never end up being worth anything anyway.)

Wow. I haven't posted anything in almost six months. I'm slacking. (Note: If only I'd known then that I wouldn't publish this for nine years...)

Actually, I've been kind of busy with work. I will have been at the "new" job for a year next month. The first six months I was doing the QA work, which was actually kind of interesting, as I'd never done that before. I did some functional test automation, got pretty familiar with Selenium and PHPUnit, got some exposure to an actual organized development process. Not bad, overall.

On the down side, the last six months have been a bit more of a cluster file system check, if you get my meaning. Lots of overtime, throwing out half our existing code-base, etc. On the up side, I've officially moved over to development and we're using Flash and FLEX for our new product, which are new to me.

The good part: FLEX is actually not a bad framework. It's got its quirks, but it's pretty powerful and, if nothing else, it beats the pants off of developing UIs in HTML and JavaScript. And while it's not my favorite language in the world, ActionScript 3 isn't bad either. It's strongly typed, object oriented, and generally fairly clean.

The bad part: Flash is not so nice. It wasn't quite what I was expecting. I guess I assumed that "Flash" was just a design environment for ActionScript programming. Actually, it's more of an animation package that happens to have a programming language bolted onto it. The worst part is that our product requires that we do the Flash portion in ActionScript 2, which seriously sucks. I mean, I feel like I'm back in 1989. And the code editor in Flash CS4 is...extremely minimal. As in slightly less crappy than Windows Notepad. I am seriously not enjoying the Flash part.

(Note: On the up side, none of this matters anymore because Flash is now officially dead.)

Running my own calendar server

Note: This is an article I started in October of 2012 and never finished. Fortunately, my feelings on the issue haven't changed significantly. So I filled it out into a real entry. Enjoy!

As I alluded to in a (not so) recent entry on switching RSS readers, I'm anti-cloud.

Of course, that's a little ambiguous. The fact is, "cloud" doesn't really mean anything anymore. It's pretty much come to refer to "doing stuff on somebody else's server." So these days we refer to "having your e-mail in the cloud" rather than "using a third-party webmail service," like we did 15 years ago. But really it's exactly the same thing - despite all the bells and whistles, GMail is not fundamentally different than the Lycos webmail account I used in 1999. It still amounts to relying entirely on some third-party's services for your e-mail needs.

And if the truth were known, I'm not even really against "the cloud" per se. I have no real objection to, say, hosting a site on an Amazon EC2 or Windows Azure instance that I'm paying for. It's really just the "public cloud." You know, all those "cloud services" that companies offer for free - things like GMail and Google Calendar spring to mind.

And it's not even that I object to using these services. It's just that I don't want to rely on them for anything I deem at all important. This is mostly because of the often-overlooked fact that users have no control over these services. The providers can literally cut you off at a moment's notice and there's not a thing you can do about it. With a paid service, you at least have some leverage - maybe not much, but they generally at least owe you some warning.

There are, of course, innumerable examples of this. The most recent one for me is Amazon Music. They used to offer a hosting service where you could upload your personal MP3 files to the Amazon cloud and listen to them through their service. I kinda liked Amazon Music, so I was considering doing that. Then they terminated that service. So now I use Plex and/or Subsonic to listen to my music straight from my own server, thank you very much!

As a result, I have my own implementation of a lot of stuff. This includes running my own calendar server. This is a project that has had a few incarnations, but that I've always felt was important for me. Your calendar is a window into your everyday life, a record of every important event you have. Do you really want to trust it to some third party? Especially one that basically makes its money by creating a detailed profile of everything you do so that they can better serve ads to you? (I think we all know who I'm thinking of here....)

For several years I used a simple roll-your-own CalDAV server using SabreDAV. That worked fine, but it was quite simple and I needed a second application to provide a web-based calendar (basically a web-based CalDAV client). So I decided to switch to something a little more full-featured and easier to manage.
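
For anyone curious, a "roll-your-own CalDAV server using SabreDAV" amounts to surprisingly little code.  The sketch below roughly follows the stock calendar server example from the sabre/dav documentation; the SQLite DSN, the base URI, and the file layout are assumptions I'm making for illustration, and the setup I actually ran differed in the details.

<?php
// Minimal CalDAV server built on sabre/dav (installed via Composer).
// The PDO backends store principals, credentials, and calendars in SQLite.
require 'vendor/autoload.php';

$pdo = new PDO('sqlite:' . __DIR__ . '/data/caldav.sqlite');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$authBackend      = new Sabre\DAV\Auth\Backend\PDO($pdo);
$principalBackend = new Sabre\DAVACL\PrincipalBackend\PDO($pdo);
$calendarBackend  = new Sabre\CalDAV\Backend\PDO($pdo);

// The node tree: a principals collection plus the calendar root.
$server = new Sabre\DAV\Server(array(
    new Sabre\CalDAV\Principal\Collection($principalBackend),
    new Sabre\CalDAV\CalendarRoot($principalBackend, $calendarBackend),
));
$server->setBaseUri('/caldav/');

$server->addPlugin(new Sabre\DAV\Auth\Plugin($authBackend));
$server->addPlugin(new Sabre\CalDAV\Plugin());
$server->addPlugin(new Sabre\DAV\Browser\Plugin()); // handy HTML view for debugging

$server->exec();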

So these days, I just run my own OwnCloud instance. At its core, OwnCloud is basically a WebDAV server with a nice UI on top of it. In addition to nice file sync-and-share support, it gives me web-based calendar and contact apps with support for CalDAV and CardDAV respectively. It also has the ability to install additional apps to provide more features, such as an image gallery, music players, and note-taking apps. Most of the more impressive apps are for the enterprise version only, or require third-party services or additional servers, but all I really wanted was calendar and contact support.

To get the full experience, I also use the OwnCloud apps on my laptop and phone to sync important personal files, as well as the DAVx5 app on my phone to synchronize the Android calendar and contacts database with my server. Overall, it works pretty well and doesn't really require much maintenance. And most important, I don't have to depend on Google or Amazon for a service that might get canned tomorrow.

LnBlog Refactoring Step 2: Adding Webmention Support

About a year and a half ago, I wrote an entry about the first step in my refactoring of LnBlog.  Well, that's still a thing that I work on from time to time, so I thought I might as well write a post on the latest round of changes.  As you've probably figured out, progress on this particular project is, of necessity, slow and extremely irregular, but that's an interesting challenge in and of itself.

Feature Addition: Webmention

For this second step, I didn't so much refactor as add a feature.  This particular feature has been on my list for a while and I figured it was finally time to implement it.  That feature is webmention support.  This is the newer generation of blog notification, similar to Trackback (which I don't think anyone uses anymore) and Pingback.  So, basically, it's just a way of notifying another blog that you linked to them and vice versa.  LnBlog already supported the two older versions, so I thought it made sense to add the new one.

One of the nice things about Webmention is that it actually has a formal specification that's published as a W3C recommendation.  So unlike some of the older "standards" that were around when I first implemented LnBlog, this one is actually official, well structured, and well thought out.  So that makes things slightly easier.

Unlike the last post, I didn't follow any formal process or do any time tracking for this addition.  In retrospect I kind of wish I had, but this work was very in and out in terms of consistency and I didn't think about tracking until it was too late to matter.  Nevertheless, I'll try to break down some of my process and results.

Step 1: Analysis

The first step, naturally, was analyzing the work to be done, i.e. reading the spec.  The webmention protocol isn't particularly complicated, but like all specification documents it looks much more so when you put all the edge cases and optional portions together.  

I actually looked at the spec several times before deciding to implement it.  Since my time for this project is limited and only available sporadically, I was a little intimidated by the unexpected length of the spec.  When you have maybe an hour a day to work on a piece of code, it's difficult to get into any kind of flow state, so large changes that require extended concentration are pretty much off the table.

So how do we address this?  How do you build something when you don't have enough time to get the whole thing in your head at once?

Step 2: Design

Answer: you document it.  You figure out a piece and write down what you figured out.  Then the next time you're able to work on it, you can read that and pick up where you left off.  Some people call this "design".

I ended up reading through the spec over several days and eventually putting together UML diagrams to help me understand the flow.  There were two flows, sending and receiving, so I made one diagram for each, which spelled out the various validations and error conditions that were described in the spec.

Workflow for sending webmentions Workflow for receiving webmentions

That was really all I needed as far as design for implementing the webmention protocol.  It's pretty straight-forward and I made the diagrams detailed enough that I could work directly from them.  The only real consideration left was where to fit the webmention implementation into the code.

My initial thought was to model a webmention as a new class, i.e. to have a Webmention class to complement the currently existing TrackBack and Pingback classes.  In fact, this seemed like the obvious implementation given the code I was working with.  However, when I started to look at it, it became clear that the only real difference between Pingbacks and Webmentions is the communication protocol.  It's the same data and roughly the same workflow and use-case.  It's just that Pingback goes over XML-RPC and Webmention uses plain-old HTTP form posting.  It didn't really make sense to have a different object class for what is essentially the same thing, so I ended up just re-using the existing Pingback class and just adding a "webmention" flag for reference.

Step 3: Implementation

One of the nice things about having a clear spec is that it makes it really easy to do test-driven development because the spec practically writes half your test cases for you.  Of course, there are always additional things to consider and test for, but it still makes things simpler.

The big challenge was really how to fit webmentions into the existing application structure.  As I mentioned above, I'd already reached the conclusion that creating a new domain object for this was a waste of time.  But what about the rest of it?  Where should the logic for sending them go?  Or receiving?  And how should sending webmentions play with sending pingbacks?

The first point of reference was the pingback implementation.  The old pingback implementation for sending pingbacks lived directly in the domain classes.  So a blog entry would scan itself for links, create a pingback object for each, and then ask the pingback if its URI supported pingbacks, and then the entry would send the pingback request.  (Yes, this is confusing.  No, I don't remember why I wrote it that way.)  As for receiving pingbacks, that lived entirely in the XML-RPC endpoint.  Obviously none of this was a good example to imitate.

The most obvious solution here was to encapsulate this stuff in its own class, so I created a SocialWebClient class to do that.  Since pingback and webmention are so similar, it made sense to just have one class to handle both of them.  After all, the only real difference in sending them was the message protocol.  The SocialWebClient has a single method, sendReplies(), which takes an entry, scans its links and for each detects if the URI supports pingback or webmention and sends the appropriate one (or a webmention if it supports both).  Similarly, I created a SocialWebServer class for receiving webmentions with an addWebmention() method that is called by an endpoint to save incoming mentions.  I had originally hoped to roll the pingback implementation into that as well, but it was slightly inconvenient with the use of XML-RPC, so I ended up pushing that off until later.
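
To give a rough idea of the shape of this, here's a sketch of the receiving half.  The class and method names (SocialWebServer, addWebmention()) are the ones I just mentioned, but everything inside them is a simplified stand-in based on the Webmention spec - it's not the actual LnBlog code.

<?php
// Simplified stand-in for the receiving side of webmentions.
class SocialWebServer
{
    private $fetch;               // callable: fetches a URL, returns body or false
    private $mentions = array();  // stored mentions, keyed by target URL

    public function __construct($fetch = null)
    {
        $this->fetch = $fetch ? $fetch : 'file_get_contents';
    }

    // Called by the HTTP endpoint with the POSTed form parameters.
    public function addWebmention($source, $target)
    {
        if (!filter_var($source, FILTER_VALIDATE_URL) ||
            !filter_var($target, FILTER_VALIDATE_URL)) {
            throw new InvalidArgumentException('source and target must be valid URLs');
        }

        // The spec requires verifying that the source document actually
        // links to the target before accepting the mention.
        $body = call_user_func($this->fetch, $source);
        if ($body === false || strpos($body, $target) === false) {
            throw new RuntimeException('Source does not link to target');
        }

        $this->mentions[$target][] = $source;
    }

    public function getMentions($target)
    {
        return isset($this->mentions[$target]) ? $this->mentions[$target] : array();
    }
}

The sending side is the same idea in reverse: for each link in the entry, discover what the target supports and POST the source and target URLs to its webmention endpoint, falling back to pingback if that's all it offers.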

Results

As I mentioned, I didn't track the amount of time I spent on this task.  However, I can retroactively calculate how much code was involved.  Here's the lines-of-code summary as reported by Process Dashboard:

Base:  8057
Deleted:  216
Modified:  60
Added:  890
Added & Modified:  950
Total:  8731

For those who aren't familiar, the "base" value is the lines of code in the affected files before the changes, while the "total" is the total number of lines in affected files after the changes.  The magic number here is "Added & Modified", which is essentially the "new" code.  So all in all, I wrote about a thousand lines for a net increase of about 700 lines.

Most of this was in the new files, as reported by Process Dashboard below.  I'll spare you the 31 files that contained assorted lesser changes (many related to fixing unrelated issues) since none of them had more than 100 lines changed.

Files Added                                           Total
lib\EntryMapper.class.php                                27
lib\HttpResponse.class.php                               60
lib\SocialWebClient.class.php                           237
lib\SocialWebServer.class.php                            75
tests\unit\publisher\SocialWebNotificationTest.php      184
tests\unit\SocialWebServerTest.php                      131

It's helpful to note that of the 717 lines added here, slightly less than half (315 lines) is unit test code.  Since I was trying to do test-driven development, this is to be expected - the rule of thumb is "write at least as much test code as production code".  That leaves the meat of the implementation at around 400 lines.  And of those 400 lines, most of it is actually refactoring.
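
Since I said the spec practically writes your test cases for you, here's roughly what that looks like in practice.  This is a hypothetical PHPUnit test written against the SocialWebServer sketch from earlier in this post - the real SocialWebServerTest.php is different, but the cases come almost straight out of the Webmention recommendation: reject non-URL parameters, reject a source that doesn't link to the target, accept one that does.

<?php
use PHPUnit\Framework\TestCase;

class SocialWebServerSketchTest extends TestCase
{
    // Inject a fake fetcher so the tests never hit the network.
    private function serverReturning($body)
    {
        return new SocialWebServer(function ($url) use ($body) {
            return $body;
        });
    }

    public function testRejectsNonUrlSource()
    {
        $this->expectException(\InvalidArgumentException::class);
        $this->serverReturning('')->addWebmention('not-a-url', 'http://example.com/post');
    }

    public function testRejectsSourceThatDoesNotLinkToTarget()
    {
        $this->expectException(\RuntimeException::class);
        $this->serverReturning('<p>no links here</p>')
             ->addWebmention('http://other.example/reply', 'http://example.com/post');
    }

    public function testStoresValidMention()
    {
        $server = $this->serverReturning('<a href="http://example.com/post">nice post</a>');
        $server->addWebmention('http://other.example/reply', 'http://example.com/post');
        $this->assertSame(array('http://other.example/reply'),
                          $server->getMentions('http://example.com/post'));
    }
}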

As I noted above, the Pingback and Webmention protocols are quite similar, differing mostly in the transport protocol.  The algorithms for sending and receiving them are practically identical.  So most of that work was in generalizing the existing implementation to work for both Pingback and Webmention.  This meant pulling things out into new classes and adjusting them to be easily testable.  Not exciting stuff, but more work than you might think.

So the main take-away from this project was: don't underestimate how hard it can be to work with legacy code.  Once I figured out that the implementation of Webmention would closely mirror what I already had for Pingback, this task should have been really short and simple.  But 700 lines isn't really that short or simple.  Bringing old code up to snuff can take a surprising amount of effort.  But if you've worked on a large, brown-field code-base, you probably already know that.

Security, passwords, and end users

Note: I started this entry two years ago and it's been sitting in my drafts folder ever since.  However, while the links might not be news anymore, the underlying issue is the same.  So I cleaned it up for another From The Archives entry.

A while back, there was a story going around about how the guy who invented the password strength rules that you see all over the web now regrets it.  That made me think about how we approach these kinds of issues and the advice we give to non-technical users.

Security is one of those areas of computing where there are a lot of cargo cults.  Relatively few people, even among IT professionals, seem to have a good handle on how to secure their systems.  So they rely on guidelines like these from the "experts", often following them blindly without any real understanding of the rationale.

And you can't really blame them - security is hard.  Even knowing what you need to defend against can be a tall order.  And with some of the biggest companies in the world being compromised left and right (for example, the Equifax hack, which should scare the heck out of you if it doesn't already), it's clear that this is not a resource problem that you can just buy your way out of.  Not even big tech companies are immune, so what chance does the average user have?

Well, unfortunately, for things like the Equifax breach, the average user doesn't have much to say about it.  Once a third-party has your personal information, you really have no choice but to rely on them to secure it.  And if they don't do a good job, well...you're sorta just out of luck.  I mean, you can always sue them, but let's be realistic: for private individuals, the amount of time and money required to do that is prohibitive.  It's cheaper and less painful to just absorb the loss and get on with your life.

Passwords are a different story, though.  Those are one of the few pieces of security that are (mostly) under the control of the user.  So we as professionals can offer some guidance there.  And if the top passwords revealed from various database breaches are any indication, we should offer some.

These days, there's really only one piece of advice that matters: get a password manager and use it for everything.  I like KeePass, but 1Password, LastPass, and a number of other good programs are available.  These days we all have more website logins than we can realistically remember, so it's impossible to follow the old advice of using strong passwords AND not reusing them AND not writing them down.  By using a password manager, we compromise on the "not writing it down" part and write down our passwords securely so that we can keep them strong and unique without making our lives difficult.

Of course, there will always be a few passwords that you just need to remember.  For instance, the master password for your password manager.  For these, the standard advice is to use long passwords containing numbers, letters, and special characters.  Probably the easiest way to do this and still keep the password memorable is to use a passphrase.  So rather than one word, use a short phrase containing several words and insert some punctuation or other special characters.  For example, the password Bob has _17_ "Cats"! isn't that hard to remember, but it's 20 characters long and contains capital and lower-case letters, numbers, punctuation, and spaces.  Yeah, it's harder to type and remember than "12345", but it's way easier than something like "UD09BhbjH7" and it fulfills the complexity requirements.

For more important accounts, you can also do things like enabling two-factor authentication, which adds another layer of security.  Typically this involves sending a code to your phone via text message or an app like Google Authenticator and entering that when you log in.  Even this isn't fool-proof (see SIM swapping scams), but it's one more hoop that someone trying to access your account has to jump through.

So forget the annoying rules about changing passwords every month and things like that.  Pick a handful of good passwords for the stuff you really need to type out and just use a password manager for everything else.  There's no reason to remember a bajillion obscure bits of information if you don't need to.  After all, that's why we have computers in the first place.

LnBlog Refactoring Step 1: Publishing

The first part of my LnBlog refactoring is kind of a large one: changing the publication logic.  I'll start by giving an overview of the old design, discussing the new design, and then looking at some of the actual project data.

History

LnBlog is a very old system.  I started it back in 2005 and did most of the work on it over the next two years.  It was very much an educational project - it was my first PHP project (in PHP 4, no less) and only my third web-based project.  So I really didn't know what I was doing.

As you might expect, the original design was very simple.  In the first version, "publishing an entry" really just meant creating a directory and writing a few files to the disk.  As the system grew more features, more and more steps were added to that process.  The result was a tangle of code where all of the logic lived in the "controller" (originally it was a server-page) with some of the actual persistence logic encapsulated in the domain objects.  So, roughly speaking, the BlogEntry object knew how to write its metadata file to the appropriate location, but it didn't know how to properly handle uploaded files, notifications, or really anything else.

Originally, LnBlog used a server-page model, with each URL being a separate file and code being shared by including common configuration files and function libraries all over the place.  Shortly prior to starting this project, I had consolidated the logic from all those pages into a single, massive WebPages class, with one public method for each endpoint - a monolithic controller class, for all intents and purposes.  This still isn't great, but it gave me a common entry point to funnel requests through, which means I can better control common setup code, handle routing more easily, and generally not have the pain of dealing with a dozen or more endpoints.

Anyway, this WebPages class served up and edited entries by directly manipulating domain objects such as BlogEntry, Blog, BlogComment, etc.  The domain objects knew how to save themselves to disk, and a few other things, but the majority of the logic was right in the WebPages class.  This worked fine for the main use-case of creating an entry from the "edit entry" page, but it made things very awkward for the less-used locations, such as the Metaweblog API endpoint or scheduled publication of drafts.

Furthermore, the publication logic in the WebPages class was very hairy.  All types of publication flowed through a single endpoint and used a common code path.  So there were flags and conditions all over the place, and it was very difficult to tell what needed to be updated when making changes.  Bugs were rampant and since there was no test automation to speak of, testing any changes was extremely laborious and error prone.

The New Design

There were two main goals for this refactoring:

  1. Create a new Publisher class to encapsulate the publication logic.  The idea was to have a single entity that is responsible for managing the publication state of blog entries.  Given a BlogEntry object, it would know how to publish it as a regular entry or a static article, unpublish it, handle updating or deleting it, etc.  This would give us a single entity that could own all the steps involved in publishing.
  2. Create unit tests around the publication process.  The logic around the entire process was more complicated than you'd think and the old code was poorly structured, so it broke with disturbing regularity.  Hence I wanted some automated checking to reduce the risk of bugs.

So the design is actually pretty straight-forward: create a "publisher" class, give it operations for each of the things we do with blog entries (create, update, delete, etc.), copy the existing logic for those cases into the corresponding methods, update the endpoints to use this new class, and call it a day.  

So it was mostly just a reorganization - there wasn't any significant new logic that needed to be written.  Simple, right?  What could go wrong?
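
In sketch form, the surface area I was aiming for looked something like this.  The operations are the ones described above, but the exact method names and signatures are illustrative assumptions rather than a copy of the real class.

<?php
// Illustrative sketch of the Publisher's surface area: one object that owns
// every state change an entry can go through.  Not the actual LnBlog class.
class BlogEntry { /* the existing domain object, stubbed here for the sketch */ }

interface Publisher
{
    // Publish a draft (or brand-new entry) as a regular, dated blog entry.
    public function publishEntry(BlogEntry $entry);

    // Publish an entry as a static article instead of a dated post.
    public function publishArticle(BlogEntry $entry);

    // Persist a draft without publishing it.
    public function saveDraft(BlogEntry $entry);

    // Update an already-published entry in place.
    public function update(BlogEntry $entry);

    // Take a published entry back to draft status.
    public function unpublish(BlogEntry $entry);

    // Remove the entry and its files entirely.
    public function delete(BlogEntry $entry);
}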

Results

While I was happy with the result, this project turned out to be a much larger undertaking than I'd assumed.  I knew it was going to be a relatively large task, but I was off by a factor of more than two.  Below is a table summarizing the project statistics and comparing them to the original estimates (from PSP data I captured using Process Dashboard). 

Planned and actual project statistics
                          Planned   Actual
Total Hours               29:01     64:00
LOC added and modified    946       3138
LOC/Hour                  32.6      49.0
Total Defects             82.1      69.0
Defects/KLOC              86.8      21.7

When I originally did the conceptual design and estimate, I had planned for relatively modest changes to the Article, BlogEntry, and WebPages classes and the creators.php file.  I also planned for new Publisher and BlogUpdater classes, as well as associated tests and some tests for the WebPages class.  This came to 29 hours and 946 new or changed lines of code across nine source files.  Definitely a big job when you consider I'm working in increments of two hours or less per day, whenever I get around to it.

In actuality, the changes were much larger in scope.  I ended up changing 27 additional files I hadn't considered and creating two other new utility classes (although I did ditch the BlogUpdater class - I no longer even remember what it was supposed to do).  The resulting 3138 lines of code took me 64 hours spread over five months.

Effects of Testing

I did test-driven development when working on this project.  I've found TDD to be useful in the past and it was very helpful to me here.  It was also very effective in meeting my second goal of building a test suite around the publication logic.  PHPUnit reports statement coverage for the Publisher tests at 97.52% and close to 100% coverage for the tested methods in the WebPages class (I only wrote tests for the endpoint that handles creating and publishing entries).

More importantly, using TDD also helped me to untangle the logic of the publication process.  And it turns out there was actually a lot to it.  I ended up generating about 2000 lines of test code over the course of this project.  It turns out that the design and unit test phase occupied 65% of the total project time - about 41 hours.  Having a comprehensive test suite was immensely helpful when I was rebuilding the publication logic across weeks and months.  It allowed me to have an easy check on my changes without having to load all of the code I worked on three weeks ago back into my brain.

Sadly, the code was not such that I could easily write tests against it as it stood.  In fact, many of the additional changes came from having to break dependencies in the existing code to make it possible to unit test.  Luckily, most of them were not hard to break, e.g. by using an existing file system abstraction layer, but the work still adds up.  It would have been very nice to have an existing test suite to prevent regressions in the rewrite.  Unfortunately, even integration tests would have been awkward, and even if I could have written them, it would have been very hard to get the level of coverage I'd need to be confident in the refactor.

Conclusion

In terms of the results, this project worked out pretty well.  It didn't really go according to plan, but I got what I was looking for in the end - a better publication design and a good test suite.  However, it was a long, slow slog.  Maybe that was too big a slice of work to do all at once.  Perhaps a more iterative approach could have kept things moving at a reasonable pace.  I'll have to try that on the next big project.

LnBlog: Blogging the redesign

Today, we're going to talk a little about design and refactoring.  As a case-study, we're going to use a little blogging application called LnBlog.  You probably haven't heard of it - it's not very popular.  However, you have used it, at least peripherally, because it's running this site.  And you also have at least a passing familiarity with the author of that application, because it's me. 

Motivation

Software is an "interesting" field.  The cool new technologies, frameworks, and languages get all the press and they're what everybody wants to work with.  But let's be honest: it's generally not what makes the money.  I mean, how could it be?  It just came out last week!

No, if you have the good fortune to work on a grown-up, profitable product, it's almost certainly going to be the "old and busted" tech.  It might not be COBOL or Fortran, but it's almost certainly "legacy code".  It might be C++ or Java or, in our case, PHP, but it's probably old, poorly organized, lacking unit tests, and generally hard to work with.

I work on such a product for my day job.  It's a 10-year-old PHP codebase, written in an old-fashioned procedural style.  There are no unit tests for the old code, and you couldn't really write them even if you wanted to.  Sure, there's a newer part with proper design, tests, etc., but the old code is the kind of stuff that "works for the most part", but everybody is afraid to touch it because it's so brittle and tightly coupled that God alone knows what will break when you make a change.

This also applies to LnBlog.  It was my very first PHP application.  I started it way back in early 2005, in the bad old days of PHP 4.  Over the next two or three years, I managed to turn it into something that was relatively functional and full-featured.  And for the last ten years or so, I've managed to keep it working.

Of course, it hasn't gotten a whole lot of love in that time.  I've been busy and, for the most part, it worked and was "good enough".  However, I occasionally need to fix bugs or want to add features, and doing that is a truly painful process.  So I would very much like to alleviate that pain.

The Issue

Let me be honest: I didn't really know what I was doing when I wrote LnBlog.  I was about four years out of school and had only been coding for about six or seven years total.  And I was working mostly in Visual Basic 6 at the time, which just barely counts.  It was also only my third web-based project, and the first two were written in classic ASP and VBScript, which also just barely counts.

As a result, it contains a lot of questionable design decisions and overly-complicated algorithms.  The code is largely procedural, kind of buggy, and makes poor use of abstraction.  So, in short, it's not great.

But, in fairness to myself, I've seen worse.  In fact, I've seen a lot worse.  It does have a class hierarchy for the domain objects (though it's a naive design), an abstraction layer for data access (though it's inconsistently used), and a templating system for separating markup from domain logic (though the templates are an ungodly mess).  And it's not like I had a role model or mentor to guide me through this - I was figuring out what worked on my own.  So while it's not great, I think it's actually surprisingly good given the circumstances under which it was built.

The Goal - Make the Code "Good"

So I want to make LnBlog better.  I've thought about rewriting it, but decided that I wouldn't be doing myself any favors by going that route.  I also hold no illusions of a grand re-architecture that will fix all the problems and be a shining beacon of design perfection.  Rather, I have a relatively modest list of new features and bug fixes, and I just want to make the code good enough that I can make changes easily when I need to and be reasonably confident that I'm not breaking things.  In other words, I want to do a true refactoring.

If you haven't read Martin Fowler's book, the word "refactoring" is not a synonym for "changing code" or "rewriting code".  Rather, it has a very specific meaning: improving the internal design of code without changing the external behavior.  In other words, all you do is make the code easier to work with - you don't change what it does in any way.  This is why people like Bob Martin tell you that "refactor X" should never be an item in your Scrum backlog.  It is purely a design and "code cleanliness" activity, not a "feature" you can deliver.

So my goal with LnBlog is to gradually reshape it into what it should have been in the first place.  This is partially to make changing it easier in the future.  But more importantly, it's a professional development goal, an exercise in code craftsmanship.  As I mentioned above, I've worked professionally on many systems that are even more messed up than LnBlog.  So this is a study in how to approach refactoring a system.  

And So It Begins...

My intention is to write a number of articles describing this process.  I've already completed the first step, which is rewriting some of the persistence and publication logic.  I'm using the PSP to track my planned and actual performance, so I'll have some actual data to use in my discussion of that process.  Hint: so far, the two are very different.

With any luck, this project will enable me to draw some useful or interesting conclusions about good and bad ways to approach reworking legacy systems.  Maybe it will also enlighten some other people along the way.  And if nothing else, I should at least get a better codebase out of it.

Access to local storage denied

I ran into an interesting issue today while testing out an ownCloud installation on a new company laptop running Windows 10.  When trying to open the site in IE11, JavaScript would just die.  And I mean die hard.  Basically nothing on the page worked at all.  Yet it was perfectly fine in Edge, Firefox, and Chrome.

The problem was an "Access denied" message on a seemingly innocuous line of code.  It was a test for local storage support.  The exact line was:

if (typeof localStorage !== "undefined" && localStorage !== null) {

Not much that could go wrong with that line, right?  Wrong!  I double-checked in the developer console, and it turned out that typeof localStorage was returning "unknown".  So the first part of that condition was actually true.  And attempting to actually use localStorage in any way resulted in an "Access denied" error.

A little Googling turned up this post on StackOverflow.  It turns out this can be caused by an obscure error in file security on your user profile.  Who knew?  The problem was easily fixed by opening up cmd.exe and running the command:
icacls %userprofile%\Appdata\LocalLow /t /setintegritylevel (OI)(CI)L

Site outage

Well, that sucked.  My domain was MIA today.  Fixed what I could, but it's not totally working yet.

I discovered the problem this morning, when I was thwarted in my regularly scheduled RSS feed checking.  The page didn't load.  And neither did any of the other pages.  Or the DNS alias that I had pointed to my home server.  So after confirming that my hosting provider was in fact up, I checked my DNS.

Somehow - I'm still not 100% sure how - my domain's nameservers got reset to defaults for my registrar.  I'm not sure if I fat-fingered something while confirming my contact info in the domain manager, or if there was some change on their end, or what.  At any rate, they were wrong.  I was able to change the settings back to my hosting provider's nameservers without any issues, but that still requires waiting for the change to finish propagating.  What a pain.

Using RewriteBase without knowing it

So here's an interesting tidbit that I discovered this afternoon: you can use RewriteBase in a .htaccess file without knowing the actual base URL.  This is extremely useful for writing a portable .htaccess file.

In case you don't know, the RewriteBase directive to Apache's mod_rewrite is used to specify the base path used in relative rewrite rules.  Normally, if you don't specify a full path, mod_rewrite will just rewrite the URL relative to the current directory, i.e. the one where the .htaccess file is.  Unfortunately, this isn't always the right thing to do.  For example, if the .htaccess file is under an aliased directory, then mod_rewrite will try to make the URL relative to the filesystem path rather than the path alias, which won't work.

Turns out that you can account for this in four (relatively) simple lines:

RewriteBase /
# Skip requests for files that actually exist on disk
RewriteCond %{REQUEST_FILENAME} !-f
# Compare the rule's match ($1) to the full request URI to capture the real base path into %2
RewriteCond $1#%{REQUEST_URI} ([^#]*)#(.*)\1$
# Prepend the captured base to the rewrite target
RewriteRule ^(.*)$ %2index.php [QSA,L]

All you need to do is substitute in your rewrite target for "index.php" and it "just works".  No changes to server configuration required and no need to edit the RewriteBase for the specific server.

Tired of Cloud Drive

You know what?  Amazon Cloud Drive is a little bit of a pain.  I think it's time to start moving away from it. I think I'm going to give DropBox another look.

To be clear, I don't plan to stop using Cloud Drive altogether. For one thing, the Kindle integration is actually pretty nice.  And for another, they give Amazon Prime members a pretty decent amount of space for free.  And I find that the desktop Cloud Drive app actually works pretty well.  It's just the mobile app and the website that suck.

The mobile app

For starters, let me voice my new complaint with the mobile app: the semantics of syncing your pictures are rather poorly defined.  Or, rather, they're not defined — at least, not to my knowledge.  By that I mean: what happens when you delete a bunch of files from your Cloud Drive, but not from your phone's main picture storage?  I'll tell you what: the app tries to re-add them to your Cloud Drive. 

This has happened to me two or three times in the last week or two.  A couple of weeks ago I went through my Cloud Drive and deleted some of the pictures that were duplicated, out of focus, etc.  Now, at semi-random intervals, the app wants to re-upload them.

In fairness, this could be due in part to the fact that Amazon's apps all seem to share authentication and it seems to be semi-broken on my phone.  I've had this problem several times recently.  The Kindle app on my Galaxy Nexus will stop synchronizing and the settings will report that it is "Registered to .".  No, that's not a typo, my name is just missing.  And when this happens, I can't authenticate with the Cloud Drive app or the Cloud Player app either — they just reject my credentials.  So far, the only fix I've found is to deregister my device in the Kindle app settings and then re-register it.  That fixes the Kindle app as well as authentication in the other apps. 

The website

I've blogged about the Cloud Drive website and its useless "share" feature before. Well, despite that, the other day I decided to give that "share" feature a try. Hey, maybe it won't really be so bad, right?

Good Lord, it was even worse than I'd imagined.

So my task was to share one or two dozen pictures of my son with my relatives.  Seems simple enough, right?  The pictures are already in my Cloud Drive, so I just need to send out some links to them.  As I noted in the last entry, Cloud Drive doesn't actually support sharing folders or sharing multiple files at once, so the only way to do this is by sharing one file at a time and copying the URL into an e-mail. 

As bad as this is, it turns out it's made even worse by the fact that Cloud Drive is now reloading the file list after you complete the "share" operation.  And to add insult to injury, the reload is really slow.  Granted, I have about 1500 images in that directory, but that shouldn't matter because the reload really doesn't need to happen at all.  I mean, nothing has changed in the folder view.  All that happens is that the UI locks up for a few seconds and then I get a "file has been shared" message.

So this only confirms my opinion that the "share" feature in Cloud Drive is unusable.  I mean, if sharing were just a matter of popping up the "share" dialog that holds the URL, copying it, pasting to another window, and then closing the dialog, that would be one thing.  It would suck, but I could deal with it.  But the brain-dead UI locking up for five seconds after each share just makes it too painful to even try.  I got through maybe half a dozen pictures before giving up in disgust.

Solution - DropBox?

So, clearly, I need another platform for sharing my pictures.  I looked around and found myself mightily confused.  My requirements were pretty simple.  I wanted something that:

  1. Had a free account tier,
  2. Had an easy way to share by just sending someone a link,
  3. And provided a simple uploading option.

I briefly considered using the Google+ "Photos" feature, but gave up on that when I realized the whole visibility/sharing thing wasn't completely obvious.

For now, I'm trying out DropBox.  I've had a free DropBox account for a couple of years, but never really used it for anything, so that was one less thing to sign up for.  The desktop app is pretty nice and allows me to just drag files between Explorer windows and shows an icon of the file's sync status.  And sharing things via the web UI is dead simple, which is exactly what I was looking for.  So we'll see how that goes.  Worst-case scenario, I won't have lost anything.

Chrome debugging victory

This post is about a Chrome bug that I fixed a few months ago in a new product I was working on. Normally, this would not be noteworthy, but this one was a real mind-bender.

The product I was working on at the time is an add-on to one of my company's services. To give a little background, our bread and butter is high-resolution aerial imagery - think satellite images, but with better resolution and with oblique shots taken from multiple directions in addition to the overheads. Our biggest customers for these fly-overs are county governments, who use the imagery for a variety of purposes. For example, when I was working for a county government, we used Pictometry's images for 911 dispatching, so that dispatchers could give detailed information on a property to police and other first responders.

Anyway, counties will typically order a new fly-over every year or every few years so that their images remain up to date. One of the add-on services we offer with this is a "change finder" service, where we'll analyze the images from two different years and provide the customer with a data set highlighting the structures that have changed. The product I was working on is actually an add-on to that service. It's a web-based application targeted at local assessors that gives them a simple, no-setup-needed way to access their change data and provides a basic workflow for analyzing it and making determinations on property values. So rather than spending an hour driving out to inspect a location, the assessor can use a rich web UI to see that the property now has a garage that wasn't there last year and estimate how big it is. Saves them time and keeps the tax rolls up-to-date.

As for the application itself, it's based on our next-generation internal development framework. The back-end is built on PHP and PostgreSQL/Oracle with the front-end being built on Ext JS. It features a dual-pane "before and after" map viewer that's built on OpenLayers and integrates with the customer GIS data. There is a set of search tools that lets the customer narrow down and navigate their data set. In addition, the images are overlaid with outlines to highlight what properties have changed.

But enough exposition - on to the bug! This one was a real head-scratcher. Basically, sometimes the UI would just freak out. All the controls outside the map area would just turn into one solid gray field, effectively rendering them invisible. They still worked, if you knew where to click, but you couldn't see them. They'd come back for a second when you panned the map image, but then disappear when you stopped panning. And it wasn't a transient JavaScript thing - reloading the page didn't help. And the real kicker was that this bug only happened in Chrome (we didn't test other Webkit-based browsers, as Chrome is the only one we officially support).

This bug had been languishing in our queue for weeks. It wasn't consistently reproducible. We found that it happened less if you turned off Chrome's hardware acceleration, but it didn't go away entirely. Sometimes it would happen consistently, whereas other times you could go a full week and never see it. We eventually determined that it was related to the number of images overlaid in the map area, because hiding enough of them made the issue go away. We also found that it happened more at larger window sizes. In retrospect, that was probably because that allows room for more images.

Since the non-map portion of the UI was what was getting messed up, I hypothesized that it might be due to some weird interaction with the styles used by Ext JS. However, that didn't pan out. It turned out that there was nothing remarkable about Ext's styles and every "discovery" I made ended up reducing to "more images in the map". The one insight I was able to glean was that the issue was the number of images visible in the viewport, not just the number present in the map area DOM.

I eventually found the source of this bug through sheer brute force, i.e. process of elimination. I opened up the Chrome developer tools and started hiding and removing DOM nodes. I started outside the map area and gradually pruned the DOM down to the smallest number of elements that still caused the bug.

It was at that point that I finally found a suspicious CSS rule. It was a CSS3 transform coming from our client-side image viewing component. Apparently the rule was originally put there to trigger hardware acceleration in some browsers. I have no idea why it caused this bug, though - I don't think we ever did figure that out. But it was an easy fix and I was rather proud of myself for having figured it out. I suppose that just goes to show that sometimes persistence is more important than brilliant insight when it comes to bug fixing.

Flask on ICDSoft

A year or two ago, I decided to update my skill set a little and brush up on my Python. In particular, I wanted to do a little web development in Python, which I hadn't done in nearly five years. Since I wanted to start with something fairly basic, I decided to go with the Flask micro-framework. I ended up using that to build Lnto, which I've mentioned before and I swear will get its own page someday.

One big problem with this project was that my hosting is kind of bare-bones. Don't get me wrong - the service is fine. I use ICDSoft, whom I've been with for several years. I have no complaints about the service I've received - in fact I'm quite happy with it. However, it's mostly focused on PHP and MySQL and there's no shell access. But on the other hand, I'm paying less than $5 a month with only a one-year commitment, so I don't have much to complain about.

Anyway, the problem with running Flask apps, or pretty much anything other than PHP, is that they have no documentation for that whatsoever. There's a FAQ, but it says absolutely nothing about Python other than that they "support" it. As far as I can tell, that just means that they have a Python interpreter installed and Apache CGI handler is configured to run *.py files. There's certainly no mention of running things using WSGI, which seems to be the recommended method for running most Python frameworks.

Another problem is actually installing the Flask libraries. The documentation for, well, pretty much every Python framework or app out there tells you the best way to install it is with pip or easy_install. But, of course, you need shell access to run those, assuming they're even installed on the server. (And I did check - they're not installed.)

Fortunately, getting around these problems was relatively easy using virtualenv, which I'd nearly forgotten existed. This creates a virtual Python environment which is isolated from the rest of the system. A side-benefit of this is that virtualenv creates a copy of pip in your virtual environment.

You can use virtualenv directly from the source distribution. This was required in my scenario, since I lack any shell access, let alone root privileges. I simply extracted the virtualenv source archive, uploaded it to my host, and ran the following command (I used PHPsh, a web-based shell emulator, but copying the commands into a PHP script would have worked just as well):
python /path/to/virtualenv-1.11.4/virtualenv.py /path/to/venv

This creates a virtual environment in the /path/to/venv directory. You can then install packages into that environment by using the "activate" script to configure the shell, like this:
. /path/to/venv/bin/activate && pip install Flask

That was easy. I now have a Python environment with Flask installed. All I need to do at this point is configure my application code to use it. That's accomplished with a few lines to initialize the virtualenv environment and start up Flask as a CGI app:
activate_this = '/path/to/venv/bin/activate_this.py'
execfile(activate_this, dict(__file__=activate_this))
from myapp import app
from wsgiref.handlers import CGIHandler
CGIHandler().run(app)

I just re-ran that entire process using the latest version of virtualenv and it's actually quite painless.

And as a side-note, the reason I did that was because I noticed the other day that Lnto had suddenly stopped working - the server was just returning 500 errors. Which was odd because I hadn't changed anything with the app or the database in weeks. However, the answer was found on the virtualenv PyPi page:

Python bugfix releases 2.6.8, 2.7.3, 3.1.5 and 3.2.3 include a change that will cause "import random" to fail with "cannot import name urandom" on any virtualenv created on a Unix host with an earlier release of Python 2.6/2.7/3.1/3.2, if the underlying system Python is upgraded.

When I created that virtualenv environment, the server was running Python 2.6. But when I checked yesterday, the Python version was 2.7. So apparently ICDSoft upgraded their servers at some point. No big deal - just recreated the environment and I was good to go!

Going WYSIWYG

I must be getting old or something.  I finally went and did it - I implemented a WYSIWYG post editor for LnBlog (that's the software that runs the blog you're reading right now).

I've been holding out on doing that for years.  Well, for the most part.  At one point I did implement two different WYSIWYG plugins, but I never actually used them myself.  They were just sort of there for anybody else who might be interested in running LnBlog.  I, on the other hand, maintained my markup purity by writing posts in a plain textarea using either my own bastardized version of BBCode or good, old-fashioned HTML.  That way I could be sure that the markup in my blog was valid and semantically correct and all was well in the world.

The LnBlog post editor using the new TinyMCE plugin.

If that sounds a little naive, I should probably mention that I came to that conclusion some time in 2005.  I had only been doing web development for a few months and only on a handful of one-man projects.  So I really didn't know what I was talking about.

Now it's 2014.  I've been doing end-to-end LAMP development as my full-time, "I get paid for this shit" job for almost seven years.  I've worked for a couple of very old and very large UGC sites.  I now have a totally different appreciation for just how difficult it is to maintain good markup and for where it does - and should - rank on the priority scale.

In other words, I just don't care anymore.

Don't get me wrong - I certainly try not to write bad markup when I can avoid it.  I still wince at carelessly unterminated tags, or multiple uses of the same ID attribute on the same page.  But if the markup is generally clean, that's good enough for me these days.  I don't get all verklempt if it doesn't validate and I'm not especially concerned if it isn't strictly semantic.

I mean, let's face it - writing good markup is hard enough when you're just building a static page.  But if you're talking about user-generated content, forget it.  Trying to enforce correct markup while giving the user sufficient flexibility and keeping the interface user-friendly is just more trouble than it's worth.  You inevitably end up just recreating HTML, but with an attempt at idiot-proofing that ends up limiting the user's flexibility in an unacceptable way.  And since all the user really cares about is what a post looks like in the browser, you end up either needing an option to fall back to raw HTML for those edge cases your idiot-proof system can't handle, which completely defeats the point of building it in the first place, or just having to tell the user, "Sorry, I can't let you do that."

"But Pete," you might argue, "you're a web developer.  You know how to write valid, semantic HTML.  So that argument doesn't really apply here."  And you'd be right.  Except there's one other issue - writing HTML is a pain in the butt when you're trying to write English.  That is, when I'm writing a blog post, I want to be concentrating on the content or the post, not the markup.  In fact, I don't really want to think about the markup at all if I can help it.  It's just a distraction from the real task at hand.

Hence the idea to add a WYSIWYG editor.  My bastardized BBCode implementation was a bit of a pain, I didn't want to fix it (because all BBCode implementations are a pain to use), and I didn't want to write straight HTML.  So my solution was simply to resurrect my old TinyMCE plugin and update it for the latest version.  Turned out to be pretty easy, too.  TinyMCE even has a public CDN now, so I didn't even have to host my own copy.

So there you have it - another blow struck against tech purity.  And you know what?  I'm happier for it.  I've found that "purity" in software tends not to be a helpful concept.  As often as not, it seems to be a cause or excuse for not actually accomplishing anything.  These days I tend to lean toward the side of "actually getting shit done."

Cloud Drive can't share

Like many geeks, I have an Amazon Prime subscription. In addition to free shipping on many orders (which may well pay for the subscription by itself), Prime comes with a number of other features, such as streaming of selected Amazon Instant Video titles (which also paid for my subscription the last two years by having all of the Stargate series), and 5GB of free storage on Amazon Cloud Drive, which will be my topic for today.

Now, I actually kinda like some aspects of Cloud Drive. It's not terribly expensive - it's $0.50/GB per year and they offer six tiers from 20GB to 1TB. It integrates with my Kindle Fire HD and they have nice, unobtrusive apps for Windows, Mac, and Android that keep your files in sync. The Android app even has a nice little option to automatically sync any pictures you take to your cloud drive. So it works pretty well for me.

The one problem with Cloud Drive is sharing. Normally, that's not something I care a great deal about. However, there's that photo uploading feature of the Android app that I mentioned. And let's face it - if you have a smart phone, then it is your camera. And if your camera is automatically uploading pictures to your Cloud Drive, well then it makes perfect sense that you would share the pictures from your Cloud Drive when you want to send them to your friends and family.

Except you can't.

Well, to be fair, you actually can. You just won't want to because it sucks too much.

Maybe it would be easier to actually show you the problem. First, here's how you share a file in Cloud Drive:
One file selected, ready to share!
Nice and simple - you select a file and click the "share" item in the menu. Self-explanatory, isn't it? That'll give you a dialog that looks like this:
The Cloud Drive share dialog
It's got a nice preview and the share URL and everything. Sweet!

Now, here's another picture. See if you can spot what's different.
Two files selected, sharing disabled!
Did you catch them both? One was that there were two files selected this time instead of one. The other one is that the SHARE ITEM IS FREAKING DISABLED!

So I can't share multiple files? Well, that's OK - I'll just move them both into a folder and share the entire folder. Except that I can't because sharing is FREAKING DISABLED ON FOLDERS!

So what's the upshot of this? Well, if I want to share all those nice photos that my phone has so helpfully auto-uploaded, I have to do it one at a friggin' time. That means my workflow looks like this:

  1. Select a photo in Cloud Drive.
  2. Click the "share" menu item.
  3. Copy the share URL from the dialog.
  4. Paste that URL into an e-mail or something.
  5. Dismiss the dialog.
  6. Return to step 1 and repeat for the next item.

Sure, for a handful of items, this isn't a big deal. But what happens when you have a couple dozen photos you want to share? Does Amazon seriously think I'm going to sit there and painstakingly share each individual file, noting down the share URLs? Not a chance in hell. I'm going to take advantage of the fact that the Cloud Drive desktop app syncs them to my PC to bypass Amazon and just upload my pictures to another site with non-crappy sharing features. (I've been using Sta.sh, mainly because I'm familiar with it, what with having spent nearly two years helping to build it when I worked for deviantART.)
A public folder in Sta.sh.  We've had those for over two years now.
The thing that really bothers me is that this isn't a new problem. It's not like Amazon just rolled out the sharing feature and it's going to be improved very soon. If this were just a temporary stop-gap then I could forgive it. But it's been like this for a while. I don't remember when I first noticed this, but I think it's been a year or more.

And the worst part is that this is just such insultingly bad product design. Seriously, has nobody at Amazon even tried to use this freaking thing? Is the product manager just so oblivious that it never occurred to him that maybe people might want to share more than two pictures at a time? And did nobody above or below him bother to question it? Or is it just a case of trying to shut up some of their more vocal customers who demanded this by giving them a crappy, mostly-broken version of what they asked for in the hopes that they'll go away?

There are loads of other ways to do this that would solve the problem. And I get that there may be some non-obvious technical limitations to some of the possible solutions. I've seen such issues myself when I was working on Sta.sh. For instance, maybe there are back-end issues that make sharing folders actually much harder to implement than it would seem. That's fair.

But even so, is there really any excuse for this? After all, Cloud Drive is all AJAX-based, so is there any reason why they couldn't have done the UI in a way that would allow multi-sharing? I know I certainly can't think of one. It wouldn't even be hard. All you'd need to do is enable the "share" menu for multiple selections, have it fire off one AJAX request per item to get the share URLs, and then aggregate them in a text area so that you can copy and paste them en masse. I'm not saying that would be ideal, but it would be better than what they have now. Plus it would be easy. Heck, I even think I could implement that in a Greasemonkey script if I really wanted to - I looked at the AJAX requests and they're not complicated.

So seriously, Amazon, get your act together. I don't know who signed off on this half-assed feature, but they need a smack in the head. It's almost 2014 - this crap doesn't fly anymore.

Setting up a remote Mercurial repository

Note: This post is mostly a note to myself. I don't do this often and I always forget a step when I need to do it again.

I have Mercurial set up on my hosting provider. I'm using hgweb.cgi and it works well enough. However, Mercurial does not seem to support pushing new repositories remotely when using this configuration. That is, you can't just run an hg push on the client or anything like that - you need to do some manual setup.

The steps to do this are as follows:

1) On the client, run your hg init to create a new repository.
2) Copy this directory to your server in the appropriate location.
3) On the server, edit your new_repo_dir/.hg/hgrc to something like this (if you have one that you copied from step 1, just nuke it):

[web]
allow_push = youruser
description = Some description
name = your_repo_dir

4) Add a line like this to your hgweb.config:
your_repo_dir = /path/to/repo/your_repo_dir

Assuming that the rest of the server is already set up, that should do it. I always keep forgetting one of the last two steps, for some reason (probably because I only do this once in a while).

How the web changes

I spent some time looking through some old links the other day. I imported all my bookmarks into Lnto (which I really need to release one of these days) and I was browsing through some of the ones that I've had hanging around forever. Some of them date back to when I was in college.

Turns out quite a few of them were dead. Some of these were not unexpected. There were a few links to cjb.net, members.tripod.com, and suchlike sites that are now defunct. There were also several links to university web pages, many presumably belonging to students who have long since graduated.

Several of them were also domains that had changed hands. Most were parked and covered with ads. One was an anime fan site that now redirects to the official site of the distributor.

The most interesting one was a Final Fantasy fan site that is now an "escort service" site. Out of curiosity, I looked the site up in the Wayback Machine and found that this is actually a fairly recent development. Apparently the fan site was in existence until 2009. In 2010, the archived copies are just mostly empty directory listings. These continue into 2011, and then there's one copy that appears to be a broken and/or spammy blog. There are no archived pages from 2012, and then in 2013, there's a GoDaddy parked domain page in June, followed by the escort service site in July.
archive-skepticats.png
It's strange how the web works. Despite the talk about how digital content lasts forever and how it's virtually impossible to completely delete anything you put online, the truth is that content on the web is surprisingly ephemeral. Sites regularly disappear with no explanation; content gets modified with no indication whatsoever to readers; sites get reorganized, breaking every external link and just redirecting them to the front page. It's a wonder people manage to find anything at all!

This has been on my mind anyway, since I've been meaning to get back to refactoring LnBlog (which is a topic for another post). As part of that, I was going to work on a nicer URL structure. That piece is easy, but I'm committed to keeping all the old links valid. That's less easy, but not unmanageable. (It's actually further complicated by the fact that I'm considering moving off of subdomains.)

The thing is, I've owned this domain for nearly ten years and URLs are something I never really put a great deal of thought into. But it seems obvious that I need to start thinking seriously about the best way to manage them. I want the content on my site to have true permalinks - I want the college kids who bookmark a blog entry today to still be able to visit that link when their kids are in college.

This will require some planning and future-proofing. And I'm not just talking about the URLs themselves - those are the easy part - but about conventions for different types of content, what constitutes "permanent" content, and how I'm going to maintain all this stuff across potentially many changes in hosting and underlying technology. If I'm going to have this site until I die (and that is the plan), I'm eventually going to have an awful lot of content, and it would pay to have a plan for how to deal with that.

Opera 15 is out - I'm switching to Chrome

Well, the new version of Opera is now in stable release - Opera 15! This is the first version based on Webkit instead of Presto, Opera's in-house rendering engine. After using it for a week or so on OSX, I have to say that I'm both pleased and disappointed.

I'm pleased because, let's face it, Webkit is pretty great. Especially if you're a web developer. It's fast, supports most of the emerging standards, and has really great development tools. And while I've been an Opera fan for a while, the old JavaScript and HTML rendering engines have been showing their age for some time. They're slow by comparison to Webkit-based browsers and have a lot of quirks. So I'm very happy that I can now go to Sta.sh in Opera 15 and actually use it - in Opera 12 it's almost unusably slow.

I'm also pleased about the appearance. The new Opera is definitely pretty. The old version wasn't bad, but you can tell that they've had designers putting some time in on the new version. It looks very smooth and polished.
The new Opera Discover feature

I'm less pleased about everything else. You see, Opera 15 introduces some fairly major UI changes. And by "major UI changes", I mean they've basically scrapped the old UI and started over from scratch. Many of the familiar UI elements have been removed. Here's a quick run-down of the ones I noticed:

  • The status bar is gone.
  • Ditto for the side panel.
  • The bookmark bar is also gone.
  • In fact, bookmarks are gone altogether.
  • Opera Mail has been moved to a separate application (not that I miss it).
  • The RSS reader has disappeared (maybe it went with Opera Mail).
  • Opera Unite is MIA.
  • Tab stacking has disappeared.
  • Page preview in tabs and on tab-hover are gone.
  • Numeric shortcuts for Speed Dial items have been removed (i.e. they took away what made it "Speed Dial" in the first place).
  • The content blocker is gone (which is just as well - it was kind of half-baked anyway).


And that's just the things I noticed. But the worst part is that those UI features were the only reason I still used Opera. I feel like I'm no longer welcome in Opera land. They've literally removed everything from the browser that I cared about.

And what have they given me in return? Well, Webkit, which is no small thing. But if that's all I wanted, I'd just use Chrome or Safari. It's nice, but it doesn't distinguish Opera from the competition.

So what else is there? Well, Speed Dial has gotten a facelift. In fact, Speed Dial has sort of turned into the Opera 15 version of bookmarks. You can now create folders full of items in your Speed Dial and there's a tool to import your old bookmarks as Speed Dial items. I guess that's nice, but I'm not seeing where the "speed" comes in. It seems like they've just changed bookmarks to take up more screen real estate and be more cumbersome to browse through.

They've also introduced something called Stash, which, as far as I can tell, is also just another version of bookmarks. But instead of tile previews like Speed Dial, it uses really big previews and stacks them vertically. They're billing it as a "read later" feature, but as far as I can tell it's functionally equivalent to a dedicated bookmark folder. I guess that's nice, but I don't really get the point.

And, last and least, there's the new Discover feature. This is basically a whole listing of news items, right in your browser. Yeah, 'cause there aren't already 42,000 tools or services that do that. One that's directly integrated into the browser is just the killer feature to capture the key demographic of people who like to read random news stories and are too stupid to use one of the gajillion other established ways of getting them. Brilliant play, Opera!

Now, I'll grant you - visually, the new Speed Dial, Stash, and even Discover look fantastic. They're very pretty and make for really nice screenshots. However, I just don't see the point. I can imagine some people liking them, but they're just not new or innovative - they've just re-invented the wheel in a way that's more convoluted, but not visibly better.
Opera's new Stash feature

Overall, I get the feeling that Opera 15 was designed by a graphic designer. Not a UI designer, but a graphic designer. In other words, it was built to be pretty rather than functional. I know I've sometimes had that experience when working with a graphic designer to create a UI - you get mock-ups that look beautiful, but were clearly created with little or no thought to their actual functionality. So you end up with workflows that don't make much sense, or new UI elements with undefined behavior, or some little control that's just "thrown in" but represents new behavior that is non-trivial to implement.

Honestly, at this point I think it's just time to switch to Chrome. I already use it for all my development work anyway, so I might as well use it for my personal stuff too. I had a good run with Opera, but I just don't think the new version is going to meet my needs. Maybe I'll take another look when version 16 comes out.

Site redesign

This weekend I did something I've been meaning to do for a while: I redesigned my website. In fact, unless you're reading this in your RSS aggregator, you probably already noticed. It was about time, too - I'd had the same fractured, dated design for years. There are probably still some kinks to work out, but for the most part I think it looks much cleaner.

This time I decided to do a real site design. As in, not only did I update this blog, but also the other sections of this site and the landing page as well. In fact, this started as a re-do of my landing page and I ended up abstracting that design out and making it a theme for LnBlog. Then I just set this and my other LnBlog blogs (you know, the ones in the header that nobody reads) to use it. Update accomplished!

I figured that, what with being an experienced front-end web developer, it would probably be a good idea to make my site look decent. Lends to the credibility and all. The randomly differing styles of the old pages looked kinda crappy, and the sharp edges, weird colors, and embellishments were not so great. Of course, I'm no graphic designer, but I think this looks a bit cleaner. And at the very least, it's consistent.

Better IE testing

It seems that Microsoft has decided to be a bit nicer about testing old versions of Internet Explorer. I just found out about modern.IE, which is an official Microsoft site with various tools for testing your web pages in IE.

The really nice thing about this is that they now provide virtual machines for multiple platforms. That means that you can now get your IE VM as a VMware image, or a VirtualBox image, or one of a few other options.

When I was using Microsoft's IE testing VMs a couple of years ago, they were only offered in one format - Virtual PC. Sure, if you were running Windows and using Virtual PC, that was great. For everyone else, it was anywhere from a pain in the butt to completely useless. This is a much better arrangement and a welcome change. Nice work, Microsoft!

Switching to Tiny Tiny RSS

With the imminent demise of Google Reader, and FeedDemon consequently being end-of-lifed, my next task was clear: find a new RSS aggregator. This was not something I was looking forward to. However, as things turned out, I actually got to solve the problem in more or less the way I'd wanted to for years.

The thing is, I never really liked Google Reader. The only reason I started using it was because I liked FeedDemon and FeedDemon used Google Reader as its back-end sync platform. (And if you've ever tried to use a desktop aggregator across multiple systems, you know that not being able to sync your read items across devices is just painful.) But I seldom used the Reader web app - I didn't think it was especially well done and I always regarded the "social" features as nothing but a waste of screen space. Granted, the "accessible anywhere" aspect of it was nice on occasion, but my overall impression was that the only reason it was so popular was because it was produced by Google.

The other issue with Reader is that I don't trust Google or hosted web services in general. Paranoia? Perhaps. But they remind me of the saying that "If you're not paying for the product, then you are the product." And while I know a lot of people aren't bothered by that, I think Google already knows far too much about me without my handing them more information on a silver platter. Furthermore, you can't rely on such services to always be available. Sure, Google is huge and has ridiculous amounts of money, but even the richest company has finite resources. And if a product isn't generating enough revenue, then the producer will eventually kill it, as evidenced by the case of Reader.

tt-rss-thumb.png
What I'd really wanted to do for some time was to host my own RSS syncing service. Of course, there's no standard API for RSS syncing, so my favorite desktop clients wouldn't have worked with it. But with FeedDemon going away as well, and having no desktop replacement lined up, I no longer had to worry about that. So I decided to take a chance on a self-hosted web app and gave Tiny Tiny RSS a try. I was very pleasantly surprised.

The installation for TT-RSS is pretty simple. I use a shared hosting account, and though the documentation says that isn't supported, it actually works just fine. The install process for my host consisted of:

  1. Copy the files to my host.
  2. Create the database using my host's tool.
  3. Import the database schema using phpMyAdmin.
  4. Edit the config.php file to set the database connection information and a couple of other settings.
  5. Use my host's tool to create a cron job to run the feed update script.
  6. Log in to the administrator account and change the default password.
  7. Create a new account for myself.
  8. Import the OPML file that I exported from FeedDemon.

That's it. Note that half of those steps were in the TT-RSS UI. So the installation was pretty much dead-simple.

In the past, I wasn't a fan of web-based RSS readers. However, I have to say that Tiny Tiny RSS actually has a very nice UI. It's a traditional three-pane layout, much as you would find in a desktop app. It's all AJAX driven and works very much like a desktop client. It even has a rich set of keyboard shortcuts and contextual right-click menus.

mobile-thumb.png
As an added bonus, there's also a pretty nice mobile site. While the latest release (1.7.5) actually removed the built-in mobile web site, there's a very nice third-party JavaScript client available. It uses the same API as the mobile clients, so the installation pretty much consists of enabling the API, copying the files to your host, and editing two settings in the config file to tell it the path to itself and to TT-RSS.

But who cares about the mobile site anyway? There are native Android clients! The official client is available as trial-ware in the Google Play store. And while it's good, I use a fork of it which is available for free through F-Droid. In addition to being free (as in both beer and speech), it has a few extra features which are nice. And while I may be a bit cheap, my main motivation for using the fork was not the price, but rather the fact that the official client isn't in the Amazon app store and I don't want to root my Kindle Fire HD. This was a big deal for me, as I've found that lately my RSS-reading habits have changed - rather than using a desktop client, I've been spending most of my RSS reading time in the Google Reader app on my Kindle. The TT-RSS app isn't quite as good as the Reader app, but it's still very good and more than adequate for my needs.

Overall, I'd definitely recommend Tiny Tiny RSS to anyone in need of an RSS reader. The main selling point for me was the self-hosted nature of it, but it's a very strong contender in any evaluation simply on its own merits. In my opinion, it's better than Google Reader and is competitive with NewsBlur, which I also looked at.

A VZW WTF

I've been looking into new cell phones recently. My Verizon Wireless contract is up next month, so I've been doing my homework, trying to figure out what model phone I should get this cycle.

Like any good computer geek, I've been doing my research online. So while I was browsing the Verizon Wireless site to see what was available, I came across this lovely little popup:
Due to inactivity, your session will end in approximately 5 minutes.  Extend your session by clicking "OK" below

Seriously, Verizon? Asking the user to manually "extend" his session? What the heck?!? Is this some kind of throwback? Is this the Web 2.0 version of a pay phone - you just click a button instead of putting a quarter in the slot? I mean, REALLY?!?

Aside from the mind-bendingly horrible design, there are a few other things that bother me about this dialog.

First, what does it even mean? It's not like I was logged in or anything, so what possible data could this "session" have that I even care if it expires?

Second, why is this an alert box and not a confirmation box? The only option is to click "OK". Well, what if I don't want to extend my session? What do I do then? And if not extending the session isn't an option, why are they even bothering to ask?

Third, why is this even here at all? Speaking as a professional web developer, I can't think of a single technological reason why they would have to do this. It just isn't necessary. There are other, less obtrusive ways to persist data that don't involve issuing prompts. Is this really nothing more than a lame attempt to keep the user engaged, whether he wants to be or not? "Look at me! Look at me! You have to click me! Hey, pay attention!"

Sigh.... Just another reason to hate Verizon. If only it wasn't for their coverage area and free in-network calling, I'd jump ship.

On the up side, at least I got some useful phone information. I was originally leaning toward those new touch-screen iPhone knock-offs, like the Samsung Glyde or LG Voyager. However, we just got back from a vacation in Hawaii, so I'm trying to keep the expense down. And even after the contract pricing and rebates, the touch-screen phones are pretty much all $150 or more, and I really don't want to drop $300+ on a pair of new phones right now. Plus, apart from the inherent coolness of the touch screen, I'm not really sure what those phones would do for me that anything else with a QWERTY keyboard couldn't.

So, I'm thinking maybe a basic smartphone. After contract pricing and rebate, Verizon has the Motorola MOTO Q 9c and Palm Centro for $100. Both of them seem nice and have document viewing and other features that would actually be quite handy for me. There are still things to debate, though - the Centro has a touch screen, but the MOTO Q has Windows Mobile. I'm going to have to do a little more reading.

The world's most accurate Twitter account

Thanks to Carl and Richard from DNR for pointing out the best Twitter stream ever. They mentioned it on show 379. You can find it at: http://twitter.com/thisispointless.

There are 3 things I like about this stream. First, it's actually kind of funny. Second, I think the username pretty well sums up everything about Twitter. And third, I just love the open mockery when there are people out there who are actually trying to communicate over Twitter. Because apparently nothing worth saying could possibly take more than 140 characters.

I'm sorry, that was a cheap shot. I was misrepresenting the true purpose of Twitter - to send pointless messages that people really shouldn't care about.

John Lam rocks

So this week, .NET Rocks had an interview with John Lam of IronRuby fame. It was a good show, but for me, the single best part was a comment John made in the last 5 minutes of the show. He said:


Rails is the Visual Foxpro of the 21st century.


I just thought that was great. It really appealed to the contrarian in me. With people touting Ruby on Rails as the future of web development, it's nice to see it compared to the has-been languages of yesterday.

Of course, I'm taking that quote out of context. John wasn't actually trying to put down Rails, but was simply making the point that it's a tool that appeals to pragmatists, in that it allows you to quickly and easily create simple applications that interact with databases. Which is exactly what Visual FoxPro did too.

For the record, I'm not a Ruby on Rails fan. I don't dislike it, though - I haven't used it enough to form a strong opinion. However, the few times I've played with it, I found it nice, but not compelling. I didn't hate it, but I didn't like it enough to put in the effort of learning both Rails and Ruby. Though, truth be told, from what I've seen, I like Ruby a lot better than I like Rails.

Likewise, I am not a fan of Visual FoxPro either. Of course, I've never actually used Visual FoxPro, but at my last job I did have to maintain and rewrite some old FoxPro 2.6 (for DOS) applications. Maybe my opinion was influenced by the apps I was working on (they were written by a clerk who "knew FoxPro" and was trying to help out), but I found it to have all the elegance and sophistication of VBA in Microsoft Access. And for those who aren't familiar with Access VBA, it has all the grace and subtlety of a sledgehammer.

The thing is, FoxPro actually worked pretty well. Granted, it was ugly and promoted antipatterns, but it was pretty easy to create a simple desktop database application. I don't know if it was simpler than Rails, but it was probably in the same league. Same thing with Visual Basic - just drag and drop a few controls on a form and, voila! Working database app!

I think that's one of the things that turns me off about Rails a bit - the examples and hype around it smack of VB6 demos from the 1990's. Whenever I see a demo that says something like "Create <impressive sounding thing> in 15 minutes," I'm automatically skeptical. I'm skeptical because I've seen the same thing done in VB6. It's the same reason I get turned off whenever I see someone extol the virtues of programming language X over language Y by pointing to how much shorter a "Hello, World" program is in X, or some other such inane metric. It's just not a useful or meaningful comparison.

Getting something up and running fast is all well and good. Nobody wants to spend 3 months on infrastructure just to spend a week building the actual application. But the name of the game is maintainability. It doesn't matter how fast you get up and running if you have to go back and tear out half the application when your requirements change. Likewise, your productivity gains start to evaporate when you need to spend 3 days coming up with a work-around because your environment forces you to do things in a particular way that doesn't really work in your case.

Bottom line: there's always a catch. And if you think there isn't a catch, then you just don't have enough experience to know what the catch is yet. No technology is perfect, whether it's ActiveRecord, data-bound controls, or whatever other new miracle library came up in your RSS reader today. There's always something that will get you if you're not careful. The trick is to know what it is before it's too late to account for it.

No, that spam didn't come from me

I found a nice surprise in my e-mail the other morning: about 200 new messages. All of them bounce notifications from undeliverable spam I didn't send.

Apparently some sleaze-bag decided to forge my e-mail address as the sender on a batch of spam. That sucks. Unfortunately, there's not really anything I can do about it. Forging e-mail headers is ridiculously easy and there isn't really any way to keep someone from using your address. So I guess it just sucks to be me.

As to the nature of the spam, I noticed a number of commonalities from the bounce messages. First, all the bounced messages I checked had the same mailer header:
X-Mailer: Microsoft Outlook Express 6.00.2900.3138
Second, I noticed two big trends in the actual content. Lots of them were targeted at Russians and eastern Europeans, as evidenced by the target domains and the use of the Cyrillic alphabet, and there were also a bunch that simply contained links to what appeared to be a Google ad for a German taco restaurant. As for the originating IP addresses, most of the ones I checked were from eastern Europe, but I also saw IPs from China, Turkey, Germany, and several from the US.

Here's a representative example of one of the German taco link messages, as encapsulated in a bounce message.
Received: (qmail 74968 invoked from network); 31 Mar 2008 10:24:57 -0000
Received: from unknown (66.218.66.72)
by m54.grp.scd.yahoo.com with QMQP; 31 Mar 2008 10:24:57 -0000
Received: from unknown (HELO 84.255.241.179) (84.255.241.179)
by mta14.grp.scd.yahoo.com with SMTP; 31 Mar 2008 10:24:56 -0000
Message-ID: <000501c89319$05ef79a0$e3e718b8@kdjpt>
From: "dom carey" <pageer@skepticats.com>
To: <XXXXXX@yahoogroups.com>
Subject: Your neighbour naked!! watch
Date: Mon, 31 Mar 2008 08:37:48 +0000
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_0002_01C89319.05ED4C2A"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3138
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198
X-eGroups-Remote-IP: 84.255.241.XXX
X-UID: 1472

This is a multi-part message in MIME format.

------=_NextPart_000_0002_01C89319.05ED4C2A
Content-Type: text/plain;
   charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

eGjMPUJeuz
Download and WatchuogLVeGjMPU
------=_NextPart_000_0002_01C89319.05ED4C2A
Content-Type: text/html;
   charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Diso-8859-1">
<META content=3D"MSHTML 6.00.2900.3199" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<center><style>eGjMPUJeuz</style><br><a =
href=3D"http://www.google.com/pagead/iclk?sa=3Dl&ai=3DfFfwMk&num=3D72011&=
adurl=3Dhttp://www.taco-loco.de/video.exe">Download and =
Watch</a><style>uogLVeGjMPU</style> </center></BODY></HTML>
------=_NextPart_000_0002_01C89319.05ED4C2A--

Anyone care to share an opinion? Looks like your typical botnet spam wave to me. The only thing that makes it interesting is that I happened to end up getting a glimpse of where it came from.

MIME type trivia

Random bit of trivia: apparently the correct MIME type for JavaScript files is application/x-javascript.

We actually discovered this by accident at work this morning. Edwin was trying to figure out why Apache wasn't compressing a new JS file he'd added to the site and discovered that the file was being served as application/x-javascript instead of the expected text/javascript. This raised the question: "What the hell is wrong with Apache?" A little Googling then revealed the answer: "Oh, there's nothing wrong with Apache. It's just us."

So far, all the general MIME type references I've checked list JavaScript as application/x-javascript. Who knew? I've been using text/javascript for years. So has Edwin. Heck, it's even in the examples in the HTML 4.01 spec. I just figured that was the "official" type. Apparently I was wrong. Guess I'll have to have Lumberg send me another copy of that memo.

Account for cache

Here's another interesting debugging tid-bit for you. Assuming, that is, that anyone but me cares about such things.

This past Friday, we rolled out a new feature of the site. Without giving away any identifying details, it involved a change to the way data related to user activity is tracked in one of my modules. In the past, we tracked this "user activity data" (UAD) via a batch calculation. Every night we ran a cron job that analyzed various database tables and calculated some results that we then stored in the database. It was a fairly intensive calculation that touched a lot of tables, so it didn't make sense to do it in real time.

For this new site feature, I had to switch the module over to an event-driven approach. That is, rather than the batch calculation, we're now keeping a "running tally" of the UAD based on user actions. This means that the users can now see this information change in real time, which is good, but it also means it's now hard/impossible to determine exactly how a particular user arrived at their current UAD.

So here's the catch: for reasons I won't get into, I need to track the per-day UAD change. I did this by simply adding a "daily change" column in the database that I update along with the UAD score. Then, every night, we run a cron job to zero-out that column in the database. The SQL query it runs is simply:
update uad_table set daily_change = 0;
For non-coders, that just sets the daily_change to 0 for every row, i.e. every user, in the database. Nice and simple. For a query like that, there isn't much testing you can do - it either works or it doesn't.

With that in mind, I was quite surprised this morning to learn that this simple query was apparently not working for all users. While the daily change column was indeed cleared for some users, it was not cleared for others. But, for the SQL above, that is absolutely impossible. The only way that could happen is if MySQL was badly, badly broken.

But, as with all things in software, the devil is in the details. I failed to account for two things in this equation: my database update code and caching.

First, my update code. The relevant portion of my PHP update code looked something like this:
$this->uad += $adtl_uad;
$this->daily_change += $adtl_uad;
$this->save();

In this, the save() method writes the object properties to the database. This is pretty standard stuff. The problem is: what happens if the database row changes between the time the object fetches it and the time it writes the updates? The answer: the change will be clobbered and the object will write bad data. So if the user is doing something that changes his UAD and if things are timed just right, we could end up in a situation where this code reads a "daily change" value of 500, our cron job updates the database to make it zero, and then the update code writes back a "daily change" value of 500 plus something, thus clobbering our update. However, this alone could not be the problem because the time between reading the data and writing it is so small - way less than a second. The timing would have to be just right, and the odds of that happening often enough to account for the number of un-updated users are just too small.

No, the real problem was when we combined this update code with caching. You see, we use memcached to cache the results of database queries in order to take some of the load off of our database servers and increase performance. I was using memcached for the UAD retrieval, so whenever I retrieved a user's UAD from the database, I would first check if we had a memcached copy of it. If so, I'd use that and bypass the database altogether. If not, then I'd query the database and save the results in memcached. When I wrote the updated UAD back to the database, I would clear the memcached data for that record so that it would be refreshed on the next access.
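
To make that read path concrete, here's a minimal sketch of the pattern (the function and key names are made up for illustration - this isn't our actual code), using PDO and the PECL Memcached extension:

// Hypothetical sketch of the cache-aside read path described above:
// check memcached first, fall back to the database, and prime the cache on a miss.
function get_uad(Memcached $cache, PDO $db, $uid) {
    $key = "uad:$uid";
    $row = $cache->get($key);
    if ($row === false) {  // cache miss
        $stmt = $db->prepare('select uad, daily_change from uad_table where uid = ?');
        $stmt->execute(array($uid));
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        $cache->set($key, $row, 300);  // keep the cached copy for a few minutes
    }
    return $row;
}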

If you're sharp, you may have picked up on the problem now. Where does the cron job clear memcached? It doesn't. So what happens if I run the cron job and then someone accesses a user UAD record for which there is still memcached data? Well, the PHP script is going to get the cached data that still has a non-zero "daily change" value, update that value, and write it back to the database, effectively canceling out the column zero-out for that record. Doh!

My solution was simply to change the PHP method that updates the UAD. Rather than calculating the UAD in code and writing the result back to the database, I now just let the database do the calculation. The parameterized SQL looks something like this:
update uad_table set uad = uad + :adtl_uad, daily_change = daily_change + :adtl_uad where uid = :uid
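
Put together, the updated PHP method ends up looking roughly like this (a sketch with hypothetical names - the real code lives in a larger class):

// Rough sketch: the increment now happens in SQL, so a stale in-memory or cached
// value can no longer clobber the cron job's nightly reset.
function add_uad(PDO $db, Memcached $cache, $uid, $adtl_uad) {
    $stmt = $db->prepare(
        'update uad_table set uad = uad + ?, daily_change = daily_change + ? where uid = ?'
    );
    $stmt->execute(array($adtl_uad, $adtl_uad, $uid));
    $cache->delete("uad:$uid");  // force a fresh read on the next access
}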

The moral of the story is two-fold. First, scale changes everything. On a smaller site that didn't need memcached, the race condition in the update code probably never would have been an issue.

Second, keep your database access consistent. The reason this problem popped up is because I was handling data through cache-aware domain objects in the web application, but going straight to the database in the cache-ignorant cron job. Ideally, I would have used the same update method in both the cron job and the web application. The only problem with that is that running one bulk update query in the cron job is a lot faster than iterating over 400,000 rows of user records to run a PHP method that has the same effect. Which is yet another example of how little in software is ever as simple as it seems.

Remember the magic properties

You know what's a good way to feel really beaten-down at the end of the day? Spend 2 hours debugging a 1-line problem.

That's just what happened to me the other day. And, while I place some of the blame on PHP in order to make myself feel better, it was actually due to my own forgetfulness and tunnel-vision. If I had just taken a minute to step back and consider the possibilities, I probably would have had it figured out much faster.

Since last week, I have been working on the data access code for our site. Basically, we're going from using ad hoc SQL queries scattered throughout the codebase to a model-based design, where we have an abstract data model class which handles database access and then classes that inherit from that. You know, sort of an ActiveRecord pattern, a la Ruby on Rails or CakePHP.

Anyway, things were going along quite nicely. That kind of code conversion is always a little tedious, but I hadn't hit any major road-blocks and all my unit tests were passing.

Well, Tuesday afternoon, I hit a snag. I'm not sure at what point it happened, but I ran my unit tests, and suddenly I was seeing red. Lots of red. Everything was failing!

Well, I immediately went into homing mode. I started taking out tests and trying to isolate a single failure. It turned out that the first failure was in the test case for my class constructor. A little more debugging revealed the problem - none of the class members were being populated from the database. And yet, there was nothing in the error logs and the connection to the database wasn't failing.

Now, the way our (custom) abstract model class works is that it reads the table layout from the database, stores that in the object, uses it to determine the primary key of the table, and then retrieves a row based on the primary key value you pass. What was happening was that the constructor was running, getting the table data from the database (which is how I know the connection was functioning), and running the code to retrieve the row, but the values of the class members were never set.
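
To give a rough idea of the shape of that logic, here's an illustrative sketch (the names and details are invented, not our actual class, and error handling is omitted):

// Illustrative sketch: read the table layout, find the primary key, then load the row.
class Model {
    public $columns = array();  // the table layout - a field declared in the class itself

    public function __construct(PDO $db, $table, $key_value) {
        $primary_key = null;
        // MySQL-specific: DESCRIBE returns one row per column, including key info.
        foreach ($db->query("DESCRIBE `$table`") as $column) {
            $this->columns[] = $column['Field'];  // store the table layout in the object
            if ($column['Key'] === 'PRI') {
                $primary_key = $column['Field'];
            }
        }
        // Retrieve the row matching the primary key value that was passed in.
        $stmt = $db->prepare("select * from `$table` where `$primary_key` = ?");
        $stmt->execute(array($key_value));
        foreach ($stmt->fetch(PDO::FETCH_ASSOC) as $field => $value) {
            $this->$field = $value;  // each column becomes an object property
        }
    }
}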

Well, needless to say, I was baffled. Before too long, I thought I had narrowed it down to the method that actually retrieves the database row. We're using PDO (PHP Data Objects) for database access in the model class, and I was using the feature of PDO whereby it will return a row from the database as, or into, an object. You can pass it an object and it will set each column as a field on that object. Alternatively, you can pass PDO a class name and it will return a new object of that class with the appropriate fields set from the database. You can also have it return a stdClass, which is PHP's generic object. I thought I was onto something when I discovered that fetching into the current object or fetching into a new object of the same class didn't work, but fetching a generic stdClass did. So, of course, I eventually decided to work around the problem by re-coding the method to use the stdClass and just populate my object from that. Imagine my surprise when, even after that, and after verifying that I was getting the row from the database, my unit tests still failed.
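
For reference, the three PDO fetch styles in question look roughly like this (assume $stmt is a prepared and executed PDOStatement; the class name is hypothetical):

// 1. Fetch into an existing object - each column becomes a property on that object:
$stmt->setFetchMode(PDO::FETCH_INTO, $this);
$stmt->fetch();

// 2. Fetch a new instance of a named class, with the columns set from the row:
$obj = $stmt->fetchObject('MyModel');

// 3. Fetch a generic stdClass - the only variant that appeared to work that day:
$obj = $stmt->fetch(PDO::FETCH_OBJ);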

Well, to cut a long and boring story short, the problem was the magic __set() method. This is one of the "magic" functions PHP lets you define on a class. This particular one takes a key and a value and is executed every time you try to assign a value to a class property that doesn't exist. Well, we had defined that in the child class, not the model class, to do an update on the database under certain conditions, and when I was re-writing it, I forgot to account for the default case - when we don't want to update the DB, just assign the value to the key. It was one line, $this->$key = $value, which I had put inside an "if" block instead of outside it. So every time my code tried to do something like $this->foo = 'bar', the statement ended up being a no-op. Same thing when PDO tried to populate the object. But populating the field with the table layout worked fine, since that field was set in the class definition rather than being created at run-time.
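
Here's a simplified illustration of the corrected version (shouldSyncToDb() and updateDbField() are made-up stand-ins for what the real child class did):

class ChildModel extends Model {
    public function __set($key, $value) {
        // Only certain properties trigger an immediate database update...
        if ($this->shouldSyncToDb($key)) {
            $this->updateDbField($key, $value);
        }
        // ...but every property still needs the plain assignment. This is the one
        // line that had accidentally ended up inside the "if" block, turning every
        // other dynamic property assignment into a silent no-op.
        $this->$key = $value;
    }
}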

So what's the moral of this story? Well, I guess there are several. First, tunnel-vision in debugging is a bad thing. I got hung up on the PDO object fetching and wasted lots of time trying to figure out why it wasn't working when the problem was actually someplace completely different.

The second lesson is that you need to stick to a process - in this case, I should have been doing real test-driven development. The reason this took so long is that I let far too much time elapse between the change to the __set() function and running my unit tests. If I had been doing real TDD and had run my tests after every change, I would have instantly been able to pinpoint the problem.

The last lesson is simply that software development is about managing complexity, and managing complexity is hard. Hacking away, designing as you go, relying on raw brain-power is all well and good, but it's not sustainable in the long run. When you run out of gas, or you're having a bad day, you need something to fall back on. This episode reminded me of that the hard way.

This seems to be a recurring theme for me. The longer I develop software, the more obvious it becomes that I need to improve my process - even when I'm working by myself. It's kind of funny, really. The more I learn about software development, the less I feel like I know what I'm doing. There's just so much I have left to learn, it seems a little overwhelming. I know that doesn't make me inadequate - in fact, they say wisdom is knowing what you don't know. But it can still be a bit...disquieting at times.

When null is not null

What follows is the saga of a bug that took about 2 hours to pinpoint and 5 minutes to fix. It reminds me of one reason I sometimes find dynamically typed languages annoying: implicit casting.

Here's the scenario: The manager tells me he noticed some weird output on one web page. Specifically, this one particular field, which should be empty, is showing the word "null." I own this field, so it's my job to fix it.

Naturally, I start by looking at the code. The site is written in PHP, so I figured that the field is null in the database and PHP is just printing out the word instead of the empty string, as I'd expect. But that's not it. There's actually already a check for null in the data retrieval code. So, just to check, I look at the relevant row in the database. And guess what - the field in question has the value "null." And just to be clear, I don't mean the field is null, I mean it contains "null" - as a string.

Now the plot thickens. This row is user-submitted content. This particular field has a default value of NULL (that's actual, database null) and users need special permissions in the application to add or edit it. The user in question does not have these. Another check of the database reveals that there are a number of similar records, submitted by users without the proper permissions, that have the string "null" in this field. However, it's not all of them, which means it's not related to the default field value. So something is weird.

Off the top of my head, I knew that there are only two places in the code where a user can set that field. One was the main submission form, the other was an AJAX-based update page. A quick check of the database revealed that the updated timestamps on the affected records were empty, so naturally I concentrated on the submission form. Of course, this turned out to be the wrong decision, as the submission form was correct. It turns out that the updated timestamps for user-submitted records never actually got updated.

So I checked out my other possibility - the AJAX update page. It turns out that the server-side code was correct. The null was actually coming from the JavaScript!

When I added the problem field to this page, I made one bad mistake. You see, since only privileged users can edit the field in question, I decided that the server-side code would simply not put it on the page for non-privileged users. That's fine. However, I failed to account for the dynamically-created edit boxes. You see, because this page is actually a listing page with an edit feature, we were actually dumping the database field values into DIVs and then replacing these DIVs with input boxes when the user clicked the "edit" link. We then took the values of those boxes and sent an HTTP request to the server.

Well, if you're paying attention, you may see what happened. I wasn't careful with my JavaScript. When I added the ability to edit this one field, I didn't check if it actually existed in the page. I just used our JavaScript framework to grab the value and then dump it into the POST data I was sending to the server. Well, if the DIV for that field didn't exist, then the value I got back was null. And when I put that in the POST data, it was type-cast to the string "null." So, when the server-side page got the data, it saw that string, not the empty object value, and dutifully stored it in the database.
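
The real fix was in the JavaScript - check that the element actually exists before grabbing its value - but a cheap defensive check on the server side would go something like this (a sketch, with a hypothetical field name):

// Hypothetical guard in the AJAX handler: JavaScript's null gets stringified to
// "null" on its way into the POST data, so treat it like a missing value instead
// of storing it verbatim.
$value = isset($_POST['restricted_field']) ? $_POST['restricted_field'] : null;
if ($value === null || $value === 'null' || $value === '') {
    // The field wasn't really submitted - leave the database column alone.
} else {
    // Update the column as usual.
}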

The moral of the story: when working in a weakly-typed language, you still have to be careful about your variable types. In fact, you have to be more careful. Because the compiler/interpreter won't help you find problems and you can't always count on implicit conversions being done the way you mean. Especially in JavaScript.

PHP IDE mini-review

Tomorrow marks my 2-month anniversary at my new job doing LAMP. And for most of that two months, I've been going back and forth on what editor or IDE to use.

My requirements for a PHP IDE are, I think, not unreasonable. In addition to syntax highlighting (which should be a given for any code editor), I need the following:

  1. Support for editing remote files over SSH. This is non-negotiable.
  2. A PHP parser, preferably with intellisense and code completion.
  3. A file tree browser that supports SSH.
  4. Syntax highlighting, and preferably parsers, for (X)HTML and JavaScript.
  5. Search and replace that supports regular expressions.
  6. Support for an ad hoc, per-file workflow. In other words, I don't want something that is extremely project-centric.
  7. It should be free - preferably as-in-speech, but I'll take as-in-beer if it's really good.

So far, my preferred IDE has been Quanta Plus. It has all of the features I need and also integrates nicely with KDE. It also has a few other nice features, including context-sensitive help (once you install the documentation in the right place). However, the build of Quanta 3.5.6 that came with Kubuntu Feisty is kind of unstable. It crashes on me every few days, and for one project, I actually had to switch to something else because I was making heavy use of regex search and replace, which was consistently crashing Quanta. Also, while Quanta has a PHP parser with some intellisense, it's pretty weak and not in any way comparable to, say, Visual Studio.

My second, heavier-weight choice is ActiveState's free Komodo Edit. This is a very nice, XUL-based editor. Its strongest feature is undoubtedly the PHP parser. It's really outstanding. For instance, it can scan pre-determined paths for PHP files and do intellisense for them. It even understands PHPDoc syntax and can add the documentation to the intellisense.

The down side is that, while Komodo does speak SFTP, the file browser tree only does local files. There is a Remote Drive Tree extension that adds this feature, but while it's better than nothing, it still isn't that good. I also don't much care for the look of Komodo or for the keyboard shortcuts. Those things are much easier to customize in Quanta.

After Quanta, my other old stand-by is jEdit. After installing the PHPParser, XML, and FTP plugins, this meets my needs. On the down side, the PHP parser doesn't do any intellisense (although it does detect syntax errors). The interface also feels a little clunky at times, although it's much better than the average Java application and not really any worse than Quanta in that regard.

I took a brief look at a couple of Eclipse setups, but wasn't initially impressed by them. It might be worth looking at them again some time, but the whole process of getting and installing the appropriate plugins just seemed like a lot of trouble. Same goes for Vim. I'm sure I could get it to do most, if not all, of what I want, but it seems like an awful lot of trouble. And then, of course, there's the Zend IDE, which I don't really want to pay for. Besides, one of my co-workers told me that, while it's a decent IDE, the real selling point is the integrated debugging and profiling, which won't work on our setup.

And so my intermittent search goes on. I'm hoping that the upgrade to Kubuntu Gutsy will fix the stability problems in Quanta, which is my biggest problem with it. I'm also hoping for some nice new features when KDE 4 comes along. But I guess I'll keep looking in the meantime.

The new job and new facts

I started my new job yesterday. After a whopping 4 days of work, I'm actually feeling pretty good about it. I've been leaving the office feeling energized and enthused about my work - or at least not run-down (though the 1.5 hour commute fixes that). This is quite a change from my last job.

On Tuesday I started work on my first project. Of course, I really only got in a couple of hours of real work. Most of the day I spent with another programmer going over the database schema, application architecture, and all that good stuff. You know, the things you need to know to be able to sensibly make an addition to a substantial piece of software. I managed to get a bunch of work done Wednesday and today, though. It's really not that big of a project, it's just that I'm not used to the codebase and it's somewhat large and complicated.

So far, this job is a lot more fast-paced than my last one. In fact, I've already learned a few things.

  1. In PHP library files, you don't need to close the <?php ?> tag. You can just have the opening <?php and not have to worry about stray space at the end of the file. For some reason, I just never knew that. (There's a small sketch of this after the list.)
  2. Writing PHP with error reporting turned off really sucks. Well, technically it's not completely off. The errors are logged - to a 200MB file on another server. Which still sucks.
  3. MySQL throws really weird errors when you try to declare a foreign key and give it the wrong size integer for the key column.
  4. I had never used it before, but the jQuery JavaScript library is very cool. For example, it allows you to select DOM nodes with XPath expressions. How cool is that? Though on the down side, the syntax is a little...weird.
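For anyone else who never knew about the first one, here's a minimal sketch of what I mean (the file and function names are made up for illustration):

<?php
// lib/format.php - a made-up library file.
// Note: no closing ?> tag at the end. Any stray whitespace after a
// closing tag gets sent to the browser, which can break header()
// calls and session handling, so it's safer to just leave it off.

function format_date($timestamp)
{
    return date('Y-m-d H:i', $timestamp);
}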

I'm sure the learning is just beginning. Methinks this job will make for much better technical blog-fodder than the last one.

PHP bug of the day

I came across an annoying little bug in PHP this afternoon. Nothing I can't work around, but it's another example of the general suckiness of object-oriented programming in PHP.

Here's the situation. I've been slowly refactoring LnBlog over the last few months. I'm trying to make the design more object-oriented, easier to maintain, and just generally less messy and ad hoc. I'm adding in unit tests with SimpleTest and, at least today, I was actually working with a copy of Martin Fowler's Refactoring open in front of me.

The particular problem that popped up was with a class called Path that manages the building and converting of filesystem paths. Two of its methods are get() and getPath(). Both simply join a list of path components into a single path string. The difference is that get() is an instance method and works on instance variables. The getPath() method, on the other hand, is a static method where you just pass in the path components as parameters. Since these two methods do essentially the same thing, I thought that it would make sense to combine them.

In a language like C#, I would simply do this by overloading the get() method and having an instance get() with no parameters and a static get() with a parameter list. However, there is no overloading in PHP. The typical method is simply to fake it with optional parameters. Ugly, but it works.

Well, today I thought I'd be clever. I reasoned that, when a method is called statically, there is no instance of the class and hence the $this variable isn't set. So I tried something like the following:

function get() {
    if (isset($this)) {
        return $this->implodePath($this->sep, $this->path);
    } else {
        $args = func_get_args();
        return Path::implodePath(DIRECTORY_SEPARATOR, $args);
    }
}

The problem with this is that it only sorta, kinda works. More specifically, it only works when you don't call it statically inside another class method.

If you're familiar with bug 30355, this shouldn't come as a surprise. Turns out this behavior is actually for backward compatibility with various badly broken code. I can only hope that this is a bug that got grandfathered in rather than something that was originally done by design.

At any rate, the workaround was quite simple. I should have done it the first time - just replace that isset($this) with func_num_args() == 0. Problem solved.
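For completeness, the fixed method ends up looking something like this (same class internals as above; I'm just swapping out the test):

function get() {
    // No arguments passed: assume this is the instance form and
    // build the path from the member variables.
    if (func_num_args() == 0) {
        return $this->implodePath($this->sep, $this->path);
    } else {
        // Arguments passed: treat it as the static form and build
        // the path from whatever was handed in.
        $args = func_get_args();
        return Path::implodePath(DIRECTORY_SEPARATOR, $args);
    }
}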

Tasting the Rails kool-aid

This week I started dabbling in Ruby on Rails. After reading Scott Hanselman's and Mike Gunderloy's coverage of RailsConf, I almost felt like I had to. So far, I'm impressed.

For anyone who hasn't been paying attention, Ruby on Rails is the current "big thing" in developing web applications. Ruby is a dynamic, object-oriented programming language from Japan. So far, it seems to read like the bastard child of Python and Perl. But in a good way, if that's possible. Rails is an MVC web application framework for Ruby. Through the magic of Rails, you can literally create a working, if extremely basic, database application in less than 10 minutes. No joke.

I haven't had much time to play with RoR yet, but it seems really, really nice. It basically takes all the pain out of building data-driven web applications. Rather than spend your time messing around with database access - building the plumbing, designing insert/edit screens, and so forth - Rails lets you automatically generate all that stuff. For example, in the tutorials, they just have you create a new Rails application, create a new MySQL database and table, and run a couple of commands. Then you can fire up a web browser and add records to the database. All without writing a single line of code!

The thing that's really nice about RoR is that it encourages good system design. The default Rails application has a proper model/view/controller architecture, testing infrastructure, and other good stuff. In other words, it gives you a good starting point. Contrast this with Visual Basic. Back in the VB 6 days you used to see tutorials on data-bound controls that showed you how to build database applications without writing a single line of code. All you had to do was drag and drop a bunch of controls in the VB designer and voilà - instant application. The only problem was that VB 6 seemed to actively encourage writing really crappy code. For instance, it generated brain-dead default names for all those drag-and-drop controls - why name a control DataSource1 and force the user to change it when you can just prompt him for the control name? It also made no attempt to separate the interface, data access layer, and business logic. In fact, doing that made the magical drag-and-drop features less helpful. And unit testing? Was that even possible in VB 6?

My only complaint about RoR so far is that there doesn't seem to be a lot in the way of documentation yet. Sure, there's the API reference, but not much in an instructional vein. And so far, I'm unimpressed with the two tutorials I've looked at. But I suppose I can let that slide. After all, RoR is quite new - building up a good documentation base takes time. Of course, by the time we get some good tutorials, I won't need them, so what do I care?

The browser feature I always wanted

In Roger Johansson's latest thoughts on HTML 5 (which I generally agree with), he mentions a potential browser feature that I've been wanting for some time: built-in notification of markup parsing errors. He even linked to a Firefox extension that implements it.

There are two reasons this would be an extremely great feature to have in all browsers. The first, and most obvious, is that it would make web development easier. Need to validate all your pages? Just visit them. Instant feedback with no need to mess with web-based validation services.

The second reason is essentially the one Roger gives. That is, it would be a nagging little problem indicator that's actually visible to users and developers. It basically adds accountability. Users might not care about the markup per se, but they may very well be concerned that there's a little red error icon in their status bar. It's a little harder to justify bad markup when you have to explain to customers why you haven't fixed it.

For now, I guess I'll just have to make do with that HTML validator Firefox extension. In my 20 minutes of using it, I've found it surprisingly powerful and quite handy. Now if only they had one for Opera....

Does anyone actually like eBay and PayPal?

Last night I bought something through eBay for the very first time. It was a DVD set that I'd been looking for for a while. I actually discovered the auction through Google. The price to "buy it now" on eBay was lower than on any other site I could find and the auction ended in an hour, so I figured, "Why not?" As it turns out, because it's a lousy online shopping experience. That's why not.

Let me state right up front that, aside from one technical issue, I didn't actually have any problems making my purchase. I successfully signed up for an account, paid for the DVD set through PayPal, and today received a notice from the seller that it had shipped. So in terms of actually getting what I wanted, it worked perfectly. And yet, the process of setting up an account and buying the item through PayPal was so thoroughly annoying that it put me in a foul mood for the rest of the night. It's almost a case study in how not to implement an online shopping system.

Problem 1: Long, annoying registration

My first problem with eBay was the long, annoying account registration form. Actually, I think it might have been multiple forms. There were so damn many forms involved in this process that I kind of lost track after a while.

However, I do clearly remember that eBay needed a credit card number to register an account, even though they claim they won't actually charge it and they don't directly handle payment. This alone made me uneasy. So that's -5 points for eBay right off the bat.

Problem 2: Broken AJAX

Related to the annoying registration form was the one actual problem I encountered. As part of the registration, you have to select a unique eBay username. Of course, to me this was nothing but a hindrance, because I don't give a damn about establishing any kind of identity or reputation on eBay - I just want to buy the stupid DVD and get the hell off their site!

Anyway, to make the registration process somewhat more "user friendly", the username box had a button to check that the name you entered was unique. This button disabled the username box and form submit button, started a little "waiting" progress indicator, and made an AJAX call to eBay's servers.

Unfortunately, the AJAX call never returned. A quick look at Opera's error console indicated that this was actually due to the JavaScript violating cross-domain security rules. This caused Opera to (correctly) terminate the script. However, that left me high and dry, because the script had disabled the form's submit button, so I couldn't test the username the old-fashioned way. Instead, I just had to refresh the page and fill in the form again. That's -10 more points for eBay.

Problem 3: Confusing payment

Now that I had an eBay account and had committed to buy the DVD set, it was time to pay. Like a fool, I selected the seller's preferred payment method: PayPal. In particular, I used a credit card through PayPal.

This proved to be somewhat more confusing than I would have thought. For the payment, I was redirected to a third-party payment service. However, I was paying through PayPal, and was eventually redirected to them. Don't ask me why. I don't really understand why I couldn't just go straight to PayPal.

What's worse, at no point during the payment process was I actually sure that my credit card had been billed. At one point, I thought I had successfully paid, but was then prompted to log in to PayPal or create an account. After that, my transaction was apparently complete. I think. Or maybe I didn't need to create a PayPal account. I'm still not clear on that.

I'd say this is -20 points to eBay and/or PayPal. I mean, I'm a computer programmer, for crying out loud. I'm not supposed to get lost navigating through payment forms. I know I'm not perfect, but even on a bad day, a process like that has to be pretty poor for me to get as confused as I was. Hell, I still don't know what the heck happened. All in all, the whole process had a really ad hoc feel to it. Too ad hoc. When it comes to dealing with money, I don't like to feel as if the software handling the transaction is held together with the code-equivalent of Scotch tape and bubblegum.

Problem 4: Saving my credit card

I had a couple of nits to pick with the PayPal signup process. The first was that they unilaterally decided to save my credit card information to make future purchases more "convenient."

The problem is, I don't want PayPal to store my credit card information. I don't trust them, or anybody else, to keep that on file. In fact, when I have the choice, I make it a point to never let any online store save my credit card info. I want them to hold onto it just long enough to get the charge authorized and then forget it.

I say that's -20 points for PayPal. It was nice of them to inform me that they were saving my info, and it was nice that it was easy to delete it after the fact. But they really should have offered some kind of opt-out feature. Is that too much to ask?

Problem 5: PayPal vitiates its own credibility

You know what smacks of incompetent, amateur web design? Pages that play sounds. And that's exactly what one of the pages in PayPal's setup process did.

It was on a page with some kind of coupon offer. When the page loaded, a voice actually started reading some kind of instructions. I couldn't believe it. It was like a bad flashback to Geocities circa 1996.

And to make matters worse, the page was already really text-heavy to begin with. So not only did I have all this text to deal with, but I also had to listen to somebody yammer on about something I probably didn't care about. All I can say is, "Thank God for the mute button."

I give PayPal -100 points for that little "feature." That was the straw that broke the camel's back. It completely destroyed any trust I had in PayPal. In fact, at that point I seriously considered just walking away from the whole transaction and buying from a different site. And if I hadn't already entered my credit card information, I probably would have. I was already a little wary of PayPal, so the last thing I wanted to see was fourth-rate incompetent "webmaster" crap like that. If that's the best they can do, I don't want them handling my money.

The end

So there you have it. A happy ending to a miserable experience. I wasn't cheated or misled, and yet I regretted making this purchase before I was even done with it. I felt a little better about the whole thing in the morning, and a lot better after seeing that my purchase had shipped, but the whole experience left a bad taste in my mouth. I don't know if I'll ever use eBay or PayPal again. But at the very least, I won't be running off to do so any time in the foreseeable future.

Is HTML 5 stupid?

I've been thinking a little about HTML 5 today, thanks to a blog entry by Roger Johansson on taking the semantics out of HTML. Apparently, the HTML 5 WG mailing list has gone off into loony-land and Roger is a bit concerned that the accessible, semantic markup he and others have been promoting for years is going to fall by the wayside in order to make things easier on browser vendors. And so far, his fears are not without some justification.

I can't claim to be an expert on HTML in general or the HTML 5 development process, but from what I've read, things do seem to be getting a little...weird. For instance, the requirement that all pages served as text/html will be treated as HTML 5. Apparently the idea is that HTML 5 will be backward-compatible with all existing versions of HTML and, I would guess, XHTML 1.0. Or rather, it will have to be if they intend to keep that requirement. Otherwise they really will "break the web."

There's also this item from the HTML 5 FAQ responding to the "HTML 5 legitimates tag soup" claim. It describes the difference between requirements for conforming documents and requirements for user agents. For example, user agents are required to support the MARQUEE tag, but conforming documents cannot contain it.

html5.png

My question is: Is this a good thing? At this point, I really don't know. It seems like the W3C is trying to codify the tag soup that browsers have been forced by circumstances to support into a single, coherent standard. On the one hand, this is undoubtedly a good thing, as it will (if done properly) solve the problem of different user agents rendering things in inconsistent ways.

On the other hand, is this really solving the long-term problem? In the marquee example, I understand that the document and user agent requirements are orthogonal. But if user agents are required to support a tag, won't people feel entitled to use it? And if people aren't supposed to use it, wouldn't it be better to discourage that by deprecating it or writing it out of the spec altogether? Am I just being naive here?

What I'm getting at is that I'm not sure how (or even if) this approach is supposed to fix the generally miserable quality of markup on the web. People relying on the browser to implement things a certain way is what got us into trouble in the first place. By stating that browsers will now be required to implement MARQUEE in perpetuity, but you shouldn't use it, isn't that effectively giving people a license to use it? If the people writing HTML cared what the specs recommended, we wouldn't have MARQUEE in the first place. I can easily imagine people taking the browser rendering requirements as gospel and simply disregarding the document conformance requirements.

While I do have some sympathy for the forgiveness-by-default philosophy, I think we need to focus a little more on discipline. The laissez-faire attitude of the 1990's left us with a web that was held together with duct tape and baling wire. Surely we want to do better than that. I'm not necessarily advocating the strict XHTML "validate or die" approach, but surely there must be a happy medium between that and semi-legitimizing every ill-advised vendor extension to HTML out there. Surely.

Sprucing up the blog

I've been trying to spruce up the blog the last few days. I'm trying to make things a little more reader-friendly and possibly increase traffic a little.

First, for the 11 people who actually subscribe to it, I've changed the RSS feed over to FeedBurner. That gets me subscriber statistics, a little extra exposure, and a bunch of other miscellaneous features. It's also completely painless for everybody else, since I just set a mod_rewrite rule to redirect everybody to the new external feed. I even did a little hacking to make the new feed URL integrate nicely with LnBlog.

I also set myself up with a Technorati account. This provides another handy tool for gaining wider exposure. It's a nice complement to TrackBack and Pingback in addition to actually being a nice service to use.

I'm also experimenting with my page layout. In particular, I'm reorganizing the sidebar. I don't really know what the optimal layout is, but I do know I wasn't particularly happy with the old layout. There's still plenty of stuff left to play with, so I guess I'll just try some different things and see what works.

Tomorrow I'll get into the motivation for the change. Right now, I need to go to bed. I was just barely able to muster the concentration to write this, so I'm certainly not up for a long explanation.

No, bloggers aren't journalists

Last week, Jeff Atwood posted an anecdote demonstrating yet again that bloggers aren't real journalists. I know this meme has been floating around for some years, but I'm still surprised when people bring it up. In fact, I'm still surprised that it ever got any traction at all.

I'm going to let you in on a little "open secret" here: blogging in 2007 is no different than having a Geocities site in 1996. "Blogging" is really just a fancy word for having a news page on your website.

Oh, sure, we have fancy services and self-hosted blog servers (like this one); there's Pingback, TrackBack, and anti-comment spam services; everybody has RSS or Atom feeds, and support for them is now built into browsers. But all that is just gravy. All you really need to have a blog is web hosting, an FTP client, and Windows Notepad.

That's the reason why bloggers in general are not, and never will be, journalists. A "blog" is just a website and, by extension, a "blogger" is just some guy with a web site. There's nothing special about it. A blogger doesn't need to study investigative techniques, learn a code of ethics, or practice dispassionate analysis of the facts. He just needs an internet connection.

That's not to say that a blogger can't practice journalism or that a journalist can't blog. Of course they can. It's just that there's no necessary relationship. A blogger might be doing legitimate journalism. But he could just as easily be engaging in speculation or rumor mongering. There's just no way to say which other than on a case-by-case basis.

Like everything else, blogging, social media, and all the other Web 2.0 hype is subject to Sturgeon's law. The more blogs there are out there total, the more low-quality blogs there are. And the lower the barrier to entry, the lower the average quality. And since blogs have gotten insanely easy to start, it should come as no surprise that every clueless Tom, Dick, and Harry has started one.

I think George Carlin put it best:

Just think of how stupid the average person is. Then realize that half of them are stupider than that!

Any average person can be a blogger. Thus the quality of those blogs will follow a normal distribution. For every Raymond Chen, Jeff Atwood, and Roger Johansson, there are a thousand angst-ridden teenagers sharing bad poetry and talking about not conforming in exactly the same way. They're definitely bloggers, but if we're going to compare them to journalists, then I think society is pretty much done for.

Top excuses for bad design

Roger Johansson over at 456 Berea Street has posted a really great rant on lame excuses for not being a web professional. It's a great read for any developer (of any type) who takes pride in his work.

My personal favorite excuse is the "HTML-challenged IDEs and frameworks." It always seems odd to me that back-end developers can look down on the front-end web designers as "not real programmers" and yet be utterly incapable of writing anything even close to valid markup. Sometimes they don't even seem to be aware that there are standards for HTML. They have the attitude that "it's just HTML," that it's so simple as to not even be worth worrying about. "Hey, FrontPage will generate it for me, so I'll just concentrate on the important work."

This is fed by the "the real world" and "it gets the job done" excuses. The "target audience" excuse even comes into play a bit. After all, nobody in the real world worries about writing HTML by hand. Especially when FrontPage gets the job done. And since our target audience all uses Internet Explorer anyway, it's all good.

This sort of thinking is especially widespread in the "corporate IT" internal development world. For example, if Roger ever saw the homepage to my employer's "intranet," he would fall over dead, having choked on vomit induced by the sickeningly low quality of the HTML. The index.html page contains an ASP language declaration (but no server-side script in it, oddly enough - probably a relic of a Visual Studio template) and a client-side VBScript block before the opening HTML element. Several of the "links" on the page are not actually links at all, but TD elements with JavaScript onclick events to open new pages. For that matter, it uses tables despite the fact that it's laid out as a couple of nested lists of links. Needless to say the markup doesn't even come close to validating against any DOCTYPE. And this was written by a senior programmer who fancies herself the "local expert" on web development.

I guess it's just a problem of mindset. Some people just want to get the project finished and move on. They don't really care about code quality, maintainability, interoperability, or any of those other little things that make for really good software. As long as the customer accepts the final product, everything is fine.

While I don't share that attitude, I can sort of understand it. I'm interested in software development for its own sake. I care about the elegance and purity of my work and enjoy trying new technologies and learning new theories and techniques. But to some people, programming (or web design) is just a job. It's not a hobby or part of their identity, but simply a way to pay the mortgage. For those people, it's probably hard to get excited about semantic HTML or the efficacy of object oriented programming. As long as they get the project done and get paid, that's really all that matters.

Of course, that's just an explanation. It's no excuse for professional incompetence or unwillingness to learn. If you're going to call yourself a professional, I think you have an obligation to at least try to keep up with the current technologies and best practices in your niche. Not everybody has to be an über geek or an expert on the latest trend, but it would be nice if all web developers were at least aware of the basics of web standards, all CRUD application developers knew the basics of relational theory and that XML is more than just angle brackets, and all desktop developers had some basic grasp of object orientation. Yeah, that would be really nice....

What the heck is CDF?

I learned about something new yesterday. While reading a blog entry on ClickOnce security, I noticed a little feed icon down at the bottom of the page. Actually, there were three icons. The first two were RSS and ATOM, which I already knew about. The last one was labeled CDF. I had never heard of that before.

It turns out CDF stands for Channel Definition Format. It's the XML format Microsoft used for the "channels" they tried to push on users some time back. You may remember these from the days of Internet Explorer 4 and Active Desktop.

Although the format is officially obsolete as of Internet Explorer 7, a few people are apparently still using it for blog syndication. I don't know why. I can't imagine there's much demand for this. After all, I'd never even heard of it, and I'm far geekier about these things than the average person. And while I haven't read any of the specifications for CDF, a cursory examination of the file suggests that this might actually have less information than a simple RDF feed, so I don't see much gain.

A CDF feed in Internet Explorer

The only thing CDF buys you, as far as I can tell, is the ability to have blog entry links show up in your Internet Explorer favorites or your Active Desktop (if that even exists anymore), as seen above. Kind of like Firefox Live Bookmarks, but with less support. And I always thought live bookmarks was kind of a crappy feature anyway, so I don't know why you'd want an IE-specific format to get them. I guess it's no wonder CDF is obsolete.

I hope this was machine-generated

In the spirit of the "Code SOD" feature on Worse Than Failure (formerly The Daily WTF), I present the following image. It's from my organization's "electronic bulletin board," which is really just a few static HTML pages that somebody maintains. We impose this on our users by adding it to the Windows startup items so that it comes up with every log-on.

Web page source in Vim

I am, mercifully, not involved with this particular item, but I can only assume that this code was generated by an older version of FrontPage, Word, or some other intellectual abortion that Microsoft passes off as a "web development tool."

So, what's your favorite part? Personally, I can't decide between the italicized image and the nested, redundant font and italics tags.

The answer is VS 2005?

Here's a funny one. I saw the image below while reading Kode Vicious's latest column on ACM Queue. Someone going by the name "Scanning for answers" wrote in with a question about static analysis tools. Note the ad that appears between his signature and KV's response.

kv-tn.jpg

I thought that was hilarious. Microsoft couldn't have bought better text placement. They should use that in their advertising campaign. "Scanning for answers... Found 1 answer: Visual Studio 2005."

Cheesy? Maybe. But seeing how they're the face of big, soulless corporations, Microsoft could use a little humanity injected every now and then.

Firefox feeds fixed

I've discovered why Firefox hates me. Apparently it's not just me, but broken code in Firefox.

I found the solution in this Mozillazine thread. It contained no mention of the error I got in the console, or of external feed readers simply not launching, but the patch to FeedConverter.js did fix my problem.

All I had to do was change the indicated section of /usr/lib/firefox/components/FeedConverter.js, start Firefox, and add the strings browser.feeds.handlers.application.args and browser.feeds.handlers.application.uriPrefix in the about:config dialog.

So now external feed readers work. In fact, they work better than they're supposed to, because this patch adds support for setting arguments to the external command, so no wrapper script is required. For Akregator, I just set "--addfeed" as the browser.feeds.handlers.application.args and /usr/bin/akregator as the command. Bingo!

Firefox hates me

Nothing ever goes right when I try to use Firefox. I don't know why. I never do anything that exotic. But for some reason, it always falls short.

This time, it's the new RSS features in Firefox 2. You click on a feed link and you get a nice, readable display of the content with a "Subscribe Now" button at the top that lets you subscribe to it with several online services or allows you to pick a desktop application. However, in my case, it doesn't work.

And when I say "doesn't work" I don't mean it doesn't work as well as I'd like, or that it pops up an error message. I mean it does absolutely nothing. I open a feed, point Firefox to a script to add it to Akregator, click the "Subscribe Now" button and...nothing. Absolutely nothing happens.

I know it's not the script, because it works perfectly on the command line. I know it's not something in my profile because I already tried doing this from a blank profile. The only conclusion is that the Ubuntu Edgy Firefox package is broken.

There was one piece of feedback I turned up. After clicking the subscribe button, two instances of an extremely obscure error message show up in the error console. The exact text is:
Error: [Exception... "ServiceManager::GetService returned failure code:" nsresult: "0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE)" location: "JS frame :: file:///usr/lib/firefox/components/FeedConverter.js :: FRS_addToClientReader :: line 338" data: no]
Source File: file:///usr/lib/firefox/components/FeedConverter.js
Line: 338

The referenced line of code is this:
var ss = Cc["@mozilla.org/browser/shell-service;1"]
.getService(Ci.nsIShellService_MOZILLA_1_8_BRANCH);

I have absolutely no idea why this is failing or how to fix it. That is particularly frustrating because it's the only thing I have to go on. Does anybody else have a clue?

I'm back online

I'm now officially back online! If you tried to access this site during the last three days or so, you may have noticed it was broken and/or unavailable. That's because I got myself new web hosting.

As I mentioned before, I decided to go with the ICDSoft universal hosting plan. It's nice, cheap, and has all the features I need. I'll probably write up a little review of it after I've used it for a while.

I ordered the hosting account Wednesday night and had my login information within half an hour. I sucked down the entire contents of my old hosting account overnight and then changed the DNS on my domain and started uploading to the new host Thursday morning. It turns out that FTP transfers to and from both the old and new hosts were extremely slow. However, I suspect that may have something to do with Time Warner's internet service.

My only problem was a small blog configuration issue. I failed to account for LnBlog's LOCALPATH_TO_URI_MATCH_RE setting. That's the one that changes paths like /home/peter/www/ to URIs like http://somehost/~peter/. The problem was that my new host uses the path /home/yourdomain/www/www/ as the document root, so LnBlog's default setting picked that up as a ~username URL. I noticed that when trying to blog earlier today and getting a 404 on the mangled URL. Easily fixed by adding a line to a configuration file, but I still find it annoying that I didn't remember that earlier, what with having written the program myself.

E-mail woes

It seems my web host is doing funny things with the mail server. Last week, when trying to send an e-mail, I got a strange error message - something about an unauthorized server, which I didn't bother to write down. Anyway, a quick Google search revealed that the fix was to switch KMail to using authenticated SMTP.

Tonight, I try to send a message, and what do I get? An error message saying I'm already authenticated. So I turn off authentication and my message goes out. What the heck is going on?

Incidentally, I think I might have decided on a new web host: ICDSoft. Their personal hosting plan is only $6/month and they're currently ranked #4 on Web Hosting Jury. They seem to be a fairly no-nonsense kind of hosting operation, which appeals to me. Their plans don't offer the big numbers and huge feature sets of companies like Lunarpages or DreamHost, but they have a lot more than my current host and they meet my needs.

I actually view the smaller numbers, i.e. 1GB storage and 20GB bandwidth, as a selling point. Most web hosts these days seriously oversell their disk space and bandwidth on the premise that most sites will only use a small fraction of it. However, with the really big numbers, you have to be skeptical of the company's ability to make good on the service. For example, can DreamHost really give any significant proportion of its customers the 200GB of storage and 2TB of bandwidth they promise? Their service may be very good, but they're basically claiming they can host Slashdot for $8/month. Sounds too good to be true if you ask me.

Let me just edit my Google ranking...

I guess this should be filed under "too funny to make up." It seems that some crazy person has been threatening one Dean Hunt because his blog is ranked too high on Google. Apparently Dean's site ranks higher for the name of some product that the crazy guy is selling and he wants Dean to do something about that, by God. There's a short and humorous summary at Search Engine Journal.

Now, I think we can all agree that this unnamed merchant doesn't have a clue. The fact that he would even make a bizarre request like this proves beyond a shadow of a doubt that he has no idea how Google works and probably hasn't worked too hard on his site's SEO.

However, I don't think Dean's response was really appropriate. While the merchant's request was certainly misguided and unreasonable, it was apparently made in earnest. As such, I think the obvious contempt in Dean's response was a little much. It certainly didn't do anything to resolve the situation.

This kind of attitude is how "computer geeks" get a reputation for being arrogant. When someone makes an unreasonable request, there's no need to say or imply that they're stupid. Instead, give the person the benefit of the doubt. You don't have to devote hours to patient explanation of the underlying concepts, just politely say that it simply doesn't work that way and there's nothing you can do about it. If the person continues to be unreasonable after that, then just walk away. Insults and alienation are seldom productive (unless your goal is to piss people off).

Why don't they validate?

You know what I noticed in my hosting company research? Nobody's home page is valid. Of the few dozen I've looked at so far, not a single one has had a home page that goes through the W3C's HTML validator cleanly.

To me, that's a real turn-off. I know that the vast majority of web sites in existence are, frankly, complete and total garbage from a coding standpoint. I can accept that. I can even accept that many of them have a good reason for being complete and total garbage, like a sickeningly expensive legacy CMS. But for tech companies that exist almost entirely on the web, and many of which offer web design services, this is just disgraceful. To me, it indicates that they just don't take their web design work seriously.

And you know what? It's not the fact that some of them have crappy markup that bothers me. It's not even the fact that most of them have crappy markup. It's that all of them do. Out of all those sites, I would have expected at least a couple of them to have valid markup.

Is this really too much to ask? I mean, it's almost 2007, for crying out loud! Get with the program, people! At the very least, take a little damn pride in your work and put up a site with half-decent code. Remember, just because it looks good doesn't mean it is good.

Looking for new hosting

For the last few weeks, I've been looking into new web hosting. My current host is, frankly, a bit crappy and I'm looking to trade up. The service hasn't been bad, but it hasn't been great either, and they don't really offer much in the way of features. In particular, I'm looking for subdomains and IMAP e-mail, which they don't offer.

So now I'm in the middle of the daunting task of picking a new host. I'm finding it quite difficult to locate reliable-sounding information on how good any of these companies are. After all, half of them promise you the moon on a stick for $5 a month with 99.9% uptime and the Dalai Lama working the support line, so it's not like you can take their word.

There are lots of hosting review sites, but many of them appear to be full of shills, with short, useless positive "reviews" that inspire no confidence. So far, Web Hosting Jury is one of the more credible looking review sites I've come across. The reviews tend to be a bit longer and there are lots of highly negative reviews (which means they're not shills). They also have dates and IP addresses attached to the posted reviews, which adds a bit of confidence.

Right now, I'm leaning toward the Lunarpages basic hosting plan. It includes basically everything I want and is pretty well rated. I'm going to be doing a little more research, though, as the last three hosts I thought I'd decided on ended up looking not so great on closer inspection.

Playing with XSL

Today I finally got around to learning a bit of XSLT. It's one of those things that's been on my list for a while, but I just never really had need for it before.

Well, since I converted LnBlog's data files to actual XML last week (it's about freaking time), I figured I'd experiment with displaying the entries without any server-side processing. To that end, I went through the W3 Schools tutorial and tested out a couple of XSL stylesheets to see how it works. I probably won't actually do entry display that way, but it seemed like an interesting experiment.

What I may actually use XSL for is RSS feeds. You know, so I can get that cool "my feed is a web page" thing like FeedBurner has. The only problem I had on that was getting my entry text to display as HTML. For some reason, when I try to loop through the channel items, Opera shows the HTML in the descriptions as raw HTML code. Doesn't seem to matter whether the entities are escaped or the whole thing is enclosed in a CDATA block. I guess I'll need to do some actual studying to figure that one out. Who'd have guessed?

Filters suck

You know what sucks? Internet filters. In particular, my employer's internet filters.

You know why they suck? One word: Wikipedia. About a month or so ago, our filters started blocking freakin' Wikipedia. Sure, it has its pointless side, like the 13 page Chocobo article. Honestly, who needs 13 pages on an imaginary bird from a video game series, even if it is the best game series ever (rivaled only by Wing Commander)?

But there's actually lots of useful information on Wikipedia, particularly on technical topics. For instance, I've found it quite useful for explaining some of the telecom lingo I have to deal with on occasion. It might not be the most definitive reference in the world, but it's very good for quick explanations of unfamiliar topics.

I guess I shouldn't be surprised, though. We also blocked about.com for quite a while. The blacklist seems to be kind of off sometimes.

Digg users are morons

It's official: the Digg userbase is full of losers and morons. Of course, we all knew that already, but here I have photographic evidence. Observe:
Digg "YouTube down" article in Akregator

That's right: over 300 people voted for a "story" that was nothing more than a statement that YouTube wasn't working for half an hour or so. Better yet, it wasn't even a story: if you look at the URL, it was link spam for somebody's Counter Strike site!

Isn't it nice to know you can trust the Digg user community to carefully examine each story and weed out the garbage? Much better than those lazy, incompetent editors over at Slashdot! For example, take that time earlier this year when Slashdot had all those links to that crack-pot junk "science" site, rebelscience.org. The good users at Digg got the same submissions and -- oh, wait, the Digg community voted up a bunch of those links too. Well, at least the people at OSnews -- hold on, they published at least one of the same links. Hmmm... I guess democratic, user-driven sites can publish just as much garbage as sites controlled by a small group of editors. The only difference seems to be that the user-driven sites can publish greater volumes of junk in less time.

E-mail obfuscation?

I was reading a thread on e-mail and spam over at Tek-Tips the other day, and several of the posters recommended e-mail obfuscation as a method for avoiding spam. For example, listing your address as "bob AT foo DOT com" or "johnATgmailNOSPAM.com" instead of the actual address. This sort of thing is everywhere now, and has been for several years.

My question is: why are people still doing this? Does it actually still work? I mean, is there any actual evidence that obfuscating your e-mail addresses is an effective way to combat spam? Or is it just that it used to work years ago and nobody has bothered to re-evaluate the method?

I've read studies in the past that indicated this was effective, but nothing recently. For example, about three years ago I read a paper that found entity and/or URL encoding your address worked very well, e.g. the letter "A" would become &#65; in the text and %41 in the clickable link.

But that was years ago. And while the spammers may be subhuman dirt-bags, they're not stupid or lazy. I find it impossible to believe that the people writing address harvesters have just been sitting on their thumbs for the past three years. These obfuscation methods have been in wide circulation for some time, so they must have accounted for them by now.

And when you think about it, it's not even really that hard. For example, converting URL and entity encoding to plain text is a simple matter for anyone with a Python/Perl/whatever interpreter and a chart of the relevant character set. Likewise, accounting for simple obfuscations like the ones I mentioned earlier is well within the abilities of any competent programmer. A talented programmer could do it in an afternoon with a few well-placed regular expressions.
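Just to illustrate how little effort it takes, here's a rough PHP sketch that undoes the common tricks (the sample addresses are invented for the example):

// Undo the usual e-mail obfuscation tricks. Sample addresses are made up.
$samples = array(
    'bob AT foo DOT com',
    '&#98;&#111;&#98;@foo.com',   // entity-encoded "bob"
    '%62%6f%62@foo.com',          // URL-encoded "bob"
);

foreach ($samples as $addr) {
    $addr = html_entity_decode($addr);                  // decode &#NN; entities
    $addr = urldecode($addr);                           // decode %NN sequences
    $addr = preg_replace('/\s+AT\s+/i', '@', $addr);    // "bob AT foo" -> "bob@foo"
    $addr = preg_replace('/\s+DOT\s+/i', '.', $addr);   // "foo DOT com" -> "foo.com"
    echo $addr . "\n";                                  // bob@foo.com in every case
}

A real harvester would obviously be more thorough, but that's kind of the point: a dozen lines gets you most of the way there.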

The futility of obfuscation becomes much clearer when you consider that the address harvesters don't necessarily care that much about the quality of the addresses they collect. Sure, high-quality, known-good addresses are more valuable, but the low-quality, probably invalid ones can still be sold for a few pennies per thousand. And since many (if not most) spammers are using botnets to do their dirty work - stealing the bandwidth they use to send spam - they aren't hurt much by having a bunch of bogus addresses in their lists. Why not just try a few variations on any potentially obfuscated addresses just in case you get lucky?

Pretty much the only obfuscation methods I've seen that seem to be effective are putting the address in an image and using some convoluted JavaScript to disguise the e-mail address, but still make the mailto link function normally. The problem with these approaches is that they're extremely annoying and inaccessible to people with disabilities. They also don't offer any guarantees. Image recognition software is getting better and there's nothing to stop the harvesters from implementing JavaScript interpreters, so while both techniques may work now, it seems they're living on borrowed time.

It seems to me that the whole thing is just an ill-conceived battle to maintain the old way of doing things. If you're really that concerned about your e-mail address being harvested on a web site, then just don't display it on the site at all. Just use a PHP form mailer, or something. They're not hard to set up and they offer complete protection because your e-mail address doesn't have to appear on the page in any way, shape, or form. They also have the advantage of being completely accessible to users with visual impairments or who, for whatever reason, can't use JavaScript.
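If it helps, a bare-bones form mailer really is only a handful of lines. Something along these lines (the recipient address and field names here are placeholders, and a real one would want more validation):

// mailer.php - bare-bones form mailer sketch. The recipient address
// lives only in this server-side script, never in the page's markup.
$recipient = 'me@example.com';   // placeholder address

if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    $from    = trim($_POST['from']);
    $subject = trim($_POST['subject']);
    $message = trim($_POST['message']);

    // Refuse newlines in the header fields so the form can't be
    // abused as a spam relay via header injection.
    if (preg_match('/[\r\n]/', $from . $subject) || $message == '') {
        die('Invalid input.');
    }

    mail($recipient, $subject, $message, 'Reply-To: ' . $from);
    echo 'Thanks, your message has been sent.';
}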

So, in conclusion, please don't obfuscate your e-mail address. It's really annoying, sometimes inaccessible, and there's no evidence that it still works but plenty of reason to suspect it doesn't.

End of service problems

Well, I can finally stop complaining about my crappy network connection. It got really bad last weekend, so on Monday I finally called Time Warner to complain.

The good news is that they got a tech out here the next day. The bad news was that, as usual, the "appointment" was for sometime between noon and 5PM. The even worse news was that shortly after the tech left, the service went out again.

So, after another call, I ended up going down to the Time Warner office to trade in my six-year-old cable modem for a newer one. This seemed to fix the problem.

However, it seems that the frequent service outages played havoc with KMail. At least, I'm guessing that's what caused the problem. All I know is that, despite having KMail up and running the whole time, I didn't get any e-mail between Monday and Friday. On Friday, I logged out and logged back in later, only to discover messages from Tuesday in my inbox.

I don't know what happened there. I never saw any error messages, even when I manually checked my mail. It just silently failed. Apparently closing KMail fixed the problem, but it's still really annoying - especially since one of the messages from Tuesday was important.

Net neutrality

I just saw something I've never seen before: a TV commercial attacking net neutrality. Apparently they've started appealing directly to voters now.

I must admit to having some degree of ambivalence about net neutrality. Let's face it: it's just one set of big companies fighting another. Who should pay for the internet: Google or Time Warner? Does it really matter to me which one it is?

Not that I have no opinions on the issue. I'm definitely not a fan of the "tiered" service concept. Discriminating against certain kinds of data doesn't appeal to me either. When you take the money out of the calculation, none of this is good for the end user.

On the other hand, you really can't take the money out of the calculation. And it's not like the service providers don't have a point. Somebody has to pay for the cost of providing bandwidth, and a non-neutral scheme might very well result in lower overall costs and/or lower costs for end-users. At least, that's the claim. I don't claim to know enough about the business to evaluate its truth.

I think the only thing I'm really sure of in this discussion is that getting the government involved is a bad idea. In fact, as a public servant, I take it as a general rule that getting the government involved is nearly always a bad idea. And what with the DMCA and software patents, it's not like the US government has the best track record on technical issues.

So for now, I'm more inclined to let the market decide this issue. Who knows, the non-neutral net might not even be really feasible. We can only hope....

Windows paths in PHP

Time for the PHP annoyance of the day: includes on Windows. PHP 4 has a nasty bug with the way it handles the require_once() and include_once() functions on Windows and I got bitten by it today.

If you don't know PHP, there are four functions to include code that's in other files: include(), require(), include_once(), and require_once(). The include function works the same way as #include in C: it just dumps the contents of the given file into the current one. The require() function does the same thing, but errors out the script if the file cannot be included for some reason.

Now, the *_once() varieties have a handy extra feature: if the given file has already been included, then they won't include it again. This is nice because it keeps you from needing to worry about errors caused by re-including the same function or class library. The only problem with these functions is that they don't work correctly on Windows.
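In other words, something like this (lib/utils.php is just a placeholder name for the illustration):

require_once("lib/utils.php");   // first call: the file gets included
require_once("lib/utils.php");   // second call: silently skipped
include("lib/utils.php");        // plain include: pulled in again, so any
                                 // functions it defines trigger
                                 // "cannot redeclare" errors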

This problem came up while I was testing LnBlog on Windows. See, LnBlog stores each blog in a folder located outside the program directory, and the blog URL is actually the folder URL. It makes for a nice URL structure, but it means that the wrapper scripts that generate your pages have to be told where the LnBlog program files are. Well, at some point, while I was messing around with my test blog, I changed the path to the LnBlog directory.

Actually, that's not quite right. I changed the string that represents that path. The path that string referred to was actually correct. It's just that the path I gave was all lower-case, while the path on the filesystem was mixed-case.

It seems PHP 4 doesn't like it when you do that. Apparently it checks if a file has been included by storing the full path to each included file in a list and then doing a simple list search at each subsequent include. So one script would do a require_once("lib/utils.php") and the file would be included relative to the mixed-case current directory. That's fine. Then that script would include another file that did the same thing. Only this file apparently found the file relative to the include_path, which had the all lower-case path. Same path, but a different string representing it. Since PHP was apparently doing a simple string comparison to check if the file was included, it concluded that these were actually two different files and included the same one again. Bah!
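A contrived way to see it (the paths here are made up): on a case-insensitive Windows filesystem, these two lines point at the same physical file, but PHP 4 compares the literal strings and happily includes it twice.

require_once('C:\\MyApp\\LnBlog\\lib\\utils.php');   // include #1
require_once('c:\\myapp\\lnblog\\lib\\utils.php');   // same file, different string,
                                                     // so PHP 4 includes it again and
                                                     // "cannot redeclare" errors ensue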

Something of an esoteric bug, but still a pain. Although, to be fair, this is fixed in PHP 5. Not that it matters, because my target audience is still made up of people who don't necessarily have PHP 5.

To be honest, I'm getting a little sick of PHP. I was playing with Python again last week, and PHP is just painful by comparison. I'm starting to agree that it really is the "Visual Basic of the web." The only thing going against that impression is the fact that PHP treats Windows as a second-class citizen.

Online applications won't work

Periodically, you see stories making the rounds about how online applications are going to completely supplant traditional desktop applications. In fact, these ideas have recently been extended to encompass the entire desktop, with the rise of web-based "operating systems."

It sounds great, doesn't it? All your data and all your applications would be available from a central server. You could use any computer, anywhere in the world, to access your personal, customized desktop, and it would always be exactly the same.

However, over the last month or so, and this week in particular, I've experienced the perfect proof that such ideas are, well, over-rated. That proof is internet outage.

Yes, Time Warner's Road Runner cable internet service has been very unreliable the last month or so. It's normally pretty good, but I've been experiencing frequent outages, usually for several hours at a time.

With wide broad-band availability, many of us have started to take high-speed, always-on connections for granted. Putting your entire desktop online is great when you know you will always be able to access it. But if everything is online, then when your connection goes down, your computer is completely useless.

The everything online philosophy also seriously limits the usefulness of laptops. I know that may sound shocking to some people, but the truth is that you can't get free WiFi access everywhere. In fact, there are many places where you can't even get paid WiFi access. For instance, my wife sometimes takes the laptop to work on slow days, where they have no web connection (though I'm not sure why) and no wireless access points nearby. On those days, it's nice that OpenOffice and Klickity (her new addiction) are desktop, rather than web, applications.

Not that I have anything against web applications. They're great! It's just that, like everything else in the world of computing, they've been over-hyped. Not everything needs to - or should - be a web application. Not every web application has to use AJAX. Not every program will benefit from using the latest trendy technology. And, finally, one that seems to have finally sunk in: not every application has to incorporate XML in some way, whether it makes sense or not.

MSDN pain

Will someone please tell me when MSDN started to suck? I remember back when I first started with Visual Basic, MSDN was really great. It was a wonderful reference source with lots of good material. The site was relatively quick and easy to use, the documentation was useful, and the examples tended to be at least moderately informative.

What the hell happened? Today I was looking up some information on using the XPathNodeIterator class in the .NET framework and Google directed me to the MSDN page for it. It was horrible!

The first thing I noticed was the truly massive page size. I literally sat there for seven seconds watching Opera's page load progress bar move smoothly from zero to 100%. And that's on the T1 connection at work!

The second problem is the class declaration, which says that it's a public, abstract class that implements the ICloneable and IEnumerable interfaces. There's nothing wrong with including that information per se. I personally don't think that including the code for the declaration is particularly helpful, as they could just as easily say that in pseudo-code or English, but whatever. What I do object to is that they included this declaration in five different programming languages! Why?!?! Of what conceivable value is it to waste half a screen worth of text to display a freakin' declaration in VB, C#, C++, J#, and JScript? Is the average Windows programmer really so completely clueless that he can't decipher this information without a declaration in his particular language? It's ridiculous!

The third problem is the code samples. Or should I say "sample." There are three code blocks, each of which has exactly the same code, except translated into different languages - VB, C#, and C++. Again, why? Is this really necessary? And if it is, why do they have to display all three on the same page? Why not break out at least two of the samples into separate pages? It's just a pain to have to sort through lots of irrelevant information.

My last complaint is the content of the example itself. Maybe this is just a product of my not yet being too familiar with .NET or with object-oriented enterprise-level frameworks in general, but the code sample just struck me as kind of bizarre. The goal of the algorithm was to iterate through a set of nodes in an XML file. To do this, they created an XPathDocument object and got an XPathNavigator object from that. Fine. Then they selected a node with the navigator object to get an XPathNodeIterator object. OK, I get that. Then they saved the current node of the iterator, which returns an XPathNavigator. Umm.... And after that, they selected the child nodes from the navigator to get another XPathNodeIterator, which they then used to actually iterate through the child nodes.

Is that normal? Do people actually write code like that? I mean, I can follow what they're doing, but it seems like an awfully circuitous route. Why not just go straight from the initial navigator to the final iterator? You can just chain the method calls rather than creating a new variable for each object that gets created, so why not do that? I suppose the charitable interpretation is that the example is intentionally verbose and general for instructive purposes. But to me, all those extra object variables are just confusing. It makes for another, seemingly redundant, level of indirection. Maybe I'm atypical, but the direct approach makes a lot more sense to me.

Fixing sites with Opera

Well, after a bit of experimenting, I implemented my first quick-and-dirty site-specific fix in Opera. It wasn't even that hard.

The motivation came when I received a site update e-mail from theotaku.com, which I apparently registered with at some point. I had completely forgotten about it. I reacquainted myself with it and rediscovered the fairly decent selection of images they have.

The only problem was that their homepage layout was completely garbled in Opera. It consists of a bunch of div tags that divide the content area up into news entries. However, the divs end up being crushed down to almost nothing, with the text spilling out and running together. It turns out the problem was a height: inherit line in the stylesheet for the class applied to those divs. I'm not sure what the purpose of that line was, but removing it fixed the problem.

Getting the site to render correctly for me turned out to be quite simple, once I figured out how to do it. I ended up simply downloading a copy of the (single) stylesheet for the page, removing the problem line, and setting it as "my style sheet" in the site preferences dialog. That allowed me to simply change the view mode from author mode to user mode and ta-da! The page now renders correctly.

PHP suckiness: XML

After weeks of mind-numbing IT type stuff, I'm finally getting back into programming a little. I've been playing with the .NET XML libraries the past couple of days. In particular, the System.Xml.XPath library, which I found quite handy for accessing XML configuration files. So, after reading up a bit on XPath, XSLT, and XML in general, I was naturally overcome with a fit of optimism and decided to look at converting LnBlog to store data in XML files.

Currently, LnBlog stores its data in "text files." What that really means is that it dumps each piece of entry metadata into a "name: value" line at the beginning of the file and then dumps all the body data after that. It's not a pretty format in terms of interoperability or standardization. However, when you look at it in a text editor, it is very easy to see what's going on. It's also easy to parse in code, as each piece of metadata is one line with a particular name, and everything else is the body.
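
Just to give a flavor of it, here's roughly what reading one of those entry files looks like. This is a sketch of the idea, not LnBlog's actual parsing code, and the field names are made up.

```php
<?php
// A sketch of parsing the "name: value" header format - not LnBlog's real code.
// Header lines look like "Subject: Hello world"; everything after the
// headers is treated as the entry body.
function parse_entry_file($path) {
    $lines = file($path);        // Read the file as an array of lines.
    $metadata = array();
    $body = '';
    $in_body = false;
    foreach ($lines as $line) {
        if (!$in_body && preg_match('/^(\w+): (.*)$/', $line, $matches)) {
            $metadata[$matches[1]] = trim($matches[2]);
        } else {
            $in_body = true;     // Once we hit non-header data, the rest is body.
            $body .= $line;
        }
    }
    return array('metadata' => $metadata, 'body' => $body);
}
?>
```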

This scheme works well enough, but it's obviously a bit ad hoc. A standard format like XML would be much better. And since PHP is geared mostly toward developing web applications, and XML is splattered all over the web like an over-sized fly on a windshield, I figured reading and writing XML files would be a cinch.

Little did I know.

You see, for LnBlog, because it's targeted at lower-end shared hosting environments, and because I didn't want to limit myself to a possible userbase of seven people, I use PHP 4. It seems that XML support has improved in PHP 5, but that's still not as widely deployed as one might hope. So I'm stuck with the XML support in PHP4, which is kind of crappy.

If you look at the PHP 4 documentation, there are several XML extensions available. However, the only one that's not optional or experimental, and hence the only one you can count on existing in the majority of installations, is the XML parser extension. What is this? It's a wrapper around expat, that's what. And that's my only option.

Don't get me wrong - it's not that expat is bad. It's just that it's not what I need. Expat is an event-driven parser, which means that you set up callback functions that get called when the parser encounters tags, attributes, etc. while scanning the data stream. The problem is, I need something more DOM-oriented. In particular, I just need something that will read the XML and parse it into an array or something based on the DOM.

The closest thing to that in the XML_Parser extension is the xml_parse_into_struct() function, which parses the file into one or two arrays, depending on the number of arguments you give. These don't actually correspond to the DOM, but rather to the sequence in which tags, data, etc. were encountered. So, in other words, if I want to get the file data into my objects, I have to write a parser to parse the output of the XML parser.
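
To give you an idea of what I mean, here's roughly what you get back. The sample document and output below are just illustrative - the exact structure depends on the input - but the point is that the result is a flat event list, not a tree.

```php
<?php
// Illustrative only - the exact output depends on the input document.
$xml = '<entry><subject>Hello</subject><body>First post</body></entry>';

$parser = xml_parser_create();
xml_parse_into_struct($parser, $xml, $values, $index);
xml_parser_free($parser);

// $index maps each tag name to its offsets in $values.
// $values is a flat list of parse events in document order, roughly:
//   array('tag' => 'ENTRY',   'type' => 'open',     'level' => 1)
//   array('tag' => 'SUBJECT', 'type' => 'complete', 'level' => 2, 'value' => 'Hello')
//   array('tag' => 'BODY',    'type' => 'complete', 'level' => 2, 'value' => 'First post')
//   array('tag' => 'ENTRY',   'type' => 'close',    'level' => 1)
// To get a tree out of that, you still have to walk the list and track
// the nesting levels yourself - i.e. write a parser for the parser's output.
?>
```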

And did I mention writing XML files? What would be really nice is a few classes to handle creating nodes with the correct character encoding (handling character encoding in PHP is non-trivial), escaping entities, and generally making sure the document is well-formed. But, of course, those classes don't exist. Or, rather, they exist in the PEAR repository, but I can't count on my users having shell access to install new modules. Hell, I don't have shell access to my web host, so I couldn't install PEAR modules if I wanted to. My only option is to write all the code myself. Granted, it's not a huge problem, so long as nobody ever uses a character set other than UTF-8, but it's still annoying.
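
For the record, the kind of helper I mean is nothing fancy - something like the sketch below, which assumes everything is UTF-8 and leans on htmlspecialchars() for the entity escaping.

```php
<?php
// A sketch of the sort of helper I end up writing by hand - assumes UTF-8.
function xml_element($tag, $text) {
    // htmlspecialchars() escapes &, <, >, and (with ENT_QUOTES) both quote
    // characters, which covers the entities that matter for element content.
    return '<' . $tag . '>' . htmlspecialchars($text, ENT_QUOTES) . '</' . $tag . '>';
}

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<entry>'
   . xml_element('subject', 'Ben & Jerry\'s "review"')
   . '</entry>' . "\n";
?>
```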

Maybe tomorrow I can rant about the truly brain-dead reference passing semantics in PHP 4. I had a lovely time with that when I was trying to optimize the plugin system.

Good-bye TrackBack Spam

Today, I happened across an interesting paper on TrackBack spam called Taking TrackBack Back (from Spam), by a team at Rice University. In fact, it was so interesting and sensible, I immediately implemented it on my weblog.

If you have a blog with TrackBack support enabled, you've probably been hit by TrackBack spam. In fact, according to the paper, approximately 98% of all TrackBacks are spam. To me, this is not even remotely surprising, as every single ping I've gotten since I implemented TrackBack in LnBlog has been spam. I've been fighting it with IP blacklisting and content filtering, but it's a losing battle. After implementing Pingback last week, I was seriously considering just disabling TrackBacks on my blogs.

The problem with TrackBack, if you've read anything about it, is that it's completely unauthenticated. To send a TrackBack ping to a blog entry, all you need to do is send an HTTP POST, populated with whatever data you like, to a specific URL. Although it's not required by the specification, the most obvious (and common) implementation of TrackBack is to simply accept and store the information sent by the client. Needless to say, this leaves you completely vulnerable to spammers.
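
For the curious, a ping really is just a form-encoded POST with a handful of optional fields (title, excerpt, url, blog_name). Here's a rough PHP sketch - the URLs are made up - just to show how little is involved:

```php
<?php
// A rough sketch of sending a TrackBack ping - note the complete lack of
// any authentication step. All URLs here are made up.
$body = 'title=' . urlencode('Some entry title')
      . '&url=' . urlencode('http://example.com/my-linking-post')
      . '&excerpt=' . urlencode('A short excerpt of the linking post...')
      . '&blog_name=' . urlencode('Example Blog');

$fp = fsockopen('blog.example.com', 80);
fwrite($fp, "POST /trackback/123 HTTP/1.0\r\n"
          . "Host: blog.example.com\r\n"
          . "Content-Type: application/x-www-form-urlencoded\r\n"
          . "Content-Length: " . strlen($body) . "\r\n"
          . "\r\n"
          . $body);
fclose($fp);  // Not even bothering to read the response.
?>
```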

Pingback is supposed to fix this by virtue of the fact that the server receiving the ping does all the work. The client just sends an XML-RPC request with the URL of the page to ping and the URL of the page that references it. The server is not required to do anything, but it is recommended that it fetch the referring page, check that it links to your site, and extract some information to display, like a title and excerpt.

However, as the Rice University paper points out, there's no requirement in the TrackBack specification that you just take what the client gives you. In fact, the anti-spam measure recommended by the paper is essentially to do what the Pingback spec recommends - fetch the page and see if it links to you. Not only is this compatible with the TrackBack specification, but it is also, according to their information, highly effective.
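
The check itself is almost trivially simple to implement. Here's a rough sketch of the idea in PHP - this is not the exact code I added to LnBlog, and it leans on allow_url_fopen being enabled, but it gets the point across:

```php
<?php
// A sketch of the link-back check - not the exact LnBlog implementation.
// $ping_url is the URL the TrackBack ping claims to be from; $entry_url is
// the permalink of my entry that got pinged.
function trackback_looks_legit($ping_url, $entry_url) {
    // file_get_contents() can fetch URLs when allow_url_fopen is on,
    // which is the usual case on shared hosting.
    $page = @file_get_contents($ping_url);
    if ($page === false) {
        return false;   // Can't even fetch the page? Treat it as spam.
    }
    // If the page never links back to my entry, it's almost certainly spam.
    return strpos($page, $entry_url) !== false;
}
?>
```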

The beauty of this is that it's so obvious. In fact, when I read it, my first reaction was, "Why didn't I think of that?" Although it's not required, TrackBacks from legitimate blogs will virtually always include a link to your blog. After all, how else will the readers know about your entry? However, this is almost never the case for spam pings. The spammers aren't at all interested in what your blog says - they just want to spray their links all over the web. So if the page doesn't link to your site, you can be pretty sure it's spam. And if the page does link to my site - well, at least it's boosting my Google PageRank.

Opera and Akregator

Yay! I can finally do it! I can finally use Opera and Akregator together! Well, at least to a certain extent.

Yesterday I discovered this blog entry by zeroK on this very subject. The basic concept is so simple it's brilliant: define a custom protocol. Opera allows you to modify the handler programs for protocols like mail, telnet, etc. Well, the solution is to simply define a feed:// protocol and set the handler to your RSS aggregator.

Unfortunately, there's really no such thing as a feed:// protocol, so you need some JavaScript. For feeds linked in the page header, the solution was to use the modified bookmarklet that extracts the links and pops up a list with feed:// substituted for http://.

As for the handler application, I banged out a little shell script using DCOP calls and KDialog to add a feed to a selected group. I didn't use the Akregator command line options because they don't seem to work when you're embedding Akregator in Kontact.

The only problem with this is that it doesn't work with Opera's built-in RSS icon. Changing the protocol on the linked RSS feeds with a user JavaScript just seems to make them stop working altogether.

Hopefully Opera will eventually add a setting to configure an external feed reader. While I love Opera as a web browser, I never really cared for the mail client. And since the RSS reader is based on the mail client, I don't like that either. In fact, not only is the feed reader based on a mail client I don't like, but it seems to work more like a mail client than an RSS aggregator. I tried it out again the other day and I really hate it. I'd much rather have something with a three-panel layout like Akregator or SharpReader, so I don't think I'm going to be switching to Opera's reader any time soon.

But, at any rate, at least I'm making progress in this department.

Homepages are outmoded

Today's Daily Grind had a link to Tim Haines's thoughts on why he doesn't use Live.com as his home page. I've only looked briefly at Live.com, but I have to say I tend to agree with Tim. However, my real question is, do people actually still use home pages?

I haven't really used home pages in quite a while. My homepage on my home PC is still set to Yahoo! mail, despite the fact that I bought a domain two years ago and now only check it once every couple of weeks. At work, I never even bothered to set a home page. You know why? Because I'm an Opera user, and I configured my browser to always restore my last session. So, in other words, I never actually see my home page. I just open up my browser and immediately pick up where I left off last time.

Apparently, from Tim's informal poll, a fair percentage of people have their browser start on a search page, like Google. I don't know why, though. Opera and Firefox both support configurable inline searches in the address bar, after all. When I want to search Google, I just open a new tab and type "g search term" to run my search. I also set up "define word" and "wiki search term" to search Dictionary.com and Wikipedia in the same way. It's so convenient I honestly don't know how anyone can go back to actually visiting the main search page after discovering this.