Good-bye TrackBack Spam

Today, I happened across an interesting paper on TrackBack spam called "Taking TrackBack Back (from Spam)", by a team at Rice University. In fact, it was so interesting and sensible that I immediately implemented it on my weblog.

If you have a blog with TrackBack support enabled, you've probably been hit by TrackBack spam. In fact, according to the paper, approximately 98% of all TrackBacks are spam. To me, this is not even remotely surprising, as every single ping I've gotten since I implemented TrackBack in LnBlog has been spam. I've been fighting it with IP blacklisting and content filtering, but it's a losing battle. After implementing Pingback last week, I was seriously considering just disabling TrackBacks on my blogs.

The problem with TrackBack, if you've read anything about it, is that it's completely unauthenticated. To send a TrackBack ping to a blog entry, all you need to do is send an HTTP POST, populated with whatever data you like, to a specific URL. Although it is not required by the specification, the most obvious (and common) implementation of TrackBack is to simply accept and store the information sent by the client. Needless to say, this leaves you completely vulnerable to spammers.
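To make that concrete, here is a rough Python sketch of a TrackBack ping and of the "obvious" receiver that trusts it. The field names (url, title, excerpt, blog_name) come from the TrackBack specification; the function names and example URLs are made up for illustration, and this is not LnBlog's actual code.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_trackback_ping(ping_url, entry_url, title="", excerpt="", blog_name=""):
    """Build (but do not send) a TrackBack ping as a form-encoded HTTP POST."""
    fields = {
        "url": entry_url,        # the page that supposedly references the entry
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }
    body = urlencode(fields).encode("utf-8")
    return Request(
        ping_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


def naive_receive(body):
    """The 'obvious' receiver: store whatever the client claims, unchecked."""
    parsed = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


# Nothing stops a spammer from filling in whatever they like.
ping = build_trackback_ping(
    "http://myblog.example/trackback/42",   # hypothetical ping URL
    "http://pills.example/buy-now",
    title="Cheap pills",
    blog_name="Totally A Real Blog",
)
stored = naive_receive(ping.data)
```

Note that nothing in the request proves the sender has anything to do with the page named in the url field, which is exactly the hole the spammers exploit.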

Pingback is supposed to fix this by virtue of the fact that the server receiving the ping does all the work. The client just sends an XML-RPC request with the URL of the page to ping and the URL of the page that references it. The server is not required to do anything, but it is recommended that it fetch the referring page, check that it links to your site, and extract some information to display, like a title and excerpt.
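For comparison, a Pingback request carries far less data. The sketch below only builds the XML-RPC payload with Python's standard xmlrpc.client module; the method name pingback.ping and the argument order (source first, then target) are from the Pingback specification, while the helper name and URLs are invented for the example.

```python
import xmlrpc.client


def build_pingback_request(source_url, target_url):
    """Serialize a pingback.ping call; source is the page doing the linking."""
    return xmlrpc.client.dumps((source_url, target_url), methodname="pingback.ping")


# Two URLs and nothing else: no title, no excerpt, no blog name.
payload = build_pingback_request(
    "http://other.example/my-reply",      # page that references the entry
    "http://myblog.example/entry/42",     # entry being pinged
)
```

Because the client supplies only the two URLs, anything the server wants to display has to come from fetching the source page itself.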

However, as the Rice University paper points out, there's no requirement in the TrackBack specification that you just take what the client gives you. In fact, the anti-spam measure recommended by the paper is essentially to do what the Pingback spec recommends - fetch the page and see if it links to you. Not only is this compatible with the TrackBack specification, but it is also, according to their information, highly effective.
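A minimal sketch of that check in Python might look like the following. It is only an illustration under some assumptions: the function names are hypothetical, "links to you" is taken to mean "contains an anchor whose host matches your blog's host", and this is neither the paper's code nor what LnBlog actually runs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_links_to(html, page_url, my_host):
    """Return True if any link in the page points at my_host."""
    collector = _LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlsplit(absolute).hostname == my_host.lower():
            return True
    return False


def fetch_html(url, timeout=10):
    """Fetch at most ~200 KB of the page so a huge response can't stall us."""
    with urlopen(url, timeout=timeout) as response:
        return response.read(200_000).decode("utf-8", errors="replace")


def looks_legitimate(claimed_url, my_host, fetch=fetch_html):
    """Accept a ping only if the page it names really links back to us."""
    try:
        html = fetch(claimed_url)
    except (OSError, ValueError):
        return False  # unreachable or malformed URL: treat as spam
    return page_links_to(html, claimed_url, my_host)
```

The fetch parameter is there so the check can be tried against canned HTML without touching the network. A stricter version could require a link to the exact entry rather than just the host.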

The beauty of this is that it's so obvious. In fact, when I read it, my first reaction was, "Why didn't I think of that?" Although it's not required, TrackBacks from legitimate blogs will virtually always include a link to your blog. After all, how else will the readers know about your entry? However, this is almost never the case for spam pings. The spammers aren't at all interested in what your blog says - they just want to spray their links all over the web. So if the page doesn't link to your site, you can be pretty sure it's spam. And if the page does link to your site - well, at least it's boosting your Google PageRank.


