Outbound Trackbacks

Friday,  05/02/03  02:44 PM

For the web nerds among you...  (yeah, you!)

I implemented "outbound trackbacks" today.  Essentially a trackback is a way to tell someone: "hey, I linked to your site".  To post a trackback to somebody, their site has to support "inbound trackbacks".  This is not yet a widespread feature; I discovered that since the start of the year I've made 1188 links to other sites, of which only 28 (about 2.4%) were trackback-enabled.  Hardly seems worth it, except that I'm sure this will become more popular over time.

I'm still deciding whether to implement "inbound trackbacks".  This would allow me to know when someone has linked to me, but only if they have a trackback-enabled site.  I think for now I'm going to keep looking through my referer logs instead...  Not only does this cover every inbound link that somebody actually follows (including those from non-trackback-enabled sites), but it also tells me when the link was used, which is actually a little more interesting than whether it exists.
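
For the curious, here is roughly what "looking through the referer logs" can look like.  This is only a sketch, not my actual setup: it assumes the web server writes the common Apache "combined" log format (where the referer is the second quoted field), and the function name top_referers and the example hostname are made up for illustration.

```shell
# Sketch only: top_referers is a hypothetical helper name.
# Reads combined-log-format lines on stdin and prints inbound referers,
# most frequent first.  $1 is your own site's URL prefix, so that
# clicks from your own pages are left out.
top_referers() {
  # Splitting on double quotes makes the referer field number 4:
  #   host - - [date] "request" status bytes "referer" "user-agent"
  awk -F'"' -v self="$1" '
    $4 != "" && $4 != "-" && index($4, self) != 1 { print $4 }
  ' | sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption):
#   top_referers "http://my.example.com" < /var/log/httpd/access_log | head
```

Each output line is a hit count followed by the referring URL; the timestamps are still in the log if you want to know when a link was used.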

Trackbacks are pretty simple; the concept was developed by the folks at Movable Type (a popular blogging tool), and the specification is on their site.  My implementation is a script that runs once a day and processes all new posts and articles.  For each link in each post, the script retrieves the linked-to page and looks for RDF information embedded in the page which describes the trackback.  (If there isn't any, the site isn't trackback-enabled, and you're done.)  If there is a trackback URL, you make an HTTP POST to it giving your URL, your site name, and an optional excerpt (there's a good example in the spec).  That's it.
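
The discovery-and-ping steps just described can be sketched as a few shell functions.  This is a minimal illustration rather than the script I actually run: the function names are invented, it assumes curl is installed, and it assumes the trackback:ping attribute of the embedded RDF sits on a single line.  The form fields (url, blog_name, excerpt) and the error-code response are the ones the TrackBack spec describes.

```shell
# Sketch only: the function names are made up for illustration.

# Print the trackback ping URL advertised in the HTML on stdin, if any.
# Assumes the trackback:ping="..." attribute is not split across lines.
find_ping_url() {
  sed -n 's/.*trackback:ping="\([^"]*\)".*/\1/p' | head -n 1
}

# POST a ping to ping URL $1 with our permalink $2, site name $3, excerpt $4.
# curl's --data-urlencode does the URL-encoding, so pass the excerpt as
# plain text here.  The response XML goes to stdout.
send_ping() {
  curl -s \
    --data-urlencode "url=$2" \
    --data-urlencode "blog_name=$3" \
    --data-urlencode "excerpt=$4" \
    "$1"
}

# Succeed if the response XML on stdin reports <error>0</error>.
ping_ok() {
  grep -q '<error>0</error>'
}

# Usage (needs network access, so left commented out):
#   ping=$(curl -s "$link" | find_ping_url)
#   [ -n "$ping" ] && send_ping "$ping" "$my_url" "$my_site" "$excerpt" | ping_ok
```

If find_ping_url prints nothing, the linked site isn't trackback-enabled and there is nothing to send.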

The most interesting part of the script creates a reasonable "excerpt":

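grep -F -- "$url" "$file" |
# decode &amp; last, and escape & in the replacement (a bare & means "the whole match")
sed 's/<[^>]*>//g;s/&lt;/</g;s/&gt;/>/g;s/&amp;/\&/g' |
cut -c1-252 |
# encode % first so the escapes added afterwards are not re-encoded
sed 's/%/%25/g;s/\$/%24/g;s/&/%26/g;s/+/%2B/g;s/=/%3D/g;s/?/%3F/g;s/ /+/g' |
# drop the trailing partial word, then append "..."
sed 's/+[^+]*$//;s/.$/&.../'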

Yeah, I know, nerdy.  The grep gets the paragraph containing the link.  The first sed converts the HTML into text, throwing away tags and decoding the basic entities.  The cut truncates the excerpt at 252 characters.  The second sed URL-encodes the excerpt, and the final sed drops the trailing partial word and appends a "..." to the end.  Voilà.

If all sites were trackback-enabled in both directions, it would have the effect of making all links two-way; for any page you would know all the links to it, from all over the web.  I doubt this will ever happen; for one thing the information is not always useful and could be huge (imagine all the inbound links to the Google home page, for example).  But it is a cool thing in the blogosphere, and I expect all the popular blogging tools will support it...