Waybacking Links

Now that I've finally created a mail API for my blog - this very post was made via email - the next important advance (/ next long-delayed planned capability) is "waybacking" dead links. The longer I blog and the more I go back and look at old posts, the more aware I've become that old links are mostly dead links. The bigger and more professional the organization whose website I had linked, the more likely it is that those links have died and the original content is gone. (Meanwhile, links to individually curated and cared-for blogs are often surprisingly fine 😊.)

So what can be done? Well fortunately the amazing Wayback Machine exists, and quite a lot of that old linked-to content has been captured there. The challenge is to figure out which links are now dead, and how to redirect them.
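As a rough sketch of the "has it been captured?" half of the problem, the Wayback Machine's public availability endpoint can be asked for the snapshot closest to a given date. The function names and the timeout below are illustrative assumptions, not part of any existing blog code.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Wayback Machine availability endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the closest available snapshot URL out of an availability response."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None  # never captured, or the capture is not available


def lookup_wayback(url: str, timestamp: Optional[str] = None,
                   timeout: float = 10.0) -> Optional[str]:
    """Ask for a snapshot of `url`, ideally near `timestamp` (YYYYMMDD)."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Passing the date of the original post as `timestamp` would favor a capture from around the time the link was made, which is presumably the content the post was pointing at.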

So, is a link dead? Used to be, you could follow the link, and if you got a "404" the link was dead. (And if the site was gone, well, then it was even deader.) But nowadays many sites redirect any 404 to a special "not found" page, or a search engine, or their home page. Harder to tell. Jamie Zawinski solved this problem by waybackifying every link over five years old, on the assumption that it was dead or about to be.
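One way to approximate this check, sketched under some loud assumptions: treat a 404 (or 410) as dead, and treat a deep link that ends up on the home page or on a "not found"-looking path as merely suspect. The hint strings and the three labels are made up for illustration.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Assumed heuristics for spotting "soft 404" landing pages; tune to taste.
NOT_FOUND_HINTS = ("404", "notfound", "not-found", "not_found")


def classify(requested_url: str, status: int, final_url: str) -> str:
    """Return 'dead', 'suspect', or 'alive' for one fetch result."""
    if status in (404, 410):
        return "dead"
    if status >= 400:
        return "suspect"  # other errors may be temporary
    asked = urlsplit(requested_url).path.rstrip("/")
    got = urlsplit(final_url).path.rstrip("/")
    if asked != got:
        if got == "":
            return "suspect"  # deep link bounced to the home page
        if any(hint in got.lower() for hint in NOT_FOUND_HINTS):
            return "suspect"  # landed on a "not found" style page
    return "alive"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch `url`, following redirects, and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(url, resp.status, resp.geturl())
    except urllib.error.HTTPError as err:
        return classify(url, err.code, url)
    except OSError:
        # DNS failure, refused connection, timeout: the site itself is gone.
        # A real version would probably retry before giving up.
        return "dead"
```

A redirect from `http` to `https`, or to a `www.` host, keeps the same path and so still counts as alive here; only the "suspect" cases need a closer look at the page content.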

I'd like a smaller hammer. I'm thinking of logic like this:

1. Is the link waybacked? If not, well, too bad.
2. Is the link (or site) truly dead? If so, waybackify the link.
3. Else compare the linked-to page to its waybacked copy. If they are "sufficiently different", assume the link is dead (and/or the content has been changed) and waybackify it. Some experimentation can probably iterate into a reasonable measure of "sufficiently different".
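The logic above can be sketched as a small decision function. This is only one possible reading: "sufficiently different" is approximated with a word-level similarity ratio from Python's `difflib`, and the 0.5 threshold is a placeholder to be tuned by experiment. The function and parameter names are hypothetical, and fetching the pages and extracting their text is left out.

```python
from difflib import SequenceMatcher
from typing import Optional


def similarity(live_text: str, archived_text: str) -> float:
    """Word-level similarity between two page texts, from 0.0 to 1.0."""
    live_words = live_text.lower().split()
    archived_words = archived_text.lower().split()
    # autojunk=False keeps common words from being ignored on long pages.
    return SequenceMatcher(None, live_words, archived_words, autojunk=False).ratio()


def choose_link(url: str, snapshot_url: Optional[str], is_dead: bool,
                live_text: str = "", archived_text: str = "",
                threshold: float = 0.5) -> str:
    """Return the URL a post should point at: the original or its snapshot."""
    if snapshot_url is None:
        return url  # not waybacked: too bad, keep the link as it is
    if is_dead:
        return snapshot_url  # truly dead: waybackify
    if similarity(live_text, archived_text) < threshold:
        return snapshot_url  # "sufficiently different": treat as dead or changed
    return url  # still alive and still about the same thing


live = "page not found try our search"
archived = "a long essay about waybacking dead links on old blog posts"
snapshot = "http://web.archive.org/web/2006/http://example.com/essay"
print(choose_link("http://example.com/essay", snapshot, False, live, archived))
```

Here the live page shares no words with the archived copy, so the similarity is 0.0 and the snapshot URL is chosen. Comparing words rather than raw HTML is a deliberate simplification: a site redesign changes the markup a lot but the article text very little.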

Anyway stay tuned. This will be internal plumbing. Perhaps one day we can visit old archived pages and follow their links!
