You ask me to upgrade Silverlight, for security purposes, then loudly warn me it might harm my computer. No, that’s not confusing at all.
I’ve written previously about how the archives of my blog were less full than they should be – that, between domain changes, server/CMS moves, and times when I simply didn’t care, there were potentially hundreds of posts missing from the early years in particular.
Back up your crap, people – including your blog.
For the last couple of years I’ve had an on-off project to restore as much of this personal history as possible. Every so often I’d go ferreting through old hard disks, or exploring the Internet Archive’s Wayback Machine for old content I could salvage. At first I had limited success, turning up only a handful of posts. Of those, I was fussy and only restored the “worthwhile” posts – usually longer posts about big events, or technical in nature.
This last weekend though, I revised my stance on this. If I was going to recreate my blogging history, I couldn’t – shouldn’t – just cherry-pick. I should include as much as I could possibly recover: the good, the bad, the plain inane. Anything less would feel a bit dishonest, and undermine the raison d’être of the whole endeavour: saving the past.
The only exceptions would be posts so incomplete due to missing assets (mainly images) that the body text made no sense, or posts which were completely unintelligible outside the context of the original blog – entries about downtime, for example. Also excluded was my personal pet peeve – posts “apologising” for the time between updates[1]!
A Brief Synopsis of the “How”:
To bring the past kicking and screaming into the present, I dove back into the Wayback Machine, going as far back on my first domain as I could. From there I worked as methodically as I could: working from the furthest back onwards, post-by-post. The basic process was:
- Copy the post text and title to the WordPress new post screen
- Adjust the post date to match the original
- Where possible, match the original publishing time. Where this wasn’t available, approximate based on context (mentions of morning/afternoon/evening, number of other posts that day, etc.)
- Check any links in the post (see below)
- Add any recovered assets – which was rare
- Turn off WordPress social sharing
- Publish
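The time-matching step above was done by hand, but the idea could be sketched in Python roughly like this. The hour chosen for each hint, the midday fallback, and the function name are all illustrative assumptions rather than anything from the original process, and the sketch ignores the "number of other posts that day" clue entirely.

```python
from datetime import date, datetime, time

# Hypothetical defaults: these hours are illustrative guesses only.
ASSUMED_TIMES = {
    "morning": time(9, 0),
    "afternoon": time(14, 0),
    "evening": time(19, 0),
}
FALLBACK_TIME = time(12, 0)  # used when the post gives no hint at all


def approximate_publish_time(post_date, body_text):
    """Guess a publishing time from time-of-day words in the post body."""
    lowered = body_text.lower()
    # Record where each hint word first appears, then take the earliest one.
    hits = [(lowered.find(word), word) for word in ASSUMED_TIMES if word in lowered]
    chosen = ASSUMED_TIMES[min(hits)[1]] if hits else FALLBACK_TIME
    return datetime.combine(post_date, chosen)


guess = approximate_publish_time(date(2005, 3, 12), "Spent the evening fixing the server")
print(guess.isoformat())
```

A guess like this would still need a sanity check by eye, since a post can mention "this morning" while having been written late at night.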
I started on the Friday afternoon, and manually “imported” around 50 posts in the first batch.
Turning off social sharing was done so I didn’t flood my Twitter followers with a whole load of links to the posts – some over a decade old. One thing I didn’t anticipate though, and which I had zero control over, was WordPress emailing the old posts to those who had subscribed to email notifications. It wasn’t until a friend IM’d me about her full inbox that I realised what was happening – so if you found your mail filled with notifications as a result of this exercise, I apologise!
To get around this, I ended up creating a new, private WordPress blog to perform the initial manual process, so I could later export a file to import into this blog.
Between Saturday, Sunday, and Monday evenings, I tracked down and copied over a further 125 or so posts. Due to the vagaries of the Wayback Machine, not every post could be recovered. Generally speaking, it was reliable in having a copy of the first page of an archive section, but no further pages. Sometimes I could access “permalink” pages for the other posts, but this was really hit-or-miss. A lot of the time the page the WBM had “saved” was a 404 page from one of my many blog reorganisations over the years, or in other cases, it would have maybe one post out of eight.
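Some of the hit-or-miss hunting could, in principle, be narrowed down with the Wayback Machine's CDX search endpoint, which lists the captures held for a URL. The Python sketch below only builds the query URL and filters the rows that come back. It assumes the endpoint's JSON output, where the first row is a header naming the columns; the domain and function names are made up for the example.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, year_from, year_to):
    """Build a CDX query asking only for captures recorded with HTTP status 200."""
    params = {
        "url": page_url,
        "from": str(year_from),
        "to": str(year_to),
        "output": "json",
        "filter": "statuscode:200",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def usable_captures(cdx_rows):
    """Turn CDX JSON rows (header row first) into Wayback URLs, skipping non-200 captures."""
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts, orig, status = (header.index(k) for k in ("timestamp", "original", "statuscode"))
    return [
        f"https://web.archive.org/web/{row[ts]}/{row[orig]}"
        for row in rows
        if row[status] == "200"
    ]


# Hypothetical rows in the shape described above, not real archive data.
sample = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/post-1", "20050312101500", "http://example.com/post-1", "text/html", "200", "AAA", "1234"],
    ["com,example)/post-2", "20050401090000", "http://example.com/post-2", "text/html", "404", "BBB", "512"],
]
print(usable_captures(sample))
```

Filtering on status only removes captures that were recorded as errors. A "not found" page that the old server delivered with a 200 status would still slip through, so reading each capture remains necessary.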
I made a rule not to change the original posts in any way – no fixing typos or correcting something I was wrong about. The only thing I would do was mark a missing asset with an “Editor’s Note” of some sort, when appropriate. The only content I did have to consider changing was the links.
Dealing with Links
One thing I had to consider was what to do about links which might have changed or disappeared over time. When copying from the WBM, links had already been rewritten to point to a (potentially non-existent) WBM archive page, but if the original still existed, I wanted to point to that instead. In the end I had to check pretty much every link by hand – if the original existed, I would point to that page; if not, I would take a chance with the Wayback Machine. In some cases I had to consider what to do where the page existed, but had different content or purpose to the original. I dealt with these on a case-by-case basis.
For internal links, I pointed to an imported version, if it existed, or removed it if there was none and context allowed.
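The rule for external links can be written down as a small Python sketch. It assumes the usual shape of a Wayback Machine link (`web.archive.org/web/<timestamp>/<original URL>`), and `is_live` is a hypothetical callable standing in for whatever check is used, whether a script or a person clicking the link.

```python
import re

# Matches Wayback links, including variants with a modifier after the timestamp (e.g. "id_").
WAYBACK_LINK = re.compile(r"^https?://web\.archive\.org/web/(\d{1,14})[a-z_]*/(.+)$")


def original_url(link):
    """Return the original URL wrapped inside a Wayback link, or the link unchanged."""
    match = WAYBACK_LINK.match(link)
    return match.group(2) if match else link


def choose_link(link, is_live):
    """Prefer the live original page; otherwise keep the Wayback link as found."""
    original = original_url(link)
    if original != link and is_live(original):
        return original
    return link


archived = "https://web.archive.org/web/20050312000000/http://example.com/post"
print(choose_link(archived, is_live=lambda url: True))
print(choose_link(archived, is_live=lambda url: False))
```

An automated check like this can only tell whether a page responds. It cannot tell whether the page still has the same content or purpose as the original, which is why those cases were handled one at a time.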
Wrapping Up
In total, I imported around 175 previously “lost” blog entries, covering 2002-2006, with the majority from 2005. These years have gone from having a handful of entries, to having dozens. Overall, this has grown the archives by roughly 50% – a not insubstantial amount!
At some point I will go back and appropriately tag them all, but that’s a lower priority job for another time.
2007-2010 were years when my writing output dropped a lot, so while I will look for missing entries from this period, I don’t expect to find many at all.
Side Note: History Repeats
I discovered, in the process of doing all this, that I had gone through the same exercise before, roughly 10 years ago!
Over the last few days, I’ve been working on the archives of my old site; cleaning and recategorising them. Today, I have added them to the archives of Pixel Meadow.
These additions represent everything that was left of ChrisMcLeod.Net. Over the course of its life many changes occurred and data was lost – so these additions don’t represent everything that I’ve written there over the years.
You would think I might have learned from this mistake back then, but obviously not! Fingers crossed it’s finally sunk in.
[1] Though only where they had no other content to the post.
via Signal vs. Noise
The Reading List is a round-up of interesting blog posts and articles I’ve recently read, curated and posted every couple of days.
- Scotland: No Country for Free Speech
- A mile wide, an inch deep — Medium
- This is why you shouldn’t take people’s Facebook lives seriously
- More Traffic, Less Noise: 15 Social News Websites You Must Know
- Illogical Campaigning from Scottish Labour — Medium
- How We’re Working Without Managers at Buffer
- Hire Remotely
- Apple has lost the functional high ground – Marco.org
- How to get – and stay – organised
“Time is a created thing. To say ‘I don’t have time,’ is like saying ‘I don’t want to.’”
The Reading List is a round-up of interesting blog posts and articles I’ve recently read, curated and posted every couple of days.
- Turn Any YouTube Video Into A GIF By Just Adding “GIF” To The URL
- Google+, the interest network — Medium
- Aberdeen are top but they surely won’t stay there . . . will they?
- Dynamic firewall with FirewallD
- Stop Re-setting the Clock
- Old Fashioned 101
- It’s a mugs game. Bodum vacuum travel mug review — Stuff & Nonsense, And All That Malarkey
- Valley of the Blahs: How Justin Bieber’s Troubles Exposed Twitter’s Achilles’ Heel
- 3 Christmas Skills Every Man Should Master
- Personal Transparency / Hi, I’m Leo
- All the Things We Did Wrong With Our Blog Images
- A Day in the Life of a Buffer Happiness Hero
- Git for Grown-ups ◆ 24 ways
- How To De-Google-ify Your Life: The Complete Guide To Leaving Google
- Love, Laughs and Lemons – How to Have Fun on Tinder
- Why Is Making Grownup Friends So Hard?
- Ten great Tabletop games you can use to introduce your friends to gaming
- 15 Lessons from 15 Years of Blogging
- The Blogosphere lives!
- Obliterate Startup Depression
- The Languages And Frameworks That You Should Learn In 2015
- Would you like to play a game?
- The No-Cardio Workout
- The 15 Best Browser Extensions to Improve Your Social Media Marketing