Monday, 21 June 2010

Vintage Computing Festival

Yesterday I took a trip to the first official Vintage Computing Festival in Britain. I was a little surprised to hear that it was the first, but I imagine that there are plenty of 'unofficial' gatherings too. This event was held by the National Museum of Computing in Bletchley Park, which warrants a visit in its own right.

For the weekend's festival, Bletchley was transformed into vintage computing heaven: a couple of marquees and the ground floor of the house were packed with computers of all makes and models, each one up and running and ready for some hands-on time. The vast majority were being used for gaming - Chuckie Egg was all over the place - but I did spot the odd word-processing application here and there.

I thought I'd post some pictures from two exhibits that really caught my eye.
First was the BBC playing the 1980s BBC Domesday project from laserdisc. Look right and you'll see some video footage that we found having searched for 'falklands'. I've read quite a bit about the BBC Domesday laserdiscs over the years (after the CAMiLEON project they've become digital preservation folklore), but seeing the content at stake, and interacting with it on a contemporary platform, is something quite special. I also suffer from BBC Micro nostalgia (though this is a Master).

The other I'm including partly for nostalgic reasons (I loved my Spectrums, and so did my sister and my grandfather :-) ), and partly because it amused me. Twittering from a Spectrum! Whatever next?!


This is probably an old and battered hat for you good folks (seeing as the Web site's last "announcement" was in 2004!), but most days I still feel pretty new to this whole digital archiving business - not just with the "archive" bit, but also the "digital preservation", um, bit - so it was news to me... ;-)

Perusing the latest Linux Format at the weekend, I chanced on an article by Ben Martin (I couldn't find a Web site for him...) about parchive and specifically par2cmdline.

Par-what, I hear you ask? (Or perhaps "oh yeah, that old thing" ;-))

Par2 files are what the article calls "error correcting files". They are a bit like checksums, except that once created they can also be used to repair the original file in the event of bit/byte level damage.


So I duly installed par2 - did I mention how wonderful Linux (Ubuntu in this case) is? - the install was simple:

sudo apt-get install par2

Then I tried it out on a 300MB Mac disk image - the new Doctor Who game from the BBC - and guess what? It works! Do some damage to the file with dd, run the verify again and it says "the file is damaged, but I can fix it" in a reassuring HAL-like way (that could be my imagination, it didn't really talk - and if it did, probably best not to trust it to fix the file right...)
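For anyone wanting to try the same round trip, here is a minimal sketch of the create, damage, verify and repair steps. It is not the exact set of commands used above: the file name sample.img, its 1 MB size and the damage offset are all made up for the demo, and the -r5 (redundancy percentage) and -q (quieter output) options are the par2cmdline flags as I understand them. The par2 steps are skipped if par2 isn't installed.

```shell
#!/bin/sh
# Sketch only: file name, size and damage offset are made up for the demo.
set -eu

work=$(mktemp -d)
cd "$work"

# A 1 MB file of random bytes stands in for the disk image.
dd if=/dev/urandom of=sample.img bs=1024 count=1024 2>/dev/null
before=$(cksum < sample.img)

if command -v par2 >/dev/null 2>&1; then
    # Create recovery data at 5% redundancy, then check the untouched file.
    par2 create -q -r5 sample.img.par2 sample.img
    par2 verify -q sample.img.par2

    # Overwrite 16 bytes mid-file; conv=notrunc keeps the rest of the file.
    dd if=/dev/urandom of=sample.img bs=1 count=16 seek=500000 conv=notrunc 2>/dev/null

    # verify exits non-zero on a damaged file, so branch on it.
    if ! par2 verify -q sample.img.par2; then
        par2 repair -q sample.img.par2
    fi
else
    echo "par2 not found; skipping the create/damage/repair steps"
fi

# Compare the file against the checksum taken before any damage.
after=$(cksum < sample.img)
if [ "$before" = "$after" ]; then
    echo "checksum matches the original"
else
    echo "checksum differs from the original"
fi
```

The cksum comparison at the end is an independent check that doesn't rely on par2's own report of success.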

The par2 files totalled around 9MB at "5% redundancy" - not quite sure what that means - which isn't much of an overhead for some extra data security... I think, though I've not tried, that it is integrated into KDE4 too for a little bit of personal file protection.

The interesting thing about par2 is that it comes from an age when bandwidth was limited. If you downloaded a large file and it was corrupt, rather than have to download it again, you simply downloaded the (much smaller) par2 file that had the power to fix your download.

This got me thinking. Is there then any scope for archives to share par2 files with each other? (Do they already?) We cannot exchange confidential data but perhaps we could share the par2 files, a little like a pseudo-mini-LOCKSS?

All that said, I'm not quite sure we will use parchive here, though it'd be pretty easy to create the par2 files on ingest. In theory our use of ZFS, RAID, etc. should be covering this level of data security for us, but I guess it remains an interesting question - would anything be gained by keeping par2 data alongside our disk images? And, after Dundee, would smaller archives be able to get some of the protection offered by things like ZFS, but in a smaller, lighter way?

Oh, and Happy Summer Solstice!

Thursday, 10 June 2010

OSS projects for accessing data held in .pst format

Thanks to Neil Jefferies for a link to this article in The Register, which tells us that MS has begun two open source projects that will make it possible for developers to create tools to 'browse, read and extract emails, calendar, contacts and events information' which live in MS Outlook's .pst file format. These tools are the PST Data Structure View Tool and the PST File Format SDK, and both are to be Apache-licensed.

Wednesday, 2 June 2010

Developing & Implementing Tools for Managing Hybrid Archives

As previously blogged, we were invited to talk at the University of Dundee's Centre for Archive and Information Studies seminar. I understand that the presentations along with a set of notes will be made available shortly, but in the meantime I thought I'd let you know that my slides and notes are available on SlideShare and also on my rather hastily thrown together home page! :-)