Tuesday, 24 February 2009

Shoot those files!

Just wanted to record a pointer to Manfred Thaller's 'shoot' tool, which I saw demonstrated at a Planets preservation planning event sometime last year (see the posting). It's a handy way to get a feel for which formats suffer most loss of functionality when damaged.
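
For anyone wanting to try the general idea at home, here is a minimal sketch of damaging a copy of some data and counting what changed. This is not Manfred Thaller's tool and makes no claim about how it works; the function names `shoot` and `count_differences`, the `hits` and `seed` parameters and the sample text are all my own assumptions for illustration.

```python
import random


def shoot(data: bytes, hits: int, seed: int = 0) -> bytes:
    """Return a copy of `data` with `hits` randomly chosen bytes overwritten.

    Hypothetical helper for illustration only; it is not the real 'shoot' tool.
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    damaged = bytearray(data)
    if not damaged:
        return bytes(damaged)
    for _ in range(hits):
        position = rng.randrange(len(damaged))
        damaged[position] = rng.randrange(256)  # may happen to equal the old byte
    return bytes(damaged)


def count_differences(a: bytes, b: bytes) -> int:
    """Count the positions at which two equal-length byte strings differ."""
    return sum(x != y for x, y in zip(a, b))


original = b"Plain text degrades gracefully: one bad byte spoils one character. " * 4
damaged = shoot(original, hits=5)
print(count_differences(original, damaged), "of", len(original), "bytes changed")
```

The interesting part comes afterwards: open the damaged copy in the usual application and see how much of it is still usable. Plain text tends to lose a character per hit, whereas a compressed or heavily structured format may lose far more.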

Odds and ends from day one of the digital lives conference

The digital lives conference provided a space to digest some of the findings of the AHRC-funded digital lives project, and also to bring together other perspectives on the topic of personal digital archives. At the proposal stage, the conference was scheduled to last just a day; in the event one day came to be three, which demonstrates how much there is to say on the subject.

Day one was titled 'Digital Lifelines: Practicalities, Professionalities and Potentialities'. This day was intended mostly for institutions that might archive digital lives for research purposes. Cathy Marshall of Microsoft Research gave the opening talk, which explored some personal digital archiving myths on the basis of her experiences interviewing real-life users about their management of personal digital information.

Next came a series of four short talks on 'aspects of digital curation'.

  • Cal Lee, of UNC Chapel Hill, emphasised the need for combining professional skills in order to undertake digital curation successfully. Archives and libraries need to have the right combination of skills to be trusted to do this work.
  • Naomi Nelson of MARBL, Emory University, told a tale of two donors: the first is the entity that gives or sells an archive to a library, and the second is the academic researcher. Libraries need to have a dialogue with donors of the first type about what a digital archive might contain; this goes beyond the 'files' that they readily conceive as components of the archive, and includes several kinds of 'hidden' data that may be unknown to them. The second donor, 'the researcher', becomes a donor by virtue of the information that the research library can collect about their use of an archive. Naomi raised interesting questions about how we might be able to collect this kind of data and make it available to other researchers, perhaps at a time of the original researcher's choosing.
  • Michael Olson of Stanford University Libraries spoke of their digital collections and programmes of work. Some mention of work on the fundamentals - the digital library architecture (equivalent to our developing Digital Asset Management System - DAMS - which will provide us with resilient storage, object management and tools and services that can be shared with other library applications). Their digital collections include a software collection of some 5000 titles, containing games and other software. I think that sparked some interest from many in the audience!
  • Ludmilla Pollock, Cold Spring Harbor Laboratory, told us about an extensive oral history programme giving rise to much digital data requiring preservation. The collection contains videos of the scientists talking about their memories and has a dedicated interface.
Afterwards, we heard from a panel of dealers in archival materials: Gabriel Heaton of Sotheby's, Julian Rota of Bertram Rota and Joan Winterkorn of Bernard Quaritch. I was curious to hear if the dealers had needed to appraise archives containing obsolete digital media. Digital material is still only a tiny proportion of collections being appraised by dealers, and it seems that what little digital material they do encounter may not be appraised as such (disk labels are viewed rather than their contents). While paper archives are plentiful, perhaps there's not much incentive to develop what's needed to cater for the digital (many archivists may well feel this way too!). What's certain is that the dealer has to be quite sure that any investment in facilitating the appraisal of digital materials pays dividends come sale time.

Inevitably, questions of value were a feature of the session. The dealers suggest that archives and libraries are not willing to pay for born-digital archives yet; perhaps this stems from concerns about uniqueness and authenticity, and the lack of facilities to preserve, curate and provide access. It's not like there's actually much on the market at the moment, so perhaps it's a matter of supply as much as demand? Comparisons with 'traditional' materials were also made using Larkin's magic/meaningful values:

“All literary manuscripts have two kinds of value: what might be called the magical value and the meaningful value. The magical value is the older and more universal: this is the paper [the writer] wrote on, these are the words as he wrote them, emerging for the first time in this particular magical combination. We may feel inclined to be patronising about this Shelley-plain, Thomas-coloured factor, but it is a potent element in all collecting, and I doubt if any librarian can be a successful manuscript collector unless he responds to it to some extent. The meaningful value is of much more recent origin, and is the degree to which a manuscript helps to enlarge our knowledge and understanding of a writer’s life and work. A manuscript can show the cancellations, the substitutions, the shifting towards the ultimate form and the final meaning. A notebook, simply by being a fixed sequence of pages, can supply evidence of chronology. Unpublished work, unfinished work, even notes towards unwritten work all contribute to our knowledge of a writer’s intentions; his letters and diaries add to what we know of his life and the circumstances in which he wrote.”

Philip Larkin 'A Neglected Responsibility: Contemporary Literary Manuscripts', Encounter, July 1979, pp. 33-41.
The 'meaningful' aspects of digital archives are apparent enough, but what of the 'magical'? Most, if not all, contributors to the discussion saw 'artifactual' value in digital media that had an obvious personal connection, whether Barack Obama's Blackberry or J.K. Rowling's laptop. What wasn't discussed so much was the potential magical value of seeing a digital manuscript being rendered in its original environment. I find that quite magical, myself. I think more people will come to see it this way in time.

Delegates were then able to visit the digital scriptorium and audiovisual studio at the British Library.

After lunch, we resumed with a view of the 'Digital Economy and Philosophy' from Annamaria Carusi of the Oxford e-Research Centre. Some interesting thoughts about trust and technology, referring back to Plato's Phaedrus and the misgivings that an oral culture had about writing. New technologies can be disruptive and it takes time for them to be generally accepted and trusted.

Next, four talks under the theme of digital preservation.

  • First an overview of the history of personal films from Luke McKernan, a curator at the British Library. This included changes in use and physical format, up to the current rise of online video populating YouTube, and its even more prolific Chinese equivalents. Luke also talked about 'lifecasting', pointing to JenniCam (now a thing of the past, apparently), and also to folk who go so far as to install movement sensors and videos throughout their homes. Yikes!
  • We also heard from the British Library's digital preservation team, about their work on risk assessment for the Library's digital collections (if memory serves, about 3% of the CDs they sampled in a recent survey had problems). Their current focus is getting material off vulnerable media and into the Library's preservation system; this is also a key aim in our first phase of futureArch. There was also mention of the Planets and LIFE projects. Between project and permanent posts, the BL have some 14 people working on digital preservation. If you count those working on web archiving, audiovisual collections, digitisation, born-digital manuscripts, digital legal deposit and so on, who also have a knowledge of this area, it's probably rather more.
  • William Prentice offered an enjoyable presentation on audio archiving, which had some similar features to Luke's talk on film. It always strikes me that audiovisual archiving is very similar to digital archiving in many respects, especially when there's a need to do digital archaeology that involves older hardware and software that itself requires management.
  • Juan-José Boté of the University of Barcelona spoke to us about a number of projects he had been working on. These were very definitely hybrid archives and interesting for that reason.

Next, I chaired a panel of 'Practical Experiences'. As I'm naturally oriented toward the practical, there was lots for me here.

  • John Blythe, University of North Carolina, spoke about the Southern Historical Collection at the Wilson Library, including the processes they are using for digital collections. Interestingly, they have use of a digital accessioning tool created by their neighbours at Duke University.
  • Erika Farr, Emory University, talked about the digital element of Salman Rushdie's papers. Interesting to note that there was overlap of data between PCs, where the creator has migrated material from one device to another; this is something we've found in digital materials we've processed too. I also found Rushdie's filenaming and foldering conventions curious. When working with personal archives, you come to know the ways people have of doing things. This applies equally to the digital domain - you come to learn the creator's style of working with the technology.
  • Gabby Redwine of the Harry Ransom Center, University of Texas at Austin gave a good talk about the HRC's experiences so far. HRC have made some of their collections accessible in the reading room and in exhibition spaces, and are doing some creative things to learn what they can from the process. Like us, they are opting for the locked down laptop approach as an interim means of researcher access to born-digital material.
  • William Snow of Stanford University Libraries spoke to us about SALT, or the Self Archiving Legacy Toolkit. This does some very cool things using semantic technologies, though we would need to look at technologies that can be implemented locally (much of SALT's functionality is currently achieved using third-party web services). Stanford are looking to harness creators' knowledge of their own lives, relationships, and stuff, to add value to their personal archives using SALT. I think we might use it slightly differently, with curators (perhaps mediating creator use, or just processing?) and researchers being the most likely users. I really like the richness in the faceted browser (they are currently using Flamenco) - some possibilities for interfaces here. Their use of Freebase for authority control was also interesting; at the Bod, we use The National Register of Archives (NRA) for this and would be reluctant to change all our legacy finding aids and place our trust in such a new service! If the NRA could add some Freebase-like functionality, that would be nice. Some other clever stuff too, like term extraction and relationship graphs.
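
The overlap between PCs that Erika mentioned, where a creator has copied material from one machine to the next, is the kind of thing a checksum comparison can surface. What follows is only a rough sketch under my own assumptions: it is not the process used at Emory or here, the names `sha256_of` and `find_overlap` are invented for illustration, and it presumes that each machine's files have already been copied into an ordinary directory.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to spare memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_overlap(*roots: Path) -> dict[str, list[Path]]:
    """Group files under the given directories by content.

    Only digests shared by two or more files are returned. Hypothetical
    helper: byte-identical content counts as overlap, whatever the filename.
    """
    by_digest: defaultdict[str, list[Path]] = defaultdict(list)
    for root in roots:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                by_digest[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_overlap(Path("pc_one"), Path("pc_two"))`, with those two directory names standing in for wherever the copies live, returns a mapping from digest to the list of matching paths. A file that was edited after migration will not match its earlier copy, so this finds exact duplicates only.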

The day concluded with a little discussion, mainly about where digital forensics and legal discovery tools fit into digital archiving. My feeling is that they are useful for capture and exploration, but less so for the work needed around long-term preservation and access.

Thursday, 12 February 2009


It seems that one of the things we wrestle with when preserving old stuff for use in the future is the choice between what I guess is called "transforming content", the process by which a thing is made usable to a reader by (literally) transforming it into a different format (Word 5.5 for DOS, which you can download direct from Microsoft, to Word 2007, for example), and "preserving environments", which is where you make DOS, Word 5.5 for DOS and the document available to the reader and let them go back in time.

There are pros and cons to both and the best thing will be to do both. Some readers will want, for example, to experience the pain of using Word for DOS, others will care only for the content of the document and want to read it with their new personal computer (we have to spell it out now since the great PC/Mac debate - folks, a Mac IS a PC!).

Why am I saying all this? Mostly because I sit opposite a wall of shelves that will one day form a museum of old kit and those old machines have kept the subject on my mind for a bit. I have also been experimenting with virtual machines (for reasons beyond emulation) and emulators. Finally, I'm saying all this because Susan tells me this blog is the place to keep and share things that might be useful, and so I wanted to log that Apple make their old software available, including the OSes, and that Mini vMac and this Mac-On-A-Stick project look like they may one day be useful to us. (And if you're a Mac user, check out System 7 via Mac-On-A-Stick - it really isn't much different! :-))

Friday, 6 February 2009

Behold yesterday's snow family

The abundance of snow got me snapping yesterday. Everybody else too, from what I saw. Afterwards, many headed home and started downloading, editing and uploading (well, maybe they skipped the editing part). The web is laden with snow-related images, moving and still, from affected parts of the UK. Perhaps we'll be acquiring some of them in personal archives in a few years' time. Just for fun, here's one more. I'm particularly proud of the snow dog.

Wednesday, 4 February 2009

Academic Earth

Academic Earth presents 'thousands of video lectures from the world's top scholars'. So far, contributors are from top U.S. universities: Berkeley, Harvard, MIT, Princeton, Stanford and Yale. There is scope for expansion and the Academic Earth team are inviting new partners to contribute.

This is a great idea, but my main reason for linking to Academic Earth is that I rather like the interface. It feels very clean and it's easy to navigate.

Tuesday, 3 February 2009

KEEP project (FP7)

The latest round of FP7 projects in cultural heritage, digital libraries and preservation has started. The Bibliothèque nationale de France's KEEP project may be interesting - "KEEP addresses the problems of transferring digital objects stored on outdated computer media onto current devices through portable emulators for accurate rendering of both static and dynamic digital objects".