EnrichUK good practice guidebook

Digital Preservation

Acknowledgements

Written by Pete Johnston, Research Officer, UKOLN

Background To This Document

This section is based on an information paper written for the NOF-digitise programme, which gave projects funded by the programme advice and guidance on preserving their digital content. However, the advice in this section should still be appropriate for new projects.

What is digital preservation?

A basic definition of digital preservation is the maintenance of digital material over the long term with a view to ensuring continued accessibility. Digital material refers to any material processed by a computer and includes both resources that have been "digitised" (reformatted to digital) and those that are "born digital". "Long-term" in this context should be taken to mean long enough to be affected by changing technologies, which may mean timescales of decades or even centuries. There has been a great deal of confusion around the term "digital preservation", mainly because early projects (perhaps inadvertently) equated the process of digitisation with preservation. In general, digital preservation involves a number of organised tasks, associated with a variety of technical approaches or strategies, for ensuring that digital resources are not only stored appropriately but also adequately maintained and thus consistently useable over time. Day-to-day preservation is based around the management of archive copies of deposited data resources, i.e. copies that are independent of any online representation.

Technical strategies for ensuring digital materials are maintained and accessible over the long-term

Put simply, a digital preservation strategy is a particular technical approach to the preservation of digital materials. There are three main technical approaches to preserving digital materials: technology preservation, technology emulation and data migration. The first two focus on the technology itself: they rest on the premise that preserving the functionality of a digital resource requires preserving, in some form, the technical environment that created and used it. Data migration strategies instead focus on maintaining the digital files in a format that is accessible using "current technology". Under this strategy, files require regular migration from one technical environment to another, newer one. Each strategy is described briefly in turn below, but it is important to realise that there are variations even within each approach.

Technology preservation

If digital material relies on the technical environment used to create it in order to preserve the functionality and "look and feel" of the product, then the most obvious approach is to preserve the original technology. This is a "museum style" approach and is probably suitable only as a short-term solution. The hardware and software used with the object are maintained so that access can be guaranteed. However, this generally means that access is limited to a specific physical location (i.e. where the hardware and software are kept), and the cost and space needed to store this equipment are probably prohibitive on a large scale. Over a number of years the machines themselves will inevitably degrade, making this approach problematic as a long-term strategy. The Science Museum and the Computer Conservation Society in the UK are interested in the merits of this approach and are currently maintaining old computer systems that may prove valuable resources for scholars in the future. For a library, archive or publisher, the sheer space and resources needed to maintain old systems would in all likelihood make this approach impossible.

Technology emulation

Another approach based on the need to preserve the technological environment (and therefore original functionality) is emulation. Unlike the strategy described above, an emulation strategy seeks to preserve that environment not by keeping the original hardware and software but by using current technology to mimic the original environment. This might involve emulation of the original software or (more likely) emulation of the original hardware, in which case the original software and operating system are stored along with the digital object itself. Either way, the strategy relies on a detailed description of the original environment on which to base the emulation in future. The emulator itself is not necessarily stored in the archive (although it may be); it may be created at a later date when there is demand for the material. The detailed technical descriptions (metadata - see below) on which this strategy relies are therefore a key component of an emulation strategy. As yet, there are no standard approaches to descriptions of this kind.

Although the view is controversial, many experts are beginning to believe that emulation is the best solution for truly long-term preservation. It accepts the need to preserve the original technical environment, but it ensures that material is not held hostage to obsolete technology. Instead, emulation can take advantage of new technologies as they develop. Although we cannot predict exactly how future technologies will develop, we can reasonably expect some general trends: they will be more effective, cheaper and faster.

Data migration

Unlike the strategies above, data migration focuses on maintaining digital material in current formats. At present many libraries and publishers carry out regular migrations of image files, e.g. moving images from the format used by one software version to that of a newer version. The attraction of this strategy is that material is maintained in an accessible format. The two strategies above both advocate storing the material as a bytestream in its original format and then making it accessible when necessary, whereas data migration means the material is maintained in the archive in a currently useable format.

However, there are also significant disadvantages to this strategy:

It is important to stress that data migration as described here is more complex than what is often called data "refreshing". All preservation strategies must include regular data refreshing, which is the systematic transfer of stored material to newer and fresher media (e.g. from one magnetic tape to another). Refreshing does nothing to keep the material useable: it is only the transfer of a bytestream from one medium to another. Migration, by contrast, focuses on keeping the material functional with new technology.
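The difference between refreshing and migration can be made concrete with a small sketch. The Python below is illustrative only, assumes files on an ordinary filesystem, and uses hypothetical function names. Refreshing is modelled as a byte-for-byte copy whose SHA-256 checksum must match the original; migration is modelled by one simple case, re-encoding a text file from the legacy Windows-1252 (cp1252) character encoding to UTF-8. After migration the bytes, and therefore the checksum, change, so success has to be judged by whether the content is unchanged rather than by comparing bytestreams.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, new_medium: Path) -> bool:
    """Refreshing: copy the bytestream unchanged, then confirm it is identical."""
    shutil.copyfile(source, new_medium)
    return sha256_of(source) == sha256_of(new_medium)


def migrate_text(source: Path, target: Path, old_encoding: str = "cp1252") -> bool:
    """Migration (one simple, hypothetical case): re-encode legacy text as UTF-8.

    The bytes change, so checksums cannot be compared; instead confirm
    that the decoded text is the same before and after.
    """
    text = source.read_bytes().decode(old_encoding)
    target.write_bytes(text.encode("utf-8"))
    return target.read_bytes().decode("utf-8") == text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        original = folder / "catalogue.txt"
        original.write_bytes("Café records, 1890".encode("cp1252"))
        print(refresh(original, folder / "refreshed.txt"))      # True
        print(migrate_text(original, folder / "migrated.txt"))  # True
        # Migration changed the bytestream, so the checksums now differ.
        print(sha256_of(original) == sha256_of(folder / "migrated.txt"))  # False
```

Real format migrations (for example between image formats) are far more involved, but the same distinction holds: refreshing preserves bits, migration preserves meaning.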

Preservation metadata

The effective use of digital resources in an archive will rely on a robust system of resource description, for the purposes of resource discovery, managing access and ensuring preservation of the resources. Metadata research has continued to generate interest worldwide but, to date, most activity has focused on metadata for resource discovery. However, there is increasing awareness that effective digital archives will depend on the creation and storage of the descriptive information (metadata) required to support a chosen preservation strategy, whether migration, emulation or technology preservation. This information will need to describe the data in detail, including file format and software and hardware platforms. It may also contain information about rights management and access control. UKOLN has led a great deal of the work on preservation metadata for the Cedars project [1], and the first public draft of "Metadata for Digital Preservation: the Cedars Project Outline Specification" was released in March 2000. Well-kept digital preservation metadata is essential, and all NOF projects must comply with the published technical standards and guidance [2].
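As an illustration of the kind of information involved, a preservation metadata record might be modelled as below. This is a minimal Python sketch with hypothetical field names chosen to mirror the elements mentioned above (file format, software and hardware platforms, rights and access control); it is not the Cedars specification or any other published schema, which should be consulted for real projects.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical controlled list, taken from the three approaches described above.
STRATEGIES = {"migration", "emulation", "technology preservation"}


@dataclass
class PreservationRecord:
    """Illustrative preservation metadata record; field names are hypothetical."""
    identifier: str            # the resource's unique accession number
    file_format: str           # e.g. "TIFF 6.0"
    creating_software: str     # application (and version) used to create the file
    hardware_platform: str     # platform that software ran on
    strategy: str              # one of STRATEGIES
    rights: str = ""           # rights management statement
    access_conditions: str = ""                        # access control notes
    history: List[str] = field(default_factory=list)   # preservation actions taken

    def __post_init__(self) -> None:
        if self.strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy!r}")

    def to_json(self) -> str:
        """Serialise the record so it can be stored alongside the archive copy."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    record = PreservationRecord(
        identifier="ACC-000001",
        file_format="TIFF 6.0",
        creating_software="ExampleScan 2.1",  # hypothetical product name
        hardware_platform="x86 PC, Windows 2000",
        strategy="migration",
        rights="Copyright held by the depositing institution",
    )
    record.history.append("2004-12-02: accessioned and validated")
    print(record.to_json())
```

Keeping a history of actions alongside the technical description means a later migration or emulation effort can see what has already been done to the resource.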

A summary of the three main strategic approaches to digital preservation

Technology preservation strategy: preserve the individual software (and possibly hardware) used to create and access the information; this also involves preserving the original operating system and the hardware on which to run it

Technology emulation strategy: program future, more powerful systems to emulate older, obsolete computer platforms/operating systems as required

Digital information migration strategy: ensure that digital information is re-encoded in new formats before the old format becomes obsolete.

The choice of strategy must reflect fitness for purpose. Several technical factors will affect this choice: the basic data types employed in each category; the application programs used to create them; the structures applied to them; and the systems used to manage or distribute them before deposit.

Getting started

Maintaining access to archived digital resources over the long term involves interdependent strategies in the short/medium term based on:

A preservation strategy will be most effective if it takes into account the full life-cycle of the resource, allowing the greatest efficiencies across data creation, access and preservation.

Procedures to prepare data and documentation for storage and preservation

Unique numbering

Every data resource accessioned should be allocated a unique identifier. This number will identify the resource in the institution's catalogue and be used to locate or identify physical media and documentation. If a resource is de-accessioned for any reason, its unique number should not be reallocated.
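A minimal sketch of an accession register that honours the no-reallocation rule is shown below. The class name and the "ACC-000001" numbering pattern are hypothetical; the point is that de-accessioning marks a number as retired instead of returning it to a pool, so the counter only ever moves forward.

```python
from typing import Dict


class AccessionRegister:
    """Hypothetical register that issues accession numbers and never reuses them."""

    def __init__(self, prefix: str = "ACC") -> None:
        self.prefix = prefix
        self._next = 1                      # only ever increases
        self._status: Dict[str, str] = {}   # identifier -> "held" / "de-accessioned"

    def accession(self) -> str:
        """Allocate the next unused identifier to a newly accessioned resource."""
        identifier = f"{self.prefix}-{self._next:06d}"
        self._next += 1
        self._status[identifier] = "held"
        return identifier

    def deaccession(self, identifier: str) -> None:
        """Record a de-accession; the identifier stays retired, not freed."""
        if self._status.get(identifier) != "held":
            raise KeyError(f"{identifier} is not currently held")
        self._status[identifier] = "de-accessioned"

    def status(self, identifier: str) -> str:
        return self._status[identifier]


if __name__ == "__main__":
    register = AccessionRegister()
    first = register.accession()    # "ACC-000001"
    register.deaccession(first)
    second = register.accession()   # "ACC-000002", not a reuse of the first
    print(first, register.status(first), second)
```

In practice the register would live in the institution's catalogue system rather than in memory; the in-memory dictionary simply keeps the sketch self-contained.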

Preferred marking and labelling

At a minimum all physical media and hard-copy documentation should be marked with the unique number allocated to the resource, and any additional information required by the institution to easily identify content and formats.

Handling guidelines

From accessioning onwards, staff should follow guidelines that reflect best practice in storage and preservation handling for the different media involved.

Validation

The institution should carry out validation checks on the transfer media, on the content and structure of deposited data resources, and on any accompanying documentation.

Validation procedures may well need adapting in the light of the materials/resources available in the acquisitions/collections section - and some of these procedures will have to be undertaken manually.

Such checks may include:
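One check of this kind, on the content of deposited files, is to compare each file with a manifest of checksums supplied by the depositor. The Python sketch below assumes a flat folder of files and a manifest mapping file names to SHA-256 digests; both the manifest format and the function names are hypothetical. It reports files that are missing, altered, or delivered without being listed.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_transfer(folder: Path, manifest: Dict[str, str]) -> List[str]:
    """Check a flat folder of deposited files against a checksum manifest.

    Returns a list of problems; an empty list means every listed file is
    present and unaltered, and nothing unlisted was delivered.
    """
    problems: List[str] = []
    for name, expected in manifest.items():
        path = folder / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")
    delivered = {p.name for p in folder.iterdir() if p.is_file()}
    for extra in sorted(delivered - set(manifest)):
        problems.append(f"not in manifest: {extra}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "a.tif").write_bytes(b"scan one")
        (folder / "b.tif").write_bytes(b"scan two, damaged in transit")
        manifest = {
            "a.tif": hashlib.sha256(b"scan one").hexdigest(),
            "b.tif": hashlib.sha256(b"scan two").hexdigest(),
        }
        print(validate_transfer(folder, manifest))  # ['checksum mismatch: b.tif']
```

A non-empty report would then be followed up with the depositor; as noted above, some of that follow-up will have to be done manually.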

Reformatting file formats

Where the file formats used to transfer the resource are unsuitable for long-term preservation, the institution may reformat the resource into its preferred file formats. In addition to archive formats, versions in other formats suitable for delivery to users may also be produced from the original.

Reformatting storage media

Where the storage media used to transfer the resource are unsuitable for long-term preservation, the institution may reformat the resource onto its preferred media.

Copying

Multiple back-up copies of an item may be generated during accessioning as part of a storage and preservation policy and to enable disaster recovery procedures.
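Multiple copies only support disaster recovery if damaged copies can be detected. One common approach, sketched here in Python under the assumption that every copy of an item is reachable as an ordinary file, is to checksum each copy and treat any copy that disagrees with a strict majority as suspect. The function name and return shape are hypothetical.

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_copies(copies: List[Path]) -> Dict[str, List[Path]]:
    """Flag copies of one item that disagree with the majority checksum.

    If no checksum is shared by a strict majority, every copy is reported
    as suspect, because the correct version cannot be decided automatically.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    digests = {copy: sha256_of(copy) for copy in copies}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return {"good": [], "suspect": list(copies)}
    return {
        "good": [c for c in copies if digests[c] == winner],
        "suspect": [c for c in copies if digests[c] != winner],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        copies = [folder / f"copy{i}.tif" for i in range(3)]
        for copy in copies:
            copy.write_bytes(b"master image")
        copies[2].write_bytes(b"master imagf")  # simulate one damaged copy
        result = audit_copies(copies)
        print([p.name for p in result["suspect"]])  # ['copy2.tif']
```

With only two copies that disagree there is no majority, which is one argument for keeping at least three copies in separate locations.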

References

  1. Cedars Project,
    http://www.leeds.ac.uk/cedars/
  2. NOF-digitise Technical Standards,
    http://www.peoplesnetwork.gov.uk/content/technical.asp

Further Information

Digital Culture: Maximising the nation's investment
http://www.ukoln.ac.uk/services/elib/papers/other/jisc-npo-dig/

Dublin Core Metadata Initiative
http://dublincore.org/documents

AHDS Guide: Creating a viable data resource
http://www.ahds.ac.uk/viable.htm

Comments On This Document

This section will be used to provide notes on the section, including details of any changes.

2 Dec 2004
Document made available to MLA staff for comments.