Cedars Access Issues Working Group


Rethinking Preservation Description Information (PDI) for the Cedars project

Cedars Project Document AIW03

Michael Day
UKOLN: The UK Office for Library and Information Networking, University of Bath, Bath BA2 7AY, UK.
m.day@ukoln.ac.uk
http://www.ukoln.ac.uk/

Status: Draft

Last update: 20-May-1999

Created: 21-Mar-1999

Availability: Project

1. Introduction

Cedars Project Document AIW02 (Stone and Day 1999) was an attempt to define metadata elements for use in the Cedars project. The first draft of AIW02 was based on an analysis of the selected initiatives identified in Cedars Project Document AIW01 (Day 1998) and the proposed elements were intended to broadly conform with the reference model for an Open Archival Information System (OAIS) being developed by ISO (Reich and Sawyer 1999). This document (AIW03) attempts to refine and further develop the definitions of Preservation Description Information (PDI) metadata elements produced in AIW02.

2. PDI and Descriptive Information

One initial issue was defining the exact relationship between PDI and Descriptive Information. The OAIS document defines Descriptive Information as the "set of information, consisting primarily of Package Descriptions, which is provided to Data Management to support the finding of the preserved information by consumers" (Reich and Sawyer 1999, p. 9). PDI, on the other hand, is defined as "information necessary to adequately preserve the Content Information" and is characterised as Reference, Context, Provenance and Fixity Information (Reich and Sawyer 1999, p. 11). The same information (metadata) may - depending upon its function - be seen either as PDI or as Descriptive Information. So, for example, PDI Reference Information will usually contain titles or identifiers that could also be used for resource discovery purposes.

On the other hand, in the OAIS model PDI and Descriptive Information exist at different levels. PDI describes Content Information, while Descriptive Information describes a whole Information Package. In OAIS terms, each Archival Information Package (AIP) "is associated with a structured form of Descriptive Information called the Package description" (Reich and Sawyer 1999, p. 61).

The Package Description must contain at least one Associated Description that supplies data for a Retrieval Aid that allows authorized users to retrieve the Content Information and PDI described by the Package Description. This Retrieval Aid is generally part [of] the Archival Storage functional area and translates from the unique identifier assigned by the OAIS to identify the AIP into the set of operations and filenames needed to retrieve the AIP from the file management system used in Archival Storage and returns the Content Information and PDI for the requested AIP (ibid., pp. 61-62).

The Package Descriptions need to be used for both resource discovery (OAIS Finding Aids) and for Ordering Aids that allow users "to specify transformations to be applied to the AIPs prior to dissemination".

This implies that PDI and Package Descriptions need to be kept distinct.
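The separation argued for above can be pictured as a small data model. The following Python sketch is illustrative only: the class names, field names and the identifier "crid:0001" are assumptions made for this example and are not taken from the OAIS reference model or from any Cedars specification. It shows PDI travelling with the Content Information inside an AIP, while the Package Description is held apart and resolved through a Retrieval Aid.

```python
from dataclasses import dataclass, field


@dataclass
class PDI:
    """Preservation Description Information: describes the Content Information."""
    reference: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)
    fixity: dict = field(default_factory=dict)


@dataclass
class AIP:
    """Archival Information Package: Content Information plus its PDI."""
    identifier: str
    content_files: list
    pdi: PDI


@dataclass
class PackageDescription:
    """Descriptive Information about a whole AIP, kept separate from the PDI."""
    aip_identifier: str
    finding_aid_terms: dict = field(default_factory=dict)


class RetrievalAid:
    """Maps an archive-assigned identifier to the stored package (hypothetical)."""

    def __init__(self):
        self._store = {}

    def register(self, aip):
        self._store[aip.identifier] = aip

    def retrieve(self, description):
        # Resolve a Package Description to the Content Information and PDI.
        aip = self._store[description.aip_identifier]
        return aip.content_files, aip.pdi


# Usage with made-up values.
aid = RetrievalAid()
aid.register(AIP("crid:0001", ["milton.sgml"],
                 PDI(reference={"title": "The poetical works of John Milton"})))
description = PackageDescription("crid:0001", {"subject": "poetry"})
files, pdi = aid.retrieve(description)
```

Keeping the Package Description as its own object means that resource discovery data can be revised without touching the PDI stored with the content.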

3. Relationship with Dublin Core

Many of the elements identified as PDI or Descriptive Information within Cedars AIW02 bear a close resemblance to those that exist within other description schemas. Most descriptive bibliographic-type information could be handled using existing formats and tools, e.g. using TEI headers or MARC. It is important that Cedars does not spend too much time re-inventing wheels.

Many of the elements identified in AIW02 correspond to one of the fifteen elements defined by the Dublin Core (DC) initiative. Many of the issues currently being addressed by DC - which primarily originated in discussions of qualifiers and the RDF data model (Weibel 1999) - are also relevant in a Cedars context. For example, there have been debates over whether the elements that identify personal or corporate names (DC.Creator, DC.Contributor and DC.Publisher) should be amalgamated into one "Agent" element, so that the actual role taken would become a qualifier rather than an element. There has also been debate about the precise link between DC.Relation and DC.Source. At the present time, so-called DC-simple has been defined in RFC 2413 (Weibel et al. 1998), while details on how to encode DC in HTML are available as an Internet-Draft (Kunze 1999). Adopting - where possible - existing DC interpretations of core elements may be one way of simplifying what Cedars needs to define in its metadata schema.

4. Pre-existing metadata

[TBA]

5. Automatic metadata generation

[TBA]

6. Metadata elements

It is assumed that:

Additionally:

This more worked-out list of Cedars PDI elements is based on a broad framework developed by the National Archives of Australia in its Recordkeeping metadata standard for Commonwealth Agencies (1999).

1. Identifier

Definition

Any identifier assigned by the repository itself or by an external agency.

Reason

To assist in uniquely identifying a resource within a repository. Resource discovery.

OAIS

Reference Information (PDI)

Descriptive Information

Obligation

Mandatory

Use conditions

All digital objects in a repository should be given a unique and persistent identifier - this may be a Cedars Identifier (CRID) or other identifier (e.g. URN). Other relevant identifiers - e.g. those previously assigned to the resource - should also be added to aid resource discovery.

Use a separate Identifier element for each additional identifier.

Sub-Elements

Name | Obligation | Schemes
1.1 Identifier Type | Mandatory | ?

Notes

The Identifier element needs to be repeatable, so the syntax would need to reflect this. For example, the 1.1 Identifier Type could be embedded within the tag:

<identifier type="ISBN">0-582-05132-0</identifier>
<identifier type="URN">urn:foo:8573705137518375</identifier>
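A repeatable Identifier element of this kind could also be generated programmatically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the wrapper element name "reference-information" and the function name are hypothetical choices made for this example, while the "identifier" element and its "type" attribute follow the note above.

```python
import xml.etree.ElementTree as ET


def build_identifiers(pairs):
    """Return a wrapper element holding one <identifier> per (type, value) pair.

    The wrapper name 'reference-information' is a hypothetical choice for
    this sketch; it is not defined by Cedars.
    """
    wrapper = ET.Element("reference-information")
    for id_type, value in pairs:
        # Identifier Type is mandatory, so refuse an empty type.
        if not id_type:
            raise ValueError("identifier type is mandatory")
        element = ET.SubElement(wrapper, "identifier", {"type": id_type})
        element.text = value
    return wrapper


# One Identifier element per identifier, as the use conditions require.
identifiers = build_identifiers([
    ("ISBN", "0-582-05132-0"),
    ("URN", "urn:foo:8573705137518375"),
])
serialized = ET.tostring(identifiers, encoding="unicode")
print(serialized)
```

Carrying the type as an attribute keeps each identifier self-describing, so the element can be repeated without any positional convention.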

1.1 Identifier Type

Definition

The type of identifier used in 1.2 Identifier Value.

Reason

The identifier value may not give any clues to its origin.

Obligation

Mandatory.

Conditions

Identifier types should be selected from a list of assigned values which would need to be kept up to date.

Assigned values

Value name (examples) | Definition
CRID | Cedars Identifier -
URN | TBA
DOI | TBA
ISBN | TBA
ISSN | TBA
SICI | TBA

Default value

CRID?

Repeatable

Yes.

Assigned by

Identifiers assigned by the repository will need to be system provided (?) at ingest. Other (existing) identifiers would need to be

Schemes

?

Notes

-
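The condition that identifier types be drawn from a maintained list of assigned values could be enforced at ingest. The sketch below is a hedged illustration in Python: the set of values simply copies the examples listed above, the default follows the tentative "CRID?" entry, and the function name and the case-insensitive matching are assumptions of this example.

```python
# Example values copied from the "Assigned values" table; the real list
# would be maintained by the repository and is an assumption here.
ASSIGNED_IDENTIFIER_TYPES = {"CRID", "URN", "DOI", "ISBN", "ISSN", "SICI"}

# The draft marks this default as tentative ("CRID?").
DEFAULT_IDENTIFIER_TYPE = "CRID"


def normalise_identifier_type(id_type=None):
    """Return a valid identifier type, falling back to the default if absent."""
    if id_type is None or id_type.strip() == "":
        return DEFAULT_IDENTIFIER_TYPE
    # Matching is case-insensitive in this sketch (an assumption).
    candidate = id_type.strip().upper()
    if candidate not in ASSIGNED_IDENTIFIER_TYPES:
        raise ValueError(f"unknown identifier type: {id_type!r}")
    return candidate
```

For instance, normalise_identifier_type("isbn") returns "ISBN", and a call with no argument returns the default.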

2. Title Information

Definition

A name given to the Content Information

Reason

To assist in identifying a resource. Resource discovery.

OAIS

Reference Information (PDI)

Descriptive Information

Obligation

Mandatory

Use conditions

-

Sub-Elements

Name | Obligation | Schemes
Title | Mandatory | -
Alternative Title | Optional | -
Version or Edition Number | Optional | -

Notes

Example:

<title-information>

<title>The poetical works of John Milton</title>

<alternative-title>Milton's poetical works</alternative-title>

<edition>2nd ed.</edition>

</title-information>
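A record of this shape could be read back and checked against the obligations listed for the sub-elements. The following Python sketch, using the standard xml.etree.ElementTree module, is one possible approach; the function name and the returned dictionary layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

EXAMPLE = """
<title-information>
  <title>The poetical works of John Milton</title>
  <alternative-title>Milton's poetical works</alternative-title>
  <edition>2nd ed.</edition>
</title-information>
"""


def read_title_information(xml_text):
    """Parse a title-information record into lists of repeatable values."""
    root = ET.fromstring(xml_text.strip())
    titles = [e.text for e in root.findall("title")]
    # Title is mandatory; the other sub-elements are optional.
    if not titles:
        raise ValueError("at least one title is mandatory")
    return {
        "title": titles,
        "alternative-title": [e.text for e in root.findall("alternative-title")],
        "edition": [e.text for e in root.findall("edition")],
    }


record = read_title_information(EXAMPLE)
```

Each value is returned as a list because all three sub-elements are marked as repeatable.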

2.1 Title

Definition

The form of words used to name the Content Information

Reason

Resource discovery.

Obligation

Mandatory

Conditions

-

Assigned values

-

Default value

-

Repeatable

Yes

Assigned by

Manually entered by repository staff at ingest, possibly converted from existing metadata.

Schemes

Free text

Notes

-

2.2 Alternative Title

Definition

Any form of words used to name the Content Information that differs from 2.1 Title

Reason

Resource discovery.

Obligation

Optional

Conditions

-

Assigned values

-

Default value

-

Repeatable

Yes

Assigned by

Manually entered by repository staff at ingest, possibly converted from existing metadata.

Schemes

Free text

Notes

-

2.3 Version or Edition Number

Definition

The version or edition number of the Content Information where relevant.

Reason

Resource discovery.

Obligation

Optional

Conditions

-

Assigned values

-

Default value

-

Repeatable

Yes

Assigned by

Manually entered by repository staff at ingest, possibly converted from existing metadata.

Schemes

Free text

Notes

There may be a requirement for standard abbreviations, e.g.: "v. 1.01", "2nd ed.", etc.

3. Relation

Definition

A link between one item and another

Reason

To provide contextual information about a resource by documenting its relationships with other resources both inside the repository and outside.

OAIS

Context Information (PDI)

Obligation

Optional

Use conditions

Types of relationships include: levels of aggregation (contains, contained in), place in sequence, sources, etc.

Sub-Elements

Name | Obligation | Schemes
Related Item ID | Mandatory |
Relation Type | Mandatory |
Relation Description | Optional |

Notes

 

4. Subject

Definition

The subject or topic of a resource.

Reason

Discovery access point.

OAIS

Context Information (PDI)

Descriptive Information

Obligation

Optional

Use conditions

-

Sub-Elements

Name | Obligation | Schemes
Subject Scheme | Optional | Defined
Subject Term | Optional |
Keywords | Optional |

Notes

 

5. Publisher

Definition

"The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity" - Dublin Core (RFC 2413)

Reason

To record the person or organisation responsible for making the resource available. This is both a

OAIS

Provenance Information (PDI)

Descriptive Information

Obligation

Optional

Use conditions

-

Sub-Elements

Name | Obligation | Schemes
Publisher Name | Optional |
Publisher Contact Details | Optional |

Notes

-

6. Rights Management

Definition

Details of any publisher agreements, legislation, etc. that restrict access to resources or dictate particular restrictions with regard to preservation strategies.

Reason

To record any agreements made with rights holders, to help with management of preservation role and to help manage access and use of resource.

OAIS

Provenance Information (PDI)

Obligation

Mandatory

Use conditions

 

Sub-Elements

Name | Obligation | Schemes
6.1 Rights Owner (Agent) | Mandatory | -
6.2 Rights Agreement Description | Mandatory | -
6.3 Rights Agreement Date | Mandatory | ISO 8601
6.4 Copyright Statement | Optional | -
6.5 Rights Review Description | |
6.6 Rights Review Date | Mandatory | ISO 8601

Notes

The management of intellectual property rights will be an important feature of any digital archive. The elements identified here are given by way of illustration and are not intended to be definitive.

6.2 Rights Agreement Description

Definition

A brief textual description of any agreement or contract made with rights owners at the time of ingest into an archive.

Reason

To summarise rights agreements in a standardised way so that archive staff and archive users can quickly assess whether access (or any other archive operation) is permissible without recourse to the original legal documents.

Obligation

Optional

Conditions

?

Assigned values

Value name

Definition

     
     

Default value

Free text

Repeatable

 

Assigned by

Initially by repository staff at time of ingest, but will

Schemes

 

Notes

 

7. Context (Pre-ingest details)

Definition
 

Reason

 

OAIS

Provenance Information (PDI)

Obligation

 

Use conditions

 

Sub-Elements

Name

Obligation

Schemes

       
       
       

Notes

This element is concerned with the physical

8. Ingest history

Definition

Dates and description of the ingest process as managed by the repository.

Reason

To record the ingest process by which resources are received by ...

OAIS

Provenance Information (PDI)

Obligation

Mandatory

Use conditions

 

Sub-Elements

Name | Obligation | Schemes
Ingest Agent | |
Ingest Date | Mandatory | ISO 8601
Depositor | |
Deposit Agreement | |
Repository-name | |
Selection criteria | |
Metadata creator | |
Metadata creation date | | ISO 8601
Capture Procedure | |

Notes

 

9. Change history (Preservation actions)

Definition

Dates and descriptions of all preservation actions carried out after initial ingest and registration. This will include recording media refreshing and format migrations over time.

Reason

To provide a complete record of preservation actions carried out on an information object. To provide information for preservation management of an information object.

OAIS

Provenance Information (PDI)

Obligation

Optional

Use conditions

Need agent

Sub-Elements

Name | Obligation | Schemes
Action Agent | Mandatory |
Action Date | Mandatory | ISO 8601
Action Type | Mandatory |
Action Description | Optional |
Next Action Due | Optional | ISO 8601
Next Action | Optional |

Notes
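As an illustration of how the sub-elements above might be recorded and used for preservation management, the following Python sketch stores each action with ISO 8601 calendar dates and lists the actions that have fallen due. The class, field and function names, the agent name and the sample dates are all assumptions made for this example, and only the YYYY-MM-DD form of ISO 8601 is handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PreservationAction:
    agent: str                             # Action Agent (mandatory)
    action_date: str                       # Action Date, ISO 8601 (mandatory)
    action_type: str                       # Action Type (mandatory)
    description: str = ""                  # Action Description (optional)
    next_action_due: Optional[str] = None  # Next Action Due, ISO 8601 (optional)
    next_action: str = ""                  # Next Action (optional)

    def __post_init__(self):
        for name in ("agent", "action_date", "action_type"):
            if not getattr(self, name):
                raise ValueError(f"{name} is mandatory")
        # date.fromisoformat raises ValueError unless given YYYY-MM-DD.
        date.fromisoformat(self.action_date)
        if self.next_action_due is not None:
            date.fromisoformat(self.next_action_due)


def actions_due(history, today):
    """Return the actions whose Next Action Due date is on or before today."""
    return [
        action for action in history
        if action.next_action_due is not None
        and date.fromisoformat(action.next_action_due) <= today
    ]


# Usage with made-up values.
history = [
    PreservationAction("repository staff", "1999-05-20", "media refresh",
                       next_action_due="2000-05-20",
                       next_action="media refresh"),
    PreservationAction("repository staff", "1999-06-01", "format migration"),
]
due = actions_due(history, date(2000, 6, 1))
```

Recording the due date in a fixed, sortable format is what makes this kind of scheduled review straightforward.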

 

10. Use history

Definition

Dates and descriptions of the usage of a particular digital object.

Reason

To record all access and use of resources. To provide contextual information on how the record is or was used.

OAIS

Provenance Information (PDI)

Obligation

Optional

Use conditions

 

Sub-Elements

Name | Obligation | Schemes
Use Date | Optional | ISO 8601
Use Type | Optional |
Use Description | Optional | -

Notes

In archival record-keeping contexts, Use History could be the means of maintaining an important 'audit trail' of record usage - vital to protect records from unauthorized or illegal access and use (e.g. National Archives of Australia 1999). In a digital preservation context, however, there is less emphasis on this aspect. It may, however, provide useful information on how a digital repository is used and feed back into other management processes.

11. Authenticity Information

Definition

Metadata that will help prove the authenticity of an AIP, e.g. checksums, digital signatures, etc.

Reason

To provide basic authenticity data

OAIS

Fixity Information (PDI)

Obligation

Mandatory

Use conditions

-

Sub-Elements

Name | Obligation | Schemes
11.1 Checksum | Optional | MD5

Notes

I have only included an element relating to checksum information here. This is not meant to prevent other authentication elements being developed. This whole area is underdeveloped within the Cedars project and would probably benefit from some elaboration.

11.1 Checksum

Definition

The value of a checksum calculated according to the particular algorithm used by the digital archive.

Reason

Checksums might be used in a digital archive to help validate data integrity after storage, transfer or use.

It is supposed that these would be calculated by the digital archive according to a particular algorithm, e.g. MD5.

Obligation

Optional.

Conditions

This element must be used if 11.2 Checksum Value is present.

Assigned values

Value name (example) | Definition
MD5 | -

Default value

-

Repeatable

?

Assigned by

System generated.

Schemes

e.g. MD5

Notes

For more information on MD5 see Rivest, R., 1992, The MD5 Message-Digest Algorithm. RFC 1321, April.
http://sunsite.doc.ic.ac.uk/rfc/rfc1321.txt

It is likely that the algorithms used for ensuring authenticity and integrity will change over time and so default values and schemes will also change.

Periodic checking of AIPs in a digital archive may also require a 'Date Checksum Next Checked' element for management purposes.

Example:

<authenticity>
<checksum scheme="MD5">bf4fe6e5e5c0519cf82710ddf66f2481</checksum>
</authenticity>
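The calculation and later re-checking of such a checksum can be sketched in a few lines. The example below uses Python's standard hashlib and xml.etree.ElementTree modules; the function names are assumptions made for illustration, and MD5 is used only because it is the scheme named in this document. As noted above, the algorithm would be expected to change over time.

```python
import hashlib
import xml.etree.ElementTree as ET


def md5_checksum(data: bytes) -> str:
    """Return the MD5 digest of data as 32 lowercase hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


def authenticity_element(data: bytes) -> ET.Element:
    """Build an <authenticity> element carrying the checksum and its scheme."""
    authenticity = ET.Element("authenticity")
    checksum = ET.SubElement(authenticity, "checksum", {"scheme": "MD5"})
    checksum.text = md5_checksum(data)
    return authenticity


def verify(data: bytes, element: ET.Element) -> bool:
    """Recompute the checksum and compare it with the stored value."""
    stored = element.find("checksum")
    if stored is None or stored.get("scheme") != "MD5":
        raise ValueError("no MD5 checksum recorded")
    return md5_checksum(data) == stored.text


# Usage: "abc" is one of the test inputs listed in RFC 1321.
element = authenticity_element(b"abc")
print(ET.tostring(element, encoding="unicode"))
```

Storing the scheme alongside the value means that a verifier can tell which algorithm to re-run, which matters once more than one algorithm is in use.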

7. References

Day, M., 1998, Metadata for Preservation. Cedars project document AIW01. <URL:http://www.ukoln.ac.uk/metadata/cedars/AIW01.html>

Holdsworth, D., 1998, Proposed architecture for CEDARS demonstrator. Cedars project document PSW02. <URL:http://www.personal.leeds.ac.uk/~ecldh/cedars/architecture.html>

Kunze, J., 1999, Encoding Dublin Core metadata in HTML. Internet-Draft, 18 March. <URL:ftp://ftp.ietf.org/internet-drafts/draft-kunze-dchtml-00.txt>

National Archives of Australia, 1999, Recordkeeping metadata standard for Commonwealth Agencies. Pre-publication version, 14 April.

Reich, L. and Sawyer, D., eds., 1999, Reference Model for an Open Archival Information System (OAIS). Consultative Committee for Space Data Systems, White Book, Issue 5 (CCSDS 650.0-W-5.0). Washington, D.C.: CCSDS Secretariat, National Aeronautics and Space Administration. <URL:http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html>

Stone, A. and Day, M., 1999, Cedars preservation metadata elements. Cedars project document AIW02. <URL:http://users.ox.ac.uk/~cedars/Papers/AIW02.html>

Weibel, S., 1999, The state of the Dublin Core Metadata Initiative, April 1999. D-Lib Magazine, April. <URL:http://www.dlib.org/dlib/april99/04weibel.html>

Weibel, S., Kunze, J., Lagoze, C. and Wolf, M., 1998, Dublin Core metadata for resource discovery. RFC 2413, September. <URL:http://www.hensa.ac.uk/ftp/mirrors/ftp.isi.edu/in-notes/rfc2413.txt>


8. Acknowledgements

UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee (JISC) of the UK Higher Education Funding Councils, as well as by project funding from JISC's eLib Programme and the European Union. UKOLN also receives support from the University of Bath, where it is based.


9. Document History

Version | Date | Comments
Version 1 | 21-Mar-1999 | Draft for project only.
Version 2 | 20-May-1999 | Draft for project only.

Cedars is a CURL Project funded by the Joint Information Systems Committee through its Electronic Libraries Programme (eLib).


Created and maintained by: Michael Day of UKOLN: The UK Office for Library and Information Networking, University of Bath.