Michael Day
UKOLN: The UK Office for Library and Information Networking, University
of Bath, Bath BA2 7AY, UK.
m.day@ukoln.ac.uk
http://www.ukoln.ac.uk/
Status: Draft
Last update: 20-May-1999
Created: 21-Mar-1999
Availability: Project
Cedars Project Document AIW02 (Stone and Day 1999) was an attempt to define metadata elements for use in the Cedars project. The first draft of AIW02 was based on an analysis of the selected initiatives identified in Cedars Project Document AIW01 (Day 1998) and the proposed elements were intended to broadly conform with the reference model for an Open Archival Information System (OAIS) being developed by ISO (Reich and Sawyer 1999). This document (AIW03) attempts to refine and further develop the definitions of Preservation Description Information (PDI) metadata elements produced in AIW02.
One initial issue was defining the exact relationship between PDI and Descriptive Information. The OAIS document defines Descriptive Information as the "set of information, consisting primarily of Package Descriptions, which is provided to Data Management to support the finding of the preserved information by consumers" (Reich and Sawyer 1999, p. 9). PDI, on the other hand, is defined as "information necessary to adequately preserve the Content Information" and is characterised as Reference, Context, Provenance and Fixity Information (Reich and Sawyer 1999, p. 11). The same information (metadata) may - depending upon its function - be seen either as PDI or as Descriptive Information. So, for example, PDI Reference Information will usually contain titles or identifiers that could also be used for resource discovery purposes.
On the other hand, in the OAIS model PDI and Descriptive Information exist at different levels. PDI describes Content Information while Descriptive Information describes a whole Information Package. In OAIS terms, each Archival Information Package (AIP) "is associated with a structured form of Descriptive Information called the Package description" (Reich and Sawyer 1999, p. 61).
The Package Description must contain at least one Associated Description that supplies data for a Retrieval Aid that allows authorized users to retrieve the Content Information and PDI described by the Package Description. This Retrieval Aid is generally part [of] the Archival Storage functional area and translates from the unique identifier assigned by the OAIS to identify the AIP into the set of operations and filenames needed to retrieve the AIP from the file management system used in Archival Storage and returns the Content Information and PDI for the requested AIP (ibid., pp. 61-62).
The Package Descriptions need to be used for both resource discovery (OAIS Finding Aids) and for Ordering Aids that allow users "to specify transformations to be applied to the AIPs prior to dissemination".
This implies that PDI and Package Descriptions need to be kept distinct.
Many of the elements identified as PDI or Descriptive Information within Cedars AIW02 bear a close resemblance to those that exist within other description schemas. Most descriptive bibliographic-type information could be handled using existing formats and tools, e.g. using TEI headers or MARC. It is important that Cedars does not spend too much time re-inventing wheels.
Many of the elements identified in AIW02 correspond to one of the fifteen elements defined by the Dublin Core (DC) initiative. Many of the issues currently being addressed by DC - which primarily originated in discussions of qualifiers and the RDF data model (Weibel 1999) - are also relevant in a Cedars context. For example, there have been debates over whether elements that identify personal or corporate names (DC.Creator, DC.Contributor and DC.Publisher) should be amalgamated into one "Agent" element, whereby the actual role taken would become a qualifier rather than an element. There has also been debate about the precise link between DC.Relation and DC.Source. At the present time, so-called DC-simple has been defined in RFC 2413 (Weibel et al. 1998), while details on how to encode DC in HTML are available as an Internet-Draft (Kunze 1999). Adopting - where possible - existing DC interpretations of core elements may be one way of simplifying what Cedars needs to define in its metadata schema.
[TBA]
[TBA]
It is assumed that:
Additionally:
This more worked-out list of Cedars PDI elements is based on a broad framework developed by the National Archives of Australia in its Recordkeeping metadata standard for Commonwealth Agencies (1999).
Definition |
Any identifier assigned by the repository itself or by an external agency. |
||
|
Reason |
To assist in uniquely identifying a resource within a repository. Resource discovery. |
||
|
OAIS |
Reference Information (PDI); Descriptive Information |
||
|
Obligation |
Mandatory |
||
|
Use conditions |
All digital objects in a repository should be given a unique and persistent identifier - this may be a Cedars Identifier (CRID) or other identifier (e.g. URN). Other relevant identifiers - e.g. those previously assigned to the resource - should also be added to aid resource discovery. Use a separate Identifier element for each additional identifier. |
||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
1.1 Identifier Type |
Mandatory |
? |
|
|
Notes |
The Identifier element needs to be repeatable - so the syntax would need to reflect this. For example the 2.1 Identifier Type could be embedded within the tag :foo:8573705137518375</id> |
||
Definition |
The type of identifier used in 2.2 Identifier Value. |
|
|
Reason |
The identifier value may not give any clues to its origin. |
|
|
Obligation |
Mandatory. |
|
|
Conditions |
Identifier types should be selected from a list of assigned values which would need to be kept up to date. |
|
|
Assigned values |
Value name (examples) |
Definition |
|
CRID |
Cedars Identifier - |
|
|
URN |
TBA |
|
|
DOI |
TBA |
|
|
ISBN |
TBA |
|
|
ISSN |
TBA |
|
|
SICI |
TBA |
|
|
Default value |
CRID? |
|
|
Repeatable |
Yes. |
|
|
Assigned by |
Identifiers assigned by the repository will need to be system provided (?) at ingest. Other (existing) identifiers would need to be |
|
|
Schemes |
? |
|
|
Notes |
|
|
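The Identifier element and its Identifier Type sub-element, as tabulated above, could be modelled along the following lines. This is a minimal sketch only, not a Cedars implementation: the class and function names are hypothetical, the list of types simply copies the example values from the table, and the identifier values shown are invented placeholders.

```python
from dataclasses import dataclass

# Illustrative list of assigned Identifier Type values, copied from the
# examples in the table above. A real list would be maintained centrally
# and kept up to date by the repository.
IDENTIFIER_TYPES = {"CRID", "URN", "DOI", "ISBN", "ISSN", "SICI"}


@dataclass(frozen=True)
class Identifier:
    """One occurrence of the repeatable Identifier element."""

    id_type: str  # Identifier Type: mandatory, chosen from the assigned values
    value: str    # Identifier Value: the identifier string itself

    def __post_init__(self):
        # Reject types that are not on the assigned-values list.
        if self.id_type not in IDENTIFIER_TYPES:
            raise ValueError(f"unknown identifier type: {self.id_type!r}")
        if not self.value:
            raise ValueError("identifier value must not be empty")


def check_identifiers(identifiers):
    """Return True if the mandatory Identifier element is present at least once."""
    return len(identifiers) >= 1


# Hypothetical values, for illustration only.
ids = [
    Identifier("CRID", "example-crid-0001"),
    Identifier("ISBN", "0-000-00000-0"),
]
```

Keeping the type alongside each value means that a separate Identifier occurrence can be recorded for every additional identifier, as the use conditions suggest.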
Definition |
A name given to the Content Information |
||
|
Reason |
To assist in identifying a resource. Resource discovery. |
||
|
OAIS |
Reference Information (PDI); Descriptive Information |
||
|
Obligation |
Mandatory |
||
|
Use conditions |
- |
||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
Title |
Mandatory |
- |
|
|
Alternative Title |
Optional |
- |
|
|
Version or Edition Number |
Optional |
- |
|
|
Notes |
Example: |
||
Definition |
The form of words used to name the Content Information |
|
Reason |
Resource discovery. |
|
Obligation |
Mandatory |
|
Conditions |
- |
|
Assigned values |
- |
|
Default value |
- |
|
Repeatable |
Yes |
|
Assigned by |
Manually entered by repository staff at ingest, possibly converted from existing metadata. |
|
Schemes |
Free text |
|
Notes |
- |
Definition |
Any form of words used to name the Content Information that differs from 3.1 Title |
|
Reason |
Resource discovery. |
|
Obligation |
Optional |
|
Conditions |
- |
|
Assigned values |
- |
|
Default value |
- |
|
Repeatable |
Yes |
|
Assigned by |
Manually entered by repository staff at ingest, possibly converted from existing metadata. |
|
Schemes |
Free text |
|
Notes |
- |
Definition |
The version or edition number of the Content Information where relevant. |
|
Reason |
Resource discovery. |
|
Obligation |
Optional |
|
Conditions |
- |
|
Assigned values |
- |
|
Default value |
- |
|
Repeatable |
Yes |
|
Assigned by |
Manually entered by repository staff at ingest, possibly converted from existing metadata. |
|
Schemes |
Free text |
|
Notes |
There may be a requirement for standard abbreviations, e.g.: "v. 1.01", "2nd ed.", etc. |
Definition |
A link between one item and another |
||
|
Reason |
To provide contextual information about a resource by documenting its relationships with other resources both inside the repository and outside. |
||
|
OAIS |
Context Information (PDI) |
||
|
Obligation |
Optional |
||
|
Use conditions |
Types of relationships include: levels of aggregation (contains, contained in), place in sequence, sources, etc. |
||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
Related Item ID |
Mandatory |
||
|
Relation Type |
Mandatory |
||
|
Relation Description |
Optional |
||
|
Notes |
|||
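The Relation element above might be represented as a typed link from one item to another. The sketch below is illustrative only: the relation type names are assumptions loosely based on the use conditions (levels of aggregation, place in sequence, sources), since no controlled list is defined here, and the identifiers are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical controlled list of Relation Type values; the document
# itself does not define one.
RELATION_TYPES = {"contains", "contained in", "precedes", "follows", "source"}


@dataclass(frozen=True)
class Relation:
    """One occurrence of the Relation element."""

    related_item_id: str                # Related Item ID (mandatory)
    relation_type: str                  # Relation Type (mandatory)
    description: Optional[str] = None   # Relation Description (optional)

    def __post_init__(self):
        if not self.related_item_id:
            raise ValueError("Related Item ID is mandatory")
        if self.relation_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.relation_type!r}")


def related_items(relations, relation_type):
    """Return the IDs of all items linked by the given relation type."""
    return [r.related_item_id for r in relations if r.relation_type == relation_type]


# Hypothetical relations for a single resource.
relations = [
    Relation("example-crid-0002", "contains"),
    Relation("example-crid-0003", "contains", "Second file of the package"),
    Relation("example-crid-0000", "contained in"),
]
```

A call such as `related_items(relations, "contains")` would then list the items aggregated under the resource.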
Definition |
The subject or topic of a resource. |
||
|
Reason |
Discovery access point. |
||
|
OAIS |
Context Information (PDI); Descriptive Information |
||
|
Obligation |
Optional |
||
|
Use conditions |
- |
||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
Subject Scheme |
Optional |
Defined |
|
|
Subject Term |
Optional |
||
|
Keywords |
Optional |
||
|
Notes |
|||
Definition |
"The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity" - Dublin Core (RFC 2413) |
||
|
Reason |
To record the person or organisation responsible for making the resource available. This is both a |
||
|
OAIS |
Provenance Information (PDI); Descriptive Information |
||
|
Obligation |
Optional |
||
|
Use conditions |
- |
||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
Publisher Name |
Optional |
||
|
Publisher Contact Details |
Optional |
||
|
Notes |
- |
||
Definition |
Details of any publisher agreements, legislation, etc. that restrict access to resources or dictate particular restrictions with regard to preservation strategies. |
||
|
Reason |
To record any agreements made with rights holders, to help with management of preservation role and to help manage access and use of resource. |
||
|
OAIS |
Provenance Information (PDI) |
||
|
Obligation |
Mandatory |
||
|
Use conditions |
|||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
7.1 Rights Owner (Agent) |
Mandatory |
- |
|
|
7.2 Rights Agreement Description |
Mandatory |
- |
|
|
7.3 Rights Agreement Date |
Mandatory |
ISO 8601 |
|
|
7.4 Copyright Statement |
Optional |
- |
|
|
7.5 Rights Review Description |
|||
|
7.6 Rights Review Date |
Mandatory |
ISO 8601 |
|
|
Notes |
The management of intellectual property rights will be an important feature of any digital archive. The elements identified here are given by way of illustration and are not intended to be definitive. |
||
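One way to picture how the obligations and schemes in the Rights Management table might be checked at ingest is sketched below. It is an assumption-laden illustration rather than a definitive implementation: the field names are hypothetical, the obligations are those given in the sub-element table above, and only full ISO 8601 calendar dates (YYYY-MM-DD) are tested, although ISO 8601 permits other forms.

```python
from datetime import date

# Sub-elements marked Mandatory in the table above (field names are hypothetical).
MANDATORY = (
    "rights_owner",
    "rights_agreement_description",
    "rights_agreement_date",
    "rights_review_date",
)

# Sub-elements whose scheme is given as ISO 8601.
DATE_FIELDS = ("rights_agreement_date", "rights_review_date")


def validate_rights(record):
    """Return a list of problems found in a Rights Management record (empty if none)."""
    problems = []
    for name in MANDATORY:
        if not record.get(name):
            problems.append(f"missing mandatory sub-element: {name}")
    for name in DATE_FIELDS:
        value = record.get(name)
        if value:
            try:
                # Checks full calendar dates only, e.g. 1999-05-20.
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"not an ISO 8601 calendar date: {name}")
    return problems


# Hypothetical record, for illustration only.
record = {
    "rights_owner": "Example Publisher Ltd",
    "rights_agreement_description": "Access restricted to registered users.",
    "rights_agreement_date": "1999-03-21",
    "rights_review_date": "2000-03-21",
}
```

Returning a list of problems, rather than stopping at the first one, lets repository staff correct a record in a single pass.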
Definition |
A brief textual description of any agreement or contract made with rights owners at the time of ingest into an archive. |
|
|
Reason |
To summarise rights agreements in a standardised way so that archive staff and archive users can quickly assess whether access (or any other archive operation) is permissible without recourse to the original legal documents. |
|
|
Obligation |
Optional |
|
|
Conditions |
? |
|
|
Assigned values |
Value name |
Definition |
|
Default value |
Free text |
|
|
Repeatable |
||
|
Assigned by |
Initially by repository staff at time of ingest, but will |
|
|
Schemes |
||
|
Notes |
||
Definition |
|||
|
Reason |
|||
|
OAIS |
Provenance Information (PDI) |
||
|
Obligation |
|||
|
Use conditions |
|||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
Notes |
This element is concerned with the physical |
||
Definition |
Dates and description of the ingest process as managed by the repository. |
||
|
Reason |
To record the ingest process by which resources are received by ... |
||
|
OAIS |
Provenance Information (PDI) |
||
|
Obligation |
Mandatory |
||
|
Use conditions |
|||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
Ingest Agent |
|||
|
Ingest Date |
Mandatory |
ISO 8601 |
|
|
Depositor |
|||
|
Deposit Agreement |
|||
|
Repository-name |
|||
|
Selection criteria |
|||
|
Metadata creator |
|||
|
Metadata creation date |
ISO 8601 |
||
|
Capture Procedure |
|||
|
Notes |
|||
Definition |
Dates and descriptions of all preservation actions carried out after initial ingest and registration. This will include recording media refreshing and format migrations over time. |
||
|
Reason |
To provide a complete record of preservation actions carried out on an information object. To provide information for preservation management of an information object. |
||
|
OAIS |
Provenance Information (PDI) |
||
|
Obligation |
Optional |
||
|
Use conditions |
Need agent |
||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
Action Agent |
Mandatory |
||
|
Action Date |
Mandatory |
ISO 8601 |
|
|
Action Type |
Mandatory |
||
|
Action Description |
Optional |
||
|
Next Action Due |
Optional |
ISO 8601 |
|
|
Next Action |
Optional |
||
|
Notes |
|||
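The Preservation Action History sub-elements above suggest a simple log of actions, with the Next Action Due date available for preservation management. The following is a minimal sketch under that reading; the class, function and example values are hypothetical and are not drawn from any Cedars system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PreservationAction:
    """One entry in the Preservation Action History of an information object."""

    agent: str                             # Action Agent (mandatory)
    action_date: date                      # Action Date (mandatory, ISO 8601)
    action_type: str                       # Action Type (mandatory)
    description: Optional[str] = None      # Action Description (optional)
    next_action_due: Optional[date] = None  # Next Action Due (optional, ISO 8601)
    next_action: Optional[str] = None      # Next Action (optional)


def actions_due(history, today):
    """Return entries whose Next Action Due falls on or before `today`, earliest first."""
    due = [
        a for a in history
        if a.next_action_due is not None and a.next_action_due <= today
    ]
    return sorted(due, key=lambda a: a.next_action_due)


# Hypothetical history, for illustration only.
history = [
    PreservationAction(
        agent="Repository staff",
        action_date=date(1999, 3, 21),
        action_type="media refresh",
        next_action_due=date(2000, 3, 21),
        next_action="media refresh",
    ),
    PreservationAction(
        agent="Repository staff",
        action_date=date(1999, 5, 20),
        action_type="format migration",
    ),
]
```

Calling `actions_due(history, date(2000, 6, 1))` would return the first entry only, since the second has no Next Action Due recorded.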
Definition |
Dates and descriptions of the usage of a particular digital object. |
||
|
Reason |
To record all access and use of resources. To provide contextual information on how the record is or was used. |
||
|
OAIS |
Provenance Information (PDI) |
||
|
Obligation |
Optional |
||
|
Use conditions |
|||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
Use Date |
Optional |
ISO 8601 |
|
|
Use Type |
Optional |
||
|
Use Description |
Optional |
- |
|
|
Notes |
In archival record-keeping contexts, Use History could be the means of maintaining an important 'audit trail' of record usage - vital to protect records from unauthorized or illegal access and use (e.g. National Archives of Australia 1999). In a digital preservation context, however, there is less emphasis on this aspect. It may, however, provide useful information on how a digital repository is used and feed back into other management processes. |
||
Definition |
Metadata that will help prove the authenticity of an AIP, e.g. checksums, digital signatures, etc. |
||
|
Reason |
To provide basic authenticity data |
||
|
OAIS |
Fixity Information (PDI) |
||
|
Obligation |
Mandatory |
||
|
Use conditions |
- |
||
|
Sub-Elements |
Name |
Obligation |
Schemes |
|
14.1 Checksum |
Optional |
MD5 |
|
|
Notes |
I have only included an element relating to checksum information here. This is not meant to prevent other authentication elements being developed. This whole area is underdeveloped within the Cedars project and would probably benefit from some elaboration. |
||
Definition |
The value of a checksum calculated according to the particular algorithm used by the digital archive. |
|
|
Reason |
Checksums might be used in a digital archive to help validate data integrity after storage, transfer or use. It is supposed that these would be calculated by the digital archive according to a particular algorithm, e.g. MD5. |
|
|
Obligation |
Optional. |
|
|
Conditions |
This element must be used if 14.2 Checksum Value is present. |
|
|
Assigned values |
Value name (example) |
Definition |
|
MD5 |
- |
|
|
Default value |
- |
|
|
Repeatable |
? |
|
|
Assigned by |
System generated. |
|
|
Schemes |
e.g. MD5 |
|
|
Notes |
For more information on MD5 see Rivest, R., 1992, The MD5 Message-Digest Algorithm. RFC 1321, April. http://sunsite.doc.ic.ac.uk/rfc/rfc1321.txt
It is likely that the algorithms used for ensuring authenticity and integrity will change over time and so default values and schemes will also change. Periodic checking of AIPs in a digital archive may also require a 'Date Checksum Next Checked' element for management purposes. Example: |
|
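As an illustration of how a system-generated checksum might be produced and later re-checked, the sketch below uses Python's standard `hashlib` module. It is offered as an assumption about how such a check could work, not as the Cedars approach; the function names are hypothetical, and the algorithm is passed as a parameter because the algorithms in use are expected to change over time.

```python
import hashlib


def checksum(data, algorithm="md5"):
    """Return the hexadecimal digest of `data` (bytes) for the named algorithm."""
    digest = hashlib.new(algorithm)
    digest.update(data)
    return digest.hexdigest()


def file_checksum(path, algorithm="md5", chunk_size=65536):
    """Compute the digest of a file in chunks, so large objects need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(data, recorded_value, algorithm="md5"):
    """Return True if `data` still matches the checksum value recorded at ingest."""
    return checksum(data, algorithm) == recorded_value.lower()


# The MD5 digest of the bytes b"abc", one of the test cases listed in RFC 1321.
recorded = checksum(b"abc")
```

A periodic integrity check would recompute the value for each stored object and compare it with the recorded one, for example `verify(b"abc", recorded)`.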
Day, M., 1998, Metadata for Preservation. Cedars project document AIW01. <URL:http://www.ukoln.ac.uk/metadata/cedars/AIW01.html>
Holdsworth, D., 1998, Proposed architecture for CEDARS demonstrator. Cedars project document PSW02. <URL:http://www.personal.leeds.ac.uk/~ecldh/cedars/architecture.html>
Kunze, J., 1999, Encoding Dublin Core metadata in HTML. Internet-Draft, 18 March. <URL:ftp://ftp.ietf.org/internet-drafts/draft-kunze-dchtml-00.txt>
National Archives of Australia, 1999, Recordkeeping metadata standard for Commonwealth Agencies. Pre-publication version, 14 April.
Reich, L. and Sawyer, D., eds., 1999, Reference Model for an Open Archival Information System (OAIS). Consultative Committee for Space Data Systems, White Book, Issue 5 (CCSDS 650.0-W-5.0). Washington, D.C.: CCSDS Secretariat, National Aeronautics and Space Administration. <URL:http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html>
Stone, A. and Day, M., 1999, Cedars preservation metadata elements. Cedars project document AIW02. <URL:http://users.ox.ac.uk/~cedars/Papers/AIW02.html>
Weibel, S., 1999, The state of the Dublin Core Metadata Initiative, April 1999. D-Lib Magazine, April. <URL:http://www.dlib.org/dlib/april99/04weibel.html>
Weibel, S., Kunze, J., Lagoze, C. and Wolf, M., 1998, Dublin Core metadata for resource discovery. RFC 2413, September. <URL:http://www.hensa.ac.uk/ftp/mirrors/ftp.isi.edu/in-notes/rfc2413.txt>
UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee (JISC) of the UK Higher Education Funding Councils, as well as by project funding from JISC's eLib Programme and the European Union. UKOLN also receives support from the University of Bath, where it is based.
Version | Date | Comments
Version 1 | 21-Mar-1999 | Draft for project only.
Version 2 | 20-May-1999 | Draft for project only.
Cedars is a CURL Project funded by the Joint Information Systems Committee through its Electronic Libraries Programme (eLib).
Created and maintained by: Michael Day of UKOLN: The UK Office for Library and Information Networking, University of Bath.
Page created: 21-Mar-1999.
Last updated: 20-May-1999.