Collection Description Focus, Guidance Paper 1 UKOLN

Creating reusable collection-level descriptions



1. The problem

A collection-level description provides an overview of an aggregation of items, and such an overview may be useful in a wide range of contexts. Resources such as directories and guides of various forms often incorporate sets of collection-level descriptions, perhaps gathered together on the basis of the geographical location of the collections or the subject material of the items in the collections. Collection-level description may be used to inform decision-making by resource managers as well as to enable potential users to discover collections of interest.

So, collection-level descriptions may be created to support various functions and operations. Even within one functional area, however, it may be desirable to reuse the content of a description by presenting it in different forms, perhaps via different channels or in different media (e.g. the same descriptive content may be presented through a Web site and as part of a printed directory).

Furthermore, current trends highlight a separation in digital information architectures between the functions of content provision and presentation. Users seek access to resources through channels that are convenient for their activities. Like other types of metadata record, collection-level descriptions may be created, maintained and made available by one party, but presented to users by many different services, including services developed by other parties. The "scope" of such a service may be considerably broader than that of a service provided by the creator of the description. For example, collection-level description records may be deployed within the context of an institutional directory but may also be used in databases which are regional, national or international in their coverage.

The administrator or owner of a collection should not be required to re-describe their holdings each time a collection-level description is required. The requirement for "reusability" is, then, an important consideration in the creation of collection-level description records. Sometimes the context of reuse may be under the control of the creator, but increasingly it may not. If collection-level description records are to be reused effectively in different contexts, and the meanings intended by their creators are to be preserved in these new contexts, then some minimum requirements should be taken into consideration.

2. How to make your collection-level descriptions reusable

2.1. Separate presentation from structure and content

Collection-level descriptions will be delivered to users through a range of channels and media, and the requirements for presentation will vary accordingly. The user functionality available through a Web browser is different from that of reading a printed page, for example, and the presentation of information in different media reflects that. The information content of a record may be presented in different combinations in different channels and media, or it may be presented selectively, and it may be appropriate to vary the sequence of presentation of its component parts or to alter its labelling. Even within a single channel or medium, presentation may be varied to meet the requirements of different classes of user - for example, to enhance accessibility for visually-impaired readers or to address other problems for users with disabilities [1].

Where a record is presented through a separate service, control of presentation is largely delegated to the managers of that service. In such a service, some of the contextual information (e.g. headers and footers of a Web page, the juxtaposition with other information) may be quite different from that of the original context of delivery where presentation was under the control of the creator.

Recommendation: Ensure that, as far as possible, you do not rely on presentational techniques in order for the content of your collection-level descriptions to retain its meaning.
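As a minimal sketch of this separation, the following Python fragment holds the content of one description in a single structure and leaves selection, ordering and labelling of fields to each presentation. The record, its field names and the labels are hypothetical and are not taken from any schema.

```python
from html import escape

# Hypothetical record: the field names are illustrative, not from any schema.
record = {
    "title": "Smith & Jones Papers",
    "description": "Correspondence and diaries, 1850-1900.",
    "location": "University Library, Special Collections",
}

# Labels belong to the presentation, not to the content.
LABELS = {"title": "Title", "description": "Description", "location": "Held at"}


def render_html(rec, fields):
    """Render the selected fields as an HTML definition list."""
    rows = [
        f"<dt>{escape(LABELS[name])}</dt><dd>{escape(rec[name])}</dd>"
        for name in fields
    ]
    return "<dl>" + "".join(rows) + "</dl>"


def render_text(rec, fields):
    """Render the selected fields as plain text, e.g. for a printed directory."""
    return "\n".join(f"{LABELS[name]}: {rec[name]}" for name in fields)
```

Each renderer chooses which fields to show and in what order, so a separate service could supply its own renderer without any change to the record itself.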

2.2. Create sufficient information

It is easier to generate a record that is structurally simpler or more limited in its level of detail from a richer, more detailed record than vice versa. For collection-level description, this may be a consideration in two areas:

  • Specifying the "structural" granularity of each descriptive record. If you may wish to manipulate a unit of data separately (e.g. for indexing or searching, or for flexibility of presentation), then structure your descriptive data so that the unit is held as a distinct field. For example, you may find it useful to separate out the start and end dates of date ranges, or the component parts of personal names. It is easier for a software tool to concatenate separate fields into a single field than it is to attempt to deduce the implicit structure of a single field. Remember that some metadata schemas for collection-level description are (intentionally) simple in structure, and you may wish to store and manage your descriptive data at a finer level of detail.
  • Deciding the level of aggregation for which to create collection-level descriptions. The principle of "functional granularity" provides for the creation of descriptions for whatever level of aggregate it is useful to describe. If records are created for both a collection and its significant "sub-collections", then it is possible to choose between presenting only the "super-collection" record (while filtering out the more detailed "sub-collection" records), or presenting the more detailed "hierarchical" view. If the sub-collection records are not created, then only the high-level view is available.

However, creating collection-level descriptions, like other classes of metadata, is potentially time-consuming and costly. The possible flexibility gained from having richer data must always be balanced against the additional effort required to create it. Whether a detailed hierarchical view of collections and sub-collections is required or whether the single record view is sufficient depends on the context of use.

Recommendation: Establish what is an appropriate granularity of description, both in terms of the structural granularity within an individual record and in deciding what should be described as a 'collection'. Consider creating collection-level description metadata at a finer level of granularity, within the constraints of what is feasible.
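The point about structural granularity can be illustrated with a short Python sketch. It assumes hypothetical PersonalName and DateRange types, neither of which comes from a published schema, and shows that display strings are easy to build from separate fields, while searching by year uses the fields directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonalName:
    family: str
    given: str

    def display(self) -> str:
        # Natural order, e.g. for a Web page.
        return f"{self.given} {self.family}"

    def index_form(self) -> str:
        # Inverted order, e.g. for an alphabetical index.
        return f"{self.family}, {self.given}"


@dataclass
class DateRange:
    start_year: int
    end_year: Optional[int] = None  # None marks an open-ended range

    def display(self) -> str:
        end = "" if self.end_year is None else str(self.end_year)
        return f"{self.start_year}-{end}"

    def overlaps(self, first: int, last: int) -> bool:
        # True if the range shares at least one year with first..last.
        ends_late_enough = self.end_year is None or self.end_year >= first
        return self.start_year <= last and ends_late_enough
```

Going the other way, from a single string such as "Ann Smith, 1850-1900" to its parts, would require guessing at conventions that the string itself does not record.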

2.3. Use shared semantics provided by standard schemas

If collection-level description records are to be exchanged effectively, then the recipient of the record (e.g. a remote service) and the provider of the record must have a shared understanding of its meaning. Such shared understanding is made possible by consensus on meanings that have been made explicit and recorded in standard schemas. Further, it is important not only to use those shared semantics but to declare in each "instance" record that you are doing so.

Recommendation: Make your collection-level descriptions available using the semantics of commonly understood metadata schemas for collection-level description, and declare clearly which schema is in use.
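One common way of declaring the schema in each instance record is an XML namespace declaration. The sketch below uses Python's standard xml.etree.ElementTree module and the Dublin Core element set namespace, chosen here only because its URI is widely known; a schema designed for collection-level description would be declared in the same way. The wrapper element name "collection" and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_xml(title: str, description: str) -> str:
    """Serialise a description with its schema declared via a namespace."""
    # "collection" is a hypothetical wrapper element for this sketch.
    root = ET.Element("collection")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    ET.SubElement(root, f"{{{DC_NS}}}description").text = description
    return ET.tostring(root, encoding="unicode")
```

A recipient that parses the output can tell from the namespace URI, rather than from the element names alone, which definition of "title" and "description" the provider intended.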

2.4. Use appropriate content standards for descriptive elements

The use of a collection-level description record to support effective resource discovery depends on the consistent application of standards for the construction of elements such as names, subjects and dates. This becomes more significant where resource discovery services are operating across collection-level descriptions and item-level descriptions where content standards such as AACR2 have been applied. As with semantic schemas, the use of content and terminological standards should be declared so that they are unambiguously identifiable to a recipient of the record. It may be the case that the use of a particular content standard is implicit from the choice of semantic schema, but wherever ambiguity is possible the use of the standard should be declared in the record.

Recommendation: Adopt commonly understood content standards and/or terminological control when making your collection-level descriptions available, and include in the metadata record an indication of the standard in use.
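A simple way to carry such a declaration is to store each controlled value together with the name of the standard it follows. The Python sketch below assumes a hypothetical QualifiedValue type; the scheme labels are plain strings chosen for illustration, and a real schema may prescribe its own way of naming encoding schemes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QualifiedValue:
    value: str
    scheme: str  # name of the content standard or vocabulary the value follows


def iso_date(d: date) -> QualifiedValue:
    # date.isoformat() yields YYYY-MM-DD, an ISO 8601 calendar date.
    return QualifiedValue(d.isoformat(), "ISO 8601")


def subject(term: str, vocabulary: str) -> QualifiedValue:
    # The vocabulary name travels with the term rather than being implied.
    return QualifiedValue(term, vocabulary)
```

For example, subject("Railroads", "LCSH") records both the term and the vocabulary from which it was taken, so a recipient need not guess which subject scheme applies.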

2.5. Make your collection-level description records available in machine-readable form

You will probably be creating your collection-level descriptions with the aim of providing access to them in the form of HTML documents through your own Web site (or that of another service provider). However, you should also make them available in forms that other software tools can use.

The precise requirements for this may vary depending on the interfaces supported by the services with which descriptions are to be shared. Requirements may include:

  • exporting or exposing your descriptions as one or more XML documents conforming to a specified XML DTD or XML Schema
  • supporting distributed searching of your descriptions by making them available via a Z39.50 target [2]
  • supporting harvesting of your descriptions by making them available via the Open Archives Initiative Protocol for Metadata Harvesting [3]
  • supporting applications based on messaging using the Simple Object Access Protocol (SOAP) [4]

As an illustration of what may be required, it may be useful to consult the guidelines on the technical steps which content providers are recommended to take to interoperate with the JISC Information Environment [5]. It must be emphasised that this is provided as an example only: other services may well have different requirements, but this document gives an indication of what may be needed.

Recommendation: Establish the requirements for providing "structured" machine-oriented access to your descriptions.
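To give a sense of what machine-oriented access involves from the point of view of a remote service, the following Python sketch builds the request that a harvester would send to a repository supporting the Open Archives Initiative Protocol for Metadata Harvesting. The verb "ListRecords" and the "oai_dc" metadata prefix are defined by the protocol; the base URL in the usage note and the function name are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # "set" is an optional argument for selective harvesting.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"
```

Calling list_records_url("http://example.org/oai") produces a URL ending in "?verb=ListRecords&metadataPrefix=oai_dc". The provider's task is to answer such requests with XML records; the sketch covers only the requesting side.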

2.6. Provide information about your collection-level descriptions

When a collection-level description (CLD) record is presented in a context you control (e.g. via your own Web site), a user may derive additional information from that context, such as information about the source of the description, which influences how the user evaluates the descriptive content. If your records are to be used in services over which you have limited control, you may find that you need to provide information about your collection-level descriptions as information resources in their own right (as opposed to the information about the collections contained in the CLD). This may be a subset of what is sometimes called "administrative metadata" and might include:

  • Provenance data: information about the creation of the content of the CLD, so that an end user can answer the questions, "Who said this about this collection? When was this information last updated?"; it might include a pointer to a logo for purposes of "branding" your description when it is displayed. Where verifying the provenance of a CLD is of particular importance, the use of digital signatures might be considered.
  • Rights data: information on how the CLD (rather than the collection itself) may be used. There are a number of 'rights expression languages' which permit the creation of machine-readable statements about the rights and conditions associated with digital resources - in this case, with the collection-level description itself.
  • Audience data: information on who the CLD is intended for.

Recommendation: Provide metadata about your descriptions.
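The three kinds of information listed above could be held alongside, but separate from, the descriptive content. The Python sketch below is one possible arrangement; the type names and field names are hypothetical and do not correspond to any agreed administrative metadata schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AdminMetadata:
    created_by: str      # provenance: who said this about the collection
    last_updated: date   # provenance: when the description was last updated
    rights: str          # how the description itself may be used
    audience: str        # who the description is intended for
    logo_url: str = ""   # optional pointer to a logo for "branding"


@dataclass
class DescriptionRecord:
    content: dict        # the description of the collection
    admin: AdminMetadata  # information about the description


def provenance_line(rec: DescriptionRecord) -> str:
    """Build a line that a third-party service could show beside the record."""
    updated = rec.admin.last_updated.isoformat()
    return f"Source: {rec.admin.created_by} (last updated {updated})"
```

Because the administrative data travels with the record, a service outside the creator's control can still answer the user's questions about who made the description and when.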

3. Some issues

3.1. Content and audience

Some parts of a collection-level description are typically "free text" descriptions of characteristics of a collection. Such text should be neutral and concise, and its creators should bear in mind that it may appear in different contexts. However, there may be cases where it is desirable to tailor the content or style of such text for a specific audience or to address a specific purpose. A description tailored for an academic researcher may be inappropriate for presentation to a more general audience.

3.2. Identification of collections

An individual collection is an identifiable resource. If the scope of a service based on the use of collection-level descriptions is potentially global, then the identifier of a collection should be globally unique. Further, one of the valuable elements of collection-level description is the capacity to describe relationships between collections. Those relationships may be of various types but their description depends on the ability to cite a persistent identifier for a collection.
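The dependence of relationships on identifiers can be sketched in Python as follows. Each collection is keyed by a URI, and a "part of" relationship is expressed by citing the URI of the parent. The URIs, the field name "is_part_of" and the function names are all hypothetical; whether an identifier stays persistent is a matter of management policy that code cannot guarantee.

```python
# Hypothetical collections keyed by URI; "is_part_of" cites the parent's URI.
collections = {
    "http://example.org/collections/smith-papers": {
        "title": "Smith Papers",
        "is_part_of": None,
    },
    "http://example.org/collections/smith-papers/diaries": {
        "title": "Smith Diaries",
        "is_part_of": "http://example.org/collections/smith-papers",
    },
}


def top_level(cols):
    """Identifiers of collections that are not part of another collection."""
    return [cid for cid, rec in cols.items() if rec["is_part_of"] is None]


def sub_collections(cols, parent_id):
    """Identifiers of collections that cite parent_id as their parent."""
    return [cid for cid, rec in cols.items() if rec["is_part_of"] == parent_id]


def dangling_relations(cols):
    """Identifiers whose cited parent is not present in this set of records."""
    return [
        cid
        for cid, rec in cols.items()
        if rec["is_part_of"] is not None and rec["is_part_of"] not in cols
    ]
```

If the identifier of the parent collection were to change, every record citing it would appear in the result of dangling_relations, which is why the cited identifiers need to be persistent.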

3.3. Administrative metadata

There is no general agreement on the requirements for administrative metadata, because metadata records themselves vary in complexity and are used and managed in different ways. Indeed some metadata schemas incorporate some administrative metadata. However, there are some efforts, for example within the Dublin Core Metadata Initiative [6], to identify simple, generic element sets to serve as administrative metadata schemas.

4. Summary

  1. Separate presentation from structure and content
  2. Create sufficient information
  3. Use shared semantics provided by standard schemas (and declare the usage of those schemas)
  4. Use appropriate content standards for descriptive elements (and declare the usage of those content standards)
  5. Make your collection-level description records available in machine-readable form
  6. Provide information about your collection-level descriptions

Acknowledgements

Much of the content included here has been developed from discussions with participants in the Collection Description Focus Workshop series.

References

[1] Chisholm, Wendy, Vanderheiden, Gregg & Jacobs, Ian, eds. Web Content Accessibility Guidelines 1.0. W3C Recommendation 5 May 1999. Available at http://www.w3.org/TR/WAI-WEBCONTENT/

[2] Library of Congress Network Development & MARC Standards Office. Z39.50 Maintenance Agency Page. Available at http://lcweb.loc.gov/z3950/agency/

[3] Lagoze, Carl, et al, eds. Open Archives Initiative Protocol for Metadata Harvesting. Available at http://www.openarchives.org/OAI/openarchivesprotocol.html

[4] Box, Don, et al. Simple Object Access Protocol (SOAP) 1.1. W3C Note 8 May 2000. Available at http://www.w3.org/TR/SOAP/

[5] Powell, Andy. 5 step guide to becoming a content provider in the JISC Information Environment. Ariadne 33 (Sep/Oct 2002). Available at http://www.ariadne.ac.uk/issue33/info-environment/
For more information on the technical architecture for the JISC Information Environment, see http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/

[6] Dublin Core Metadata Initiative. Administrative Metadata Working Group. Available at http://dublincore.org/groups/admin/