RSLP 1/99

Collection Description

Study, Recommendation, Specification


A PROPOSAL SUBMITTED TO THE RESEARCH SUPPORT LIBRARIES PROGRAMME BY THE
UK OFFICE FOR LIBRARY AND INFORMATION NETWORKING, UNIVERSITY OF BATH.

18 JUNE 1999

Introduction

This proposal is submitted by UKOLN, University of Bath in response to RSLP 1/99. It aims to build on UKOLN’s existing work in the area of collection description by working with key libraries within the RSLP programme to refine, validate and promote a consistent human- and machine-readable approach to collection description.

It does not respond directly to the themes identified; rather, it proposes a development and consensus-making activity to support each theme. We hope that this approach is judged to be appropriate. We understand that this is a managed programme and suggest that there are benefits to be gained from some horizontal activities in areas where consistency is important. Accordingly, we intend this submission to be indicative of a programme of work that supports a shared approach to collection description.

The dates given in this proposal are shown in months, relative to the start of the project. Assuming a start in July 1999, we anticipate completing the initial deliverables by the end of December 1999. These will include a collection description schema and concrete syntax, ‘data entry guidelines’ describing its use, a simple Web-based tool for creating collection descriptions and a concertation day for RSLP project team members. This is an extremely short and challenging time frame, but one that we consider viable given a pragmatic approach to the issues involved.

Purpose of the Project

The description of collections will become increasingly important in the context of networked library services, and an important underpinning for developing a collective resource. This view has emerged clearly through our MODELS project, where it has influenced the course of the clumps and hybrid library projects, which are working with collection and service descriptions, and in our current work on retrospective conversion for LIC, LINC and BLRIC. In the latter case, a strong view is emerging that libraries need to complement item-based description with description at a higher level. A particular feature of this discussion is that such description would complement current work in the archives community, and that descriptions at this shared level of granularity would facilitate cross-domain working (while acknowledging that ‘collections’ may mean different things in the different library and archive content models).

The creation of collection descriptions allows the owners or curators of collections to disclose information about their existence and availability to interested parties. Although collection descriptions may take the form of unstructured textual documents, for example a set of Web pages describing a collection, there are significant advantages in describing collections using structured, open and standardised formats. Such descriptions would enable:

There are additional advantages where catalogues do not exist for collections, as a collection description may provide some indication to the remote user of content and coverage.

So while the value of collection description is recognised, there is no standardised way of doing it. This is a potential danger. A project-by-project approach to the content and structure of such descriptions is potentially damaging to the overall service ambitions of the programme, as it adds effort for users and managers.

In fact, we believe that the costs of not adopting a consistent, machine-readable description at an early stage may be significant. This cost will fall on users and managers of collections alike:

Work to be Attempted

UKOLN has developed a preliminary approach to collection description [1], has experimented with this approach in describing the JISC Current Collections [2], and has prepared a report that examines collection description in the library, archive and museum domains [3]. The clumps projects are exploring how they will support collection description requirements using the approach developed here, and we are in discussion with several other initiatives (including some which have submitted EOIs to RSLP) about implementing and refining the approach. The work proposed here will:

  1. Refine our current approach based on a more thorough modelling of collections and their catalogues (issues here are further discussed below). This work will be carried out by UKOLN in association with Mike Heaney, Associate Director (Service Assessment, Planning and Provision), University Library Services Directorate, University of Oxford. The approach will be consistent with the emerging Dublin Core Version 2. Dublin Core [4] is an internationally agreed approach to resource description. Within Version 2, Dublin Core and current work in the rights management metadata area will be aligned, and DC will be based on a sounder content model. By associating this work with the Dublin Core initiative, we intend to improve the chances of widespread adoption, to benefit from wider input and to influence standards in an increasingly important area. (Mike Heaney has done influential work on content models for bibliographic data that is similar to DC Version 2 discussions, and is familiar with the descriptive needs of large research collections.) This work will focus primarily on the needs of libraries in describing their collections but will also take into account the requirements of other sectors. We have some support from OCLC to carry out this work, so it will not be charged to the project.
  2. Specify a concrete syntax for the collection description schema. This is expected to be based on the Resource Description Framework [5][6], in line with current work on the Dublin Core. This work will also include the development of a simple, Web-based tool for creating collection descriptions. While it is hoped that this will not be the only tool available to projects, it is intended to be a useful, early, baseline mechanism for creating their descriptions.
  3. Validate the approach by working with RSLP projects and others to describe their collections. This activity would involve liaison, requirements analysis and consensus making.
  4. Develop a prototype service based on the approach taken. This would involve working with libraries to develop descriptions of their collections and creating a searchable resource to provide access. We are well placed to assess ways in which this resource might interact with hybrid library and clump approaches (we work with these projects and are a member of Agora) and with the subject gateways (together with King’s, we are responsible for the Resource Discovery Network Centre, and we work with several projects in the area of resource discovery). We are in discussion over similar initiatives elsewhere. We would see this as a proof-of-concept, demonstrating the value of such an approach.
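As an illustration of what an RDF-based concrete syntax (item 2 above) might look like in practice, the following minimal sketch serialises one collection description as RDF/XML using Dublin Core elements. It is not the schema proposed here, which has yet to be specified: the choice of properties, the function name `to_rdf_xml` and the example.org URI are all assumptions made for the purpose of the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(uri: str, properties: dict) -> str:
    """Serialise one description as RDF/XML (hypothetical helper).

    `properties` maps Dublin Core element names to literal values.
    Which elements a collection description should carry is an
    assumption here, not part of any agreed schema.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri}
    )
    for name, value in properties.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; this is not a real collection.
xml_text = to_rdf_xml(
    "http://example.org/collections/maps",
    {
        "title": "Example Map Collection",
        "subject": "Cartography",
        "publisher": "Example Library",
    },
)
print(xml_text)
```

A Web-based editor of the kind described in item 2 could, under these assumptions, collect the property values from a form and pass them to a function like this one.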

A Discussion of Collections and Descriptions

Our work suggests that requirements for collection description fall into three broad informational categories. The first is descriptive information about the collection, which may include the subject area, ownership, strengths and weaknesses, and sources of items within the collection. We are keen to develop a fuller understanding of requirements in this area in association with the RSLP initiative; our early discussions suggest that there would be advantage in working towards consensus here. The second is information about how to access the collection: physical access in the case of library, museum or archival collections, for example, or networked access in the case of digital collections. The third is the terms and conditions associated with access to the collection and to individual items within it.
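The three categories can be pictured as three groups of fields within a single record. The sketch below is one possible arrangement, offered only to make the grouping concrete; every class and field name in it is hypothetical, and the actual attributes would be settled through the modelling and consensus work described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Description:
    """Descriptive information about the collection (hypothetical fields)."""
    title: str
    subjects: List[str] = field(default_factory=list)
    owner: str = ""
    strengths: str = ""


@dataclass
class Access:
    """How to reach the collection, physically or over the network."""
    physical_location: str = ""
    network_address: str = ""


@dataclass
class Terms:
    """Terms and conditions for the collection and for items within it."""
    collection_terms: str = ""
    item_terms: str = ""


@dataclass
class CollectionDescription:
    """One record combining the three informational categories."""
    description: Description
    access: Access
    terms: Terms

    def is_networked(self) -> bool:
        # True when a network address has been recorded.
        return bool(self.access.network_address)


# Illustrative values only; this is not a real collection.
record = CollectionDescription(
    description=Description(
        title="Example Map Collection", subjects=["cartography"]
    ),
    access=Access(physical_location="Example reading room"),
    terms=Terms(collection_terms="Open to registered readers"),
)
print(record.is_networked())
```

Keeping the three groups separate would allow, for example, access details to be revised without touching the descriptive information.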

The term ‘collection’ can be applied to any aggregation of individual items. It is typically used to refer to collections of physical items, collections of digital surrogates of physical items, collections of ‘born-digital’ items and catalogues of such collections. Collections are exemplified in the following, non-exhaustive, list: library collections; museum collections; archives; library, museum and archival catalogues; digital archives; Internet directories (e.g. Yahoo); Internet subject gateways (e.g. SOSIG, OMNI, ADAM, EEVL, etc.); Web indexes (e.g. Alta Vista); collections of text, images, sounds, datasets, software, other material or combinations of these (this includes databases, CD-ROMs and collections of Web resources); collections of events (e.g. the Follett Lecture Series); other collections of physical items.

This is a broad list of overlapping categories. However, it suggests the need for a planned approach, both so that techniques adopted fit in well with broader resource discovery directions and so that techniques are flexible enough to cope with the many collection types that libraries will develop and to indicate the relationships between them. It is worth noting that the list includes collections of physical items and collections of digital items. In some cases, the digital items are surrogates of physical items; in others, the digital items are the primary (only) manifestation of the item. It is also worth noting that some collections are actually catalogues (metadata) for other collections. For example, a library catalogue typically describes the items in one or more collections within a library. Finally, it is worth noting that collections are often composed of other collections.
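The relationships noted above (collections composed of other collections, catalogues that describe collections, and digital surrogates of physical collections) could be represented along the following lines. This is a minimal sketch under assumed names; the `Collection` class, its fields and the `item_form` values are hypothetical and do not represent the content model to be developed in the project.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Collection:
    """A collection and its links to other collections (hypothetical model)."""
    name: str
    # Assumed values: "physical", "digital-surrogate", "born-digital", "catalogue".
    item_form: str
    # Collections of which this collection is composed.
    parts: List["Collection"] = field(default_factory=list)
    # For catalogues: the collections whose items the catalogue describes.
    describes: List["Collection"] = field(default_factory=list)
    # For digital surrogates: the physical collection being represented.
    surrogate_of: Optional["Collection"] = None

    def walk(self) -> Iterator["Collection"]:
        """Yield this collection, then all of its parts, depth first."""
        yield self
        for part in self.parts:
            yield from part.walk()


# Illustrative values only; these are not real collections.
maps = Collection("Maps", "physical")
manuscripts = Collection("Manuscripts", "physical")
library = Collection("Example Library", "physical", parts=[maps, manuscripts])
maps_online = Collection("Maps Online", "digital-surrogate", surrogate_of=maps)
catalogue = Collection(
    "Library Catalogue", "catalogue", describes=[maps, manuscripts]
)

names = [c.name for c in library.walk()]
print(names)
```

Treating a catalogue as a collection in its own right, linked to what it describes, keeps a single record type for every entry in the list above.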

Schedule of Work

This work will be carried out in four overlapping phases (corresponding to the four areas of work outlined above) lasting a total of 15 months.

Deliverables

The following deliverables will be produced during the project:

Project Plan
a document providing details of the project’s external deliverables, delivery dates and work schedules. The project plan will include details of the balance of effort between UKOLN and Mike Heaney during the first phase of the project (note, however, that this effort is not being charged to the project). This deliverable will be internal to the project and will be completed during the first month.
Draft Collection Description Schema
a document providing a draft version of the collection description schema (for comment by various parties) and the content model upon which it is based. This will be delivered by the end of the fourth month.
Collection Description Schema
the final version of the above document, delivered by the end of the sixth month. The schema is seen as the key deliverable of the project.
Collection Description Editor
a simple Web-based tool. This will be based on DC-dot [7], a Web-based Dublin Core generator and editor, and will be delivered by the end of the sixth month. (A version of the tool will be available for demonstration during the collection description concertation day.)
Collection Description Concertation Day
a one-day workshop for RSLP project team members and relevant library staff. This will be organised during the fifth month of the project.
Data Entry Guidelines
a document providing guidance for those using the collection description schema. This document will be aimed at RSLP project team members, library staff and other interested parties. A draft version of this document will be available for the collection description concertation day, the final version being delivered at the end of the sixth month.
Prototype Search Service
a Web-based demonstrator, enabling searches to be made across a range of collection descriptions gathered from RSLP projects. This will be developed and run throughout the third phase of the project. During this phase, UKOLN will also provide advice and assistance to RSLP projects wishing to describe collections.
Final Report
a document describing the findings of the project, including a final version of the collection description schema, based on experience gained during the demonstrator.

Dissemination

Dissemination is integral to the success of this project. Clearly, the primary focus for this work will be within the RSLP programme. However, UKOLN will also disseminate information about the project more widely, maximising the benefits of the deliverables to the communities that have an interest in collection description and opening the developing collection description schema to more widespread scrutiny. Dissemination will largely be carried out by the Interoperability Focus [8], taking advantage of relationships with key communities including CIMI, the cultural heritage community, the LIC, libraries and library systems suppliers. UKOLN has close links with the Dublin Core Metadata Initiative, including staff membership of both the DC Technical Advisory Committee and the DC Policy Advisory Committee. Project outcomes will be disseminated during the normal course of our work; in addition, we propose the following specific activities:

References

  1. Collection Description Working Group – a report on work in progress
    <http://www.ukoln.ac.uk/metadata/cld/wg-report/>
  2. JISC Current Content Collection – demonstration ROADS database
    <http://roads.ukoln.ac.uk/jisc-ccc/cgi-bin/search.pl>
  3. Collection Level Description – A Review of Existing Practice
    <http://www.ukoln.ac.uk/metadata/cld/study/toc/>
  4. Dublin Core Metadata Initiative
    <http://purl.org/dc/>
  5. Resource Description Framework (RDF) Model and Syntax Specification
    <http://www.w3.org/TR/REC-rdf-syntax/>
  6. Resource Description Framework (RDF) Schema Specification
    <http://www.w3.org/TR/PR-rdf-schema/>
  7. DC-dot
    <http://www.ukoln.ac.uk/metadata/dcdot/>
  8. UK Interoperability Focus
    <http://www.ukoln.ac.uk/interop-focus/>

Maintained by: Andy Powell
Last modified: 3-September-1999