Beyond the Beginning: The Global Digital Library




Director, UKOLN, United Kingdom


The UK Higher Education community benefits from the ability to centrally plan and fund services and initiatives which improve overall provision of and access to network services and information systems. In this context, the JISC (Joint Information Systems Committee of the Higher Education Funding Councils) has funded a range of metadata activities, including several subject-specific gateways to network resources. This paper reviews current provision in this area and notes explicit planning initiatives. The author speculates on preferred future directions.


Metadata is data which describes the attributes of a resource. It can therefore be bibliographic data but may also include other description related to content, terms and conditions for use, coverage and technical or access characteristics. Metadata supports the processes of resource discovery, selection, evaluation, documentation and management.

Metadata has always existed: so what are the new issues which are attracting such wide attention? A major factor is the need to create a viable digital environment capable of dealing with the many processes implemented in software and providing access to the many resources on the network. Metadata means that users (human or program) do not have to know all the characteristics of a resource in advance in order to retrieve it.


Interest in the development of metadata can be divided broadly into three "cultural" sectors:


There has been activity in higher education in the archives, libraries and data archives areas, reflecting an interest in providing better managed Web services.

The Arts and Humanities Data Service (AHDS) has an executive and several service providers organised in a disciplinary way which is sensitive to the maintenance of curatorial traditions and perspectives. AHDS has expressed a desire to establish a service which will search across catalogues emerging from different domains and developed according to different principles.

Within the curatorial sector, there is an emphasis on Z39.50 as a vehicle for providing access to different domains, each with its own curatorial tradition. This is a substantial area of research and development at present, with a significant "bundle" of activity directed toward investigating the issue. One suggestion under consideration is the development of a simple layer discovery record for export, which could assist interoperability.

The eLib development programme includes clumps, cross-domain clumps and hybrid libraries. Implicit in this is the investigation of ways of providing access across a range of sources: libraries, archives, museums and galleries. There is exploration of how to use Z39.50 as a search and retrieve (SR) protocol in this context. Hybrid libraries are looking at ways of integrating access to the range of print and electronic resources to which libraries need to provide access.

In the Web management area there is a growing feeling that metadata is very important, but uncertainty about which direction to move in. An illustration of this concerns the embedding of the Dublin Core in Web resources, or its association with them. The status of the Core is partly a result of the consensual activities and championing of OCLC. It is seen by many as a format with the potential to be widely used. However, a counter-argument suggests that there may be little point in adopting it if important Web search engines such as AltaVista do not incorporate it, for it will then remain largely unused.
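As an illustration of what embedding the Dublin Core in a Web page could involve, the following is a minimal sketch in Python. It assumes one common convention, in which each element is carried in an HTML meta tag whose name is prefixed with "DC."; the talk does not prescribe this or any other syntax. The sample page, the class name DublinCoreExtractor and the function name extract_dublin_core are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> elements from an HTML page.

    Assumption: Dublin Core elements are embedded as meta tags whose
    name attribute starts with "DC." (one convention among several).
    """

    def __init__(self):
        super().__init__()
        # Maps an element name (e.g. "title") to the list of values found.
        self.records = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            # Elements may repeat, so values are kept in a list.
            self.records.setdefault(element, []).append(content)


def extract_dublin_core(html_text):
    """Return a dict of Dublin Core element names to lists of values."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


# Hypothetical page used only to exercise the sketch.
SAMPLE_PAGE = """<html><head>
<title>Example prospectus</title>
<meta name="DC.title" content="Example prospectus">
<meta name="DC.creator" content="Example University">
<meta name="DC.subject" content="Higher education">
<meta name="DC.subject" content="Admissions">
</head><body><p>Hypothetical page.</p></body></html>"""

if __name__ == "__main__":
    print(extract_dublin_core(SAMPLE_PAGE))
```

A search engine or robot that recognised such tags could index the title, creator and subject fields separately from the full text, which is the kind of uptake the counter-argument above doubts will happen.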

A further issue concerns the need to undertake some harvesting of metadata throughout the United Kingdom. In creating metadata-aware robots, it may be better to avoid a "vacuum cleaner" approach. A productive way forward, which is under discussion, could be to focus on making eLib projects metadata-friendly. A further possibility is that the management teams of universities will provide prospectus information with metadata. Public library digital sites may provide another usable target: there has been discussion in the context of Project EARL of harvesting from these sites in a managed way.
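The contrast between a "vacuum cleaner" robot and managed harvesting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host names in MANAGED_TARGETS are invented placeholders, and the functions is_managed_target and select_for_harvest are hypothetical helpers, not part of any robot described in the talk. The idea shown is simply that a managed robot visits only sites that have been agreed in advance.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of agreed harvesting targets (placeholder hosts).
MANAGED_TARGETS = {
    "elib-project.example.ac.uk",
    "prospectus.example.ac.uk",
    "library.example.org.uk",
}


def is_managed_target(url, targets=MANAGED_TARGETS):
    """Return True if the URL's host is an agreed target or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == target or host.endswith("." + target) for target in targets)


def select_for_harvest(candidate_urls, targets=MANAGED_TARGETS):
    """Keep only the candidate URLs that fall within the managed targets."""
    return [url for url in candidate_urls if is_managed_target(url, targets)]


if __name__ == "__main__":
    candidates = [
        "http://elib-project.example.ac.uk/about.html",
        "http://www.unrelated.example.com/",
        "http://www.library.example.org.uk/index.html",
    ]
    for url in select_for_harvest(candidates):
        print(url)
```

A "vacuum cleaner" robot would skip the filtering step and follow every link it found; the allow-list is what makes the harvest managed.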

Areas of development in resource discovery include:

Where to next?

A spectrum of approaches of increasing complexity is being established. This involves discussion of a variety of encoding issues and a range of formats in the Web environment. At one end of the spectrum lie the more structured and semantically rich formats: for example, SGML DTDs such as that of the Text Encoding Initiative (TEI), or MARC, which provides a "rococo architecture" for describing bibliographic resources. This is represented diagrammatically in Figure 1.

Figure 1: Metadata Approaches. Labels shown in the figure: simple unstructured indexes; Dublin Core; Web collections; PICS-ng, … An arrow pointing to the right is labelled "Expense, Structure, Intellectual Effort".

Also currently being evaluated is a range of proposals for ways of aggregating catalogue access using Z39.50 and the cross-searching capabilities of archival data stores.

The museum sector might be seen as "swimming in the deep end of the pool" in the context of metadata. However, it is also demonstrating interest in sharing with others.

There has been discussion about establishing frameworks, including the ANIR Report, which suggested setting up a UK agency to promote more effective resource discovery, although nothing has happened as yet.

Within the curatorial tradition no one knows what will happen. A convergence of interest is occurring at the "shallow end of the pool", although the focus of harmonisation is not yet clear. There are considerable benefits to be gained from taking central action, including the opportunities to capitalise on enthusiasms, provide a framework to align thinking and benefit from shared expertise. The most positive development in the construction phase could be the establishment of a national agency for resource discovery which would seek to ensure effective disclosure and discovery. On the principle of NARD, this might be called the HERD (Higher Education Resource Discovery) Agency.

[63] This account was drafted for this report by The Marc Fresko Consultancy. It is based on notes taken during the presentation, and slides used by the speaker.
