NOF-digitise Technical Advisory Service
NOF-digitise projects are in the process of creating valuable new digital resources to support lifelong learning. As part of that process, projects are required to create descriptions of those resources in the form of "metadata records", which can then be used by services such as the NOF portal to enable users to find those resources.
Metadata records might describe individual items (such as documents, images, films, and sound recordings) or they might describe "collections" of such items as a whole. Such "collection-level descriptions" are useful in the context of the NOF-digitise programme because they can provide an overview of a large number of items, and such overviews may be useful where large amounts of item-level detail may not be appropriate.
NOF-digitise projects are encouraged to provide collection-level descriptions for their resources:
The managers of the valuable resources held by museums, archives and libraries compile descriptions of their holdings both so that they can administer and control those resources more effectively and in order to enhance access to those resources by disclosing information about their existence and availability to potential users. The channels through which resource managers provide that information, and through which potential users seek information about resources are, increasingly, digital. Initiatives such as the NOF-digitise programme enable the creation of new digital resources, which, like their physical counterparts, must be managed and made accessible to users.
Librarians, archivists and museum curators have all considered the items within their custodianship as forming groups or "collections" of some form. However, the criteria by which they define these groupings, and the emphasis placed on the creation and use of descriptions of those groupings, have varied widely. Different ideas about "collections" have led to different approaches to "collection description". For archivists, the individual item is an integral part of a group of items that forms the record of an individual or organisation, and the description of such aggregates forms a fundamental (and standardised) element of descriptive practice. Traditionally, librarians have concentrated on the description of individual items. The notion of the "collection" is certainly present, with aggregates defined by various criteria including location, subject, form or use, but the descriptions of these aggregates tend to be more informal and less structured than those of their component items. Museums too employ the concept of the "collection", and use a range of criteria (form or type of object, subject, the objects donated by an individual benefactor) to delimit the aggregates they describe and manage.
The practice of creating descriptions of sets or aggregations of the items or objects held within their repositories instead of, or in addition to, descriptions of the individual items is not new to the different curatorial traditions. Such an approach, however, is receiving renewed attention as a means of improving the effectiveness of digital resource discovery procedures for these physical items, particularly where users wish to search across the distributed holdings of several repositories. At the same time, there is growing recognition of the value of providing aggregate-level descriptions of digital resources, such as the learning resources being created by NOF-digitise projects.
1. What is a collection?
At the simplest level, one can conceive of a collection as any aggregation of individual items (objects, resources). This definition says nothing about the form or nature of those items: they may be physical or digital, and digital items may be surrogates of physical items or they may be "born-digital", the primary manifestations of a work. A "catalogue" may be thought of as a collection, where the items are the catalogue records [1].
The definition is also neutral on the size of a collection: in theory at least, it is possible to have a collection containing only one item!
Collections may have varying degrees of permanence or transience. A collection of digital items may exist only while it is transferred between applications or for the duration of a user query; even for physical items, aggregates might be created for a limited period only. And the process of aggregation does not necessarily imply a physical juxtaposition: collections may be distributed, with the items dispersed across multiple physical locations.
2. Collection description and collection-level description
A description of a collection may include information about the aggregate as a whole, information about the individual items which make up the collection, or indeed information about some groupings of the items which form a subset of the whole.
On this basis, Heaney suggests that "collection descriptions" (or "finding aids") may be classified as belonging to a small number of types. The principal distinction is between an "analytic" finding aid, consisting of information about the individual items only, and a "unitary" finding aid, which describes only the collection as a whole. A "hierarchic" finding aid provides information about both the whole and the items, including contextual information about the relationship of the items to the whole. In practice, as Heaney acknowledges, even an analytic finding aid may have some structure that dictates that meaning is conveyed by the relationship between the descriptions of individual items.
Different conceptions of "collections" result in different approaches to describing those collections, so that individual "collection descriptions" can be classified as belonging to one of Heaney's ideal types.
A further word of warning is perhaps necessary: the terms "collection description" and "collection-level description" are sometimes used interchangeably. The term "collection description" might be applied to any of Heaney's types of finding aid, but a "collection-level description" supplies information (at least primarily) about the aggregate as a whole; that is, a collection-level description is, in Heaney's typology, a unitary finding aid.
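The typology described above can be sketched in a few lines of Python. This is a minimal illustration only: the `FindingAid` class and its two boolean fields are hypothetical simplifications introduced here, not part of Heaney's model, and real finding aids will not always fall cleanly into one type.

```python
from dataclasses import dataclass


@dataclass
class FindingAid:
    """A hypothetical, much simplified finding aid (collection description)."""
    title: str
    describes_whole: bool  # carries information about the aggregate as a whole
    describes_items: bool  # carries information about the individual items


def heaney_type(aid: FindingAid) -> str:
    """Return the type of finding aid, following the typology summarised above."""
    if aid.describes_whole and aid.describes_items:
        return "hierarchic"
    if aid.describes_whole:
        return "unitary"
    if aid.describes_items:
        return "analytic"
    raise ValueError("a finding aid must describe the whole, the items, or both")


def is_collection_level_description(aid: FindingAid) -> bool:
    """A collection-level description is a unitary finding aid."""
    return heaney_type(aid) == "unitary"
```

Under these assumptions, an item-by-item catalogue would be classed as analytic, and only a description of the aggregate alone would count as a collection-level description.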
3. Why create collection-level descriptions?
Perhaps the most obvious benefit of collection-level description is that it can provide an overview (albeit a necessarily superficial one) of groups of otherwise uncatalogued items.
Even where item-level descriptions already exist, collection-level descriptions may be useful. For resource discovery, the existence of collection-level descriptions supports the high-level navigation of a large (and perhaps distributed and heterogeneous) resource base. For example, a researcher may make use of collection-level descriptions firstly to discover the existence of collections, and then to target their (item-level) queries to selected collections on the basis of their characteristics, or a software agent may perform these functions on behalf of a human user. Collection-level descriptions may be used to support controlled searching across multiple collections, and to assist users by reducing the number of individual hits returned in an initial response to a query.
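The two-stage approach described above (discover collections first, then target item-level queries) might be sketched as follows. The collection identifiers, subject terms and item titles are invented for illustration, and the matching is deliberately naive; a real service would use richer descriptions and a proper search engine.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical collection-level descriptions: identifier -> subject terms.
COLLECTIONS: Dict[str, Set[str]] = {
    "maps": {"cartography", "local history"},
    "letters": {"local history", "biography"},
    "fossils": {"geology"},
}

# Hypothetical item-level records, each belonging to one collection.
ITEMS: List[Dict[str, str]] = [
    {"collection": "maps", "title": "Town plan, 1850"},
    {"collection": "letters", "title": "Letter describing the town fair"},
    {"collection": "fossils", "title": "Ammonite found near the town quarry"},
]


def discover_collections(collections: Dict[str, Set[str]], subject: str) -> List[str]:
    """Stage 1: use collection-level descriptions to find relevant collections."""
    return sorted(cid for cid, subjects in collections.items() if subject in subjects)


def search_items(items: List[Dict[str, str]], term: str,
                 collection_ids: Iterable[str]) -> List[str]:
    """Stage 2: run the item-level query only against the selected collections."""
    wanted = set(collection_ids)
    term = term.lower()
    return [item["title"] for item in items
            if item["collection"] in wanted and term in item["title"].lower()]
```

Here a search for "town" across every collection returns three hits, whereas first selecting the collections described as "local history" reduces the result to the two items from the maps and letters collections.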
Collection-level description has a potentially important role in supporting cross-domain resource discovery. Researchers want to discover and access resources drawn from across the collections of diverse institutions, and the technical infrastructure to support this is maturing. One of the challenges to be met is that (for good reasons) the managers of different classes of resource describe their items using different standards. Initiatives such as the Dublin Core seek to address this by defining a small set of elements, the semantics of which are commonly understood: they help to overcome the problems of differences in descriptive practice and terminology by providing what Baker describes as a "metadata pidgin" for the non-specialist user [4, 5]. Even with such support, however, a researcher may face the problems of managing large numbers of item-level "hits" in response to a query, where those hits describe heterogeneous resources. Description at collection level, using a common set of properties and some consensus on the criteria for defining collections, offers the possibility of comparing broadly similar high-level objects. Heaney employs a geospatial metaphor: "the scholar is concerned at the initial survey to identify areas rather than specific features - to identify rainforest rather than to retrieve an analysis of the canopy fauna of the Amazon basin".
In addition to these benefits for resource discovery, description at collection level is an important component in collaborative approaches to resource management.
4. Archival collections
The archival community has not traditionally used the term "collection" to label the aggregates of material they typically describe. Archivists make the distinction between an archival fonds, where the items are of known provenance and their arrangement reflects their original working order as the records of an organisation or individual, and an "artificial collection", where the items are associated but lack the coherence of a fonds [6, 7]. The archivist recognises the fonds as the set of items that have been created and accumulated by an identifiable individual or body (or bodies). However, it should be emphasised that both these classes of aggregates (the fonds and the artificial collection) are "collections" in the more general sense in which the term is used here.
Within an archival fonds, an item can be fully understood only within the context of its relationship with other items and aggregates in the fonds, and descriptive practice reflects this.
Description at the level of the aggregate (or rather at various levels, since descriptions of archives are usually arranged hierarchically) is a fundamental part of archival descriptive practice, and the archival community has well-established national and international standards for such "collection description". Indeed the level of description provided by archival catalogues often stops short of the description of individual items, particularly where there are multiple instances of the same type of item.
The General International Standard for Archival Description (ISAD(G)) is a permissive standard which defines a set of data elements for archival description, to be deployed within a framework of multi-level description from the general to the specific; i.e. the ISAD(G) element set may be applied to any unit of description from the whole to the item (though in practice some elements are more applicable at some levels of description than others). In Heaney's typology, then, the descriptions of archives are typically hierarchic finding aids.
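The idea of multi-level description, in which the same kind of description applies to each unit from the whole down to the item, can be illustrated with a small tree structure. This is a sketch under stated assumptions: the `Unit` class, its two fields and the example titles are hypothetical, and a real ISAD(G) description carries many more elements than a level and a title.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Unit:
    """One unit of description; the same structure is used at every level."""
    level: str  # e.g. "fonds", "series", "file", "item"
    title: str
    children: List["Unit"] = field(default_factory=list)


def walk(unit: Unit, depth: int = 0) -> Iterator[Tuple[int, Unit]]:
    """Visit units from the general to the specific (depth-first)."""
    yield depth, unit
    for child in unit.children:
        yield from walk(child, depth + 1)


def outline(unit: Unit) -> List[str]:
    """Render the hierarchy as indented lines, one per unit of description."""
    return [f"{'  ' * depth}{u.level}: {u.title}" for depth, u in walk(unit)]


def example_fonds() -> Unit:
    """An invented fonds used only to exercise the functions above."""
    return Unit("fonds", "Records of an example company", [
        Unit("series", "Correspondence", [
            Unit("file", "Letters received, 1901"),
        ]),
        Unit("series", "Account books"),
    ])
```

Note that the second series in the example has no lower-level units, reflecting the point that archival catalogues often stop short of describing individual items.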
These same principles underpin the design of the Encoded Archival Description (EAD) standard, which defines an element set and an SGML/XML DTD for the encoding of archival finding aids. Like ISAD(G), EAD supports description either at collection level or at lower levels of detail; i.e. it can represent both unitary and hierarchic finding aids. EAD was designed to capture a broad range of descriptive practice, and is sufficiently flexible to permit the encoding of a wide variety of catalogues and inventories. It is a measure of its designers' success that it has been widely adopted as a means of digital data exchange for archival description.
5. Library collections
Libraries have used many different criteria to define the scope of, or delineate, their "collections". The first, which is often used implicitly, is that of institution or location: a collection is the totality of the holdings of a named library or repository.
Collections are often defined by the subject or coverage of the content of the items. A subject-based collection may coincide with the entire holdings of a library if the library is dedicated to collecting materials in a specific subject area, but more commonly it will be a subset of the larger (institution/location-based) collection. The items of a subject-based collection might be physically located together (and this is more likely if the items are also related through some other association, perhaps with a collector or donor), but it is quite probable that they are dispersed throughout the library. Since subject schemes may be hierarchical, collections defined using such schemes may also have hierarchical relationships.
However, subject-based collections are not discrete units in the way that archival fonds are. Instead, they form a set of overlapping "windows" on the holdings of one or more libraries: an item may form part of multiple subject collections, and the specialists in different subjects may present different perspectives on the collections of an institution. The relationships between subject collections may be complex.
Collections may also be defined by the form of the items (e.g. a video collection) or by some aspect of their use (e.g. items for the partially sighted, or items to which access or use is restricted).
Library collections may be distributed across several physical institutions. That distributed collection may comprise the entirety of the institutions' holdings, but it is more likely to be a subset defined by one of the criteria noted above, such as subject.
Although libraries have recognised collections as units that they define and manage, their primary focus for resource description has been the item. Perhaps in part because of the diversity of criteria applied in defining library collections, collection-level description has tended to be informal, shaped by local conventions, and relatively unstructured. The description of library collections has not been subject to the standardisation which has been applied to item-level description in the form of MARC and AACR2, and standards for machine-readable collection-level descriptions have not been widely deployed.
6. Museum collections
The notion of the collection is a familiar one to museums and galleries, and the physical arrangement of objects within a museum may be based around collections and their curators. As in the library case, the criteria used to establish what constitutes a collection may vary. A collection may be the entire holdings of an institution, or it may be a subset of those holdings defined according to some other common attribute (subject, type or form of object, medium or technique etc.). In this context, one particularly important criterion is that of an aggregate of material collected by an individual and donated to the museum or gallery. And as with libraries, there may be complex relationships between these subsets: a collection defined by subject or object type may include subsets of items from the collections of several donors.
Museum collections may span the holdings of several institutions. This is particularly true as museums construct collaborative "virtual" collections composed of digital representations of physical objects housed in many different locations. Within this framework of the virtual collection, the researcher may wish to construct "collections" corresponding to criteria that match their own specific research interests.
However, the "collection management systems" used within museums have tended to focus on the description of individual items or objects. There is a practice of describing aggregates of these objects, and there have been successful examples of digital "guides" using collection-level descriptions as a gateway to the object-level databases and catalogues which describe the items within those collections. As in the library case, however, there has been little effort to develop a standardised approach to collection-level description.
7. Digital collections
The discussions above focused on the description of collections of physical items. However, information managers from all three of the domains above are also facing the challenge of managing collections composed of digital items, which may be digital representations of physical resources or may be primary "born-digital" resources. Some collections may be "hybrid", in the sense that they are made up of both digital and physical items. Some of these collections are made up of digital items that are descriptions of physical resources e.g. library OPACs or bibliographic indexes.
More broadly, the managers of Web resources need to describe aggregates of digital resources. Powell makes a high-level distinction between collections of Web-accessible items and collections of information about such items, i.e. collections of metadata records. The diversity of the nature and use of Web-accessible resources means that the criteria for defining aggregates of these resources vary widely. As a consequence, there have been various attempts to provide ways of describing aggregates of Web resources, but often they have been shaped by a particular use requirement and none have been widely adopted [11, 12].
7.1 The Distributed National Electronic Resource (DNER)
The JISC Distributed National Electronic Resource (DNER) is a "managed information environment" that allows users in UK higher and further education to access quality-assured digital resources from many sources and of many different types. Many of these resources are made available as "collections". Those collections may be held and managed within institutions (either at a user's local institution or at another college or university), at central JISC services, or by external agencies. Although the collections with which the typical DNER user interacts in the first instance may be digital, they may use these collections as an intermediate step to accessing a physical resource. So a researcher might use (digital) bibliographic indexes and library catalogues in order to locate a (physical) copy of a book or journal. In this sense, the DNER is concerned with both digital and physical collections: it is a "hybrid" information environment.
Within the "technical architecture" envisaged for the DNER, a "collection description service" will provide information about all the collections available. The users of this service may be human researchers, but they may also be software agents acting on behalf of users, perhaps according to preferences or restrictions specified in a personal or institutional profile. On this basis, portals could present users with a tailored "information landscape" - a selection of collections filtered according to their interests (and perhaps by other criteria such as access rights). The effective operation of such a service depends on the availability of machine-readable collection descriptions that are consistent in their structure and semantics.
In a similar way, many of the initiatives within the NOF-digitise programme are creating collections of digital resources. Those resources may be made available in the first instance through a project-specific web site, but they will also be made available through other services and portals.
7.2 The NOF-digitise 'collection'
The NOF-digitise programme will release a very substantial body of new content onto the internet over the next three years, encompassing 152 grant holders' websites and up to 35 consortium websites, linked to the sites of over 500 further contributing organisations plus many hundreds of subject-related sites. The programme 'collection' will in total contain an estimated one million items - images, pages of text, film and sound recordings - as well as some 400 related learning journeys.
7.2.1 Delivery points
The content will be accessible from a number of delivery points:
7.2.2 User friendly access
The Fund is aware of the importance of presenting this diverse body of materials in as cohesive and accessible a way as possible, to maximise its public use and release the full learning benefits. To help achieve this, the Fund can draw on the experiences of other programmes such as eLib, the DNER or the current RSLP programme, but whilst there are many similarities there are also some important differences that need to be noted with regard to the NOF-digitise programme.
The Fund currently provides a simple search interface to the individual collection websites through www.nof-digitise.org. This enables users to search the programme sites on the basis of a) website name, b) organisation name, c) project name, d) consortium name, and e) a short index of subject categories (which could be extended as needed). It provides the user with access to one 'collection' or to a list of 'collections' but does not enable a cross-search by subject between collections. This site will be maintained only until the close of the digitisation programme in December 2007.
The Fund wishes to enable the discovery of resources created by NOF-digitise projects in a way that enables maximum benefit to be gained from them at programme level as well as at object level. To this end, the Fund is looking at the development of an architecture that will allow the harvesting of collection-level descriptions from each site into a centralised database, with a cross-search facility enabling the user to find relevant materials across the entire programme. The foundations for building this option are already being laid through the programme's technical standards, which require project resources to be described in a way which will ensure they are interoperable within the programme and with other large-scale electronic content creation initiatives such as DNER, RSLP and A2A.
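The harvesting and cross-search arrangement outlined above could work roughly as in the following sketch. It is illustrative only and is not a description of any architecture the Fund has adopted: the site names, record fields and function names are all hypothetical, and the "harvest" simply reads from in-memory data rather than fetching anything over a network.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, object]


def harvest(sites: Dict[str, List[Record]]) -> List[Record]:
    """Gather collection-level records from every site into one central list."""
    central: List[Record] = []
    for site_name, records in sites.items():
        for record in records:
            # Remember which site each record came from.
            central.append({**record, "source_site": site_name})
    return central


def build_subject_index(central: List[Record]) -> Dict[str, List[Record]]:
    """Index the central records by (lower-cased) subject term."""
    index: Dict[str, List[Record]] = defaultdict(list)
    for record in central:
        for subject in record["subjects"]:
            index[subject.lower()].append(record)
    return index


def cross_search(index: Dict[str, List[Record]], subject: str) -> List[str]:
    """Return the titles of all collections, from any site, with this subject."""
    return [str(record["title"]) for record in index.get(subject.lower(), [])]


# Invented example data for two hypothetical project sites.
SITES: Dict[str, List[Record]] = {
    "project-a": [{"title": "Coastal photographs", "subjects": ["Photography", "Coast"]}],
    "project-b": [{"title": "Harbour records", "subjects": ["coast", "Trade"]}],
}
```

With this data, a search on "Coast" finds collections from both sites. The sketch works only because every record has the same fields with the same meaning, whichever site it comes from.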
In order to achieve this further "high-level" interoperability, each project will need to adopt a consistent approach to the construction of collection descriptions.
This is why the Fund is recommending that all NOF projects use the RSLP model and schema, and requiring that item-level descriptions be capable of being expressed in Dublin Core. If all 152 projects agree to follow this model from the outset, it will make the task of creating a search engine for the programme much easier to achieve, whether it is created by the Fund within the lifetime of the programme or by one of its partners at a later date, and - more importantly - it will ensure that the resources created by the programme are accessible to the widest audience and thus achieve the programme's overall objectives.
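What it means for an item-level description to be "capable of being expressed in Dublin Core" can be shown with a simple mapping. The fifteen element names below are those of the Dublin Core Metadata Element Set; everything else (the local field names, the sample record and the `to_dublin_core` function) is a hypothetical example, and a project's actual mapping would depend on its own cataloguing scheme.

```python
from typing import Dict, List

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def to_dublin_core(record: Dict[str, str],
                   mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Express a local item record using Dublin Core elements.

    `mapping` takes local field names to Dublin Core element names.
    Local fields with no mapping are left out of the result.
    """
    unknown = set(mapping.values()) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    dc: Dict[str, List[str]] = {}
    for local_field, value in record.items():
        element = mapping.get(local_field)
        if element is not None:
            # Dublin Core elements are repeatable, so collect values in lists.
            dc.setdefault(element, []).append(value)
    return dc


# A hypothetical local record and mapping for a photograph.
LOCAL_RECORD = {"caption": "Market day, 1905", "photographer": "A. Example",
                "taken": "1905", "negative_no": "N-17"}
LOCAL_MAPPING = {"caption": "title", "photographer": "creator", "taken": "date"}
```

In this example the richer local detail (the negative number) is simply omitted from the Dublin Core view; the local record itself is unchanged.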
8. The RSLP Collection Description schema
8.1 The Research Support Libraries Programme (RSLP)
The Research Support Libraries Programme (RSLP) aims to facilitate arrangements for research support in UK libraries. Two major strands of the programme emphasise collaborative arrangements for the management of collections and the improvement of information about collections in order to enhance discovery and access. The collections with which RSLP is concerned are primarily, but not exclusively, collections of physical items held by libraries, archives, museums and other specialist repositories.
A consistent approach to the description of collections was considered important to the success of both of these goals. RSLP supported a project to develop a model of collections and their catalogues (developed by Michael Heaney of the University Library Services Directorate, University of Oxford) and a metadata schema for the description of collections based on that theoretical model (developed by Andy Powell of UKOLN) [17, 18].
8.2 The RSLP Collection Description Model & Schema
The RSLP collection model presents a view of the collection as an "entity" that has relationships with a number of other entities, including objects, human agents and descriptions of the collection. So, for example, a "Collection" consists of a number of "Items", which have been gathered together by a "Collector", and so on. The model seeks to identify the "attributes" which characterise each of the entities or are typically included in descriptions of those entities. A "Collection" may have a "title" and a "system of arrangement"; a "Collector" would have a "name" and a "biography" (in the case of an individual) or an "administrative history" (in the case of a corporate body). The model is quite complex, and the approach taken is deliberately very general: it is intended to be applicable to a very wide range of collections (both physical and digital) and their descriptions.
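The entities, relationships and attributes mentioned in this example might be represented as follows. This is a sketch of only the fragment of the model described above, using Python class and field names chosen for illustration; the full RSLP model has more entities and attributes, and this is not its official representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """An individual item within a collection."""
    title: str


@dataclass
class Collector:
    """The agent who gathered the items together."""
    name: str
    biography: Optional[str] = None               # for an individual
    administrative_history: Optional[str] = None  # for a corporate body


@dataclass
class Collection:
    """A collection, with the two attributes named in the text above."""
    title: str
    system_of_arrangement: Optional[str] = None
    collector: Optional[Collector] = None
    items: List[Item] = field(default_factory=list)


def summarise(collection: Collection) -> str:
    """Produce a one-line summary that follows the relationships."""
    parts = [f"{collection.title} ({len(collection.items)} items)"]
    if collection.collector is not None:
        parts.append(f"gathered by {collection.collector.name}")
    return ", ".join(parts)
```

For instance, a `Collection` titled "Example Papers" holding two `Item` objects and a `Collector` named "A. Example" would be summarised as "Example Papers (2 items), gathered by A. Example".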
Using this general model as a basis, the RSLP Collection Description (CD) schema provides a clearly defined set of elements or properties that can be used to create relatively simple descriptions of collections of many different types. Like the model, the schema actually addresses the description of a number of related entities, although the schema selects only a small subset of the entities identified by the general model. So, a "collection description" constructed using the RSLP CD schema is in fact a set of descriptions of several entities. The schema includes properties to describe:
Wherever possible, the schema uses properties from existing metadata schemas, particularly the Dublin Core Metadata Element Set, including the Dublin Core Qualifiers [20, 21]. The "agent" properties are taken primarily from the vCard schema.
The RSLP Collection Description project recommended an XML encoding for the schema which made use of the syntactic conventions specified by the Resource Description Framework (RDF), the W3C's recommended architecture for metadata, and the project created a simple form-based tool for the creation and editing of descriptions.
However, the semantics of the schema can be usefully deployed independently of the RDF/XML syntax, for example as the basis of an implementation using a relational database. This is the approach most implementers have adopted, with the capacity to generate RDF/XML representations of the descriptions held in the database tables for transfer between systems - for example, where several projects wish to make the collection description records in their separate databases available to a central service. The schema is also extensible: an implementer might adopt the basic model of entities and relationships, but add specific descriptive properties that are pertinent to their particular context. In making such extensions, however, implementers need to maintain the integrity of the existing model and to consider carefully how that "local" data is handled when their descriptions are shared.
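The pattern described above, in which descriptions are held in relational tables and an RDF/XML representation is generated for transfer, could look something like the sketch below. The table layout, the example.org identifier and the sample text are assumptions made for illustration, and only two Dublin Core properties are emitted; the sketch does not reproduce the RSLP CD schema's own properties or its recommended encoding in full.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def make_database() -> sqlite3.Connection:
    """Create an in-memory database holding one hypothetical description."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE collection ("
        "uri TEXT PRIMARY KEY, title TEXT NOT NULL, description TEXT)"
    )
    conn.execute(
        "INSERT INTO collection VALUES (?, ?, ?)",
        ("http://example.org/collections/1",
         "Example photograph collection",
         "A hypothetical collection used for illustration."),
    )
    conn.commit()
    return conn


def export_rdf_xml(conn: sqlite3.Connection) -> str:
    """Serialise every row of the collection table as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    rows = conn.execute(
        "SELECT uri, title, description FROM collection ORDER BY uri"
    )
    for uri, title, description in rows:
        node = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                             {f"{{{RDF_NS}}}about": uri})
        ET.SubElement(node, f"{{{DC_NS}}}title").text = title
        if description:
            ET.SubElement(node, f"{{{DC_NS}}}description").text = description
    return ET.tostring(root, encoding="unicode")
```

Calling `export_rdf_xml(make_database())` yields a single `rdf:RDF` document that a central service could parse without any knowledge of the table structure behind it. Any locally added columns would need a deliberate decision about whether, and how, they appear in the exported form.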
Just as the Dublin Core metadata element set is not intended to replace richer standards for resource description at item level, the RSLP CD schema is not a substitute for existing collection description schemas such as the archival description standards mentioned above. Like Dublin Core, however, it offers a simple set of attributes with commonly understood semantics which allows resource managers to disclose and exchange information about their collections.
The use of the RSLP CD schema was promoted to projects within RSLP, and several of those projects have used the schema successfully to describe a diverse range of collections. Use of the RSLP CD schema is also recommended by the guidelines for DNER projects.
The idea of describing aggregates of resources as a unit is not new. However, recent emphasis on the collaborative management of distributed collections and on providing integrated access to distributed collections of heterogeneous items has generated renewed interest in description at collection level. Initiatives such as the RSLP and the DNER highlight the value of a shared and consistent approach to the creation of collection-level descriptions that are both human- and machine-readable, for resource discovery and for resource management. Clearly, this is an important consideration for the NOF-digitise programme. If such collection-level descriptions are to support the discovery of resources from across the holdings of diverse institutions, a common approach to the creation of those descriptions is essential.
Notes and references
 Powell, Andy. Introduction, in Powell, Andy, ed. Collection Level Description: a review of existing practice. An eLib supporting study. UKOLN, August 1999. Available at
 Miller, Paul. Collected Wisdom: Some Cross-domain Issues of Collection Level Description, D-Lib Magazine, Vol 6 No 9. September 2000. Available at
 Baker, Thomas. A Grammar of Dublin Core, D-Lib Magazine, Vol 6 No 10. October 2000. Available at
 Methven, Patricia. An Archival Perspective, in Powell, Andy, ed. Collection Level Description: a review of existing practice. An eLib supporting study. UKOLN, August 1999. Available at
 Murray, Jim. A Library Perspective, in Powell, Andy, ed. Collection Level Description: a review of existing practice. An eLib supporting study. UKOLN, August 1999. Available at
 Dunn, Heather. Collection Level Description - the Museum Perspective, D-Lib Magazine, Vol 6 No 9. September 2000. Available at
 Powell, Andy. An Internet / Web perspective, in Powell, Andy, ed. Collection Level Description: a review of existing practice. An eLib supporting study. UKOLN, August 1999. Available at
 Hill, Linda L., Greg Janée, Ron Dolin, James Frew and Mary Larsgaard. Collection Metadata Solutions for Digital Library Applications, Journal of the American Society for Information Science, 50 (13): 1169-1181. November 1999
 Powell, Andy and Liz Lyon. The DNER Technical Architecture: scoping the information environment. HTML. Available at
 UKOLN, NOF-digitise Technical Standards and Guidelines Version 3.1, (August 2001). (See section 5.2.1 Metadata and Resource Discovery). Available at
 Powell, Andy, Michael Heaney and Lorcan Dempsey. RSLP Collection Description, D-Lib Magazine, Vol 6 No 9. September 2000. Available at
 Dublin Core Qualifiers. Available at
 Working with the Distributed National Electronic Resource (DNER) : Standards and Guidelines to build a National Resource. Available at
This paper was commissioned from Pete Johnston by UKOLN on behalf of the New Opportunities Fund in association with the People's Network and is one of a series of Information Papers that will be produced by the NOF Technical Advisory Service.
Queries about the Information Papers should be addressed to:
UKOLN is funded by Resource: The Council for Museums, Archives & Libraries, the Joint Information Systems Committee (JISC) of the Higher and Further Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.