Resource description: initial recommendations for metadata formats
Work Package 3 of Telematics for Research project DESIRE (no. 1004 (RE))
DESIRE addresses the needs of European researchers to locate and retrieve information relevant to their research activities.
Within the Indexing and Cataloguing component of DESIRE a dual approach has been taken. A robot-based web index is being developed to assist in locating information objects in an indefinitely large information space: it aims to provide exhaustive, 'vacuum-cleaner' coverage of web pages, working with a network-friendly, distributed approach developed within the Nordic Web Index project at the University of Lund <URL:http://nwi.ub2.lu.se/>. At the same time several information gateways are being developed in particular subject areas (social science, art, engineering) which are based on quality-controlled resource catalogues containing full descriptions of resources which meet specified quality criteria. The systems framework for the information gateways will be an enhancement of the software developed in the ROADS project <URL:http://www.ukoln.ac.uk/roads/> funded by the UK Higher Education Funding Councils' Electronic Libraries Programme <URL:http://www.ukoln.ac.uk/elib/>. These approaches are complementary and various linkages between them will be explored.
This working paper fulfils the DESIRE objective to provide "background information and reports to support developers of networked information systems". In particular it draws conclusions and recommendations from the associated review of metadata formats, 'Metadata: an overview of current resource description practice' <URL:http://www.ukoln.ac.uk/metadata/DESIRE/overview/>. The aim is to recommend which metadata formats would be most appropriate for use within the information gateways.
It is clear that the information gateways being created within DESIRE fall into the discovery range of the spectrum established in the metadata review. Several formats already exist in this range (e.g. SOIF, LDIF, Dublin Core, IAFA templates), although they differ in the extent of implementation. What is required is a resource description format that is:
It is recommended that a simple record structure be used based on attribute-value pairs. This allows for creation of records manually by personnel from differing backgrounds with minimal training (keeping in mind the possible involvement of subject specialists, authors, publishers) while also allowing for compatible records to be gathered by indexing robots. This format of record fits well with emerging lightweight search and directory protocols (whois++, LDAP) which are optimised for attribute-value record structures. It will also pave the way for the investigation of interoperability with Warwick Framework implementations.
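As an illustration of the attribute-value record structure recommended above, the following is a minimal sketch in Python. It assumes a plain-text layout of one `Attribute: value` pair per line, repeated attributes for multiple values, and indented continuation lines. The function names (`parse_record`, `format_record`), the attribute names and the sample record are hypothetical and are not taken from any of the formats reviewed.

```python
from typing import Dict, List


def parse_record(text: str) -> Dict[str, List[str]]:
    """Parse 'Attribute: value' lines into a mapping of attribute -> values.

    Assumed layout (an illustration, not a published specification):
    one pair per line, repeated attributes allowed, and a line that
    starts with whitespace continues the previous value.
    """
    record: Dict[str, List[str]] = {}
    last_attr = None
    for raw_line in text.splitlines():
        if not raw_line.strip():
            continue  # skip blank lines
        if raw_line[0].isspace() and last_attr is not None:
            # Continuation line: append to the most recent value.
            record[last_attr][-1] += " " + raw_line.strip()
            continue
        # Split on the first colon only, so values such as URLs survive.
        attr, sep, value = raw_line.partition(":")
        if not sep:
            raise ValueError(f"not an attribute-value line: {raw_line!r}")
        last_attr = attr.strip()
        record.setdefault(last_attr, []).append(value.strip())
    return record


def format_record(record: Dict[str, List[str]]) -> str:
    """Write a record back out, one 'Attribute: value' pair per line."""
    lines = []
    for attr, values in record.items():
        for value in values:
            lines.append(f"{attr}: {value}")
    return "\n".join(lines)


# Hypothetical record, for illustration only.
SAMPLE = """\
Title: Example resource
URL: http://example.org/
Keywords: metadata
Keywords: resource discovery
Description: A hypothetical record used
  only for illustration.
"""

record = parse_record(SAMPLE)
```

A record of this kind can be typed by hand in any text editor and parsed by a robot with a few lines of code, which is the property the recommendation relies on.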
It is recommended that the ROADS/IAFA templates should be used as the basis of the subject services record format. The experience of template usage within ROADS should be built upon to ensure effective description and retrieval. In particular:
Ideally the chosen metadata format should both accommodate the rapid changes characteristic of internet resource discovery and allow for extensibility to include subject-specific and other, as yet unknown, data. For this reason it is recommended that the Warwick Framework be investigated as an instance of a flexible, extensible metadata architecture. This should involve implementation, at least at a demonstrator level.
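The kind of extensibility meant here can be sketched as a container that aggregates separately typed packages of metadata about one resource, which is the general idea behind the Warwick Framework. The Python below is a rough illustration under that assumption only. The class names (`Package`, `Container`), the scheme labels and the attribute names are hypothetical and do not correspond to any actual Warwick Framework implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Package:
    """One self-contained set of metadata in a named scheme (hypothetical)."""
    scheme: str
    attributes: Dict[str, List[str]]


@dataclass
class Container:
    """Aggregates independent metadata packages describing one resource."""
    resource: str
    packages: List[Package] = field(default_factory=list)

    def add(self, package: Package) -> None:
        # New schemes can be attached without changing existing packages.
        self.packages.append(package)

    def get(self, scheme: str) -> Optional[Package]:
        # Return the first package in the given scheme, or None if absent.
        for package in self.packages:
            if package.scheme == scheme:
                return package
        return None

    def schemes(self) -> List[str]:
        return [package.scheme for package in self.packages]


# Illustrative use: a core description plus a subject-specific extension.
container = Container(resource="http://example.org/")
container.add(Package("core-description", {"Title": ["Example resource"]}))
container.add(Package("subject-extension", {"Classification": ["example-code"]}))
```

A client that understands only the core description can ignore the other packages, while a subject gateway can add or read its own package without altering the core record.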