ROADS Interoperability Guidelines
18 January 1999
Summary: This paper gives background on the ROADS project and its interest in interoperability issues. It looks at issues relating to metadata format mappings, the role of the Whois++ protocol within the ROADS software toolkit and interaction with the Z39.50 protocol. This paper also contains an introduction to ROADS templates together with an account of the background to the creation of the ROADS Template Registry, the ROADS Cataloguing Guidelines and some evaluation of template usage within some existing ROADS-based projects.
In a computer science context the term interoperability is used to refer to the transparent management of different applications and software. In a resource discovery context it means transparently searching, and retrieving data from, diverse systems that use different metadata formats. Providing solutions to interoperability problems will be an important factor in helping integrate the wide range of information services available in a distributed and heterogeneous network environment.
The ROADS project has a long-standing interest in interoperability issues for two main reasons.
The wider interoperability issues have been addressed in several ways by the project.
Firstly, the project has produced metadata mappings (or crosswalks) between ROADS/IAFA templates, Dublin Core, SOIF and the USMARC format. Mappings like these could be used as the basis for the production of specific conversion programs but are also a convenient way of comparing metadata formats.
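The way a mapping can serve as the basis for a conversion program can be sketched in a few lines of Python. This is a minimal illustration only: the attribute pairings in ROADS_TO_DC, the sample record and the 'Local-Note' attribute are assumptions made for the example and are not taken from the project's published mappings.

```python
# Hypothetical crosswalk table: ROADS/IAFA attribute -> Dublin Core element.
# These pairings are illustrative assumptions, not the published mapping.
ROADS_TO_DC = {
    "Title": "Title",
    "Description": "Description",
    "Keywords": "Subject",
    "URI": "Identifier",
    "Language": "Language",
}


def crosswalk(record, mapping):
    """Convert a record (dict of attribute -> value) using a mapping table.

    Returns the converted record together with the source attributes that
    had no target, so that information lost in conversion stays visible.
    """
    converted = {}
    unmapped = []
    for attribute, value in record.items():
        target = mapping.get(attribute)
        if target is None:
            unmapped.append(attribute)
        else:
            # Several source attributes may map to the same target element.
            converted.setdefault(target, []).append(value)
    return converted, unmapped


# A made-up record; 'Local-Note' stands for any locally added attribute.
roads_record = {
    "Title": "Example gateway",
    "Keywords": "history; archives",
    "Local-Note": "checked by hand",
}
dc_record, lost = crosswalk(roads_record, ROADS_TO_DC)
print(dc_record)  # {'Title': ['Example gateway'], 'Subject': ['history; archives']}
print(lost)       # ['Local-Note']
```

Reporting the unmapped attributes alongside the converted record makes the comparison role of a crosswalk explicit: it shows where one format has no equivalent in the other.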
Core metadata formats like that proposed by the Dublin Core initiative (Dublin Core initiative 1998; Weibel et al. 1998) are well placed to act as intermediaries for semantic interoperability between heterogeneous resource description models. Stu Weibel (1997, p 18) suggests that the promotion of a "commonly understood set of core descriptors will improve the prospects for cross-disciplinary search by unifying related attributes". He additionally suggests that one approach to interoperability in a heterogeneous resource description environment would be to map many description schemas into a common set (like DC) which would give users "a single semantic model for searching". A number of DC crosswalks currently exist. For example, Priscilla Caplan and Rebecca Guenther (1996) have published a mapping from Dublin Core to USMARC. Other people and organisations have produced DC mappings for a variety of other formats including TEI headers, the Nordic MARC formats and UNIMARC. A collection of these metadata mappings is maintained on the UKOLN Web site (Day 1996).
The ROADS software (from version 1) uses the Whois++ protocol to query (and retrieve information from) distributed servers containing structured descriptions (ROADS templates) of Internet resources. In addition, ROADS (version 2) makes use of the centroid facility of Whois++ to facilitate query routing between servers. It may be worthwhile to describe these technologies in more detail.
The Whois++ protocol was originally developed for directory services and to operate as a simple (template-based), distributed and extensible information lookup service (Deutsch et al. 1995). Its extensible architecture, however, meant that its developers expected it to find applications in a number of other information service areas. Whois++ also provides a general architecture that is designed for the indexing of distributed databases and then applies that architecture to link multiple Whois++ servers together into a distributed, searchable wide-area directory service (Weider, Fullton and Spero 1996). Unlike other directory protocols (X.500 or LDAP), Whois++ does not require a hierarchical representation of the data space; instead, servers 'refer' clients to other servers in a Whois++ 'mesh' (Faltstrom, Schoultz and Weider 1996). Queries are routed through this mesh based on 'forward knowledge' held by one server about another. In Whois++, this forward knowledge is maintained using the Common Indexing Protocol (CIP).
CIP is a protocol used between servers in a network to facilitate query routing - the "act of redirecting and replicating queries through a distributed database system towards the servers holding the actual results via reference to indexing information" (Allen and Mealling 1997). The Common Indexing Protocol is based upon the concept of index summaries or centroids. A centroid can be considered as a summary of the structured information in a given server - for Whois++ it could be a simple inverted index of the information contained within a database's templates.
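As a rough illustration of the centroid idea, the following Python sketch summarises a set of templates as the unique terms found under each attribute. It is a simplification under stated assumptions: the function name build_centroid, the tokenisation rule and the sample templates are all invented for the example, and a real Whois++ centroid carries more detail (for instance, which template types a server uses).

```python
import re
from collections import defaultdict


def build_centroid(templates):
    """Summarise templates (dicts of attribute -> value) as
    attribute -> set of unique lower-cased terms."""
    centroid = defaultdict(set)
    for template in templates:
        for attribute, value in template.items():
            # Assumed tokenisation: runs of letters and digits.
            for term in re.findall(r"[a-z0-9]+", value.lower()):
                centroid[attribute].add(term)
    return dict(centroid)


# Made-up templates, reduced to two attributes for brevity.
templates = [
    {"Title": "Medieval history archive", "Keywords": "history, archives"},
    {"Title": "History of art", "Keywords": "art"},
]
centroid = build_centroid(templates)
print(sorted(centroid["Title"]))     # ['archive', 'art', 'history', 'medieval', 'of']
print(sorted(centroid["Keywords"]))  # ['archives', 'art', 'history']
```

The summary records only that a term occurs somewhere under an attribute, not which template holds it, which is what keeps a centroid much smaller than the database it describes.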
In a ROADS cross-searching context, an 'index server' will periodically visit ROADS subject services and generate an index summary (or centroid). The centroid for each service (or server) will contain all relevant index terms in that database so that an initial search of the index server will determine which of the subject services will have information that matches a given query. If desired, the query can automatically be passed on to all of the subject services whose centroids indicate the existence of relevant index terms and the relevant templates returned for display to the end user. Demonstrations of ROADS cross-searching are currently available on the Web (ROADS project 1998), as are descriptions of the technologies that underlie it (Kirriemuir et al. 1998).
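The routing step described above might be sketched as follows. The service names, centroids and stand-in databases are hypothetical, and search_service is a placeholder for a real Whois++ query to a subject service; the sketch shows only the selection logic, not the protocol exchange.

```python
def route_query(term, attribute, centroids):
    """Return the services whose centroid lists the term under the attribute."""
    term = term.lower()
    return sorted(
        service
        for service, centroid in centroids.items()
        if term in centroid.get(attribute, set())
    )


def cross_search(term, attribute, centroids, search_service):
    """Forward the query only to the candidate services and merge the results."""
    results = []
    for service in route_query(term, attribute, centroids):
        results.extend(search_service(service, term, attribute))
    return results


# Hypothetical centroids held by the index server, one per subject service.
centroids = {
    "service-a": {"Title": {"medieval", "history", "archive"}},
    "service-b": {"Title": {"clinical", "medicine", "links"}},
    "service-c": {"Title": {"history", "of", "art"}},
}

# Stand-in databases; a real deployment would query each server instead.
databases = {
    "service-a": [{"Title": "Medieval history archive"}],
    "service-b": [{"Title": "Clinical medicine links"}],
    "service-c": [{"Title": "History of art"}],
}


def search_service(service, term, attribute):
    """Placeholder for a search against one subject service."""
    return [
        template
        for template in databases[service]
        if term.lower() in template.get(attribute, "").lower()
    ]


print(route_query("History", "Title", centroids))  # ['service-a', 'service-c']
print(cross_search("history", "Title", centroids, search_service))
```

In this sketch service-b is never contacted, which is the point of holding forward knowledge at the index server: the query goes only to services that can match it.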
However, in some situations it is desirable to make ROADS databases available to end-user client and intermediate systems that use the Z39.50 search and retrieve protocol. Z39.50 is a protocol that is often used to provide access to catalogues within the library, museum and archive communities.
A ROADS 'how-to' guide on ROADS and Z39.50 is currently being written and this section will be updated when this becomes available. For basic technical details, see the ROADS guide to the 'ROADS Z39.50 plugin' (Brickley 1998).
ROADS-based services use a metadata format known as ROADS templates. They are based on the Internet Anonymous FTP Archive (IAFA) templates that were published in an Internet-Draft in 1994 (Deutsch et al. 1994). For this reason they are sometimes referred to as ROADS/IAFA templates.
The templates themselves are text (ASCII) based and take the form of simple attribute-value pairs separated by a colon and a space. ROADS templates are defined for 15 different resource-types. These are known as Template-Types. Some of these Template-Types (e.g. DOCUMENT, MAILARCHIVE and SERVICE) originate in the original IAFA template specification. Others have been developed specifically for ROADS-based services (e.g. PROJECT). At least one of the others (TRAINMAT) was independently developed and has been published as RFC 2007 (Foster et al. 1996).
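The attribute-value layout can be illustrated with a small Python parser. This is a sketch under assumptions rather than a description of the ROADS software: the sample record is invented, and the treatment of an indented line as a continuation of the previous value is an assumption made for the example.

```python
def parse_template(text):
    """Parse one template into a list of (attribute, value) pairs.

    A list (rather than a dict) keeps the original order and allows an
    attribute to appear more than once.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and pairs:
            # Assumption: an indented line continues the previous value.
            attribute, value = pairs[-1]
            pairs[-1] = (attribute, value + " " + line.strip())
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        attribute, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not an attribute-value pair: {line!r}")
        pairs.append((attribute.strip(), value.strip()))
    return pairs


# A made-up record; the attribute names are illustrative.
SAMPLE = """\
Template-Type: DOCUMENT
Handle: example-0001
Title: An example resource
URI: http://www.example.org/
Description: A made-up record used
 only to exercise the parser.
"""

for attribute, value in parse_template(SAMPLE):
    print(f"{attribute} = {value}")
```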
Template-Types exist for the following resource types (December 1998):
More information on the use of templates by ROADS-based services can be found in a paper by Rachel Heery (1996b).
Each Template-Type has a number of set attributes. Some of these are specific to one Template-Type, others are not. ROADS templates use what the IAFA specification calls 'clusters' to group together information on names, addresses and other contact details. The clusters currently in use describe either a USER (an individual) or an ORGANIZATION. ROADS-based services can also add new attributes and create new Template-Types. Sample templates can be found in Appendix I.
Experience with ROADS-based gateways demonstrated a need for a metadata registry. The creation of new Template-Types and the adaptation and extension of existing Template-Types by subject services meant that there was no central location where the latest forms of these could be recorded. The IAFA template specification had been, to all intents and purposes, superseded. The answer was to create a metadata registry for ROADS templates (ROADS project 1997).
The ROADS Template Registry takes the form of a list of Template-Types, including all metadata attributes that have been 'approved' for each. The aim of the registry is to preserve flexibility - to allow the creation of new Template-Types and attributes where necessary - but also to prevent the unnecessary proliferation of Template-Types and attributes and to maintain some level of consistency.
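One way to picture the registry's role is as a lookup table against which a template can be checked. The Python sketch below is hypothetical: the REGISTRY contents, the function name and the sample templates are assumptions for illustration and do not reproduce the actual registry entries.

```python
# Hypothetical registry: Template-Type -> set of approved attribute names.
# The entries are illustrative and not copied from the ROADS Template Registry.
REGISTRY = {
    "DOCUMENT": {"Template-Type", "Handle", "Title", "URI", "Description", "Keywords"},
    "SERVICE": {"Template-Type", "Handle", "Title", "URI", "Description"},
}


def check_against_registry(template, registry):
    """Return a list of problems; an empty list means the template conforms."""
    template_type = template.get("Template-Type")
    if template_type not in registry:
        return [f"unregistered Template-Type: {template_type}"]
    approved = registry[template_type]
    return [
        f"unapproved attribute: {name}"
        for name in sorted(template)
        if name not in approved
    ]


# Made-up templates: one with a local attribute, one with a new Template-Type.
local_extension = {"Template-Type": "DOCUMENT", "Title": "Example", "Local-Note": "x"}
new_type = {"Template-Type": "RECORDING", "Title": "Example"}

print(check_against_registry(local_extension, REGISTRY))  # ['unapproved attribute: Local-Note']
print(check_against_registry(new_type, REGISTRY))         # ['unregistered Template-Type: RECORDING']
```

A check of this kind reports local additions without forbidding them, which matches the registry's stated aim of preserving flexibility while keeping some level of consistency.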
Consistency is extremely important in the context of cross-searching and interoperability. It would be possible for a ROADS user to consider creating a new Template-Type for (say) recorded music. It would be desirable to base this on an existing Template-Type (e.g. VIDEO) and to use - wherever possible - attributes and clusters that are common to more than one existing Template-Type.
The ROADS project cannot insist that all users of the ROADS software toolkit use the Template Registry. Some ROADS users may not need to remain interoperable with other ROADS-based services or may have other reasons for not interacting with the Registry. However, users who are interested in interoperability are advised to e-mail details of any additions and changes that they have made (or wish to make) to templates to <email@example.com>.
The ROADS project partners can also provide advice on these matters and on how to 'tweak' the ROADS software to include new Template-Types and attributes.
In practice, interoperability depends not only on consistent use of the format itself but also on the consistency of the content. For example, in the library community the MARC formats specify a framework for the description of bibliographic items while the content of MARC records will often conform to other standards, usually one of the International Standard Bibliographic Descriptions (ISBDs) or cataloguing rules based on them. In the English-speaking world these rules are typically the 2nd edition of the Anglo-American Cataloguing Rules (AACR2).
The ROADS project has, therefore, attempted to provide some rules for content called the ROADS Cataloguing Guidelines (Day 1998). These were formulated with reference to four main sources of information.
An initial draft of the guidelines was produced in January 1998, and a revised version in July 1998. These provide guidance for the completion of all attributes in the basic DOCUMENT and SERVICE templates.
The creation of ROADS templates is different in many ways from the traditional library cataloguing process (Chapman, Day and Hiom 1998). The content of many fields is either completed by the ROADS software (and needs no rules) or is free-text in nature. The main purpose of the ROADS guidelines is to help ensure consistency in the use of capitalisation and punctuation and to provide guidance on the use of specific formats for particular attributes. Several issues were identified.
Some parts of the ROADS Cataloguing Guidelines conform to current subject service practice and are, therefore, not controversial. Converting capitalisation and names to conform with library-based standards like AACR2 may not, however, be a desirable development for some ROADS-based services. As with the registry, use of the Cataloguing Guidelines is not mandatory. They exist merely as a tool that could be used by services as a foundation for the formulation of their own cataloguing rules.
In conjunction with the work that was carried out on developing the ROADS Cataloguing Guidelines, a preliminary analysis of templates created by ROADS-based services, the ROADS Evaluation of Cataloguing with Connection to Interoperability (RECCI), was carried out by the project (Chapman and Day 1998). Two hundred templates from four eLib subject services (ADAM, History, OMNI and SOSIG) were analysed with regard to their quality and their consistency (their potential for interoperability). A number of issues were raised.
The preliminary RECCI study demonstrated the value of consistency both for interoperability between services and for template quality. Consistency within one database is an important quality issue. Consistency across many databases becomes a prerequisite for interoperability. Further work is planned under the aegis of RECCI, including a new analysis of template usage by ROADS-based services.
These guidelines will hopefully provide an introduction to interoperability issues in the ROADS project and background information on ROADS templates, the template registry and the value of content standards and cataloguing rules for the subject gateway approach to information. Comments are welcome. Please send them to the author at: <firstname.lastname@example.org>.
AACR2 1998 rev., Anglo-American Cataloguing Rules, 2nd ed., 1998 rev. Prepared under the direction of the Joint Steering Committee for Revision of AACR. Ottawa: Canadian Library Association; London: Library Association Publishing; Chicago, Ill.: American Library Association. Publication information at: <URL:http://www.la-hq.org.uk/directory/publications/lap/aacr2e.html>
Allen, J. and Mealling, M., 1998, The architecture of the Common Indexing Protocol (CIP). IETF Internet-Draft. <URL:http://search.ietf.org/internet-drafts/draft-ietf-find-cip-arch-02.txt>
Alvestrand, H., 1995, Tags for the identification of languages. RFC 1766. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1766.txt>
ANSI/NISO Z39.53-1994, Information sciences - Codes for the representation of languages for information interchange. Bethesda, Md.: National Information Standards Organization (NISO).
Braden, R., ed., 1989, Requirements for Internet hosts - application and support. RFC 1123. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1123.txt>
Bradshaw, R., 1997, Cataloguing rules for the ADAM database: a procedural manual. <URL:http://www.adam.ac.uk/adam/reports/cat/>
Brickley, D., 1998, ROADS Z39.50 Plugin. <URL:http://www.ilrt.bris.ac.uk/roads/software/zplugin/>
Caplan, P.L. and Guenther, R.S., 1996, Metadata for Internet resources: the Dublin Core Metadata Element Set and its mapping to USMARC. Cataloging and Classification Quarterly, Vol. 22, nos. 3/4, pp. 43-58.
Chapman, A. and Day, M., 1998, ROADS Evaluation of Cataloguing with Connection to Interoperability (RECCI). <URL:http://www.ukoln.ac.uk/metadata/roads/recci/>
Chapman, A., Day, M. and Hiom, M., 1998, Cataloguing practice and Internet subject-based information gateways. Ariadne (Web version), No. 18, December. <URL:http://www.ariadne.ac.uk/issue18/metadata/>
Crocker, D.H., rev., 1982, Standard for the format of ARPA Internet text messages. RFC 822. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc822.txt>
Day, M., 1996, Mapping between metadata formats. <URL:http://www.ukoln.ac.uk/metadata/interoperability/>
Day, M., 1998, ROADS Cataloguing Guidelines. <URL:http://www.ukoln.ac.uk/metadata/roads/cataloguing/cataloguing-rules.html>
Dempsey, L. and Heery, R., 1998, Metadata: a current view of practice and issues. Journal of Documentation, 54 (2), pp. 145-172.
Deutsch, P., Emtage, A., Koster, M. and Stumpf, M., 1994, Publishing information on the Internet with Anonymous FTP. IETF Internet-Draft. <URL:http://info.webcrawler.com/mak/projects/iafa/iafa.txt>
Deutsch, P., Schoultz, R., Faltstrom, P. and Weider, C., 1995, Architecture of the WHOIS++ service. RFC 1835. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1835.txt>
Dublin Core initiative, 1998, Dublin Core metadata. <URL:http://purl.oclc.org/dc/>
Faltstrom, P., Schoultz, R. and Weider, C., 1996, How to interact with a Whois++ Mesh. RFC 1914. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1914.txt>
Foster, J., Isaacs, M. and Prior, M., 1996, Catalogue of network training materials. RFC 2007. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2007.txt>
Heery, R., 1996a, Review of Metadata Formats. Program, Vol. 30, No. 4, pp. 345-373. <URL:http://www.ukoln.ac.uk/metadata/review.html>
Heery, R., 1996b, ROADS Templates: how they are used. <URL:http://www.ukoln.ac.uk/metadata/templates.html>
Heery, R., Powell, A. and Day, M., 1997, Metadata. Library and Information Briefings, 75. London: South Bank University, Library Information Technology Centre.
Heery, R., Powell, A. and Day, M., 1998, Metadata: CrossROADS and interoperability. Ariadne (Web version), No. 14, March. <URL:http://www.ariadne.ac.uk/issue14/metadata/>
ISBD(ER), 1997, ISBD(ER): International Standard Bibliographic Description for Electronic Resources: revised from the ISBD(CF): International Standard Bibliographic Description for Computer Files. Recommended by the ISBD(CF) Review Group. International Federation of Library Associations and Institutions, IFLA Universal Bibliographic Control and International MARC Programme, (UBCIM Publications, New Series, Vol. 17). München: K. G. Saur.
ISO 639:1988, Code for the representation of names of languages. Geneva: International Organisation for Standardization.
ISO/DIS 639-2, Codes for the representation of names of languages - Part 2: Alpha-3 code. Geneva: International Organisation for Standardization.
Kirriemuir, J., Brickley, D., Welsh, S., Knight, J. and Hamilton, M., 1998, Cross-searching subject gateways: the query routing and forward knowledge approach. D-Lib Magazine, January. <URL:http://www.dlib.org/dlib/january98/01kirriemuir.html>
Knight, J.P. and Hamilton, M., 1995, Overview of the ROADS software. LUT CS-TR 1010. Loughborough: Loughborough University of Technology, Department of Computer Studies. <URL:http://www.roads.lut.ac.uk/Reports/arch/arch.html>
Olson, N.B., ed., 1997, Cataloging Internet resources: a manual and practical guide, 2nd ed. Dublin, Ohio: OCLC Online Computer Library Center. <URL:http://www.purl.org/oclc/cataloging-internet>
ROADS project, 1997, ROADS Template Registry. <URL:http://www.ukoln.ac.uk/metadata/roads/templates/>
ROADS project, 1998, CrossROADS. <URL:http://roads.ukoln.ac.uk/crossroads/>
Weibel, S., Kunze, J., Lagoze, C. and Wolf, M., 1998, Dublin Core metadata for resource discovery. RFC 2413. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt>
Weider, C., Fullton, J. and Spero, S., 1996, Architecture of the Whois++ Index Service. RFC 1913. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1913.txt>
Go to Appendix I: Sample ROADS templates
Go to Appendix II: Projects concerning interoperability