ROADS Interoperability Guidelines

Michael Day
UKOLN: The UK Office for Library and Information Networking,
University of Bath, Bath, BA2 7AY, United Kingdom
http://www.ukoln.ac.uk/
<m.day@ukoln.ac.uk>

18 January 1999

Summary: This paper gives background on the ROADS project and its interest in interoperability issues. It looks at issues relating to metadata format mappings, the role of the Whois++ protocol within the ROADS software toolkit and interaction with the Z39.50 protocol. This paper also contains an introduction to ROADS templates together with an account of the background to the creation of the ROADS Template Registry, the ROADS Cataloguing Guidelines and some evaluation of template usage within some existing ROADS-based projects.

1.		Introduction
2.		ROADS and interoperability
2.1		Metadata mappings
2.2		Whois++ and centroids
2.3		ROADS interaction with Z39.50
3.		ROADS templates
3.1		Background and development
3.2		The ROADS Template Registry
3.3		ROADS Cataloguing Guidelines
3.4		ROADS Evaluation of Cataloguing with Connection to Interoperability
4.		Conclusions
5.		References
Appendix I		Sample ROADS templates
Appendix II		Projects concerning interoperability

1. Introduction

In a computer science context the term interoperability is used to refer to the transparent management of different applications and software. In a resource discovery context it means the transparent searching of and retrieval of data from diverse systems and in different metadata formats. Providing solutions to interoperability problems will be an important factor in helping integrate the wide range of information services available in a distributed and heterogeneous network environment.

2. ROADS and interoperability

The ROADS project has a long-standing interest in interoperability issues for two main reasons.

The distributed subject service approach of the project means that there is a requirement for at least some level of consistency between different ROADS-based services. The ROADS software uses the Whois++ search and retrieve protocol so that it is possible to search simultaneously across more than one ROADS-based service. In this environment, a minimum level of consistency with regard to both WHICH metadata attributes are used and HOW these are used is desirable.
From its inception, the ROADS project has been aware that the software toolkit that it has been developing will not be the only proposed solution to the problems of Internet resource discovery. There is a need to ensure that ROADS-based services can interoperate with other resource discovery systems (e.g. library OPACs, hybrid-library systems, etc.) and different metadata formats.

The wider interoperability issues have been addressed in several ways by the project.

2.1 Metadata mappings

Firstly, the project has produced metadata mappings (or crosswalks) between ROADS/IAFA templates, Dublin Core, SOIF and the USMARC format. Mappings like these could be used as the basis for the production of specific conversion programs but are also a convenient way of comparing metadata formats.

Core metadata formats like that proposed by the Dublin Core initiative (Dublin Core initiative 1998; Weibel et al. 1998) are well placed to act as intermediaries for semantic interoperability between heterogeneous resource description models. Stu Weibel (1997, p 18) suggests that the promotion of a "commonly understood set of core descriptors will improve the prospects for cross-disciplinary search by unifying related attributes". He additionally suggests that one approach to interoperability in a heterogeneous resource description environment would be to map many description schemas into a common set (like DC) which would give users "a single semantic model for searching". A number of DC crosswalks currently exist. For example, Priscilla Caplan and Rebecca Guenther (1996) have published a mapping from Dublin Core to USMARC. Other people and organisations have produced DC mappings for a variety of other formats including TEI headers, the Nordic MARC formats and UNIMARC. A collection of these metadata mappings is maintained on the UKOLN Web site (Day 1996).
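
A crosswalk of the kind described in this section can be pictured as a lookup table from one format's attributes to another's. The sketch below is illustrative only: the attribute pairs in ROADS_TO_DC, the sample record and the function name are assumptions made for the example, not the mappings published by the project.

```python
# Illustrative crosswalk from ROADS template attributes to Dublin Core elements.
# The pairs below are assumptions for this sketch, not the project's published mapping.
ROADS_TO_DC = {
    "Title": "Title",
    "Description": "Description",
    "Keywords": "Subject",
    "URI": "Identifier",
    "Language": "Language",
}


def crosswalk(record, mapping):
    """Convert a record; return (converted, unmapped) so nothing is silently lost."""
    converted = {}
    unmapped = {}
    for attribute, value in record.items():
        target = mapping.get(attribute)
        if target is None:
            # No equivalent in the target format: report it instead of dropping it.
            unmapped[attribute] = value
        else:
            # Several source attributes may map to one target element.
            converted.setdefault(target, []).append(value)
    return converted, unmapped


# Invented sample record, for illustration only.
sample = {"Title": "Example gateway", "Keywords": "metadata", "Handle": "abc123"}
dc_record, leftovers = crosswalk(sample, ROADS_TO_DC)
```

Returning the unmapped attributes separately makes the loss of information visible, which is usually the first thing a comparison of two formats needs to show.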

2.2 Whois++ and centroids

The ROADS software (from version 1) uses the Whois++ protocol to query (and retrieve information from) distributed servers containing structured descriptions (ROADS templates) of Internet resources. In addition, ROADS (version 2) makes use of the centroid facility of Whois++ to facilitate query routing between servers. These technologies are described in more detail below.

The Whois++ protocol was originally developed for directory services and to operate as a simple (template-based), distributed and extensible information lookup service (Deutsch et al. 1995). Its extensible architecture, however, meant that its developers expected it to find applications in a number of other information service areas. Whois++ also provides a general architecture for indexing distributed databases and applies that architecture to link multiple Whois++ servers into a distributed, searchable wide-area directory service (Weider, Fullton and Spero 1996). Unlike other directory protocols (X.500 or LDAP), Whois++ does not require a hierarchical representation of data space; instead, servers 'refer' clients to other servers in a Whois++ 'mesh' (Faltstrom, Schoultz and Weider 1996). Queries are routed through this mesh based on 'forward knowledge' held by one server about another. In Whois++, this forward knowledge is maintained using the Common Indexing Protocol (CIP).

CIP is a protocol used between servers in a network to facilitate query routing - the "act of redirecting and replicating queries through a distributed database system towards the servers holding the actual results via reference to indexing information" (Allen and Mealling 1997). The Common Indexing Protocol is based upon the concept of index summaries or centroids. A centroid can be considered as a summary of the structured information in a given server - for Whois++ it could be a simple inverted index of the information contained within a database's templates.

In a ROADS cross-searching context, an 'index server' will periodically visit ROADS subject services and generate an index summary (or centroid). The centroid for each service (or server) will contain all relevant index terms in that database so that an initial search of the index server will determine which of the subject services will have information that matches a given query. If desired, the query can automatically be passed on to all of the subject services whose centroids indicate the existence of relevant index terms and the relevant templates returned for display to the end user. Demonstrations of ROADS cross-searching are currently available on the Web (ROADS project 1998), as are descriptions of the technologies that underlie it (Kirriemuir et al. 1998).
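
The index-server behaviour described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the CIP wire format or the ROADS implementation: a centroid is taken to be, for each attribute, the set of lower-cased words found in a server's templates, and the server names and records are invented.

```python
from collections import defaultdict


def build_centroid(templates):
    """Summarise a server's templates as attribute -> set of lower-cased words."""
    centroid = defaultdict(set)
    for template in templates:
        for attribute, value in template.items():
            centroid[attribute].update(value.lower().split())
    return dict(centroid)


def route_query(centroids, attribute, term):
    """Return the servers whose centroid lists the term under the given attribute."""
    term = term.lower()
    return sorted(
        server
        for server, centroid in centroids.items()
        if term in centroid.get(attribute, set())
    )


# Hypothetical subject services and records, for illustration only.
servers = {
    "server-a": [{"Title": "Medieval history sources"}],
    "server-b": [{"Title": "Medical imaging archive"}],
}
centroids = {name: build_centroid(records) for name, records in servers.items()}
matches = route_query(centroids, "Title", "history")
```

Only the servers returned by route_query would then be sent the full query, which is the saving that query routing is meant to provide.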

2.3 ROADS interaction with Z39.50

In some situations it is desirable to make ROADS databases available to end-user clients and intermediate systems that use the Z39.50 search and retrieve protocol. Z39.50 is often used to provide access to catalogues within the library, museum and archive communities.

A ROADS 'how-to' guide on ROADS and Z39.50 is currently being written and this section will be updated when this becomes available. For basic technical details, see the ROADS guide to the 'ROADS Z39.50 plugin' (Brickley 1998).

3. ROADS templates

3.1 Background and development

ROADS-based services use a metadata format known as ROADS templates. They are based on the Internet Anonymous FTP Archive (IAFA) templates that were published in an Internet-Draft in 1994 (Deutsch et al. 1994). For this reason they are sometimes referred to as ROADS/IAFA templates.

The templates themselves are text (ASCII) based and take the form of simple attribute-value pairs separated by a colon and a space. ROADS templates are defined for 15 different resource-types. These are known as Template-Types. Some of these Template-Types (e.g. DOCUMENT, MAILARCHIVE and SERVICE) originate in the original IAFA template specification. Others have been developed specifically for ROADS-based services (e.g. PROJECT). At least one of the others (TRAINMAT) was independently developed and has been published as RFC 2007 (Foster et al. 1996).
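
Because a template is a series of attribute-value lines, reading one needs very little code. The following is a minimal sketch that assumes one pair per line and does not handle continuation lines or clusters; the function name and the sample values are invented for illustration.

```python
def parse_template(text):
    """Parse 'Attribute: value' lines into a list of (attribute, value) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon-plus-space only, so URLs in values survive.
        attribute, separator, value = line.partition(": ")
        if not separator:
            raise ValueError(f"not an attribute-value pair: {line!r}")
        pairs.append((attribute.strip(), value.strip()))
    return pairs


# Invented sample template, for illustration only.
sample = """Template-Type: DOCUMENT
Title: An example document
URI: http://www.example.org/doc.html
"""
pairs = parse_template(sample)
```

A list of pairs is used rather than a dictionary so that repeated attributes and the original ordering are preserved.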

Template-Types exist for the following resource types (December 1998):

Template-Type		Type of resource described

DATASET		Statistical or other datasets
DOCUMENT		Documents, electronic journal articles, single Web pages, etc.
DUBLINCORE		Experimental template used for extracting DC metadata for Web pages from a ROADS database
EVENT		Events, meetings, workshops, conferences, etc.
IMAGE		Visual resources, etc.
MAILARCHIVE		Archives of electronic mail lists
PROJECT		Projects. UKOLN host some project databases that give access to information on projects
SERVICE		Information services, Web sites (collections of pages), etc.
SOFTWARE		Computer software
SOUND		Audio resources
TRAINMAT		Training materials, as defined by a joint IETF/TERENA (RARE) Network Training Materials working group (Foster et al. 1996)
USENET		Usenet groups
VIDEO		Video, moving images, etc.

More information on the use of templates by ROADS-based services can be found in a paper by Rachel Heery (1996b).

Each Template-Type has a number of set attributes. Some of these are specific to one Template-Type, others are not. ROADS templates use what the IAFA specification calls 'clusters' to group together information on names, addresses and other contact details. The clusters currently in use describe a USER (an individual) or an ORGANIZATION. ROADS-based services can also add new attributes and create new Template-Types. Sample templates can be found in Appendix I.

3.2 The ROADS Template Registry

<URL:http://www.ukoln.ac.uk/roads/templates/>

Experience with ROADS-based gateways demonstrated a need for a metadata registry. The creation of new Template-Types and the adaptation and extension of existing Template-Types by subject services meant that there was no central location where the latest forms of these could be recorded. The IAFA template specification had been, to all intents and purposes, superseded. The answer was to create a metadata registry for ROADS templates (ROADS project 1997).

The ROADS Template Registry takes the form of a list of Template-Types, including all metadata attributes that have been 'approved' for each. The aim of the registry is to preserve flexibility - to allow the creation of new Template-Types and attributes where necessary - but also to prevent the unnecessary proliferation of Template-Types and attributes and to maintain some level of consistency.
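
One way to picture how a service might use such a list is as a check on locally created templates. The sketch below is hypothetical throughout: the REGISTRY extract, its attribute names and the function are assumptions for illustration, and the real registry is a set of Web pages rather than a data structure.

```python
# Hypothetical registry extract: Template-Type -> approved attribute names.
REGISTRY = {
    "DOCUMENT": {"Handle", "Title", "Description", "Keywords", "URI"},
    "SERVICE": {"Handle", "Title", "Description", "Keywords", "URI"},
}


def unapproved_attributes(template_type, attributes, registry=REGISTRY):
    """List the attributes of a template that the registry has not approved."""
    approved = registry.get(template_type)
    if approved is None:
        raise KeyError(f"unregistered Template-Type: {template_type}")
    return sorted(set(attributes) - approved)


# "Mood" is an invented local attribute used to show a registry miss.
extras = unapproved_attributes("DOCUMENT", ["Title", "Mood"])
```

A non-empty result would not be an error in itself; it simply identifies the local additions that might be worth reporting to the registry.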

Consistency is extremely important in the context of cross-searching and interoperability. It would be possible for a ROADS user to consider creating a new Template-Type for (say) recorded music. It would be desirable to base this on an existing Template-Type (e.g. VIDEO) and to use - wherever possible - attributes and clusters that are common to more than one existing Template-Type.

The ROADS project cannot insist that all users of the ROADS software toolkit use the Template Registry. Some ROADS users may not need to remain interoperable with other ROADS-based services or may have other reasons for not interacting with the Registry. However, users who are interested in interoperability are advised to e-mail details of any additions and changes that they have made (or wish to make) to templates to <roads-liaison@bristol.ac.uk>.

The ROADS project partners can also provide advice on these matters and on how to 'tweak' the ROADS software to include new Template-Types and attributes.

3.3 ROADS Cataloguing Guidelines

<URL:http://www.ukoln.ac.uk/roads/cataloguing/>

In practice, interoperability is not just dependent upon consistency in the use of the format itself but is also dependent upon the consistency of the content. For example, in the library community the MARC formats specify a framework for the description of bibliographic items while the content of MARC records will often conform to other standards, usually one of the International Standard Bibliographic Descriptions (ISBDs) or cataloguing rules based on them. In the English-speaking world this is typically the 2nd edition of the Anglo-American Cataloguing Rules (AACR2).

The ROADS project has, therefore, attempted to provide some rules for content called the ROADS Cataloguing Guidelines (Day 1998). These were formulated with reference to four main sources of information.

Cataloguing rules developed by ROADS-based services themselves, in particular those produced by Rebecca Bradshaw (1997) for the Art, Design, Architecture and Media information gateway (ADAM) and by Debra Hiom for the Social Science Information Gateway (SOSIG).
ISBD(ER) - the ISBD for electronic resources (1997).
AACR2 - in particular Chapter 9.
OCLC's manual and guide to Cataloging Internet Resources (Olson 1997).

An initial draft of the guidelines was produced in January 1998, and a revised version in July 1998. These provide guidance for the completion of all attributes in the basic DOCUMENT and SERVICE templates.

The creation of ROADS templates is different in many ways from the traditional library cataloguing process (Chapman, Day and Hiom 1998). The content of many fields is either completed by the ROADS software (and needs no rules) or is free-text in nature. The main purpose of the ROADS guidelines is to help ensure consistency in the use of capitalisation and punctuation and to provide guidance on the use of specific formats for particular attributes. Several issues were identified.

Chief sources of information: sources of bibliographic information should include: the title display from screen or printout, a Web page, "readme" file, header information, etc. (based on Olson 1997, pp. 7-8).
Capitalisation: the ROADS Cataloguing Guidelines recommended the adoption of AACR2-type capitalisation for the "Title" fields. The title should be transcribed preserving the original wording and spelling and only proper nouns should be capitalised. This recommendation conflicts with the current practice of several ROADS-based services but, if implemented, would aid interoperability with library catalogues.
Date formats: there is a need for the use of a standard format for dates. Currently, it is recommended that services use the format outlined in RFC 822 (Crocker 1982) - as modified by RFC 1123 (Braden 1989) - or the scheme specified in ISO 8601:1988.
Language codes: there is a need for the use of a standard format for language codes. Some services currently use the ISO 639:1988 two-letter code. Alternatives could include the three-character scheme ANSI/NISO Z39.53-1994 as used in USMARC (which is broadly similar to the draft ISO/DIS 639-2) or RFC 1766 (Alvestrand 1995).
Formats for personal and corporate names: the guidelines recommended the adoption of library practice as embodied in AACR2 and suggested indirect order for personal names. This means that names are entered surname first, followed by a comma and the remainder of the name that usually precedes the entry element. Adoption of AACR2-style name headings might aid ROADS-based services' interoperability with library catalogues and would also mean that AACR2's rules for complex names could be adopted without devising a new scheme.
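
Two of the recommendations above, indirect name order and a standard date format, can be illustrated with a short sketch. It is deliberately naive and is not part of the ROADS software: the function names are invented, the name handling simply treats the last word as the surname (so compound surnames, which AACR2's rules for complex names address, are not covered), and the date helper shows only the ISO 8601 calendar form.

```python
from datetime import date


def to_indirect_order(name):
    """Turn 'Forename(s) Surname' into 'Surname, Forename(s)' for simple names."""
    parts = name.split()
    if len(parts) < 2 or "," in name:
        return name  # single-word or already inverted names are left alone
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def iso_date(year, month, day):
    """Format a calendar date in the ISO 8601 extended form YYYY-MM-DD."""
    return date(year, month, day).isoformat()


inverted = to_indirect_order("David M. Wilson")
stamp = iso_date(1999, 1, 18)
```

Leaving already inverted names untouched means the function can be run repeatedly over a database without damaging records that have been corrected by hand.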

Some parts of the ROADS Cataloguing Guidelines conform to current subject service practice and are, therefore, not controversial. Converting capitalisation and names to conform with library-based standards like AACR2, however, may not be a desirable development for some ROADS-based services. As with the registry, use of the Cataloguing Guidelines is not mandatory. They exist merely as a tool that services could use as a foundation for formulating their own cataloguing rules.

3.4 ROADS Evaluation of Cataloguing with Connection to Interoperability (RECCI)

<URL:http://www.ukoln.ac.uk/roads/recci/>

In conjunction with the work that was carried out on developing the ROADS Cataloguing Guidelines, a preliminary analysis of templates created by ROADS-based services was carried out by the project (Chapman and Day 1998). Two hundred templates from four eLib subject services (ADAM, History, OMNI and SOSIG) were analysed with regard to their quality and their consistency (their potential for interoperability). A number of issues were raised.

Typographic errors: only a relatively small percentage (5%) of the sample templates showed typographic errors, most of these being spelling mistakes.
Language codes: there were inconsistencies with regard to codes used to identify languages. Some services used natural language terms (e.g. "English") while others used ISO 639 two character codes (e.g. "en"). One service used both!
Name formats: most names within the sample were in direct order (e.g. "David M. Wilson"). This may cause problems for interoperating with library catalogues, which usually use some form of indirect order (e.g. "Wilson, David M.").
Classification schemes: potential problems included the use of classification codes without any specification of the scheme being used. There were also differences of granularity between and within the classification schemes used.
Value separators: where more than one value had been included in fields like "Keyword" and "Subject-Descriptor", services had variously used commas, semi-colons or blank spaces.
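
Inconsistencies of the kind listed above can often be reduced mechanically before cross-searching. The sketch below is an illustration under stated assumptions rather than a RECCI tool: the LANGUAGE_CODES table is a tiny invented extract (a real service would need the full ISO 639 list), and values are split only on commas and semi-colons because blank spaces cannot safely be distinguished from the spaces inside multi-word terms.

```python
import re

# Small illustrative lookup from language names to ISO 639 two-letter codes.
LANGUAGE_CODES = {"english": "en", "french": "fr", "german": "de"}


def normalise_language(value):
    """Return a two-letter code for a name or code, or None if unrecognised."""
    cleaned = value.strip().lower()
    if cleaned in LANGUAGE_CODES.values():
        return cleaned  # already a known code
    return LANGUAGE_CODES.get(cleaned)


def split_values(value):
    """Split a multi-valued field on commas or semi-colons, dropping empty parts."""
    return [part.strip() for part in re.split(r"[;,]", value) if part.strip()]


code = normalise_language("English")
keywords = split_values("history; archives, maps")
```

Returning None for an unrecognised language leaves the decision about what to do with it (reject, flag for review, keep as free text) to the calling service.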

The preliminary RECCI study demonstrated the importance of consistency both for template quality and for interoperability between services. Consistency within one database is an important quality issue. Consistency across many databases becomes a prerequisite for interoperability. Further work is planned under the aegis of RECCI, including a new analysis of template usage by ROADS-based services.

4. Conclusions

These guidelines will hopefully provide an introduction to interoperability issues in the ROADS project and background information on ROADS templates, the template registry and the value of content standards and cataloguing rules for the subject gateway approach to information. Comments are welcome. Please send them to the author at: <m.day@ukoln.ac.uk>.

5. References

AACR2 1998 rev., Anglo-American Cataloguing Rules, 2nd ed., 1998 rev. Prepared under the direction of the Joint Steering Committee for Revision of AACR. Ottawa: Canadian Library Association; London: Library Association Publishing; Chicago, Ill.: American Library Association. Publication information at: <URL:http://www.la-hq.org.uk/directory/publications/lap/aacr2e.html>

Allen, J. and Mealling, M., 1998, The architecture of the Common Indexing Protocol (CIP). IETF Internet-Draft. <URL:http://search.ietf.org/internet-drafts/draft-ietf-find-cip-arch-02.txt>

Alvestrand, H., 1995, Tags for the identification of languages. RFC 1766. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1766.txt>

ANSI/NISO Z39.53-1994, Information sciences - Codes for the representation of languages for information interchange. Bethesda, Md.: National Information Standards Organization (NISO).
A list of codes can be found at: <URL:http://www.oasis-open.org/cover/nisoLang3-1994.html> <URL:http://www.swbv.uni-konstanz.de/wwwroot/metadata/kv_dc014.html>

Braden, R., ed., 1989, Requirements for Internet hosts - application and support. RFC 1123. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1123.txt>

Bradshaw, R., 1997, Cataloguing rules for the ADAM database: a procedural manual. <URL:http://www.adam.ac.uk/adam/reports/cat/>

Brickley, D., 1998, ROADS Z39.50 Plugin. <URL:http://www.ilrt.bris.ac.uk/roads/software/zplugin/>

Caplan, P.L. and Guenther, R.S., 1996, Metadata for Internet resources: the Dublin Core Metadata Element Set and its mapping to USMARC. Cataloging and Classification Quarterly, Vol. 22, nos. 3/4, pp. 43-58.

Chapman, A. and Day, M., 1998, ROADS Evaluation of Cataloguing with Connection to Interoperability (RECCI). <URL:http://www.ukoln.ac.uk/metadata/roads/recci/>

Chapman, A., Day, M. and Hiom, D., 1998, Cataloguing practice and Internet subject-based information gateways. Ariadne (Web version), No. 18, December. <URL:http://www.ariadne.ac.uk/issue18/metadata/>

Crocker, D.H., rev., 1982, Standard for the format of ARPA Internet text messages. RFC 822. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc822.txt>

Day, M., 1996, Mapping between metadata formats. <URL:http://www.ukoln.ac.uk/metadata/interoperability/>

Day, M., 1998, ROADS Cataloguing Guidelines. <URL:http://www.ukoln.ac.uk/metadata/roads/cataloguing/cataloguing-rules.html>

Dempsey, L. and Heery, R., 1998, Metadata: a current view of practice and issues. Journal of Documentation, 54 (2), pp. 145-172.

Deutsch, P., Emtage, A., Koster, M. and Stumpf, M., 1994, Publishing information on the Internet with Anonymous FTP. IETF Internet-Draft. <URL:http://info.webcrawler.com/mak/projects/iafa/iafa.txt>

Deutsch, P., Schoultz, R., Faltstrom, P. and Weider, C., 1995, Architecture of the WHOIS++ service. RFC 1835. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1835.txt>

Dublin Core initiative, 1998, Dublin Core metadata: <URL:http://purl.oclc.org/dc/>

Faltstrom, P., Schoultz, R. and Weider, C., 1996, How to interact with a Whois++ Mesh. RFC 1914. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1914.txt>

Foster, J., Isaacs, M. and Prior, M., 1996, Catalogue of network training materials. RFC 2007. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2007.txt>

Heery, R., 1996a, Review of Metadata Formats. Program, Vol. 30, No. 4, pp. 345-373. <URL:http://www.ukoln.ac.uk/metadata/review.html>

Heery, R., 1996b, ROADS Templates: how they are used. <URL:http://www.ukoln.ac.uk/metadata/templates.html>

Heery, R., Powell, A. and Day, M., 1997, Metadata. Library and Information Briefings, 75. London: South Bank University, Library Information Technology Centre.

Heery, R., Powell, A. and Day, M., 1998, Metadata: CrossROADS and interoperability. Ariadne (Web version), No. 14, March. <URL:http://www.ariadne.ac.uk/issue14/metadata/>

ISBD(ER), 1997, ISBD(ER): International Standard Bibliographic Description for Electronic Resources: revised from the ISBD(CF): International Standard Bibliographic Description for Computer Files. Recommended by the ISBD(CF) Review Group. International Federation of Library Associations and Institutions, IFLA Universal Bibliographic Control and International MARC Programme, (UBCIM Publications, New Series, Vol. 17). München: K. G. Saur.

ISO 639:1988, Code for the representation of names of languages. Geneva: International Organisation for Standardization.

ISO/DIS 639-2, Codes for the representation of names of languages - Part 2: Alpha-3 code. Geneva: International Organisation for Standardization.

Kirriemuir, J., Brickley, D., Welsh, S., Knight, J. and Hamilton, M., 1998, Cross-searching subject gateways: the query routing and forward knowledge approach. D-Lib Magazine, January. <URL:http://www.dlib.org/dlib/january98/01kirriemuir.html>

Knight, J.P. and Hamilton, M., 1995, Overview of the ROADS software. LUT CS-TR 1010. Loughborough: Loughborough University of Technology, Department of Computer Studies. <URL:http://www.roads.lut.ac.uk/Reports/arch/arch.html>

Olson, N.B., ed., 1997, Cataloging Internet resources: a manual and practical guide, 2nd ed. Dublin, Ohio: OCLC Online Computer Library Center. <URL:http://www.purl.org/oclc/cataloging-internet>

ROADS project, 1997, ROADS Template Registry. <URL:http://www.ukoln.ac.uk/metadata/roads/templates/>

ROADS project, 1998, CrossROADS. <URL:http://roads.ukoln.ac.uk/crossroads/>

Weibel, S., Kunze, J., Lagoze, C. and Wolf, M., 1998, Dublin Core metadata for resource discovery. RFC 2413. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt>

Weider, C., Fullton, J. and Spero, S., 1996, Architecture of the Whois++ Index Service. RFC 1913. <URL:http://src.doc.ic.ac.uk/computing/internet/rfc/rfc1913.txt>

Maintained by Michael Day of the UKOLN Metadata Group.
Last updated: 18-Jan-1999