RECCI - Preliminary Report


RECCI: ROADS Evaluation of Cataloguing with Connection to Interoperability

Preliminary Report

Ann Chapman and Michael Day
UKOLN: The UK Office for Library and Information Networking,
University of Bath, Bath, BA2 7AY, United Kingdom
http://www.ukoln.ac.uk/
<a.d.chapman@ukoln.ac.uk> <m.day@ukoln.ac.uk>

17 August 1998

1. Introduction

Version 2 of the ROADS software implements tools that enable cross-searching across different subject services. Effective cross-searching, however, depends upon semantically consistent use of metadata, in this case ROADS templates. This is why a ROADS Template Registry exists [1] and why generic ROADS Cataloguing Guidelines [2] are being developed. It would also be useful to identify areas where cataloguing policies might be co-ordinated to optimise cross-searching. For this reason, as part of ROADS Strand 2 (Interoperability), we are looking at the following issues:

Current cataloguing policies and practice
Editorial policies
Impacts on cross-searching and interoperability

This report is a small initial review of selected eLib ROADS-based services (ADAM, History, OMNI and SOSIG). If this exercise is felt to be useful, a larger study of other ROADS-based services could be produced. A statistical analysis of ROADS template use will be added to this study at a later stage.

2. The initial review

ROADS-based services were asked to provide UKOLN with a number of templates, which would be assessed with regard to:

Cross-searching ability:
  Highlighting inconsistencies between the practices of different services
  Looking at formats used for dates, languages and names
  Looking at subject terms and their use
  Checking conformance with subject service cataloguing guidelines
  Checking conformance with generic ROADS cataloguing guidelines
Quality:
  Internal consistency
  Typographic errors
  Other errors

This preliminary report deals with some of these issues.

Future RECCI work will involve services providing information about their own editorial policies:

What internal processes are currently carried out before a template is added to the database?
What is the role of validation processes and regular error checking?

This will help to identify good practice.

Other information, including an up-to-date statistical analysis of ROADS services' use of templates, will also be gathered, and the results will be published in a further RECCI report.

3. Methodology

For this preliminary report, 50 randomly selected templates were requested from each of four eLib ANR subject services: ADAM, History, OMNI and SOSIG. These were printed out and examined in detail with regard to quality (primarily spelling mistakes) and interoperability. Whole templates were examined, but the main data elements studied were:

Keywords
Subject Descriptors
Language
Names

These appeared to be the elements most important to interoperability.

4. Preliminary findings: Quality

4.1 Typographic errors

A small number of templates (5% across all services) showed typographic errors. Most of these were spelling mistakes. The "Description" and "Keyword" elements accounted for most of these.

In one case, the "Description" element contained angle brackets ("<" and ">"); any data enclosed by these will not be displayed in the HTML returned by a ROADS search.
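A service could catch this before a template is added to the database, either by flagging the value for the cataloguer or by escaping the brackets wherever the value is written into HTML. The Python sketch below does both using the standard `html` module; the function names are hypothetical and are not part of the ROADS software.

```python
import html


def description_warnings(text: str) -> list:
    """Return warnings for a Description value (hypothetical pre-load check)."""
    warnings = []
    if "<" in text or ">" in text:
        warnings.append("contains '<' or '>': text may be hidden when shown as HTML")
    return warnings


def description_as_html(text: str) -> str:
    """Escape &, < and > so the value is displayed literally in an HTML page."""
    return html.escape(text, quote=False)


desc = "A guide to <metadata> formats"
print(description_warnings(desc))
print(description_as_html(desc))  # A guide to &lt;metadata&gt; formats
```

Flagging keeps the cataloguer in control of the wording; escaping at display time protects against values that slip through.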

In two cases, the "URI" element was potentially problematic:

No URI
The "URI" element contained an e-mail address (without the mailto: scheme) rather than a URL.
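Both problems can be detected mechanically. The following is a minimal Python sketch using the standard `urllib.parse` module. The e-mail pattern is a rough assumption rather than a full address grammar, and the function name is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Rough test for a bare e-mail address; an assumption, not a full address grammar.
BARE_EMAIL = re.compile(r"^[^@\s:/]+@[^@\s:/]+\.[^@\s:/]+$")


def uri_problem(value: Optional[str]) -> Optional[str]:
    """Describe what is wrong with a URI element value, or return None if it looks usable."""
    if value is None or not value.strip():
        return "no URI"
    value = value.strip()
    if BARE_EMAIL.match(value):
        # An e-mail address is only a usable URI when given with the mailto: scheme.
        return "bare e-mail address; expected mailto:" + value
    if not urlparse(value).scheme:
        return "no scheme (such as http: or mailto:)"
    return None


for v in ["", "r.m.j.smith-jones@bath.ac.uk",
          "mailto:r.m.j.smith-jones@bath.ac.uk", "http://www.ukoln.ac.uk/"]:
    print(repr(v), "->", uri_problem(v))
```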

4.2 Addresses and telephone numbers

These elements are not important for cross-searching, but they did raise issues of consistency and therefore of quality. The ROADS Cataloguing Guidelines (3.1.11) suggest that addresses should be entered as free text, in the form used in the original resource where possible, with each line separated by a comma. For example:

Author-Handle-v1: 000034675
Author-Name-v1: Richard Smith-Jones
Author-Work-Phone-v1: +44 (0)1234 567 890
Author-Work-Fax-v1: +44 (0)1234 567 098
Author-Work-Postal-v1: University of Bath, Bath BA2 7AY, UK.
Author-Job-Title-v1: Subject Librarian
Author-Department-v1: University Library
Author-Email-v1: r.m.j.smith-jones@bath.ac.uk
Author-Home-Phone-v1: +44 (0)1234 765 890
Author-Home-Postal-v1: 11 City Road, Bath BA1 4QH, UK.
Author-Home-Fax-v1: +44 (0)1234 765 098
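Because templates are plain attribute-value text, checks of the kind made in this study are easy to automate once a template is split into elements. A minimal Python parser follows. It assumes one attribute per line, with the first colon ending the attribute name. Real ROADS templates may allow continuation lines, which this sketch does not handle.

```python
from typing import Dict, List


def parse_template(text: str) -> Dict[str, List[str]]:
    """Parse 'Attribute: value' lines into {attribute: [values]}.

    Simplifying assumptions: one attribute per line, the first colon ends the
    attribute name, and continuation lines are not supported.
    """
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not an attribute line: {line!r}")
        fields.setdefault(name.strip(), []).append(value.strip())
    return fields


sample = """\
Author-Name-v1: Richard Smith-Jones
Author-Work-Postal-v1: University of Bath, Bath BA2 7AY, UK.
Author-Email-v1: r.m.j.smith-jones@bath.ac.uk
"""
record = parse_template(sample)
print(record["Author-Work-Postal-v1"])  # ['University of Bath, Bath BA2 7AY, UK.']
```

Splitting only on the first colon keeps values such as `http://...` intact.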

Where addresses were entered in the sample, 8 templates omitted line separators altogether. In 19 cases the country was omitted from the address, although in 9 of these it was given in a following "Country" element. A few addresses appeared to be incomplete: they gave the street number and road name, but no town or city. This is probably due to reliance on cutting-and-pasting.
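The first two problems could be flagged by a simple heuristic. The Python sketch below does this; it is not a ROADS feature. The short country list is purely illustrative, and the function name is hypothetical. Incomplete addresses (for example, a missing town) cannot be detected reliably by a rule this simple.

```python
from typing import List, Optional

# Illustrative only: a real check would need a proper list of country names and codes.
KNOWN_COUNTRIES = {"uk", "united kingdom", "usa", "united states"}


def address_warnings(address: str, country_element: Optional[str] = None) -> List[str]:
    """Flag missing comma separators and a missing country (hypothetical helper)."""
    warnings = []
    lines = [part.strip() for part in address.split(",") if part.strip()]
    if len(lines) < 2:
        warnings.append("no comma line separators")
    # Treat the last comma-separated part as the place where a country should appear.
    last = lines[-1].rstrip(".").strip().lower() if lines else ""
    if last not in KNOWN_COUNTRIES:
        if country_element:
            warnings.append("country only in separate Country element")
        else:
            warnings.append("country missing")
    return warnings


print(address_warnings("University of Bath, Bath BA2 7AY, UK."))  # []
print(address_warnings("University of Bath Bath BA2 7AY"))
```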

5. Preliminary findings: Interoperability

5.1 The "Keyword" element

Three services separated keywords with commas and one with semi-colons. The ROADS Cataloguing Guidelines (2.26) suggest that either separator is acceptable.
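A cross-searching tool that indexes individual keywords has to split the element on whichever separator a service uses. The following is a Python sketch. The `separator` parameter, standing for a per-service setting, is an assumption. Knowing the separator matters when a service also uses inverted phrases. For example, splitting the made-up value "Libraries, academic; metadata" on both characters would break the inverted phrase in two.

```python
import re
from typing import List, Optional


def split_keywords(value: str, separator: Optional[str] = None) -> List[str]:
    """Split a Keyword value into individual keywords.

    If the service's separator is known (',' or ';'), split on that alone;
    otherwise treat either character as a separator.
    """
    parts = value.split(separator) if separator else re.split(r"[,;]", value)
    return [p.strip() for p in parts if p.strip()]


print(split_keywords("metadata, cataloguing, interoperability"))
print(split_keywords("Libraries, academic; metadata", separator=";"))
# ['Libraries, academic', 'metadata']
```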

Inverted keyword phrases and keyword subdivisions occurred in the sample from only one service, and here internal consistency was variable. Of the 9 cases in the sample, the elements were separated by a comma in 4 cases and by a semi-colon in 4, while one case used both.

Capitalisation was consistent in two services, in that only proper nouns had initial capitals. The two remaining services were inconsistent. In one, all nouns were capitalised in 19 cases, the first keyword of the string was capitalised in 11 cases, and capitalised and non-capitalised nouns occurred in the same string in 3 cases. The other usually gave initial capitals only to proper nouns (an internal consistency of 88%), but in 4 cases all keywords were capitalised, and in 2 cases the first keyword in the string was capitalised. Capitalisation is unlikely to have a major effect on cross-searching or interoperability. Internal consistency is, however, an important quality issue.
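If searches match case-insensitively (an assumption here), capitalisation does not change what is retrieved. It can still be tallied as a quality measure. The Python sketch below classifies the capitalisation pattern of a list of keywords that have already been split from the element. It cannot tell proper nouns from common nouns, so a "mixed" result may still reflect correct practice. The function name is hypothetical.

```python
from typing import List


def capitalisation_pattern(keywords: List[str]) -> str:
    """Classify initial-letter capitalisation across a list of keywords.

    Returns 'all', 'first-only', 'none' or 'mixed'. Proper nouns are not
    recognised, so 'mixed' may still be correct cataloguing practice.
    """
    caps = [k[0].isupper() for k in keywords if k]
    if caps and all(caps):
        return "all"
    if not any(caps):
        return "none"
    if caps[0] and not any(caps[1:]):
        return "first-only"
    return "mixed"


print(capitalisation_pattern(["Metadata", "cataloguing"]))  # first-only
```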

5.2 Additional template elements found (not in Registry)

One service had created two new elements, "Classmark" and "Classification-Scheme", and used them instead of "Subject-Descriptor" and "Subject-Descriptor-Scheme". These additions were not recorded in the ROADS Template Registry.

The sample from the same service used several other additional elements: "Historical-Period" (used in 1 case), "Geographical Area" (used in 15 cases) and "Resource Location" (used in 48 cases). These do not appear to conflict with other template elements, so they could be added to the Template Registry.
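Unregistered elements could be detected by comparing a template's element names with the registry. The following is a Python sketch. The registry set below is an illustrative excerpt built only from element names mentioned in this report; it is not the actual content of the ROADS Template Registry. Stripping a "-v1"-style suffix follows the variant naming seen in the example template.

```python
import re
from typing import Iterable, List

# Illustrative excerpt only, built from element names mentioned in this report;
# it is not the contents of the ROADS Template Registry.
REGISTRY_EXCERPT = {
    "URI", "Description", "Keyword", "Subject-Descriptor",
    "Subject-Descriptor-Scheme", "Language", "Author-Name", "Admin-Name",
}


def base_name(element: str) -> str:
    """Drop a variant suffix such as '-v1' (as in 'Author-Name-v1')."""
    return re.sub(r"-v\d+$", "", element)


def unregistered_elements(element_names: Iterable[str], registry=REGISTRY_EXCERPT) -> List[str]:
    """Return the sorted base names of elements that are not in the registry."""
    return sorted({base_name(n) for n in element_names} - set(registry))


print(unregistered_elements(["Description-v1", "Classmark-v1", "Classification-Scheme-v1"]))
# ['Classification-Scheme', 'Classmark']
```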

5.3 Subject descriptors and classification schemes

The services used several different classification schemes: DDC, UDC, NLM, IHR and BIZ-DEWEY, an adapted DDC developed for Biz/ed.

One service used DDC and placed the full name "Dewey Decimal Classification" (with no edition given) in the 'scheme' element. Internal consistency was only 46%, as 27 templates did not state which classification scheme was used. In 4 cases the descriptors included three-letter 'shelfmarks', which are not part of the DDC notation itself, while in 2 cases no descriptor was entered. This service used fairly detailed class numbers (the longest found had 12 digits).
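The internal-consistency figures in this report are simple proportions of each service's 50-template sample. Here, 50 - 27 = 23 templates named the scheme, and 23 / 50 = 46%. The following Python helper recomputes such figures from parsed templates. The dictionary layout and the "scheme" field name in the example are assumptions.

```python
from typing import Callable, Dict, List


def consistency(templates: List[Dict[str, str]], ok: Callable[[Dict[str, str]], bool]) -> float:
    """Percentage of templates for which the predicate `ok` holds."""
    if not templates:
        raise ValueError("empty sample")
    return 100 * sum(1 for t in templates if ok(t)) / len(templates)


# Hypothetical sample mirroring the DDC figures: 23 templates name the scheme, 27 do not.
sample = [{"scheme": "Dewey Decimal Classification"}] * 23 + [{}] * 27
print(consistency(sample, lambda t: bool(t.get("scheme"))))  # 46.0
```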

Two services used UDC. One always used at least three digits; the other represented the higher-level divisions of UDC by one, two or three digits. Such short descriptors are visually more difficult to identify as descriptors. Within the sample, there were 7 cases of two-digit descriptors and 2 cases of one-digit descriptors.

Separator punctuation was also an issue in the "Subject-Descriptor" element. Three services occasionally used more than one descriptor. Two of these separated descriptors with commas. In the remaining service, multiple descriptors were found in 12 cases: in 4 of these they were separated by commas, and in the other 8 by spaces, which makes them visually difficult to identify as separate descriptors.
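Space-separated descriptors can at least be flagged for review. The Python sketch below splits a descriptor value on commas or semi-colons. It then reports any part that still contains whitespace, since such a part may hold several class numbers, or a number plus a shelfmark. The function name and the class numbers in the example are illustrative.

```python
import re
from typing import List, Tuple


def split_descriptors(value: str) -> Tuple[List[str], List[str]]:
    """Split a Subject-Descriptor value on commas or semi-colons.

    Also returns the parts that still contain whitespace, which may be several
    class numbers separated only by spaces (or a number plus a shelfmark).
    """
    parts = [p.strip() for p in re.split(r"[,;]", value) if p.strip()]
    suspect = [p for p in parts if re.search(r"\s", p)]
    return parts, suspect


print(split_descriptors("330.9, 338"))  # (['330.9', '338'], [])
print(split_descriptors("330.9 338"))   # (['330.9 338'], ['330.9 338'])
```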

ROADS-based services usually use the "Subject-Descriptor" element as the basis for a browsable subject hierarchy. For this reason, differing practice between services may not unduly affect cross-searching. For consistency, however, it would be useful if the "Subject-Descriptor-Scheme" element contained a code chosen from an enumerated list (e.g. the draft ROADS Codes for Subject-Descriptor-Schemes), possibly with the edition used, e.g. "DDC21". Commas or semi-colons should also be used as separators where multiple descriptors are required.
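Existing free-text scheme names could be mapped onto such codes. The following is a Python sketch. The mapping table is hypothetical and covers only two schemes; the draft ROADS code list itself is not reproduced here. Unknown names are returned as `None` so that a cataloguer can resolve them.

```python
from typing import Optional, Union

# Hypothetical mapping; the draft ROADS code list itself is not reproduced here.
SCHEME_CODES = {
    "dewey decimal classification": "DDC",
    "ddc": "DDC",
    "universal decimal classification": "UDC",
    "udc": "UDC",
}


def scheme_code(value: str, edition: Optional[Union[int, str]] = None) -> Optional[str]:
    """Map a free-text scheme name to a code, appending the edition if known."""
    code = SCHEME_CODES.get(value.strip().lower())
    if code is None:
        return None  # unknown scheme: leave for a cataloguer to resolve
    return f"{code}{edition}" if edition is not None else code


print(scheme_code("Dewey Decimal Classification", 21))  # DDC21
```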

5.4 The "Language" element

The subject services took quite different approaches. One service does not appear (from the sample provided) to use the "Language" element at all. Another used a two-character code (ISO 639) with 100% consistency. A third entered the language name in full (in every case "English"). The remaining service entered "English" in 22 cases (44%) and left the element blank in the other 28 (56%).

A consistent means of identifying the language of a resource would appear to be useful for cross-searching, especially with the growing international use of ROADS for subject services. In an international context, a standardised code would also appear to be more acceptable than full-text English terms. ISO 639 is one logical choice, although the ROADS Cataloguing Guidelines (2.27) suggest a three-character scheme, either Z39.53 (as used by USMARC) or ISO 639-2 (currently still under development), because these can represent more languages.

Presumably, full-text English terms could be converted to the relevant codes, and two-character codes to three-character codes, if required. Services that do not use this element, or use it only inconsistently, could consider its usefulness in an international context.
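The conversion is a pair of table lookups. The following is a minimal Python sketch covering a small illustrative subset of languages. The three-letter values are the ISO 639-2 bibliographic codes, which for these languages match the USMARC codes. Blank elements return `None`, because a missing language cannot be recovered by conversion.

```python
from typing import Optional

# Small illustrative subset. The three-letter values are the ISO 639-2
# bibliographic codes, which match the USMARC codes for these languages.
NAME_TO_TWO = {"english": "en", "french": "fr", "german": "de", "spanish": "es"}
TWO_TO_THREE = {"en": "eng", "fr": "fre", "de": "ger", "es": "spa"}


def to_three_letter(value: str) -> Optional[str]:
    """Normalise a Language value (full English name, two- or three-letter code)."""
    v = value.strip().lower()
    if not v:
        return None  # a blank element cannot be recovered by conversion
    v = NAME_TO_TWO.get(v, v)  # "english" -> "en"
    if v in TWO_TO_THREE:
        return TWO_TO_THREE[v]  # "en" -> "eng"
    if v in TWO_TO_THREE.values():
        return v  # already a three-letter code
    return None  # not covered by this subset


print(to_three_letter("English"))  # eng
```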

5.5 Names

Where names were entered, there was very little inconsistency in their form. The main alternatives for personal names (e.g. in "Author-Name") are direct order and inverted order. Almost all personal names in the sample templates were entered in direct order; only two templates (from the same service) used inverted order for "Author-Name" and "Admin-Name".

The templates reviewed in the RECCI study were, of course, created before the ROADS Cataloguing Guidelines were produced. The Guidelines (3.1.8) suggest that personal names should be entered, where possible, in inverted order: surname first, followed by a comma and the parts of the name that usually precede the entry element. Experience with actual templates suggests that this guideline may need to be revised. Consistency would be desirable, but cross-searching would presumably not be unduly affected. Interoperability with other cataloguing practices that tend to use inverted order for personal names (e.g. AACR2) might be a wider issue. However, problems would arise in any case where no clear distinction is made between entry elements (e.g. surnames) and other parts of names (e.g. forenames).
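The difficulty becomes clear if one tries to convert existing direct-order names mechanically. The Python sketch below applies the obvious rule; the function is hypothetical and not part of ROADS. It works for a name like "Richard Smith-Jones" from the example template. An unhyphenated compound surname, however, is split at the wrong point, which is why the entry element needs to be identified when the name is catalogued.

```python
def invert_name(direct: str) -> str:
    """Naively turn 'Forenames Surname' into 'Surname, Forenames'.

    Assumes the last word is the entry element; multi-word surnames,
    particles and names in other orders will be inverted incorrectly.
    """
    words = direct.split()
    if len(words) < 2:
        return direct.strip()
    return f"{words[-1]}, {' '.join(words[:-1])}"


print(invert_name("Richard Smith-Jones"))  # Smith-Jones, Richard
print(invert_name("Richard Smith Jones"))  # Jones, Richard Smith (wrong if the surname is "Smith Jones")
```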

Organisation names (e.g. "Owner-Name", "Publisher-Name", etc.) in the sample templates were all entered in direct order. This is consistent with the suggestions given in the ROADS Cataloguing Guidelines.

6. References

[1] ROADS Template Registry. <URL:http://www.ukoln.ac.uk/metadata/roads/templates/>
[2] Michael Day, ROADS Cataloguing Guidelines, Version 1, July 1998. <URL:http://www.ukoln.ac.uk/metadata/roads/cataloguing/cataloguing-rules.html>

7. Comments

Any comments on this report should be sent by e-mail to: <m.day@ukoln.ac.uk>.

This page maintained by Michael Day of the Metadata Group at UKOLN, University of Bath.
Created: 17-Aug-1998
Last updated: 10-Sep-1998