This article was first published in:
Journal of Librarianship and Information Science, 26 (4) Dec. 1994 pp201-210
Any citation should use the above details of the hard copy version

UP TO STANDARD? a study of the quality of records in a shared cataloguing database

Ann Chapman


Presents results of a research project, carried out by UKOLN: The Office for Library and Information Networking (previously the Centre for Bibliographic Management) on the BLCMP Database, to investigate record quality. To measure the quality of the database, a simple count of the number of records edited was used. To investigate the types and amount of editing, a random sample of 'before' and 'after' editing record pairs (generated by BLCMP) was analysed. Finally, random samples provided and annotated by member libraries were used to consider whether editing is undertaken for reasons of accuracy, consistency, functionality or a combination of these factors. Certain factors emerged from the study as indications of the direction for further research.


The last thirty years have seen much change in the provision and usage of bibliographic records. In the 1960s records were mostly on catalogue cards or in printed listings, the users were mainly librarians and access was via linear sequences. Today the term 'catalogue record' is increasingly outdated: the primary bibliographic record is usually held on computer and its presentation and use vary greatly. Output can be as catalogue cards, printed listings, computer printouts or microforms, and records are sent electronically from one computer to another. The latter factor has done most to change the provision and use of records. The library, information and book trade communities now use such records for a range of purposes: from product monitoring in publishing to book selection and ordering, from circulation control to information retrieval; and records may well be passed on to further users (eg. for Public Lending Right returns). Equally, where once it was the norm for each organization to create its own records, bibliographic records are now exchanged, bought and sold by a number of agencies (eg. British Library National Bibliographic Service (BLNBS), Whitaker, Book Data, OCLC, co-operative union databases, etc.) both nationally and internationally.

The decision to store bibliographic records on computers required the systematic recording of bibliographic data elements. In part, this was achieved by the development, originally in the USA, of MAchine Readable Catalogue (MARC) format records, which assigned 'field tags' to identify standard data elements. Each record comprised a number of fields, each holding a specific part of the bibliographic data; within fields the data was subdivided into subfields. With the MARC format as a basis, there are now a number of national formats (eg. USMARC, UKMARC) and user formats (eg. OCLCMARC). Records originating from BLNBS are known as BNBMARC (British National Bibliography MARC) records. Such MARC format records are now in use worldwide for bibliographic data exchange.

Today, the interrelationship of types of record usage means that the quality of the source record, and any changes made to it after initial creation, can affect users - sometimes in significant and unforeseen ways. An instance of this occurred in 1990, when the BLNBS, in revision proposals for its BNBMARC records, included the suggestion that field 001 be used for BNB record control numbers only, whereas previously the International Standard Book Number (ISBN) had been recorded there when known. Systems which made use of the ISBN in this field as a record control number would have been faced with rewriting software to cope with this change had it been carried out. Another instance of unanticipated effect arose from the inclusion of records from the Copyright Libraries Shared Cataloguing Programme (CLSCP) in the BNBMARC file: BNBMARC practice is to use a lowercase x as the ISBN check digit character but some records from the Programme were created with an uppercase X; this resulted in the validation software in some automated systems having to be changed.
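The check-digit incident can be illustrated with a short sketch (this code is illustrative, not part of the study or of any system described here): in ISBN-10, the final character represents the value 10 when it is 'x' or 'X', and validation software that accepts only one case will reject otherwise valid records.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Validate an ISBN-10, accepting either 'x' or 'X' as the check character.

    The check digit is chosen so that the weighted sum
    10*d1 + 9*d2 + ... + 1*d10 is divisible by 11; a final
    'X' (or 'x') stands for the value 10.
    """
    digits = isbn.replace("-", "")
    if len(digits) != 10:
        return False
    total = 0
    for position, char in enumerate(digits):
        if char.isdigit():
            value = int(char)
        elif position == 9 and char in ("x", "X"):
            # Tolerating both cases avoids the kind of validation failure
            # described above for records with an uppercase X.
            value = 10
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0
```

A validator written to accept only the lowercase form would differ solely in that one membership test, which is why the CLSCP records forced software changes rather than data changes in some systems.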

The quality of a record is therefore of importance to the user, but criteria of quality will vary with the type of user and specific use of the record. Whatever the criteria, to be considered as of acceptable quality, a record must contain the information required; insufficient information, data errors and incorrect or inadequate coding of various sorts will all pose problems. A record which is not of acceptable quality either cannot be used at all, or only after modification, which on occasion may prove to be expensive and/or technically difficult. In order to be able to state that a record is of acceptable quality, criteria must be defined beforehand to provide the yardstick or benchmark against which the record is measured. User requirements were discussed at a seminar held in Newbury (Bibliographic records in the book world, 1988) and subsequently further reviewed in a report on the use of data elements in the book world by Dempsey (1989). Satisfying all criteria for all users may produce a record with more information than needed for some or even most of the time, but an advantage of automated systems is that interface programmes can filter excess information for particular applications from the primary record, which should ideally contain all the information required for a range of applications. The primary record should therefore be compiled to provide:

- enough description to identify the item,
- enough access points to facilitate the retrieval of the record, and
- detail appropriate to the user for the relevant function to be performed.

Any record, whatever information it contains, should be accurate, consistent with other records, and be functional in the system(s) in which it is used.

This article may at first seem to be of specialist interest only to some readers, but there are wider implications than checking the quality of records available on one database might suggest. Once the structure of a record format is defined, there is a responsibility on the users of the format to compile records accurately and consistently, and within a reasonable timescale for those to whom the records are supplied. Today, accurate, consistent records - available when required - must be produced within the framework of an increasing volume of published material, in both traditional and newer forms, and against the background of cutbacks in funding and increased demand for library services in all sectors. In the past it has been suggested that it could cost twice as much to catalogue some items as it did to buy them; in view of the cost of investment in equipment, systems and expertise, the quality of services purchased must be sufficient to merit funding from scarce resources. Is it cost effective to check all records from a supplier, especially if it is known that a substantial proportion will not need editing? If it can be demonstrated that records from specific supplier(s) consistently achieve a high enough standard, users of records could be more interested in accepting the majority of records as supplied without further checking, apart from specified exceptions, and so reduce their costs. The specified exceptions would need to be identified (some possible areas might be conference proceeding headings, or authors writing under several names or with transliterated names) and might vary with the source of the record. If this option is to be considered, purchasers of records would need to know what standards of quality are being achieved by suppliers, and suppliers would need to know what standards of quality are judged acceptable by purchasers.

At Bath, interest in what users need in records began with research into the relative merits of full and short entry catalogues (Seal, Bryant and Hall, 1982); this research was mainly carried out in academic libraries. The BNBMARC Currency Survey followed up the timeliness factor in assessing availability of records when required. This study using the BLCMP Database provides a first look at the qualities of accuracy, consistency and functionality within records. While the study was carried out on one specific database and its particular set of record sources, the findings indicate wider issues where more information is required. In some cases this could be an expansion of investigations within the study; in other cases the study shows unanticipated areas where further research is required.


Since January 1980 the BNBMARC Currency Survey has measured on a continuing basis the performance of the BLNBS BNBMARC record service. From January 1980 to October 1992 this was undertaken by the Centre for Bibliographic Management (CBM); since November 1992, when CBM was incorporated into it, the survey has been undertaken by UKOLN: The Office for Library and Information Networking. This survey has been solely concerned with the 'timeliness' of the records created and/or supplied by BLNBS; a description of the methodology and results can be found in Chapman (1992). Other aspects of quality, i.e. accuracy, consistency and functionality, have not been surveyed in any depth by CBM or UKOLN. While discussing with BLCMP ways of extending the 'performance measurement' aspects of UKOLN's work, it emerged that they wished to evaluate the quality of records on their own database, and this provided an opportunity for collaboration in this area of mutual interest. The study was undertaken between January 1992 and March 1993. The samples for analysis were taken in April, May and June 1992.

BLCMP, the largest UK library cooperative, was founded in 1969 as a joint venture between the public and university libraries in Birmingham and became a fully independent company in 1977. Currently it has 64 major library customers, including 31 universities and 20 local authorities. The main services offered are:

Access to the Database is via BLCMP's own dedicated network, which is also used for electronic transmission of orders from libraries to book suppliers (BLCMP, 1994).

The BLCMP Database comprises a number of files:

which are all updated regularly. The 'general bibliographic records' on the Union file are either copied from BNBMARC, LC, Whitaker, HMSO or DSC files or are created by one of BLCMP's member libraries. Each record shows its original source and the date it was last modified. BNBMARC amendment records automatically overwrite CIP records from the BLNBS BNBMARC file as long as they have not been edited by a member library.

A member library requiring a bibliographic record for an item being catalogued searches the Database files to establish whether a record is available. If no record is available, the member library creates a record, attaches its holdings and adds it to the WIP file. If a record is available and is acceptable without editing, the library attaches its holdings and adds it to the WIP file. If a record is found which is not acceptable, the member library edits it, then attaches its holdings and adds it to the WIP file. Some records, while not acceptable as found, are not edited at this stage (e.g. a BNB/CIP record which it is assumed will be overwritten later by a full BNBMARC record.)


Definition of the objectives of the study began with the premise that editing of records indicated dissatisfaction with the quality of the record in some respect; therefore the study aimed to evaluate the quality of records on the BLCMP Database by assessing the proportion of records being edited, and by ascertaining the nature of the editing and the reasons for such editing. Three main objectives were defined:

Overall quality of the database

The objective of this part of the study was to provide a measure of the quality of the database by establishing what percentage of records, found on the database for items being catalogued, are edited. BLCMP developed software to analyse a WIP file at the end of the week before the Union file was updated. WIP file records are of two types: records newly created that week and records which were found in one of the Database files. The software identified those records which had been found in the database files and subsequently edited. This is in the context of an average 'hit-rate' of 90% according to the BLCMP Hit-rate Survey (an internal monitoring survey of the percentage of records available on the Database when required for current cataloguing purposes). This analysis was made for three weeks each month for three months (monthly internal system processes prevented analysis every fourth week). The results showed:

Edits made and fields edited

This part of the study was concerned with investigating the fields being edited and the types of changes made. For this a random sample of edited records with details of the changes recorded was required. To achieve this a team at BLCMP developed software to print out 'before' and 'after' images of edited records. A random sample of 'before' and 'after' record pairs was printed for three weeks each month for three months and 1310 records in total were analysed by UKOLN. The printouts were used to identify for each pair of records (a) the source of the record edited, (b) the fields which were edited and (c) the type of changes made. This information was recorded in a tabular form and analysed using the SPSS statistical software package.

Source of record edited

When the record pairs for edited records were analysed it was found that records for non-book materials (chiefly serials, music tapes, compact discs and music scores) had also been selected along with records for printed monographs; all records for such non-book material were created by BLCMP libraries. As shown in Figure 1, when edited records for monographs only were analysed, the figures indicated that BNBMARC records and BLCMP-created records jointly accounted for 80% of such records, in roughly equal proportions. The BLCMP-created records included some records input from bulk retrospective conversion exercises which were 'legitimately' substandard. It is not known how many records of this type are on the BLCMP file, nor how many occurred in this part of the sample (since the fact that a record is an upgrade of a retrospective conversion record is not identified in the record).

Figure 1: Source of edited records for printed monographs

This is a pie chart (figure1.gif)

Which fields were edited

A MARC format record comprises a number of fields, each identified by a three digit 'field tag', and holding a specific data element. Within fields, the data is further divided into subfields, each identified by a two character code. For example, the imprint field is tagged 260, and has subfields $a place of publication, $b publisher and $c date of publication. When referring to fields in the analyses, x characters are used to denote variable characters; eg. 1xx is used to denote 100, 110 and 111 main entry heading fields. The sequence of fields is numerical, beginning with 0xx tagged control fields, which contain brief, coded information on the item, including its ISBN, country and date of publication, type of publication (eg. reprint) and language(s) of text. Tag 1xx fields are main entry fields and contain personal or corporate author details. Fields with 2xx tags have title information (240, 245), edition details (250), and imprint (260). 3xx fields give physical details such as size and collation, and details of price. Series information is recorded in 4xx fields and the 5xx fields allow a variety of bibliographic notes to be made (eg. on content, language, physical condition, etc.), while 6xx fields are used to record subject entries. The 7xx fields have additional entries (eg. 700 for second personal author, 745 for additional title). 8xx is used for additional series information, while the 9xx fields are used for local numbers and information, and by BNBMARC for cross references.
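By way of illustration only (the tags follow the description above, but the record data and the helper function are invented for this sketch; real MARC records also carry indicators and allow repeated fields), such a record can be modelled as tagged fields holding coded subfields:

```python
# A minimal, illustrative model of a MARC-style record: each field is
# identified by a three-digit tag and holds subfields keyed by a
# one-character code (written $a, $b, ... in cataloguing practice).
# The bibliographic data below is invented.
record = {
    "100": {"a": "Smith, John"},                                # main entry: personal author
    "245": {"a": "An example title", "b": "a subtitle"},        # title statement
    "260": {"a": "London", "b": "Example Press", "c": "1993"},  # imprint
    "300": {"a": "xii, 310 p.", "c": "23 cm."},                 # physical description
}

def subfield(rec, tag, code):
    """Return the value of $code in the field with the given tag, or None."""
    return rec.get(tag, {}).get(code)
```

On this model, `subfield(record, "260", "a")` retrieves the place of publication; an edit such as 'adding data to an existing field' corresponds to inserting a new subfield key, and 'inserting a new field' to adding a new tag.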

Comparison of the 'before' and 'after' record pairs identified the edit(s) made to each field within the records for each source. The aim was to see whether particular fields were more likely to be edited irrespective of source, and/or whether records from particular sources were more likely to be edited in certain fields. For this analysis it was jointly agreed by CBM and BLCMP to group the fields (eg. combining the various author fields into one group) as follows: 1xx, 7xx (Author), 24x, 74x (Title), 6xx (Subject), 250, 260, 300 (Publication & physical description), 4xx, 8xx (Series), 5xx (Notes), Other.

In considering the analysis comparing fields edited and the source of the record edited, it should be noted that the number of records from some sources was very low. In the sample of 1310 records, only 20 were taken from the HMSO file and 67 from the Whitaker file; this reflects the fact that records from these two files are only chosen by member libraries if a record is not available in one of the other files, as it is expected that records from either the HMSO or Whitaker files will require more editing than those from other files. Overall, the author (37% of cases), title (39% of cases) and publication and physical description (39% of cases) fields were the most often edited; see Figure 2. BLCMP and BNBMARC source records were each close to the overall average figure for both the author fields and the publication and physical description fields. For the title field, BLCMP source records were close to the average, but BNBMARC source records showed variation: BNB/CIP records were close to the average, BNB/NBS records 10% below it and BNB/CLSCP records 15% below. Whitaker source records were edited at double the average figure, though they formed only a small proportion of the sample. Figures 3 and 4 show the percentage of edits by source in the author and title fields respectively.

Figure 2: Edits by field edited

This is a bar chart (figure2.gif)

Figure 3: Edits of author fields by source

This is a bar chart (figure3.gif)

Figure 4: Edits of title fields by source

This is a bar chart (figure4.gif)

Another analysis compared the field edited and the type of material (printed monographs, serials, cassettes both music and spoken, etc). As noted above, some findings did not show an accurate picture since some material types were represented by only a few records (eg. maps, special instructional material and microforms). For all fields, printed monograph records were edited at figures very close to the average for all material types; all other individual material types showed a wider variation.

Within the sample, a substantial proportion of the records for monographs were from two sources - BLCMP (with 432 records) and BLNBS (with 322 records). With a similar number of records contributed to the sample, a simple comparison of record quality can be made. Table 1 shows that the percentage of edits made in each group of fields is virtually the same, with no significant difference between the two sources.

Table 1: Edits by field edited and source


Types of changes made

The types of changes agreed by BLCMP and UKOLN for analysis were:

The amount of editing in each record varied from one edit in a field to several edits in a field, from single edits in a number of fields to a combination of several fields edited each with one or more edits. A cross tabulation comparing the types of editing and the fields edited indicated that 44% of edited records had a new field inserted and 51% had data added to an existing field, while field deletion was recorded in 13% of these records and deletion of some data in a field, capitals editing and subfield coding changes were each recorded in 14% of records. Filing indicators were changed in 9% of records, while 3% had spelling changes and 2% transliteration changes. Other data changes not specified were made in 6% of records. The change of '+' to 'plus' did not occur in the matched pairs sample.

Tables analysing the types and numbers of edits within fields were also produced. This analysis gave a variety of information. For example, the author fields were edited in 570 records. In 106 of these records (19%) a new author field was inserted and in 21 of the 570 cases more than one new field was inserted; spelling changes were made in 11 records (2%) but there were no multiple occurrences within records; subfield coding was edited in 92 records (16%) with only 3 multiple edits.

One edit type did not occur in this sample of 1310 records and appeared only once in the sample of 439 records for Part C: changing the '+' character to the word 'plus'. BLCMP use '+' as an end-of-field designator, so use of this character in the data elements corrupts the record. With only one occurrence in 1749 records, this is only a minor problem.
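The kind of safeguard involved can be sketched as follows (BLCMP's actual software is not described here; the function name and approach are assumptions). Incoming data elements are screened for the terminator character, mirroring the '+' to 'plus' edit:

```python
FIELD_TERMINATOR = "+"  # end-of-field designator, as in BLCMP's internal format

def screen_data_element(text: str) -> str:
    """Replace a literal '+' in bibliographic data with the word 'plus',
    mirroring the edit described above, so that the character cannot be
    mistaken for a field terminator when the record is stored."""
    return text.replace(FIELD_TERMINATOR, "plus")
```

The alternative design, escaping the terminator rather than rewording the data, would preserve the original transcription but requires every downstream program to understand the escape convention.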

Additionally, it was decided to look at some specific changes that may be made to records and separate analyses were made of these. The changes investigated were: exchange of main and added entry points, total change of bibliographic record data, capitals in Whitaker database records and the addition of subtitles.

Main entry point

When changes of main entry point to added entry point and vice versa were looked at, only 22 records out of 1310 (2%) were edited to exchange main and added entry points. The number of records in which this occurs is therefore very small. With computerised record retrieval search techniques, if both main and added headings are searchable, the outcome of the search will not be affected by incorrect assignment of headings, though the layout of data displayed in response to the search probably will be.

Total data change

Some records were edited to amend all the data in the record. This is a situation that occurs sometimes with errors in ISBN allocation or use, when the record retrieved by the ISBN describes a different item to the one for which a record is sought. Only 3 records out of 1310 (0.2%) were totally changed.


Capitals in Whitaker database records

The decision of the British Library to use Whitaker CIP records has resulted in some comment regarding the capitalisation of the initial letters of all title words, when standard library practice is to capitalise only the first word and any proper nouns. It was decided therefore to analyse the sample to see to what extent these additional capitals are removed from the Whitaker database records. Two things should be borne in mind when considering the result of this analysis. Firstly, BLCMP member libraries do not choose a record from the Whitaker database file if a BNBMARC, LC or locally created record is available on file, because Whitaker records are assumed to require more editing to bring them to an acceptable cataloguing standard. Within BLCMP, Whitaker records are not intended for use as full cataloguing records, and their primary value is at the ordering stage because of their timeliness and price information. Secondly, BNB/CIP records from Whitaker were not analysed in this way as they are usually accepted as found, with the knowledge that they will later be overwritten by a BNBMARC record when it becomes available; indeed, if a member library edits the BNB/CIP record, the automatic overwriting by the full BNBMARC record no longer takes place.

Only 67 records out of 1310 monograph records were Whitaker database file records - too small a number on which to base conclusions; however, analysis showed that of the 67 records, 41 had capitals removed from the 24x and 74x fields, and 13 had other types of amendment in the 24x and 74x fields. In addition, of the 67 records, 54 had some sort of amendment in field(s) other than 24x and 74x, and of these 29 were in the publication and physical description fields, with most of the rest being in headings fields (see Table 2). These amendments may be accounted for in some part by the fact that pre-publication records are based on the information supplied by the publisher and, while accurate at the time received, some of the information may be changed by the publisher before publication.

Table 2: Edits to Whitaker records

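The de-capitalisation edit can be approximated crudely in code (an illustrative sketch only: it cannot recognise proper nouns, which is precisely why such editing needs a cataloguer's judgement rather than an automatic conversion):

```python
def sentence_case(title: str) -> str:
    """Lower-case every word after the first, a crude approximation of
    removing the extra capitals from a title-cased Whitaker title.
    Proper nouns would wrongly be lower-cased too, so in practice a
    cataloguer must review the result."""
    words = title.split(" ")
    return " ".join([words[0]] + [w.lower() for w in words[1:]])
```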

Addition of subtitle

The use of subfield $b in the title field and what information should be included in it is an area where varying opinions are held. The addition of subfield $b was measured in an attempt to gauge some of the impact of the British Library's decision to catalogue 50% of BNBMARC records to AACR2 level 1 only (from January 1988), which entailed the omission of this subfield. (BLNBS have since changed their policy regarding subtitles; from September 1993 all subtitles have been included on BNBMARC records.) In a survey on currency with coverage, Dempsey (1989) found that 50% of respondents who acquired BNBMARC records, and 78% of all respondents, said they would add 245$b to a record that omitted it. Without knowing how many items in the sample had subtitles which had been omitted, it was not possible to see whether practice matched intention, but only to indicate the extent to which this field was added within the sample. Additionally, in some cases it may not be easy for a cataloguer to decide whether title information should all be in $a or split between $a and $b. In the sample as a whole, only 20% of edited records had amendments of this subfield; when the source of the record was investigated there was no indication of any link between record source and this type of amendment. Amendments in 245$b are either $b added as a new subfield with extra information not previously in the record (47% of amendments) or data moved from $a into $b (53% of amendments).
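A naive heuristic for the $a/$b decision might be sketched as follows (illustrative only; it assumes the ISBD convention of marking a subtitle with ' : ' is present in the transcribed title, which is exactly the assumption a cataloguer often cannot rely on):

```python
def split_title(title: str):
    """Split a transcribed title into ($a, $b) at the first ISBD
    subtitle mark ' : '. Returns (title, None) when no mark is found.
    A crude heuristic for illustration; it cannot settle the genuinely
    ambiguous cases discussed above."""
    if " : " in title:
        a, b = title.split(" : ", 1)
        return a, b
    return title, None
```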

Reasons for editing

The final part of the study was of especial interest to UKOLN and concerned how far records were edited because unedited records would adversely affect the functioning of system(s). BLCMP member libraries were each asked to submit a set of randomly selected records during a specified week, and were requested to annotate these records with their reasons for the editing carried out. There was a very good response, with forty-six of the fifty-eight member libraries supplying a total of 439 sample records. The number of sample records sent by each library varied: some libraries sent fewer than ten records because there had been few edits that week; two libraries sent two sets of ten records from different sites. In addition, some libraries also indicated either how many records they had edited that week or estimated the average number of records they edit each week. A follow-up letter with a brief set of questions was later sent to elicit further information.

For any particular edit there may be more than one reason for making the edit. For instance, it may be evident that the place of publication has been added to the imprint field of a record but different libraries may have added it for one of a number of different reasons. One library may add it because they see it as of value to their acquisitions system; another library finds its absence causes a problem with their 'New Books Listing' program which expects the field to be present; a third library sees the place of publication as useful to its end users. With the reason for an edit known, it was possible to categorise the reason under 'accuracy', 'consistency' or 'functionality' or, as in many of the cases, a combination of these reasons. The edits within each of these categories were then examined to see whether any trends could be identified.


Accuracy

One group of edits dealt with the correction of spelling errors, mostly in author and title fields, while in the same fields there was a smaller group of edits for transliteration changes. Errors of both of these types also affect system functioning as they will lead to mis-filing and non-retrieval of records. Another group of edits corrected the date of publication - usually because the date was in a non-standard form - and these edits were found mainly in Whitaker records. The date of publication can be included as a search term when using the BLCMP Database; some libraries have been under the impression that the date must be in the standard form to function, though this is not the case. Edits in the collation areas were often corrections of page numbers in CIP records. A further group of edits replaced missing words, corrected wrong words (not spelling errors but instances such as 'introduction' being replaced by 'instruction') or corrected word order in the title field.


Consistency

Firstly, those records which were created by BLCMP libraries were considered. Locally created records input online should conform to BLCMP standards and therefore not need editing, but records originating from bulk retrospective conversions can be 'legitimately' substandard. (It is not known how many of these records are still in this form; over the years the number will have been steadily reduced as items are withdrawn from stock or the records upgraded.) A variety of different edits were carried out on author, imprint, and title fields, and some new fields were added. One group of edits concerned records where all the data in the 'before' records was recorded in upper case characters, a characteristic of some retrospective catalogue conversions; these records were amended to mixed case and their minimal information was supplemented.

Secondly, the influence of individual library's 'house rules' was looked at. A number of libraries indicated that they did amend for consistency with 'house rules' but mostly qualified this by stating that it was (or should be) the same as BLCMP practice. BLCMP member libraries are issued with guidelines setting out BLCMP practice.

Thirdly, to what standards are records amended for consistency? The main reasons cited on this are: author data is amended to be consistent with name authority listing; 958/959/979 fields (considered redundant information after the ordering stage) are deleted from Whitaker database file records; 3xx fields are amended for non-book material to be consistent with BLCMP standards for such material; and capital letters of words in titles are amended, and records entirely in upper case are changed to mixed case, both for consistency with standard practice.


Functionality

A number of issues were identified here, including the fact that functionality edits are often linked to accuracy and/or consistency edits. As noted above, the filing order of names in output listings is affected by inconsistency or inaccuracy of names whether by incorrect data input, spelling error or even punctuation differences. These errors also affect other fields - eg. title fields - and, whether in author or title fields, affect the retrieval of records by search terms.

The follow-up letter asked libraries if records were amended for functionality because an unedited record would affect the functioning of one of the library systems - acquisitions, cataloguing, circulation, ILL or any other. From the responses received, 7 libraries amended because of acquisition system functioning, 24 for cataloguing, 11 for circulation, 3 for ILL and 6 for other systems (5 specified OPAC operation and 1 specified production of 'New Books Listing').

Some fields or subfields were noted as needing to be present in a record to enable other programs to function correctly. One library noted that its acquisition system was affected by HMSO records with fields not in the usual order; the order generated then includes wrong information and must be created manually. Another library had similar problems with its New Books Listing if 260$a was not present in a record; the program generating this listing substitutes indicators when 260$a is not present, necessitating subsequent editing of the listing. A further problem was identified regarding LC records which BLCMP receive from BLNBS, who convert the LC records from USMARC to UKMARC format. It has been noted that the 263 field in such records was rejected by the BLCMP validation software and the records required editing before they could be accepted onto the Union file. Incorrect indicators in the 1xx and 245 fields were edited to ensure correct filing for output listings. There seems to be a particular problem in assigning correct indicators for items in the Welsh language when the title begins with 'Y' which is the definite article in Welsh.

Non-editing of fields

As noted in the introduction, user requirements for bibliographic records vary, and libraries can have different perceptions of the relative importance of some information in a bibliographic record. In an attempt to establish whether there was any difference in outlook between the BLCMP member libraries, they were asked in the follow-up letter whether there were fields which would not be edited unless the record was being edited for another reason (eg. if the record does not have 260$a, would you amend it anyway, or only when the record requires editing in (say) fields 1xx and 245?). Of the 19 responses received to this question, the largest groupings were libraries that would not edit the 260 or 3xx fields unless editing a record for some other field: 8 libraries only edit 260/260$a if other fields are being edited, 4 libraries only edit the 300 field in such cases, and 6 libraries only edit the 260 and 300 fields when editing other fields. All other instances (eg. edits in 4xx, 5xx and 008 fields) were each cited by a single library.

From the annotated records supplied for this section, and from the responses to the follow-up letter, a selection of comments from the member libraries was compiled to illustrate some of the points of concern and to reflect the views of the member libraries on some of the issues involved; this was included in the report as an appendix.


Overall, the standard of records available to member libraries in this large shared cataloguing database meets the majority of member libraries' needs, with records found for approximately 90% of items, and around 80% of the records found being used without editing.

Within the 20% of records which are edited, records for printed monographs are most often edited in the 1xx/7xx fields (40% of cases) and the 250/260/300 fields (also 40%), while for edited serials records the fields most often edited are 24x/74x (62% of cases) and 250/260/300 (26%). The types of edit most often carried out were the addition of data to an existing field (51%) and the addition of new fields (44%). However, there are areas where BLCMP is taking a fresh look at practices and guidelines, prompted in part by the finding that some libraries carried out edits which could adversely affect other libraries using the same record.

Though the study indicated that there are differences in record quality between records from different sources, and that BLCMP members seem more likely to use either BLNBS or BLCMP created records, the generation of a random sample of edited records prevented any detailed comparison of records from one source with records from another. For a better comparison, either equal numbers of records from each source, or matched sets of records for specific items, could be analysed.

All participating users in the study were libraries, though they were using the records in several systems: ordering, cataloguing, circulation and online searching by end-users. Issues arising with records from the Whitaker database were therefore the only ones which indicated different user requirements on either side of the publisher/bookseller and library interface. Here it was possible to identify some areas of concern: capitalisation within the title, standardisation of date format, and fields (958/959/979) considered necessary by one set of users and unnecessary by another. Some member libraries acknowledged that the capitalisation edits were made not for functional reasons but for the appearance of the record; one library, however, stated that it was editing for consistency with AACR2. The standardisation of date format highlighted an area where BLCMP member library cataloguers were under the mistaken impression that non-standard format dates would not function in searches; the question now arises whether such dates will be left, or still be edited, presumably for consistency with AACR2. The deletion of fields 958/959/979 concerned fields whose information is more relevant to the book trade. Information in a bibliographic record should always be relevant to the record, but information about an item valid for only a limited period - ie. the 'in print' statement - is misleading once no longer true. The query now is: should this information be monitored and updated or deleted where necessary, and if so, at what stage should this be done?
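The date-format point can be illustrated with a hypothetical sketch (the function name and the normalisation rule are assumptions for illustration, not a description of the BLCMP search software): if search software reduces a date statement to its four-digit year before matching, non-standard forms such as 'c1993' or '[1993]' would still function in searches.

```python
import re

def normalize_date(raw: str) -> str:
    """Reduce a publication-date statement to its four-digit year,
    so that variant forms match the same search key.
    (Hypothetical normalisation rule, for illustration only.)"""
    match = re.search(r"\d{4}", raw)
    return match.group(0) if match else raw
```

Under such a rule, 'c1993', '[1993]' and '1993' all yield the same key, '1993', so editing these dates becomes a matter of consistency with AACR2 rather than of search functionality.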

The study highlighted the area of record provision for non-book material - an increasing section of stock for many libraries. Records in the sample with edits transferring data from one field/subfield to another indicated that some cataloguers may be unsure which fields/subfields to use, possibly through lack of experience in this area. All records for such material within this study were created by BLCMP libraries, but to what extent do external sources of records for such material exist, by whom are they provided, and what standards are, or will be, applied to them?

Consistency of application is needed to use a shared system effectively, whether it be 2 or 3 parts of a multi-site library or the 58 members of BLCMP. Records found with another member library's holdings attached should already have been edited by that library, but sometimes still needed editing; some libraries were deleting incorrect data (when it was in a field they did not use) instead of editing it, and the correct data was then being reinserted by other libraries (one area where this occurred was the collation field). Participating in the study led one library to review its cataloguing, as some 'misguided' editing was found when it selected edited records to annotate for the study. The variation in the number of records typically edited per week, as noted by some libraries in the final part of the study, prompts the question whether all participating libraries are undertaking editing to the same extent, or only to the extent required by their own systems. As one library commented: "The first user of a record is obliged, if necessary, to edit the record to bring it into line with agreed standards: this may well mean adding features to the record which the editing library may not actually use in its local systems. However, ...most member libraries would consider this a price worth paying for the evident advantage of having a shared database."

Working in a system where records coming from different sources are created to different standards means that different procedures may need to be used depending on the source of the record; one library put in a plea for "one document which summarised all the procedures and pointed to where full documentation is to be found (if it exists)." On the seeming lack of consistency in the notes from member libraries, another library thought that "BLCMP needs to reiterate agreed policy, and we (the members) need to restate it to all our cataloguers."

There are also more specific issues that have emerged from the study.

  1. Increasing use of automated systems has changed the approach to record retrieval. The increase in access points to any one record increases the risk of retrieval being affected by errors in data: each access point must be accurate and consistent.
  2. The specific issue of indicators in records for items in the Welsh language raises the question of whether there is a similar problem with other, especially non-European, languages; some libraries have substantial holdings of material in these languages.
  3. The 250/260/300 field findings have thrown up an interesting divergence. In response to the follow-up letter, libraries most often cited these publication and physical description fields as ones they would not edit unless other fields in the record were being edited; yet the actual edits found in the study indicated that these fields are often edited.
  4. The study showed that records resulting from retrospective conversion may have specific characteristics: those encountered in the study were in upper-case characters only and contained minimal information. As more libraries automate, with greater numbers of small, specialist libraries with non-mainstream stock now able to have automated systems, and with users increasingly able to search the catalogues of institutions other than their own over networks, the quality of retrospectively converted records needs to be addressed.

The Bath unit appreciated the opportunity to cooperate with BLCMP on this study, which provides a starting point for further research into the quality of records. The field of possible research is wide and cannot be tackled all at once; a number of studies on different aspects will be required to present a complete picture: studies of specific record sources, comparative studies of other multi-source databases, investigation of locally created records, assessment of particular types of editing, studies of records for 'non-book' materials, and more.

Full details of the study are given in British Library R & DD Report 6120.


Thanks are due to the following:
At BLCMP: Terry Willan was involved in the design of the study, set up procedures and provided details of the BLCMP systems mentioned; Celia Burton produced the 'before' and 'after' pair samples and co-ordinated the collection of the sample of annotated records.
At UKOLN: Steven Prowse produced the statistical analyses of the data collected and produced the figures for this paper.
Thanks also to Philip Bryant, Lorcan Dempsey and Terry Willan for comments on earlier drafts of this paper though the author is responsible for this final version.


