A review of metadata: a survey of current resource description formats
Work Package 3 of Telematics for Research project DESIRE (RE 1004)
UNIMARC conforms to the ISO 2709 standard and to ISBD standards. UNIMARC (the Universal MARC Format) was developed and published in 1977. Its primary purpose is to facilitate the international exchange of bibliographic data in machine-readable form between national bibliographic agencies. The UNIMARC Manual, first published in 1987, stated explicitly that UNIMARC's objectives would include not only conversion between formats but also serving as a model for the development of new machine-readable bibliographic formats.
The latest edition of the UNIMARC Manual (2nd edition) was published in 1994.
A draft version of the UNIMARC Guideline 3 for Computer Files was issued in June 1995. These guidelines result from meetings of the IFLA Permanent UNIMARC Committee and from the requirements of the International Standard Bibliographic Description for Computer Files, ISBD(CF). A new draft of this Guideline is expected in July 1996. (The changes in the draft of the 2nd ed. of ISBD(CF), now being circulated for comment, will be applied to UNIMARC.)
The proliferation of national formats, and the resulting difficulty in exchanging data, was the main reason for creating an international MARC format that would accept, in principle, records created in any MARC format and act as a common format for conversion. Since 1977 several national libraries have undertaken projects to convert from an existing national format to UNIMARC or have adapted UNIMARC for their national format needs. UNIMARC covers monographs, serials, cartographic materials, music, sound recordings, graphics, and projected and video material, with provisional fields for computer files.
UNIMARC was first developed as an exchange format and offers several options for description, so that records created on the basis of different cataloguing rules can all be included.
The format is supervised by the Permanent UNIMARC Committee (PUC), under the auspices of the IFLA Universal Bibliographic Control and International MARC (UBCIM) Programme. Changes will be made only through the Permanent UNIMARC Committee. The content is described in the UNIMARC Manual (latest ed. 1994).
Although the PUC tries to maintain the standard, libraries implement the format in different ways, e.g. linking (4XX) may or may not be used. French libraries in particular work with a variety of interpretations of the format.
Data in the records is contained in fields identified by a three-digit tag. Fields containing data with a similar function are organised into blocks identified by the first digit of the tag. UNIMARC consists of the following ten blocks:
· 0XX Identification block
· 1XX Coded information block
· 2XX Descriptive information block
· 3XX Notes block
· 4XX Linking entry block
· 5XX Related title block
· 6XX Subject analysis block
· 7XX Intellectual responsibility block
· 8XX International use block
· 9XX National use block
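The relationship between a tag and its block can be illustrated with a short Python sketch. This is only an illustration of the scheme described above: the block names are copied from the list, while the function name `block_for_tag` is hypothetical and not part of any UNIMARC tool.

```python
# Block names as given in the list above, keyed by the first digit of the tag.
UNIMARC_BLOCKS = {
    "0": "Identification block",
    "1": "Coded information block",
    "2": "Descriptive information block",
    "3": "Notes block",
    "4": "Linking entry block",
    "5": "Related title block",
    "6": "Subject analysis block",
    "7": "Intellectual responsibility block",
    "8": "International use block",
    "9": "National use block",
}


def block_for_tag(tag):
    """Return the block name for a three-digit tag such as '200'.

    Hypothetical helper: it only inspects the first digit of the tag.
    """
    if len(tag) != 3 or not (tag.isascii() and tag.isdigit()):
        raise ValueError(f"not a three-digit tag: {tag!r}")
    return UNIMARC_BLOCKS[tag[0]]


if __name__ == "__main__":
    for tag in ("010", "135", "200", "337", "801"):
        print(tag, "->", block_for_tag(tag))
```

Keying the table on the first digit alone mirrors the way the format groups fields; it does not check whether a particular tag is actually defined in the Manual.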
UNIMARC deals with all the necessary bibliographic data. The following will concentrate on the adaptation of the format to enable input of data pertaining to online resources.
Guideline 3 specifies the use of existing fields for the description of computer files, but any other data element from UNIMARC may also be used in a record for a computer file. The Guideline acknowledges that additional fields or content designators, and the redefinition of existing fields, will probably be needed in the near future. Such needs should be brought to the attention of the IFLA UBCIM Programme Office.
Fields of which the use for computer files is specified include:
· Title (field 200): Title as it appears on container, box, opening screen, formal title screen, first display of information, header of the file etc.
· Parallel title (field 510): Title in another language appearing on the computer file.
· Author(s) (fields 200$f $g and/or 700, 701, 710, 711): Authors, programmers of the computer files as listed on the computer file.
· Author affiliation(s) (fields 700$p, 701$p, 710$p, 711$p): Institutional affiliations of the authors, programmers at the time the computer files were written or programmed.
· Edition statement (field 205): Any word or phrase indicating that the information was available previously in a different form.
· Publication, distribution (field 210).
· Physical description of the computer file (fields 215, 230): To be omitted for remotely accessed computer files, because there is no physical item.
· Accompanying materials (fields 215, 307): User handbooks.
· Series (fields 225, 410).
· Availability information (fields 345, 010, 011): Price units, stock number, agency for ordering a copy of the computer files.
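The field assignments listed above can be written down as a simple lookup table. The following Python sketch assumes nothing beyond that list; the names `COMPUTER_FILE_FIELDS`, `elements_for_tag` and `applicable_elements` are hypothetical, and subfields such as 200$f or 700$p are deliberately left out to keep the table small.

```python
# Data elements and their UNIMARC fields, transcribed from the list above.
# Subfield codes (e.g. 200$f, 700$p) are omitted in this simplified table.
COMPUTER_FILE_FIELDS = {
    "Title": ["200"],
    "Parallel title": ["510"],
    "Author(s)": ["200", "700", "701", "710", "711"],
    "Author affiliation(s)": ["700", "701", "710", "711"],
    "Edition statement": ["205"],
    "Publication, distribution": ["210"],
    "Physical description": ["215", "230"],
    "Accompanying materials": ["215", "307"],
    "Series": ["225", "410"],
    "Availability information": ["345", "010", "011"],
}


def elements_for_tag(tag):
    """List the data elements that may be recorded in the given field."""
    return [name for name, tags in COMPUTER_FILE_FIELDS.items() if tag in tags]


def applicable_elements(remote_access):
    """List data elements for a file; the physical description is dropped
    for remotely accessed files, since there is no physical item."""
    return [
        name
        for name in COMPUTER_FILE_FIELDS
        if not (remote_access and name == "Physical description")
    ]


if __name__ == "__main__":
    print(elements_for_tag("215"))
    print(applicable_elements(remote_access=True))
```

The lookup makes one point of the list explicit: a single field can carry more than one data element (field 215 serves both the physical description and accompanying materials), so a tag alone does not always identify what is being described.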
Apart from the fields mentioned above, some additional information should be recorded in fields of the Notes block (3XX). This concerns the following data:
· Type of computer file (field 336)
· Technical details of computer file (field 337)
· Notes pertaining to title and statement of responsibility (field 304)
· Notes pertaining to edition: (Licensed by...) (field 305)
· Notes pertaining to publication, distribution (Shareware, etc.) (field 306)
· Notes pertaining to series (field 308)
· Notes pertaining to availability (field 310)
· Contents notes
· Users/Intended audience note
The 1XX block provides fields for:
· Coded data
· Qualifying data
· Language of computer file
· Target audience
· Publication date
· Country of publication or production
· Coded data relating to computer files: program, representational, textual.
The 6XX Subject analysis block is used for subject data constructed according to various systems, both verbal and notational (e.g. UDC, DDC, Library of Congress Classification).
In Guideline 3 no special field is provided yet for information pertaining to location. USMARC 856 is being examined to see if it can be adopted for UNIMARC.
Field 135 is the provisional Coded Data Field for Computer Files. Fields 336 and 337 in the Notes block are defined for the type of computer file and for technical details respectively.
· In field 135, a one-character code indicates the type of data file:
· a = numeric
· b = computer program(s)
· c = representational (pictorial or graphic information)
· d = text
· u = unknown
· v = combination
· z = other
· Type of computer file (field 336): contains information characterizing the type of computer file. In addition to a general descriptor (e.g. text, computer program, numeric), more specific information, such as the form or genre of textual material (e.g. biography, dictionaries, indexes) may be recorded in this field.
· Technical details note (field 337): This field is used to record technical information about a computer file, such as the presence or absence of certain kinds of codes or the physical characteristics of the file (e.g. recording densities, parity, or blocking factors). For software, data such as the software programming language, the number of source program statements, computer requirements (e.g. computer manufacturer and model, operating system, or memory requirements), and peripheral requirements (e.g. number of tape drives, number of disk or drum units, number of terminals, or other peripheral devices, support software or related equipment) can be recorded.
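The one-character type codes of field 135 listed above lend themselves to a small decoding table. The Python sketch below is illustrative only: the codes and their meanings are taken from the list, while the function name `describe_file_type` is hypothetical. It does not attempt to parse a complete field 135.

```python
# Type-of-file codes for field 135, as listed above.
FILE_TYPE_CODES = {
    "a": "numeric",
    "b": "computer program(s)",
    "c": "representational (pictorial or graphic information)",
    "d": "text",
    "u": "unknown",
    "v": "combination",
    "z": "other",
}


def describe_file_type(code):
    """Translate a one-character field 135 type code into its meaning."""
    try:
        return FILE_TYPE_CODES[code]
    except KeyError:
        raise ValueError(f"unrecognised type code: {code!r}") from None


if __name__ == "__main__":
    for code in "abcduvz":
        print(code, "=", describe_file_type(code))
```

A coded value of this kind suits machine filtering, whereas the free-text notes in fields 336 and 337 are intended for display to the reader.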
No fields are specified for information pertaining to the host. USMARC practice may be adopted for UNIMARC.
There are no fields for record review date and creation date.
Availability information is included in fields 345 (Acquisition information note), 010 (ISBN), 011 (ISSN). Further notes pertaining to availability go in field 310 (Notes pertaining to binding and availability).
The relevant USMARC fields are being examined for this purpose.
Field 801 (Originating Source), subfield $g, contains an abbreviation for the cataloguing code used for bibliographic description and access. The Manual gives a list of the accepted codes in an appendix. Other codes may be registered with the IFLA UBCIM Programme.
For data interchange in UNIMARC, ISO character set standards should be used.
Character positions 26-29 and 30-33 of field 100 subfield $a are used to designate the default and additional graphic character sets used in the record. Sets approved for use with UNIMARC are:
· ISO 646 (IRV), Basic Latin set
· ISO 5426-1980, Extended Latin set
· ISO Registration #37, Basic Cyrillic set
· ISO DIS 5427, Extended Cyrillic set
· ISO 5428-1980, Greek set
· ISO 6438-1983, African coded character set
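Extracting the character set designations from field 100 $a amounts to slicing a fixed-length string. The Python sketch below assumes that the character positions quoted above are counted from zero and are inclusive; the function name and the sample value are hypothetical. The mapping from the stored codes to the character sets listed above is not given in this section, so the codes are returned undecoded.

```python
def character_set_codes(field_100_a):
    """Return (default sets, additional sets) from field 100 subfield $a.

    Assumption: positions are zero-based and inclusive, so positions 26-29
    correspond to the slice [26:30] and positions 30-33 to [30:34].
    """
    if len(field_100_a) < 34:
        raise ValueError("field 100 $a is too short to hold positions 26-33")
    default_sets = field_100_a[26:30]
    additional_sets = field_100_a[30:34]
    return default_sets, additional_sets


if __name__ == "__main__":
    # Hypothetical sample: 26 filler characters, then two four-character areas.
    sample = "x" * 26 + "0103" + "    " + "ba"
    print(character_set_codes(sample))
```

Because the subfield is fixed-length, a position-based slice is sufficient; no delimiter parsing is needed.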
The 4XX block is reserved for making tagged links to indicate relationship between objects.
UNIMARC is a more concise version of the MARC format and compared with USMARC offers less richness of data, e.g. in description of materials.
The Guideline for Computer Files seems to have been formulated with offline products in mind, i.e. CD-ROMs and diskettes. No special fields, such as a field for URLs, are specified for metadata specific to networked resources. In a table showing the data elements prescribed by ISBD(CF) and their corresponding UNIMARC locations, 'Access points: Technical details access' is referred to blocks 6XX and 7XX. The Guidelines are still being developed so that they will be better suited to online materials as well.
In International Cataloguing and Bibliographic Control (vol. 24, no. 4, Oct/Dec 1995), a quarterly published by the IFLA UBCIM Programme, an overview is given of international UNIMARC users and experts. This list is the result of a questionnaire sent out in 1993 and updated in 1995. Thirty-five of the 62 institutions that replied indicated that they were currently using the UNIMARC format. Thirty of those institutions are located in European countries; the other five are in the USA (Library of Congress), China (National Library), India (National Library), Japan (National Diet Library) and South Africa (The State Library).
Central and East European countries especially seem to be interested in formats that guarantee easy access to international communities and in recent years there has been a growing interest in UNIMARC.
The need for easy exchange of information is recognized within the MARC communities. There is a programme for the harmonisation of the national MARC formats of Canada (CANMARC), the UK (UKMARC) and the United States (USMARC). The British Library and the Library of Congress are also committed to the development of UNIMARC. The UseMARCON (User Controlled Generic MARC Convertor) project, funded by the Commission of the European Communities, aims to develop a toolbox capable of converting bibliographic records from any MARC format into any other MARC format through a central conversion format.