A review of metadata: a survey of current resource description formats

Work Package 3 of Telematics for Research project DESIRE (RE 1004)

Although the design of the Pica+ format was influenced by several MARC formats (INTERMARC, USMARC, UKMARC and UNIMARC) and follows ISBD standards, it does not conform to the ISO 2709 standard and therefore cannot be considered a genuine MARC format (although internationally it is sometimes treated as one).

The Pica+ format is not documented for external use. Cataloguers use the diagnostic format, which is a more user-friendly presentation of the underlying Pica+ format. The diagnostic format is described in the Richtlijnen voor de aanlevering van gegevens (Rules for the input of data).

Constituency of use

Pica, the Dutch Centre for Library Automation, is a nonprofit organization providing systems and services for the majority of Dutch academic and public libraries and for a number of library networks in Germany. Circa 200 Dutch libraries use their shared cataloguing system and about 400 libraries are connected to NCC/IBL, Pica's interlibrary loan and document delivery system.

Ease of creation

Pica is an extensive format, applied within libraries by specialist staff.

Progress towards international standardisation

Pica+ is not an international standard. For growing international cooperation and information exchange Pica has had to conform to exchange formats like UNIMARC and standard protocols like Z39.50.

Exchange between Pica+ and different MARC formats is possible (e.g. the USMARC records from RLG). Conversion programs from Pica+ to UNIMARC were written at the request of Pica's German partners, and also to enable the STCN (Short Title Catalogue, Netherlands) database to be uploaded to the European database for publications from the handpress period that is being developed by CERL (Consortium of European Research Libraries).

Format issues

Designation

The Pica+ format has four-character tags (three numerals followed by A-Z or @) and subfields. The diagnostic format has four-digit numeric tags, the so-called kmc's (kenmerkcodes, or identification codes), and its subfields are marked with control signs that often correspond to ISBD punctuation. The Pica format has three levels, which in Pica+ (but not in the diagnostic format) are distinguished by the first digit of the tag (0, 1 or 2):

· 0XXX General bibliographic level: contains all the fields that can be shared by all Pica users

· 1XXX Local level: the fields that can be used within one library or organisation. These fields are not visible to other users of the cataloguing system.

· 2XXX Copy level: fields to be used for one specific copy of an item. These fields are also invisible to other users.
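The tag structure and the three levels described above can be illustrated with a minimal Python sketch. It assumes only what the text states: a Pica+ tag is three numerals followed by A-Z or @, and the first numeral gives the level. The function name and the sample tag values are hypothetical and are not taken from Pica documentation.

```python
import re

# Assumption: a Pica+ tag is three numerals followed by A-Z or @,
# and the first numeral (0, 1 or 2) gives the level, as described above.
PICA_PLUS_TAG = re.compile(r"[0-2][0-9]{2}[A-Z@]")

LEVELS = {
    "0": "general bibliographic",
    "1": "local",
    "2": "copy",
}


def pica_plus_level(tag: str) -> str:
    """Return the level name for a Pica+ tag, or raise ValueError."""
    if not PICA_PLUS_TAG.fullmatch(tag):
        raise ValueError(f"not a Pica+ tag: {tag!r}")
    return LEVELS[tag[0]]
```

For example, pica_plus_level("021A") returns "general bibliographic", while a four-digit diagnostic tag such as "4083" is rejected, because in the diagnostic format the level is not encoded in the first digit.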

The fields are divided into six groups, based on the kind of information they hold. In the diagnostic format the tags of the fields within one group mostly (but not always) start with the same digit. The groups are:

1) Administrative data for the system and coded information

2) Discriminating data like ISBN

3) ISBD data (descriptive elements necessary to build an ISBD description, mainly used for export of titles, but also used for presentation)

4) Title and author index entries

5) Subject description: keywords and classification

6) Miscellaneous (administration, shelf number, acquisition etc.)

Content

In the Edoc project (KB, Pica and Surfnet BV) the Pica format used in the Shared Cataloguing System (Gemeenschappelijk Geautomatiseerd Catalogiseersysteem, GGC) was adapted to make the description and retrieval of online resources possible within the existing infrastructure. Richtlijnen voor het catalogiseren van online resources (Rules for cataloguing of online resources) were issued (Leiden 1995). These rules are a subset of the rules covering the description of audiovisual materials (Richtlijnen voor de aanlevering van gegevens: audiovisueel materiaal). Only the rules that differ from the existing rules are included in this subset.

The format is still being tested, evaluated and adapted by a special working group, WG-FER (Working Group Format for Electronic Resources).

New Richtlijnen voor het catalogiseren van Computer Files (Cataloguing rules for Computer Files, i.e. Online and Offline), which will replace the old guidelines for the cataloguing of - online and offline - resources, are being developed by this working group.

The free-text general annotation field (4201) is used for information about the (electronic) resource for which no field has yet been specified, e.g. information about fees, passwords, subscription to discussion lists, login procedures etc. In future, the need for dedicated fields for some kinds of data may lead to further adaptation of the format.

Basic descriptive elements

The format is quite detailed in dealing with the various bibliographic data elements. The content of the fields is ruled by the Dutch interpretation of the ISBD standard.

Subject description

There are several fields (tags starting with 5 or 6) for subject description, on a general and local level. A shared system for subject description has been developed, consisting of a basic classification system (Nederlandse Basisclassificatie), and an additional keyword thesaurus (GTT).

URIs

There is one field (4083) for access (location and file) data of the online resource. The file format is given first, followed by one or more of the following subfields:

· =A URL

· =B file name

· =C path

· =D file size

· =E compression format

· =M connection type

· =N port number / protocol

· =O gopher type

· =P host computer name

· =Q host computer IP address

· =R name and location host organisation

· =S email address host organisation

· =T email address contact

File format and URL (subfield =A) are mandatory; the other subfields are optional. The whole field can be repeated.
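The rules for field 4083 can be sketched as a small validation routine in Python. This is an illustrative in-memory model only: the class name AccessField, the problems method and the sample values are assumptions, and the sketch further assumes that each subfield code occurs at most once per occurrence of the field, which the guidelines summarised here do not state. It does not reproduce how Pica actually stores or serialises the field.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Subfield codes listed for field 4083 above.
ALLOWED_SUBFIELDS = set("ABCDEMNOPQRST")


@dataclass
class AccessField:
    """One occurrence of field 4083 (hypothetical in-memory model)."""

    file_format: str
    # Assumption: each subfield code occurs at most once per occurrence.
    subfields: Dict[str, str] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """Return rule violations; an empty list means the occurrence is valid."""
        found = []
        if not self.file_format.strip():
            found.append("file format is mandatory")
        if not self.subfields.get("A", "").strip():
            found.append("subfield =A (URL) is mandatory")
        unknown = sorted(set(self.subfields) - ALLOWED_SUBFIELDS)
        if unknown:
            found.append("unknown subfields: " + ", ".join(unknown))
        return found
```

Because the whole field is repeatable, a record would hold a list of such occurrences, each checked separately. An occurrence such as AccessField("ps", {"B": "report.ps"}) is reported as invalid because it lacks the mandatory URL.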

Resource format and technical characteristics

· Field 4083 (see above).

· Field 4060: material type, e.g. text, image, video, audio, multimedia, software. If none of those apply, the type 'document' should be used.

· Field 4251 is specified for system requirements. This field is not yet specified in the above-mentioned 'Richtlijnen', but is the result of a recent WG-FER decision. Different subfields are specified for online and offline resources.

· A special field (4084) is specified for location and file data of images linked to the described document. The primary goal of this field is to present related images inside the same description. (If the described document itself has an image format, this is noted in field 4083.) In 4084 the file format is given first, followed by one or more of the following subfields (only subfield =A is mandatory):

· =A URL

· =B file name

· =C path

· =D file size

· =E compression type

Host administrative details

The official guidelines will be changed. The new rule will probably be that the publisher fields (4030 and 4031) contain the geographical location and the name of the host, identified by the addition [host]. This is still under discussion in WG-FER.

Administrative metadata

According to the official guidelines the annotation field (4201) is to be used for record-last-verified (the date of the last check of the availability of the online resource) and for record-last-update (the date the description was last changed), but this is also a point of discussion in WG-FER. The record-last-verified will be cancelled (every record has indications of creation and change dates anyway). The usefulness of record-last-changed is still being discussed.

Provenance/Source

Bibliographical links can be made from descriptions of printed to online versions and vice versa.

Terms of availability/copyright

In addition to the ISBN/ISSN fields, which also contain price information, the free-text annotation field 4201 can be used for other data pertaining to availability and copyright.

There is no separate field for copyright statements.

Rules for the construction of these elements

The rules for the content within the fields are based on the Regels voor de titelbeschrijving (Cataloguing Rules), based on ISBD and created by the Federatie van Organisaties op het gebied van het Bibliotheek-, Informatie- en Dokumentatiewezen (FOBID). Pica adapted these rules for use within the Pica system.

Encoding

The character set used by Pica is a modified version of the INTERMARC character set, and is based partly on ISO standards 646, 8859-9 and 5426.

Multi-lingual issues

Field 1500 (mandatory) is a coded field for the language of the publication, the original text (in case of translations), the language of the abstract or the language of subtitles.

Ability to represent relationships between objects

There are several fields specified for the relation with other records or items. Those fields are not restricted to one special block of tags, like the 4XX block in UNIMARC. Links are made via the identification number of the record being linked to. There are two kinds of links: links to other bibliographic records (parts, supplements, other issues etc.) and links to records from authority files (e.g. the thesauri of author names, keywords).
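Linking by identification number can be pictured with a short Python sketch. Everything in it is hypothetical: the record numbers, dictionary keys and sample values are invented for illustration, and the sketch assumes that a link stores nothing but the identification number of its target record. It shows only the general idea of following a link to either a bibliographic record or an authority record.

```python
# Hypothetical records keyed by identification number; the numbers,
# keys and values are invented for illustration only.
BIBLIOGRAPHIC = {
    "1001": {"title": "Annual report", "links": ["1002", "9001"]},
    "1002": {"title": "Annual report. Supplement", "links": []},
}
AUTHORITY = {
    "9001": {"heading": "Example Organisation"},
}


def resolve_links(record_id: str) -> list:
    """Return (kind, target record) pairs for every link in a record."""
    resolved = []
    for target_id in BIBLIOGRAPHIC[record_id]["links"]:
        if target_id in BIBLIOGRAPHIC:
            resolved.append(("bibliographic", BIBLIOGRAPHIC[target_id]))
        elif target_id in AUTHORITY:
            resolved.append(("authority", AUTHORITY[target_id]))
        else:
            # The link points at an identification number that does not exist.
            raise KeyError(f"dangling link to record {target_id}")
    return resolved
```

Calling resolve_links("1001") yields one bibliographic target (the supplement) and one authority target (the name heading), mirroring the two kinds of links described above.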

Page maintained by: UKOLN Metadata Group
Last updated: 06-Aug-1998
