Interoperability between metadata formats
Mapping Dublin Core to UNIMARC
This mapping was produced for Project BIBLINK: Linking Publishers and National Bibliographic Services, and has also been published in the project deliverable for Work Package 4 - Format Conversion Feasibility.
The UNIMARC format was first published in 1977 to facilitate the international exchange of bibliographic data in machine-readable form between bibliographic agencies. More information on UNIMARC can be found in the document UNIMARC: an introduction, available from the format maintainers, the Universal Bibliographic Control and International MARC Core Programme of the International Federation of Library Associations (IFLA UBCIM). The UNIMARC mappings are based on the information given in the 2nd edition of the UNIMARC Manual (with Update 1), published in 1996.
Dublin Core (DC) was primarily designed to provide a simple resource description format for networked resources. The examples in this paper show DC elements embedded in <META> tags in HTML documents, but this does not imply that Dublin Core metadata will only exist in this form or will only be used in the context of the World Wide Web. Dublin Core is designed to be syntax-independent, so the precise syntax used in the examples does not affect the mapping tables themselves.
These mappings were originally produced to support the development of a demonstrator in the BIBLINK project. This project aims to establish electronic links between national bibliographic agencies (national libraries) and publishers of electronic material. The demonstrator will take publishers' metadata (some in the form of an extended DC known as BIBLINK Core) and convert it into UNIMARC and from UNIMARC into other MARC formats used by the participating libraries. These mappings were part of the preliminary work that was carried out to test the feasibility of this conversion.
|DC element||UNIMARC field(s)|
|Title||200 $a Title Proper; 200 $e Other Title Information (for subtitle); 517 $a Other Variant Titles (for other titles)|
|Creator||700 $a Personal Name - Primary Intellectual Responsibility, or if more than one: 701 $a Personal Name - Alternative Intellectual Responsibility; 710 $a Corporate Body Name - Primary Intellectual Responsibility, or: 711 $a Corporate Body Name - Alternative Intellectual Responsibility; 200 $f First Statement of Responsibility|
|Subject||610 $a Uncontrolled Subject Terms; 606 Topical Name Used as Subject (for LCSH and MeSH); 686 Other Classification Systems|
|Description||330 $a Summary or Abstract|
|Publisher||210 $c Name of Publisher, Distributor, etc.|
|Contributors||701 $a Personal Name - Alternative Intellectual Responsibility; 711 $a Corporate Body Name - Alternative Intellectual Responsibility; 200 $g Subsequent Statement of Responsibility (if role known)|
|Date||210 $d Date of Publication, Distribution, etc.|
|Type||608 Form, Genre or Physical Characteristics Heading|
|Format||336 $a Type of Computer File (provisional)|
|Identifier||001 (mandatory for UNIMARC); 020 (National Bibliography Number); 300 $a General Note (for URL)|
|Source||324 Original Version Note|
|Language||101 Language of the Item; 300 General Note|
|Relation||300 General Note|
|Coverage||300 General Note|
|Rights||300 General Note|
Part of the reason for producing mapping tables between metadata formats is to identify significant problem areas. These can be especially serious when the mapping is from a relatively simple metadata format to a more complex one, as is certainly the case with this mapping from Dublin Core to UNIMARC. MARC formats, when used for bibliographic data, tend to be closely tied to particular cataloguing rules such as AACR2. For example, the distinction between main and added entries that AACR2 defines for choosing access points is formalised in USMARC as the distinction between fields 100 (Main Entry -- Personal Name) and 700 (Added Entry -- Personal Name). Caplan and Guenther, in their Dublin Core-USMARC mapping, point out that DC CREATOR, which does not embody the concepts of main and added entry, cannot easily be mapped to USMARC. UNIMARC similarly contains fields for Primary Intellectual Responsibility (700, 710 and 720), Alternative Intellectual Responsibility (701, 711 and 721) and Secondary Intellectual Responsibility, but in practice it is more flexible than USMARC. It suggests that if the given cataloguing code does not embody the concept of main entry, "all persons, corporate bodies or families having equal responsibility may be coded as if they had alternative responsibility".
In this section each of the Dublin Core (DC) metadata elements will be taken in turn and any difficulties noted. The definitions of the DC elements are taken from the Reference Description issued by OCLC.
Title
The name given to the resource by the CREATOR or PUBLISHER.
200 Title and Statement of Responsibility $a Title Proper
UNIMARC field 200 is mandatory and non-repeatable, so if there is more than one DC TITLE, all titles after the first could be mapped to 517 Other Variant Titles $a Variant Title.
Subtitles, if they can be identified, should be mapped to: 200 $e Other Title Information. Subtitles and parallel titles should not be included under 200 $a.
200 1#$aDublin Core Metadata Element Set$eReference Description
200 1#$aOCLC/NCSA Metadata Workshop Report
517 1#$aDublin Core Report
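A minimal Python sketch of the title rules, assuming the DC titles arrive as a list in document order and that any subtitle has already been identified (the function name `map_titles` and its `subtitle` parameter are hypothetical). Output uses the display convention of the examples above: `#` for a blank indicator and `$` before each subfield code.

```python
def map_titles(titles, subtitle=None):
    """Map DC.Title values to UNIMARC fields in display form.

    The first title goes to 200 (mandatory, non-repeatable); any further
    titles become repeated 517 Other Variant Titles fields. `subtitle` is a
    hypothetical parameter: DC has no subtitle element, so it must be
    identified before conversion.
    """
    if not titles:
        raise ValueError("field 200 is mandatory: at least one DC.Title is required")
    field_200 = f"200 1#$a{titles[0]}"
    if subtitle:
        field_200 += f"$e{subtitle}"  # 200 $e Other Title Information
    return [field_200] + [f"517 1#$a{title}" for title in titles[1:]]


print(map_titles(["OCLC/NCSA Metadata Workshop Report", "Dublin Core Report"]))
# ['200 1#$aOCLC/NCSA Metadata Workshop Report', '517 1#$aDublin Core Report']
```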
Creator
The person(s) or organization(s) primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.
Qualifier possible: TYPE.
If TYPE=personal: 700 Personal Name - Primary Intellectual Responsibility (Indicator 2=0 or 1) or 701 Personal Name - Alternative Intellectual Responsibility (Indicator 2 = 0 or 1) $a Entry Element, or:
If TYPE=corporate: 710 Corporate Body Name - Primary Intellectual Responsibility (Indicator 1 = 0, Indicator 2 = 2) or 711 Corporate Body Name - Alternative Intellectual Responsibility (Indicator 1 = 0, Indicator 2 = 2) $a Entry Element.
200 $f First Statement of Responsibility. The CREATOR(s) could be added to the 200 Title and Statement of Responsibility field, especially where a qualifier defines their specific role.
The first problem encountered is how to distinguish between personal names and corporate names for the sake of knowing which UNIMARC field to use. It would help if some distinction could be made in DC using the TYPE qualifier.
There is some debate whether DC CREATOR should be mapped to 700/710 for Primary Intellectual Responsibility or to 701/711 for Alternative Intellectual Responsibility. As DC elements have no concept of main entry and are repeatable, there is no easy way of determining from DC CREATOR who is primarily responsible for the intellectual content. In this situation the UNIMARC manual suggests Alternative Intellectual Responsibility. A possible compromise might be to map to 700/710 when there is a single DC CREATOR element and to repeated 701/711 fields when there is more than one.
700/701 Indicator 2 specifies whether a personal name is entered in direct order (Indicator 2 = 0) or with inversion (Indicator 2 = 1). Conversion software would have to determine which order each name is in.
710/711 Indicator 1 distinguishes between corporate names (= 0) and meetings (= 1).
710/711 Indicator 2 denotes the order of the entry for a corporate name: in inverted form (= 0), under place or jurisdiction (= 1) or in direct order (= 2).
This is an example only. In practice, the corporate bodies might be better described as DC CONTRIBUTORS rather than DC CREATOR.
<META NAME="DC.Title" CONTENT="OCLC/NCSA Metadata Workshop Report">
<META NAME="DC.Creator.CorporateName" CONTENT="Online Computer Library Center">
<META NAME="DC.Creator.CorporateName" CONTENT="National Center for Supercomputing Applications">
<META NAME="DC.Creator.PersonalName" CONTENT="Stuart Weibel">
<META NAME="DC.Creator.PersonalName" CONTENT="Jean Godby">
<META NAME="DC.Creator.PersonalName" CONTENT="Eric Miller">
<META NAME="DC.Creator.PersonalName" CONTENT="Ron Daniel">
200 1#$aOCLC/NCSA Metadata Workshop Report
701 #0$aStuart Weibel
701 #0$aJean Godby
701 #0$aEric Miller
701 #0$aRon Daniel
711 02$aOnline Computer Library Center
711 02$aNational Center for Supercomputing Applications
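The compromise rule and the indicator values described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a tested converter: the creator type is taken from a TYPE qualifier such as `DC.Creator.PersonalName`, and the test for an inverted personal name (a comma, as in "Weibel, Stuart", split into $a and $b Part of Name Other than Entry Element) is a hypothetical heuristic. With several creators it reproduces the 701/711 fields of the example above.

```python
def map_creators(creators):
    """Map (type, name) pairs taken from DC.Creator.PersonalName / .CorporateName.

    Compromise rule: a single creator goes to 700/710 (Primary Intellectual
    Responsibility); several creators go to 701/711 (Alternative).
    """
    primary = len(creators) == 1
    fields = []
    for kind, name in creators:
        if kind == "personal":
            tag = "700" if primary else "701"
            if "," in name:
                # Heuristic (assumption): "Surname, Forename" is inverted -> Ind2 = 1,
                # surname in $a, the rest of the name in $b
                surname, forenames = (part.strip() for part in name.split(",", 1))
                fields.append(f"{tag} #1$a{surname}$b{forenames}")
            else:
                # Natural order, e.g. "Stuart Weibel" -> Ind2 = 0 (direct order)
                fields.append(f"{tag} #0$a{name}")
        elif kind == "corporate":
            tag = "710" if primary else "711"
            # Ind1 = 0 corporate name (not a meeting); Ind2 = 2 direct order
            fields.append(f"{tag} 02$a{name}")
        else:
            raise ValueError(f"unknown creator type: {kind!r}")
    # Stable sort by tag keeps the original DC order within each tag
    return sorted(fields, key=lambda field: field[:3])


example = [
    ("corporate", "Online Computer Library Center"),
    ("corporate", "National Center for Supercomputing Applications"),
    ("personal", "Stuart Weibel"),
    ("personal", "Jean Godby"),
    ("personal", "Eric Miller"),
    ("personal", "Ron Daniel"),
]
for field in map_creators(example):
    print(field)
```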
Subject
The topic of the resource, or keywords or phrases that describe the subject or content of the resource. The intent of the specification of this element is to promote the use of controlled vocabularies and keywords. This element might well include scheme-qualified classification data (for example, Library of Congress Classification Numbers or Dewey Decimal numbers) or scheme-qualified controlled vocabularies (such as Medical Subject Headings or Art and Architecture Thesaurus descriptors) as well.
Qualifier possible: SCHEME.
610 Uncontrolled Subject Terms (Indicator 1 = 0 No level specified) $a Subject Term.
If SCHEME=LCSH: 606 Topical Name Used as Subject (Indicator 1 = 0 No level specified) $a Entry Element $2 lc.
If SCHEME=MESH: 606 Topical Name Used as Subject (Indicator 1 = 0 No level specified) $a Entry Element $2 mesh.
If SCHEME=UDC: 675 Universal Decimal Classification (UDC) $a Number.
If SCHEME=DDC: 676 Dewey Decimal Classification (DDC) $a Number.
If SCHEME=LCC: 680 Library of Congress Classification $a Class Number.
If SCHEME=NLM: 686 Other Class Number $a Class Number $2 usnlm.
If SCHEME=NAL: 686 Other Class Number $a Class Number $2 usnal.
If SCHEME=[Other identified classification scheme]: 686 Other Class Number $a Class Number $2 System Code.
Appendix G in the UNIMARC Manual lists the subject system codes (thesaurus and classification) permitted in subfield $2 of fields 606 and 686. Other codes would have to be registered with the IFLA UBCIM Office.
With schemes like LCSH, it may not be possible to represent all the relevant UNIMARC subfield coding: $x for topical subdivision, $y for geographical subdivision, $z for chronological subdivision, etc.
UNIMARC contains additional fields in the subject analysis block: 600/601 for personal and corporate names used as subject, and 604/605 for names and titles used as subject. Using personal and corporate names in this context is problematic because they would have to be entered in access point form and defined by a relevant SCHEME. Field 604 uses the structure of the UNIMARC 4-- Linking Entry fields and is probably too complex for use here, while 605 is also rather complicated.
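The SCHEME rules above amount to a lookup table. A minimal Python sketch, assuming scheme names arrive as in the qualifier (matched case-insensitively) and that the caller supplies an Appendix G code for any other scheme; the function name `map_subject` and the blank indicators for the classification fields (675, 676, 680, 686) are assumptions.

```python
# scheme -> (tag, indicators, $2 system code or None), transcribed from the rules above;
# blank indicators on the classification fields are an assumption
SUBJECT_RULES = {
    "LCSH": ("606", "0#", "lc"),
    "MESH": ("606", "0#", "mesh"),
    "UDC": ("675", "##", None),
    "DDC": ("676", "##", None),
    "LCC": ("680", "##", None),
    "NLM": ("686", "##", "usnlm"),
    "NAL": ("686", "##", "usnal"),
}


def map_subject(value, scheme=None, system_code=None):
    """Map one DC.Subject value; `system_code` covers other registered schemes."""
    if scheme is None:
        return f"610 0#$a{value}"  # uncontrolled subject term, no level specified
    rule = SUBJECT_RULES.get(scheme.upper())
    if rule is None:
        if system_code is None:
            raise ValueError(f"no Appendix G code supplied for scheme {scheme!r}")
        rule = ("686", "##", system_code)  # other identified classification scheme
    tag, indicators, code = rule
    field = f"{tag} {indicators}$a{value}"
    return field + f"$2{code}" if code else field


print(map_subject("Library information networks", "LCSH"))
print(map_subject("025.05", "DDC"))
```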
Description
A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. Future metadata collections might well include computational content description (spectral analysis of a visual resource, for example) that may not be embeddable in current network systems. In such a case this field might contain a link to such a description rather than the description itself.
330 Summary or Abstract $a Text of Note
330 ##$aClassification schemes have a role in aiding information retrieval in a network environment, especially for providing browsing structures for subject-based information gateways on the Internet. Advantages of using classification schemes include improved subject browsing facilities, potential multi-lingual access and improved interoperability with other services. Classification schemes vary in scope and methodology, but can be divided into universal, national general, subject specific and home-grown schemes. What type of scheme is used, however, will depend upon the size and scope of the service being designed.
Publisher
The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. The intent of specifying this field is to identify the entity that provides access to the resource.
210 Publication, Distribution, etc. $c Name of Publisher, Distributor, etc.
If an access point is needed for the publisher, the name should be entered in a 7-- field.
210 ##$cOnline Computer Library Center
Contributors
Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, illustrators, and convenors).
Qualifier possible: TYPE.
If TYPE=personal: 702 Personal Name - Secondary Intellectual Responsibility (Indicator 2 = 0 or 1) $a Entry Element, or:
If TYPE=corporate: 712 Corporate Body Name - Secondary Intellectual Responsibility (Indicator 1 = 0, Indicator 2 = 2) $a Entry Element.
200 $g Subsequent Statement of Responsibility. The CONTRIBUTOR(s) could be added to the 200 Title and Statement of Responsibility field, especially where a qualifier defines their specific role.
Much the same difficulty applies as with the CREATOR element: it is hard to distinguish between corporate and personal names if no TYPE qualifier is used.
Date
The date the resource was made available in its present form. The recommended best practice is an 8 digit number in the form YYYYMMDD as defined by ANSI X3.30-1985 or ISO 8601-1988. In this scheme, the date element for the day this is written would be 19961203, or December 3, 1996. Many other schemes are possible, but if used, they should be identified in an unambiguous manner.
Qualifier possible: TYPE.
210 Publication, Distribution, etc. $d Date of Publication, Distribution, etc.
Other types of dates are specified in the UNIMARC format. For example 005 Version Identifier contains the date and time of the last record transaction, i.e. when the record itself was last changed. This information may be contained in administrative metadata attached to the Dublin Core but is outside the scope of this mapping. Publication dates also occur in the 100 General Processing Data field.
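Because 210 $d is filled here with the year of publication only, a converter has to pull the year out of the DC date. A minimal sketch, assuming the date is in the recommended YYYYMMDD form or an ISO 8601 form such as YYYY-MM-DD; other schemes are rejected rather than guessed at. The function names `year_for_210d` and `build_210` are hypothetical.

```python
import re

# YYYYMMDD (recommended DC practice) or ISO 8601 YYYY, YYYY-MM, YYYY-MM-DD
DATE_PATTERN = re.compile(r"(\d{4})(?:-?\d{2}(?:-?\d{2})?)?")


def year_for_210d(dc_date):
    """Return the four-digit year of a DC.Date value, or raise ValueError."""
    match = DATE_PATTERN.fullmatch(dc_date.strip())
    if match is None:
        raise ValueError(f"unrecognised DC.Date value: {dc_date!r}")
    return match.group(1)


def build_210(publisher=None, dc_date=None):
    """Combine DC.Publisher ($c) and the year of DC.Date ($d) in one 210 field."""
    subfields = ""
    if publisher:
        subfields += f"$c{publisher.strip()}"
    if dc_date:
        subfields += f"$d{year_for_210d(dc_date)}"
    return f"210 ##{subfields}" if subfields else None


print(build_210("Online Computer Library Center", "19961203"))
```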
Type
The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types. A preliminary set of such types can be found at the following: <URL:http://www.roads.lut.ac.uk/Metadata/DC-ObjectTypes.html>
608 Form, Genre or Physical Characteristics Heading $a Entry Element $2 System Code
The subfield $2 identifies the system from which the form heading is derived. Any specific code used would have to be registered with the IFLA UBCIM Office by submitting registration details - suggested code, author, title, imprint and maintaining agency.
Format
The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principle, formats can include physical media such as books, serials, or other non-electronic media.
336 Type of Computer File (Provisional) $a Text of Note
There is a separate field (337) in UNIMARC for a Technical Details Note (Computer Files), but this would be used for a more precise textual description of the character sets or formats used.
There is no 856 field in UNIMARC at present although a current rewriting of the electronic resource fields of UNIMARC involves a suggested 856 for consideration by the Permanent UNIMARC Committee.
Identifier
String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element.
Qualifier possible: SCHEME.
For first or only DC.identifier: 001 Record Identifier
If SCHEME=ISBN: 010 International Standard Book Number $a Number (ISBN)
If SCHEME=ISSN: 011 International Standard Serial Number (ISSN) $a Number (ISSN)
If SCHEME=SICI: 014 Article Identifier $a Article Identifier $2 sici
If SCHEME=BIBLID: 014 Article Identifier $a Article Identifier $2 biblid
If SCHEME=CODEN: 040 CODEN (Serials) $a CODEN
If SCHEME=BNB: 020 National Bibliography Number
If SCHEME=URL: 300 General Note
The UNIMARC Identification Block (0--) contains numbers that identify the record or the item recorded in it. It specifically includes fields for ISBNs, ISSNs and CODENs. Field 001 Record Identifier is mandatory in every record and contains characters uniquely associated with the record. Examples include ISBNs, BNB numbers, LC Control Numbers, etc. It is possible that a URL might be suitable for this purpose.
The UNIMARC identifier fields cannot handle URIs, URNs, IP addresses or other standard identifier codes in the way USMARC does using 024 and 856. Until UNIMARC develops the relevant fields, a URL could be made visible by placing it in a 300 General Note field, perhaps preceded by the text "Identifier: URL:".
Note also that 013 International Standard Music Number and 015 International Standard Report Number have recently been added to UNIMARC.
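A sketch of the identifier rules in Python. Two assumptions go beyond the text: the first identifier is copied unchanged into 001 (following the suggestion that a URL might serve), and a BNB number is written as 020 $a country code `GB` plus $b number. Identifiers in schemes with no UNIMARC field, such as URNs, fall back to a 300 note in the "Identifier:" style. The names `IDENTIFIER_RULES` and `map_identifiers` are hypothetical.

```python
# scheme -> UNIMARC field template in display form; {} receives the identifier
IDENTIFIER_RULES = {
    "ISBN": "010 ##$a{}",
    "ISSN": "011 ##$a{}",
    "SICI": "014 ##$a{}$2sici",
    "BIBLID": "014 ##$a{}$2biblid",
    "CODEN": "040 ##$a{}",
    "BNB": "020 ##$aGB$b{}",  # $a country code (assumed GB), $b number
    "URL": "300 ##$aIdentifier: URL:{}",
}


def map_identifiers(identifiers):
    """identifiers: list of (scheme, value) pairs; scheme may be None."""
    if not identifiers:
        return []
    fields = [f"001 {identifiers[0][1]}"]  # first (or only) identifier -> 001
    for scheme, value in identifiers:
        template = IDENTIFIER_RULES.get((scheme or "").upper())
        if template is not None:
            fields.append(template.format(value))
        else:
            # URNs, other URIs and unqualified values: keep them visible in a note
            prefix = f"{scheme}:" if scheme else ""
            fields.append(f"300 ##$aIdentifier: {prefix}{value}")
    return fields


for field in map_identifiers([("URL", "http://www.ukoln.ac.uk/metadata/intro.html")]):
    print(field)
```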
Source
The work, either print or electronic, from which this resource is derived, if applicable. For example, an html encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.
324 Original Version Note $a Text of Note
The text: "Source:" could be generated for the start of this note.
ISBD punctuation is recommended for this field.
As a note field, 324 cannot be searched. The relevant field in the 4-- Linking Entry Block, 455 "Reproduction of", is probably too complex for use in this type of mapping process but could be added by human cataloguers.
Note, however, that the 4-- Linking entry block will in future take two forms - the complex one outlined in the UNIMARC manual and the one used for UNIMARC 4-- by the US, Portugal and France, where elements are introduced by subfield marks and not by $1[tag].
Language
Language of the intellectual content of the resource. Where practical, the content of this field should coincide with the Z39.53 three character codes for written languages.
Qualifier possible: SCHEME.
If no SCHEME: 300 General Notes $a Text of Note
If SCHEME=USMARC: 101 Language of the Item (Indicator 1: fill character) $a Language of Text, Soundtrack, etc.
101 is a mandatory field in UNIMARC when the item has linguistic content.
101 uses coded information, the codes being taken from the list developed by the Library of Congress for use in USMARC records. The latest LC list is the authoritative version, although the codes are also listed in Appendix A of the UNIMARC Manual.
There is no specific language note field, so textual information could be entered as a 300 General Note, possibly with the generated text: "Language:".
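The two language branches, sketched in Python. The fill character `|` for Indicator 1, the blank Indicator 2 and the lower-casing of USMARC codes are assumptions about presentation; the function name `map_language` is hypothetical.

```python
def map_language(value, scheme=None):
    """Map one DC.Language value to a UNIMARC display string."""
    if scheme is not None and scheme.upper() == "USMARC":
        # 101 Language of the Item: Ind1 = fill character "|", Ind2 blank;
        # $a Language of Text, Soundtrack, etc. (USMARC codes are lower case)
        return f"101 |#$a{value.strip().lower()}"
    # No recognised scheme: keep the value visible in a general note
    return f"300 ##$aLanguage: {value.strip()}"


print(map_language("eng", scheme="USMARC"))  # 101 |#$aeng
print(map_language("EN"))                    # 300 ##$aLanguage: EN
```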
Relation
Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. For example, images in a document, chapters in a book, or items in a collection. A formal specification of RELATION is currently under development. Users and developers should understand that use of this element should be currently considered experimental.
Possible qualifiers: SCHEME, TYPE.
300 General Notes $a Text of Note
The text "Relation:" could be generated at the start of 300 $a.
There is no note field in UNIMARC that specifically covers relationships as defined here. There is a contents note (327), but this relates specifically to the item being catalogued, and ISBD data element definitions and punctuation are recommended for it.
There is a Linking Entry Field for Other Related Work (488), which could be used if a less complex 4-- Linking Entry Block is adopted.
Coverage
The spatial locations and temporal durations characteristic of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element should be currently considered experimental.
Possible qualifier: TYPE.
300 General Notes $a Text of Note
Rights
The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made by users if such a field is empty or not present.
Qualifiers possible: URL, URN.
300 General Notes $a Text of Note
There is a UNIMARC notes field, 345 Acquisition Information Note, that could possibly be used with adaptation. 345 $d Terms of Availability is currently used to record the price of an item.
If the RIGHTS data is a URL and UNIMARC included an 856 field like USMARC it could be mapped to the equivalent of USMARC 856$3 Materials Specified, with a $u.
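Source, Relation, Coverage and Rights all end up as notes, so one small helper can generate them. In this sketch the "Source:" and "Relation:" labels come from the text above, while the "Coverage:" and "Rights:" labels are hypothetical additions in the same style; the blank indicators and the names `NOTE_RULES` and `note_field` are also assumptions.

```python
# element -> (tag, generated label text)
NOTE_RULES = {
    "source": ("324", "Source: "),      # 324 Original Version Note
    "relation": ("300", "Relation: "),  # 300 General Note
    "coverage": ("300", "Coverage: "),  # hypothetical label
    "rights": ("300", "Rights: "),      # hypothetical label
}


def note_field(element, value):
    """Render a DC value as a UNIMARC note with a generated label."""
    tag, label = NOTE_RULES[element.lower()]
    return f"{tag} ##$a{label}{value.strip()}"


print(note_field("Relation", "Chapter of a larger report"))
```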
This Dublin Core record is for a WWW page and is encoded for embedding in <META> tags in the header of an HTML 4.0 file. It is for illustrative purposes only and may not conform to the most recent recommended DC practice.
<HTML>
<HEAD>
<META NAME="DC.Title" CONTENT="UKOLN metadata">
<META NAME="DC.Creator.PersonalName" CONTENT="Andy Powell">
<META NAME="DC.Creator.Email" CONTENT="A.Powell@ukoln.ac.uk">
<META NAME="DC.Creator.PersonalName" CONTENT="Michael Day">
<META NAME="DC.Creator.Email" CONTENT="M.Day@ukoln.ac.uk">
<META NAME="DC.Subject" SCHEME="DDC" CONTENT="025.05">
<META NAME="DC.Subject" SCHEME="LCSH" CONTENT="Library information networks">
<META NAME="DC.Description" CONTENT="A web page that provides an introduction to metadata and describes the work of UKOLN: the UK Office for Library and Information Networking in the area of resource discovery">
<META NAME="DC.Publisher" CONTENT="UKOLN The UK Office for Library and Information Networking">
<META NAME="DC.Contributor.CorporateName" CONTENT="UKOLN Metadata Group">
<META NAME="DC.Date" CONTENT="19970626">
<META NAME="DC.Type" CONTENT="text/html">
<META NAME="DC.Identifier" CONTENT="http://www.ukoln.ac.uk/metadata/intro.html">
<META NAME="DC.Language" CONTENT="EN">
</HEAD>
<BODY> ... </BODY></HTML>
Using the mappings defined above, a UNIMARC record (of sorts) can be produced:
200 1#$aUKOLN metadata
210 ##$cUKOLN The UK Office for Library and Information Networking$d1997
300 ##$aIdentifier: URL:http://www.ukoln.ac.uk/metadata/intro.html
330 ##$aA web page that provides an introduction to metadata and describes the work of UKOLN the UK Office for Library and Information Networking in the area of resource discovery
606 ##$aLibrary information networks$2lc
676 ##$a025.05
701 #0$aAndy Powell
701 #0$aMichael Day
711 02$aUKOLN Metadata Group
This record was created using the following mapping:
210 ##$c[DC.publisher]$d[DC.date (first four digits only)]
300 ##$aIdentifier: URL:[DC.identifier]
606 ##$a[DC.subject SCHEME=LCSH]$2lc
676 ##$a[DC.subject SCHEME=DDC]
701 #0$a[DC.creator TYPE=PersonalName]
711 02$a[DC.contributors TYPE=CorporateName]
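The mapping above can be expressed as a short Python sketch that rebuilds the UNIMARC record from the DC values. It assumes the META tags have already been parsed into (element, qualifier, value) triples; the subject code `$2lc` follows the LCSH rule in the Subject mapping, and the corporate-name indicators `02` follow the 710/711 indicator rules (corporate name, direct order). Title and Description are mapped to 200 and 330 as in the record; Type and Language are carried in the data but, as in the mapping, not converted. The names `DC_RECORD`, `values` and `to_unimarc` are hypothetical.

```python
# The example DC record as (element, qualifier, value) triples
DC_RECORD = [
    ("Title", None, "UKOLN metadata"),
    ("Creator", "PersonalName", "Andy Powell"),
    ("Creator", "Email", "A.Powell@ukoln.ac.uk"),
    ("Creator", "PersonalName", "Michael Day"),
    ("Creator", "Email", "M.Day@ukoln.ac.uk"),
    ("Subject", "DDC", "025.05"),
    ("Subject", "LCSH", "Library information networks"),
    ("Description", None, "A web page that provides an introduction to metadata "
     "and describes the work of UKOLN: the UK Office for Library and Information "
     "Networking in the area of resource discovery"),
    ("Publisher", None, "UKOLN The UK Office for Library and Information Networking "),
    ("Contributor", "CorporateName", "UKOLN Metadata Group"),
    ("Date", None, "19970626"),
    ("Type", None, "text/html"),
    ("Identifier", None, "http://www.ukoln.ac.uk/metadata/intro.html"),
    ("Language", None, "EN"),
]


def values(record, element, qualifier=None):
    """All values of one element, optionally restricted to one qualifier."""
    return [v.strip() for e, q, v in record
            if e == element and (qualifier is None or q == qualifier)]


def to_unimarc(record):
    """Apply the field-by-field mapping; steps run in tag order."""
    fields = [f"200 1#$a{t}" for t in values(record, "Title")[:1]]
    publisher, date = values(record, "Publisher"), values(record, "Date")
    sub_c = f"$c{publisher[0]}" if publisher else ""
    sub_d = f"$d{date[0][:4]}" if date else ""  # first four digits only
    if sub_c or sub_d:
        fields.append(f"210 ##{sub_c}{sub_d}")
    fields += [f"300 ##$aIdentifier: URL:{i}" for i in values(record, "Identifier")]
    fields += [f"330 ##$a{d}" for d in values(record, "Description")]
    fields += [f"606 ##$a{s}$2lc" for s in values(record, "Subject", "LCSH")]
    fields += [f"676 ##$a{s}" for s in values(record, "Subject", "DDC")]
    fields += [f"701 #0$a{c}" for c in values(record, "Creator", "PersonalName")]
    fields += [f"711 02$a{c}" for c in values(record, "Contributor", "CorporateName")]
    return fields


for field in to_unimarc(DC_RECORD):
    print(field)
```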
The only real difficulty in this mapping is that both the 701 and 711 fields have to assume that the names in the DC.creator and DC.contributors elements are written in natural (direct) order. Because the Dublin Core record also contains TYPE qualifiers for the creator and contributors elements, the mapping is probably more consistent than it would be with less well-defined DC records.
The other major problem with this mapping is that the UNIMARC record lacks the mandatory 24-character Record Label and the 35-character 100 General Processing Data field.
A Dublin Core record might, in the right circumstances, produce a reasonably comprehensive descriptive UNIMARC record. However, the record produced may not be a valid UNIMARC record because it lacks some mandatory fields.
Constructing the Record Label and the General Processing Data will be the biggest problems. The Record Label is always 24 characters long and contains general information which may be needed in processing the record. It is based on ISO 2709. The General Processing Data (100) field contains 35 characters of fixed-length data including the date entered on file, the publication date, character sets and the language of cataloguing.
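To show what constructing the Record Label involves, here is a Python sketch that packs fields into an ISO 2709 structure: a 24-character label, a directory of 12-character entries (tag, field length, starting position) and the data fields. The label positions reflect the UNIMARC layout as understood here and should be checked against the UNIMARC Manual; the defaults for type of record (`l`, electronic resource), encoding level (`3`) and descriptive cataloguing form (`n`, non-ISBD) are assumptions for machine-converted records, and lengths are counted in UTF-8 octets. Field 100 is not generated, and the name `build_iso2709` is hypothetical.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = "\x1f"


def build_iso2709(fields, record_type="l", bib_level="m",
                  encoding_level="3", cataloguing_form="n"):
    """Assemble (tag, content) pairs into an ISO 2709 record.

    Content uses this document's display form: two indicators ('#' = blank)
    followed by '$'-prefixed subfields; 00X control fields are plain values.
    """
    directory = b""
    data = b""
    for tag, content in fields:
        if not tag.startswith("00"):
            indicators = content[:2].replace("#", " ")
            content = indicators + content[2:].replace("$", SUBFIELD_DELIMITER)
        field = content.encode("utf-8") + FIELD_TERMINATOR
        # Directory entry: tag (3) + field length (4) + starting position (5)
        directory += f"{tag}{len(field):04d}{len(data):05d}".encode("ascii")
        data += field
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + len(RECORD_TERMINATOR)
    label = (
        f"{record_length:05d}"   # 0-4   record length
        + "n"                    # 5     record status: new record
        + record_type            # 6     type of record (assumed l)
        + bib_level              # 7     bibliographic level (m = monograph)
        + " "                    # 8     hierarchical level code (blank = undefined)
        + " "                    # 9     undefined
        + "22"                   # 10-11 indicator length, subfield identifier length
        + f"{base_address:05d}"  # 12-16 base address of data
        + encoding_level         # 17    encoding level (assumed 3)
        + cataloguing_form       # 18    descriptive cataloguing form (assumed n)
        + " "                    # 19    undefined
        + "450 "                 # 20-23 directory map
    )
    if len(label) != 24:
        raise ValueError("record label must be exactly 24 characters")
    return label.encode("ascii") + directory + data + RECORD_TERMINATOR


record = build_iso2709([("001", "ukoln-intro"), ("200", "1#$aUKOLN metadata")])
print(record[:24])
```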
With UNIMARC validity in mind, it might be possible to produce better UNIMARC records from Dublin Core records if the conversion were mediated in some way by human cataloguers. This could improve the quality of the resulting UNIMARC records, but at a significant cost in money and time.
Dublin Core elements are extensible, and publishers' DC records might include a variety of organisation-specific elements. This information would be lost during transmission unless specific ways of handling it are developed. The Warwick Framework, or some other software framework that holds together different types of metadata, might be a partial solution to this problem.
UNIMARC, like other MARC formats, is an evolving format. In the future it might become more useful for holding data originally held in DC or other similar formats. The proposed addition of an equivalent of USMARC 856 is one instance of this. Periodic re-mappings might be necessary in an operational environment.
UKOLN is funded by the British Library Research and Innovation Centre and the Joint Information Systems Committee of the UK Higher Education Funding Councils, as well as by project funding from JISC's eLib Programme and the European Union. UKOLN also receives support from the University of Bath, where it is based.
A paper covering similar issues to this one has been written by Alan Hopkinson of Middlesex University for the 1998 IFLA Conference. See: Alan Hopkinson, UNIMARC and Metadata: Dublin Core. 64th IFLA General Conference, Amsterdam, 16-21 August 1998.
This work was carried out for Work Package 4 (Format Conversion Feasibility) of the BIBLINK Project. BIBLINK is funded by the European Union's Telematics Applications Programme.
More information on BIBLINK can be found on the project's Web pages: <URL:http://hosted.ukoln.ac.uk/biblink/>
Maintained by: Michael Day of UKOLN The UK Office for Library and Information Networking, University of Bath.
Document created: 3-Jul-1997.
Last updated: 17-May-1999.