5. Inter-operability between ROADS templates and the Dublin Metadata Core Element Set: Part 1. Mapping Dublin Core v. 0.1 to ROADS templates


DRAFT by Michael Day, UKOLN


5.1 Introduction

The Dublin Metadata Core Element Set (Dublin Core) was devised as a simple set of data elements so that Internet publishers and authors would be able to create their own metadata records. The Dublin Core elements were originally agreed at an OCLC/NCSA Metadata Workshop held in March 1995 at Dublin, Ohio. Weibel, et al. (1995) comment that "automatically generated records often contain too little information to be useful, while manually generated records are too costly to create and maintain for the large number of electronic documents currently available on the Internet". Dublin Core elements are designed to mediate between these extremes, although links could be made from them to more complicated records.

Weibel, et al. (1995) explain that the semantics of the Dublin Core metadata elements were intended to be "clear enough to be understood by a wide range of users". The short definitions given are reproduced below in Table 5.1.


Table 5.1: The thirteen metadata elements agreed at the Dublin Workshop, with short definitions from Weibel, et al. (1995).

Subject: 	The topic addressed by the work
Title: 		The name of the object
Author: 	The person(s) primarily responsible for the intellectual
		content of the object
Publisher: 	The agent or agency responsible for making the object
		available
OtherAgent: 	The person(s), such as editors and transcribers, who
		have made other significant intellectual contributions
		to the work
Date: 		The date of publication
ObjectType: 	The genre of the object, such as novel, poem, or dictionary
Form: 		The data representation of the object, such as Postscript
		file or Windows executable file
Identifier: 	String or number used to uniquely identify the object
Relation: 	Relationship to other objects
Source: 	Objects, either print or electronic, from which this object
		is derived, if applicable
Language: 	Language of the intellectual content
Coverage: 	The spatial locations and temporal durations characteristic
		of the object

The Dublin Core elements have already been mapped to USMARC (MARBI 1995, Caplan and Guenther 1996). A summary table of this mapping is included below (Table 5.2). The process raised many questions, for example about the format of author names.


Table 5.2: Mapping of Dublin Core to USMARC in Discussion Paper 86 (MARBI, 1995).

Dublin Core element	USMARC

Subject:	653 Index Term -- Uncontrolled, or
		650 Subject Added Entry -- Topical Term.
Title:		245 Title Statement.
Author:		100 Main Entry -- Personal Name, or
		110 Main Entry -- Corporate Name, or
		700 Added Entry -- Personal Name, or
		710 Added Entry -- Corporate Name.
Publisher:	260$b Name of Publisher, Distributor, etc.
OtherAgent:	700 Added Entry -- Personal Name, or
		710 Added Entry -- Corporate Name.
Date:		260$c Date of Publication, Distribution, etc.
ObjectType:	Leader/06 Type of Record.
Form:		538 System Details Note.
Identifier:	010 LC Control Number, or
		020 ISBN, or
		022 ISSN, or
		024 Other Standard Identifier, or
		856$u Uniform Resource Locator.
Relation:	772 Parent Record Entry, or
		773 Host Item Entry, or
		775 Other Edition Entry, or
		776 Additional Physical Form Entry, or
		780 Preceding Entry, or
		785 Succeeding Entry, or
		787 Nonspecific Relationship Entry.
Source:		786 Data Source Entry, or
		776 Additional Form Entry.
Language:	041 Language Code, or
		546 Language Note.
Coverage:	Spatial: 034 Coded Cartographic Mathematical Data, or
		255 Cartographic Mathematical Data.
		Temporal: 045 Time Period of Content, or
		513 Type of Report and Period Covered Note.

As the ROADS-based subject services use ROADS / IAFA templates (Deutsch, et al. 1994; Weider 1994; ROADS 1995), mapping Dublin Core to these templates, and vice versa, should provide an interesting test of Dublin Core's potential role as an interchange format between metadata types, particularly in relation to the ROADS project (Heery 1996; Knight and Hamilton 1996).

Dublin Core, however, has not reached its final form. In September 1996 a draft revised version 0.2 of the Core was produced (Kunze 1996), looking to rationalise the scheme and reduce the number of elements. For information, in the draft document the elements have been reduced to ten, each with an abbreviated form and numeric equivalent:

Author		au	1
Title		ti	2
Subject		su	3
Type		ty	4
Date		da	5
Form		fm	6
Language	la	7
Resource	rs	8
Contributor	co	9
Relation	rn	10

In the draft document, there are also three qualifiers:

Role		ro	30
Scheme		sc	31
Flags		fl	32

These are currently still under discussion and Core v. 0.2 will not take this exact form. The mappings in this report are from Dublin Core v. 0.1 to ROADS / IAFA templates, a summary of which can be found in Table 5.3.


Table 5.3: Mapping of Dublin Core to IAFA templates.

Dublin Core element	IAFA template

Subject:	Keyword, Subject-Descriptor-Scheme and Subject-Descriptor
Title:		Title
Author:		Author-name (From Author (USER)* cluster)
Publisher:	Publisher-name (From Publisher (ORGANISATION)* cluster)
OtherAgent:	No direct equivalent
Date:		Creation-date
ObjectType:	Category
Form:		Format-v* or Requirements
Identifier:	URI-v*, ISBN, and ISSN
Relation:	No direct equivalent
Source:		No direct equivalent
Language:	Language
Coverage:	No direct equivalent
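
For conversion software, the summary in Table 5.3 could be held as a simple lookup structure. The following Python sketch is illustrative only: the names DC_TO_ROADS, roads_targets and unmapped_elements are hypothetical, the attribute spellings follow the example records later in this section, and an empty list stands for "No direct equivalent".

```python
# Table 5.3 expressed as a lookup table (illustrative sketch only).
# An empty list means "No direct equivalent" in ROADS / IAFA templates.
DC_TO_ROADS = {
    "Subject": ["Keyword", "Subject-Descriptor-Scheme", "Subject-Descriptor"],
    "Title": ["Title"],
    "Author": ["Author-Name"],
    "Publisher": ["Publisher-Name"],
    "OtherAgent": [],
    "Date": ["Creation-Date"],
    "ObjectType": ["Category"],
    "Form": ["Format-v*", "Requirements"],
    "Identifier": ["URI-v*", "ISBN", "ISSN"],
    "Relation": [],
    "Source": [],
    "Language": ["Language"],
    "Coverage": [],
}


def roads_targets(dc_element):
    """Return the candidate ROADS attributes for a Dublin Core element."""
    return DC_TO_ROADS.get(dc_element, [])


def unmapped_elements():
    """List the Dublin Core elements with no direct ROADS equivalent."""
    return [name for name, targets in DC_TO_ROADS.items() if not targets]
```

A converter could consult roads_targets for each element it meets and report anything returned by unmapped_elements as lost in the conversion.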


5.2 Comments on the mappings

Subject

In Dublin Core a SCHEME sub-element can be used to note which controlled indexing terms are being used, or which classification system is in use. e.g.:

Subject (scheme=LCSH): UNIX (Computer system)
Subject (scheme=Dewey Decimal System): 004.251 Supercomputers--systems design

If the sub-element includes a well known indexing or classification system, then this could be extracted and placed in the ROADS template "Subject-Descriptor-Scheme" and the data itself could be attached in a "Subject-Descriptor". Presumably, well used indexing or classification schemes could be in an authority file so that the machine could identify them accurately. Alternatively, the SCHEME sub-element could map directly to the ROADS "Subject-Descriptor-Scheme", and the attached data in "Subject-Descriptor". However, this would rely on abbreviations for the schemes being used in a consistent manner.

If no SCHEME sub-element is used, the subject terms could be assumed to be suitable for the ROADS template "Keywords". In Dublin Core, the Subject element can contain "any word or phrase that describes the intellectual content of the object" (Weibel, et al. 1995, 5.1). This does not accurately describe a keyword, but as long as the string is searchable in some way this should not matter greatly.

In Dublin Core the data elements are repeatable. Subject elements containing one or more SCHEME sub-elements are possible. All will have to map to their relevant place in a ROADS template.

e.g.:

Keywords:
Keywords:
Subject-Descriptor Scheme-v1: DDC
Subject-Descriptor Scheme-v2: LCSH
	Subject-Descriptor-v1: 004.251 Supercomputers--systems design
	Subject-Descriptor-v2: UNIX (Computer system)
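
The Subject mapping described above could be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the SCHEME_AUTHORITY table is a hypothetical authority file, the function name map_subjects is invented for this example, the attribute names are written in fully hyphenated form, and it is assumed that the Dublin Core record has already been parsed into (scheme, value) pairs.

```python
# Hypothetical authority file: spellings of a scheme name seen in Dublin
# Core records, mapped to the abbreviation used in the ROADS template.
SCHEME_AUTHORITY = {
    "lcsh": "LCSH",
    "dewey decimal system": "DDC",
    "ddc": "DDC",
}


def map_subjects(subjects):
    """Map (scheme, value) Subject pairs to ROADS (attribute, value) lines.

    scheme is None when the Subject element has no SCHEME sub-element,
    in which case the value is treated as a keyword.
    """
    lines = []
    variants = {}  # scheme abbreviation -> variant number
    for scheme, value in subjects:
        if scheme is None:
            lines.append(("Keywords", value))
            continue
        cleaned = scheme.strip()
        # Fall back to the scheme name as given if it is not in the file.
        abbrev = SCHEME_AUTHORITY.get(cleaned.lower(), cleaned)
        if abbrev not in variants:
            variants[abbrev] = len(variants) + 1
            number = variants[abbrev]
            lines.append((f"Subject-Descriptor-Scheme-v{number}", abbrev))
        lines.append((f"Subject-Descriptor-v{variants[abbrev]}", value))
    return lines
```

Each distinct scheme receives its own variant number, so that a descriptor can always be paired with the scheme it came from.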

Title

This should map neatly across to the ROADS template "Title". Dublin Core Title elements include subtitles. Dublin Core titles can be qualified with a SCHEME, if necessary.

Author

Dublin Core Author elements are defined as the "person(s) or agent primarily responsible for the intellectual content of the work" (Weibel, et al. 1995, 5.3). Dublin Core Author elements could be mapped to the Author-Name part of the ROADS template Author-(USER)* cluster. Differences in format are not as crucial here as they would be when mapping to a more complex scheme like MARC. If a Dublin Core SCHEME is added, e.g.:

Author (scheme=USMARC): 100 1 Doyle, Arthur Conan $c Sir $d 1859-1930,

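things get more complicated, as the USMARC tag, indicators and subfield codes would presumably have to be stripped out or interpreted before the name could be used in Author-Name.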

Publisher

Dublin Core Publisher elements are defined as the "entity responsible for making the object available" (Weibel, et al. 1995, 5.4). Dublin Core Publisher elements could be mapped to the Publisher-Name part of the ROADS template Publisher-(ORGANISATION)* cluster.

OtherAgent

The OtherAgent element is intended to describe roles like editing, illustrating, compiling, etc. It can take the form of a free text string:

OtherAgent: Transcribed by the University of Maryland at College Park Libraries 
Humanities Electronic Text Center

or could be defined by a ROLE sub-element:

OtherAgent (role=Editor): Harnad, Stevan
OtherAgent (role=Illustrator): Bailey, Sian

Whichever is used, there is no obvious place in ROADS template based records where this data could be accurately mapped.

Date

The Dublin Core Date of publication element is intended to "reflect the date at which the object became available in its current form" (Weibel, et al. 1995, 5.6). It can take several formats, and can (again) be defined by SCHEME, e.g.:

Date: May 6, 1995
Date (scheme=ANSI X3.30-1985): 950506

It should map to the ROADS template element Creation-Date. There are potential problems with compatibility between date formats. ROADS templates do not specify what form of date should be used. Dublin Core uses a "scheme" so that ANSI X3.30-1985 dates could be used. A conversion program might have to convert this format to a more human-readable form for the ROADS template.
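
A conversion of this kind could look something like the following Python sketch. It is an illustration only: the function name readable_date is hypothetical, the output style ("6 May 1995") is an arbitrary choice since ROADS templates do not prescribe one, and six-digit dates such as the example above are assumed to belong to the twentieth century.

```python
import datetime

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def readable_date(value):
    """Convert a YYMMDD or YYYYMMDD digit string to a readable date."""
    digits = value.strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (6, 8):
        raise ValueError(f"not a YYMMDD or YYYYMMDD date: {value!r}")
    if len(digits) == 6:
        # Assumption: two-digit years belong to the twentieth century.
        year = 1900 + int(digits[:2])
        rest = digits[2:]
    else:
        year = int(digits[:4])
        rest = digits[4:]
    # datetime.date raises ValueError for impossible dates (e.g. month 13).
    checked = datetime.date(year, int(rest[:2]), int(rest[2:]))
    return f"{checked.day} {MONTHS[checked.month - 1]} {checked.year}"
```

Dates given without a scheme, such as "May 6, 1995", would probably be best copied into Creation-Date unchanged, since their format cannot be reliably determined.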

ObjectType

As ObjectType gives the genre or category of the object, it would probably best map to the ROADS template "Category". There are no problems with semantics, although an authority list of the most commonly used terms might be desirable.

Form

The Dublin Core Form element is intended to provide information about the hardware and software requirements to display or operate the object. To this extent, examples like Windows 3.1 executable file, HTML file or ASCII file, would best map to the ROADS template "Format-v*". If there is more than one format given in a Dublin Core record, conversion software would have to generate additional Format-v* elements in the ROADS template automatically. If, however, Dublin Core Form elements are free text descriptions of how the object can be displayed or operated, they would map better to the ROADS template "Requirements".
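
One way of making this choice automatically is sketched below. The rule used is an assumption made for illustration, not part of either specification: a Form value is treated as a format if it carries a scheme or matches an entry in a hypothetical list of known format names (KNOWN_FORMATS), and as free text for "Requirements" otherwise. The function name map_forms is also hypothetical.

```python
# Hypothetical list of format names that conversion software recognises.
KNOWN_FORMATS = {
    "windows 3.1 executable file",
    "html file",
    "ascii file",
    "postscript file",
}


def map_forms(forms):
    """Map (scheme, value) Form pairs to ROADS (attribute, value) lines."""
    lines = []
    variant = 0
    for scheme, value in forms:
        text = value.strip()
        if scheme is not None or text.lower() in KNOWN_FORMATS:
            # One Format-v* attribute is generated per recognised format.
            variant += 1
            lines.append((f"Format-v{variant}", text))
        else:
            # Anything else is assumed to be a free text description.
            lines.append(("Requirements", text))
    return lines
```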

Identifier

The identifier in Dublin Core is the unique string or number used to identify an object. This includes things like ISBNs and ISSNs, as well as URLs. The type of identifier would be given as a scheme, e.g.:

Identifier (scheme=ISBN) = 0-19-097636-X
Identifier (scheme=URL) = http://www.ukoln.ac.uk/metadata/home.html

ROADS Templates include attributes for ISBN, ISSN and URI-v*. With schemes present, URLs, ISBNs and ISSNs could be adequately mapped to ROADS. Other identifiers would not, however, necessarily fit into the ROADS templates.
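
The scheme-based routing of identifiers could be sketched as below. The function name map_identifiers is hypothetical, and treating a scheme of "URI" in the same way as "URL" is an assumption; identifiers with any other scheme, or with none, are returned separately rather than forced into the template.

```python
def map_identifiers(identifiers):
    """Split (scheme, value) Identifier pairs into mapped and unmapped.

    Returns a tuple: a list of ROADS (attribute, value) lines, and a
    list of the original pairs that have no place in the template.
    """
    mapped = []
    unmapped = []
    uri_count = 0
    for scheme, value in identifiers:
        key = (scheme or "").strip().upper()
        if key in ("ISBN", "ISSN"):
            mapped.append((key, value))
        elif key in ("URL", "URI"):
            # Each URL becomes a further URI-v* variant.
            uri_count += 1
            mapped.append((f"URI-v{uri_count}", value))
        else:
            unmapped.append((scheme, value))
    return mapped, unmapped
```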

Relation

The Dublin Core relation element gives the relationship of the object to other objects. This could be to other documents in a hierarchy, or maybe to the parent electronic journal, although other relationships are possible. Dublin Core assumes that two sub-elements will be required, Type and Identifier.

Relation (type=ContainedIn) (identifier=URL) =
http://www.dlib.org/dlib/october96/

ROADS templates do not currently contain relation elements. The Dublin Core relation element will not therefore map to ROADS templates. However, discussion is currently taking place within the ROADS project to ensure that basic relationships (e.g. Parent and Child relationships) can be identified in some way.

Source

The Dublin Core source element refers to the object from which the object being catalogued is derived, e.g. the previous version of a document. It takes the form of an identifier.

Source (scheme=ISBN) = 0-8018-4281-6
Source (scheme=URL) = http://www.ub2.lu.se/tk/demos/DO9603-manus.html

The Source element in ROADS templates is designed to give information as to the source of the object. It is not used in the SERVICE template, but can be included in the DOCUMENT template. This is not necessarily related to an identifier as defined by Dublin Core but is usually a short form of text. A short form of text could presumably be inserted, e.g. "Derived from":

Source: Derived from http://www.ub2.lu.se/tk/demos/DO9603-manus.html
Source: Derived from ISBN 0-8018-4281-6

But this does not always add much to a potential user's understanding. This mapping remains problematic.

Language

Language in Dublin Core specifies the language of the intellectual content of the object.

Language = English
Language = Swedish

Again, abbreviations can be used and the source included as a scheme:

Language (scheme=USMARC) = spa

In ROADS templates, the Language-v* template is used for the language in which the object is written. It can also be used for the programming language in a SOFTWARE template.

Coverage

The Dublin Core coverage element describes spatial and temporal characteristics of an object. It would be used for GIS or geospatial data, or something requiring time elements.

Coverage (type = spatial) = The Atlantic Ocean
Coverage (type = spatial, scheme = LATLONG) = {West = 180,
East = 180, North = 90, South = 90}
Coverage (type = temporal, scheme = ANSI X3.30-1985) =
{Begin = 19910101, End = 19930601}

There is no ROADS template equivalent of this element, although it could provide part of the Description.


5.3 Examples of Mapping

Example 1

The 1995 OCLC Dublin Core metadata workshop report gave some examples of records encoded using the Dublin Core. The first was created by a subject specialist without specific library cataloguing experience.

Dublin Core record:
Subject: IETF, URI, Uniform Resource Identifiers
Title: A Unifying Syntax for the Expression of Names and Addresses of Objects 
on the Network as used in the World-Wide Web.
Title: (Subtitle) Universal Resource Identifiers in WWW
Author: Berners-Lee, T.
Publisher: CERN
Date: 1994
Object-Type: Internet RFC
Form (scheme=IMT): text/plain
Identifier(scheme=URL): gopher://gopher.es.net:70/0R0-57601-/pub/rfcs/rfc1630.txt
Relation (type=child)(identifier=URL): http://ds.internic.net/ds/dspg1intdoc.html
Relation (type=sibling)(identifier=URL): http://ds.internic.net/rfc/rfc1738.txt

IAFA / ROADS template record:
Author-Name: Berners-Lee, T.
Category: Internet RFC
Creation-Date: 1994
Format: text/plain
Keyword: IETF, URI, Uniform Resource Identifiers
Publisher-Name: CERN
Title: A Unifying Syntax for the Expression of Names and Addresses of 
Objects on the Network as used in the World-Wide Web.
Title: Universal Resource Identifiers in WWW
Template-Type: DOCUMENT
URI-v1: gopher://gopher.es.net:70/0R0-57601-/pub/rfcs/rfc1630.txt

Notes:

Most of the record maps quite easily onto the ROADS template. The conversion software will, however, somehow have to work out whether a DOCUMENT, SERVICE or other Template-Type is required.

The use of both Title and Title (Subtitle) in the Dublin Core record is confusing, and would result in two Title elements in the ROADS template. If, however, conversion software could recognise the (Subtitle), then it could conceivably add the relevant syntax:

Title: A Unifying Syntax for the Expression of Names and 
Addresses of Objects on the Network as used in the World-Wide 
Web: Universal Resource Identifiers in WWW
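
The merging of title and subtitle shown above could be sketched as follows. It assumes that the Title elements have already been parsed into (qualifier, value) pairs, with the qualifier "Subtitle" marking a subtitle; the function name merge_titles and the colon-and-space separator are choices made for this illustration.

```python
def merge_titles(titles):
    """Merge a main title and any subtitles into one ROADS Title value.

    titles is a list of (qualifier, value) pairs; qualifier is None for
    an unqualified Title. Returns the list of Title values to write out.
    """
    def clean(value):
        # Collapse line wraps and runs of white space.
        return " ".join(value.split())

    def is_subtitle(qualifier):
        return qualifier is not None and qualifier.strip().lower() == "subtitle"

    mains = [clean(value) for qualifier, value in titles if not is_subtitle(qualifier)]
    subs = [clean(value) for qualifier, value in titles if is_subtitle(qualifier)]
    if not mains:
        return subs
    if not subs:
        return mains
    # Drop a trailing full stop from the main title before the colon.
    merged = ": ".join([mains[0].rstrip(".")] + subs)
    return [merged] + mains[1:]
```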

The Relation elements in Dublin Core are completely ignored.


Example 2

Dublin Core record
Subject: 
	scheme=LCSH:	Internet (Computer network)
			Cataloging of computer files
			Information networks
			Computer networks
			Libraries--Communication systems
			Information storage and retrieval systems

Title:			Assessing Information on the Internet: Toward
			Providing Library Services for Computer 
			Mediated Communication

Author:			Martin Dillon
Author:			Erik Jul
Author:			Mark Burge
Author:			Carol Hickey

Publisher:		OCLC

Date:			1994

Identifier:
	Scheme=OCLC:	155653163X

Object type:
	Scheme=AACR2:	monograph

Form: 			7 postscript files
			1 Unix tar file

Relation:		For a Web page listing Internet accessible
			OCLC research publications go to: 
			http://www.oclc.org/oclc/menu/reschdoc.htm

Language:		English

Source:
	scheme=DublinCore:	Subject:
					scheme=LCSH:
						Internet (Computer network) 
						Cataloging of computer files
						Information networks
						Computer networks
						Libraries--Communication systems
						Information storage and retrieval
							systems
				Title: 	Assessing Information on
					the Internet: Toward 
					Providing Library Services 
					for Computer Mediated 
					Communication
				Author: 	Martin Dillon
				Author: 	Erik Jul
				Author: 	Mark Burge
				Author: 	Carol Hickey
				Identifier:
					scheme=OCLC Technical Report
					Number:		1234567
				Date: 	1993
				Object type:
					scheme=AACR2:	monograph
				Form:
					Scheme=AACR2:	1 v. (various
							pagings) : ill.
							; 29 cm.
				Publisher:	OCLC

ROADS template record:
Author-Name: Carol Hickey
Author-Name: Erik Jul
Author-Name: Mark Burge
Author-Name: Martin Dillon
Category: monograph
Creation-Date: 1994
Format-v1: 7 postscript files, 1 Unix tar file
Language-v1: English
Publisher-Name: OCLC
Subject-Descriptor Scheme-v1: LCSH
Subject-Descriptor-v1: Cataloging of computer files
Subject-Descriptor-v1: Computer networks
Subject-Descriptor-v1: Information networks
Subject-Descriptor-v1: Information storage and retrieval systems
Subject-Descriptor-v1: Internet (Computer network)
Subject-Descriptor-v1: Libraries--Communication systems
Template-Type: DOCUMENT
Title: Assessing Information on the Internet: Toward Providing 
Library Services for Computer Mediated Communication

Notes:

The complex Dublin Core Source element did not map to the ROADS template.

The Relation element is missing from the ROADS template record.


Example 3

Dublin Core record
Title: On the Pulse of Morning
Author: Maya Angelou
Publisher: University of Virginia Library Electronic Text Center
OtherAgent: Transcribed by the University of Virginia Electronic Text 
Center
Date: 1993
Object: Poem
Form: 1 ASCII file
Source: Newspaper stories and oral performance of text at the 
presidential inauguration of Bill Clinton
Language: English

ROADS template record
Author-Name: Maya Angelou
Category: Poem
Creation-Date: 1993
Format-v1: 1 ASCII file
Language-v1: English
Publisher-Name: University of Virginia Library Electronic Text Center
Source: Newspaper stories and oral performance of text at the 
presidential inauguration of Bill Clinton
Template-Type: DOCUMENT
Title: On the Pulse of Morning

Notes:

The DC OtherAgent element, without a further qualifier (e.g. publisher, compiler), does not map onto a ROADS attribute. In this case, however, this is not a major problem as the transcriber is also the publisher.


5.4 References


Maintained by: Michael Day of UKOLN.
Last updated: 18 November 1996.