Note: this is a pre-print version of an article published in Ariadne issue 27. Please read and cite the published version.
http://www.ariadne.ac.uk/issue27/metadata/
Andy Powell, UKOLN, University of Bath
Ann Apps, MIMAS, University of Manchester
This article proposes a mechanism for embedding machine-parsable citations into Dublin Core (DC) metadata records [1] based on the OpenURL [2]. It suggests providing partial OpenURLs using the DC Identifier, Source and Relation elements together with an associated 'OpenURL' encoding scheme. It summarises the relevance of this technique to reference linking and considers mechanisms for providing richer bibliographic citations. A mapping between OpenURL attributes and Dublin Core Metadata Element Set (DCMES) [3] elements is provided.
The OpenURL provides a mechanism for encoding a citation for an information resource, typically a bibliographic resource, as a URL. The OpenURL is, in effect, an actionable URL that transports metadata or keys to access metadata for the object for which the OpenURL is provided. The target of the OpenURL is an OpenURL resolver that offers localized services in an open linking environment. The OpenURL resolver is typically referred to as the user's Institutional Service Component (ISC). The remainder of the OpenURL transports the citation.
The citation is provided by either using a global identifier for the resource, for example a Digital Object Identifier (DOI) [4], or by encoding metadata about the resource, for example title, author, journal title, etc., or by some combination of both approaches. It is also possible to encode a local identifier for the resource within the OpenURL. In combination with information about where the OpenURL was created, this allows software that receives the OpenURL to request further metadata about the information resource. However, this article focuses on the OpenURL metadata encoding mechanism rather than on the specific details of how OpenURLs are processed and used by resolvers and other software.
Originally known as the SFX-URL, the OpenURL's roots lie in the SFX research on reference linking in hybrid library environments [5]. At the time of writing, the OpenURL is most appropriate for citing bibliographic resources, although this is expected to change as the OpenURL develops and moves through the standardization process. Furthermore, the OpenURL has been developed primarily to support 'reference linking' applications. On its own, it does not provide enough richness to form the basis for detailed, full bibliographic citations; for example, it includes only the first author of the work.
An OpenURL comprises two parts, a BASEURL and a QUERY. The BASEURL identifies the OpenURL resolver that will provide context-sensitive services for the OpenURL. The BASEURL is specific to the particular user to whom the OpenURL is being sent: it typically identifies the ISC offered by the institution to which the user belongs. Services that embed OpenURLs in their Web interfaces, for example in their search results, must develop mechanisms for associating a BASEURL with each end-user. One way of doing this is to store the BASEURL in a cookie in the user's Web browser; another is to store the BASEURL along with other user preferences.
The QUERY part can be made up of one or more DESCRIPTIONs. Each DESCRIPTION comprises the metadata attributes and values that make up the citation for the resource. A full breakdown of the components of the DESCRIPTION is not provided here. See the OpenURL specification for full details [6].
Here is an example OpenURL:
http://resolver.ukoln.ac.uk/openresolver/?sid=ukoln:ariadne&genre=article
&atitle=Information%20gateways:%20collaboration%20on%20content
&title=Online%20Information%20Review&issn=1468-4527&volume=24
&spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel
In this example the BASEURL is <http://resolver.ukoln.ac.uk/openresolver/>, the URL of the UKOLN OpenResolver demonstrator service. The rest of the OpenURL is the QUERY, which is made up of a single DESCRIPTION of an article entitled 'Information gateways: collaboration on content' by Rachel Heery. The article was published in 'Online Information Review' volume 24.
Notice that, because the OpenURL is a URL, it is encoded in such a way that special characters, for example space characters, are represented by a percentage sign followed by two hex digits. This process is known as mandatory escape encoding.
(Note that all the OpenURL examples in this article have been split across multiple lines for display purposes. Note also that the optional OpenURL 'sid' attribute, set here to 'ukoln:ariadne', indicates the service that generated the OpenURL. For simplicity, other example OpenURLs in this article do not contain a 'sid' attribute.)
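The split between BASEURL and QUERY described above can be illustrated with a minimal Python sketch. The function name `split_openurl` is hypothetical, and the sketch assumes a QUERY holding a single DESCRIPTION with no repeated attributes; it is not part of the OpenURL specification.

```python
from urllib.parse import urlsplit, parse_qsl


def split_openurl(openurl):
    """Split a full OpenURL into its BASEURL and a dict of QUERY attributes.

    Assumption: the QUERY carries one DESCRIPTION and no attribute is
    repeated, so a plain dict is enough to hold it.
    """
    parts = urlsplit(openurl)
    baseurl = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qsl reverses the escape encoding (e.g. %20 becomes a space).
    attributes = dict(parse_qsl(parts.query))
    return baseurl, attributes


# The example OpenURL from this article, joined back into one string.
example = (
    "http://resolver.ukoln.ac.uk/openresolver/?sid=ukoln:ariadne&genre=article"
    "&atitle=Information%20gateways:%20collaboration%20on%20content"
    "&title=Online%20Information%20Review&issn=1468-4527&volume=24"
    "&spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel"
)
baseurl, attributes = split_openurl(example)
```

Here `baseurl` holds the resolver address and `attributes` holds the decoded citation, for instance `attributes["atitle"]` for the article title.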
This article makes two proposals. Firstly, that an OpenURL may be given as the value of a DC Identifier element as a way of providing a citation for the resource being described by the DC record. Secondly, that an OpenURL may also be given as the value of a DC Source or Relation element as a way of providing citations for resources that are related to the resource being described.
The mechanism used in both cases is the same - a partial OpenURL is placed in the element value. A partial OpenURL is an OpenURL without a BASEURL. This is because, at the time at which the OpenURL is placed into the DC element value, there is no knowledge of which end-user(s) will receive the OpenURL. It is therefore not possible or sensible to embed the BASEURL part of the OpenURL in the element value. Only the DESCRIPTION part of the OpenURL should be placed in the element value.
Furthermore, a DC encoding scheme [7] of 'OpenURL' should be used to indicate that the value forms part of an OpenURL. Because the DESCRIPTION part of an OpenURL does not, on its own, form a URL, full mandatory escape encoding of the DESCRIPTION is not required. However, any ampersand ('&') characters that appear within the value of an OpenURL attribute must be encoded, so that they are not confused with the '&' characters that separate attributes.
Software that processes DC metadata records containing OpenURL DESCRIPTIONs will have to unencode any encoded '&' characters, add a BASEURL and escape encode the resulting string in order to deliver full OpenURLs to the end-user.
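As a rough illustration of that processing step, the following Python sketch turns a partial OpenURL taken from a DC element value into a full OpenURL. The function name `full_openurl` is hypothetical, and the sketch assumes that an ampersand inside an attribute value is stored as the percent-escape `%26`; other encodings would need a different decoding step.

```python
from urllib.parse import quote, unquote


def full_openurl(baseurl, description):
    """Combine a resolver BASEURL with an OpenURL DESCRIPTION from a DC record."""
    pairs = []
    for chunk in description.split("&"):
        chunk = chunk.strip()  # tolerate line-break whitespace around '&'
        if not chunk:
            continue
        name, _, value = chunk.partition("=")
        # Assumption: '&' inside a value was stored as '%26'; unquote
        # restores it (and any other percent-escape) before re-encoding.
        pairs.append((name.strip(), unquote(value.strip())))
    # Escape encode every value; ':' and '/' are left readable.
    query = "&".join(
        f"{quote(name, safe='')}={quote(value, safe=':/')}" for name, value in pairs
    )
    separator = "&" if "?" in baseurl else "?"
    return baseurl + separator + query
```

For example, calling `full_openurl` with the UKOLN demonstrator address and the DESCRIPTION for the Heery article yields the example OpenURL shown earlier, minus its 'sid' attribute.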
In order to provide a citation for the resource being described by a DC record, place an OpenURL DESCRIPTION for the resource in the value of a DC Identifier element and indicate a scheme of 'OpenURL'.
Here is an example, encoded using the XHTML <meta> tag:
<meta name="DC.Identifier" scheme="OpenURL"
  content="genre=article
  &atitle=Information gateways: collaboration on content
  &title=Online Information Review&issn=1468-4527&volume=24
  &spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel" />
Note that the 'OpenURL' scheme is not yet formally recognised by the Dublin Core Metadata Initiative as a recommended Dublin Core qualifier.
A fuller set of XHTML <meta> tags for this resource might be:
<meta name="DC.Title" content="Information gateways: collaboration on content" />
<meta name="DC.Creator" content="Heery, Rachel" />
<meta name="DC.Identifier" scheme="OpenURL"
  content="genre=article
  &atitle=Information gateways: collaboration on content
  &title=Online Information Review&issn=1468-4527&volume=24
  &spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel" />
In this case some information is duplicated in both the OpenURL DESCRIPTION and DC elements. This article makes no recommendations about whether it is sensible to duplicate the metadata in this way.
Note that for some applications, the citation provided by the OpenURL DESCRIPTION will not be sufficiently detailed. In such cases, a rich citation for the resource being described by the metadata record may only be achieved by combining the OpenURL DESCRIPTION with DCMES elements and possibly elements from other namespaces.
In order to provide a citation for a resource that is related to the resource being described, place an OpenURL DESCRIPTION for the related resource in the value of a DC Source or Relation element and indicate a scheme of 'OpenURL'.
For example, imagine that an HTML version of the journal article mentioned above is made available on the Web. Its embedded metadata might be:
<meta name="DC.Title" content="Information gateways: collaboration on content">
<meta name="DC.Creator" content="Heery, Rachel">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://www.ukoln.ac.uk/~lisrmh/infogate.html">
<meta name="DC.Source" scheme="OpenURL"
  content="genre=article&sid=ukoln:
  &atitle=Information gateways: collaboration on content
  &title=Online Information Review&issn=1468-4527&volume=24
  &spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel">
<meta name="DC.Relation.references" scheme="OpenURL"
  content="id=doi:10.1045/december99-dempsey&genre=article
  &atitle=International Information Gateway Collaboration: report of the first IMesh Framework Workshop
  &title=D-Lib Magazine&issn=1082-9873&date=1999-12&volume=5
  &artnum=12&aulast=Dempsey&aufirst=Lorcan">
This DC record refers to two related resources - the original journal article from which the Web version is derived (using DC Source) and an article published in D-Lib Magazine that is cited in the article (using DC Relation).
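Software that harvests such records needs to pick out the elements that carry the 'OpenURL' scheme. A minimal sketch using Python's standard `html.parser` module might look like the following; the class name `OpenURLMetaParser` and the shortened sample record are illustrative assumptions, not part of the proposal.

```python
from html.parser import HTMLParser


class OpenURLMetaParser(HTMLParser):
    """Collect (DC element name, OpenURL DESCRIPTION) pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        # Also called for XHTML-style self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("scheme") or "").lower() == "openurl":
            self.descriptions.append(
                (attributes.get("name"), attributes.get("content"))
            )


# Shortened, hypothetical sample based on the record above.
sample = """
<meta name="DC.Title" content="Information gateways: collaboration on content">
<meta name="DC.Source" scheme="OpenURL"
  content="genre=article&title=Online Information Review&volume=24">
<meta name="DC.Relation.references" scheme="OpenURL"
  content="id=doi:10.1045/december99-dempsey&genre=article">
"""
parser = OpenURLMetaParser()
parser.feed(sample)
```

After parsing, `parser.descriptions` lists the DC Source and DC Relation citations, while elements without the 'OpenURL' scheme are ignored.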
The example OpenURLs shown above are ideal for supporting 'reference linking' applications. However, in some cases more detailed citation information may be required.
Consider this example DC record for a journal article:
<meta name="DC.Title" content="International Information Gateway Collaboration: report of the first IMesh Framework Workshop">
<meta name="DC.Creator" content="Lorcan Dempsey">
<meta name="DC.Creator" content="Tracy Gardner">
<meta name="DC.Creator" content="Michael Day">
<meta name="DC.Creator" content="Titia van der Werf">
<meta name="DC.Publisher" content="Corporation for National Research Initiatives">
<meta name="DC.Date" content="1999-12">
<meta name="DC.Type" content="article">
<meta name="DC.Language" content="en-us">
<meta name="DC.Rights" content="Copyright (c) 1999 Lorcan Dempsey, Tracy Gardner, Michael Day, and Titia van der Werf">
<meta name="DC.Identifier" scheme="DOI" content="10.1045/december99-dempsey">
<meta name="DC.Identifier" content="http://www.dlib.org/dlib/december99/12dempsey.html">
<meta name="DC.Identifier" scheme="OpenURL"
  content="id=doi:10.1045/december99-dempsey
  &genre=article
  &atitle=International Information Gateway Collaboration: report of the first IMesh Framework Workshop
  &title=D-Lib Magazine&issn=1082-9873&date=1999-12&volume=5
  &artnum=12&aulast=Dempsey&aufirst=Lorcan">
Notice that there is information contained in the DC elements that is not available in the OpenURL - for example the names of multiple authors. There is also information in the OpenURL that is not available in the DC elements, and that could not be embedded into DC elements - for example the volume and article numbers. There is information that is more accessible for machine parsing in the OpenURL such as the author's family and given names. Finally, there is some information that is duplicated in both the DC elements and in the OpenURL.
(Note: in the general case, one can imagine information about the affiliations of the authors also being embedded into the DC metadata, though details of the mechanism to do this have not yet been agreed by the DCMI.)
In some cases it might be useful to remove the duplicated information from the DC record. One approach would be to remove attributes from the OpenURL DESCRIPTION, where that information is available in other DC elements. So, in the DC record above, the 'atitle' and 'id' attributes might be removed. In other cases it might also be possible to remove the 'date', 'aufirst' and 'aulast' attributes as well. Software that processes the DC record could attempt to reconstruct a full OpenURL by adding information to the partial DESCRIPTION based on the DC element values.
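One way such reconstruction could work is sketched below in Python. The function name `complete_description` and the choice of which DC elements fill which attributes are assumptions made for illustration; in particular, DC Title is mapped to 'atitle' on the assumption that the record describes an article.

```python
def complete_description(description, dc_elements):
    """Fill in OpenURL attributes omitted from a partial DESCRIPTION.

    dc_elements is a list of (name, scheme, content) tuples taken from the
    record's other DC elements. Attributes already present in the
    DESCRIPTION are never overwritten.
    """
    attributes = {}
    for chunk in description.split("&"):
        name, _, value = chunk.strip().partition("=")
        if name:
            attributes[name] = value.strip()

    for name, scheme, content in dc_elements:
        if name == "DC.Title":
            # Assumption: genre is 'article', so the DC title is the article title.
            attributes.setdefault("atitle", content)
        elif name == "DC.Date":
            attributes.setdefault("date", content)
        elif name == "DC.Identifier" and scheme == "DOI":
            attributes.setdefault("id", "doi:" + content)
    return attributes
```

Applied to the D-Lib Magazine record above with 'atitle', 'id' and 'date' removed from the DESCRIPTION, this restores the three attributes from the DC Title, DC Identifier (DOI) and DC Date elements.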
However, in many cases, particularly where metadata is embedded into a resource dynamically based on a back-end database, the cost of duplicating information in both DC elements and the OpenURL is probably not very high. Clearly, where metadata and OpenURLs are created and maintained manually, there will be consistency implications for any duplicated information.
The table below gives the definitions of the current OpenURL attributes:
The table below provides a mapping from OpenURL attributes to unqualified DC elements.
The table shows OpenURL attributes against the genres for which they are allowed to be used. Mappings to DC elements are shown at appropriate points. An X in the table indicates that the OpenURL attribute may be used with the particular genre, but that there is no sensible DC mapping at that point.
The OpenURL 'genre' can be mapped to the DC Type element, although the list of OpenURL genres does not correspond with the list of types in the recommended DCMIType encoding scheme qualifier [8].
Note that five (author-related) OpenURL attributes are shown mapping to the DC Creator and Contributor elements. In general, several of these OpenURL attributes must be combined to form a complete DC Creator or Contributor value (for example aufirst and aulast). Depending on the formatting of a DC Creator or Contributor element value, mapping back from DC to these OpenURL attributes may be difficult because of the problems of splitting a single name into multiple components.
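The asymmetry between the two directions can be seen in a small Python sketch. Both function names are hypothetical, only the 'aulast' and 'aufirst' attributes are handled, and the "Family, Given" form of the DC Creator value is an assumption based on the "Heery, Rachel" example used earlier.

```python
def creator_from_openurl(attributes):
    """Build a DC Creator value in 'Family, Given' form from OpenURL attributes."""
    last = attributes.get("aulast", "").strip()
    first = attributes.get("aufirst", "").strip()
    if last and first:
        return f"{last}, {first}"
    return last or first


def openurl_from_creator(creator):
    """Map a DC Creator value back to OpenURL author attributes.

    Only the 'Family, Given' form is split. A value such as
    'Titia van der Werf' has no reliable split point, so an empty
    dict is returned rather than a guess.
    """
    if "," not in creator:
        return {}
    last, _, first = creator.partition(",")
    return {"aulast": last.strip(), "aufirst": first.strip()}
```

Going from OpenURL to DC is a simple concatenation, whereas the reverse direction only succeeds when the DC value follows a known formatting convention.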
A richer crosswalk would be possible using qualified Dublin Core elements but this has not been presented here.
A request for fast-track standardization of the OpenURL was approved by NISO during its December 2000 SCD meeting. The expectation is that "NISO's aim will be to move rapidly towards a Draft Standard for Trial Use". Work is currently underway with NISO to establish a Steering Committee to work on the standardization. However, at the time of writing no firm timescales had been established.
It is anticipated that there will be some changes to the OpenURL specification during the standardization process. The nature of the changes will be:
(The authors would like to thank Herbert Van de Sompel, Cornell University, for providing background information for this section.)
The DC Citation Working Group was set up in November 1998 and was responsible for identifying standard methods for including bibliographic citation information about resources in their own metadata, and related problems of identifying resource version information. The group concentrated specifically on an article's placement within a journal, volume, and issue. The group has made several proposals for qualifiers to the Dublin Core Metadata Element Set (DCMES) to achieve this aim. Specifically:
JournalTitleFull
JournalTitleAbbreviated
JournalVolume
JournalIssueNumber
JournalPages
with the associated semantic definitions of these terms. While this set does not cover every eventuality, it deals with the vast majority of cases and, together with the article metadata in DC.Title, DC.Creator and DC.Date, gives complete information for any reference-citation record that anyone might want to extract.
It is worth noting that the working group's proposed structured-value set can be mapped directly to available OpenURL attributes as follows:
Proposed structured value | OpenURL attribute |
---|---|
JournalTitleFull | title |
JournalTitleAbbreviated | stitle |
JournalVolume | volume |
JournalIssueNumber | issue |
JournalPages | spage, epage, pages |
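The mapping in this table could be applied mechanically, as in the Python sketch below. The function name `openurl_attributes` is hypothetical, and the sketch assumes that a JournalPages value is written as a simple 'start-end' range; values in any other form are passed through in 'pages' only.

```python
# Direct one-to-one mappings taken from the table above.
STRUCTURED_VALUE_TO_OPENURL = {
    "JournalTitleFull": "title",
    "JournalTitleAbbreviated": "stitle",
    "JournalVolume": "volume",
    "JournalIssueNumber": "issue",
}


def openurl_attributes(citation):
    """Convert a dict of proposed structured values to OpenURL attributes."""
    attributes = {}
    for name, value in citation.items():
        if name in STRUCTURED_VALUE_TO_OPENURL:
            attributes[STRUCTURED_VALUE_TO_OPENURL[name]] = value
        elif name == "JournalPages":
            attributes["pages"] = value
            # Assumption: a page range is written 'start-end', e.g. '40-45'.
            start, separator, end = value.partition("-")
            if separator and start.strip() and end.strip():
                attributes["spage"] = start.strip()
                attributes["epage"] = end.strip()
    return attributes
```

For the Online Information Review article used earlier, a JournalPages value of '40-45' would produce 'spage=40' and 'epage=45' alongside 'pages=40-45'.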
More recently the working group began discussing a related problem of how to capture bibliographic citation information about conference papers, with a view to including other bibliographic genres in the future. OpenURLs provide a way to encode citation information for books, book parts, conference proceedings and papers. However, some conference proceedings are also journal issues. In this case, to capture citation information for an article as both a conference item and a journal item, it would be necessary to include two OpenURLs within repeated DC Identifier elements.
Therefore, the OpenURL DESCRIPTION appears to offer all the functionality identified by the working group for encoding bibliographic citations for simple resource discovery, albeit using a less human-readable syntax than that proposed by the working group. However, it may not offer the required functionality for individual Dublin Core based applications.
(The authors would like to thank Cliff Morgan, John Wiley & Sons, Ltd. (previous chair of the DC-Citation Working Group) for supplying background information for this section.)
The main purpose of this article has been to propose the adoption of an 'OpenURL' encoding scheme for the DC Identifier, Source and Relation elements. By doing this, the DCMI will provide users of DC metadata with a simple method of encoding machine-readable citations for bibliographic resources within their metadata, in particular supporting a mechanism for linking between digital resources and non-digital resources. We have also provided a crosswalk between unqualified DC and the OpenURL attributes and shown how a combination of both OpenURLs and DC metadata can be used to provide richer citations than those provided by either technology on its own.