Title:

Guidelines for assigning identifiers to metadata terms

Creator:
Andy Powell
UKOLN, University of Bath, UK
Date Issued:
2004-08-01
Identifier:
http://www.ukoln.ac.uk/metadata/dcmi/term-identifier-guidelines/
Replaces:
 
Is Replaced By:
Not applicable
Latest Version:
http://www.ukoln.ac.uk/metadata/dcmi/term-identifier-guidelines/
Status of Document:
This is a DRAFT DCMI Recommended Resource.
Description of Document: This document provides some simple guidelines for assigning identifiers to non-DCMI metadata terms (elements, element refinements, encoding schemes and vocabulary terms).

1. Introduction

The DCMI Abstract Model [DCMI-AM] requires that every term (element, element refinement, encoding scheme or controlled vocabulary term) used in a metadata application profile that is compliant with the model be assigned a URI [RFC3986] that identifies the term. An XML namespace [XML-NAMES] is a collection of names, identified by a URI, that are used in XML documents as element types and attribute names. By convention, all DCMI recommended encodings [DCMI-ENCODINGS] use a concatenation of an XML namespace URI and the term name to provide a mechanism for encoding the term URI. The use of XML namespaces and URIs to uniquely identify metadata terms allows those terms to be used unambiguously across applications, promoting the possibility of shared semantics. As indicated in the DCMI Namespace Policy [DCMI-NAMESPACE], DCMI has adopted this mechanism for the identification of all DCMI terms.
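The concatenation convention can be illustrated with a minimal Python sketch. It assumes a small, made-up XML record (the record element and the title value are hypothetical) that uses the DCMI element set namespace http://purl.org/dc/elements/1.1/, and it relies on the way the standard library's ElementTree parser reports qualified names as {namespace}localname. It is an illustration of the convention, not a DCMI-endorsed implementation.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical record; only the dc: namespace URI is a real DCMI namespace.
DOC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
</record>"""


def term_uris(xml_text: str) -> List[str]:
    """Return the term URI for every namespaced element in the document.

    ElementTree exposes a qualified element name as '{namespace-uri}local-name';
    concatenating the two parts gives the term URI.
    """
    uris = []
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith("{"):
            namespace_uri, local_name = element.tag[1:].split("}", 1)
            uris.append(namespace_uri + local_name)
    return uris


print(term_uris(DOC))
```

Elements without a namespace, such as the hypothetical record wrapper, are skipped because they do not map to a term URI under this convention.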

This document provides some simple guidelines for assigning URIs to metadata terms in non-DCMI namespaces. This includes non-DCMI elements, element refinements, encoding schemes and controlled vocabulary terms.

Although these guidelines are mainly intended for metadata application profiles that conform with the DCMI Abstract Model, it is hoped that they are generic enough that they may be useful in the context of other metadata applications as well.

2. Guidelines

All metadata terms must be assigned a URI. The use of fragment identifiers in the URI used to identify metadata terms is optional and is left to the discretion of the implementor.

For the purposes of encoding, the term URI may be partitioned into an XML namespace URI and the term name. Note that, for convenience, it is commonly the case that XML namespace URIs end with either a '#' (hash) or '/' (slash) character.
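As a rough illustration of this partitioning, the following Python sketch splits a term URI at its last '#' or '/' character. The function name split_term_uri is hypothetical, and the rule "split at the last hash or slash" is an assumption based on the common convention described above; a namespace URI is not required to end with either character.

```python
from typing import Tuple


def split_term_uri(term_uri: str) -> Tuple[str, str]:
    """Split a term URI into (XML namespace URI, term name).

    Assumes the namespace URI ends at the last '#' or '/' in the term URI.
    """
    cut = max(term_uri.rfind("#"), term_uri.rfind("/"))
    if cut == -1 or cut == len(term_uri) - 1:
        raise ValueError("no term name found after '#' or '/': " + term_uri)
    return term_uri[: cut + 1], term_uri[cut + 1 :]


print(split_term_uri("http://myproject.org/metadata/vocabs/color#Red"))
print(split_term_uri("http://purl.org/rdn/terms/dateReviewed"))
```

Concatenating the two parts again always yields the original term URI, which is what the encodings rely on.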

Groups of related terms (for example, all the terms within a controlled vocabulary) should be assigned URIs within the same XML namespace.

All XML namespace and term URIs should resolve to human and/or machine-readable descriptions of the namespace or term.

Any valid URI [RFC3986] may be used to identify a metadata term. However, the use of a registered URI scheme is recommended [URI-SCHEMES].
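One way an implementor might check this recommendation is sketched below in Python. The set KNOWN_SCHEMES is an illustrative, hand-picked subset and an assumption of this example; the authoritative list is the IANA registry [URI-SCHEMES], which should be consulted directly.

```python
from urllib.parse import urlsplit

# Illustrative subset only (assumption); see [URI-SCHEMES] for the full registry.
KNOWN_SCHEMES = {"http", "https", "ftp", "urn", "mailto"}


def uri_scheme(uri: str) -> str:
    """Return the lower-cased scheme component of a URI ('' if there is none)."""
    return urlsplit(uri).scheme.lower()


def uses_known_scheme(uri: str) -> bool:
    """Report whether the URI's scheme appears in the illustrative subset."""
    return uri_scheme(uri) in KNOWN_SCHEMES


print(uses_known_scheme("http://purl.org/rdn/terms/dateReviewed"))
print(uses_known_scheme("x-example:some-term"))
```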

All XML namespace and term URIs should be assigned with the intention that they be unique and persistent. This means that the URI must not be used to identify anything else and that it should be expected to last as long as the Internet.

3. Strategies for assigning URIs

Four simple strategies for assigning URIs to metadata terms are described below.

3.1 Using service or project URLs

Where a term is created within the context of a particular project, service or other initiative, the use of a project or service-specific URL may be appropriate. This is probably the simplest strategy in terms of ease of assignment and resolution. However, it is also the most prone to lack of persistence.

Example 1: http://myservice.org/terms/price
An existing service is delivered using the myservice.org DNS domain name. The service creates a new property called price for use in its metadata application profile. The service defines an XML namespace URI within its existing URL space (http://myservice.org/terms/) and therefore assigns the term the following URI: http://myservice.org/terms/price.
Example 2: http://myproject.org/metadata/vocabs/color#Red
A project Web-site is delivered using the myproject.org DNS domain name. The project team build up a new controlled vocabulary of colors for use within their metadata application profile. They define an XML namespace URI within their existing URL space (http://myproject.org/metadata/vocabs/color#). For the vocabulary term Red, the term URI is therefore http://myproject.org/metadata/vocabs/color#Red

Notice that example 1 defines a metadata property while example 2 defines a term within a controlled vocabulary. Remember that in example 2 it will probably also be necessary to define an encoding scheme name for the vocabulary itself, for example http://myproject.org/metadata/terms/Color.
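The assignment in example 2 can be sketched in a few lines of Python. The Vocabulary class is hypothetical, and the terms Green and Blue are invented here purely for illustration; only Red appears in the example above. The sketch shows related terms sharing a single XML namespace URI, with each term URI formed by concatenation.

```python
from typing import Dict, Iterable


class Vocabulary:
    """A group of related terms that share one XML namespace URI."""

    def __init__(self, namespace_uri: str, term_names: Iterable[str]) -> None:
        self.namespace_uri = namespace_uri
        self.term_names = list(term_names)

    def term_uri(self, name: str) -> str:
        """Return the term URI: namespace URI followed by the term name."""
        if name not in self.term_names:
            raise KeyError("unknown term: " + name)
        return self.namespace_uri + name

    def all_term_uris(self) -> Dict[str, str]:
        """Map every term name in the vocabulary to its URI."""
        return {name: self.term_uri(name) for name in self.term_names}


# 'Green' and 'Blue' are hypothetical additions for the sake of the example.
colors = Vocabulary(
    "http://myproject.org/metadata/vocabs/color#", ["Red", "Green", "Blue"]
)
print(colors.term_uri("Red"))
```

Keeping the namespace URI in one place means that a later change of strategy (for instance, moving to PURLs) touches a single value rather than every term.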

3.2 Using PURLs

A similar approach, but one that is likely to offer more persistent URIs, is to use PURLs [PURL]. A PURL is a Persistent Uniform Resource Locator. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. This provides a level of resilience against changes in project or service URLs. The use of PURLs to identify metadata terms has already been adopted by a number of metadata-related initiatives such as DCMI itself and RDF Site Summary (RSS) 1.0 [RSS10].

Example 1: http://purl.org/rss/1.0/link
RDF Site Summary is a lightweight multipurpose extensible metadata description and syndication format. The core metadata terms used by RSS are declared within an XML namespace (http://purl.org/rss/1.0/). For example, the property called link has been assigned the URI http://purl.org/rss/1.0/link. Other terms are declared within separate groupings, known in RSS as modules. Each module makes use of one or more separate XML namespaces.
Example 2: http://purl.org/rdn/terms/dateReviewed
The UK JISC-funded Resource Discovery Network has developed a small metadata application profile in order to describe the status of its catalogue records. One of the new terms in the application profile is called dateReviewed. All the new terms have been defined within an RDN XML namespace (http://purl.org/rdn/terms/). Therefore, the URI assigned to the dateReviewed property is http://purl.org/rdn/terms/dateReviewed.

Note that in example 1, the RSS implementors have chosen to embed a version number into the XML namespace URI. This allows them to use the same term name within a new XML namespace in future versions of the application profile. This has advantages in some scenarios. However, implementors should be cautious when using this technique because it may result in URIs being assigned to new terms that have the same semantics as existing terms.

3.3 Using "info" URIs

The "info" URI scheme provides a "mechanism for assigning URIs to information assets that have identifiers in public namespaces" but that do not have an appropriate existing URI scheme [INFO-URI-SPEC] [INFO-REGISTRY]. The phrase 'information assets' includes all the metadata terms discussed here. Thus, it is appropriate to consider assigning "info" URIs to metadata terms.

Example 1: info:ddc/22/eng//004.678
The terms that make up the Dewey Decimal Classification [DEWEY] have been assigned "info" URIs such that info:ddc/22/eng// can be considered to be an XML namespace URI and "004.678" can be considered to be a Dewey term name. Thus the URI that has been assigned to that term is info:ddc/22/eng//004.678. Note that the information asset identified by this URI is in the English-language Dewey Decimal Classification (22nd Ed.) and is the classification "Internet".

Note that, somewhat confusingly, the draft "info" URI specification uses different terminology from that used here. In the terminology of the specification, ddc is the "info URI namespace component" and 22/eng//004.678 is the "info URI identifier component".
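The difference in terminology can be made concrete with a small Python sketch. The function name parse_info_uri is hypothetical, and the parsing rule (scheme, then everything up to the first '/', then the remainder) is a simplification assumed for this example rather than a full implementation of the draft specification.

```python
from typing import Tuple


def parse_info_uri(uri: str) -> Tuple[str, str]:
    """Split an "info" URI into (namespace component, identifier component).

    Simplified: no normalisation or escaping rules from the draft are applied.
    """
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "info":
        raise ValueError("not an info URI: " + uri)
    namespace, slash, identifier = rest.partition("/")
    if not namespace or not slash:
        raise ValueError("missing namespace or identifier component: " + uri)
    return namespace, identifier


print(parse_info_uri("info:ddc/22/eng//004.678"))
```

Compare this with the partitioning used in this document, where info:ddc/22/eng// plays the role of the XML namespace URI and 004.678 is the term name.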

Note also that "info" URIs cannot be resolved using current Web browsers (i.e. by using a simple HTTP GET request). Indeed, "info" URIs are designed to be non-dereferenceable: it is not possible to dereference an "info" URI in order to retrieve a representation of the identified resource. Unfortunately, this has serious consequences for their utility in identifying metadata terms. Since it is not possible to easily obtain a representation of the identified term (typically some metadata about the term), it is not possible to obtain any information about the relationships between the identified term and other terms. This means that the "info" URI is of limited use in the context of the Semantic Web, since it is not possible for software applications to reason automatically based on knowledge about the relationships between multiple metadata terms.

At the time of writing, "info" was not a registered URI scheme.

3.4 Using xmlns.com

xmlns.com provides a network space for simple Web namespace management. "The rationale for registering xmlns.com was to secure a short, memorable domain suitable for naming concepts for use in RDF and XML vocabularies" [XMLNS]. The FOAF vocabulary [FOAF] uses xmlns.com to provide an XML namespace URI for its terms.

Example 1: http://xmlns.com/foaf/0.1/firstName
The firstName term within the FOAF vocabulary uses the http://xmlns.com/foaf/0.1/ XML namespace URI and has been assigned the URI http://xmlns.com/foaf/0.1/firstName.

Note that, at the time of writing, the status and ownership of the xmlns.com domain was slightly unclear and it is therefore not possible to be sure of the long term persistence of URIs based on this domain.

4. Conclusions

All terms used in metadata application profiles must be assigned a URI before they can be used in the encoding syntaxes recommended by DCMI. It is recommended that implementors assign URIs to terms following the guidelines provided here. Of the four strategies for assigning URIs to terms listed in this document, the use of PURLs is recommended for the identification of all metadata terms.

References

[DCMI-AM]
DCMI Abstract Model
http://dublincore.org/documents/abstract-model/

[XML-NAMES]
Namespaces in XML, W3C Recommendation, 14 January 1999
http://www.w3.org/TR/REC-xml-names

[RFC3986]
IETF (Internet Engineering Task Force) RFC 3986: Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter. January 2005.
http://www.ietf.org/rfc/rfc3986.txt

[DCMI-ENCODINGS]
DCMI Encoding Guidelines
http://dublincore.org/resources/expressions/

[DCMI-NAMESPACE]
Namespace Policy for the Dublin Core Metadata Initiative (DCMI), 26 October 2001
http://dublincore.org/documents/dcmi-namespace/

[URI-SCHEMES]
Uniform Resource Identifier (URI) SCHEMES
http://www.iana.org/assignments/uri-schemes

[PURL]
PURLS
http://purl.org/

[RSS10]
RDF Site Summary 1.0
http://purl.org/rss/1.0/spec

[INFO-URI-SPEC]
The "info" URI Scheme for Information Assets with Identifiers in Public Namespaces, 9 July 2004
http://info-uri.info/registry/docs/drafts/draft-vandesompel-info-uri-02.txt

[DEWEY]
Dewey Decimal Classification
http://www.oclc.org/dewey/

[INFO-REGISTRY]
"info" URI registry
http://info-uri.info/

[XMLNS]
xmlns.com
http://xmlns.com/

[FOAF]
FOAF Vocabulary Specification
http://xmlns.com/foaf/0.1/



Metadata associated with this resource: http://dublincore.org/documents/term-identifier-guidelines/index.shtml.rdf