Identifiers

From DigiRepWiki

When developing repository services, assigning unique identifiers is a fundamental issue. There are various things to 'identify' in the context of repositories: the digital object, its datastreams, the information package, etc. Different things may call for different criteria when choosing an identifier scheme: metadata records, the resource or 'work', different representations or 'manifestations' of the work (datastreams), content packages (complex objects), etc.

For example, DOIs may be appropriate for published scholarly journal articles, whereas info URIs or other schemes may be appropriate for repository content packages.

Therefore, it is important from the outset to ask: 'what needs to be identified?'

The decision on the choice of identifier scheme might depend on:

  • what you are identifying (certain sorts of content are traditionally identified by certain schemes, and various constituent items are assigned identifiers by the repository)
  • whether the item being identified is part of a 'data flow' in which agencies other than the repository itself are involved (such agencies may have legacy commitments to certain identifier schemes)
  • what level of persistence is required for the identifier


The JISC Information Environment Technical Standards Version 1.1 provide guidelines on identifiers and their resolution:

"Every significant item that is made available through a JISC IE network service should be assigned a URI [1] that is reasonably persistent. This means that item URIs should not be expected to break for a period of 10-15 years after they have first been used. For this reason, JISC IE service components should not hardcode file format, server technology, service organisational structure or other information that is likely to change over a 10-15 year period into item URIs. If items become unavailable during that period, then the URI should resolve to a Web page that explains why the item is no longer available and what actions the end-user can take to obtain a copy of the item or similar resources. Furthermore, item URIs should not contain end-user-specific information, i.e. all item URIs should work for all end-users (albeit allowing for appropriate authentication challenges to be inserted into the process by which the URI is resolved)."
"Resources that comprise a collection of items that are packaged together for management or exchange purposes should be packaged using the IMS Content Packaging Specification [2] if they are 'learning objects' (i.e. resources are primarily intended for use in a learning and teaching context and that have a specific pedagogic aim) or the Metadata Encoding & Transmission Standard (METS) [3]." (Andy Powell, JISC Information Environment Technical Standards Version 1.1 http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/standards/)
1. Naming and Addressing: URIs, URLs (http://www.w3.org/Addressing/)
2. IMS Content Packaging Specification (http://www.imsglobal.org/content/packaging/)
3. Metadata Encoding & Transmission Standard (METS) (http://www.loc.gov/standards/mets/)
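The persistence guideline above can be turned into a rough checklist. The following Python sketch flags URI features that the JISC IE text warns against: file-format or server-technology details in the path, and end-user-specific information in the query. The lists of 'fragile' extensions, path segments and query keys, the function name `persistence_warnings`, and the host `repository.example.ac.uk` are hypothetical; they are illustrative heuristics, not part of the JISC IE standards.

```python
from urllib.parse import parse_qsl, urlparse

# Illustrative heuristics only (hypothetical lists): path features that tie a
# URI to a file format or server technology, and query keys carrying per-user state.
FRAGILE_EXTENSIONS = (".htm", ".html", ".pdf", ".doc", ".php", ".asp", ".aspx", ".jsp", ".cgi", ".pl")
FRAGILE_SEGMENTS = {"cgi-bin", "servlet", "scripts"}
USER_SPECIFIC_KEYS = {"sessionid", "jsessionid", "sid", "userid", "token"}


def persistence_warnings(uri: str) -> list[str]:
    """List features of a URI that may stop it working over a 10-15 year period."""
    parsed = urlparse(uri)
    warnings = []
    for segment in filter(None, parsed.path.split("/")):
        lowered = segment.lower()
        if lowered in FRAGILE_SEGMENTS:
            warnings.append(f"server technology in path: {segment}")
        if lowered.endswith(FRAGILE_EXTENSIONS):  # str.endswith accepts a tuple
            warnings.append(f"file format or technology extension: {segment}")
    for key, _value in parse_qsl(parsed.query, keep_blank_values=True):
        if key.lower() in USER_SPECIFIC_KEYS:
            warnings.append(f"end-user-specific parameter: {key}")
    return warnings
```

For example, `persistence_warnings("http://repository.example.ac.uk/items/1234")` returns an empty list, while a URI such as `.../cgi-bin/fetch.pl?id=1234&sessionid=abc` is flagged three times. A clean result does not guarantee persistence; that still depends on the organisation's commitment to keep the URI resolving.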


Identifiers for learning objects

For learning objects, the UK LOM Core recommends the use of URI identifiers. From its guidelines:

  • "implementers may choose from a range of persistent, globally unique identifier schemata which include, but are not restricted to, URI, URN, PURL, Handle, DOI, POI, ISSN, ISBN, XRI.
  • In order to facilitate interoperability within distributed environments it is recommended that the chosen scheme is encoded in the form of a URI." (UK Learning Object Metadata Core. Draft 0.2, May 2004 http://www.cetis.ac.uk/profiles/uklomcore)
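Following the recommendation that the chosen scheme be encoded as a URI requires a consistent mapping from each scheme to a URI form. The sketch below assumes one such mapping: `urn:isbn:` (RFC 3187), `urn:issn:` (RFC 3044), and the `info:doi/` and `info:hdl/` namespaces of the info URI scheme. The function name `to_uri` and the choice of forms are assumptions for illustration; several of these schemes also have other URI forms (for example HTTP proxy URLs), and the percent-encoding shown is simplified.

```python
from urllib.parse import quote

# Hypothetical mapping from scheme names to URI prefixes (one option among several).
URI_PREFIXES = {
    "isbn": "urn:isbn:",    # URN namespace for ISBNs (RFC 3187)
    "issn": "urn:issn:",    # URN namespace for ISSNs (RFC 3044)
    "doi": "info:doi/",     # info URI namespace for DOIs
    "handle": "info:hdl/",  # info URI namespace for Handles
}


def to_uri(scheme: str, value: str) -> str:
    """Encode an identifier from the given scheme in the form of a URI."""
    key = scheme.strip().lower()
    value = value.strip()
    if key in ("uri", "url", "purl"):
        return value  # already a URI; nothing to encode
    prefix = URI_PREFIXES.get(key)
    if prefix is None:
        raise ValueError(f"no URI form configured for scheme {scheme!r}")
    # Simplified escaping: keep '/' (significant in DOI and Handle names)
    # and percent-encode spaces and other unsafe characters.
    return prefix + quote(value, safe="/")
```

With this mapping, `to_uri("doi", "10.1000/182")` gives `info:doi/10.1000/182` and `to_uri("ISBN", "0-395-36341-1")` gives `urn:isbn:0-395-36341-1`. Keeping the mapping in one table makes it easy to change the preferred URI form for a scheme later.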

This document also recommends the following resources:

This set of guidelines looks at ARK, DOI, Handle, ISBN, ISSN, PURL, POI, SICI. They "do not reflect consensus within the elearning and IEEE LOM community".
A revised version of the above is also available, providing guidelines for Dublin Core metadata only (http://www.ukoln.ac.uk/metadata/dcmi/identifiers/).


CETIS (http://www.cetis.ac.uk) held an international meeting on identifiers in 2003:

Article about the meeting, with link to the final report.
Additional resources for the meeting, including
This discussion paper looks at issues and requirements for the identification of learning objects.


HTTP vs. non-HTTP URIs as identifiers

See:

Slides (http://indico.cern.ch/materialDisplay.py?contribId=7&sessionId=1&materialId=slides&confId=0514)

Persistent URIs

including the following presentation:


Schemes

Uniform Resource Identifier (URI) SCHEMES (http://www.iana.org/assignments/uri-schemes)
The official IANA Registry of URI Schemes, including the well-known ftp, http etc. and the new ‘info’ scheme.


INFO

info URI scheme (http://info-uri.info/)
"INFO URI solves problems with identifying information assets, including documents and terms from classification schemes. The scheme is a consistent and reliable way to represent and reference such standard identifiers as Library of Congress Control Numbers on the Web so that these identifiers can be "read" and understood by Web applications." (NISO press release http://www.niso.org/news/releases/pr-InfoURI-11-05.html)
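An info URI has the general shape `info:` namespace `/` identifier (RFC 4452), where the namespace, for example `lccn` or `doi`, is registered in the info URI registry. A minimal Python parser, assuming that shape and a namespace with the same syntax as a URI scheme, might look like this. `InfoURI` and `parse_info_uri` are hypothetical names, and the sketch neither checks the namespace against the registry nor applies any namespace-specific normalisation.

```python
import re
from typing import NamedTuple


class InfoURI(NamedTuple):
    namespace: str   # e.g. "lccn" or "doi"; lower-cased for comparison
    identifier: str  # namespace-specific part, left percent-encoded as given


# info:<namespace>/<identifier>; the namespace follows URI scheme syntax.
_INFO_PATTERN = re.compile(r"info:([a-z][a-z0-9+.\-]*)/(.+)", re.IGNORECASE)


def parse_info_uri(uri: str) -> InfoURI:
    """Split an info URI into its namespace and identifier parts."""
    body, _, _fragment = uri.strip().partition("#")  # a fragment is not part of the identifier
    match = _INFO_PATTERN.fullmatch(body)
    if match is None:
        raise ValueError(f"not an info URI: {uri!r}")
    return InfoURI(match.group(1).lower(), match.group(2))
```

For instance, `parse_info_uri("info:lccn/2002022641")` yields namespace `lccn` and identifier `2002022641`, which an application could then hand to code that understands Library of Congress Control Numbers.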


PURL/POI

POI (PURL-based Object Identifier) is a proposal for PURL-based resource identifiers, developed to provide a relatively persistent identifier for resources described by metadata in OAI-compliant repositories. POI takes advantage of the requirement to assign OAI identifiers to items disclosed through OAI-PMH, using the oai-identifier as the basis for a POI.

A practical implementation of this can be seen in the POI – URL lookup tool developed for the RDN/LTSN interoperability project (http://www.rdn.ac.uk/poi/).
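Because a POI is derived from the oai-identifier, building one is essentially a string transformation. The sketch below assumes the layout as we understand the POI proposal, in which `oai:<namespace-identifier>:<local-identifier>` becomes `http://purl.org/poi/<namespace-identifier>/<local-identifier>`; the base URL and layout should be checked against the proposal itself. The function names and the repository `example.ac.uk` are hypothetical.

```python
# Base URL as we understand the POI proposal (an assumption; verify against the spec).
POI_BASE = "http://purl.org/poi/"


def oai_to_poi(oai_identifier: str) -> str:
    """Map oai:<namespace-identifier>:<local-identifier> to a POI."""
    # The local-identifier may itself contain ':', so split on the first two colons only.
    parts = oai_identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {oai_identifier!r}")
    _, namespace, local = parts
    return f"{POI_BASE}{namespace}/{local}"


def poi_to_oai(poi: str) -> str:
    """Recover the oai-identifier from a POI built by oai_to_poi."""
    if not poi.startswith(POI_BASE):
        raise ValueError(f"not a POI: {poi!r}")
    # The namespace-identifier is a domain name, so the first '/' ends it.
    namespace, sep, local = poi[len(POI_BASE):].partition("/")
    if not sep or not namespace or not local:
        raise ValueError(f"malformed POI: {poi!r}")
    return f"oai:{namespace}:{local}"
```

Here `oai_to_poi("oai:example.ac.uk:12345")` gives `http://purl.org/poi/example.ac.uk/12345`, and `poi_to_oai` reverses it. Resolution, i.e. locating the repository and the resource's current URL, is a separate step that this sketch does not attempt.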


DOI


Handle


USAGE


aDORe

Content identifiers in the aDORe repository architecture are expressed as URIs. Digital Objects, or their constituent datastreams, may already have identifiers associated with them before they are ingested into aDORe; for example, DOIs are often assigned to journal articles. For further information, see:

  • Herbert Van de Sompel, Jeroen Bekaert, Xiaoming Liu, Luda Balakireva and Thorsten Schwander. aDORe: A Modular, Standards-Based Digital Object Repository. The Computer Journal, Advance Access published 24 June 2005 (http://comjnl.oxfordjournals.org/cgi/rapidpdf/bxh114v1.pdf)

"This paper describes the aDORe repository architecture designed and implemented for ingesting, storing, and accessing a vast collection of Digital Objects at the Research Library of the Los Alamos National Laboratory. The aDORe architecture is highly modular and standards-based." (Abstract)


CORDRA

CORDRA (Content Object Repository Discovery and Registration/Resolution Architecture) (http://cordra.lsal.cmu.edu/) uses the Handle System. The CORDRA™ documents (http://cordra.lsal.cmu.edu/cordra/docs/) include the following:

  • Daniel R. Rehak. The Appropriate Version Problem: Separating Learning Designs and Course Structures from Learning Object Versions, Variants and Copies. Draft, Version V1.00.20050205
"This report outlines requirements for representing and specifying content versions and variants in learning design and course representations. Adapting the FRBR model (Functional Requirements for Bibliographic References), it presents a model used to represent content versions."
  • CORDRA™ Identifiers. Draft specification, Version: V1.00.20050101
Describes the representation of CORDRA identifiers as Handles.
  • Encoding CORDRA™ Identifiers in URI Syntax. Draft Specification/Profile, Version: V1.00.20050101
Describes the mechanism used in CORDRA to encode a Handle within URI syntax.
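A Handle is a two-part name, `<prefix>/<suffix>`, in which the first `/` separates the naming-authority prefix from the local suffix. The sketch below shows two common ways of carrying a Handle in URI syntax: the HTTP proxy form via `hdl.handle.net` and the `info:hdl/` form. It is not the CORDRA encoding profile, whose details are in the draft specification listed above; the prefix `1234.5`, the example suffix and the function names are hypothetical.

```python
from urllib.parse import quote, unquote

HANDLE_PROXY = "http://hdl.handle.net/"  # public HTTP proxy for the Handle System
INFO_HDL = "info:hdl/"                   # info URI namespace for Handles


def split_handle(handle: str) -> tuple[str, str]:
    """Split a Handle into (prefix, suffix) at the first '/'."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a Handle of the form prefix/suffix: {handle!r}")
    return prefix, suffix


def handle_to_uris(handle: str) -> dict[str, str]:
    """Express a Handle as an HTTP proxy URL and as an info URI."""
    split_handle(handle)  # reject malformed Handles early
    encoded = quote(handle, safe="/")  # percent-encode spaces, '#', '?', etc.
    return {"http": HANDLE_PROXY + encoded, "info": INFO_HDL + encoded}


def uri_to_handle(uri: str) -> str:
    """Recover the Handle from either URI form produced by handle_to_uris."""
    for base in (HANDLE_PROXY, INFO_HDL):
        if uri.startswith(base):
            handle = unquote(uri[len(base):])
            split_handle(handle)  # validate before returning
            return handle
    raise ValueError(f"unrecognised Handle URI: {uri!r}")
```

The HTTP form can be dereferenced by any Web browser, while the `info:` form identifies the Handle without committing to a particular resolver; which one a system stores is a design choice.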

The documents page notes that the above documents will be deposited into the CORDRA document repository, at which time they will be assigned persistent identifiers.


eBank

eBank UK (http://www.ukoln.ac.uk/projects/ebank-uk/) is a project to investigate the issues surrounding provenance and the use and re-use of original data for research and learning purposes, and will result in the development of an eBank UK pilot service for the benefit of the HE and FE communities. eBank UK has selected DOIs.

Links to a selection of identifier schemes and identifier resolution services.


For further information about the functional entities for 'work', 'expression' and 'manifestation', see:

FRBR is used in the document by Daniel R. Rehak listed above (see the CORDRA information).