UKOLN AHDS An Introduction To Persistent Identifiers



What are Persistent Identifiers?

An identifier is any label that allows us to find a resource. One of the best-known identifiers is the International Standard Book Number (ISBN), a unique ten-digit number assigned to books and other publications. On the Internet the most widely known identifier is the Uniform Resource Locator (URL), which allows users to find a resource by listing a protocol, domain name and, in many cases, file location.

A persistent identifier is, as the name suggests, an identifier that exists for a very long time. It should at the very least be globally unique and be used as a reference to the resource beyond the resource's lifetime. URLs, although useful, are not very persistent. They only provide a link to the resource's location at the moment in time they are cited, if the resource moves they no longer apply. The issue of 'linkrot' on the Internet (broken links to resources), along with the need for further interoperability has led to the search for more persistent identifiers for digital resources.

Principles for Persistent Identification

The International Digital Object Identifier (DOI) Foundation [1] states that there are two principles for persistent identification:

  1. Assign an ID to a resource: Once assigned the number must identify the same resource beyond the lifetime of the resource or identifier.
  2. Assign a resource to an ID: The resource should persistently continue to be the same thing.

Uniform Resource Identifiers

A Uniform Resource Identifier (URI) is the string that is used to identify anything on the Internet. URLs along with Uniform Resource Names (URNs) are both types of URI. A URN is a name with global scope and does not necessarily imply a location. A URN will include a Namespace Identifier (NID) Code and a Namespace Specific String (NSS). The NID specifies the identification system used (e.g. ISBN) and the NSS is local code that identifies a resource. For someone to find a resource using a URN they must use a resolver service.

Persistent URLs

Persistent URLs (PURLs) [2] have been developed by the Online Computer Library Centre (OCLC) as an interim measure for Internet resources until the URN framework is well established. A PURL is functionally a URL, but rather than pointing at a location points at a resolution service, which redirects the user to the appropriate URL. If the URL changes it just needs to be amended in the PURL resolution service

Example: http://purl.oclc.org/OCLC/PURL/summary
This is made up of the protocol (http), the resolver address (http://purl.oclc.org/) and the user-assigned name (OCLC/PURL/summary).

Digital Object Identifiers

The Digital Object Identifier (DOI) system was initiated by the Association of American publishers in an attempt to assist the publishing community with copyright and electronic commerce. DOIs are described by the International DOI Foundation, who manage them, as persistent, interoperable, actionable identifiers. They are persistent because they identify an object as a first-class entity (not just the location), they are interoperable because they are designed with the future in mind and they are actionable because they allow a user to locate a resource by resolution using the Handle System. The Handle System, developed by the Corporation for National Research Initiatives (CNRI) includes protocols that enable a distributed computer system to store handles of digital resources and resolve them into a location. DOIs can be assigned by a Registration Agency (RA), which provides services for a specific user community and may charge fees. The main RA for the publishing community is CrossRef [3].

Example: 10.1000/123456
This is made up of the prefix (10.1000) which is the string assigned to an organisation that registering DOIs and the suffix (123456) which is a unique (to a given prefix) alphanumeric string, which could be an existing identifier.

Using Persistent Identifiers

While DOIs hold great potential for helping many information communities enhance interoperability they have yet to reach full maturity. There are still many unresolved issues, such as their resolution (how users use them in to receive a Web page), registration of the DOI system, the persistence of the International DOI Foundation as an organisation and what exactly their advantages are over handles or PURLs. Until these matters are resolved they will remain little more than a good idea for most communities.

However the concept of persistent identifiers is still imperative to a working Internet. While effort is put into finding the best approach there is much that those creating Web pages can do to ensure that their URIs are persistent. In 1998 Tim Berners-Lee coined the phrase Cool URIs to describe URIs which do not change. His article explains the methods a Webmaster would use to design a URI that will stand the test of time. As Berners-Lee elucidates "URIs don't change: people change them." [4].

References

  1. International DOI Foundation,
    <http://doi.org/>
  2. PURL,
    <http://purl.org/>
  3. CrossRef,
    <http://www.crossref.org/>
  4. Cool URIs Don't Change, W3C,
    <http://www.w3.org/Provider/Style/URI.html>