The PURL-based Object Identifier (POI)

Andy Powell, UKOLN, University of Bath
Jeff Young, OCLC
Thom Hickey, OCLC

$Id: intro.html,v 1.8 2004/02/16 08:40:38 lisap Exp $

1. Introduction

This document describes the PURL-based Object Identifier (POI), a simple specification for resource identifiers based on the PURL system [1]. The use of the POI is closely related to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [2] and to the OAI identifier format (oai-identifiers) [3] used within that protocol.

The POI has been developed with the following criteria in mind:

The primary intention of the POI is to serve as a relatively persistent identifier for resources that are described by metadata 'items' in OAI-compliant repositories. In that case, POIs are not explicitly assigned to resources: a POI exists implicitly because an OAI 'item' associated with the resource is made available in an OAI-compliant repository. However, POIs can also be explicitly assigned to resources, independently of OAI repositories and the OAI-PMH, if desired. As such, the POI can be seen as a possible mechanism for implementing cool URIs [4].

A separate document provides some POI resolver guidelines [5]. All POI assigners are strongly encouraged to configure the PURL system to resolve their POIs.

2. POI specification

A POI is a PURL and a PURL is a URI. The POI syntax is a restriction of the "general, absolute URI" syntax: <scheme>:<scheme-specific-part>, defined in RFC 2396 [6]. The following description uses the same notational conventions as RFC 2396, and the same definitions of digit, alpha, alphanum, reserved, unreserved and uric.

All POIs conform to the PURL template below:

  "http://purl.org/poi/" namespace-identifier "/" local-identifier

where the components are as follows:

  namespace-identifier = domainname-word "." domainname
  domainname = domainname-word [ "." domainname ]
  domainname-word = alpha *( alphanum | "-" )

  local-identifier = 1*uric

Any uric elements are permitted in the local-identifier. Since characters in the reserved set have no special meaning in the local-identifier component, they are permitted unescaped. All characters that are in neither the unreserved nor the reserved set must be escaped; characters in either of those sets must not be escaped. The following definitions are copied from RFC 2396 for convenience:

  uric        = reserved | unreserved | escaped
  reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
  unreserved  = alphanum | mark
  mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

To avoid inconsistently generated escaped characters in a POI, hex digits must use uppercase for the letters A through F. This further restricts RFC 2396, which also permits lowercase a through f. Thus, escaped and hex are defined as follows:

  escaped     = "%" hex hex
  hex         = digit | "A" | "B" | "C" | "D" | "E" | "F"
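As an illustration only, the escaping rules above can be sketched in Python. The function name `escape_local_identifier` is hypothetical, and the sketch assumes the raw local-identifier is a Unicode string encoded as UTF-8 before escaping; this specification does not say which character encoding to use.

```python
import string

# Character sets copied from the RFC 2396 definitions quoted above.
RESERVED = set(";/?:@&=+$,")
MARK = set("-_.!~*'()")
UNRESERVED = set(string.ascii_letters + string.digits) | MARK
ALLOWED = RESERVED | UNRESERVED


def escape_local_identifier(raw: str) -> str:
    """Escape a raw (unescaped) local-identifier using uppercase hex.

    Hypothetical helper. Assumes UTF-8 as the byte encoding, which
    this specification does not mandate.
    """
    out = []
    for byte in raw.encode("utf-8"):
        ch = chr(byte)
        if ch in ALLOWED:
            # reserved and unreserved characters must not be escaped
            out.append(ch)
        else:
            # everything else becomes "%" hex hex, with A-F in uppercase
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(escape_local_identifier("report 2003#1"))  # report%202003%231
```

Note that a literal '%' in the raw string is itself escaped as %25, so a helper like this should be applied exactly once, to an unescaped string.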

Examples:

  http://purl.org/poi/bath.ac.uk/lisap-2003-1286544
  http://purl.org/poi/rdn.ac.uk/12345-67890
  http://purl.org/poi/arXiv.org/hep-th/9901001
  http://purl.org/poi/foo.org/some-local-id-53
  http://purl.org/poi/xtcat.oclc.org/OCLCNo/ocm41020136

Unlike most PURLs, individual POIs do not need to be registered explicitly with the PURL system. However, POI assigners are strongly encouraged to configure the PURL system to resolve their POIs, as described in the POI resolver guidelines [5].

3. POI assignment algorithm

If the party assigning the POI offers an OAI-compliant repository containing metadata about the resource identified by the POI, and that repository uses oai-identifiers, it is not necessary to explicitly assign a POI. The POI is derived from the oai-identifier that is assigned to the 'item' as part of running the OAI-compliant repository. Mapping between POIs and oai-identifiers is described below.

Where parties need to explicitly assign a POI, the algorithm is as follows:

  1. Select a DNS domain over which the assigner has some control (i.e. a DNS domain that they own, or a DNS domain within which they are allowed to create 'http' URIs). This is the namespace-identifier.
  2. Construct a string that uniquely 'names' the resource within that DNS domain. This is the local-identifier.
  3. If necessary, apply character escaping to the local-identifier as described above.
  4. Combine the two, using the PURL template above.
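The four steps can be sketched in Python as follows. `assign_poi` is a hypothetical name; the sketch assumes the local-identifier is supplied unescaped and is encoded as UTF-8 before escaping, and it relies on the standard library's `urllib.parse.quote`, which emits uppercase hex digits. Choosing a unique local-identifier is left to the caller.

```python
import re
from urllib.parse import quote

POI_PREFIX = "http://purl.org/poi/"
# namespace-identifier: two or more domainname-words separated by "."
NAMESPACE_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*(?:\.[A-Za-z][A-Za-z0-9-]*)+")
# Reserved characters plus the marks that quote() would otherwise escape;
# letters, digits and "_.-~" are always left unescaped by quote() (Python 3.7+).
URIC_SAFE = ";/?:@&=+$,!*'()"


def assign_poi(namespace_identifier: str, local_identifier: str) -> str:
    # Step 1: the DNS domain chosen by the assigner must fit the grammar.
    if not NAMESPACE_RE.fullmatch(namespace_identifier):
        raise ValueError("not a valid namespace-identifier: " + namespace_identifier)
    # Step 2: the caller supplies a string unique within that domain.
    if not local_identifier:
        raise ValueError("local-identifier must not be empty")
    # Step 3: escape characters outside the reserved and unreserved sets.
    escaped = quote(local_identifier, safe=URIC_SAFE, encoding="utf-8")
    # Step 4: combine the parts using the PURL template.
    return POI_PREFIX + namespace_identifier + "/" + escaped


print(assign_poi("bath.ac.uk", "lisap 2003/1286544"))
# http://purl.org/poi/bath.ac.uk/lisap%202003/1286544
```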

This document makes no recommendations for how the local-identifier should be assigned, nor for techniques to make such names unique within their domain.

4. Mapping between POIs and oai-identifiers

To construct a POI from an available oai-identifier the following algorithm should be used:

  1. Strip off the 'oai:' URI scheme prefix.
  2. Convert the colon (':') separator between the namespace-identifier and the local-identifier to a forward slash ('/').
  3. Prepend the 'http://purl.org/poi/' prefix.

Examples:

  oai:bath.ac.uk:lisap-2003-1286544
    ---> http://purl.org/poi/bath.ac.uk/lisap-2003-1286544

  oai:rdn.ac.uk:12345-67890             
    ---> http://purl.org/poi/rdn.ac.uk/12345-67890

  oai:arXiv.org:hep-th/9901001          
    ---> http://purl.org/poi/arXiv.org/hep-th/9901001

  oai:foo.org:some-local-id-53          
    ---> http://purl.org/poi/foo.org/some-local-id-53

  oai:xtcat.oclc.org:OCLCNo/ocm41020136 
    ---> http://purl.org/poi/xtcat.oclc.org/OCLCNo/ocm41020136
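A minimal Python sketch of this mapping follows; `oai_to_poi` is a hypothetical name, and the sketch does not validate the namespace-identifier or local-identifier against the grammar. Only the first colon after the 'oai:' prefix is converted, because the namespace-identifier cannot contain a colon while the local-identifier may.

```python
POI_PREFIX = "http://purl.org/poi/"


def oai_to_poi(oai_identifier: str) -> str:
    # Step 1: strip off the 'oai:' URI scheme prefix.
    if not oai_identifier.startswith("oai:"):
        raise ValueError("not an oai-identifier: " + oai_identifier)
    rest = oai_identifier[len("oai:"):]
    # Step 2: the first ':' separates namespace-identifier and local-identifier.
    namespace, sep, local = rest.partition(":")
    if not sep or not namespace or not local:
        raise ValueError("missing namespace or local identifier: " + oai_identifier)
    # Step 3: prepend the PURL prefix, using '/' as the separator.
    return POI_PREFIX + namespace + "/" + local


print(oai_to_poi("oai:arXiv.org:hep-th/9901001"))
# http://purl.org/poi/arXiv.org/hep-th/9901001
```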

To construct an oai-identifier from an available POI, use the reverse algorithm:

  1. Strip off the 'http://purl.org/poi/' prefix.
  2. Convert the first forward slash ('/') to a colon (':').
  3. Add the 'oai:' URI scheme prefix.
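The reverse mapping can be sketched the same way; `poi_to_oai` is a hypothetical name and, like the forward sketch, it performs no grammar validation. Only the first slash is converted, since any later slashes belong to the local-identifier.

```python
POI_PREFIX = "http://purl.org/poi/"


def poi_to_oai(poi: str) -> str:
    # Step 1: strip off the 'http://purl.org/poi/' prefix.
    if not poi.startswith(POI_PREFIX):
        raise ValueError("not a POI: " + poi)
    rest = poi[len(POI_PREFIX):]
    # Step 2: convert only the first '/' to ':'.
    namespace, sep, local = rest.partition("/")
    if not sep or not namespace or not local:
        raise ValueError("missing namespace or local identifier: " + poi)
    # Step 3: add the 'oai:' URI scheme prefix.
    return "oai:" + namespace + ":" + local


print(poi_to_oai("http://purl.org/poi/xtcat.oclc.org/OCLCNo/ocm41020136"))
# oai:xtcat.oclc.org:OCLCNo/ocm41020136
```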

References

  1. The PURL system
    <http://purl.org/>
  2. The Open Archives Initiative Protocol for Metadata Harvesting
    <http://www.openarchives.org/OAI/openarchivesprotocol.html>
  3. Specification and XML Schema for the OAI Identifier Format
    <http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm>
  4. Guidelines For URI Naming Policies
    <http://www.ariadne.ac.uk/issue31/web-focus/>
  5. POI resolver guidelines
    <http://www.ukoln.ac.uk/distributed-systems/poi/resolver-guidelines/>
  6. Uniform Resource Identifiers (URI): Generic Syntax
    <http://www.ietf.org/rfc/rfc2396.txt>

Andy Powell
Last updated: 16-Feb-2004