

3. Main Technical Ideas of OAI-PMH

Contents of this part of OAI for Beginners, the Open Archives Forum online tutorial


The Open Archives Initiative (OAI)

Recap of the main ideas of OAI

Basic functioning of OAI-PMH

 


OAI: general assumptions

There are two groups of 'participants': Data Providers and Service Providers.

Data Providers (open archives, repositories) provide free access to metadata, and may, but do not necessarily, offer free access to full texts or other resources. OAI-PMH provides an easy to implement, low barrier solution for Data Providers.

Service Providers use the OAI interfaces of the Data Providers to harvest and store metadata. Note that this means that there are no live search requests to the Data Providers; rather, services are based on the harvested data via OAI-PMH. Service Providers may select certain subsets from Data Providers (e.g., by set hierarchy or date stamp). Service Providers offer (value-added) services on the basis of the metadata harvested, and they may enrich the harvested metadata in order to do so.


OAI-PMH: overview and structure model

The OAI-PMH protocol is based on HTTP. Request arguments are issued as GET or POST parameters. OAI-PMH supports six request types (known as "verbs"), e.g.,
http://archive.org?verb=ListRecords&metadataPrefix=oai_dc&from=2002-11-01.

Responses are encoded in XML syntax. OAI-PMH supports any metadata format encoded in XML. Dublin Core is the minimal format specified for basic interoperability.

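OAI-PMH errors are reported as error elements within the XML response; HTTP status codes signal only transport-level conditions such as redirects or an unavailable service.

Data Providers may define a logical set hierarchy to support levels of granularity for harvesting by Service Providers. Date stamps flag the last change of a metadata record, and thus provide further support for granularity of harvesting.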

OAI-PMH supports flow control.

 

 

 


Seven key definitions

Harvester:
client application issuing OAI-PMH requests

Repository:
network accessible server, able to process OAI-PMH requests correctly

Resource:
object the metadata is "about"; the nature of resources is not defined in the OAI-PMH, and resources may be digital or non-digital

Item:
component of a repository from which metadata about a resource can be disseminated; has a unique identifier

Record:
metadata in a specific metadata format

Identifier:
unique key for an item in a repository

Set:
optional construct for grouping items in a repository

 


Protocol details

-- Records --

A record is the metadata of a resource in a specific format. A record has three parts: a header and metadata, both of which are mandatory, and an optional about statement. Each of these is made up of various components as set out below.

header (mandatory)
   –    identifier (mandatory: 1 only)
   –    datestamp (mandatory: 1 only)
   –    setSpec elements (optional: 0, 1 or more)
   –    status attribute for deleted item

metadata (mandatory)
   –    XML encoded metadata with root tag, namespace
   –    repositories must support Dublin Core, may support other formats

about (optional)
   –    rights statements
   –    provenance statements


-- Datestamps --

A datestamp is the date of last modification of a metadata record. A datestamp is mandatory in the header of every record. It has two possible levels of granularity:
YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ.

The function of the datestamp is to provide information on metadata that enables selective harvesting using from and until arguments. Its applications are in incremental update mechanisms. It gives either the date of creation, last modification, or deletion. Deletion is covered with three support levels: no, persistent, transient.
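As a minimal sketch of the two granularities, the following Python checks and formats datestamps using only the standard library. The function names are illustrative assumptions, not part of OAI-PMH, and the sketch assumes all times are in UTC, as the trailing "Z" indicates.

```python
from datetime import datetime, timezone

DAY_FORMAT = "%Y-%m-%d"               # YYYY-MM-DD
SECONDS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # YYYY-MM-DDThh:mm:ssZ


def parse_datestamp(value):
    """Parse a datestamp in either granularity; raise ValueError otherwise."""
    # Choose the format by length so that malformed values are rejected.
    fmt = DAY_FORMAT if len(value) == 10 else SECONDS_FORMAT
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def is_valid_datestamp(value):
    """Return True if value matches one of the two granularities."""
    try:
        parse_datestamp(value)
        return True
    except ValueError:
        return False


def format_datestamp(moment, day_granularity=True):
    """Format a timezone-aware datetime for use as a from or until argument."""
    fmt = DAY_FORMAT if day_granularity else SECONDS_FORMAT
    return moment.astimezone(timezone.utc).strftime(fmt)


print(is_valid_datestamp("2002-12-01"))
print(is_valid_datestamp("2002-12-01-13:45:00"))
```

For an incremental update, a harvester could remember the time of its last successful harvest and pass `format_datestamp(...)` of that time as the from argument of its next request, using the granularity the repository supports.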


-- Metadata schema --

OAI-PMH supports dissemination of multiple metadata formats from a repository. The properties of metadata formats are:
   –    id string to specify the format (metadataPrefix)
   –    metadata schema URL (XML schema to test validity)
   –    XML namespace URI (global identifier for metadata format)

Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata formats can be defined and transported via the OAI-PMH. Any returned metadata must comply with an XML namespace specification. The Dublin Core Metadata Element Set contains 15 elements. All elements are optional, and all elements may be repeated.

The Dublin Core Metadata Element Set:

Title Contributor Source
Creator Date Language
Subject Type Relation
Description Format Coverage
Publisher Identifier Rights

-- Sets --

Sets enable a logical partitioning of repositories. They are optional: archives do not have to define Sets. There are no recommendations for the implementation of Sets. Sets are not necessarily exhaustive of the content of a repository, and they are not necessarily strictly hierarchical. Communities therefore need to negotiate agreements that define the sets useful to them.


-- Request format --

Requests must be submitted using the GET or POST methods of HTTP, and repositories must support both methods. At least one key=value pair: verb=RequestType (where RequestType is some type of request such as ListRecords) must be provided. Additional key=value pairs depend on the request type.

Example of a GET request:
http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

The encoding of special characters must be supported; for example, ":" (host port separator) becomes "%3A"
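As a minimal sketch of how a harvester might assemble such a GET request, the following Python uses `urllib.parse.urlencode` from the standard library, which percent-encodes reserved characters in argument values. The helper name `build_request_url` is a hypothetical choice; the base URL and identifier are the illustrative ones used in this tutorial.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **arguments):
    """Return a GET URL for an OAI-PMH request (illustrative helper)."""
    # The verb is always required; further arguments depend on the verb.
    params = {"verb": verb}
    params.update(arguments)
    # urlencode percent-encodes reserved characters such as ":" in values.
    return base_url + "?" + urlencode(params)


url = build_request_url(
    "http://archive.org/oai",
    "GetRecord",
    identifier="oai:HUBerlin.de:3000218",
    metadataPrefix="oai_dc",
)
print(url)
# http://archive.org/oai?verb=GetRecord&identifier=oai%3AHUBerlin.de%3A3000218&metadataPrefix=oai_dc
```

The same key=value pairs could be sent as the body of a POST request instead; only the transport differs.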


-- Response --

Responses are formatted as HTTP responses. The content type must be text/xml. HTTP status codes such as 302 (redirect) and 503 (service not available) may be returned; these are distinct from OAI-PMH errors. Compression encodings are optional in OAI-PMH; only identity encoding is mandatory. The response format must be well-formed XML with markup as follows:

  1. XML declaration
    (<?xml version="1.0" encoding="UTF-8" ?>)
  2. root element named OAI-PMH with three attributes
    (xmlns, xmlns:xsi, xsi:schemaLocation)
  3. three child elements
    1. responseDate (UTC datetime)
    2. request (the request that generated this response)
    3. a) error (in case of an error or exception condition)
      b) element with the name of the OAI-PMH request
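The markup listed above can be checked with a few lines of Python. This is a sketch under stated assumptions: the sample response is abbreviated and made up for illustration, the function names are hypothetical, and only the standard library's `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Abbreviated, made-up Identify response used only for illustration.
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                             http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2003-10-14T16:36:00Z</responseDate>
  <request verb="Identify">http://archive.org/oai</request>
  <Identify>
    <repositoryName>My Archive</repositoryName>
  </Identify>
</OAI-PMH>"""


def local_name(element):
    """Return the element name without its "{namespace}" prefix."""
    return element.tag.rsplit("}", 1)[-1]


def read_envelope(xml_bytes):
    """Return (responseDate, request URL, names of the remaining children)."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "{" + OAI_NS + "}OAI-PMH":
        raise ValueError("root element is not OAI-PMH")
    children = list(root)
    names = [local_name(child) for child in children]
    # responseDate and request come first; then error(s) or the verb element.
    if names[:2] != ["responseDate", "request"]:
        raise ValueError("responseDate and request must come first")
    return children[0].text, children[1].text, names[2:]


print(read_envelope(SAMPLE_RESPONSE))
# ('2003-10-14T16:36:00Z', 'http://archive.org/oai', ['Identify'])
```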

-- Flow control --

Four of the request types return a list of entries. Three of them (ListSets, ListIdentifiers and ListRecords) may reply with 'large' lists.

OAI-PMH supports partitioning. Those managing a repository make the decisions on partitioning: whether to partition and how.

The response to a request includes:
    incomplete list
    resumption token
       +   expiration date,
            size of complete list,
            cursor (optional)

For a new request with same request type:
    resumption token as parameter
    all other parameters omitted!

The response includes the next (possibly the last) section of the list and a resumption token. The resumption token is empty if the response contains the last section of the list.
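The request and response cycle described above can be sketched as a loop. In this Python sketch the `fetch` parameter is an assumption: a caller-supplied function that takes a dict of request arguments and returns the response body as bytes, so the loop can be tried without a network. The stub responses are made up and omit responseDate and request for brevity.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, **arguments):
    """Yield every identifier from a ListIdentifiers harvest."""
    params = {"verb": "ListIdentifiers", **arguments}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = root.find(".//" + OAI_NS + "resumptionToken")
        # No token element, or an empty one, means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListIdentifiers",
                  "resumptionToken": token.text.strip()}


def page(identifiers, token):
    """Build a stub response holding the given identifiers and token."""
    headers = "".join(
        f"<header><identifier>{i}</identifier>"
        f"<datestamp>2002-12-01</datestamp></header>"
        for i in identifiers
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        f"<ListIdentifiers>{headers}"
        f"<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    ).encode("utf-8")


# Two stub pages: the first ends with a token, the second with an empty one.
PAGES = {None: page(["oai:a:1", "oai:a:2"], "token-1"),
         "token-1": page(["oai:a:3"], "")}


def fake_fetch(params):
    return PAGES[params.get("resumptionToken")]


print(list(harvest_identifiers(fake_fetch, metadataPrefix="oai_dc")))
# ['oai:a:1', 'oai:a:2', 'oai:a:3']
```

A real harvester would replace `fake_fetch` with a function that issues the HTTP request, and would also want to handle expired tokens (badResumptionToken) by restarting the harvest.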


-- Errors and exceptions --

Repositories must indicate OAI-PMH errors by the inclusion of one or more error elements. The defined error identifiers are:

badArgument
badResumptionToken
badVerb
cannotDisseminateFormat
idDoesNotExist
noRecordsMatch
noMetadataFormats
noSetHierarchy
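A harvester can detect these conditions by looking for error elements among the children of the root element. The following Python is a minimal sketch; the sample response and its message text are made up, and the function name `find_errors` is an assumption.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

KNOWN_CODES = {
    "badArgument", "badResumptionToken", "badVerb",
    "cannotDisseminateFormat", "idDoesNotExist", "noRecordsMatch",
    "noMetadataFormats", "noSetHierarchy",
}

# Made-up error response used only for illustration.
SAMPLE_ERROR = b"""<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-10-14T16:36:00Z</responseDate>
  <request>http://archive.org/oai</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>"""


def find_errors(xml_bytes):
    """Return a list of (code, message) pairs, empty if there is no error."""
    root = ET.fromstring(xml_bytes)
    return [
        (element.get("code"), (element.text or "").strip())
        for element in root.findall(OAI_NS + "error")
    ]


for code, message in find_errors(SAMPLE_ERROR):
    note = "" if code in KNOWN_CODES else " (unknown code)"
    print(f"{code}: {message}{note}")
# badVerb: Illegal OAI verb
```

How to react is a design choice for the harvester; for instance, noRecordsMatch during an incremental update may simply mean that nothing has changed.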


Request types

There are six different request types:

Identify
ListMetadataFormats
ListSets
ListIdentifiers
ListRecords
GetRecord

A harvester is not required to use all types. However, a repository must implement all types. There are required and optional arguments, depending on request types. Each request type will now be described.


-- Identify --

function
    description of an archive

example
    archive.org/oai-script?verb=Identify

parameters
    none

errors / exceptions
    badArgument (e.g. archive.org/oai-script?verb=Identify&set=biology)

response format

Element            Example                              Ordinality
repositoryName     My Archive                           1
baseURL            http://archive.org/oai               1
protocolVersion    2.0                                  1
earliestDatestamp  1999-01-01                           1
deletedRecord      no, transient, persistent            1
granularity        YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ     1
adminEmail         oai-admin@archive.org                +
compression        deflate, compress                    *
description        oai-identifier, eprints, friends, …  *

Ordinality: 1 = mandatory, 1 only; + = mandatory, 1 or more; * = optional, 0 or more


-- ListMetadataFormats --

function
    retrieve available metadata formats from archive

example
    archive.org/oai-script?verb=ListMetadataFormats&
        identifier=oai:HUBerlin.de:3000218

parameters
    identifier (optional)

errors / exceptions
    badArgument
    idDoesNotExist
        e.g. archive.org/oai-script?verb=ListMetadataFormats
               &identifier=really-wrong-identifier
    noMetadataFormats


-- ListSets --

function
    retrieve set structure of a repository

example
    archive.org/oai-script?verb=ListSets

parameters
    resumptionToken (exclusive)

errors / exceptions
    badArgument
    badResumptionToken
        e.g. archive.org/oai-script?verb=ListSets
              &resumptionToken=any-wrong-token
    noSetHierarchy


-- ListIdentifiers --

function
    abbreviated form of ListRecords, retrieving only headers

example
    archive.org/oai-script?verb=ListIdentifiers&
                metadataPrefix=oai_dc&from=2002-12-01

parameters
    from (optional)
    until (optional)
    metadataPrefix (required)
    set (optional)
    resumptionToken (exclusive)

errors / exceptions
    badArgument (e.g. ?&from=2002-12-01-13:45:00)
    badResumptionToken
    cannotDisseminateFormat
    noRecordsMatch
    noSetHierarchy


-- ListRecords --

function
    harvest records from a repository

example
    archive.org/oai-script?verb=ListRecords&
                metadataPrefix=oai_dc&set=biology

parameters
    from (optional)
    until (optional)
    metadataPrefix (required)
    set (optional)
    resumptionToken (exclusive)

errors / exceptions
    badArgument
    badResumptionToken
    cannotDisseminateFormat
    noRecordsMatch
    noSetHierarchy


-- GetRecord --

function
    retrieve individual metadata record from a repository

example
    archive.org/oai-script?verb=GetRecord&
                identifier=oai:HUBerlin.de:3000218&
                metadataPrefix=oai_dc

parameters
    identifier (required)
    metadataPrefix (required)

errors / exceptions
    badArgument
    cannotDisseminateFormat
    idDoesNotExist
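
The parameter lists given for the six request types can be collected into a single table and checked before a request is sent. The following Python is a sketch of that idea, not a complete validator: the names `ARGUMENT_RULES` and `check_arguments` are assumptions, and the sketch checks only which arguments are present, not their values.

```python
# verb: (required arguments, optional arguments, accepts resumptionToken)
ARGUMENT_RULES = {
    "Identify": (set(), set(), False),
    "ListMetadataFormats": (set(), {"identifier"}, False),
    "ListSets": (set(), set(), True),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), False),
}


def check_arguments(verb, arguments):
    """Return None if the argument names are acceptable, else an error code."""
    rules = ARGUMENT_RULES.get(verb)
    if rules is None:
        return "badVerb"
    required, optional, resumable = rules
    names = set(arguments)
    # resumptionToken is exclusive: no other argument may accompany it.
    if resumable and "resumptionToken" in names:
        return None if names == {"resumptionToken"} else "badArgument"
    # All required arguments must be present and no unknown ones given.
    if not required <= names or not names <= (required | optional):
        return "badArgument"
    return None


print(check_arguments("Identify", {"set": "biology"}))
print(check_arguments("ListRecords", {"metadataPrefix": "oai_dc", "set": "biology"}))
```

The first call reports badArgument, matching the Identify example given earlier, and the second call returns None because the ListRecords arguments are acceptable.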


Example 1: response to ListIdentifiers request

This example shows the response to a ListIdentifiers request that specifies a date range, a metadata format, a set, and a Data Provider.
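
A response of this kind might look like the made-up sample embedded in the Python sketch below, which also shows how a harvester could read the headers. The identifiers, dates and set name are hypothetical, and the function name `read_headers` is an assumption.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical response; all identifiers and dates are invented.
SAMPLE_LIST_IDENTIFIERS = b"""<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-12-05T10:00:00Z</responseDate>
  <request verb="ListIdentifiers" from="2002-12-01" until="2002-12-03"
           metadataPrefix="oai_dc" set="biology">http://archive.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:archive.org:1001</identifier>
      <datestamp>2002-12-01</datestamp>
      <setSpec>biology</setSpec>
    </header>
    <header status="deleted">
      <identifier>oai:archive.org:1002</identifier>
      <datestamp>2002-12-02</datestamp>
      <setSpec>biology</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def read_headers(xml_bytes):
    """Return one dict per header element in the response."""
    root = ET.fromstring(xml_bytes)
    headers = []
    for header in root.iter(OAI_NS + "header"):
        headers.append({
            "identifier": header.findtext(OAI_NS + "identifier"),
            "datestamp": header.findtext(OAI_NS + "datestamp"),
            # setSpec is optional and repeatable.
            "sets": [s.text for s in header.findall(OAI_NS + "setSpec")],
            # The status attribute marks a deleted item.
            "deleted": header.get("status") == "deleted",
        })
    return headers


for entry in read_headers(SAMPLE_LIST_IDENTIFIERS):
    print(entry)
```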


Example 2: response to GetRecord request

This example shows a response to a GetRecord request for an individual record specified by identifier.
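
As an illustration, the Python sketch below embeds a made-up GetRecord response carrying unqualified Dublin Core and collects the Dublin Core elements into a dictionary. The title, creators and dates are invented, and the function name `dublin_core_fields` is an assumption. Because every Dublin Core element is optional and repeatable, each value is stored in a list.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical response; the metadata values are invented.
SAMPLE_GET_RECORD = b"""<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-10-14T16:36:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:HUBerlin.de:3000218"
           metadataPrefix="oai_dc">http://archive.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:HUBerlin.de:3000218</identifier>
        <datestamp>2002-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented title</dc:title>
          <dc:creator>Author, First</dc:creator>
          <dc:creator>Author, Second</dc:creator>
          <dc:date>2002-11-30</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def dublin_core_fields(xml_bytes):
    """Map each Dublin Core element name to the list of its values."""
    root = ET.fromstring(xml_bytes)
    fields = {}
    for element in root.iter():
        if element.tag.startswith(DC_NS):
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


print(dublin_core_fields(SAMPLE_GET_RECORD))
```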


Sources of further information

-- Web sites and email lists --

Open Archives Initiative (OAI) - official site
http://www.openarchives.org/

OAI-PMH protocol specification
http://www.openarchives.org/OAI/openarchivesprotocol.html

OAI general mailing list
http://www.openarchives.org/mailman/listinfo/OAI-general/ 

OA-Forum expert reports and reviews of organisational and technical issues
Links from http://www.oaforum.org/documents/

Dublin Core
http://dublincore.org/




Copyright © 2003 University of Bath. All rights reserved.
Author: Leona Carpenter (co-ordinating author) for OA-Forum and UKOLN
Last modified: 14 Oct 2003 16:36
Authored in CALnet