JISC Information Environment Architecture

OAI FAQ

JISC IE UKOLN

What does OAI stand for?

OAI stands for the Open Archives Initiative - an initiative that develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. In the context of the JISC IE architecture, what you are probably interested in is the OAI Protocol for Metadata Harvesting. If you want to know more about the OAI, read their FAQ at <http://www.openarchives.org/documents/FAQ.html>.

OK, what is the OAI Protocol for Metadata Harvesting?

The OAI Protocol for Metadata Harvesting (OAI-PMH) supports the regular gathering of metadata records from one service by another. In other words, it enables the transfer of metadata from one system to another. The protocol is simple in two respects. Firstly, it is based on common underlying Web standards - HTTP, XML and XML schemas - which means that it is fairly easy to implement if you are already running a Web server. Secondly, the number of operations supported by the protocol is very small. In essence, the OAI-PMH only allows one service to ask another service for a copy of all its metadata records, or for some of its metadata records, where 'some' is defined in terms of a named sub-set (known in OAI as a set) or in terms of those records modified during a particular time period.
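To make the 'all records' versus 'some records' distinction concrete, here is a minimal Python sketch of how a harvester might build the request URLs. The verb ListRecords and the arguments metadataPrefix, set, from and until come from the OAI-PMH specification; the base URL, the set name 'physics' and the helper build_request are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL; a real repository publishes its own.
BASE_URL = "http://repository.example.org/oai"


def build_request(verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    for name, value in params.items():
        if value is not None:
            # "from" is a Python keyword, so callers pass it as from_.
            query[name.rstrip("_")] = value
    return BASE_URL + "?" + urlencode(query)


# All records, as unqualified Dublin Core.
all_records = build_request("ListRecords", metadataPrefix="oai_dc")

# Only records in a named set (hypothetical set name) that were
# modified during a particular time period.
some_records = build_request(
    "ListRecords",
    metadataPrefix="oai_dc",
    set="physics",
    from_="2003-01-01",
    until="2003-01-31",
)

print(all_records)
print(some_records)
```

Because each request is an ordinary HTTP GET with a handful of query arguments, it can be tried out in a Web browser before any harvesting software is written.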

In the terminology used by the OAI-PMH, a data provider makes metadata available for gathering and a service provider gathers that metadata and makes it available for searching. In terms of the client-server model, the data provider is a server and the service provider is a client.
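On the client side, a service provider has to do little more than parse the XML that comes back. The following Python sketch assumes an abbreviated, made-up ListRecords response (the identifier and title are invented) and pulls out the fields a search service would probably index. The namespace URIs are those defined by OAI-PMH 2.0 and Dublin Core; the function parse_records is illustrative only, and the handling of large result lists (which the protocol splits using a resumptionToken) is left out.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up, abbreviated response from a hypothetical data provider.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2003-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example e-print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Extract identifier, datestamp and title from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", namespaces=NS),
            "datestamp": record.findtext(
                "oai:header/oai:datestamp", namespaces=NS),
            "title": record.findtext(
                "oai:metadata/oai_dc:dc/dc:title", namespaces=NS),
        })
    return records


records = parse_records(SAMPLE_RESPONSE)
for rec in records:
    print(rec["identifier"], rec["datestamp"], rec["title"])
```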

What is an OAI repository?

An OAI repository is a database that supports the OAI-PMH and that is therefore an OAI data provider. A repository may store both full-text and metadata (e.g. an e-print archive) or only metadata (e.g. a subject gateway catalogue).

What kind of metadata can be shared using the OAI-PMH?

The OAI-PMH can be used to exchange any kind of metadata provided that it can be encoded as XML and provided that you have an XML schema that defines the way in which that encoding is done.

So, the OAI-PMH isn't limited to exchanging Dublin Core metadata?

Correct. You can exchange any metadata you like provided it is based on XML (see above). So, for example, you can use the OAI-PMH to exchange Dublin Core (DC) metadata, IMS metadata (IEEE LOM), XrML or ODRL rights statements, etc.

The OAI-PMH does, however, mandate that you expose your metadata as simple (i.e. unqualified) DC as well as in its native format. For example, suppose you have a repository of learning resources, each of which has some associated IMS metadata. You might use the OAI-PMH to allow people to harvest your IMS metadata from the repository (assuming that you have a suitable IMS metadata XML schema). However, to be compliant with the OAI-PMH you must also provide an unqualified Dublin Core view of each of your metadata records. How you do this is up to you. You may choose to convert your IMS metadata records to DC on the fly as part of the software that delivers your OAI repository. Alternatively, you may choose to convert all your records once, store them in two separate formats within your back-end database, then make both formats available using the OAI-PMH.
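As a rough illustration of the 'on the fly' option, the Python sketch below maps a native record to an unqualified DC record. The native record is a simplified, invented dictionary (not real IMS or IEEE LOM XML), and the field mapping FIELD_TO_DC and the function to_simple_dc are assumptions made for the example. The two namespace URIs are the standard ones for oai_dc and the DC element set; a real oai_dc record would also carry an xsi:schemaLocation attribute, which is omitted here.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from native field names to DC element names.
FIELD_TO_DC = {
    "title": "title",
    "description": "description",
    "author": "creator",
    "keyword": "subject",
    "location": "identifier",
}


def to_simple_dc(native_record):
    """Return an unqualified DC view of a native record as an XML string."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for field, dc_name in FIELD_TO_DC.items():
        values = native_record.get(field, [])
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-item list
        for value in values:
            child = ET.SubElement(root, "{%s}%s" % (DC_NS, dc_name))
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented native record; fields with no DC mapping are simply dropped.
native = {
    "title": "Introduction to Algebra",
    "author": ["A. Teacher", "B. Tutor"],
    "keyword": ["mathematics", "algebra"],
    "location": "http://repository.example.org/resources/42",
    "typical_learning_time": "PT2H",
}

dc_xml = to_simple_dc(native)
print(dc_xml)
```

The same function could equally be run once over every record to populate a stored DC copy, which is the second option described above.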

The reason for mandating the use of unqualified DC is that it provides a base level of interoperability between services, even if they know nothing about the native metadata format used by the other service.

Is the OAI-PMH limited to e-print archives?

No (see above). The OAI-PMH can be used to exchange many kinds of metadata between many kinds of services. The OAI-PMH has its roots in the e-print archive community, but is certainly not limited to applications within that domain.

Is the OAI-PMH a search protocol?

No. The OAI-PMH provides no mechanisms for sending a query to a repository. A repository may choose to implement sets (i.e. partition itself into sub-sets) based on 'saved searches' of a back-end database in some way. But the saved searches are not exposed through the protocol in any way, nor is there any way within the protocol to request gathering of arbitrary 'sets'.

What is an OAI aggregator?

An OAI aggregator is both a service provider and a data provider. It is a service that gathers metadata records from multiple data providers and then makes those records available for gathering by others using the OAI-PMH.

If I use the OAI-PMH, does that mean I have to make all my metadata freely available?

No, not necessarily. The 'open' in OAI doesn't mean freely available. Data providers can choose to restrict who can gather metadata records from them based on the IP address of the service provider, or on more complex mechanisms such as HTTP Basic Authentication or SSL.
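The simplest of these restrictions, filtering by IP address, might look something like the following Python sketch. The network ranges are reserved documentation addresses standing in for real service providers, and the names ALLOWED_NETWORKS and may_harvest are invented for the example; in practice the check is often done in the Web server configuration rather than in application code.

```python
import ipaddress

# Hypothetical list of service providers allowed to harvest.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # a whole partner network
    ipaddress.ip_network("198.51.100.17/32"),  # a single harvester host
]


def may_harvest(client_ip):
    """Return True if the requesting address belongs to an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


print(may_harvest("192.0.2.44"))
print(may_harvest("203.0.113.5"))
```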

If I do make my metadata available using the OAI-PMH, aren't I going to lose lots of traffic to my Web site?

By exposing your metadata records for gathering by other services, you are allowing people to find your content without the need to visit your Web site and use your search engine. This may result in fewer hits on your Web site home page. However, your metadata records will typically contain the URLs of the resources held on your site. Therefore, supporting the OAI-PMH may actually result in more hits on your site - but with people going direct to your resources, rather than via your home page.

Remember that you can choose to limit how much information you expose using the OAI-PMH. For example, you may choose to expose only a limited simple DC metadata record using the OAI-PMH, forcing people to visit your site if they want to see the full metadata record.

Why can't I simply make my content available to Google and let people find my stuff that way?

You can, and in many cases this will be a perfectly appropriate thing to do. This is particularly true for freely available full text resources.

However, in some cases, for example where most of your resources are not text-based, exposing them to Google may not help much. In other cases, you may not want (or be allowed) to make the content freely available (to end-users or search engines!). In these situations, OAI may be more appropriate. By making your metadata freely available using the OAI-PMH, you can allow people to discover your resources even though the content itself is not open to indexing.

In many cases, exposing your metadata using OAI and exposing your full-text resources for indexing by Google may be the most effective thing to do. The DP9 software package from Old Dominion University is interesting in this respect. It provides an OAI gateway service that can be used to expose the contents of OAI repositories to Web crawlers.

Where can I get OAI-PMH software?

The OAI maintains a list of OAI software tools.

If you are setting up an e-print archive then consider using the University of Southampton's eprints.org software package.

Where can I see OAI in action?

Again, the OAI maintains a list of registered data providers (though you should remember that there will be lots of other non-registered data providers as well). There is also a smaller number of registered service providers (with the same proviso as above).

The RDN uses the OAI-PMH internally to share metadata records between the RDN gateways, as described in An OAI Approach to Sharing Subject Gateway Content (a poster paper given at the 10th WWW conference in Hong Kong).