

5. XML Schemas and Support for Multiple Record Formats in OAI-PMH

Contents of this part of OAI for Beginners, the Open Archives Forum online tutorial


Basics of XML schemas for OAI-PMH

OAI-PMH uses XML Schemas to define record formats. You can exchange any metadata you like using OAI-PMH as long as you can encode it as XML and define an XML Schema for it. OAI-PMH mandates the oai_dc schema as a minimum standard for interoperability.

OAI-PMH documentation also describes the use of XML schema for other formats, and provides additional XML schemas for:


Closer look at oai_dc, the mandated XML schema for OAI-PMH

oai_dc is the simple metadata schema (based on unqualified Dublin Core) used as the mandatory "Lowest Common Denominator" metadata record format in OAI-PMH. It defines a container schema that is OAI-specific, and is hosted on the OAI Web site. It imports a generic DCMES (DC Metadata Element Set) schema. The generic DCMES schema is hosted on the DCMI (Dublin Core Metadata Initiative) Web site.

The same model could be used for a qualified Dublin Core schema; that is, a container schema hosted by OAI and referencing the generic schema hosted by DCMI.

oai_dc – an example from a record

This is an example oai_dc record, as viewed via the Repository Explorer, showing the beginning of a full GetRecord response.

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
          http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
   <responseDate>2003-03-15T16:16:51+01:00</responseDate>
   <request verb="GetRecord" metadataPrefix="oai_dc" identifier="oai:HUBerlin.de:3000476">http://edoc.hu-berlin.de/OAI-2.0</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:HUBerlin.de:3000476</identifier>
        <datestamp>1997-07-18</datestamp>
        <setSpec>pub-type</setSpec>
      </header>
      <metadata>
        <oai_dc:dc
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
            http://www.openarchives.org/OAI/2.0/oai_dc.xsd">

          <dc:title>Melanchthon in seiner Zeit. In: Philipp Melanchthon 1497-1997</dc:title>
          <dc:creator>Selge, Kurt-Victor</dc:creator>
...

Three important things to notice in the example above:

The namespace for the oai_dc format:
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"

The namespace for DCMES elements:
xmlns:dc="http://purl.org/dc/elements/1.1/"

The container schema associated with the oai_dc namespace:
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
     http://www.openarchives.org/OAI/2.0/oai_dc.xsd"

In other words, the oai_dc container schema (oai_dc.xsd), which defines the http://www.openarchives.org/OAI/2.0/oai_dc/ namespace, imports the DCMES schema from http://dublincore.org/schemas/xmls/simpledc20021212.xsd. It also defines a container element called 'dc' and lists which elements from the DCMES namespace/schema are allowed within that container in oai_dc.
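The oai_dc container schema can be pictured with the abridged sketch below. It consists of an import of the DCMI schema, one 'dc' element declaration, and a content model listing the fifteen DCMES elements in any order and number. This is an illustration, not a verbatim copy. The published http://www.openarchives.org/OAI/2.0/oai_dc.xsd is authoritative, and details such as the type name oai_dcType should be checked against it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Abridged sketch of an oai_dc-style container schema; not a verbatim copy of oai_dc.xsd -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        targetNamespace="http://www.openarchives.org/OAI/2.0/oai_dc/"
        elementFormDefault="qualified">

  <!-- Pull in the generic DCMES element declarations hosted by DCMI -->
  <import namespace="http://purl.org/dc/elements/1.1/"
          schemaLocation="http://dublincore.org/schemas/xmls/simpledc20021212.xsd"/>

  <!-- The container element, used in records as <oai_dc:dc> -->
  <element name="dc" type="oai_dc:oai_dcType"/>

  <!-- The DCMES elements allowed inside the container, in any order and number -->
  <complexType name="oai_dcType">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element ref="dc:title"/>
      <element ref="dc:creator"/>
      <element ref="dc:subject"/>
      <element ref="dc:description"/>
      <element ref="dc:publisher"/>
      <element ref="dc:contributor"/>
      <element ref="dc:date"/>
      <element ref="dc:type"/>
      <element ref="dc:format"/>
      <element ref="dc:identifier"/>
      <element ref="dc:source"/>
      <element ref="dc:language"/>
      <element ref="dc:relation"/>
      <element ref="dc:coverage"/>
      <element ref="dc:rights"/>
    </choice>
  </complexType>
</schema>
```

Because the container lists permitted elements explicitly, adding a new element means defining a new container rather than editing this one.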


Other metadata schemas may be used

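oai_dc is a simple format providing baseline interoperability. For a number of reasons, sharing only oai_dc may not be enough for your repository, service or community. It may lack elements you need, it may not be precise enough, or it may simply not be the right schema for your community or purpose.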


Adding new elements when oai_dc is not enough

Creating a new schema by extending the oai_dc schema to add new elements involves the following tasks:

  1. Create a name for the new schema
  2. Create namespaces
  3. Create the schema for the new elements
  4. Create a 'container schema'
  5. Validate your schema / records
  6. Add to your repository's "ListMetadataFormats"
  7. Add to your repository's other verbs
  8. Test it worked and is valid

Next, we'll use a simple scenario to demonstrate these eight tasks step-by-step. Suppose we have a test repository containing some photos:

http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/nph-oai2.cgi

Currently the repository has metadata using oai_dc. We want to add an "Equipment Used" element, as this is not part of the DCMES.


Step 1: Name your format

The new metadata format needs a name. In this case, we'll choose the name "wp_dc", following OAI's naming of "oai_dc" as a convention. (The two-letter code, 'wp', is short for 'workshop photos'.) However, the name could be anything you like; here, for example, wpdc or WP would also work.


Step 2: Create Namespaces

We need two namespaces:

  1. a namespace for the new format (wp_dc) that mixes both standard DC elements and any new ones
  2. a namespace for the new wp_dc metadata element (the property "Equipment Used") that we will use in this format

Namespaces are declared as URIs. We will use:

Note that the use of PURL for the elements namespace follows DCMI usage, but is not mandatory. However, both these namespace URIs should be under your control to ensure uniqueness and prevent re-use in the future. Namespace URIs do not need to resolve to anything.


Step 3: New terms schema

Next, we must create an XML schema for the new term. We will do this at:

http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wpterms.xsd

Notice the datestamp built into the directory structure. This makes it easier to enhance the schema later: a revised version can be published under a new dated directory without breaking anything that uses the old one.

The schema for the new term defines the new element "equipmentUsed" and adds it to the dc:any group. It also defines a new container type "wpterms:elementContainer".
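The following minimal sketch shows one way wpterms.xsd could be written. It assumes that the DCMI schema simpledc20021212.xsd declares an abstract element dc:any of type dc:SimpleLiteral, with each DCMES element in its substitution group. Under that assumption, "adding equipmentUsed to the dc:any group" means declaring it with substitutionGroup="dc:any". The target namespace http://example.org/wp_dc/terms/ is a hypothetical placeholder for the PURL-based elements namespace chosen in Step 2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of wpterms.xsd; the target namespace is a placeholder -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:wpterms="http://example.org/wp_dc/terms/"
           targetNamespace="http://example.org/wp_dc/terms/"
           elementFormDefault="qualified">

  <!-- Reuse the generic DCMES declarations (dc:any, dc:SimpleLiteral, dc:title, ...) -->
  <xs:import namespace="http://purl.org/dc/elements/1.1/"
             schemaLocation="http://dublincore.org/schemas/xmls/simpledc20021212.xsd"/>

  <!-- The new "Equipment Used" property; joining the dc:any substitution group
       lets it appear wherever dc:any is allowed -->
  <xs:element name="equipmentUsed" type="dc:SimpleLiteral" substitutionGroup="dc:any"/>

  <!-- Container type: any number of dc:any members (the DCMES elements plus equipmentUsed) -->
  <xs:complexType name="elementContainer">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="dc:any"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
```

The container type refers only to the abstract dc:any. Any element in that substitution group is therefore accepted, including every DCMES element and equipmentUsed. For that reason, this container needs no explicit list of permitted elements.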


Step 4: Container Schema

We must also create a container schema for the wp_dc record format. In this case the schema is available at:

http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wp_dc.xsd

(Note again the use of a date stamp incorporated in the directory structure.) This simply imports the wpterms schema and then defines a container element 'wp_dc' of type wpterms:elementContainer.
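A minimal sketch of wp_dc.xsd follows. Its target namespace is the wp_dc format namespace from the ListMetadataFormats entry in Step 6. The URI http://example.org/wp_dc/terms/ is a hypothetical placeholder for the new-terms namespace and must match whatever targetNamespace wpterms.xsd actually declares. The relative schemaLocation assumes that both schema files sit in the same dated directory.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of wp_dc.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:wpterms="http://example.org/wp_dc/terms/"
           targetNamespace="http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/"
           elementFormDefault="qualified">

  <!-- Import the new-terms schema (placeholder namespace, same dated directory) -->
  <xs:import namespace="http://example.org/wp_dc/terms/"
             schemaLocation="wpterms.xsd"/>

  <!-- The record's root element, <wp_dc:wp_dc>, holding DC elements and equipmentUsed -->
  <xs:element name="wp_dc" type="wpterms:elementContainer"/>
</xs:schema>
```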


Step 5: Validate

To validate records against our new schema, we next create some test records (or modify our existing ones) that include all the elements we want to use. To make the validation process easier to manage, we put these in a datestamped directory and use a meaningful file naming convention, such as:

http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/wp_dc/20030818/test.xml
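A test record in the new format might look like the sketch below. The element values are invented for illustration. The URI http://example.org/wp_dc/terms/ is a hypothetical placeholder for the new-terms namespace chosen in Step 2. To validate against the published schemas, use whatever namespace URI wpterms.xsd actually declares.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical test record; element values are invented examples -->
<wp_dc:wp_dc
    xmlns:wp_dc="http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:wpterms="http://example.org/wp_dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/
    http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wp_dc.xsd">
  <dc:title>Workshop participants, group photo</dc:title>
  <dc:type>Image</dc:type>
  <dc:format>image/jpeg</dc:format>
  <!-- The one element that plain oai_dc cannot carry -->
  <wpterms:equipmentUsed>Digital camera</wpterms:equipmentUsed>
</wp_dc:wp_dc>
```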

Now we can validate the records and schema with the W3C XML Schema validator (XSV) at:

http://www.w3.org/2001/03/webdata/xsv/


Step 6: ListMetadataFormats

The OAI-PMH verb ListMetadataFormats must report the new format. We therefore need to modify our repository software (source code and/or configuration files) to add information about the new format to the repository's response to the 'ListMetadataFormats' request. For example:

...
<metadataFormat>
  <metadataPrefix>wp_dc</metadataPrefix>
  <schema>http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wp_dc.xsd</schema>
  <metadataNamespace>http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/</metadataNamespace>
</metadataFormat>
...
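For context, a complete response to a request such as http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/nph-oai2.cgi?verb=ListMetadataFormats might look like the sketch below. The responseDate is invented. The oai_dc entry uses the standard prefix, schema and namespace shown earlier in this section, and it stays in the list because every repository must support oai_dc.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a complete response; the responseDate is a made-up example value -->
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
         http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2003-08-18T12:00:00Z</responseDate>
  <request verb="ListMetadataFormats">http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/nph-oai2.cgi</request>
  <ListMetadataFormats>
    <!-- oai_dc stays listed: every repository must support it -->
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <!-- The new format -->
    <metadataFormat>
      <metadataPrefix>wp_dc</metadataPrefix>
      <schema>http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wp_dc.xsd</schema>
      <metadataNamespace>http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
```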


Step 7: Other Verbs

We also need to ensure that the "wp_dc" format is available through the GetRecord, ListIdentifiers and ListRecords verbs. To do this, we modify our repository's responses to these verbs so that they accept the metadataPrefix argument set to the new format name, "wp_dc". When a Service Provider requests that format, the responses will then return the appropriate records formatted according to the new schema.
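As a sketch, a GetRecord request for a photo in the new format would pass metadataPrefix=wp_dc, for example http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/nph-oai2.cgi?verb=GetRecord&identifier=oai:example.org:photo-1&metadataPrefix=wp_dc. The response would then carry a wp_dc record inside the metadata container. The identifier, dates and element values are hypothetical. The URI http://example.org/wp_dc/terms/ is a placeholder for the new-terms namespace chosen in Step 2. ListIdentifiers and ListRecords take the same metadataPrefix argument.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the identifier, dates and element values are hypothetical -->
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
         http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2003-08-18T12:00:00Z</responseDate>
  <request verb="GetRecord" metadataPrefix="wp_dc"
           identifier="oai:example.org:photo-1">http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/nph-oai2.cgi</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:photo-1</identifier>
        <datestamp>2003-08-18</datestamp>
      </header>
      <metadata>
        <!-- Same record structure as with oai_dc, but rooted in the wp_dc container -->
        <wp_dc:wp_dc
            xmlns:wp_dc="http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:wpterms="http://example.org/wp_dc/terms/"
            xsi:schemaLocation="http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/
            http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/wp_dc/20030818/wp_dc.xsd">
          <dc:title>Workshop participants, group photo</dc:title>
          <wpterms:equipmentUsed>Digital camera</wpterms:equipmentUsed>
        </wp_dc:wp_dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
```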


Step 8: Testing - validate again

Finally, we use the Repository Explorer to test the new format. To do so, enter the URL of the repository's OAI interface:

http://www.ukoln.ac.uk/metadata/oa-forum/workshop-photos/oai/nph-oai2.cgi

We must test to ensure that:

Once all these conditions are met, we have a new format!


Summary - extending a format


When you want to use another metadata format

You can take a similar approach with other metadata record formats. For IMS/IEEE LOM and ODRL, XML schemas and namespaces have already been agreed, so deploying these formats should be easier because you don't need to define your own schemas. However, at the time this tutorial was prepared, XML schema specifications were continually being revised, so it was sometimes difficult for communities such as IMS to keep up with the changes.


Implementing an existing format

To implement an existing metadata format, modify the "ListMetadataFormats" response to include the format you wish to support. For example, for IMS:

...
<metadataFormat>
  <metadataPrefix>ims</metadataPrefix>
  <schema>http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd</schema>
  <metadataNamespace>http://www.imsglobal.org/xsd/imsmd_v1p2</metadataNamespace>
</metadataFormat>
...

Extend the other verbs that take a metadataPrefix argument (GetRecord, ListIdentifiers and ListRecords) to accept 'metadataPrefix' set to 'ims' and to return records formatted appropriately.
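For IMS, the metadata container of each returned record would then hold an IMS metadata record instead of oai_dc. The sketch below assumes the IMS Meta-data 1.2 XML binding's lom/general/title/langstring structure and uses an invented title. Check the element names against imsmd_v1p2p2.xsd before relying on them.

```xml
<!-- Sketch of the <metadata> part of a record returned for metadataPrefix=ims.
     Assumes the IMS Meta-data 1.2 binding's lom/general/title/langstring structure;
     the title text is a made-up example. -->
<metadata xmlns="http://www.openarchives.org/OAI/2.0/">
  <lom xmlns="http://www.imsglobal.org/xsd/imsmd_v1p2"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.imsglobal.org/xsd/imsmd_v1p2
       http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd">
    <general>
      <title>
        <langstring xml:lang="en">Introduction to OAI-PMH</langstring>
      </title>
    </general>
  </lom>
</metadata>
```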


Summary

OAI-PMH allows for any metadata format, so long as it is encoded in XML with an XML Schema. All repositories must support oai_dc for a minimum level of interoperability. If oai_dc does not have enough elements, you can extend it. If oai_dc is not precise enough, a qualified Dublin Core schema can be used. If oai_dc is not the right schema for your community or purpose, then use something else as well.


Seven key definitions

PURL
A PURL is a Persistent Uniform Resource Locator. Functionally a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client. The client can then complete the URL transaction in the normal fashion. In Web parlance, this is a standard HTTP redirect.
(Definition quoted from PURL at http://purl.org)

URI
URI is the acronym for Uniform Resource Identifier. URIs are strings that identify things on the Web. URIs are sometimes informally called URLs (Uniform Resource Locators), although URLs are only a subset of URIs. There are many URI schemes, including the http and ftp schemes.

XML namespace
An XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names. XML namespaces differ from the "namespaces" conventionally used in computing disciplines in that the XML version has internal structure and is not, mathematically speaking, a set.
(Definition quoted from W3C—Namespaces in XML at http://www.w3.org/TR/REC-xml-names/)

XML schemas
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents.
(Definition quoted from W3C Architecture Domain—XML schema at http://www.w3.org/XML/Schema)

container
Containers are places in OAI-PMH responses where XML complying with any external schema may be supplied. Containers are provided for extensibility and for community specific enhancements. The OAI Implementation Guidelines lists the existing optional containers and provides links to existing schemas.

DCMI (Dublin Core Metadata Initiative)
The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.
(Definition quoted from Dublin Core Metadata Initiative at http://dublincore.org/)

DCMES (Dublin Core Metadata Element Set)
The Dublin Core metadata element set is a standard for cross-domain information resource description. Here an information resource is defined to be "anything that has identity". This is the definition used in Internet RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax", by Tim Berners-Lee et al. There are no fundamental restrictions to the types of resources to which Dublin Core metadata can be assigned.
(Definition quoted from Dublin Core Metadata Initiative—Dublin Core Metadata Element Set, Version 1.1: Reference Description at http://dublincore.org/documents/dces/)


Sources of further information

Dublin Core – official site
http://dublincore.org/

DCMI term declarations represented in XML schema language
http://dublincore.org/schemas/xmls/

Guidelines for implementing Dublin Core in XML
http://dublincore.org/documents/dc-xml-guidelines/

W3 Schools XML tutorials include, among others, the following:

W3 Schools XML tutorial
http://www.w3schools.com/xml/

W3 Schools XML Schema Tutorial
http://www.w3schools.com/schema/

OAI – official site
http://www.openarchives.org/

OAI-PMH protocol specification
http://www.openarchives.org/OAI/openarchivesprotocol.html

OAI general mailing list
http://www.openarchives.org/mailman/listinfo/OAI-general/

OAI implementers mailing list
http://www.openarchives.org/mailman/listinfo/OAI-implementers/




Copyright © 2003 University of Bath. All rights reserved.
Author: Leona Carpenter (co-ordinating author) for OA-Forum and UKOLN
Last modified: 14 Oct 2003 16:36
Authored in CALnet