DCMI

W3C XML Schemas - Issues

UKOLN

Introduction

DCMI provides two sets of W3C XML Schemas [DCXMLS] to support the use of the XML encoding described in the Guidelines for implementing Dublin Core in XML [DC-XML].

A number of issues have arisen as implementers have explored the use of these schemas in their XML applications.

In addition, work on the DCMI Abstract Model [DCMIAM] has highlighted some limitations of the encoding conventions described in the Guidelines for implementing Dublin Core in XML [DC-XML].

The dc:SimpleLiteral complexType

In the second set of schemas, the dc.xsd schema defines a complex type called dc:SimpleLiteral in terms of mixed complex content. However, the cardinality attributes on the xs:any element mean that this complex type does not permit child elements. In the dcterms.xsd schema, encoding schemes are represented as named complex types derived from dc:SimpleLiteral. This derivation of a complex type with simple content by restriction of a base complex type with complex content appears to be valid under the XML Schema specification, but it does not map easily onto relational and object database structures, and some processors reject it as an error [DCXMLNOTES]. As a result, implementers regularly raise questions about it.
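The shape of the type described above can be sketched as follows. This is a reconstruction from the description in this section, not a verbatim copy of dc.xsd; in particular the schemaLocation used for the xml:lang import and the exact attribute settings are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://purl.org/dc/elements/1.1/"
           targetNamespace="http://purl.org/dc/elements/1.1/"
           elementFormDefault="qualified">

  <!-- xml:lang is declared in the W3C schema for the XML namespace
       (the location given here is an assumption) -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!-- Mixed complex content, so character data is allowed ... -->
  <xs:complexType name="SimpleLiteral">
    <xs:complexContent mixed="true">
      <xs:restriction base="xs:anyType">
        <xs:sequence>
          <!-- ... but maxOccurs="0" means no child element may appear -->
          <xs:any processContents="lax" minOccurs="0" maxOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="xml:lang" use="optional"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

The net effect is a type that behaves like a string with an optional xml:lang attribute, while formally remaining a complex type with complex content.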

Question: Should we consider replacing the dc:SimpleLiteral type with one based on simple content?

Implications: The schema would be usable in a wider range of tools, but it would not support the derivation of complex content types that do permit child elements, and would not be usable by implementers who have derived such types using the current schema.
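For comparison, a minimal sketch of what a replacement based on simple content might look like. The namespace http://example.org/dc-simple/ and the title element declaration are hypothetical and are used only to keep the fragment self-contained; this is not a proposed DCMI schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/dc-simple/"
           targetNamespace="http://example.org/dc-simple/"
           elementFormDefault="qualified">

  <!-- xml:lang is declared in the W3C schema for the XML namespace
       (the location given here is an assumption) -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!-- Simple content: a string value plus an optional xml:lang attribute.
       Child elements are excluded by construction, so no derived type
       can reintroduce them. -->
  <xs:complexType name="SimpleLiteral">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute ref="xml:lang" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- Hypothetical element declaration using the type -->
  <xs:element name="title" type="SimpleLiteral"/>

</xs:schema>
```

A type of this kind maps directly onto a string-valued field with a language tag, which is the mapping that tools struggle to infer from the mixed-content definition.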

Versioning

The XML schemas support a quite different function from the RDFS descriptions of the DCMI Namespaces. Whereas the latter provide information about the DCMI terms and the relationships between them, the former support the structural validation of XML documents. In fact, the title of the page that describes the DCMI XML Schemas [DCXMLS], "DCMI term declarations represented in XML schema language", is rather misleading in this regard: the XML Schemas declare XML elements and define content models for those XML elements, but these are not the same thing as "DCMI terms".

A property can be referenced in the predicate of an RDF triple even if no RDFS description of that property is available; the introduction of a new RDFS property description may enable an application to infer additional information, but it does not change the "validity" of existing RDF data. However, the addition of a new XML element declaration to an XML schema may change the results of XML schema validation: an XML instance that was invalid because it included an element name not declared in the schema may become valid when a declaration is added.

Currently, versions of the DCMI XML schemas are made available at distinct URIs (constructed using the date of publication), e.g. http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd

The consequence of this is that an implementer's XML schema must reference a specific version of the DCMI XML schema, and if a new version of the DCMI XML schema is released, those references need to be updated to target the new version. This means that an implementer can control exactly which schema they are validating against, and rely on the fact that the content of that schema will not change.

The dcterms.xsd schema contains references to the other two schemas - it imports both dc.xsd and dcmitype.xsd. Those references are currently made using relative URI references. However, the schemas may version independently of each other (e.g. the addition of a new term to the DC Terms vocabulary may occasion the addition of an XML element declaration to dcterms.xsd while dc.xsd remains unchanged). At present new versions of all three files must be created when any one of them changes, which introduces some redundancy. References to the schemas within the schemas themselves should be to specific versions, using absolute URIs.

Proposal: For each of the three XML Namespaces, DCMI should make each version of the XML schema available at a "date-stamped" URI (as at present). Within these schemas, the references to other DC schemas must be made by absolute URI references to "date-stamped" versions. Applications which make use of the specific versions must not introduce contradictions by also referring to other versions of these referenced schemas.
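As an illustration of this proposal, the imports at the top of dcterms.xsd might look like the sketch below. The dc.xsd location is the one quoted earlier in this section; the dcmitype.xsd location is assumed by analogy, and under the proposal each import could carry a different date if the schemas had been revised at different times.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:dcmitype="http://purl.org/dc/dcmitype/"
           targetNamespace="http://purl.org/dc/terms/"
           elementFormDefault="qualified">

  <!-- Absolute, date-stamped reference instead of a relative one -->
  <xs:import namespace="http://purl.org/dc/elements/1.1/"
             schemaLocation="http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd"/>

  <!-- Location assumed by analogy with the dc.xsd URI above -->
  <xs:import namespace="http://purl.org/dc/dcmitype/"
             schemaLocation="http://dublincore.org/schemas/xmls/qdc/2003/04/02/dcmitype.xsd"/>

  <!-- Element declarations and encoding scheme types omitted -->

</xs:schema>
```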

Impact on applications: None. This is the current position, with the exception that redundant copies of unchanged schemas are not created.

The requirement to reference a specific version may be appropriate for some applications. However, other implementers have indicated that they wish to refer to the schemas in a way that guarantees they always access the most recent version, without having to update their references.

Proposal: For each of the three XML Namespaces, DCMI should also make the latest version available at a fixed "non-date-stamped" URI. Applications which make use of the latest version must handle the fact that new element declarations and complex type definitions may be added at any time.

Impact on applications: Implementers have the choice of referencing a specific version which is fixed or referencing the latest version.
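The choice between the two kinds of reference could appear in an application schema roughly as follows. The application namespace http://example.org/myapp/, the record element and the "latest" location http://example.org/schemas/latest/dc.xsd are all hypothetical; the last stands in for whatever fixed URI DCMI might choose.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           targetNamespace="http://example.org/myapp/"
           elementFormDefault="qualified">

  <!-- Option 1: pin a specific date-stamped version, which never changes -->
  <xs:import namespace="http://purl.org/dc/elements/1.1/"
             schemaLocation="http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd"/>

  <!-- Option 2 (use instead of option 1): follow the latest version.
       The URI below is a placeholder, not a published location.
  <xs:import namespace="http://purl.org/dc/elements/1.1/"
             schemaLocation="http://example.org/schemas/latest/dc.xsd"/>
  -->

  <!-- Hypothetical application element that reuses a DC element -->
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="dc:title" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With option 2, an instance that fails validation today may pass after DCMI adds a declaration, which is the behaviour an application choosing the latest version has to be prepared for.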

Container elements and XML Namespaces

For both sets of XML schemas provided by DCMI, the XML schemas provide a set of XML element declarations that correspond to the properties of the DCMI Namespaces. XML instance documents conforming to the schemas deploy the XML elements within a "container" XML element. The Guidelines for implementing Dublin Core in XML [DC-XML] do not specify either the local name or the XML namespace name for that container XML element: they allow an implementer to choose.

For the "oai_dc" case, that XML element is declared within an XML schema provided by OAI, and the target XML namespace of that schema (the XML namespace associated with the container element) is defined by OAI.

For the "qualified DC" set of schemas, DCMI does provide some example "container schemas" that define container XML elements. However, the use of these schemas is optional and implementers are free to define their own.

Further, the supplied container schemas do not declare a target XML namespace: they can be used by applications, but those applications must provide an XML Namespace (by including the DCMI-provided schema in an application-specific schema that does have a target namespace).
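A minimal sketch of such an application-specific schema is shown below. The target namespace http://example.org/myapp/ is hypothetical, and the file name qualifieddc.xsd and its location are assumptions about where the DCMI container schema is published.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/myapp/"
           elementFormDefault="qualified">

  <!-- The included schema has no target namespace of its own, so its
       container element declaration takes on the target namespace of
       this including schema. -->
  <xs:include schemaLocation="http://dublincore.org/schemas/xmls/qdc/2003/04/02/qualifieddc.xsd"/>

</xs:schema>
```

Because each application supplies its own namespace in this way, two applications that include the same container schema end up with container elements that have the same local name but different namespace names.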

Some implementers have requested that DCMI provide a declaration for one or more namespace-qualified container elements, so that all applications use the same XML element rather than declaring different ones.

"Simple DC" and "Qualified DC"

The Guidelines for implementing Dublin Core in XML [DC-XML] provides models for two "application profiles" of Dublin Core:

  • "Simple DC" where a property must be one of the 15 DCMES elements, and a value cannot be associated with an encoding scheme
  • "Qualified DC" where a property may be one of the 15 DCMES elements, or an element or element refinement from the DC Terms vocabulary; and a value can be associated with an encoding scheme from the DC Terms vocabulary

The example "container schemas" provided define corresponding restrictions on the XML structure for these two profiles.

However, there has been some debate about the usefulness of defining "Qualified DC" in a manner that restricts it to terms from the DCMI vocabularies. It has been argued that implementers rarely limit themselves to those terms and more often than not combine them with application-specific terms. On that argument, an XML schema that limits the content of the container element to child XML elements from the DCMI XML Namespaces is not useful in practice.

The DCMI Abstract Model adopts this more "open" notion of a Qualified DC description as a description that "conforms to the DCMI abstract model, and contains at least one property taken from the DCMI Metadata Terms recommendation" [DCMIAM].

Proposal: Create schemas that declare namespace-qualified XML container elements (simpledc and qualifieddc? simpledcDescription and qualifieddcDescription?) with content models appropriate for a simple DC description and a qualified DC description, as defined by the Abstract Model.

Impact on applications: Implementers who have declared their own container elements are unaffected, but may wish to migrate to using the new container elements. New implementers should use the new container elements.
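One possible shape for such a schema is sketched below. The namespace http://example.org/dcxml/ is a placeholder, the element names are two of the candidates floated in the proposal, and the content models are only approximations: the simple DC wildcard does not by itself rule out encoding schemes, and nothing here enforces the Abstract Model requirement for at least one property from the DCMI Metadata Terms.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/dcxml/"
           elementFormDefault="qualified">

  <!-- Declarations for the 15 DCMES elements -->
  <xs:import namespace="http://purl.org/dc/elements/1.1/"
             schemaLocation="http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd"/>

  <!-- Simple DC: only elements from the DCMES namespace, which must be
       declared in the imported schema (processContents="strict") -->
  <xs:element name="simpledcDescription">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="http://purl.org/dc/elements/1.1/"
                processContents="strict"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Qualified DC in the "open" sense: elements from any namespace other
       than the container namespace, validated only where a declaration
       is available (processContents="lax") -->
  <xs:element name="qualifieddcDescription">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```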

Question: How do we enforce the "conformance to the model"? Is there a risk that this "open" model for "qualified DC" becomes a mechanism for wrapping arbitrary XML structures and labelling them as "qualified DC"?

Question: What XML Namespace should be used for these container XML elements? These are XML elements, not DC "terms" in the sense defined by the DCMI Namespace Policy document [DCMINS].

The Guidelines for implementing Dublin Core in XML [DC-XML] document currently covers only the encoding of a description of a single resource. In practice, the units of metadata exchanged between applications typically cover the description of several related resources, and the DCMI Abstract Model introduces the idea of a "description set".

Question: Should we also provide an XML element that acts as a container for multiple descriptions (i.e. to support the structure below)?

<?xml version="1.0"?>

<dcxml:descriptionSet
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://example.org/dcxml/ http://dublincore.org/schemas/xmls/containers.xsd"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:dcxml="http://example.org/dcxml/"
>

  <dcxml:qualifieddcDescription>
    <dc:identifier>
      http://www.ukoln.ac.uk/metadata/dcmi/xml-issues/
    </dc:identifier>
    <dc:title>
      Report on XML Schema Issues
    </dc:title>
    <dc:description>
      This document describes issues raised with the DCMI XML Schemas.
    </dc:description>
    <dcterms:hasVersion>
      http://www.ukoln.ac.uk/metadata/dcmi/xml-issues/2004-09-12/
    </dcterms:hasVersion>
  </dcxml:qualifieddcDescription>

  <dcxml:qualifieddcDescription>
    <dc:identifier>
      http://www.ukoln.ac.uk/metadata/dcmi/xml-issues/2004-09-12/
    </dc:identifier>
    <dc:title>
      Report on XML Schema Issues (2004-09-12)
    </dc:title>
    <dc:description>
      This document describes issues raised with the DCMI XML Schemas.
    </dc:description>
    <dcterms:modified xsi:type="dcterms:W3CDTF">
      2004-09-12
    </dcterms:modified>
    <dcterms:isVersionOf>
      http://www.ukoln.ac.uk/metadata/dcmi/xml-issues/
    </dcterms:isVersionOf>
  </dcxml:qualifieddcDescription>

</dcxml:descriptionSet>
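A schema fragment that would accept a structure of this kind might look like the sketch below. It uses the placeholder namespace http://example.org/dcxml/ from the example; the type name openDescription is hypothetical, and the description elements are given a deliberately open content model so that the fragment stays self-contained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dcxml="http://example.org/dcxml/"
           targetNamespace="http://example.org/dcxml/"
           elementFormDefault="qualified">

  <!-- Container for one or more descriptions of related resources -->
  <xs:element name="descriptionSet">
    <xs:complexType>
      <xs:choice minOccurs="1" maxOccurs="unbounded">
        <xs:element ref="dcxml:simpledcDescription"/>
        <xs:element ref="dcxml:qualifieddcDescription"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <!-- Placeholder content model: any elements from other namespaces,
       validated only where a declaration is available -->
  <xs:complexType name="openDescription">
    <xs:sequence>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="simpledcDescription" type="dcxml:openDescription"/>
  <xs:element name="qualifieddcDescription" type="dcxml:openDescription"/>

</xs:schema>
```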

The Abstract Model and the Guidelines for implementing Dublin Core in XML

Appendix 3 of the DCMI Abstract Model [DCMIAM] highlights that the encoding conventions described by the Guidelines for implementing Dublin Core in XML [DC-XML] are based on a simple model for a DC metadata description, and that this model does not capture all aspects of a DC metadata description.

In particular, the following aspects of the DCMI abstract model are not supported:

All the values that are encoded in this syntax are represented by value strings, even those that look, to the human reader, as though they are URIs. Neither resource URIs nor value URIs can be explicitly encoded in the XML encoding syntax.
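The point can be illustrated with the following instance fragment. The container element metadata is an arbitrary, implementer-chosen name used only for illustration, and the use of xsi:type with dcterms:URI assumes the encoding scheme mechanism of the qualified DC schemas.

```xml
<?xml version="1.0"?>
<!-- "metadata" is an illustrative container element, not one defined by DCMI -->
<metadata
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/">

  <!-- A value string that happens to look like a URI -->
  <dc:relation>http://www.ukoln.ac.uk/metadata/dcmi/abstract-model/</dc:relation>

  <!-- Naming an encoding scheme says the string follows URI syntax,
       but the value is still carried as a value string -->
  <dc:relation xsi:type="dcterms:URI">http://www.ukoln.ac.uk/metadata/dcmi/abstract-model/</dc:relation>

</metadata>
```

In neither case can a consuming application tell from the syntax alone whether the string is meant as a value URI identifying the related resource or simply as a label for it.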

Question: Should the Guidelines for implementing Dublin Core in XML be revised/extended to describe an encoding that captures more features of the DCMI Abstract Model?

References

[DCMES]
Dublin Core Metadata Element Set, Version 1.1: Reference Description
http://dublincore.org/documents/dces/

[DCMIAM]
DCMI Abstract Model
http://www.ukoln.ac.uk/metadata/dcmi/abstract-model/

[DCMINS]
Namespace Policy for the Dublin Core Metadata Initiative (DCMI)
http://dublincore.org/documents/dcmi-namespace/

[DCTERMS]
DCMI Metadata Terms
http://dublincore.org/documents/dcmi-terms/

[DCXMLS]
DCMI term declarations represented in XML schema language
http://dublincore.org/schemas/xmls/

[DCXMLNOTES]
Notes on the W3C XML Schemas for Qualified Dublin Core
http://dublincore.org/schemas/xmls/qdc/2003/04/02/notes/

[DC-XML]
Guidelines for implementing Dublin Core in XML
http://dublincore.org/documents/dc-xml-guidelines/

[OAI-PMH]
The Open Archives Initiative Protocol for Metadata Harvesting, Version 2.0
http://www.openarchives.org/OAI/openarchivesprotocol.html


Content by Pete Johnston of UKOLN.
Last updated on: 30-Sep-2004
