DC Usage Board

Issues arising from DCMI Abstract Model


This document summarises issues for discussion at the March 2004 meeting of the DC Usage Board that arise out of the DCMI Abstract Model working draft.

This is not a full list of issues arising from the AM - just those that appear to be of interest to the UB. Apart from the first two, all these issues are primarily for information at this stage - no immediate decisions or actions are required by the UB.

Relation of the AM to other "foundational" documents maintained by DCMI

There are a number of older DCMI documents that overlap with the DCMI Abstract Model Working Draft. I don't think this matters too much, provided the terminology used across all the documents is the same and appropriate linkages are put in place between documents.

More drastic action could be taken, like removing some of the older documents. However, I don't think that would be justified at this stage.

Action: Align terminology at appropriate time.

Action: Add links from older DCMI documentation to the AM at appropriate time.

Wording of term definitions

This is an issue which affects how we define new DCMI terms. The AM states that the values of all DCMI properties are resources - people, organisations, concepts, places, etc.

"A DCMI metadata value is the physical or conceptual entity that is associated with a property when it is used to describe a resource. For example, the value of the DC Creator property is a person, organisation or service - a physical entity. The value of the DC Date property is a point in time - a conceptual entity. The value of the DC Coverage property may be a geographic region or country - a physical entity. The value of the DC Subject property may be a concept - a conceptual entity - or a physical object or person - a physical entity. Each of these entities is a resource. The value may be identified using a value URI; the value may be represented by one or more value strings and/or rich values; the value may have some related descriptions - but the value is a resource."

The wording of new DCMI property definitions needs to reflect this. For example, definitions should not use phrases like "A reference to ..." or "A URI for ...".

Some of the existing term definitions are poor in this respect. However, it is probably not possible to change these definitions at this stage.

Special case of dcterms:URI

It is worth noting that 'dcterms:URI' is treated specially in our syntax encodings. It is used to indicate a value URI rather than a value string. You'll note that dcterms:URI is almost never used in the RDF/XML and new XHTML encodings, because these have alternative mechanisms for indicating that a value URI is being provided ('rdf:resource' and the XHTML 'link' element respectively).

Therefore, dcterms:URI is not a normal syntax encoding scheme.
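
As a rough illustration of the RDF/XML case (the URIs under example.org are invented for the purpose), the value URI is carried by 'rdf:resource', so no 'dcterms:URI' marker appears. The plain XML form of the same statement is shown in a comment for comparison.

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Plain XML encoding, where dcterms:URI flags the content as a value URI:
       <dc:relation xsi:type="dcterms:URI">http://example.org/related</dc:relation> -->
  <rdf:Description rdf:about="http://example.org/doc">
    <!-- rdf:resource already says "this is a value URI" -->
    <dc:relation rdf:resource="http://example.org/related"/>
  </rdf:Description>
</rdf:RDF>
```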

Special case of dc:type

The 'dc:type' property is again a slightly special case with respect to the AM because it is used to indicate the class of the resource being described.

One might expect little use to be made of 'dc:type' in RDF/XML encodings, because RDF has separate mechanisms for indicating the class of a resource.
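
A minimal sketch of what this might look like in RDF/XML, assuming that the DCMI Type Vocabulary term 'Text' is usable as an RDF class and using an invented resource URI. The class is carried by the typed node element (shorthand for an rdf:type statement), so no 'dc:type' property is needed.

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcmitype="http://purl.org/dc/dcmitype/">
  <!-- Typed node element: states that the resource is of class dcmitype:Text -->
  <dcmitype:Text rdf:about="http://example.org/doc">
    <dc:title>An example document</dc:title>
  </dcmitype:Text>
</rdf:RDF>
```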

Special case of dc:identifier

Similarly, in combination with 'dcterms:URI', the 'dc:identifier' element can be used to provide the URI of the resource being described.

Again, one would expect little use to be made of 'dc:identifier' in the RDF/XML encoding because RDF provides a separate mechanism for indicating the URI of a resource. The exception to this is where the identifier being provided cannot be encoded as a URI.
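
For example, in the following sketch the URI of the resource is given by 'rdf:about', while 'dc:identifier' is retained only for an identifier that is not a URI. Both the resource URI and the shelf mark are invented for illustration.

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- rdf:about supplies the URI of the resource being described -->
  <rdf:Description rdf:about="http://example.org/doc">
    <!-- Hypothetical local shelf mark: not a URI, so dc:identifier is still needed -->
    <dc:identifier>MS 1234/b</dc:identifier>
  </rdf:Description>
</rdf:RDF>
```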

Element refinement and resource classes

This is very much an issue on the distant horizon. However, if DCMI moves to a position where resources and values are more strongly typed, for example if we start making more use of rdfs:domain and rdfs:range in our RDFS term declarations (this is a big if!), then we will have to consider what impact this has on our notion of element refinement. For example, I assume that valid element refinement will only occur when the domain and range of the element refinement are the same as, or narrower than, the domain and range of the element being refined.
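
To make the assumption concrete, here is a sketch of an RDFS declaration in which a refinement has a narrower domain and range than the element it refines. All the terms under example.org are hypothetical, and the refinement is expressed with rdfs:subPropertyOf. Nothing here reflects actual DCMI term declarations.

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <!-- Hypothetical classes: Book is narrower than Document, Person than Agent -->
  <rdfs:Class rdf:about="http://example.org/terms/Book">
    <rdfs:subClassOf rdf:resource="http://example.org/terms/Document"/>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://example.org/terms/Person">
    <rdfs:subClassOf rdf:resource="http://example.org/terms/Agent"/>
  </rdfs:Class>

  <!-- The element being refined -->
  <rdf:Property rdf:about="http://example.org/terms/contributor">
    <rdfs:domain rdf:resource="http://example.org/terms/Document"/>
    <rdfs:range rdf:resource="http://example.org/terms/Agent"/>
  </rdf:Property>

  <!-- The refinement: domain and range are narrower than those above -->
  <rdf:Property rdf:about="http://example.org/terms/illustrator">
    <rdfs:subPropertyOf rdf:resource="http://example.org/terms/contributor"/>
    <rdfs:domain rdf:resource="http://example.org/terms/Book"/>
    <rdfs:range rdf:resource="http://example.org/terms/Person"/>
  </rdf:Property>
</rdf:RDF>
```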

Vocabulary terms as URIs

When a vocabulary term gets a URI assigned to it, it changes from being a value string to being a value URI and the encoding needs to change to reflect that. So for example in XML, instead of

<dc:subject xsi:type="dcterms:DDC">Internet</dc:subject>

or

<dc:subject xsi:type="dcterms:DDC">004.678</dc:subject>

depending on your preference for numbers or words, the encoding would change to something more like

<dc:subject xsi:type="dcterms:URI">info:ddc/22/eng//004.678</dc:subject>

'DDC' is no longer required as a DCMI encoding scheme because the DDCness of the value is indicated by the URI. Therefore, there is no requirement to 'register' DDC with DCMI.

DCMI needs to consider whether it is better to encourage owners of vocabularies to move towards the use of URIs for their terms, to continue to encourage the registration of new schemes with DCMI, or to adopt a mixed approach for the time being.

Vocabulary Encoding Schemes and Syntax Encoding Schemes

The AM clearly distinguishes between Vocabulary Encoding Schemes and Syntax Encoding Schemes, because they are fundamentally different: one provides classes/types of value resources, the other deals with the format/interpretation of literals. It would appear that these map exactly onto the distinction between Class and Datatype in RDF.

DCMI does currently distinguish these things, but I wonder if that distinction is made forcefully enough in some of our documentation. It is perhaps worth noting that the label 'Vocabulary Encoding Scheme' now looks slightly unfortunate, given that this is now seen as a mechanism for indicating the class of the value. However, it may not be sensible or possible to change our terminology at this point.
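
One possible reading of the Class/Datatype mapping, sketched in RDF/XML. This is an assumption made for illustration rather than an agreed encoding: the vocabulary encoding scheme (dcterms:DDC) appears as the class of the value resource, and the syntax encoding scheme (dcterms:W3CDTF) appears as the datatype of a literal. The resource URI and the date are invented.

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/doc">
    <!-- Vocabulary encoding scheme used as the class of the value resource -->
    <dc:subject>
      <dcterms:DDC>
        <rdf:value>004.678</rdf:value>
      </dcterms:DDC>
    </dc:subject>
    <!-- Syntax encoding scheme used as the datatype of the literal (assumed) -->
    <dc:date rdf:datatype="http://purl.org/dc/terms/W3CDTF">2004-03-01</dc:date>
  </rdf:Description>
</rdf:RDF>
```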

Descriptions, records and schemas/application profiles

Rachel has pointed out that the AM, as currently presented, doesn't clearly indicate the relationships between descriptions/records and the schemas (or application profiles) that define those things.

Our current thinking, which has not yet been agreed with all the AM authors, is to extend the model slightly by introducing the notion of a description set. A description set is a collection of related descriptions. We will replace 'record' with 'description set' in the current 'description' model. We will then add a new 'record' model which indicates that a record is an instantiation of a description set in a particular encoding syntax.

This has the advantage of clearly separating the conceptual parts of the model (the 'resource' model and the 'description' model) from the instantiated part of the model (the new 'record' model).

It is important to remember that there are two kinds of schemas - syntactic and semantic. A syntax schema will be associated with a record and will define how the syntax is being used. The most common examples of syntax schemas are those using the XML Schema language. A semantic schema defines what classes of resource are being described, which terms are being used and what their semantics are. The most common examples of semantic schemas are those using the RDF Schema and OWL languages. Semantic schemas are not necessarily tightly bound to anything in the AM - for example, a semantic schema will commonly be used to declare all the terms in a particular namespace.

In this view, an application profile is a special kind of semantic schema that is associated with a description set. The important thing to note is that application profiles don't need to say anything about syntax, but do need to cover cases where multiple resources of different classes are being described (e.g. a document and its author).
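
As a sketch of the document-and-author case, the following RDF/XML record instantiates a description set containing two related descriptions. The example.org URIs and the 'ex:name' property are invented for illustration and are not taken from any actual application profile.

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.org/terms/">
  <!-- Description 1: the document -->
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>An example document</dc:title>
    <!-- The value of dc:creator is the resource described below -->
    <dc:creator rdf:resource="http://example.org/people/author1"/>
  </rdf:Description>

  <!-- Description 2: the author, a resource of a different class -->
  <rdf:Description rdf:about="http://example.org/people/author1">
    <ex:name>A. N. Author</ex:name>
  </rdf:Description>
</rdf:RDF>
```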

Finally, it is worth noting that, although syntax schemas are currently bound to records in a formal way (for example, using xsi:schemaLocation), there is no direct linkage between an application profile or semantic schema and a description set. That is, there is no 'hard line' between these things and any other entities in the AM.
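
For illustration, the formal binding between a record and its syntax schema might look like the following, where the attribute defined by XML Schema for this purpose is 'xsi:schemaLocation'. The 'metadata' container element, its namespace and the schema location are all hypothetical. Note that nothing in the record points at an application profile.

```xml
<?xml version="1.0"?>
<metadata xmlns="http://example.org/myapp/"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://example.org/myapp/ http://example.org/myapp/schema.xsd">
  <!-- xsi:schemaLocation pairs a namespace URI with the location of its XML schema -->
  <dc:title>An example document</dc:title>
</metadata>
```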


Andy Powell, UKOLN, University of Bath
March 2004