DESIRE Registry

Data Model

Overview

One of the advanced functions of a metadata registry is to provide mappings between different metadata vocabularies.

This can be achieved via hard-coded mapping tables detailing the relationships between the elements in a source vocabulary and those in a target vocabulary. This approach has a high development and maintenance cost. Full coverage requires a mapping for every pair of vocabularies, and the number of pairs grows quadratically with the number of vocabularies registered.

The DESIRE registry takes an alternative approach. Instead of mapping between every pair of vocabularies, every vocabulary is mapped onto an underlying semantic layer. To translate from vocabulary A to vocabulary B, we map from A onto the semantic layer, and then from the semantic layer onto B. As a result, it is only necessary to create one mapping between each vocabulary and the semantic layer. Consider a registry that already holds 20 vocabularies. When a new vocabulary is introduced, a single mapping (between the new vocabulary and the semantic layer) is enough to support translation between it and all those already registered, rather than 20 separate pairwise mappings.
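The translation route described above can be sketched as follows. This is a minimal sketch that assumes each vocabulary's mapping is a simple table from element names to concept identifiers in the semantic layer. The element names, concept identifiers and function names are all hypothetical. The sketch also compares how many mappings each approach needs as vocabularies are added.

```python
from typing import Dict, Tuple

# Hypothetical mapping table: element name -> identifier of a concept
# in the shared semantic layer. One such table per registered vocabulary.
VocabMapping = Dict[str, str]


def translate(record: Dict[str, str], source: VocabMapping,
              target: VocabMapping) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Translate a record via the semantic layer: source element -> concept
    -> target element. Returns (translated record, unmapped source fields)."""
    # Invert the target table so elements can be looked up by concept.
    # If several target elements share a concept, the last one listed wins.
    concept_to_target = {concept: elem for elem, concept in target.items()}
    translated: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for elem, value in record.items():
        concept = source.get(elem)
        target_elem = concept_to_target.get(concept) if concept else None
        if target_elem is None:
            unmapped[elem] = value  # information loss: no shared concept
        else:
            translated[target_elem] = value
    return translated, unmapped


def pairwise_mappings(n: int) -> int:
    """Direct mappings needed to connect every pair of n vocabularies."""
    return n * (n - 1) // 2


def layered_mappings(n: int) -> int:
    """Mappings needed when each vocabulary maps only to the semantic layer."""
    return n


a = {"a:creator": "concept:Author", "a:title": "concept:Title"}
b = {"b:originator": "concept:Author", "b:name": "concept:Title"}
print(translate({"a:creator": "Smith", "a:title": "Report", "a:misc": "x"}, a, b))
# ({'b:originator': 'Smith', 'b:name': 'Report'}, {'a:misc': 'x'})
print(pairwise_mappings(21) - pairwise_mappings(20))  # 20
print(layered_mappings(21) - layered_mappings(20))    # 1
```

Fields with no counterpart concept are returned separately rather than dropped silently. This makes the information loss discussed below visible.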

This approach does have potential disadvantages. Because mappings are generated automatically rather than hand-crafted, their quality may be lower. To counteract this, it will be necessary to build up a detailed and complex semantic layer so that vocabulary elements can be precisely explained. If the semantic layer is not detailed enough, translation will suffer from information loss.

The DESIRE registry is a pilot application to trial this approach and provide a platform for future research in this area.

Namespaces and Versions

A namespace is a scoping construct that supports the definition of unique identifiers. Identifier x in namespace A is distinct from identifier x in namespace B.

Namespaces can be registered in the DESIRE registry, and registered elements can be assigned to a namespace where appropriate. For example, the fifteen Dublin Core elements can be registered as belonging to a Dublin Core namespace. The data elements associated with a particular namespace form a metadata vocabulary.

Since metadata vocabularies evolve over time, the registry needs a mechanism for recording their evolution. To support this, each namespace has an associated version, so that Dublin Core version 1.0 and Dublin Core version 1.1 can both be registered. An underlying 'Namespace Concept' is also registered to tie the versions together. In this case, Dublin Core itself is registered as the namespace concept.

In the initial version of the registry, versioning is supported only at the namespace level. Individual elements cannot be versioned.
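The namespace/version arrangement can be modelled roughly as below. This is a sketch under assumed structures: the class and function names are hypothetical and not taken from the registry implementation. Elements are listed only under a namespace version and carry no version of their own, matching the namespace-level versioning described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class NamespaceConcept:
    """Ties together all versions of one vocabulary, e.g. 'Dublin Core'."""
    name: str


@dataclass
class Namespace:
    """One registered version of a namespace."""
    concept: NamespaceConcept
    version: str
    # Element names only: elements are not versioned individually.
    elements: List[str] = field(default_factory=list)

    @property
    def identifier(self) -> str:
        return f"{self.concept.name} {self.version}"


def versions_of(concept: NamespaceConcept, namespaces: List[Namespace]) -> List[str]:
    """List the registered versions of a namespace concept."""
    return [ns.version for ns in namespaces if ns.concept == concept]


dc = NamespaceConcept("Dublin Core")
dc10 = Namespace(dc, "1.0", ["title", "creator"])
dc11 = Namespace(dc, "1.1", ["title", "creator"])
print(versions_of(dc, [dc10, dc11]))  # ['1.0', '1.1']
print(dc11.identifier)                # Dublin Core 1.1
```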

Semantic Layer

The purpose of the semantic layer is to provide an underlying set of concepts onto which registered vocabularies can be mapped - for example, notions of author and abstract will be needed. The registered concepts must be precisely defined and unique. There should never be a situation where multiple registered concepts have the same semantics. If concepts are not unique then the quality of mappings between vocabularies will be reduced: elements may have the same semantics, but if they map to different registered concepts then this information will not be available for use in auto-generation of mappings. The semantic layer must therefore be managed, or be based on a managed namespace with limited scope for extension.
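The sketch below illustrates why duplicate concepts are harmful. It assumes that an automatically derived crosswalk can only pair elements that map to the same concept identifier. All element and concept names are hypothetical. If two concepts with identical meaning are registered separately, the correspondence between the elements that use them is silently lost.

```python
from typing import Dict, List, Tuple


def derive_crosswalk(source: Dict[str, str],
                     target: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair every source element with every target element that maps
    to the same concept identifier in the semantic layer."""
    by_concept: Dict[str, List[str]] = {}
    for elem, concept in target.items():
        by_concept.setdefault(concept, []).append(elem)
    return [(s, t) for s, concept in source.items()
            for t in by_concept.get(concept, [])]


a = {"a:creator": "concept:Author"}
b_shared = {"b:author": "concept:Author"}
# Same meaning as concept:Author, but registered as a separate concept.
b_duplicate = {"b:author": "concept:Creator"}

print(derive_crosswalk(a, b_shared))     # [('a:creator', 'b:author')]
print(derive_crosswalk(a, b_duplicate))  # []
```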

The Basic Semantics Register (BSR) provides a set of elements suitable for use in the DESIRE registry. Mappings already exist between BSR and the Dublin Core and GILS metadata vocabularies. For the initial prototype DESIRE registry, BSR elements corresponding to the Dublin Core have been registered.

Note that although the BSR currently supplies the data for the DESIRE registry, the registry is not limited to BSR for its semantic layer. The data model allows concepts from other namespaces to be registered. However, it should be emphasised that these concepts must come from a managed namespace. If BSR is found to be appropriate for this purpose, then any concepts that need to be added to support registry functionality should either be added to the BSR standard or managed as a namespace extension.

The data model of the BSR also introduces a layer of 'Basic Semantic Units' (BSUs) between vocabulary elements and concepts. This layer allows a 'representation class' to be attached to a concept to further refine its description. A representation class describes the data type associated with the value space of a concept. Name, Text and Code are examples of representation classes.

For some concepts, there is only one appropriate representation class. In this case the BSR introduces only a BSU (since there is a 1-1 mapping between concepts and BSUs in such cases, the introduction of a separate concept is superfluous). In other cases, multiple representation classes are possible - for example a subject classification may be expressed as Text or as a Code.

The DESIRE registry currently has only a single semantic layer, which combines concepts and BSUs from the BSR. This approach reduces complexity and has provided sufficient modelling power to meet the requirements of the DESIRE registry.
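The combined layer can be pictured as in the sketch below. This assumes a hypothetical structure in which each entry records the concept it belongs to and its representation class. Only the three representation classes named above are included, and the unit identifiers are invented for illustration. A concept with one appropriate representation gets a single unit. A concept such as subject, which may be expressed as Text or as a Code, gets one unit per representation class.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RepresentationClass(Enum):
    # Examples named in the text; the full BSR list is not reproduced here.
    NAME = "Name"
    TEXT = "Text"
    CODE = "Code"


@dataclass(frozen=True)
class SemanticUnit:
    """One entry in a combined concept/BSU layer (hypothetical structure)."""
    identifier: str
    concept: str                         # groups units with the same meaning
    representation: RepresentationClass  # data type of the value space


def units_for(concept: str, layer: List[SemanticUnit]) -> List[SemanticUnit]:
    """All units that refine a given concept."""
    return [u for u in layer if u.concept == concept]


layer = [
    # Only one appropriate representation: a single unit for the concept.
    SemanticUnit("Author.Name", "Author", RepresentationClass.NAME),
    # Several representations possible: one unit per representation class.
    SemanticUnit("Subject.Text", "Subject", RepresentationClass.TEXT),
    SemanticUnit("Subject.Code", "Subject", RepresentationClass.CODE),
]

print([u.identifier for u in units_for("Subject", layer)])
# ['Subject.Text', 'Subject.Code']
```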

Data Elements

Data elements are the units from which metadata vocabularies are built. A metadata vocabulary consists of all the data elements in a particular namespace, or all the elements associated with a particular application profile. For example, Dublin Core version 1.1 can be registered as a namespace with the fifteen DC elements as data elements of that namespace.

A data element is a realization of a BSU in a specific context.
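The relationship between data elements, BSUs and vocabularies can be sketched as below. The sketch assumes that the "specific context" of a data element is simply its namespace. The namespace names and unit identifiers are hypothetical. Two elements with the same name in different namespaces remain distinct, even when they realize the same unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataElement:
    """A semantic unit (BSU) realized in a specific context (its namespace)."""
    namespace: str  # namespace identifier, e.g. "DC 1.1"
    name: str       # element name within that namespace
    unit: str       # identifier of the BSU this element realizes


def vocabulary(namespace: str, elements: List[DataElement]) -> List[str]:
    """A metadata vocabulary: all data elements registered in one namespace."""
    return [e.name for e in elements if e.namespace == namespace]


registry = [
    DataElement("DC 1.1", "creator", "Author.Name"),
    DataElement("DC 1.1", "subject", "Subject.Text"),
    DataElement("Other 1.0", "creator", "Author.Name"),
]

print(vocabulary("DC 1.1", registry))  # ['creator', 'subject']
# Same name and same unit, different namespace: distinct data elements.
print(registry[0] == registry[2])      # False
```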

Application Profiles

Application profiles describe data element usage for a particular application. The application may be a specific project, a piece of software, an interchange format, etc.

Application profiles cannot introduce new data elements, because every data element must have an associated namespace. Application profiles can, however, group together data elements from multiple vocabularies. An application profile can also associate a scheme with a data element to specify valid values for that data element in a specific application.
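These constraints can be expressed roughly as in the sketch below. It assumes a hypothetical registry of (namespace, element) pairs and invented profile and scheme names. A profile may draw elements from several namespaces and attach a scheme to each, but it refuses any element that is not already registered in a namespace.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple

ElementRef = Tuple[str, str]  # (namespace, element name)


@dataclass
class ApplicationProfile:
    name: str
    registry: FrozenSet[ElementRef]  # elements registered in some namespace
    # Chosen elements, each with an optional scheme for this application.
    elements: Dict[ElementRef, Optional[str]] = field(default_factory=dict)

    def use(self, ref: ElementRef, scheme: Optional[str] = None) -> None:
        """Add an existing data element, optionally with a scheme."""
        if ref not in self.registry:
            # Profiles reuse registered elements; they cannot mint new ones.
            raise ValueError(f"{ref} is not registered in any namespace")
        self.elements[ref] = scheme


registry = frozenset({("DC 1.1", "title"), ("DC 1.1", "subject"),
                      ("Other 1.0", "identifier")})
profile = ApplicationProfile("example-project", registry)
profile.use(("DC 1.1", "title"))
profile.use(("DC 1.1", "subject"), scheme="example-subject-codes")
profile.use(("Other 1.0", "identifier"))  # element from a second vocabulary
try:
    profile.use(("local", "newElement"))
except ValueError as err:
    print(err)  # ('local', 'newElement') is not registered in any namespace
```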

Schemes

Schemes provide a mechanism for attaching information about valid values for a particular data element.

There are three kinds of scheme possible in the DESIRE registry:

  1. Enumerated List - The scheme specifies a set of valid values, known as scheme elements. Scheme elements may be registered within the registry, or they may be indicated via a reference to an external definition.

  2. Rule Set - The scheme is specified by a set of rules that define or describe valid values. The rule set is indicated via a reference to an external definition. The semantics of rule sets cannot be captured in any way within the registry at present.

  3. Value Components - The scheme splits a value domain into multiple value components. A valid value is then made up of a tuple of valid values from the value components. Note that it is the tuple that is a valid value - not each of the values associated with value components.
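The three kinds of scheme can be sketched as types with a validation function, as below. The structures and names are hypothetical. The sketch also makes a simplifying assumption: a value-component tuple is valid when each component value is valid under its component's scheme, whereas a real scheme might add constraints across components. Because rule sets are only external references, the registry cannot decide validity for them, so the function returns None in that case.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple, Union


@dataclass(frozen=True)
class EnumeratedList:
    values: FrozenSet[str]  # the scheme elements


@dataclass(frozen=True)
class RuleSet:
    reference: str  # pointer to an external definition; rules not modelled


@dataclass(frozen=True)
class ValueComponents:
    components: Tuple[Tuple[str, "Scheme"], ...]  # (component name, scheme)


Scheme = Union[EnumeratedList, RuleSet, ValueComponents]


def is_valid(value: object, scheme: Scheme) -> Optional[bool]:
    """True/False when the registry can decide, None when it cannot."""
    if isinstance(scheme, EnumeratedList):
        return isinstance(value, str) and value in scheme.values
    if isinstance(scheme, RuleSet):
        return None  # rule semantics are not captured in the registry
    if isinstance(scheme, ValueComponents):
        # The whole tuple is the value: one entry per component, in order.
        if not isinstance(value, tuple) or len(value) != len(scheme.components):
            return False
        results = [is_valid(v, s) for v, (_, s) in zip(value, scheme.components)]
        if any(r is False for r in results):
            return False
        return None if any(r is None for r in results) else True
    raise TypeError(f"unknown scheme kind: {scheme!r}")


country = EnumeratedList(frozenset({"UK", "NL"}))
town = EnumeratedList(frozenset({"Bristol", "Utrecht"}))
place = ValueComponents((("country", country), ("town", town)))

print(is_valid("UK", country))                            # True
print(is_valid(("UK", "Bristol"), place))                 # True
print(is_valid("UK", place))                              # False (not a tuple)
print(is_valid("x", RuleSet("external-rule-reference")))  # None
```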

Recommended schemes may be associated with data elements. Additionally, schemes may be associated with application profiles to reflect actual usage (strictly, a relationship can be introduced between an application profile, a data element, and a scheme).
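The sketch below shows why actual usage is recorded as a three-way relationship rather than a simple pairing of element and scheme. The same data element can use different schemes in different application profiles. All profile, element and scheme names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Recommended schemes are attached to the data element itself.
recommended: Dict[str, Set[str]] = {"subject": {"scheme-A", "scheme-B"}}

# Actual usage is a ternary relation: (application profile, element, scheme).
usage: Set[Tuple[str, str, str]] = {
    ("profile-1", "subject", "scheme-A"),
    ("profile-2", "subject", "scheme-B"),
    ("profile-2", "subject", "scheme-C"),
}


def schemes_used(profile: str, element: str) -> Set[str]:
    """Schemes a given profile uses for a given data element."""
    return {s for p, e, s in usage if p == profile and e == element}


print(sorted(schemes_used("profile-2", "subject")))  # ['scheme-B', 'scheme-C']
# Schemes used in practice that are not among the recommendations:
print(sorted(schemes_used("profile-2", "subject") - recommended["subject"]))  # ['scheme-C']
```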

Where multiple schemes are permitted for a data element in a specific application it must be possible to specify the scheme that has been used along with a particular value.

Where an element is repeated within a record with the same value component scheme, it must be possible to distinguish the value components associated with a particular element from the value components associated with another. That is, there must be some grouping mechanism that combines value components into a tuple, which is the value of the element.
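One possible record structure that satisfies both requirements is sketched below. The structure, element names and scheme names are hypothetical. Each occurrence of an element carries the scheme used for its value. Each occurrence also keeps its value components together as one group. A flat list such as country=UK, country=NL, town=Bristol, town=Utrecht could not tell which town belongs with which country.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ElementValue:
    """One occurrence of a data element in a record (hypothetical)."""
    element: str
    scheme: Optional[str] = None  # which scheme this value uses
    value: Optional[str] = None   # a simple value, or ...
    components: Dict[str, str] = field(default_factory=dict)  # ... one grouped tuple


record: List[ElementValue] = [
    ElementValue("subject", scheme="scheme-A", value="Q1"),
    ElementValue("subject", scheme="scheme-B", value="physics"),
    # Each repetition keeps its own group, so components cannot be mixed up.
    ElementValue("coverage", scheme="place", components={"country": "UK", "town": "Bristol"}),
    ElementValue("coverage", scheme="place", components={"country": "NL", "town": "Utrecht"}),
]


def tuples_for(record: List[ElementValue], element: str,
               order: List[str]) -> List[Tuple[str, ...]]:
    """Rebuild the value tuple of each occurrence of a component-valued element."""
    return [tuple(ev.components[c] for c in order)
            for ev in record if ev.element == element and ev.components]


print([ev.scheme for ev in record if ev.element == "subject"])  # ['scheme-A', 'scheme-B']
print(tuples_for(record, "coverage", ["country", "town"]))
# [('UK', 'Bristol'), ('NL', 'Utrecht')]
```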

Examples

Qualified Dublin Core

BIBLINK Application Profile

DC 1.0 / ROADS Document Cross-walk