Application profiles: interoperable friend or foe?

Rachel Heery

UKOLN, University of Bath, April 2002

[delivered by Michael Day] at: The European Library (TEL) - The Gate to Europe's Knowledge: Milestone Conference, Die Deutsche Bibliothek Frankfurt am Main, Germany.

Abstract

Metadata application profiles provide a simple model to articulate local adaptations of standard schemas and to specify the encoding schemes in use in particular implementations. Declaring and sharing such profiles is a positive step towards interoperability, enabling the re-use of existing schemas and the identification of new terms. The SCHEMAS registry provides an overview of application profiles alongside standard schemas. In this presentation we will consider the "extra effort" that is required to ensure that the declaration of application profiles contributes to the interoperability agenda. How might we deal with new metadata terms to avoid duplication across multiple schemas, and where are the means to capture emergent semantics? How can standards-making bodies work together to harmonise metadata usage? What is our role as implementers and information professionals?

TEL is concerned with sharing metadata across a number of organisations. In my presentation today I would like to consider what is involved in sharing metadata, in particular sharing metadata schemas, and to think a little about the processes involved.

Terminology

First we will try to establish a shared understanding of terms, although it has to be acknowledged that we are working in an area where definitions are evolving, and inevitably the same terms have different nuances for different communities. Certainly within the SCHEMAS project we have been examining our use of terms such as "schema", "vocabulary", and "namespace" and have found that agreeing definitions is an important step in understanding the somewhat confusing landscape of new and emerging metadata standards. An extended glossary of key concepts is available on the SCHEMAS web site (1).

"Schema" as a term is used in subtly different ways by different communities, and needs to be treated with care. Sometimes the term is used in an abstract way to indicate an element set or a vocabulary of terms. However, that can be somewhat confusing, as for many people a "schema" implies something more concrete. We prefer to use "element set" to denote a coherent, bounded set of metadata terms, whereas we use "schema" to refer to a structured expression of an element set in a particular syntax (typically the RDF or XML schema language).
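To make the distinction concrete, the following sketch (in Python, using the rdflib library) shows how a small, abstract element set might be expressed as a concrete schema in RDF. The example.org namespace and the single "title" term are purely illustrative.

    # A minimal sketch: expressing an element set as an RDF schema.
    # The namespace URI and the term are hypothetical.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/terms/")  # hypothetical element set namespace

    g = Graph()
    g.bind("ex", EX)

    # Each term of the element set becomes an RDF property with a
    # human-readable label and definition.
    g.add((EX.title, RDF.type, RDF.Property))
    g.add((EX.title, RDFS.label, Literal("Title")))
    g.add((EX.title, RDFS.comment, Literal("A name given to the resource.")))

    # The serialisation is the "schema": the element set expressed in RDF/XML.
    print(g.serialize(format="xml"))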

Application profiles are a type of metadata schema. They can be defined as "schemas which consist of data elements drawn from one or more element sets, combined together by implementers, and optimised for a particular local application." The usage of "application profile" to capture the implementer's perspective on schemas grew out of UKOLN's work on the DESIRE project (2), although the requirement to describe application-specific characteristics of standards is not new: the term is used in a somewhat related way by the Z39.50 community, where "profiles" indicate application-specific sub-sets of the full standard. Recently use of the term application profile has grown, as it has proved useful to typify the way implementers "mix and match" terms from different element sets.
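A minimal sketch of this "mix and match" approach, again in Python with rdflib, might look as follows. The local namespace and its reviewStatus term are hypothetical, while dc:title and dc:creator are drawn from the Dublin Core element set.

    # A minimal sketch of an application profile in use: one description
    # combines Dublin Core terms with a term from a hypothetical local
    # element set (the example.org URIs are assumptions, not real ones).
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DC

    LOCAL = Namespace("http://example.org/myproject/terms/")  # hypothetical

    g = Graph()
    g.bind("dc", DC)
    g.bind("local", LOCAL)

    resource = URIRef("http://example.org/docs/1")  # hypothetical resource
    g.add((resource, DC.title, Literal("Annual report 2001")))
    g.add((resource, DC.creator, Literal("Example Organisation")))
    g.add((resource, LOCAL.reviewStatus, Literal("approved")))  # project-specific term

    print(g.serialize(format="turtle"))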

Namespaces: we will try to avoid the use of the term "namespace" in this presentation. In brief, a namespace is a construct, typically expressed as a URI, that gives a metadata vocabulary term its unique identity. For more details see the SCHEMAS Glossary!

Increase in schema creation activity

TEL has undergone a quite rigorous consultation exercise in order to reach agreement on a common metadata element set that can be used as the basis for sharing metadata. In this TEL is facing the same problem as many other emerging information services providing access to distributed, diverse metadata sources, whether within the corporate, education or public sector. Parallels can be seen between development of the TEL application profile and similar activity associated with the development of application profiles for corporate portals, networks of subject gateways (e.g. the Renardus application profile (3)), and services from international organizations.

In fact there is a requirement to identify an appropriate element set wherever metadata is needed. Given the rapid development of new services, and the number of projects being funded to explore innovative approaches to the provision of information, there is significant and widespread activity involving the formulation of schemas. We should note that this is often a time-consuming, intensive activity undertaken by specialist information professionals, and it comes at a price. There are obvious opportunities for cost saving and quality control by making the process of schema creation more efficient and effective.

Let us pause for a moment to remind ourselves of recent history. Surely we did not have to go through this labour-intensive process of creating schemas before? Formerly there was MARC (or MAB?) and systems were based on these standards. Why do we need to formulate schemas and application profiles?

However it is well to remember that MARC came in many flavours, varying in form across national boundaries, across different library management systems, and across different levels of complexity in cataloguing practice. Endless cycles of MARC conversions remain deeply ingrained in some of our memories. In effect librarians, system vendors and software tool developers were formulating their own "application profiles" based on MARC. In particular we recall that local usage was accommodated in MARC by using the XX9 convention to tag local fields.

But the pace has changed: we are now in an environment where metadata is pervasive for manipulating data across a whole range of Web-based services, not just within what might be considered library cataloguing. Digital library services are positioned in an environment where they are

In this situation resource discovery metadata is no longer positioned only within the province of the librarian; there is a range of other interests. We are no longer in the position that the library profession has ownership of the metadata element sets in use. Information professionals now play a role alongside other players in a wider range of standards activity.

Proliferation of profiles

Sometimes it seems we are heading towards a proliferation of application profiles. Application profiles are being identified for individual projects, domain areas, and for use with different technologies. For example, within the UK Metadata for Education Group many of the initiatives are using subsets of elements drawn from other metadata element sets (typically IMS, IEEE LOM, Dublin Core); within the Dublin Core Metadata Initiative (DCMI) various domains, such as education, libraries and government, are developing application profiles; and within the Open Archives Initiative (OAI) particular schemas (such as DC Simple) are being associated with the OAI Protocol for Metadata Harvesting.

While accepting that individual applications (or domain areas) require specialised elements, there are obvious gains to be made in sharing profiles where appropriate, and in re-using existing elements as far as possible. To enable sharing of information about schemas there needs to be a:

There is a strong tradition of collaboration within the library world; it is something that librarians do well. The library world has its own established personal and organizational networks, and might take a stronger role in encouraging the sharing of schemas. In order to achieve this there would need to be a more formal structure for sharing schemas, in particular automated tools for assisting in the creation and declaration (publishing) of schemas. Metadata schema registries are seen as the basis of the infrastructure for sharing schemas.
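What might automated declaration look like in practice? The sketch below assumes a hypothetical registry endpoint that accepts an application profile serialised as RDF/XML over HTTP POST; real registries each have their own submission procedures, so this is illustrative only.

    # A minimal sketch of automated declaration of an application profile
    # to a schema registry.  The registry URL and its acceptance of RDF/XML
    # over HTTP POST are assumptions made for illustration.
    import requests  # third-party HTTP library

    REGISTRY_URL = "http://registry.example.org/profiles"  # hypothetical endpoint

    with open("my-application-profile.rdf", "rb") as f:
        profile_rdf = f.read()

    response = requests.post(
        REGISTRY_URL,
        data=profile_rdf,
        headers={"Content-Type": "application/rdf+xml"},
    )
    response.raise_for_status()
    print("Profile declared at:", response.headers.get("Location"))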

Role of schema registries

Schema registries arose from different historical strands. There are a number of agencies, first established in the mid-1990s, that maintain directories of data elements. This activity arose from recognition of the benefits of sharing the data dictionaries on which large databases were modelled. ISO/IEC 11179 outlined good practice for data element definition and specified a hierarchical registration process. Notable implementations are the National Health Information Knowledgebase (4) hosted by the Australian Institute of Health and Welfare and the Environmental Data Registry (5) hosted by the US Environmental Protection Agency.

More recently, with XML becoming established as the standard of choice for business applications, the xml.org directory (6) has been hosted by OASIS as a repository of DTDs, thereby forming a "registry" for XML implementers.

Within the digital library world there are a number of schema registry initiatives, including MetaForm, hosted by the State and University Library at Göttingen (7); the SCHEMAS Forum registry hosted by UKOLN (8, 9); the MEG Registry (10); and the DCMI Registry prototypes being developed with the DCMI Registry Working Group (11, 12).

Balancing standardisation and differentiation

Implementers need to balance the tension between standardisation and differentiation. While implementers are normally happy to re-use a "core set" of element terms, there is an imperative to establish differentiation from other services. Often this differentiation will encourage innovative or variant metadata. The information professional has a role in ensuring that unnecessary duplication between element sets is avoided. This will involve taking a critical view of the introduction, within an application profile, of any variance from established standard schemas.

The DCMI Usage Board is responsible for approval of new terms within the DCMI process. They offer advice to schema designers on good practice for assessing the requirements for a new term. This advice is encapsulated in the form of a decision tree, first outlined by Stuart Sutton based on experience within the DC Education WG. This decision tree illustrates that the DCMI recognizes that implementers should be cautious in "inventing" new terms, and where possible should refine existing terms or use terms from other existing metadata standards (non-DCMI namespaces).

Decision Tree Table (13)

Condition 1: Can the community of practice's need be solved with a value qualifier (i.e., through a domain-specific vocabulary) for an existing DCMI element or element qualifier? If so, do that; else...

Condition 2: Can the community of practice's need be solved through an application profile that references an element or element qualifier from an existing and recognized non-DCMI namespace? If so, do that; else...

Condition 3: Can the community of practice's need be solved with a new domain-specific qualifier for an existing DCMI element? If so, do that; else...

Condition 4: Create a new domain-specific DCMI element (and, if necessary, element and value qualifiers) to meet the community of practice's need.

One can envisage further elaboration of such a "decision tree" to guide the implementer through this process. Enabling the process of schema creation to take place in an interactive way, incorporating search mechanisms linked to existing schema registries, would be of considerable benefit.
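As a simple illustration, the decision logic above can be encoded along the following lines; in a registry-assisted tool the yes/no answers would come from searches of existing schemas rather than from the implementer's judgement alone. This is a sketch of the idea, not a DCMI tool.

    # A minimal sketch of the DCMI Usage Board decision tree as code.
    # Each argument is the answer to the corresponding condition above;
    # how those answers are obtained (e.g. by searching registries) is
    # left open here.
    def recommend(value_qualifier_ok, non_dcmi_term_ok, new_qualifier_ok):
        if value_qualifier_ok:   # condition 1
            return "Use a value qualifier for an existing DCMI element or qualifier"
        if non_dcmi_term_ok:     # condition 2
            return "Reference a term from a recognised non-DCMI namespace in the profile"
        if new_qualifier_ok:     # condition 3
            return "Propose a new domain-specific qualifier for an existing DCMI element"
        return "Propose a new domain-specific element (condition 4)"

    # Example: a need that can only be met by a term from a non-DCMI namespace.
    print(recommend(value_qualifier_ok=False, non_dcmi_term_ok=True, new_qualifier_ok=False))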

Ensuring schemas meet the requirements of the user

The creation of schemas must not be driven only by the requirements of the latest technology. There is a need to explore how well metadata meets the needs of users. Few large-scale studies have been undertaken on the effectiveness of metadata, such as Dublin Core, in meeting the requirements of the user. There are now a number of collections of "new-style" metadata in place. It is time to measure how well such metadata meets the aims of the user and the service providers.

The metadata schema underlying a service might be evaluated under criteria such as

Metadata needs to be evaluated from the perspective of the metadata creator as well as the end-user. In particular we need to consider the cost effectiveness of metadata based on simple schemas, in contrast with more elaborate metadata based on richer schemas.

In any particular implementation there needs to be an awareness of the appropriate schemas for the application area, based on users' requirements, not on the requirements of the technology. The information professional has a role as the guardian of the requirements of the stakeholders.

Ordered evolution of metadata vocabularies

Some control of the evolution of emerging metadata element sets is necessary to ensure there is an authoritative record of the schema, and there is an ordered procedure for the approval of new terms. However in order to prevent growth in costly and often ineffective bureaucracy it would be worthwhile exploring some automated means to assist with the evolution of metadata element sets. It is unlikely that many metadata standards initiatives will be able to sustain in the long term the enthusiasm, let alone the cost, required for groups of experts to maintain their vocabularies on a term by term basis.

As a first step, Vocabulary Management Systems will assist with the evolution of vocabularies by providing a manageable audit of changes to definitions and other term attributes. Over time there will also need to be ways of obtaining feedback from metadata creators regarding the introduction of new terms. This might be done by schema creators pro-actively submitting new terms to registries, or by the registry itself analysing harvested metadata to identify new terms in common use. The sampling of metadata collections to measure usage of terms might fit particularly well within systems where metadata is harvested and large-scale metadata repositories are established.
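As a sketch of the kind of analysis involved, the following fragment (Python, using rdflib) counts how often each property occurs across a set of harvested RDF records, so that frequently used but unregistered terms can be flagged for review; the file names are hypothetical.

    # A minimal sketch of usage analysis over harvested metadata:
    # count how often each property (metadata term) appears.
    from collections import Counter
    from rdflib import Graph

    harvested_files = ["record1.rdf", "record2.rdf"]  # hypothetical harvested records

    term_usage = Counter()
    for path in harvested_files:
        g = Graph()
        g.parse(path, format="xml")   # each record assumed to be RDF/XML
        for _, predicate, _ in g:     # every triple's predicate is a term in use
            term_usage[str(predicate)] += 1

    # Report the most frequently used terms for comparison with the registry.
    for term, count in term_usage.most_common(10):
        print(count, term)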

Recognition of variations to existing terms within element sets in an automated fashion will require the disciplined publishing of schemas. So once again we see value in establishing collaboration based on declaration of application profiles within registries.

Cross Standard Interoperability

As element sets evolve it is possible that standards makers will acknowledge overlapping semantics and reduce duplication. The promise of the Semantic Web is that common data models will enable the integration of metadata, allowing the merging of metadata element sets within a single resource description. Progress towards a common data model might encourage re-use of "core vocabularies" within more specialised element sets, as well as the automated mapping of elements with similar semantics.
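A minimal sketch of such automated mapping, assuming the mapping itself has been declared in RDF (here as rdfs:subPropertyOf), might look as follows; the local documentTitle term and the example URIs are hypothetical.

    # A minimal sketch of merging metadata under a common RDF data model:
    # a mapping graph relates a hypothetical local term to dc:title, and
    # harvested statements are rewritten to use the more widely shared term.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DC, RDFS

    LOCAL = Namespace("http://example.org/terms/")  # hypothetical element set

    mappings = Graph()
    mappings.add((LOCAL.documentTitle, RDFS.subPropertyOf, DC.title))

    record = Graph()
    record.add((URIRef("http://example.org/docs/1"),
                LOCAL.documentTitle, Literal("Annual report 2001")))

    merged = Graph()
    for s, p, o in record:
        # Rewrite any property that is mapped onto a core term.
        target = mappings.value(subject=p, predicate=RDFS.subPropertyOf) or p
        merged.add((s, target, o))

    print(merged.serialize(format="turtle"))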

There is some evidence that standards bodies do see the need to work together; in particular the DCMI has worked closely with initiatives such as the OAI, GILS and the IEEE LOM. A recent article in D-Lib Magazine outlines metadata "principles and practicalities" from the common perspective of the DCMI and IEEE LOM (14).

The CORES project, an EC accompanying measure starting this month under the Semantic Web action line, aims to encourage such collaboration between standards activities. CORES will support standardisation bodies in reaching consensus on conventions for declaring standards and profiles in RDF, and will encourage the registration of standards and profiles to underpin this process.

In addition it is important for practitioners to have accessible information about the variety of standard element sets, and how they inter-relate. Indeed it is often the practitioner who sees the connection between standards, and who needs advice on the way they inter-relate (there is a recent example on the ZIG mailing list concerning the relation between the DC Library Application Profile and Z39.50). Given the significant effort and costs involved in reaching consensus on standard element sets it is vital that the end-product is well documented in a form suitable for widespread use. Information professionals have a role in creating this documentation, and in lobbying the standards makers to ensure it is delivered.

Summing Up

Development of innovative and specialist services drives implementers towards customisation of element sets. In the interests of interoperability such differentiation needs to be counterbalanced. We suggest that information professionals have a role to play, especially as regards building an infrastructure for the sharing and re-use of schemas. Traditions of collaboration within the library world fit well with involvement in a co-operative effort regarding the registration and documentation of schemas and profiles.

Other areas also need to be addressed. In order to assist in building an infrastructure for sharing schemas, requirements need to be clearly articulated for:

Throughout the process judgements need to be made between conflicting demands, whether for differentiation or interoperability, for simplicity or richness, or for low-cost or sophisticated solutions. In addition the players involved need to be kept informed of developments within a range of relevant standards activities. Although there is much work to be done, there are opportunities for automating much of the process, and real gains to be made in improving quality and effectiveness.

References

(1) Tom Baker. The SCHEMAS Forum - a Retrospective Glossary. http://www.schemas-forum.org/info-services/d74.html

(2) Rachel Heery and Manjula Patel. Application profiles: mixing and matching metadata schemas. In: Ariadne, No. 25, 24 September 2000. http://www.ariadne.ac.uk/issue25/app-profiles/

(3) Heike Neuroth and Traugott Koch. Metadata mapping and application profiles: approaches to providing the cross-searching of heterogeneous resources in the EU project Renardus. In: DC-2001: proceedings of the International Conference on Dublin Core and Metadata Applications 2001, National Institute of Informatics, October 24-26, 2001, Tokyo, Japan. http://www.nii.ac.jp/dc2001/proceedings/

(4) National Health Information Knowledgebase hosted by the Australian Institute of Health and Welfare. Comprehensive Data Definitions Search: http://www.aihw.gov.au/knowledgebase/index.html

(5) Environmental Data Registry hosted by the Environmental Protection Agency in the USA: http://www.epa.gov/edr/

(6) The XML Registry hosted by OASIS: http://www.xml.org/xml/registry.jsp

(7) MetaForm hosted by the State and University Library at Göttingen: http://www2.sub.uni-goettingen.de/metaform/

(8) SCHEMAS Forum Registry hosted by UKOLN, University of Bath. http://www.schemas-forum.org/registry/

(9) Thomas Baker, Makx Dekkers, Rachel Heery, Manjula Patel and Gauri Salokhe. What terms does your metadata use? Application profiles as machine-understandable narratives. In: Journal of Digital Information, Vol. 2, No. 2, November 2001. http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Baker/

(10) MEG Registry: http://www.ukoln.ac.uk/metadata/education/registry/

(11) DCMI Registry prototypes, see: http://wip.dublincore.org:8080/registry/Registry

(12) DCMI Registry Working Group: http://dublincore.org/groups/registry/

(13) Diane I. Hillmann. Dublin Core Usage Board Administrative Processes, 20 March 2002: http://www.dublincore.org/usage/documents/process/

(14) Erik Duval, Wayne Hodgins, Stuart Sutton and Stuart L. Weibel. Metadata principles and practicalities. In: D-Lib Magazine, Vol. 8, No. 4, April 2002. http://www.dlib.org/dlib/april02/weibel/04weibel.html