Discovering Online Resources. The Dublin Core: A Simple Content Description Model for Electronic Resources

Stuart Weibel, OCLC Office of Research (Weibel@oclc.org)
(OCLC, http://purl.org/metadata/dublin_core)

Metadata for electronic resources

The term metadata simply means data about data. It is the term most often used in the Internet community for what has been known in the library community as cataloguing data, or resource description. The Dublin Core is a 15-element metadata set intended to facilitate discovery of electronic resources; its elements are tabulated in an addendum to this chapter. Originally conceived for author-generated description of World Wide Web resources, it has also attracted the attention of formal resource description communities such as museums and libraries.

The Dublin Core Workshop Series has gathered experts from the library world, the networking and digital library research communities, and a variety of content specialities in a series of focused invitational meetings. The building of an interdisciplinary, international consensus around a core element set is the central feature of the three-year evolution of the Dublin Core. The progress represents the emergent wisdom and collective experience of many stakeholders in the resource description arena. An open mailing list supports ongoing work.

The characteristics of the Dublin Core that distinguish it as a prominent candidate for description of electronic resources fall into several categories.

1.1 Simplicity

The Dublin Core is intended to be usable by non-cataloguers. It is expected that authors or Web-site maintainers unschooled in the cataloguing arts will be able to use the Dublin Core for resource description, making their collections more visible to search engines and retrieval systems. Most of the 15 elements have a commonly understood semantics that represents what might be described as a lowest common denominator for resource description (roughly equivalent to a catalogue card). As such, the Dublin Core is not intended to replace richer description models such as AACR2/MARC (Library of Congress 1997b) cataloguing, but rather to provide a core set of description elements that can be used by cataloguers or non-cataloguers for simple resource description.

1.2 Semantic interoperability

On the Internet commons, disparate description models interfere with the ability to search across discipline boundaries. For example, libraries, museums, and the geographic information systems community use different standards for resource description. This reflects the different description needs of these communities, and the fact that such standards have evolved independently. At the level of fine-grained description, element sets are different because they must describe different things. Most writers seldom associate a cloud-cover attribute with their documents, but if you are describing satellite images of farmland, this is a critical descriptor. Still, most resources share a core set of attributes that are similar from one discipline to the next, but have different names simply because they have evolved independently and at different times. Promoting a commonly understood set of core descriptors will improve the prospects for cross-disciplinary search by unifying related attributes. For example, an author and a creator can be sensibly thought of as the same attribute for the purposes of resource discovery. The Dublin Core is intended to serve as this core element set.

1.3 International consensus

Recognition of the international scope of resource discovery on the World Wide Web is critical to the development of effective discovery infrastructure. The Dublin Core has benefited from active participation and promotion in the United Kingdom, Australia, Sweden, Denmark, Norway, Finland, Germany, Thailand, Japan, Canada, and the United States.

1.4 Flexibility

Although initially motivated by the need for author-generated resource description, the Dublin Core has also attracted the attention of formal resource description communities. As the diversity and volume of Web resources increase, trusted intermediaries (such as museums and libraries) will achieve greater recognition as preferred sources of metadata for persistent resources. In the hands of cataloguing experts, the Dublin Core is expected to provide an economical alternative to more elaborate description models such as full MARC cataloguing (Library of Congress 1997b). It includes sufficient flexibility to encode the additional structure and more elaborate semantics appropriate to such applications.

1.5 Metadata modularity on the Web

The wide diversity of metadata needs on the World Wide Web requires an environment that supports the coexistence of many independently developed and maintained metadata packages. The Dublin Core is targeted specifically towards resource discovery, but one can imagine many functionally distinct packages that serve other goals (terms and conditions, archival management, administrative metadata, and many others). For example, a Terms and Conditions metadata package would include elements that describe rights holders, cost of acquiring a resource, restrictions on re-use of the resource, and related information. Recognition of the desirability of this sort of modularity has guided the evolution of the Dublin Core since the Warwick Workshop, and has been formalised as the Warwick Framework (Lagoze et al., 1996). The concepts articulated in this work have informed the ongoing development of a metadata architecture for the Web as well.
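
The modularity described above can be pictured as a container that holds independently maintained packages for a single resource. The following Python sketch is an illustration only: the class names Package and Container, the scheme labels, and the sample terms-and-conditions element names are hypothetical and are not taken from the Warwick Framework itself.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """One independently maintained metadata package (hypothetical model)."""
    scheme: str                                  # e.g. "dublin-core"
    elements: dict = field(default_factory=dict)


@dataclass
class Container:
    """Groups the packages that describe a single resource."""
    resource: str
    packages: list = field(default_factory=list)

    def add(self, package):
        self.packages.append(package)

    def find(self, scheme):
        """Return the packages of one scheme; a consumer can ignore the rest."""
        return [p for p in self.packages if p.scheme == scheme]


container = Container("http://example.org/report")
container.add(Package("dublin-core",
                      {"Title": "Annual Report", "Creator": "A. Author"}))
container.add(Package("terms-and-conditions",
                      {"RightsHolder": "Example Org", "Cost": "free"}))

# A discovery service would read only the package it understands.
discovery = container.find("dublin-core")
```

A discovery application in this sketch never needs to parse the terms-and-conditions package, which is the kind of coexistence the paragraph above describes.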

Metadata architecture for the Web

The World Wide Web Consortium (W3C) is the primary standards forum for the Web, and has recently begun to focus on implementing an architecture for metadata for the Web. The Resource Description Framework, or RDF, is evolving to support the many different metadata needs of vendors and information providers. Representatives of the Dublin Core effort are actively involved in the development of this architecture, bringing the digital library perspective to bear on this important component of the Web infrastructure (W3C 1997a). The evolving RDF metadata architecture will support a variety of resource description models, each with implications for functionality and management.

2.1 Embedded metadata

Currently the easiest way of deploying metadata on the World Wide Web is by embedding it in HTML documents (using the <META> tag). There are conventions that support inclusion of simple metadata in HTML versions 2.0 and above (cf. Miller and Gill 1997). The HTML 4.0 specification released in July of this year includes additional attributes for the <META> tag that allow the qualifiers necessary for more complex implementations (W3C 1997b). The advantage of embedded metadata is that no additional system must be in place to use it; the metadata is integral to the resource, and can be harvested by Web-indexing agents.
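
As a rough sketch of the embedding approach, the Python below renders a few element/value pairs as <META> tags and then reads them back the way an indexing agent might. The "DC." name prefix and lower-case element names are assumed here as one possible convention, and render_meta and MetaHarvester are hypothetical helper names; only the standard-library html and html.parser modules are used.

```python
from html import escape
from html.parser import HTMLParser


def render_meta(record, prefix="DC."):
    """Render (element, value) pairs as HTML <META> tags, one per line."""
    lines = []
    for element, value in record:
        name = escape(prefix + element, quote=True)
        content = escape(value, quote=True)
        lines.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(lines)


class MetaHarvester(HTMLParser):
    """Collect prefixed <META> name/content pairs from a page."""

    def __init__(self, prefix="DC."):
        super().__init__()
        self.prefix = prefix
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith(self.prefix):
            element = name[len(self.prefix):]
            self.found.append((element, attributes.get("content") or ""))


record = [
    ("title", "The Dublin Core"),
    ("creator", "Weibel, Stuart"),
    ("subject", "metadata & discovery"),
]
page = "<html><head>\n" + render_meta(record) + "\n</head><body></body></html>"

harvester = MetaHarvester()
harvester.feed(page)
```

Because the same element may appear more than once, the harvester keeps a list of pairs rather than a dictionary keyed by element name.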

2.2 Third party metadata

A model more familiar to the library community includes what is known in Web parlance as a third-party label bureau; that is, an entity that collects and manages metadata records that refer to resources but are not embedded in the resource (a library catalogue, for example). This model is important not only to libraries and museums. It also supports the development of agencies that might label resources according to age, appropriateness, or other acceptability criteria.
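
The essential point of the third-party model is that records are keyed by a reference to the resource rather than stored inside it. A minimal Python sketch follows; the LabelBureau class, its method names, and the example URLs are all hypothetical.

```python
from collections import defaultdict


class LabelBureau:
    """Holds metadata records about resources, apart from the resources."""

    def __init__(self, name):
        self.name = name
        self._records = defaultdict(list)    # resource URL -> list of records

    def label(self, url, record):
        """Attach one more record to a resource."""
        self._records[url].append(record)

    def lookup(self, url):
        """Return every record held for a URL (an empty list if none)."""
        return list(self._records.get(url, []))


bureau = LabelBureau("Example Library Catalogue")
bureau.label("http://example.org/essay",
             {"Title": "An Essay", "Creator": "A. Author"})
bureau.label("http://example.org/essay", {"Subject": "metadata"})

records = bureau.lookup("http://example.org/essay")
unknown = bureau.lookup("http://example.org/missing")
```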

2.3 View Filter

A third model also involves management of records by a distinct entity, but not necessarily Dublin Core records per se. Managing a wide variety of data stores often involves reconciling very different description models. One approach to achieving interoperability in such an environment involves mapping many description schemas into a common set such as the Dublin Core, giving users a single query model (Day 1997).
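
The mapping approach can be sketched as a set of crosswalk tables, one per native schema, with queries expressed only in Dublin Core terms. In the Python below, the native field names (author, maker, imprint, and so on), the CROSSWALKS table, and the two sample stores are invented for illustration and do not come from any real cataloguing standard.

```python
# Hypothetical crosswalks: native field name -> Dublin Core element.
CROSSWALKS = {
    "library": {"author": "Creator", "title": "Title", "imprint": "Publisher"},
    "museum": {"maker": "Creator", "object_name": "Title", "held_by": "Publisher"},
}


def to_dublin_core(record, schema):
    """Map a native record into a Dublin Core view, dropping unmapped fields."""
    mapping = CROSSWALKS[schema]
    view = {}
    for field_name, value in record.items():
        element = mapping.get(field_name)
        if element is not None:
            view.setdefault(element, []).append(value)
    return view


def search(stores, element, term):
    """Query every store through its Dublin Core view."""
    hits = []
    for schema, records in stores.items():
        for record in records:
            values = to_dublin_core(record, schema).get(element, [])
            if any(term.lower() in value.lower() for value in values):
                hits.append((schema, record))
    return hits


stores = {
    "library": [{"author": "Jane Smith", "title": "Farmland Surveys",
                 "shelfmark": "QA76"}],
    "museum": [{"maker": "John Smith", "object_name": "Survey Chain",
                "cloud_cover": "n/a"}],
}

hits = search(stores, "Creator", "smith")
```

The native records stay untouched in their own stores; only the view presented to the user is expressed in the common element set.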

Unsolved problems and future directions

Much remains to be done to bring the Dublin Core to a state of sufficient maturity and stability to fulfil its promise as a foundation for resource discovery on the net. The main thrusts of continued development are enumerated below.

3.1 Continued refinement of Dublin Core elements

The Dublin Core elements emerged from the collective judgement and experience of the many participants in the process to date. As deployment spreads, the evolution of the Dublin Core will reflect experience with the ambiguities, conflicts, and deficiencies in the set. Standards of best practice will evolve in light of such experience.

3.2 User education and application guides

The spread of a common set of resource description conventions depends in part upon the availability of clear user guidelines. Such guidelines must be developed in many languages but with a common purpose and orientation.

3.3 Metadata registries

The Warwick Framework describes the characteristics of an architecture for metadata that will allow independently developed metadata element sets to co-exist. This implies that the 'consumers' of metadata (either people or software agents) will need formal online registries that describe the semantics, the structure, and the transport syntax of a metadata element set. Thus, an application finding Dublin Core metadata associated with a collection of resources might access the Dublin Core Metadata Registry to better understand the characteristics of the metadata. Work on metadata registries is still in an embryonic stage, but as the functional specifications evolve, they will become a central part of the infrastructure necessary to develop and manage change for a metadata set such as the Dublin Core.
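
Since registry work is still at an early stage, the following Python sketch is speculative. It assumes a registry can be modelled as a lookup table recording, for each element set, a transport syntax and per-element semantics. The REGISTRY layout, the key names, and the describe function are hypothetical; the two definitions are taken from the addendum to this chapter.

```python
# Hypothetical registry layout: scheme -> transport syntax and element details.
REGISTRY = {
    "dublin-core": {
        "transport_syntax": "HTML <META> tags",
        "elements": {
            "Title": {
                "semantics": "The name given to the resource by the "
                             "creator or publisher.",
                "repeatable": True,
            },
            "Date": {
                "semantics": "The date the resource was made available "
                             "in its present form.",
                "repeatable": True,
            },
        },
    },
}


def describe(scheme, element):
    """Return what the registry records about one element, or None."""
    entry = REGISTRY.get(scheme)
    if entry is None:
        return None
    return entry["elements"].get(element)


title_info = describe("dublin-core", "Title")
missing = describe("dublin-core", "CloudCover")
```

An application that meets an unfamiliar element could use such a lookup to decide whether to index it, display it, or ignore it.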

3.4 Tools

Tools for creating and managing Web-based metadata are evolving now. As the infrastructure evolves and standards become stable, these tools will become commonplace in authoring, site management, and resource management applications.

3.5 Standardisation

The development of the Dublin Core has been a voluntary effort on the part of many disparate stakeholders in resource description. As it becomes more widely deployed, standards of best practice must be formalised.

Work reported here by the AHDS and UKOLN represents development on several of these fronts, but especially regarding the refinements of the Dublin Core element set and the formalisation of best practice. It is to those refinements, and to the formal evaluation process from which they emerged, that attention now turns.

Addendum: the Dublin Core elements

Dublin Core elements may be optionally applied, extended with implementation-specific TYPE attributes, and repeated as necessary when describing any given resource.

Element Name: Element Description
Title: The name given to the resource by the CREATOR or PUBLISHER.
Creator: The person(s) or organisation(s) primarily responsible for creating the intellectual content of the resource.
Subject: The topic of the resource: keywords or phrases that describe the subject or content of the resource, including controlled vocabularies.
Description: A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.
Publisher: The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity.
Contributor: Person(s) or organisation(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, and illustrators).
Date: The date the resource was made available in its present form.
Type: The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that TYPE will be chosen from an enumerated list of types.
Format: The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image.
Identifier: String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element.
Source: The work, either print or electronic, from which this resource is derived, if applicable.
Language: Language(s) of the intellectual content of the resource.
Relation: Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves.
Coverage: The spatial and temporal characteristic of the resource. Formal specification of COVERAGE is currently under development.
Rights: The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a service that would provide such information dynamically.
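
The rule that every element is optional and repeatable can be made concrete with a small data structure. The Python sketch below is one possible model, not a prescribed one: the DublinCoreRecord class and its qualifier parameter (standing in for an implementation-specific TYPE attribute) are hypothetical, while the 15 element names are those listed above.

```python
ELEMENTS = ("Title", "Creator", "Subject", "Description", "Publisher",
            "Contributor", "Date", "Type", "Format", "Identifier",
            "Source", "Language", "Relation", "Coverage", "Rights")


class DublinCoreRecord:
    """A record in which every element is optional and repeatable."""

    def __init__(self):
        # (element, qualifier or None, value) triples, in insertion order
        self._values = []

    def add(self, element, value, qualifier=None):
        if element not in ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        self._values.append((element, qualifier, value))

    def get(self, element):
        """Return all values recorded for an element (possibly none)."""
        return [value for name, _, value in self._values if name == element]


record = DublinCoreRecord()
record.add("Title", "The Dublin Core")
record.add("Creator", "Weibel, Stuart")
record.add("Subject", "metadata")
record.add("Subject", "resource discovery")        # repeated element
record.add("Identifier", "http://purl.org/metadata/dublin_core",
           qualifier="URL")                        # qualified element
```

Elements that were never added, such as Coverage here, simply return an empty list, which reflects their optional status.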

Send comments or questions to info@ahds.ac.uk
Last modified: Monday, 17-Nov-97 16:52:01 GMT by D. Greenstein
URL: http://www.ahds.ac.uk/public/arlist.html


This page was originally part of the Arts and Humanities Data Service (AHDS) Website: http://ahds.ac.uk/public/metadata/disc_03.html
Rescued (courtesy of the Internet Archive) and migrated to the UKOLN Website: 08-Apr-2011; Last updated: 06-May-2011.
The content is identical, but changes have been made to the HTML in an attempt to make it validate, and some links have been updated or deactivated.
