nof-digitise Technical Advisory Service

The questions and answers on this page have been asked by nof-digitise applicants. This page will be updated frequently.

1. Is there a glossary or simplified version of the various metadata standards?

2. As we are developing sites for lifelong learners, do you have any views on whether we should use metadata appropriate for learning packages, e.g. the IMS Learning Resource Metadata Model or LOM (Learning Object Metadata)?

Although the IMS Learning Resource Metadata Model or IEEE Learning Object Metadata (LOM) would be relevant, both these place a significant overhead on the metadata creator; a LOM record could take an hour or more to complete in extreme cases, for example.

We feel that LOM/IMS is too big an overhead for what these projects are meant to be doing (although a LOM/IMS description of each project might be worth considering).

An alternative might be to use Dublin Core with the extensions proposed by the Education Working Group (DCEd) of the DCMI. They have proposed an "Audience" element, and suggest adopting "InteractivityType", "InteractivityLevel", and "TypicalLearningTime" elements from the IEEE LOM standard. More information is available at:
http://dublincore.org/news/pr-20001206.shtml

Also see the UK's Metadata for Education Group at http://www.ukoln.ac.uk/metadata/education/
and note that the UK Government Information Age Champions group are currently working on a metadata schema that is likely to use Dublin Core.

3. Are there recommended standards for the core and extended metadata attributes that should be created for digitised resources, especially images? Dublin Core provides one simple model but is very general; other possible approaches would presumably include MARC and CIMI, but some shared approach to this is presumably seen as valuable.

There are, in fact, quite a few relevant standards. For resource discovery, the nof-digitise guidelines (5.2.1) suggest that "item-level descriptions should be based on the Dublin Core and should be in line with developing e-government and UfI metadata standards." In a Dublin Core context, the specifics of using DCMES for images were discussed at DC-3, the Image Metadata Workshop held in Dublin, Ohio in September 1996. This workshop resulted in the addition of two new elements to the original thirteen and made some changes to element descriptions.

There is some useful information on DC and other image metadata formats in section 4 of the VADS/TASI guide to creating digital resources in the AHDS Guides to Good Practice series:

This mentions things like the CIMI DTD, MARC, the CIDOC standards, etc. as well as more specialised things like the Visual Resources Association (VRA) Core Record.

There is information on more specialised administrative and structural metadata in the Making of America II project's final report:

Bernard J. Hurley, John Price-Wilkin, Merrilee Proffitt and Howard Besser, The Making of America II Testbed Project: a digital library service model. Washington, D.C.: Council on Library and Information Resources, 1999.
http://www.clir.org/pubs/abstract/pub87abst.html

A shorter list of elements with a primary focus on preservation is available at:

RLG Working Group on Preservation Issues of Metadata, Final report. Mountain View, Calif.: Research Libraries Group, 1998.
http://www.rlg.org/preserv/presmeta.html

This paper first outlines a multi-level video indexing approach based on Dublin Core extensions and the Resource Description Framework (RDF). The advantages and disadvantages of this approach are discussed in the context of the requirements of the proposed MPEG-7 ("Multimedia Content Description Interface") standard. The related work on SMIL (Synchronized Multimedia Integration Language) by the W3C SYMM working group is then described. Suggestions for how this work can be applied to video metadata are made. Finally a hybrid approach is proposed based on the combined use of Dublin Core and the currently undefined MPEG-7 standard within the RDF which will provide a solution to the problem of satisfying widely differing user requirements.

4. Can you advise on approaches to/chosen standards for metadata for sound files? Are there any recently developed models of good practice?

As a more practical example, Jon Maslin (J.Maslin@surrey.ac.uk) describes the approach taken to creating metadata for music recordings, scores and video in the performing arts at the University of Surrey:

We have adopted the Dublin Core as a basis for our metadata because we needed a clearly defined structure and wanted, if possible, to adopt a standard. It was adopted while it was still unclear in some respects, but we knew what we had to achieve so we selected only the relevant elements, expanded some and extended DC with new elements needed for the application.

So, while it was convenient to use it we had to extend it, but did not use all the elements.

We are using the same schema for music recordings, scores and video in the performing arts.

Creators and contributors: The roles of these are defined with their names. There can be an unlimited number. We have not adopted a dictionary of defined roles as the performing arts has a potentially unlimited number, but have taken the view that different applications will act upon the metadata and that retrieval software will be sufficiently intelligent to take care of interpreting different roles. Hence an informal convention of adopting the terminology on the source and defining the instrument rather than the role (largely to avoid contortions such as guitarist). This is debatable, as is the difference between creator and contributor in some instances. We have tended to class producers and recording engineers as contributors. One of the benefits of defining a role is that the importance may not be terribly significant.

A similar approach has been adopted for other elements, such as the place and time of recording. We have limited this to a few attributes for our own convenience. There is no reason why this should not be expanded in the way that the creators element is used.

The location element is structured as a URL. In the example you will see it pointing to the Patron server. It can be to any other web server or a direct file access.

In addition a number of Patron elements have been added which relate to courses. Another element has been added to define uniquely a title, e.g. all scores and recordings of a piece have an id.

The most extensive addition has been to define the contents of a piece in a standard way regardless of type or medium. In effect this gives a multi-level table of contents. It has been designed to provide an objective series of access points which can be created without extensive subject knowledge. Typically a classical piece of music will list the movements with references to starting and stopping times. Scores have access points to movements and page numbers and repeats if required. There is no limit to the granularity (beyond time and patience).

It is important to remember that this is entirely independent of the application. The advantage of the XML implementation is that variations in application are relatively simple - in Patron the application displays these in cascading hierarchies. One of the objectives has been to include sufficient data and structure to allow the metadata to be exchanged and processed for the current implementation of Patron and possible enhancement, and also to be developed with more universal standards.

We have created the metadata from a MS Access database which also holds rights information. We have also developed a form builder which automatically creates an input form from the metadata schema. This enables metadata to be created and tested rapidly, and allows inputters to adopt previously entered data to reduce time and to ensure accuracy.

So, the answer is yes and it works, but that it has been application-driven: other applications would need to add to it.

Metadata should be capable of supporting the delivery of item-level DC descriptions of all project resources.

7. Is simple Dublin Core metadata sufficient or are qualifiers needed? If they are, which ones should be used and how will interoperability between different domains be handled?

The 15 Dublin Core metadata elements form a fairly basic cross-domain core that ensures a degree of commonality across domains and applications. In order to less ambiguously express richer or more structured information than is possible in the 15 elements, the Dublin Core community supports the notion of qualification, using element refinements and encoding schemes.

An initial set of these is defined by the Dublin Core community, in the Dublin Core Qualifiers, and these are a good place to start. Where the agreed qualifiers do not meet your needs, it is possible to define others, either within your project or as part of a broader domain-based interest group.

In defining new qualifiers it is important to ensure that:
- they REFINE, and do not EXTEND, the definition of one of the Dublin Core elements
- they do not OVERLAP with the function of an existing qualifier
- if the qualifier is IGNORED by a system/user that does not understand it, the value that is left should still make sense within the definition of the parent element

As an illustration, the DC-Government Working Group recently proposed 'previousAccessMarkingChangeDate' as a refinement of DC.Rights. This was rejected because the definition of DC.Rights is 'information about rights held in and over the resource'.

A value of the proposed 'previousAccessMarkingChangeDate' element refinement would have been a simple date, which, on its own, does not constitute 'information about rights held in and over the resource'.
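The third test above is often called the "dumb-down" principle, and it can be checked mechanically. The following is only a rough sketch in Python: the record layout (element, refinement, value) and the sample values are assumptions made for illustration, not part of any nof-digitise requirement.

```python
# Hypothetical qualified record: each statement is (element, refinement, value).
# The element and refinement names follow the DCMI qualifiers; the values are made up.
QUALIFIED = [
    ("title", None, "Banking during the Great Depression"),
    ("date", "created", "2001-03-14"),
    ("date", "modified", "2001-06-02"),
    ("coverage", "temporal", "start=1929; end=1939; name=The Great Depression"),
]


def dumb_down(statements):
    """Drop every refinement, keeping only the parent element and the value."""
    return [(element, value) for element, _refinement, value in statements]


simple = dumb_down(QUALIFIED)
for element, value in simple:
    # Each line should still make sense under the parent element's definition.
    print(f"DC.{element} = {value}")
```

Running the same function over a statement such as ("rights", "previousAccessMarkingChangeDate", "2001-01-01") would leave a bare date under DC.Rights, which is the problem described above.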

8. We are planning to digitise and make accessible through a database 20,000 photographs. We are collecting enough detail at item level to create Dublin Core. Please can you give some examples of dynamic, database-driven sites which use this?

The AHDS gateway http://www.ahds.ac.uk visibly displays DC metadata.

However, a lot of large sites - including the ADS http://ads.ahds.ac.uk/ - are designed in a Dublin Core-aware fashion and could send a *computer* DC-marked up metadata. It often makes sense to do as ADS have and to display the content to human readers in a way that uses language and field names more intelligible to that audience. Just because the human-readable name has been changed doesn't mean it isn't a Dublin Core field.

9. We are cataloguing video clips and each item has approximately 20 metadata fields that need to be incorporated in the site, offering advanced search options. How would I incorporate a metadata structure that conforms to e-Government standards? What steps do I need to take to achieve this?

The Dublin Core (DC) metadata scheme is based on a set of 15 core elements that are generic enough to define individual digital objects, however and wherever they have been created. Elements included in the list include 'title', 'creator', 'date' etc. A full list of these elements is available from http://dublincore.org/documents/dces/.

In many cases, however, these 15 elements are not sufficient to define accurately the objects in question. The elements are then extended or qualified to define further the resource. For one type of digital resource, an HTML page, one often sees the date element extended to include fields called 'date.created' and 'date.lastmodified', i.e. the metadata includes two dates, one informing when the page was first created and a second informing when it was last updated. For a video collection the rights element may well need to be extended to record the various copyright issues involved. Sometimes DC elements can be qualified according to examples set by others trying to define similar digital objects; in other cases, projects need to develop their own qualifying terms.

For the criteria mentioned in the query, it would probably be best to have multiple qualifications of the creator and contributor elements to record details of interviewers, interviewees, gender etc. "Which tape" and "absolute address" could probably be slotted under the 'title', 'identifier' or 'source' elements.
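One way to approach this is to keep your own field names in the database and maintain a simple crosswalk to Dublin Core that is applied when records are exported. The sketch below is illustrative only: the local field names, the role qualifiers on contributor and the sample clip are all hypothetical, and your project would substitute its own.

```python
# Hypothetical crosswalk from local database fields to Dublin Core.
# Both the local field names and the role qualifiers are illustrative assumptions.
CROSSWALK = {
    "clip_title": ("title", None),
    "interviewer": ("contributor", "interviewer"),
    "interviewee": ("contributor", "interviewee"),
    "tape_number": ("source", None),
    "absolute_address": ("identifier", None),
    "copyright_holder": ("rights", None),
    "recorded_on": ("date", "created"),
}


def to_dublin_core(record):
    """Return (DC name, value) pairs for the mapped, non-empty fields of a record."""
    pairs = []
    for field, value in record.items():
        if field not in CROSSWALK or value in (None, ""):
            continue  # unmapped or empty fields stay in the local database only
        element, qualifier = CROSSWALK[field]
        name = f"DC.{element}.{qualifier}" if qualifier else f"DC.{element}"
        pairs.append((name, value))
    return pairs


clip = {
    "clip_title": "Interview about village life",
    "interviewer": "A. Example",
    "interviewee": "B. Example",
    "tape_number": "Tape 12",
    "absolute_address": "00:14:32",
    "internal_notes": "not for publication",
}
dc_pairs = to_dublin_core(clip)
for name, value in dc_pairs:
    print(name, "=", value)
```

Fields that have no entry in the crosswalk (here, internal_notes) are simply not exported, which is one way of keeping sensitive or purely administrative data out of the shared record.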

It's important to note that there is no perfect metadata scheme for any one collection. How you qualify your DC metadata can depend on how your resources are being digitised or what soft- and hardware you are using. Perhaps most importantly, any metadata scheme depends on who will be searching for your resources. A metadata scheme has to be set up to allow users to find the information they need, so, in an ideal world, the creation of a metadata scheme will follow a period of research on user needs. Users must be thought of in the broadest terms, including not only a general public, for example, but future custodians of the collection. While members of the general public may want metadata fields which permit them to do advanced searches, future custodians may need to find detailed information on the copyright holders of the videos in question. This could be recorded in the 'rights' element.

There is a Dublin Core user group especially devoted to metadata issues surrounding moving images, although it is not particularly well developed at the moment. The user group is housed at http://dublincore.org/groups/moving-pictures/. One case study (at http://ahds.ac.uk/shakespeare.htm) gives an indication of how one digital project recording theatrical performances went about creating its metadata.

Dublin Core is recommended by the NOF-digi technical standards because its common takeup should allow digital collections around the country to be interoperable with one another, i.e. to allow users to search through more than one collection at the same time.

We would also point you to a (rather technical) paper looking at video metadata representation (mainly MPEG-7) at:
http://archive.dstc.edu.au/RDU/staff/jane-hunter/www8/paper.html
which you may find useful.

In addition, as this metadata seems to describe individuals there may also be important data protection problems that need to be solved.

The Identifier element has to be an unambiguous reference because it defines the actual item/resource being described. When describing the kind of resources you will be creating within your NOF project the Identifier element will most likely need to include the Project's Image Number and any reference numbers used by the host institutions (e.g. accession numbers). It could be a URI (Uniform Resource Identifier) but should not be just the URL of the resource, though the URL could be included.

A few examples of the type of thing you should be putting in the Identifier element are listed below (these are taken from DC Assist)

You could look at some examples from dc-assist - http://www.ukoln.ac.uk/metadata/dcassist/
Bear in mind when you are looking at these in DC Assist that they are presented in a form for representing them in meta elements in HTML, but the values are still useful e.g.

For spatial coverage, values might be:
Columbus, Ohio, USA; Lat: 39 57 N Long: 082 59 W
(using encoding scheme TGN) Columbus (C,V)
(using encoding scheme DCMI Box) northlimit=23.5; southlimit=-23.5; name=The Tropics

For temporal coverage, values might be:
(using encoding scheme W3CDTF) 1945
(using encoding scheme DCMI Period) start=1929; end=1939; name=The Great Depression

Question - How does one use the Period encoding scheme for the element Coverage, Time.? Can I just simply list the Period in a field called Coverage, Period. I found the explanation in the DCMI site difficult to understand.

Again, how you manage it in your database is up to you, but it probably makes sense to have separate fields for the start date, end date and name of the Period (I'd suggest you probably don't need to store the name of the date scheme in your database as that should be constant). You might need to make the group repeatable if you envisage multiple ranges for temporal coverage, but that does seem quite complex.

When you expose/export your metadata, the start date, end date, scheme and name of a range all form part of the value of an occurrence of the temporal coverage property. N.B. this is still a temporal coverage property: "DCMI Period" is the name of an encoding scheme. You might want to check the distinction DC makes between "element refinements" (like "spatial" and "temporal") and "encoding schemes" (like DCMI-Period, or a subject scheme). See the start of:
http://dublincore.org/documents/2000/07/11/dcmes-qualifiers/

In the database, you might have a record with fields like:
Identifier - Project 6789
Title - Banking during the Great Depression
Creator - John Smith
Subject - Economic history
Temporal coverage start - 1929
Temporal coverage end - 1939
Temporal coverage name - The Great Depression
etc etc etc

But when you expose/export the DC metadata record the value of the temporal coverage property would be encoded as
start=1929; end=1939; name=The Great Depression
i.e. a single property value with an internal "structure".
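As a rough illustration of that export step, the following Python sketch builds the single DCMI Period value from separate database fields and splits it apart again. The function names and the row layout are our own assumptions; only the start=/end=/scheme=/name= components come from the DCMI Period encoding scheme.

```python
def encode_period(start=None, end=None, name=None, scheme=None):
    """Join the separate database fields into one DCMI Period value."""
    components = [("start", start), ("end", end), ("scheme", scheme), ("name", name)]
    return "; ".join(f"{key}={value}" for key, value in components
                     if value not in (None, ""))


def decode_period(value):
    """Split a DCMI Period value back into its components.

    This simple version assumes no component contains a semicolon.
    """
    result = {}
    for part in value.split(";"):
        key, separator, text = part.partition("=")
        if separator:
            result[key.strip()] = text.strip()
    return result


# Hypothetical database row using the fields from the record above.
row = {
    "temporal_start": 1929,
    "temporal_end": 1939,
    "temporal_name": "The Great Depression",
}
period = encode_period(row["temporal_start"], row["temporal_end"], row["temporal_name"])
print(period)  # start=1929; end=1939; name=The Great Depression
print(decode_period(period))
```

If you do allow several ranges per item, each row of the repeatable group would be encoded separately and exported as its own occurrence of the temporal coverage property.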

12. Can NOF recommend or suggest any models for preservation metadata that we might use for our own projects?

The RLG Working Group suggests using 16 elements to capture crucial information about a digital file. Their elements are fairly 'lightweight' and would probably be OK for a digitisation project, assuming that some descriptive metadata (e.g. DCMES) is also available. The report is a bit old now, and it might be worth looking at METS http://www.ukoln.ac.uk/metadata/resources/mets/ or the more detailed set of elements which can be found in the draft NISO Technical Metadata for Digital Still Images standard. This can be found (in PDF) at http://www.niso.org/committees/committee_au.html and is also mentioned in the NOF guidelines.

Other guidance would be available in:
Anne R. Kenney and Oya Y. Rieger, Moving theory into practice: digital imaging for libraries and archives. Mountain View, Calif.: Research Libraries Group, 2000.

You could also have a look at the OCLC/RLG Preservation Metadata Working Group which has published an overview (chiefly of the OAIS model, and the specifications developed by Cedars, NEDLIB and NLA) and recommendations for 'Content Information' and a forthcoming one on 'Preservation Description Information' (these are OAIS terms): http://www.oclc.org/research/pmwg/

The "Gathering the Jewels" NOF digitisation project in Wales has settled on what metadata elements and digitisation guidelines it is going to adopt. In the interests of sharing this information as widely as possible, they have put it up on their Web site - please see http://www.gtj.org.uk/technical_logo.html and scroll down the page.

It can be useful to embed metadata into the HTML meta elements on a Web page. However, when doing so, keep the points below in mind.

(a) it depends on a service provider (i) finding the document (search engines still have issues when harvesting dynamically created pages, for more information see Search Engine Watch and the NOF dissemination section of the programme manual) and (ii) extracting and using the metadata; and

(b) HTML meta elements are not the only way of exposing metadata. Further information on OAI and other ways of making your metadata available will follow.
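If you do decide to embed Dublin Core in your pages, the meta elements can be generated from the database rather than typed by hand. The sketch below is a minimal example, assuming your metadata is already available as (name, value) pairs; the function name and sample values are hypothetical. It follows the DCMI convention of a link element declaring the "DC" prefix, followed by one meta element per value.

```python
from html import escape

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(pairs):
    """Return HTML head lines for a list of (DC name, value) pairs."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for name, value in pairs:
        # Escape quotes and ampersands so the attribute values stay well-formed.
        lines.append(
            f'<meta name="{escape(name, quote=True)}" '
            f'content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)


head = dc_meta_tags([
    ("DC.title", "Fish & chips in Sandfordshire"),
    ("DC.creator", "John Smith"),
    ("DC.creator", "Jane Jones"),  # repeated elements are simply repeated tags
])
print(head)
```

The same (name, value) pairs could equally be exported in another format later, so it is worth keeping this step separate from how the metadata is stored.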

The dc:relation element is used to encode a reference to a resource which is related to the resource being described. The value of the dc:relation element should be an identifier for the related resource.

In any DC metadata record, there may be multiple occurrences of the dc:relation element, expressing relationships between the current resource and a number of other resources.

In simple/unqualified Dublin Core, dc:relation allows you to express the fact that a relationship exists between the current resource and a related resource, but it does not permit you to say anything more about the nature of that relationship between the two resources.

Qualified Dublin Core introduces a number of element refinements to dc:relation, which allow you to express the nature of the relationship between the current resource and the related resource.

In both simple/unqualified DC and in qualified DC, the value of the dc:relation element (or the value of any of its element refinements) should be an identifier for the related resource.

e.g.
[Simple DC] dc:relation = http://my.project/resource2
[Qualified DC] dcterms:isPartOf = http://my.project/resource2
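For projects exporting XML, the same two statements might be serialised as follows. This is only a sketch using Python's standard library: the wrapper element name "record" and the identifier http://my.project/resource1 are assumptions made for the example, while the two namespace URIs are those published by DCMI for the element set and the qualified terms.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def build_record(identifier, relations):
    """Build a record; relations is a list of (refinement or None, target identifier)."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC}}}identifier").text = identifier
    for refinement, target in relations:
        if refinement is None:
            tag = f"{{{DC}}}relation"          # simple DC: a relationship exists
        else:
            tag = f"{{{DCTERMS}}}{refinement}"  # qualified DC: says what kind
        ET.SubElement(record, tag).text = target
    return record


record = build_record(
    "http://my.project/resource1",
    [
        (None, "http://my.project/resource2"),
        ("isPartOf", "http://my.project/resource2"),
    ],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

In both cases the element content is the identifier of the related resource, as described above; only the element name changes.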

16. How would you define the language of a bilingual item using Dublin Core?

Hopefully you are storing this metadata somehow in your database for your own use. If so it should be fairly straightforward to define the language of your pamphlet using Dublin Core. Although DC doesn't allow for a second language you can have multiple occurrences of one element (in fact all of the 15 elements allow unlimited occurrence). So for example in an HTML page this would appear as below (depending on which encoding scheme you choose to use - ISO 639 or RFC 1766, and how many characters).
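A minimal sketch of generating the repeated element from a list of language codes is given here. It assumes, purely for illustration, an English and Welsh pamphlet described with two-letter codes under RFC 1766; the function name is hypothetical.

```python
from html import escape


def language_meta(codes, scheme="RFC1766"):
    """Return one DC.language meta element per language code."""
    return [
        f'<meta name="DC.language" scheme="{escape(scheme, quote=True)}" '
        f'content="{escape(code, quote=True)}">'
        for code in codes
    ]


# "en" is English and "cy" is Welsh in the two-letter ISO 639 codes.
tags = language_meta(["en", "cy"])
for tag in tags:
    print(tag)
```

Whichever scheme you choose, use it consistently across the whole collection so that a search by language finds every matching item.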

17. NOF require that I submit a sample of my project's metadata. Could you show me an example of how to do this?

All projects are required to submit samples of their item-level metadata and indicate which fields are being used for Dublin Core metadata.

This is a fictional sample taken from the digitisation project of Sandfordshire Council. It is of a digitised image of an etching done by the artist John Shade. It gives an indication of the format that should be used when forwarding metadata samples to case managers. Many of the fields here are loosely based on the JIDI Metadata Guidelines.

The example shows what categories the project is using for its metadata, the actual descriptions used for one item and how the fields relate to the core element set of Dublin Core. Note that not every Dublin Core element needs to be mapped to; DC.RELATION and DC.SOURCE, for instance, were omitted from the example below. Other Dublin Core fields, in this case DC.COVERAGE, can be qualified to add extra descriptive richness. The notes on the right indicate what controlled vocabularies can be used, but there is no need for projects to indicate which ones they are utilising.

Projects will have developed different metadata schema according to their collection and content; some will have more detail in certain areas and some will have less. There is no need to replicate the schema shown here. What is important with the sample is to give a sense of how each of your metadata categories are being interpreted and how they are being mapped to Dublin Core.
