UKOLN AHDS QA Focus Briefing Documents: Print All - metadata



This page is for printing out briefing papers on metadata. Note that some of the internal links may not work.


Briefing 41

Introduction To Metadata


What is Metadata?

Metadata is often described as "data about data". The concept of metadata is not new - a library catalogue contains metadata about the books held in the library. What is new is the potential that metadata provides in developing rich digital library services.

The term metadata has come to mean structured information that is used by automated processes. This is probably the most useful way to think about metadata [1].

The Classic Metadata Example

The classic example of metadata is the library catalogue. A catalogue record normally contains information about a book (title, format, ISBN, author, etc.). Such information is stored in a structured, standardised form, often using an international standard known as MARC. Use of this international standard allows catalogue records to be shared across organisations.

Why is Metadata So Important?

Although metadata is nothing new, its importance has grown with the development of the World Wide Web. As is well known, the Web seeks to provide universal access to distributed resources. In order to develop richly functional Web applications which can exploit the Web's global information environment, it is becoming increasingly necessary to make use of metadata which describes resources in some formal, standardised manner.

Metadata Standards

In order to allow metadata to be processed in a consistent manner by computer software, it is necessary for metadata to be described in a standard way. There are many metadata standards available. However, in the Web environment the best-known standard is Dublin Core, which provides an agreed set of core metadata elements for use in resource discovery.

The Dublin Core standard (formally known as the Dublin Core Metadata Element Set) has defined 15 core elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights [2].

The core element set is clearly very basic. A mechanism for extending Dublin Core elements has been developed. This allows what are known as qualified Dublin Core elements to refine the core elements. For example DC.Date.Created refines the DC.Date element by allowing the date of creation of the resource to be described, while DC.Date.Modified can be used to describe the date on which the resource was changed. Without the qualifiers, it would not be possible to tell which date related to which event. Work is in progress on defining a common framework for qualifiers.

Using Metadata

The Dublin Core standard defines a set of core elements. The standard does not specify how these elements should be deployed on the Web. Initially consideration was given to embedding Dublin Core within HTML pages using the <meta> element e.g. <meta name="DC.Creator" content="John Smith">. However this approach has limitations: HTML was not initially rich enough to allow metadata schemes to be included (which could specify, for example, that a list of keywords is taken from the Library of Congress list); it is not possible to define relationships between metadata elements (which may be needed if, for example, there are multiple creators of a resource); and processing the metadata requires the entire HTML document to be downloaded.
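The limitations of the embedded approach are easier to see with a concrete sketch. The following minimal example (the page and its metadata values are invented for illustration) extracts embedded Dublin Core metadata from HTML using only the Python standard library; note that the two creators end up in a flat list, with no way to express any relationship between them:

```python
# Sketch: extracting <meta name="DC.*"> elements from an HTML page.
from html.parser import HTMLParser

class DCMetaParser(HTMLParser):
    """Collects <meta name="DC.*" content="..."> elements."""
    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name", "")
        if name.startswith("DC."):
            # Repeated elements (e.g. multiple creators) are possible,
            # so keep a list of values per element name.
            self.metadata.setdefault(name, []).append(attrs.get("content", ""))

page = """<html><head>
<meta name="DC.Creator" content="John Smith">
<meta name="DC.Creator" content="Jane Jones">
<meta name="DC.Date.Created" content="2003-12-01">
</head><body>...</body></html>"""

parser = DCMetaParser()
parser.feed(page)
print(parser.metadata["DC.Creator"])       # ['John Smith', 'Jane Jones']
print(parser.metadata["DC.Date.Created"])  # ['2003-12-01']
```

The whole page must be downloaded and parsed just to reach the metadata, which is one of the concerns that RDF and OAI set out to address.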

In order to address these concerns a number of alternative approaches for using metadata have been developed. RDF (Resource Description Framework) [3], for example, has been developed by W3C as a framework for describing a wide range of metadata applications. In addition OAI (Open Archives Initiative) [4] is an initiative to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.

In addition to selecting the appropriate standards use of metadata may also require use of a metadata management system and a metadata repository.

References

  1. Metadata Demystified, NISO,
    <http://www.niso.org/standards/resources/Metadata_Demystified.pdf>
  2. Dublin Core Metadata Element Set, DCMI,
    <http://dublincore.org/documents/dces/>
  3. Resource Description Framework (RDF), W3C,
    <http://www.w3.org/RDF/>
  4. Open Archives Initiative (OAI),
    <http://www.openarchives.org/>
  5. Information Environment Home, JISC,
    <http://www.jisc.ac.uk/index.cfm?name=ie_home>

Briefing 42

Metadata Deployment


Introduction

This document describes the issues you will need to address in order to ensure that you make use of appropriate approaches for the deployment of metadata within your project.

Why Do You Wish To Use Metadata?

The first question you should address is "Why do you wish to use metadata?". You may have heard that metadata is important. You may have heard that metadata will help solve many problems you have with your project. You may have heard that others are using metadata and you don't wish to be left behind. Although all of these points have some validity, they are not sufficient in isolation to justify the time and effort needed in order to deploy metadata effectively.

You should first specify the problem you wish to address using metadata. It may be that you wish to allow resources on your Web site to be found more easily from search engines such as Google. It may be that you wish to improve local searching on your Web site. It may be that you wish to interoperate with other projects and services. Or it may be that you wish to improve the maintenance of resources on your Web site. In all of these cases metadata may have a role to play; however different approaches may be needed to tackle these different problems and, indeed, approaches other than the use of metadata may be more effective (for example, Google makes only limited use of metadata, so an alternative approach may be needed).

Identifying The Functionality To Be Provided

Once you have clarified the reasons you wish to make use of metadata you should identify the end user functionality you wish to provide. This is needed in order to define the metadata you will need, how it should be represented and how it should be created, managed and deployed.

Choosing The Metadata Standard

You will need to choose the metadata standard which is relevant for your purpose. In many cases this may be self-evident - for example, your project may be funded to develop resources for use in an OAI environment, in which case you will be using the OAI protocol.

Metadata Modelling

It may be necessary for you to decide how to model your metadata. For example if you wish to use qualified Dublin Core metadata you will have to choose the qualifiers you wish to use. A QA Focus case study illustrates the decision-making process [1].

Metadata Management

It is important that you give thought to the management of the metadata. If you don't you are likely to find that your metadata becomes out-of-date. Since metadata is not normally displayed to end users but processed by software you won't even be able to use visual checking of the metadata. Poor quality metadata is likely to be a major barrier to the deployment of interoperable services.

If, for example, you embed metadata directly into a file, you may find it difficult to maintain the metadata (e.g. the creator changes their name or contact details). A better approach may be use of a database (sometimes referred to as a metadata repository) which provides management capabilities.

Example Of Use Of This Approach

The Exploit Interactive [2] e-journal was developed by UKOLN with EU funding. Metadata was required in order to provide enhanced searching for the end user. The specific functionality required was the ability to search by issue, article type, author and title and by funding body. In addition metadata was needed in order to assist the project manager producing reports, such as the numbers of different types of articles. This functionality helped to identify the qualified Dublin Core elements required.

The MS SiteServer software used to provide the service provided an indexing and searching capability for processing arbitrary metadata. It was therefore decided to provide Dublin Core metadata stored in <meta> tags in HTML pages. In order to allow the metadata to be more easily converted into other formats (e.g. XHTML) the metadata was held externally and converted to HTML by server-side scripts.
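As a rough sketch of this approach (the record content and element choices here are invented; Exploit Interactive's actual server-side scripts are not shown), externally held metadata can be converted to <meta> elements by a simple script:

```python
# Sketch: a dict stands in for the external metadata repository; a
# script renders it as HTML <meta> elements at publication time.
from html import escape

record = {
    "DC.Title": "Metadata in Practice",
    "DC.Creator": "John Smith",
    "DC.Type": "article",
}

def to_meta_tags(record):
    """Render a metadata record as HTML <meta> elements."""
    return "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in record.items()
    )

print(to_meta_tags(record))
```

Because the records are held externally, the same script could equally emit XHTML or another format later, which is the maintainability benefit described above.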

A case study which gives further information (and describes the limitations of the metadata management approach) is available [3].

References

  1. Gathering the Jewels: Creating a Dublin Core Metadata Strategy, QA Focus,
    <http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-13/>
  2. Exploit Interactive,
    <http://www.exploit-lib.org/>
  3. Managing And Using Metadata In An E-Journal, QA Focus,
    <http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-01/>

Briefing 43

Quality Assurance For Metadata


Introduction

Once you have decided to make use of metadata in your project, you then need to agree on the functionality to be provided, the metadata standards to be used and the architecture for managing and deploying your metadata. However this is not the end of the matter. You will also need to ensure that you have appropriate quality assurance procedures to ensure that your metadata is fit for its purpose.

What Can Go Wrong?

There are a number of ways in which services based on metadata can go wrong, such as:

Incorrect content:
The content of the metadata may be incorrect or out-of-date. Metadata is arguably more likely to fall out-of-date than ordinary content, since ordinary content is normally visible on, say, a Web page, while metadata is not normally displayed. In addition, humans can be tolerant of errors, ambiguities, etc. in ways that software tools normally aren't.
Inconsistent content:
The metadata content may be inconsistent due to a lack of cataloguing rules and inconsistent approaches if multiple people are involved in creating metadata.
Non-interoperable content:
Even if metadata is consistent within a project, other projects may apply different cataloguing rules. For example the date 01/12/2003 could be interpreted as 1 December or 12 January if projects based in the UK and USA make different assumptions about the date format.
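A common remedy, recommended in Dublin Core usage guidance, is for cataloguing rules to mandate the ISO 8601 (W3CDTF) form YYYY-MM-DD, which is unambiguous. A short Python illustration of the problem:

```python
# The same string parses to two different dates depending on which
# national convention is assumed; the ISO form removes the ambiguity.
from datetime import datetime

uk = datetime.strptime("01/12/2003", "%d/%m/%Y")  # UK: 1 December 2003
us = datetime.strptime("01/12/2003", "%m/%d/%Y")  # US: 12 January 2003

print(uk.date().isoformat())  # 2003-12-01
print(us.date().isoformat())  # 2003-01-12
```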
Incorrect format:
The metadata may be stored in a non-valid format. Again, although Web browsers are normally tolerant of HTML errors, formats such as XML insist on compliance with standards.
Errors with metadata management tools:
Metadata creation and management tools could output metadata in invalid formats.
Errors with the workflow process:
Data processed by metadata or other tools could become corrupted as it passes through the workflow. As a simple example, an MS Windows character such as © could be entered into a database and then output as an invalid character in an XML file.
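A sketch of one defence at the output stage: escape the markup characters and declare an explicit encoding so that characters such as © survive. This is illustrative only; a real workflow would need such processing at every hand-over point:

```python
# Escape markup characters and declare UTF-8 so the text round-trips
# through XML intact.
from xml.etree import ElementTree
from xml.sax.saxutils import escape

raw = "© 2003 Smith & Jones"  # as typed into, say, a Windows editor

safe = escape(raw)  # handles & < >; UTF-8 encoding handles © etc.
xml = f'<?xml version="1.0" encoding="UTF-8"?><rights>{safe}</rights>'

# Round-trip check: the document parses and the original text survives.
parsed = ElementTree.fromstring(xml.encode("utf-8"))
print(parsed.text)  # © 2003 Smith & Jones
```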

QA For Metadata Content

You should have procedures to ensure that the metadata content is correct when created and is maintained as appropriate. This could involve establishing cataloguing rules and mechanisms for enforcing them (possibly in software when the metadata is created). You may also need systematic procedures for periodically checking the metadata.
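Such periodic checks can often be automated. The sketch below assumes two invented cataloguing rules (ISO dates and "Surname, Forename" creators) and flags records that break them:

```python
# Sketch: a periodic check that metadata content follows the project's
# cataloguing rules. Both rules here are invented examples.
import re

RULES = {
    "DC.Date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),     # ISO 8601 dates
    "DC.Creator": re.compile(r"^[^,]+, [^,]+$"),       # Surname, Forename
}

def check(record):
    """Return a list of (element, value) pairs that break a rule."""
    return [
        (name, value)
        for name, pattern in RULES.items()
        for value in record.get(name, [])
        if not pattern.match(value)
    ]

good = {"DC.Date": ["2003-12-01"], "DC.Creator": ["Smith, John"]}
bad = {"DC.Date": ["01/12/2003"], "DC.Creator": ["John Smith"]}

print(check(good))  # []
print(check(bad))   # [('DC.Date', '01/12/2003'), ('DC.Creator', 'John Smith')]
```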

QA For Metadata Formats

As metadata which is to be reused by other applications is increasingly being stored in XML, it is essential that the format is compliant (otherwise tools will not be able to process the metadata). XML compliance checking can be implemented fairly easily. It will be more difficult to ensure that the metadata makes use of appropriate XML schemas.
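Well-formedness checking needs only the standard library, as this sketch shows (a full schema-validity check would need an external library such as lxml):

```python
# Sketch: a basic XML well-formedness check. This catches unescaped
# characters and broken markup, though not schema violations.
from xml.etree import ElementTree

def is_well_formed(xml_text):
    try:
        ElementTree.fromstring(xml_text)
        return True
    except ElementTree.ParseError:
        return False

print(is_well_formed("<dc:title xmlns:dc='http://purl.org/dc/elements/1.1/'>OK</dc:title>"))  # True
print(is_well_formed("<title>Unescaped & ampersand</title>"))  # False
```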

QA For Metadata Tools

You should ensure that the output from metadata creation and management tools is compliant with appropriate standards. You should expect that such tools have a rich set of test suites to validate a wide range of environments. You will need to consider such issues if you develop your own metadata management system.

QA For Metadata Workflow

You should ensure that metadata does not become corrupted as it flows through a workflow system.

A Fictitious Nightmare Scenario

A multimedia e-journal project is set up. Dublin Core metadata is used for the articles which are published. Unfortunately there are no documented cataloguing rules and, due to a high staff turnover (staff are on short-term contracts), there are many inconsistencies in the metadata (John Smith & Smith, J.; University of Bath and Bath University; etc.).

The metadata is managed by a home-grown tool. Unfortunately the author metadata is output in HTML as DC.Author rather than DC.Creator. In addition the tool outputs the metadata in XHTML 1.0 format, which is then embedded in HTML 4.0 documents.

The metadata is created by hand and is not checked. This results in a large number of typos and use of characters which are not permitted in XML without further processing (e.g. £, — and &).

Rights metadata for images which describes which images can be published freely and which is restricted to local use becomes separated from the images during the workflow process.


Briefing 44

Metadata Harvesting


Background

As the number of available digital resources increases so does the need for quick and accurate resource discovery. In order to allow users to search more effectively many resource discovery services now operate across the resources of multiple distributed content providers. There are two possible ways to do this: distributed searching across many metadata databases, or searching harvested metadata.

Metadata harvesting is the aggregation of metadata records from multiple providers into a single database. Building applications or services that use these aggregated records provides additional views of those resources, assisting in access across sectors and greater exposure of those resources to the wider community.

Open Archives Initiative Protocol for Metadata Harvesting

When metadata harvesting is carried out within the JISC Information Environment the Open Archives Initiative Protocol for Metadata Harvesting (OAI PMH) [1] version 2.0 is recommended. The Open Archives Initiative [2] had its roots in the e-prints community, which was trying to improve access to scholarly resources. The OAI PMH was developed initially by an international technical committee in 1999. It is a lightweight, low-cost protocol built on HTTP and XML. The protocol defines six requests, known as verbs:

  1. GetRecord
  2. Identify
  3. ListIdentifiers
  4. ListMetadataFormats
  5. ListRecords
  6. ListSets

In order for metadata to be shared effectively two things need to happen:

  1. Content/data providers need to make metadata records available in a commonly understood form.
  2. Service providers need to obtain these metadata records from the content providers and hold them in a repository.

OAI PMH provides a means of doing the above.
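Since OAI-PMH requests are plain HTTP GETs, a harvester needs little more than a base URL and a verb. The sketch below only builds request URLs (the endpoint is a hypothetical placeholder, and no network request is made):

```python
# Sketch: constructing OAI-PMH request URLs for a repository.
from urllib.parse import urlencode

BASE_URL = "http://repository.example.org/oai"  # hypothetical endpoint

def oai_request_url(verb, **args):
    """Build the request URL for one of the six OAI-PMH verbs."""
    return BASE_URL + "?" + urlencode({"verb": verb, **args})

print(oai_request_url("Identify"))
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
# A real harvester would now fetch these URLs (e.g. with
# urllib.request.urlopen) and parse the XML responses.
```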

Record Format

At the lowest level a data provider must support the simple Dublin Core [3] record format ('oai_dc'). This format is defined by the OAI-PMH DC XML schema [4]. Data providers may also provide metadata records in other formats. Within the JISC Information Environment, if the repository is of value to the learning and teaching community, projects should also consider exposing metadata records that conform to the UK Common Metadata Framework [5], in line with the IMS Digital Repositories Specification, using the IEEE LOM XML schemas [6].

OAI-PMH also provides a number of facilities to supply metadata about metadata records: for example, rights and/or provenance information can be provided in the <about> element of the GetRecord response, and collection-level descriptions can be provided in the <description> element of the Identify response.

Example OAI DC metadata record

The following example is taken from the Library of Congress repository.


<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Empire State Building. [View from], to Central Park</dc:title>
<dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
<dc:date>1932 Jan. 19</dc:date>
<dc:type>image</dc:type>
<dc:type>two-dimensional nonprojectible graphic</dc:type>
<dc:type>Cityscape photographs.</dc:type>
<dc:type>Acetate negatives.</dc:type>
<dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
<dc:coverage>United States--New York (State)--New York.</dc:coverage>
<dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>
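A sketch of how a service provider might extract fields from such a record using the Python standard library (the record is abbreviated here; the namespace URIs are those defined by OAI-PMH and Dublin Core):

```python
# Sketch: parsing an oai_dc record with namespace-aware element lookups.
from xml.etree import ElementTree

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

record = f"""<oai_dc:dc xmlns:oai_dc="{OAI_DC}" xmlns:dc="{DC}">
  <dc:title>Empire State Building. [View from], to Central Park</dc:title>
  <dc:date>1932 Jan. 19</dc:date>
  <dc:type>image</dc:type>
</oai_dc:dc>"""

root = ElementTree.fromstring(record)
title = root.findtext(f"{{{DC}}}title")
types = [e.text for e in root.findall(f"{{{DC}}}type")]

print(title)  # Empire State Building. [View from], to Central Park
print(types)  # ['image']
```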

Conformance Testing for Basic Functionality

The OAI gives information on the tests an OAI repository must successfully complete in order to be entered in the registry. More information on these tests is available from the OAI Web site [7]. Projects could use the tests listed there to create a checklist to measure their repository's conformance.

References

  1. The Open Archives Initiative Protocol for Metadata Harvesting,
    <http://www.openarchives.org/OAI/openarchivesprotocol.html>
  2. Open Archives Initiative,
    <http://www.openarchives.org/>
  3. Dublin Core,
    <http://dublincore.org/>
  4. OAI-PMH DC XML Schema,
    <http://www.openarchives.org/OAI/2.0/oai_dc.xsd>
  5. UK Common Metadata Framework,
    <http://metadata.cetis.ac.uk/guides/>
  6. IMS Digital Repositories Specification,
    <http://www.imsglobal.org/digitalrepositories/>
  7. Registering as a Data Provider,
    <http://www.openarchives.org/data/registerasprovider.html>


Briefing 63

Choosing a Metadata Standard For Resource Discovery


Background

Resource discovery metadata is an essential part of any digital resource. If resources are to be retrieved and understood in the distributed environment of the World Wide Web, they must be described in a consistent, structured manner suitable for processing by computer software. There are now many formal standards. They range from simple to rich formats, from the loosely structured to the highly structured, and from proprietary or emerging standards to international standards.

There is no set decision-making procedure to follow but here are some factors that should normally be considered:

Purpose of metadata: A well-articulated definition of purposes at the outset can act as a benchmark against which to compare standards.

Attributes of resource: It is important that you also identify your resource type (e.g. text, image), its domain of origin (e.g. library, archive or museum), subject (e.g. visual arts, history) and the specific features that are essential to an understanding of it. Datasets, digital texts, images and multimedia objects, for instance, clearly have very different attributes. Does your resource have pagination or is it three-dimensional? Was it born digital or does it have a hard-copy source? Which attributes will the user need to know to understand the resource?

Design of standard: Metadata standards have generally been developed in response to the needs of specific resource types, domains or subjects. Therefore, once you know the type, domain and broad subject of your resource, you should be able to draw up a shortlist of likely standards.

The key attributes of your resource can be matched against each standard in turn to find the best fit. Is there a dedicated element for each attribute? Are the categories of information relevant and at a suitable level of detail?

Granularity: At this point it is worth considering whether your metadata should (as is usual) be created at the level of the text, image or other such item or at collection level. Collection-level description may be provided where item-level metadata is not feasible or as an additional layer providing an overview of the resource. This could be valuable for large-scale digitisation projects or portals where item-level searching may retrieve an unmanageable number of 'hits'. Digital reproductions may be grouped like their real world sources e.g. by subject or provenance - or be assigned to multiple 'virtual collections'. The RSLP Collection Level Description is emerging as the leading format in this area.

Interoperability: It is important, wherever possible, to choose one of the leading standards from within your subject community or domain. This should help to make your resource accessible beyond the confines of your own project. Metadata that is in a recognisable common format may be harvested by subject or domain-wide portals and cross-searched with resources from many other institutions. In-house standards may be tailored to your precise needs but are unlikely to be compatible with other standards and should be used only where nothing suitable already exists. If your overriding need is for interoperability across all domains or subjects, Dublin Core may be the most suitable standard, but it may lack the richness required for other purposes. Care should be taken to ensure that in-house standards at least map to Dublin Core or one of the DC application profiles.

Support: Using a standard that is well supported by a leading institution can also bring cost benefits. Implementation guidance, user guidance, examples, XML/RDF schemas, crosswalks, multi-lingual capacity, and software tools may pre-exist, thus easing the process of development, customisation and update.

Growth: Consider too whether the standard is capable of further development. Are there regular working groups and workshops devoted to the task?

Extensibility: Also, does the standard permit the inclusion of data elements drawn from other schemas and the description of new object types? It may be necessary to 'mix and match' elements from more than one standard.

Reputation: Funding bodies will be familiar with established, international standards - something, perhaps, to remember when applying for digitisation grants.

Ease of use: Be aware that the required level of expertise can vary greatly between standards. AACR2 and MARC 21, for instance, may produce rich bibliographic description but require the learning of rules. The simpler Dublin Core may allow creators to produce their own metadata records with no extensive training.

Existing experience: Have staff at your organisation used the metadata standard before? If so, the implementation time may be reduced.

Summary

There is no single standard that is best for all circumstances. Each is designed to meet a need and has its own strengths and weaknesses. Start by considering the circumstances of the individual digital project and identify the need(s) or purpose(s) that the metadata will need to satisfy. Once that is done, one can evaluate rival metadata schemas and find the best match. A trade-off will normally have to be made between the priorities listed above.



Briefing 64

Metadata And Subject Searching


Introduction

Digital collections are only likely to make an impact on the Web if they are presented in such a way that users can retrieve their component parts quickly and easily. This is true even if they have been well selected, digitised to a suitable standard and have appropriate metadata formats. Subject-based access to the collection through searching and/or browsing a tree-like structure can greatly enhance the value of your resource.

Subject Access - Some Options

Subject-based access can be provided in several ways:

Keywords: A simple but crude method is to anticipate the terms that an unguided searcher might intuitively choose and insert them into a keyword field within relevant records. For instance, the text of Ten days that shook the world [1], a classic narrative of the events of 1917, is more likely to be retrieved if the keywords Russian Revolution are added by the cataloguer (based on his/her analysis of the resource and subject knowledge) and if the keyword field is included in the search. In the absence of an agreed vocabulary, however, variant spellings (labor versus labour), and synonyms or near synonyms (Marxist versus Communist) that distort retrieval are likely to proliferate.

Thesauri and subject schemes: Controlled vocabularies, known as thesauri, can prevent inconsistent description and their use is recommended. They define preferred terms and their spelling. If the thesaurus structure is shown on the search interface, users may be guided through broader-term, narrower-term and associated-term relationships to choose the most appropriate keyword with which to search. Take care to choose a vocabulary appropriate to the scope of your resource. A broad and general collection might require a correspondingly universal vocabulary, such as the Library of Congress Subject Headings (LCSH) [2]. A subject-specific vocabulary, such as the Getty Art and Architecture Thesaurus (AAT) [3], may provide a more limited but detailed range of terms appropriate for a tightly focused collection.

Classification schemes: Keywords and thesauri are primarily aids to searching but browsing can often be a more rewarding approach - particularly for users new to a given subject area. Thesauri are not always structured ideally for browsing, as when related or narrower terms are listed alphabetically rather than by topical proximity. Truly effective browsing requires the use of a subject classification scheme. A classification scheme arranges resources into a hierarchy on the basis of their subject but differs from a thesaurus in using a sophisticated alphanumeric notation to ensure that related subjects will be displayed in close, browsable proximity. A well-designed classification scheme should present a navigable continuum of topics from one broad subject area to another and in this way guide the user to related items that might otherwise be missed, as in this example from the Dewey Decimal Classification (DDC) [4].

700 Arts, fine and decorative
740 Drawing and decorative arts
745 Decorative arts
745.6 Calligraphy, heraldic design, illumination
745.66 Heraldic design

The notation does not necessarily have to be displayed on screen, however. The subject terms, rather than their respective numbers, may mean more to the user. Another tip is to assign multiple classification numbers to any item that crosses subjects. Digital items can have several 'virtual' locations, unlike a book, which is tied to a single position on a shelf.
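The browsable ordering falls out of the notation itself: a plain lexical sort of the notations above keeps narrower topics directly under their broader ones, as this small sketch shows:

```python
# DDC notation sorts lexically into shelf order, so related subjects
# end up adjacent. The numbers are from the example above.
notations = ["745.66", "700", "745", "740", "745.6"]
print(sorted(notations))
# ['700', '740', '745', '745.6', '745.66']: heraldic design (745.66)
# sits directly under its broader topics.
```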

Keywords, thesauri and classification can be used in combination or individually.

Choosing a Classification Scheme

The most important consideration when choosing a classification scheme is to select the one that best fits the subject, scope and intended audience of your resource.

Universal classification schemes: These are particularly appropriate where collections and their audiences span continents, subjects and languages. Dewey Decimal Classification (DDC) [5], for instance, is the most widely recognised scheme worldwide, whilst UDC (Universal Decimal Classification) [6] is predominant in Europe and Asia. Well-established schemes of this sort are most likely to have user-friendly online implementation tools.

National or subject-specific schemes: More specific collections are usually best served by schemes tailored to a single country (e.g. the Nederlandse Basisclassificatie, BC [7]), language, or subject (e.g. the National Library of Medicine classification, NLM [8]). If nothing suitable exists, a scheme can be created in-house.

Homegrown schemes: Project-specific schemes can be flexible, easy to change and suited wholly to one's own needs so that there are no empty categories or illogical subject groupings to hinder browsing. However, the development process is costly, time-consuming and requires expert subject-knowledge. Users are sure to be unfamiliar with your categories and, perhaps worst of all, such schemes are unlikely to be interoperable with the broader information world and will hinder wider cross searching. They should be regarded very much as a last resort.

Adapting an existing scheme: A better approach is normally to adapt an existing scheme by rearranging empty classes, raising or lowering branches of the hierarchy, renaming captions, or extending the scheme. Be aware, though, that recurring notation may be found within a scheme at its various hierarchical levels or the scheme might officially be modified over time, both of which can lead to conflict between the official and customised versions. Take care to document your changes to ensure consistency through the lifetime of the project. Some well-known Internet search-services (e.g. Yahoo!) [9] have developed their own classifications but there is no guarantee that they will remain stable or even survive into the medium term.

Double classification: It may be worthwhile classifying your resource using a universal scheme for cross-searching and interoperability in the wider information environment and at the same time using a more focused scheme for use within the context of your own Web site. Cost is likely to be an issue that underpins all of these decisions. For instance, the scheme you wish to use may be freely available for use on the Internet or alternatively you may need to pay for a licence.

References

  1. Ten Days That Shook the World, Reed, J. New York: Boni & Liveright, 1922; Bartleby.com, 2000,
    <http://www.bartleby.com/79/>
  2. Library of Congress Authorities,
    <http://authorities.loc.gov/>
  3. Art & Architecture Thesaurus Online,
    <http://www.getty.edu/research/conducting_research/vocabularies/aat/>
  4. Dewey Decimal Classification and Relative Index, Dewey, M. in Joan S. Mitchell et al (ed.), 4 vols, (Dublin, Ohio: OCLC, 2003), Vol. 3, p. 610
  5. Dewey Decimal Classification,
    <http://www.oclc.org/dewey/>
  6. Universal Decimal Classification,
    <http://www.udcc.org/>
  7. Nederlandse Basisclassificatie (Dutch Basic Classification),
    <http://www.kb.nl/dutchess/>
  8. National Library of Medicine Classification,
    <http://wwwcf.nlm.nih.gov/class/>
  9. Yahoo!,
    <http://www.yahoo.com/>


Briefing 36

IMS Question And Test Interoperability


Introduction

This document describes an international specification for computer based questions and tests, suitable for those wishing to use computer based assessments in courses.

What Is IMS Question And Test Interoperability?

Computers are increasingly being used to help assess learning, knowledge and understanding. IMS Question and Test Interoperability (QTI) [1] is an international specification for a standard way of sharing such test and assessment data. It is one of a number of such specifications being produced by the IMS Global Learning Consortium to support the sharing of computer based educational material such as assessments, learning objects and learner information.

This new specification is now being implemented within a number of assessment systems and Virtual Learning Environments. Some systems store the data in their own formats but support the export and import of question data in IMS QTI format. Other systems operate directly on IMS QTI format data. Having alternative systems conforming to this standard format means that questions can be shared between institutions that do not use the same testing systems. It also means that banks of questions can be created that will be usable by many departments.

Technical Details

The QTI specification uses XML (Extensible Markup Language) to record the information about assessments. XML is a powerful and flexible markup language that uses 'tags' rather like HTML. The IMS QTI specification was designed to be pedagogy and subject neutral. It supports five different types of user response (item selection, text input, numeric input, xy-position selection and group selection) that can be combined with several different input techniques (radio button, check box, text entry box, mouse xy position dragging or clicking, slider bar and others). It is able to display formatted text, pictures, sound files, video clips and even interactive applications or applets. How any particular question appears on the screen and what the user has to do to answer it may vary between different systems, but the question itself, the knowledge or understanding required to answer it, the marks awarded and the feedback provided should all remain the same.

The specification is relatively new. Version 1.2 was made public in 2002, and a minor upgrade to Version 1.2.1, which corrected some errors and ambiguities, was made early in 2003. The specification is complex, comprising nine separate documents. Various commercial assessment systems (e.g. Questionmark [2], MedWeb, Canvas Learning [3]) have implemented some aspect of IMS QTI compatibility for their assessments. A number of academic systems are also being developed to comply with the specification. These include the TOIA project [4], which will have editing and course management facilities; the SToMP system [5], which was used with students for the first time in 2002; and a Scottish Enterprise system called Oghma, which is currently being developed.

Discipline Specific Features

A disadvantage of such a standard system is that particular features required by some disciplines are likely to be missing. For example, engineering and the sciences need to be able to deal with algebraic expressions, the handling of both accuracy and precision of numbers, the use of alternative number bases, the provision of randomised values, and graphical input. Language tests need better textual support such as the presetting of text entry boxes with specific text and more sophisticated text based conditions. Some of these features are being addressed by groups such as the CETIS assessment SIG [6].

What This Means To You

If you are starting or planning to start using computer based tests, then you need to be aware of the advantages of using a standard-compliant system. It is clearly a good idea to choose a system that will allow you to move your assessments to another system at a later time with the minimum of effort or to be able to import assessments authored elsewhere.

A consideration to bear in mind, however, is that at this early stage in the life of the specification there will be a range of legacy differences between various implementations. It will also remain possible with some 'compliant' systems to create non-standard question formats if implementation-specific extensions are used. The degree of conformance of any one system is currently difficult to assess. Tools to assist with this are beginning to be discussed, but it will be some time before objective measures of conformance are available. In view of this it is a good idea to keep in touch with those interested in the development of the specification; the best way within UK HE is probably via the CETIS Assessment Special Interest Group Web site [6].

It is important that the specification should have subject specific input from academics. The needs of different disciplines are not always well known and the lack of specific features can make adoption difficult. Look at the examples on the CETIS Web site and give feedback on areas where your needs are not being met.

References And Further Information

  1. QTI Specification,
    <http://www.imsglobal.org/>
  2. Questionmark,
    <http://www.questionmark.com/>
  3. Canvas Learning Author and Player,
    <http://www.canvaslearning.com/>
  4. TOIA,
    <http://www.toia.ac.uk>
  5. SToMP,
    <http://www.stomp.ac.uk/>
  6. CETIS Assessment Special Interest Group,
    <http://www.cetis.ac.uk/assessment>
  7. CETIS,
    <http://www.cetis.ac.uk/>


Acknowledgments

This document was originally written by Niall Sclater and Rowin Cross of CETIS and adapted by Dick Bacon, Department of Physics, University of Surrey, consultant to the LTSN Physical Sciences Centre.

The original briefing paper (PDF format) is available on the CETIS Web site. The version available on this Web site was originally published in the LTSN Physical Science News (Centre News issue 10).


Briefing 44

Metadata Harvesting


Background

As the number of available digital resources increases, so does the need for quick and accurate resource discovery. In order to allow users to search more effectively, many resource discovery services now operate across the resources of multiple distributed content providers. There are two possible ways to do this: either by distributed searching across many metadata databases, or by searching harvested metadata.

Metadata harvesting is the aggregation of metadata records from multiple providers into a single database. Building applications or services that use these aggregated records provides additional views of those resources, assisting access across sectors and giving those resources greater exposure to the wider community.

Open Archives Initiative Protocol for Metadata Harvesting

When metadata harvesting is carried out within the JISC Information Environment the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [1] version 2.0 is recommended. The Open Archives Initiative [2] has its roots in the e-prints community, which was trying to improve access to scholarly resources. The OAI-PMH was initially developed by an international technical committee in 1999. It is a lightweight, low-cost protocol built on HTTP and XML. The protocol defines six requests, known as verbs:

  1. GetRecord
  2. Identify
  3. ListIdentifiers
  4. ListMetadataFormats
  5. ListRecords
  6. ListSets

In order for metadata to be shared effectively two things need to happen:

  1. Content/data providers need to make metadata records available in a commonly understood form.
  2. Service providers need to obtain these metadata records from the content providers and hold them in a repository.

OAI PMH provides a means of doing the above.
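Because the protocol is simply HTTP plus XML, the client side of a harvester can be sketched in a few lines. The example below builds a ListRecords request URL and extracts record identifiers from a response; the base URL and the abbreviated response are illustrative only.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def build_request(base_url, verb, **args):
    """Build an OAI-PMH request URL: every request is an HTTP GET
    carrying a 'verb' parameter plus verb-specific arguments."""
    return base_url + "?" + urlencode({"verb": verb, **args})

def record_identifiers(response_xml):
    """Pull the record identifiers out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [header.findtext(OAI_NS + "identifier")
            for header in root.iter(OAI_NS + "header")]

# A ListRecords request asking for simple Dublin Core records.
url = build_request("http://example.org/oai", "ListRecords",
                    metadataPrefix="oai_dc")

# An abbreviated, invented response from a data provider.
sample_response = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:record-1</identifier>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(record_identifiers(sample_response))  # → ['oai:example.org:record-1']
```

A real harvester would additionally fetch each URL over HTTP and follow the protocol's resumption tokens when a response is split across several requests.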

Record Format

At the lowest level a data provider must support the simple Dublin Core [3] record format ('oai_dc'). This format is defined by the OAI-PMH DC XML schema [4]. Data providers may also provide metadata records in other formats. Within the JISC Information Environment, if the repository is of value to the learning and teaching community, projects should also consider exposing metadata records that conform to the UK Common Metadata Framework [5], in line with the IMS Digital Repositories Specification, using the IEEE LOM XML schemas [6].

OAI-PMH also provides a number of facilities to supply metadata about metadata records. For example, rights and/or provenance information can be provided in the <about> element of the GetRecord response, and collection-level descriptions can be provided in the <description> element of the Identify response.

Example OAI DC metadata record

The following example is taken from the Library of Congress repository.


<oai_dc:dc>
<dc:title>Empire State Building. [View from], to Central Park</dc:title>
<dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
<dc:date>1932 Jan. 19</dc:date>
<dc:type>image</dc:type>
<dc:type>two-dimensional nonprojectible graphic</dc:type>
<dc:type>Cityscape photographs.</dc:type>
<dc:type>Acetate negatives.</dc:type>
<dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
<dc:coverage>United States--New York (State)--New York.</dc:coverage>
<dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>
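Because the record is plain XML, it can be processed with any standard XML library. The sketch below extracts two fields from the record above using Python's standard library; note that the namespace declarations, omitted from the printed example, are restored in the code.

```python
import xml.etree.ElementTree as ET

# The record above, with the oai_dc and dc namespace declarations
# (which the printed example omits) restored.
record = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Empire State Building. [View from], to Central Park</dc:title>
  <dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
  <dc:date>1932 Jan. 19</dc:date>
  <dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
  <dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>"""

DC = "{http://purl.org/dc/elements/1.1/}"

root = ET.fromstring(record)
print(root.findtext(DC + "title"))       # the dc:title value
print(root.findtext(DC + "identifier"))  # the dc:identifier value
```

It is this predictable, namespace-qualified structure that lets a service provider aggregate records from many repositories into a single searchable database.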

Conformance Testing for Basic Functionality

The OAI gives information on the tests an OAI repository must successfully complete in order to be entered in the registry. More information on these tests is available from the OAI Web site [7]. Projects could use the tests listed to create a checklist to measure their repository's conformance.

References

  1. The Open Archives Initiative Protocol for Metadata Harvesting,
    <http://www.openarchives.org/OAI/openarchivesprotocol.html>
  2. Open Archives Initiative,
    <http://www.openarchives.org/>
  3. Dublin Core,
    <http://dublincore.org/>
  4. OAI-PMH DC XML Schema,
    <http://www.openarchives.org/OAI/2.0/oai_dc.xsd>
  5. UK Common Metadata Framework,
    <http://metadata.cetis.ac.uk/guides/>
  6. IMS Digital Repositories Specification,
    <http://www.imsglobal.org/digitalrepositories/>
  7. Registering as a Data Provider,
    <http://www.openarchives.org/data/registerasprovider.html>

Further Information


Briefing 69

QA In The Construction Of A TEI Header


Background

Since the TEI header is still a relatively recent development, there has been a lack of clear guidelines as to its implementation, with the result that metadata has tended to be poor and sometimes erroneous. The implementation of a standard approach to metadata will improve the quality of data and increase the likelihood of locating relevant information.

Structure of a TEI header

The TEI header has a well-defined structure that may provide information analogous to that of a title page for printed text. The <teiHeader> element contains four major components:

  1. FileDesc: The mandatory <fileDesc> element contains a full bibliographic description of an electronic file.
  2. EncodingDesc: The <encodingDesc> element details the relationship between the electronic text and the source (or sources) from which it was derived. Its use is highly recommended.
  3. ProfileDesc: The <profileDesc> element provides a detailed description of any non-bibliographic aspects of a text: specifically, the languages and sublanguages used, the situation in which it was produced, and the participants and their setting.
  4. RevisionDesc: The <revisionDesc> element provides a change log in which each change made to a text may be recorded. The log may be recorded as a sequence of <change> elements, each of which describes a single change.

A corpus or collection of texts that share many characteristics may have one header for the corpus and individual headers for each component of the corpus. In this case the type attribute indicates the type of header. For example, <teiHeader type="corpus"> indicates the header for corpus-level information.
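A minimal header built from these four components might look like the sketch below. The element names follow the TEI Guidelines, but all of the content is invented for illustration, and details such as the change-log markup vary between TEI versions.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>An Example Text: an electronic transcription</title>
    </titleStmt>
    <publicationStmt>
      <p>Distributed for illustration only.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a printed edition.</p>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <p>Page breaks in the source have not been recorded.</p>
  </encodingDesc>
  <profileDesc>
    <langUsage>
      <language id="en">English</language>
    </langUsage>
  </profileDesc>
  <revisionDesc>
    <change>
      <date>2004-01-05</date>
      <respStmt><name>A. Cataloguer</name></respStmt>
      <item>Header created.</item>
    </change>
  </revisionDesc>
</teiHeader>
```

Only <fileDesc> is mandatory; the other three components should nevertheless be supplied wherever the information is available.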

Some of the header elements contain running prose, consisting of one or more <p> elements. Others contain more specialised sub-elements.

What Standards Should I Conform To?

The cataloguer should observe the Anglo-American Cataloguing Rules, 2nd edition (revised), AACR2, and the International Standard Bibliographic Description for Electronic Resources, ISBD(ER), when creating new headers. AACR2 is used in the Source Description of the header, which is primarily concerned with printed material, whereas ISBD(ER) is used more heavily in the rest of the File Description, in which the electronic file is being described.

Further Information


Briefing 73

Using The QA For Metadata Toolkit


About The QA Focus Toolkits

The QA For Metadata Toolkit is an online resource which can be used as a checklist to ensure that your project or service has addressed key areas, so that the metadata you use in your service is fit for its intended purpose and your application will be interoperable.

The QA For Metadata Toolkit is one of several toolkits which have been developed by the QA Focus project to support JISC's digital library programmes.

Accessing The QA For Metadata Toolkit

The QA For Metadata Toolkit is available from <http://www.ukoln.ac.uk/qa-focus/toolkit/>. The toolkit is illustrated in Figure 1:

Figure 1: The QA For Metadata Toolkit

Coverage

The toolkit addresses a number of key areas in the development and deployment of metadata.

Embedding The Toolkit In Your Project Activities

The toolkit can provide access to a set of online checking services.

The toolkit can provide a simple checklist for ensuring that your project has addressed key areas in the development and deployment of metadata. As well as providing an aide-memoire for projects, the toolkit may also be useful in a more formal context. For example, the answers could be used in initial scoping work at the early stages of a project, or in reports to the project funders. In addition, answers to the issues raised may be helpful for other potential users of the metadata, or for the final service provider of the project deliverables.

About The QA For Metadata Toolkit Resource

The QA For Metadata Toolkit described in this document provides a single interface to several online checking services hosted elsewhere. The QA Focus project and its host organisations (UKOLN and AHDS) have no control over the remote online checking services. We cannot guarantee that the remote services will continue to be available.

Further Information

Further toolkits are available at <http://www.ukoln.ac.uk/qa-focus/toolkit/>.

Briefing 81

An Introduction To Folksonomies


What is a Folksonomy?

A folksonomy is a decentralised, social approach to creating metadata for digital resources. It is usually created by a group of individuals, typically the resource users, who add natural language tags to online items, such as images, videos, bookmarks and text. These tags are then shared and sometimes refined. Folksonomies can be divided into broad folksonomies, when lots of users tag one object, and narrow folksonomies, when a small number of users tag individual items. This new social approach to creating online metadata has sparked much discussion in the cataloguing world.
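The mechanics of a broad folksonomy can be illustrated in a few lines of code: many users tag the same item, and aggregating their tags reveals the community's preferred labels. The users and tags below are invented for the example.

```python
from collections import Counter

# Invented tags applied to one bookmarked page by several users
# (a broad folksonomy: many users tagging a single object).
user_tags = {
    "alice": ["metadata", "cataloguing", "libraries"],
    "bob":   ["metadata", "tagging"],
    "carol": ["Metadata", "folksonomy"],
}

# Aggregate the tags. Lower-casing is a minimal normalisation step;
# it does nothing about synonyms or compound terms.
counts = Counter(tag.lower()
                 for tags in user_tags.values()
                 for tag in tags)

print(counts.most_common(1))  # → [('metadata', 3)]
```

The aggregated counts are what tagging services use to build tag clouds and suggest popular terms; the lack of any deeper vocabulary control is discussed under weaknesses below.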

Note that despite its name a folksonomy is not a taxonomy. A taxonomy is the process, within subject-based classification, of arranging the terms given in a controlled vocabulary into a hierarchy. Folksonomies move away from the hierarchical approach to an approach more akin to that taken by faceted classification or other flat systems.

The History of Folksonomies

With the rise of the Internet and the increased use of digital networks it has become easier both to work in an informal, ad hoc manner and to work as part of a community. In the late 1990s Weblogs (or blogs), a Web application similar to an online diary, became popular and user-centred metadata was first created. In late 2003 delicious, an online bookmark manager, went live. The ability to add tags using a non-hierarchical keyword categorisation system was added in early 2004. Tagging was quickly replicated by other social software, and in late 2004 the term folksonomy, a portmanteau of folk and taxonomy, was coined by Thomas Vander Wal.

Strengths and Weaknesses of Folksonomies

Robin Good is quoted as saying that "a folksonomy represents simultaneously some of the best and worst in the organization of information." There is clearly a lot to be learnt from this new method of classification, as long as you remain aware of its strengths and weaknesses.

Strengths

Serendipity
Folksonomies are currently more about browsing than finding, and a great deal of useful information can be discovered this way.
Cheap and extendable
Folksonomies are created by users. This makes them relatively cheap and highly scalable, unlike more formal methods of adding metadata. Often users find that it is not a case of 'folksonomy or professional classification' but 'folksonomy or nothing'.
Community
The key to the success of folksonomies is community and feedback. The metadata creation process is quick and responsive to user needs; new terms can come into wide use within days. If studied, folksonomies can allow more formal classification systems to emerge and can demonstrate clear desire lines (the paths users want to follow).

Weaknesses

Imprecision of terms
Folksonomy terms are added by users, which means that they can be ambiguous, overly personalised and imprecise. Some sites only allow single-word tags, resulting in many compound terms; many tags are used only once; and at present there is little or no synonym control.
Searching
The uncontrolled set of terms created can mean that folksonomies may not support searching as well as services using controlled vocabularies.

The Future for Folksonomies

Over time users of the Internet have come to realise that old methods of categorisation do not sit comfortably in a digital space, where physical constraints no longer apply and there is a huge amount to be organised. Search services like Yahoo's directory, where items are divided into a hierarchy, often seem unwieldy and users appear happier with the Google search box approach. With the rise of communities on the Web there has also come about a feeling that meaning comes best from our common view of the world, rather than a professional's view.

While there is no doubt that professional cataloguing will continue to have a place, both on the Internet and off, there has been a recent acceptance that new ways of adding metadata, such as folksonomies, need more exploration, alongside other areas such as the Semantic Web. The two models of categorisation (formal and informal) are not mutually exclusive, and further investigation can only help us improve the way we organise and search for information. If nothing else, folksonomies have achieved the once seemingly unachievable task of getting people to talk about metadata!

Further Information

The following additional resources may be useful:

Bookmark Sites

Images, Video and Sound

Other