Terminology Services and Technology
JISC state of the art review

September 2006

UKOLN
Authors:

Douglas Tudhope, University of Glamorgan
Traugott Koch, UKOLN, University of Bath
Rachel Heery, UKOLN, University of Bath

The full document is available in the original Word format. Other formats: PDF version at the JISC site and HTML (unedited conversion from Word).


EXECUTIVE SUMMARY

Purpose

Over the next two years, as part of its Capital Funding Programme, the Joint Information Systems Committee (JISC) is supporting further work to realize a rich information environment within the learning and research communities. This review is intended to inform JISC’s planning for future work on Terminology Services and Technology, and to provide background information for participants in future calls, whether those calls feature terminology specifically or use terminology to underpin other services.

Overview of report contents

This report reviews vocabularies of different types, best practice guidelines, research on terminology services and related projects. It discusses possibilities for terminology services within the JISC Information Environment and eFramework.

Terminology Services (TS) are a set of services that present and apply vocabularies, both controlled and uncontrolled, including their member terms, concepts and relationships. This is done for purposes of searching, browsing, discovery, translation, mapping, semantic reasoning, subject indexing and classification, harvesting, alerting etc. Indicative use cases are discussed.

One type of TS attempts to increase consistency and improve access to digital collections and Web navigation systems via vocabulary control. Vocabulary control aims to reduce the ambiguity of natural language when describing and retrieving items for purposes of information searching. Another type of TS is not concerned with consistency but with making it easier for end-users to describe information items and to have access to other users’ descriptions. This results in vocabularies (folksonomies) that may not be controlled, at least initially. The report reviews different kinds of vocabularies, according to their structure and their intended purpose. Potential benefits and return on investment are discussed. Named entity authority and social tagging services are discussed in some detail. Pointers are given on best practice guidelines and networked access to vocabularies, including key issues for future terminology registries.
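
As a minimal illustration of vocabulary control, the Python sketch below maps variant entry terms (synonyms and alternative forms) to a single preferred term, so that items described as "cars" or "autos" are indexed consistently. The vocabulary, the function names and the choice to pass unknown terms through unchanged are assumptions made for this example, not features of any particular service.

```python
# Hypothetical controlled vocabulary: each preferred term lists the
# non-preferred entry terms ("use for" variants) that should map to it.
CONTROLLED_VOCABULARY = {
    "automobiles": ["cars", "motor cars", "autos"],
    "physicians": ["doctors", "medical practitioners"],
}


def build_entry_index(vocabulary: dict[str, list[str]]) -> dict[str, str]:
    """Map every lower-cased label, preferred or variant, to its preferred term."""
    index: dict[str, str] = {}
    for preferred, variants in vocabulary.items():
        index[preferred.lower()] = preferred
        for variant in variants:
            index[variant.lower()] = preferred
    return index


def normalise_terms(terms: list[str], index: dict[str, str]) -> list[str]:
    """Replace each term with its preferred form; unknown terms pass through unchanged."""
    return [index.get(term.strip().lower(), term) for term in terms]


index = build_entry_index(CONTROLLED_VOCABULARY)
print(normalise_terms(["Cars", "doctors", "bicycles"], index))
# ['automobiles', 'physicians', 'bicycles']
```

Terms that are not in the vocabulary (here "bicycles") pass through untouched, which is roughly how uncontrolled folksonomy tags behave until someone decides to bring them under control.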

The wider context of TS is considered. Relevant literature on user studies is reviewed. TS are located within an information lifecycle and within the JISC IE. Suggestions are made towards a more specific definition of Terminology Web Services within the JISC IE. Current work on Terminology Web Services is reviewed, along with work on mapping, automatic classification/indexing and repositories. Current projects that involve TS activity (JISC, UK, and international) are briefly reviewed.

Relevant standards are discussed, particularly for vocabulary representation; identification of concepts, terms and vocabularies; protocols and APIs.

Key points

TS can be machine-to-machine (m2m) services or interactive, user-facing services, and can be applied at all stages of the search process. Services include resolving search terms to a controlled vocabulary, disambiguation, browsing access, mapping between vocabularies, query expansion, query reformulation, and combined search and browsing. These can be presented directly in the end-user interface or can underpin other services behind the scenes, according to context. The appropriate balance between interactive and automatic service components requires careful attention.
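
Query expansion, one of the services listed above, can be sketched as follows: a search term is first resolved to a thesaurus concept and then expanded with that concept's synonyms and, optionally, its narrower concepts. The mini-thesaurus, the concept identifiers and the `depth` parameter are hypothetical; a working service would draw on a maintained vocabulary and might let the user confirm the expansion rather than applying it silently.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A thesaurus concept: preferred label, synonyms and narrower concept ids."""
    preferred: str
    synonyms: list[str] = field(default_factory=list)
    narrower: list[str] = field(default_factory=list)


# Hypothetical mini-thesaurus keyed by invented concept identifiers.
THESAURUS = {
    "c1": Concept("vehicles", ["conveyances"], ["c2", "c3"]),
    "c2": Concept("automobiles", ["cars"]),
    "c3": Concept("bicycles", ["bikes"]),
}


def resolve(term: str) -> list[str]:
    """Return ids of concepts whose preferred label or a synonym equals the term."""
    t = term.lower()
    return [cid for cid, c in THESAURUS.items()
            if t == c.preferred.lower() or t in (s.lower() for s in c.synonyms)]


def expand(term: str, depth: int = 1) -> set[str]:
    """Expand a term with synonyms, plus narrower concepts down to `depth` levels."""
    expanded = {term.lower()}
    frontier = resolve(term)
    for _ in range(depth + 1):
        next_frontier: list[str] = []
        for cid in frontier:
            concept = THESAURUS[cid]
            expanded.add(concept.preferred.lower())
            expanded.update(s.lower() for s in concept.synonyms)
            next_frontier.extend(concept.narrower)
        frontier = next_frontier
    return expanded


print(" OR ".join(sorted(expand("conveyances", depth=1))))
# automobiles OR bicycles OR bikes OR cars OR conveyances OR vehicles
```

With `depth=0` only the matched concept's labels are added, while `depth=1` also pulls in its narrower concepts. In an interactive interface the expanded set could instead be offered to the user as suggestions, which is one way of handling the balance between interactive and automatic components.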

Return on investment should be considered in any service provision. There are various types of vocabularies serving different purposes, with different degrees of vocabulary control, richness of semantic relationships, formality and editorial control. There is a range of TS options, both interactive and automatic. There is potential for piloting TS to augment existing JISC programmes and projects.

TS are sometimes contrasted with free text searching, assisted by statistical Information Retrieval techniques in automatic indexing and ranking. These are not, however, exclusive options and there are opportunities in exploring different combinations of the two approaches. It should be noted that Web search engines have introduced elements of TS, by offering synonym and lexical expansion options. Thus TS should not be seen as antithetical to free text searching and can augment it.
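
The way TS can augment free-text searching can be illustrated with a small ranking sketch: documents containing the user's own term score highest, while documents found only through synonyms supplied by a TS are still retrieved but weighted lower. The synonym list, the documents and the weights of 1.0 and 0.5 are arbitrary assumptions; real search engines use far more sophisticated statistical ranking.

```python
import re

# Hypothetical synonym list; a real TS would draw these from a thesaurus.
SYNONYMS = {"film": ["movie", "motion picture"]}
ORIGINAL_WEIGHT = 1.0   # assumed weight for the user's own term
EXPANSION_WEIGHT = 0.5  # assumed lower weight for TS-supplied synonyms


def count_phrase(text: str, phrase: str) -> int:
    """Count whole-word occurrences of a phrase in already lower-cased text."""
    return len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))


def score(document: str, query_term: str) -> float:
    """Free-text match score, with synonym matches down-weighted."""
    text = document.lower()
    term = query_term.lower()
    total = ORIGINAL_WEIGHT * count_phrase(text, term)
    for synonym in SYNONYMS.get(term, []):
        total += EXPANSION_WEIGHT * count_phrase(text, synonym)
    return total


def rank(documents: dict[str, str], query_term: str) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs with a positive score, best first."""
    scored = [(doc_id, score(text, query_term)) for doc_id, text in documents.items()]
    return sorted((p for p in scored if p[1] > 0), key=lambda p: p[1], reverse=True)


DOCUMENTS = {
    "d1": "A film about a film festival.",
    "d2": "An old movie.",
    "d3": "A motion picture and a movie.",
}
print(rank(DOCUMENTS, "film"))
# [('d1', 2.0), ('d3', 1.0), ('d2', 0.5)]
```

A plain free-text search for "film" would retrieve only `d1`; the synonym expansion adds `d3` and `d2` below it without displacing the direct match.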

There are many existing vocabularies, with different arrangements for their ownership, maintenance and licensing. Who will maintain a vocabulary, and on what basis it can be described or made available in a registry, needs investigation, since this underpins systematic use of vocabularies in the JISC IE. This involves establishing business models for access to and maintenance of vocabularies.

Mapping is a key requirement for semantic interoperability in heterogeneous environments. Although schemas, frameworks and tools can help, detailed mapping work at the concept level is necessary, requiring a combination of intellectual work and automated assistance. The impact on retrieval is a key consideration.
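
Concept-level mapping can be represented as typed links between concepts in two vocabularies. The sketch below uses relation names loosely modelled on the SKOS mapping properties (exactMatch, closeMatch, broadMatch, narrowMatch) and shows one aspect of the impact on retrieval: whether a query is translated through equivalent concepts only, or also through broader ones, changes what is retrieved. The vocabularies, identifiers and mappings are invented for illustration.

```python
from enum import Enum


class MatchType(Enum):
    """Mapping relations, loosely modelled on the SKOS mapping properties."""
    EXACT = "exactMatch"
    CLOSE = "closeMatch"
    BROAD = "broadMatch"    # target concept is broader than the source concept
    NARROW = "narrowMatch"  # target concept is narrower than the source concept


# Hypothetical concept-level mappings from vocabulary A to vocabulary B;
# identifiers such as "A:ships" are invented for illustration.
MAPPINGS: dict[str, list[tuple[MatchType, str]]] = {
    "A:ships": [(MatchType.EXACT, "B:vessels")],
    "A:sailing-dinghies": [(MatchType.BROAD, "B:small-craft")],
    "A:watercraft": [(MatchType.NARROW, "B:vessels"),
                     (MatchType.NARROW, "B:small-craft")],
}


def translate(concept_id: str, allowed: set[MatchType]) -> list[str]:
    """Return target concepts reachable from concept_id via the allowed relations."""
    return [target for match, target in MAPPINGS.get(concept_id, [])
            if match in allowed]


precise = {MatchType.EXACT, MatchType.CLOSE}
print(translate("A:sailing-dinghies", precise))                      # []
print(translate("A:sailing-dinghies", precise | {MatchType.BROAD}))  # ['B:small-craft']
```

Restricting translation to exact and close matches favours precision (the dinghy concept has no equivalent in vocabulary B and returns nothing), whereas admitting broad matches retrieves material indexed under the broader "small craft" concept at some cost in precision.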

Automatic classification and indexing tools are important for addressing the potential resource overheads in applying TS to indexed collections and repositories. Some tools are emerging that should be investigated for JISC purposes. Many argue for a combination of intellectual and automatic methods.
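
A deliberately naive form of automatic subject indexing is sketched below: it counts occurrences of vocabulary labels in a text and proposes the subjects that reach a threshold. This simple string matching, the label lists and the `min_hits` threshold are assumptions chosen for brevity and are not representative of the tools the review refers to; the output is best treated as a list of candidates for an indexer to accept or reject, in keeping with a combination of intellectual and automatic methods.

```python
import re
from collections import Counter

# Hypothetical vocabulary: preferred subject term -> labels that signal it.
SUBJECT_LABELS = {
    "Metadata": ["metadata", "dublin core"],
    "Information retrieval": ["information retrieval", "search engine", "search engines"],
    "Digital libraries": ["digital library", "digital libraries", "repository", "repositories"],
}


def suggest_subjects(text: str, min_hits: int = 2) -> list[tuple[str, int]]:
    """Propose subjects whose labels occur at least min_hits times, most frequent first."""
    lowered = text.lower()
    counts: Counter[str] = Counter()
    for subject, labels in SUBJECT_LABELS.items():
        for label in labels:
            # Word boundaries stop a label matching inside a longer word.
            pattern = r"\b" + re.escape(label) + r"\b"
            counts[subject] += len(re.findall(pattern, lowered))
    return [(subject, n) for subject, n in counts.most_common() if n >= min_hits]


sample = ("The repository exposes Dublin Core metadata. "
          "Metadata quality affects search engine ranking.")
print(suggest_subjects(sample))  # [('Metadata', 3)]
```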

It is important to consider how people search for information when designing and evaluating TS, in order to reduce the scope for design errors and increase the possibility that services will actually be used. User studies should be conducted where feasible in ongoing project work.

TS should not be seen as an isolated, free-standing component. TS need to be considered within the wider context of the JISC IE, and need to be integrated with other components of the eFramework. They should be seen as forming a set of services that can be combined with a wide range of other services. There is a need for specifications of TS and their workflow, as part of the JISC IE.

Interoperability requires commonly agreed standards and protocols. Standards exist for different levels and types of interoperability. The prospect is emerging of a broad set of standards across different aspects of terminology services: persistent identifiers, representation of vocabularies, protocols for programmatic access, and vocabulary-level metadata in repositories. Such standards form the infrastructure on which future TS will rest, but it is not feasible to wait for international agreement; international consensus will itself be influenced by operational experience. Pilot TS projects should orient themselves to existing candidate standards (for persistent identifiers, representations and protocols for programmatic access) and help to evaluate and evolve them.


RECOMMENDATIONS

The review was asked to include: “recommendations for further activities needed in this field, and the extent to which JISC should be involved in the work (both short and longer term), including collaboration with other organizations as a possible form of involvement". The following recommendations are listed according to the relevant section of the review, where further context may be found.

1. Introduction 

1.1 Purpose of this review 

1.2 Terminology Services overview

1.2.3 Combination of terminology tools and techniques 

1.3.2 Return on investment

2 Use cases - scenarios

3 Types of vocabularies 

3.1 Vocabularies by structure

3.2 Vocabularies by purpose 

3.2.4 eLearning purposes 

3.2.5 eScience purposes

3.3 Named entity authority and disambiguation services

3.4 Social tagging and folksonomies

3.7 Terminology Registries

Demonstrate the use of a terminology registry within a JISC IE testbed, to include:

4 Activities with TS 

4.1 Studies and models of information seeking behaviour

4.3 Types of Terminology Web Services

4.3.4 Terminology Web Services review

4.4 Mapping 

4.5 Automatic classification and indexing 

4.6 Text mining and information extraction

5 Review of current terminology service activity 

5.5 Repositories 

5.6 Augmenting existing programmes and projects

6 Standards

6.1 Design

6.2 Representations

6.3 Identification of concepts, terms and vocabularies

6.4 Protocols, profiles and APIs



Web page by: Shirley Keane
File last modified: Thursday, 11-Jan-2007 13:19:14 UTC