The role of classification schemes in Internet resource description and discovery
Work Package 3 of Telematics for Research project DESIRE (RE 1004)
Table of Contents

1. Introduction and overview

1.1. Background

This report investigates the use of classification schemes to aid retrieval in a network environment, specifically with regard to the Internet. The library community, over many years, had appeared to favour subject indexing systems (the use of a controlled vocabulary to assign indexing terms to documents) over the use of traditional classification schemes (grouping documents into a hierarchical structure of subject categories). During the first period of the development of networked information services, many specialists, especially those from the computing community, also questioned the value of library subject description systems in principle, pointing to the accomplishments of full-text indexing software.

The increasing use of the Internet and the World Wide Web (WWW) for the storage and retrieval of vast amounts of information has, however, changed this perception. Two distinct ways of finding resources on the Internet emerged (Dodd 1996, p. 276). One approach consisted of the development of robot-based search engines which could be used for powerful keyword searches of the contents of the WWW. These are extremely useful tools, although they have a tendency to return large amounts of irrelevant information. The other approach started with producing 'hotlists' which would encourage users to browse the WWW. The production of hierarchical browsing tools sometimes led to the adoption of library classification schemes to provide the subject hierarchy. At least one general discovery service, Yahoo! <URL:>, devised its own 'home-grown' classification scheme (or ontology) to give structured hierarchical access to the resources which it had indexed. Quality-controlled subject services, which gave access only to selected Internet resources, also understood that a browsing structure based on subject classification would be a desirable complement to a search-engine-type service. Most subject services of this type, almost all of the Electronic Libraries (eLib) Programme 'access to network resources' services, and the proposed DESIRE test-bed services currently use a classification scheme which can be browsed. A list of Internet sites that use library classification systems or subject headings can be found in Beyond bookmarks (McKiernan 1996) <URL:>.

This report will describe the advantages of resource classification for subject-based information gateways on the Internet, analyse the advantages and disadvantages of different types of classification systems, and then review some important individual schemes.

1.2. Advantages and disadvantages of classification

The use of classification schemes offers one solution to providing improved access to WWW resources. Web sites have been created to act as a guide to other Web sites selected according to some pre-specified criteria, e.g. they are judged to be good quality resources or relevant to a particular subject-area. Some of these sites typically consist of an alphabetical list of subjects, and selected Web resources are listed below each one.

Examples include Argus Clearinghouse <URL:> and the WWW Virtual Library <URL:>. In this context, it can be understood why classification schemes have begun to be used to give added-value subject access to Web sites. A site that organises knowledge with a classification scheme demonstrates several advantages over sites which do not (cf. Svenonius 1983):

Classification schemes, however, can sometimes be subject to criticism:

1.3. The typology of classification schemes

1.3.1. Types of classification systems

There are several different types of classification systems, varying in scope, methodology and other characteristics. Detailed descriptions cannot be given here, but it is useful to know these different types when trying to understand the terminology of this report and when decisions about which scheme to use are required.

Classification systems - by facet:

(The categories are not mutually exclusive; a classification can fit into more than one category.)

The facet structure above shows what types of classification scheme are theoretically possible. In reality, the most frequently used types of classification schemes are: a) universal; b) national general; c) subject specific schemes, most often international; d) home-grown systems; e) local adaptations of all types.

The term 'universal schemes' is used for schemes which aim to include all subjects and which are geographically global and multilingual in scope. Part 2 of the report deals with some of the most well-known individual schemes as examples.

1.3.2. Universal classification schemes

The first practical universal classification schemes were developed in the late nineteenth century as a response to the problem of organising libraries in the context of rapidly growing knowledge and an increase in the number of printed books. Universal schemes aim both to be comprehensive and to expand and contract to fit the state of knowledge at any time.

The most widely used universal classification schemes are those which have been developed for the use of libraries since the late nineteenth century, notably the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC) and the classification scheme devised by the Library of Congress (LCC).

Use of a universal, multidisciplinary classification scheme in an Internet context results in the following advantages (in addition to the general advantages of using a classification scheme, see 1.2 above):

Universal classification schemes, however, are subject to several criticisms:

1.3.3. National general schemes

Most of the advantages and disadvantages of universal classification schemes apply also to national general schemes (cf. 2.4. National general schemes), but they have additional characteristics that make them perhaps not the best choice for an Internet service that claims to be relevant for a wider user group than one limited to certain national boundaries.

Some of those characteristics are discussed here, relating to use of the scheme in the Internet environment:

When the Koninklijke Bibliotheek chose to use the Nederlandse Basisclassificatie for an Internet subject service (the Nederlandse Basisclassificatie Web, NBW), this was done mainly because the subject specialists already used the scheme for the classification of printed works. If NBW outgrows its national boundaries, for instance in the DESIRE context or through the participation of non-Dutch institutions, conversion to another scheme will deserve serious consideration in order to make wider interoperability possible.

1.3.4. Subject specific schemes

Most special subject specific schemes have been devised with a particular user-group in mind. Typically they have been developed for use with indexing and abstracting services, special collections or important journals and bibliographies in a scientific discipline. They do have the potential to provide a structure and terminology much closer to the discipline and can be more up-to-date, compared to universal schemes.

Examples of specific schemes are Engineering Information (Ei) for engineering, the National Library of Medicine (NLM) Classification for medicine and the British Catalogue of Music Classification. In subject areas like medicine, agricultural science and engineering, where there are international and widely recognised schemes available, subject services will normally prefer these or use them in combination with a universal scheme.

Subject specific schemes do have some drawbacks:

It is therefore advisable that only well-established subject specific classification schemes should be used to describe Internet resources.

1.3.5. Home-grown schemes

Some Web sites have tried to organise knowledge on the Internet by devising their own classification scheme. Yahoo!, created in 1994, lists Web sites using its own universal classification scheme or 'ontology', which contains 14 main categories. Each Web site collected for Yahoo! is listed under one of 20,000 categories or sub-categories (Steinberg 1996), the scheme being developed over time by the 20 people doing the classification work.

A study by Vizine-Goetz (1996a) showed that out of Yahoo!'s 50 most popular categories, all but four mapped perfectly to explicit DDC or LCC numbers or ranges. The results "... indicate that DDC and LCC have sufficiently wide topic coverage for classifying Internet resources". The structure of Yahoo! would, however, require encoding in order to take advantage of the relationships between classes. In traditional library schemes these relationships are expressed by the notation, which is an important prerequisite for automatic routines and improved navigation.
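The point about notation can be illustrated with a minimal sketch in Python. It assumes a simplified DDC-style decimal notation in which each additional digit narrows the class, so that broader classes can be derived mechanically by truncating the number. The function name broader_classes is hypothetical, and real schemes contain exceptions to this strict hierarchy that the sketch ignores.

```python
def broader_classes(notation: str) -> list[str]:
    """Derive the chain of broader classes from a DDC-style notation.

    Simplifying assumption: every shorter digit prefix denotes a broader
    class. Real schemes do not always follow this rule strictly.
    """
    # Drop the decimal point and the trailing zeros, which only pad
    # a notation to three digits (e.g. "020" has two significant digits).
    digits = notation.replace(".", "").rstrip("0")
    chain: list[str] = []
    for n in range(len(digits) - 1, 0, -1):
        prefix = digits[:n]
        if n > 3:
            decimal_part = prefix[3:].rstrip("0")
            label = prefix[:3] + ("." + decimal_part if decimal_part else "")
        else:
            # Pad short prefixes back to the three-digit form.
            label = prefix.ljust(3, "0")
        # Skip the class itself and any repeated label.
        if label != notation and label not in chain:
            chain.append(label)
    return chain


if __name__ == "__main__":
    for number in ("025.04", "004.678", "510", "500"):
        print(number, "->", broader_classes(number))
```

A browsing interface could use such a chain to build navigation from a resource's class up to the main class without storing the parent links separately. A home-grown scheme without an expressive notation would have to record each of these relationships explicitly.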

Home-grown schemes do have some theoretical advantages over library universal classification schemes:

On the other hand, home-grown schemes have a number of disadvantages:

