The role of classification schemes in Internet resource description and discovery
Work Package 3 of Telematics for Research project DESIRE (RE 1004)
The Thirty-sixth Allerton Institute, held in October 1994, was a starting point for discussion of the use of classification systems in information networks. Most of the closing remarks by Marcia Bates and Sarah Thomas, which pointed to important directions for future work, are still relevant to the issues covered in this report (Cochrane 1995):
"1. Exploit technology
a) for adding class numbers to materials in digital form
b) for linking subject access systems like LCSH and DDC.
c) for providing navigation and retrieval tools based on outlines of knowledge within classification schedules.
2. Extend the use of library classification to Internet resources
4. Share development strategies among and between various classification systems and thesauri, creating the ability to link with one another including multilingual and specialized systems
6. Build bridges from the past (e.g., library collections classified by DDC, LCC, etc.) to the future (e.g., digitized full text collections)
12. Organize the classification schemes differently for the end-user than for the classifier and provide more than one scheme for users to browse and navigate before and after retrieval".
In the meantime, a variety of classification schemes are being used to bring systematic order to discovery-oriented Internet services. The major universal schemes, such as DDC and LCC, are used mainly by services run by the library community, while UDC is used primarily in Europe, both for subject-specific services and for general information gateways such as NISS.
In several countries, such as the Netherlands and Sweden, services use national general schemes. International subject-specific schemes are the dominant choice among subject-based information gateways. However, many services have developed their own browsing structures, similar to traditional classification systems, or have made extensive local adaptations of existing schemes.
TABLE 4.1: Summary of reviewed classification schemes
|Number of Internet services using system|
|Integration with LCSH|
|Integration with other systems|
Table 4.1 shows some of the features identified in section 2 for all the reviewed classification schemes. The scheme most used in Internet services is DDC, reflecting its use in traditional library services and other online services. All the schemes have multilingual capability to the extent that their notation uses Arabic numerals, sometimes with added letters from the Latin alphabet. The real constraint on their use, however, is the availability of suitable translations, and only UDC and DDC have been translated to any significant degree. Both LCC and DDC are integrated to some extent with LCSH, while other schemes are integrated with the relevant subject vocabularies, such as NLM with MeSH and Ei with the Ei thesaurus. Most of the schemes are available in some digital form, although the way this is done varies between schemes. All the classification schemes are extensible, although not always in a completely logical manner; in SAB, for example, new subjects are fitted into any remaining gaps in the scheme.
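To illustrate why the notation, rather than the language of the captions, is the constant part of a scheme, the following Python sketch keys captions by a language-neutral class number. Supporting a new language then means supplying translated captions, not changing the notation. The class numbers and captions are illustrative assumptions rather than quotations from an official schedule, and the fallback to English is a design assumption.

```python
from typing import Dict

# Illustrative caption table: the notation ("510") is language-neutral,
# only the captions need translating. Entries are examples, not an official schedule.
CAPTIONS: Dict[str, Dict[str, str]] = {
    "510": {"en": "Mathematics", "de": "Mathematik", "sv": "Matematik"},
    "520": {"en": "Astronomy"},  # no translations supplied yet
}


def caption(notation: str, lang: str, fallback: str = "en") -> str:
    """Return the caption for a class number in the requested language.

    Falls back to the `fallback` language when no translation exists;
    raises KeyError for an unknown class number.
    """
    by_lang = CAPTIONS.get(notation)
    if by_lang is None:
        raise KeyError(f"unknown class number: {notation}")
    return by_lang.get(lang, by_lang[fallback])
```

Here `caption("510", "de")` returns the German caption, while `caption("520", "de")` falls back to English: the notation works in every language, but the display is only as multilingual as the available translations.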
The use of a classification scheme in a subject-based Internet service would be extremely useful. It offers the following advantages:
The main criteria for choosing a classification system will normally be the scope of the service: its subject, language and geographic coverage, and its user population.
In some situations the choice is quite obvious. For documents from all areas of knowledge, published throughout the world in many languages and offered to an international, multidisciplinary community of users, a universal scheme can be selected, at least as a basic solution. DDC and UDC have good multilingual capability because their notation is numerical and their schedules have been widely translated. If, however, the collection focuses on a fairly narrow subject area or discipline and a suitable international subject-specific scheme is available, that scheme should be used.
Problems will arise for services covering subjects for which several different schemes exist (e.g. the earth sciences), although concordances between schemes may help. There will also be problems when no comprehensive scheme is available for the geographic area or subject scope a service covers (e.g. the European social sciences in SOSIG).
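The selection criteria above can be condensed into a rule of thumb. The Python below is a hypothetical illustration only: the `ServiceScope` fields and the returned recommendation strings are assumptions that compress the discussion into code, not a formal decision procedure defined in this report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceScope:
    """Hypothetical summary of a service's scope."""
    multidisciplinary: bool   # covers all areas of knowledge
    international: bool       # worldwide documents, languages and users
    # Well-established international subject schemes suitable for this scope
    subject_schemes: List[str] = field(default_factory=list)


def recommend_scheme(scope: ServiceScope) -> str:
    """Rule-of-thumb recommendation following the criteria in the text."""
    if scope.multidisciplinary and scope.international:
        # Universal scheme as a basic solution; DDC and UDC are widely translated.
        return "universal scheme (e.g. DDC or UDC)"
    if scope.multidisciplinary:
        # National services may use a national general scheme instead.
        return "universal or national general scheme"
    if len(scope.subject_schemes) == 1:
        return "subject-specific scheme: " + scope.subject_schemes[0]
    if len(scope.subject_schemes) > 1:
        # Several competing schemes (e.g. earth sciences): choose one, map the rest.
        return "one of " + ", ".join(scope.subject_schemes) + ", with concordances"
    # No comprehensive scheme for this scope (e.g. European social sciences).
    return "no comprehensive scheme: problem case"
```

For instance, an engineering service for which Ei is available would be steered towards Ei, whereas a subject with two competing schemes is pointed towards concordances between them.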
Perceived shortcomings in classification schemes are sometimes countered by adapting or amending a scheme, as in EEVL's variant of Ei or the use of UDC by NISS and SOSIG. Adaptations can also arise because classification schemes are being used in a different, electronic environment: one is not preparing a shelf arrangement of physical objects but a virtual display in an online system, where the classification scheme itself is used as a browsing aid.
Another reason for adapting a scheme is that, if a library classification system is applied exactly as published, some parts of it may remain completely empty while others become overcrowded. This can happen because the subjects of existing digital documents may differ widely from those found in printed collections, or because printed and digital collections in a given subject area may differ in size (cf. the section on Ei classification).
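A service can check for this kind of imbalance by counting the resources assigned to each class. This Python sketch assumes one class number per resource and a service-chosen threshold (`max_per_class`, a hypothetical parameter) above which a class is treated as overcrowded; assignments to classes outside the supplied list are ignored.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def class_load(classes: Iterable[str],
               assignments: Iterable[str],
               max_per_class: int) -> Tuple[List[str], Dict[str, int]]:
    """Return (empty classes, overcrowded classes with their counts).

    `classes` lists every class the service displays; `assignments`
    holds one class number per catalogued resource.
    """
    counts = Counter(assignments)          # resources per class number
    classes = list(classes)
    empty = [c for c in classes if counts[c] == 0]
    crowded = {c: counts[c] for c in classes if counts[c] > max_per_class}
    return empty, crowded
```

With classes `["004", "510", "780"]`, five resources in `004`, one in `510` and a threshold of 3, the sketch reports `780` as empty and `004` as overcrowded, which is the pattern that tends to prompt local adaptation.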
In spite of these good reasons to locally adapt schemes, changes to a scheme will hamper interoperability and co-operation.
Interoperability between subject services could be achieved by a hybrid use of universal and subject-specific schemes. Universal schemes could 'glue' different subject systems together and provide a coherent structuring principle at the top entry level to subject-specific services. Then, once the user moves into an individual subject service, a subject-specific scheme could be used.
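A minimal Python sketch of this two-level arrangement follows, assuming a top-level table from universal classes to subject gateways and a separate local scheme per gateway. The gateway names, the local notations (`E1`, `S1`, ...) and the mapping itself are hypothetical; the DDC-style class numbers serve only as placeholders.

```python
from typing import Dict, List, Tuple

# Top level: universal-scheme classes (DDC-style numbers used as placeholders)
# route the user to a subject gateway. Gateway names are hypothetical.
TOP_LEVEL: Dict[str, str] = {
    "300": "social-science-gateway",
    "620": "engineering-gateway",
}

# Each gateway browses with its own subject-specific scheme:
# class -> list of subclasses. All local notations are invented.
SUBJECT_SCHEMES: Dict[str, Dict[str, List[str]]] = {
    "social-science-gateway": {"root": ["S1"]},
    "engineering-gateway": {"root": ["E1", "E2"], "E1": ["E1.1"]},
}


def browse(universal_class: str, local_class: str = "root") -> Tuple[str, List[str]]:
    """Enter via the universal scheme, then list subclasses in the gateway's scheme."""
    gateway = TOP_LEVEL[universal_class]  # universal scheme acts as the 'glue'
    subclasses = SUBJECT_SCHEMES[gateway].get(local_class, [])
    return gateway, subclasses
```

The design keeps each gateway free to use the scheme best suited to its subject, while users always enter through one shared top-level structure.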
Where subject-specific classification schemes are concerned, only well-established schemes should be used. Whenever feasible, especially in small services, it may help to assign a class from one of the universal schemes as well. Conversion programs between classification schemes could also help to achieve interoperability.
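A conversion program can be as simple as a concordance lookup. The sketch below assumes a hypothetical concordance from a subject-specific scheme to a universal one, and dot-separated hierarchical notation, so an unmatched class number falls back to its nearest broader class. Neither the notations nor the mapping are taken from a published concordance.

```python
from typing import Dict, Optional

# Hypothetical concordance: subject-specific class -> universal class.
# Values are chosen for illustration, not copied from any published table.
CONCORDANCE: Dict[str, str] = {
    "E1": "620",
    "E1.1": "621.3",
}


def convert(notation: str, table: Dict[str, str]) -> Optional[str]:
    """Map a class number via a concordance, falling back to broader classes.

    Assumes hierarchy is expressed by dot-separated notation, so "E1.1.4"
    is tried as "E1.1.4", then "E1.1", then "E1". Returns None if nothing matches.
    """
    parts = notation.split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in table:
            return table[candidate]
        parts.pop()  # drop the last segment to try the next broader class
    return None
```

Falling back to a broader class loses precision but keeps every converted resource reachable from the universal scheme; a real concordance would probably also need to record one-to-many and partial equivalences explicitly.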
Home-grown schemes on the Web are not normally designed specifically to classify academic resources for the research community; instead they aim to categorise a wider range of forms and content, e.g. entertainment, commercial information and government information. UDC, DDC and LCC, by contrast, were developed to classify the whole of knowledge, and they, like subject-specific schemes, are especially useful for classifying academic resources, although, as DDC shows, they can embrace popular types of content too. Home-grown schemes should therefore not be developed for an academic subject service.
Scheme conversion programs and methods of shared classification are considered very useful, especially for subject-specific services. Recently developed methods of derived indexing, clustering and selection technologies, agents, concept maps and similar techniques of automatic classification are expected soon to offer considerable improvements for services of limited size.
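The techniques listed above vary widely; as a much simpler stand-in, the Python sketch below assigns a resource to the class whose term profile shares the most words with its text. It is a naive keyword-overlap classifier, not derived indexing, clustering or any specific system mentioned here, and the term profiles are hypothetical.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical term profiles for three classes; a real service would derive
# these from its own scheme (captions, index terms, thesaurus links).
PROFILES: Dict[str, Set[str]] = {
    "510": {"algebra", "geometry", "theorem", "calculus"},
    "540": {"chemical", "molecule", "reaction", "compound"},
    "620": {"engineering", "bridge", "turbine", "circuit"},
}


def classify(text: str, profiles: Dict[str, Set[str]],
             min_overlap: int = 1) -> Optional[str]:
    """Assign the class whose profile shares the most terms with the text.

    Ties go to the lowest class number. Returns None when no class reaches
    `min_overlap`, leaving the resource for a human classifier.
    """
    terms = set(re.findall(r"[a-z]+", text.lower()))  # crude tokenisation
    best, best_score = None, 0
    for notation, profile in sorted(profiles.items()):
        score = len(terms & profile)
        if score > best_score:
            best, best_score = notation, score
    return best if best_score >= min_overlap else None
```

Returning `None` below a minimum overlap is one way to keep automatic assignment as a supplement to, rather than a replacement for, intellectual classification.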
Page maintained by: UKOLN Metadata Group
Last updated: 14-May-1997