The role of classification schemes in Internet resource description and discovery
Work Package 3 of Telematics for Research project DESIRE (RE 1004)
The Universal Decimal Classification (UDC) is an international scheme which endeavours to cover all areas of knowledge. Its origins lie in the Dewey Decimal Classification (DDC), which was adapted towards the end of the nineteenth century by Paul Otlet and Henri La Fontaine in an attempt to create a universal bibliography. Until recently responsibility for the scheme belonged to the FID (Fédération Internationale de Documentation); this responsibility was passed to a consortium of publishers (the UDC Consortium) in 1992. The scheme's original purpose, ordering and indexing entries in a printed bibliography, has since been overtaken by its use for indexing and retrieval in computer-based systems. The scheme consists of approximately 60,000 classes (divisions and sub-divisions) as well as a number of auxiliary tables to describe countries, etc.
At least five Internet services are currently using UDC: BUBL, GERHARD, NISS, OMNI and SOSIG.
A brief e-mail questionnaire was sent to staff at each of these services (see Appendix 1). The answers are summarised throughout this report.
The BUBL Subject Tree <URL:http://www.bubl.bath.ac.uk/BUBL/home.html> aims to give comprehensive coverage of UK Internet resources in all subject areas.
The original Subject Tree uses UDC, but it should be noted that BUBL is in the process of transforming into a new service called LINK that will be using the Dewey Decimal Classification Scheme.
BUBL do not classify individual items, but do use the UDC to provide browsable sections for each subject area. The depth to which the classification scheme is used varies across the different subjects.
In the Deutsche Forschungsgemeinschaft (DFG) funded project GERHARD (German Harvest Automated Retrieval and Directory) <URL:http://gerhard.bis.uni-oldenburg.de/> the UDC is used in the enlarged and multilingual version of the ETH library Zürich. The aim of the project is to establish a service for searching and browsing German Internet resources. The documents are gathered by a robot and matched to the UDC entries by computational-linguistic algorithms to create a searchable index and an automatically generated subject tree. The project started in October 1996 at the university library of Oldenburg. A prototype will be available during May 1997.
The NISS Directory of Networked Resources <URL:http://www.niss.ac.uk/subject/index.html> is a selective service that covers all subject areas.
NISS uses UDC in some detail, and browsing NISS involves working through UDC hierarchies, with the numbers displayed on the screen, above each section, rather like a virtual shelf mark. The Directory may be browsed in UDC number "inverted tree structure", UDC number linear structure (shelf order) or alphabetical subject heading order. All three output formats may be linked to from: <URL:http://www.niss.ac.uk/subject/index.html>.
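The three browse orders can be sketched in a few lines of Python. This is only an illustration, not the NISS implementation: the entries are a small sample chosen for the example, the function names are hypothetical, and the sort key assumes simple main-table numbers with no auxiliaries.

```python
# Illustrative sample of (classmark, heading) pairs; not the NISS data.
entries = [
    ("61", "Medicine"),
    ("1", "Philosophy"),
    ("551.588", "Environmental Issues"),
    ("518", "Computing"),
    ("91", "Geography"),
]

def shelf_key(classmark):
    # UDC numbers file as decimal fractions, so comparing the digit
    # strings (points removed) gives shelf order for simple numbers.
    return classmark.replace(".", "")

def shelf_order(items):
    """Linear structure: entries in UDC number (shelf) order."""
    return sorted(items, key=lambda e: shelf_key(e[0]))

def alphabetical_order(items):
    """Entries in alphabetical subject heading order."""
    return sorted(items, key=lambda e: e[1].lower())

def tree_view(items):
    """Indented listing: one level of indentation per extra digit."""
    lines = []
    for classmark, heading in shelf_order(items):
        depth = len(shelf_key(classmark)) - 1
        lines.append("  " * depth + classmark + " " + heading)
    return lines

for line in tree_view(entries):
    print(line)
```

The same list of entries feeds all three views, which is why a service can offer them side by side at little extra cost.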
NISS do not normally classify beyond the decimal point, although there are exceptions in the 'computing' and 'geography' sections.
Classmarks are added to the Directory on an ad hoc basis (i.e. if a sufficient number of new resources warrant the use of a new UDC classmark then one is added), since it was never intended that the full range of UDC classmarks would be used.
OMNI (Organising Medical Networked Information) <URL:http://www.omni.ac.uk/> is a selective subject service that catalogues resources relating to medicine.
OMNI currently use UDC to create browsable sections. However, they also use a subject-specific classification scheme, the National Library of Medicine (NLM) classification, which is used to create separate browsing sections. It is interesting to note that, like BUBL, OMNI plan to stop using UDC in the near future and will then use only NLM.
OMNI currently classify in as much detail as possible with the NLM scheme, but find this more difficult with the UDC scheme.
SOSIG (The Social science Information Gateway) <URL:http://sosig.ac.uk/> is a selective subject service that catalogues resources relating to the social sciences.
SOSIG does not use the UDC in its complete form, but has drawn upon UDC social science classification numbers to create the browsing sections of the service. A selection of 26 UDC numbers is currently used for the browsing sections. In cataloguing, however, a larger list is used: 57 numbers have been selected with a view to increasing the number of browsing sections when a suitable number of resources have been placed in the new sections. No other UDC numbers are currently used.
The detail of the numbers varies from being at the top of a hierarchy (e.g. Philosophy = 1) to being fairly low down the hierarchy (e.g. Environmental Issues = 551.588). One advantage of classifying Internet resources is that more than one number can be assigned to a resource: since resources do not need to be put in numerical order on a shelf, they can be kept in two places at once.
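A minimal Python sketch of this "two places at once" idea follows. The URLs and the choice of classmarks per resource are invented for illustration, and the function name is hypothetical; nothing here is taken from the SOSIG catalogue itself.

```python
from collections import defaultdict

# Hypothetical catalogue: each resource carries one or more classmarks.
resources = {
    "http://example.org/climate-policy": ["551.588", "32"],
    "http://example.org/ethics-primer": ["1"],
    "http://example.org/green-economics": ["551.588", "33"],
}

def build_sections(catalogue):
    """Invert the catalogue so each classmark lists all its resources."""
    sections = defaultdict(list)
    for url, classmarks in catalogue.items():
        for classmark in classmarks:
            sections[classmark].append(url)
    # Sort for a stable display order within each browsing section.
    return {mark: sorted(urls) for mark, urls in sections.items()}

sections = build_sections(resources)
for mark in sorted(sections):
    print(mark, sections[mark])
```

A resource with two classmarks simply appears in two browsing sections, something a physical shelf arrangement cannot offer.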
UDC is used in a number of online catalogues, databases and information retrieval packages. There are no current figures available for the extent of usage, but a survey in 1977 found that of thirty-one countries in Europe, eight used UDC in their national bibliographies, with a further four countries adopting the scheme in the 1980s (McIlwaine 1991). The same paper mentioned use of the classification in Latin America and six French-speaking African countries. In the UK UDC is used by a small number of university libraries including Aberdeen, Liverpool and Edinburgh. It is also used by a number of small specialised libraries such as the Scott Polar Research Institute, the Royal Greenwich Observatory and the British Architectural Library. Databases using the scheme include the HSE-line database on ESA-IRS and the HELPIS database on BLAISE-LINE. The scheme is used in the CDS/ISIS retrieval package and is available as a sort option in the InMagic information retrieval package.
Because the UDC is based on numerical notation the scheme is not language dependent. The classification exists in several different languages including English, French, German and Japanese.
GERHARD was the only service surveyed that will use the multilingual capability of the UDC.
Strengths of UDC
UDC is an agreed international standard, which means it is widely recognised, used and available. It also means it is regularly (if not frequently) maintained. As already noted, the scheme is not language dependent and exists in several different languages. The structure of the classification allows composite codes to be assigned to provide complex and detailed description of the subject content of a document or resource.
Strengths as seen by Internet services
Two of the services said they originally chose to use UDC so that they would be compatible with other key Internet services being developed in the UK at the time. The fact that it was being used by BUBL and NISS was the key reason for choosing the scheme. However, since then both BUBL and OMNI have decided to drop UDC despite this strength, which implies it has weaknesses that can override this advantage.
UDC is a sizeable and comprehensive classification scheme, which gives it a certain amount of flexibility. A number of the services said they were able to adapt it to suit the needs of their particular service, and cited this as an advantage. NISS suggested that UDC could be used to a variety of different levels of precision, which suited them. BUBL found the top hierarchy of UDC (i.e. the first sub-divisions) was suitable for the way they wanted to present their subject tree. SOSIG said that a suitable subject-specific classification scheme could not be found for the social sciences, and UDC covered the subjects that fell within the scope of SOSIG.
None of the services (except GERHARD) had to pay to use UDC, and none had problems over copyright when they published sections of it on their Web pages (there would have been copyright or licensing issues had the services used the machine-readable Master Reference File, which belongs to the UDC Consortium).
Weaknesses of UDC
One of the main criticisms of the UDC is that the scheme is out of date. New knowledge is continually developing and existing knowledge is being redefined which causes problems for large schemes such as the UDC. Part of the problem of revisions to the scheme was also due to the unwieldy committee structure that was in place when the classification was under the control of the FID. The UDC Consortium have adopted a much more flexible approach to revisions of the classification, which are now done on a contractual basis.
The complex structure of the scheme is also considered a problem: the main tables of the classification can be combined with auxiliary tables and punctuation to express detailed concepts and relationships.
Many institutions and libraries only use a simplified version of the scheme (Buxton 1990). An introductory guide to the use of the UDC was published in 1993 and revised in 1995 to help users with the application of the scheme (McIlwaine 1993; McIlwaine and Buxton 1995).
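To give a feel for why composite notation is harder to handle than a flat list of numbers, the following Python sketch splits a classmark into its parts. It is a simplified illustration under stated assumptions, not a full UDC parser: it recognises only main-table numbers, the connecting symbols "+", "/", ":" and "::", parenthesised auxiliaries, language auxiliaries introduced by "=", and time auxiliaries in quotation marks. The function names and the example classmarks are chosen for illustration and have not been checked against the current schedules.

```python
import re

# Simplified token grammar for composite classmarks (an assumption of
# this sketch; the real notation has more auxiliary signs than these).
TOKEN = re.compile(r'''
    (?P<paren>\([^)]*\))             # parenthesised auxiliary (place, form)
  | (?P<time>"[^"]*")                # time auxiliary in quotation marks
  | (?P<language>=\d+(?:\.\d+)*)     # language auxiliary
  | (?P<connector>::|[+/:])          # connecting symbols
  | (?P<number>\d+(?:\.\d+)*)        # main-table number
''', re.VERBOSE)

def tokenise(classmark):
    """Split a composite classmark into (kind, text) pairs."""
    tokens, pos = [], 0
    while pos < len(classmark):
        match = TOKEN.match(classmark, pos)
        if match is None:
            raise ValueError(f"unrecognised notation at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def main_numbers(classmark):
    """Return only the main-table numbers of a composite classmark."""
    return [text for kind, text in tokenise(classmark) if kind == "number"]

print(tokenise("17:7"))         # two numbers joined by a relation sign
print(tokenise("551.588(44)"))  # a number followed by a place auxiliary
print(main_numbers("59+636"))
```

Even this reduced grammar shows that a composite classmark is a small expression rather than a single code, which is why many users settle for a simplified subset.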
Weaknesses as seen by Internet services
Both SOSIG and NISS suggested that UDC was not updated frequently enough and that this caused problems. NISS commented that the main divisions of UDC (the top level) have their roots in the 19th century and are not intuitive to modern academics. SOSIG said they found two problems with currency. Firstly, some of the subjects seemed to be old-fashioned and outdated. For example, UDC has a section called 'feminism' which SOSIG users have suggested should be called 'gender studies'. Secondly, some subject areas seem to have outgrown their UDC section. The UDC hierarchy does not always grow at the speed required to keep up with subject areas that have developed significantly over recent years. For example, 'environmental issues' and 'development studies' are growing areas where there is a lot of interest and a lot of new resources, but these are not particularly well catered for by UDC.
Some services felt that UDC did not cater adequately for some subject areas.
- SOSIG found the hierarchy did not give a high enough status to Environmental Studies or Development Studies.
- NISS suggested UDC is particularly weak in Medicine and Health sciences, which was borne out by OMNI who point out that UDC is not generally used in medical libraries, and who have now decided to drop UDC because of its weaknesses in the medical field.
Three services suggested that UDC was too complicated to use (BUBL, OMNI and BIZ/ED).
- OMNI found UDC was too complicated to use in the classification of detailed and complicated subjects. UDC requires the formation of composite classification codes to express complex subjects, whereas the ROADS software requires an enumeration of all the codes to be used. Without the scheme in electronic form, producing such an enumeration is difficult.
- NISS suggested that the decimal notation does not reflect the true hierarchy of subjects, which means that the hierarchical tree structure can't be maintained fully automatically, and that the implementation of wildcard searching presents problems.
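The NISS point about hierarchy can be made concrete with a short Python sketch of the naive approach, in which a parent class is found by dropping the last digit and a wildcard search is a prefix match. The function names are hypothetical. The sketch embodies exactly the assumption that NISS reports does not always hold, so it shows what a fully automatic approach would rely on rather than a dependable method.

```python
def digits(classmark):
    """Strip the points, which are only a reading aid."""
    return classmark.replace(".", "")

def format_udc(digit_string):
    """Re-insert a point after every third digit."""
    groups = [digit_string[i:i + 3] for i in range(0, len(digit_string), 3)]
    return ".".join(groups)

def naive_ancestors(classmark):
    """Broader classes, ASSUMING each digit adds one hierarchy level."""
    d = digits(classmark)
    return [format_udc(d[:n]) for n in range(1, len(d))]

def wildcard(pattern, classmarks):
    """Prefix search such as '55*', with the same naive assumption."""
    prefix = digits(pattern.rstrip("*"))
    return [c for c in classmarks if digits(c).startswith(prefix)]

print(naive_ancestors("551.588"))
print(wildcard("55*", ["551.588", "518", "55", "61"]))
```

Wherever the notation and the real subject hierarchy diverge, both functions return results that look plausible but are wrong, so the tree has to be corrected by hand.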
The ETHICS (ETH library Information Control System) at ETH (Eidgenössische Technische Hochschule) in Zürich links each UDC number to related thesaurus terms in English, French or German.
Before the revised UDC schedules for 'computing' were published, the NISS service created more detailed classification under the classmark 518, the 'computing' section, by adding new numbers and headings that did not exist in the official UDC. They took the new headings from a descendant of a former ASK/NISS Thesaurus of software classifications. The headings from this were adapted with numeric notation to slot in under the classmark 518.
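The kind of link involved can be sketched as a small data structure in Python. This is an assumption-laden illustration, not the ETHICS design: the two entries, the language codes and the function names are chosen for the example only.

```python
# Hypothetical sample: classmark -> thesaurus terms per language.
THESAURUS = {
    "1": {"en": ["philosophy"], "fr": ["philosophie"], "de": ["Philosophie"]},
    "61": {"en": ["medicine"], "fr": ["médecine"], "de": ["Medizin"]},
}

def build_term_index(thesaurus):
    """Map every term, in any language, to the classmarks it points to."""
    index = {}
    for classmark, terms_by_language in thesaurus.items():
        for terms in terms_by_language.values():
            for term in terms:
                index.setdefault(term.casefold(), set()).add(classmark)
    return index

def lookup(term, index):
    """Return the classmarks linked to a term, ignoring case."""
    return sorted(index.get(term.casefold(), set()))

index = build_term_index(THESAURUS)
print(lookup("Medizin", index))
print(lookup("Philosophie", index))
```

Because the classmark is the common key, a search term entered in one language retrieves material described in any of the others.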
Bibliographic records supplied by national libraries and bibliographic utilities are able to link classification data to other classification schemes through the MARC record (see 2.1.5 and 2.1.6: DDC review).
None of the services surveyed do this, although there is currently some potential for linking UDC and NLM classification notation in OMNI.
A Master Reference File of the scheme (in English) was created in 1993 and is currently maintained by the Technical Director of the UDC in the Hague. This contains approximately 60,000 classes (divisions and sub-divisions). Copies of the file are available under licence for a period of three years.
GERHARD has a multilingual and enlarged version of the scheme. NISS and SOSIG have abridged versions on their Web pages:
SOSIG has a selected group of numbers from the social science sections of UDC available on their Web pages: <URL:http://sosig.ac.uk/Subjects/udc-list.html>.
SOSIG also links to the 'UDC in brief' given on the NISS Web server.
NISS offer a guide to 'UDC in brief'. This gives more detail of the main subject headings and numbers in the scheme: <URL:http://www.niss.ac.uk/resource-description/udcbrief.html>
NISS uses UDC in some detail, and so in browsing NISS the UDC numbers are featured electronically on the screen.
The copyright of the machine-readable Master Reference File belongs to the UDC Consortium, from which licences can be purchased. Hard-copy versions of the classification are the responsibility of the individual publishers.
Of the Internet services surveyed, only the GERHARD project had to pay for the scheme: an annual fee of 5,000 Dutch guilders.
The responsibility for developing and managing the classification belongs to a Consortium. This is made up of members from Britain, Holland, Belgium, Spain and Japan (who represent the major publishers of the scheme) and the FID, who were the previous owners of the classification. Developments and revisions are undertaken on a contractual basis; recent or ongoing revision work is being carried out in Astronomy, Linguistics and Philology, Medicine and Computer Science. The Consortium publish revisions annually in the publication Extensions and Corrections to the UDC. Other plans include moving the UDC towards a faceted classification and co-operation with the Dewey Decimal Classification Committee to jointly publish area tables so that both schemes would have a standard set of notations for expressing countries.
It seems that at the outset eLib funded subject services were keen to try and use the same schemes as other services with a view to being interoperable at some point in the future. Two of the services (SOSIG and OMNI) chose UDC for this reason. However, OMNI make the point that there is no need to use the same scheme as long as one scheme could be mapped onto another. OMNI find that their users much prefer the browsing sections that are based on the NLM scheme to the sections based on UDC. They plan to stop using UDC for this reason, but still see interoperability as being viable, since the subject headings from the two schemes could be mapped onto each other automatically for this purpose.
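The mapping OMNI have in mind can be pictured as a simple crosswalk table. The Python sketch below is hypothetical throughout: the four UDC-to-NLM pairs are illustrative matches on broad subject area, not an authoritative concordance, and the longest-prefix lookup assumes that truncating a UDC number yields a broader class, which is not guaranteed.

```python
# Illustrative crosswalk keyed on UDC digits (points removed).
# These pairs are assumptions made for the example only.
CROSSWALK = {
    "611": "QS",  # anatomy
    "612": "QT",  # physiology
    "614": "WA",  # public health
    "615": "QV",  # pharmacology
}

def to_nlm(udc_classmark, crosswalk=CROSSWALK):
    """Return the NLM class for the longest matching UDC prefix, if any."""
    d = udc_classmark.replace(".", "")
    for length in range(len(d), 0, -1):
        target = crosswalk.get(d[:length])
        if target is not None:
            return target
    return None

print(to_nlm("612.1"))
print(to_nlm("518"))
```

A table of this kind would let two services keep their preferred schemes while still exchanging records, at the cost of maintaining the mapping as either scheme is revised.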
BUBL made the point that numbers from two different schemes can be simultaneously assigned. For their new service they use an OCLC CD-ROM to assign Dewey numbers and LCSH simultaneously.
Page maintained by: UKOLN Metadata Group
Last updated: 14-May-1997