The role of classification schemes in Internet resource description and discovery
Work Package 3 of Telematics for Research project DESIRE (RE 1004)
One of the world's most widely used classification schemes is the Library of Congress Classification System (LCC). This is largely because every record exported from the Library of Congress carries the Library's own classification of the item. Apart from being dominant, it is quite old: LCC will soon celebrate its centenary. In 1899 the Librarian of Congress, Dr. Herbert Putnam, and his Chief Cataloguer, Charles Martel, decided to start a new classification system for the collections of the Library of Congress (established 1800). Basic features were taken from Charles Ammi Cutter's Expansive Classification. LCC is an enumerative system built on 21 major classes, each identified by an arbitrary capital letter from A to Z, with five letters excluded: I, O, W, X and Y (these appear at the second or third level in the notation for various subclasses). Once this was decided, Putnam delegated the further development of different parts of the system to subject specialists, cataloguers and classifiers. The system was intentionally decentralised from the start and has remained so; the different classes and subclasses were first published between 1899 and 1940. As a result, the schedules often differ considerably in the number and kinds of revisions they have undergone.
LCC notations are composed of letters and numbers. Capital letters are, as mentioned above, used for main class and subclass notations; for subdivisions further down the hierarchies LCC uses Arabic numerals (e.g. Urban Transport = HE 305-311). The scheme is very extensive, filling about 46 volumes published by the Library of Congress, and there is no official comprehensive index to it.
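The notation structure described above can be illustrated with a minimal Python sketch. It assumes a simplified notation of one to three capital letters followed by a whole number; decimal extensions and Cutter numbers are left out, and the function names are hypothetical rather than part of any Library of Congress tool.

```python
import re
import string

# The 21 main-class letters: A-Z minus the five letters that the text
# says are not used for main classes.
MAIN_CLASSES = set(string.ascii_uppercase) - set("IOWXY")

# One to three capital letters followed by a whole number. This is a
# simplification: decimals and Cutter numbers are not handled.
NOTATION = re.compile(r"^([A-Z]{1,3})\s*(\d+)$")


def parse_notation(text):
    """Split a notation such as 'HE 305' into ('HE', 305)."""
    match = NOTATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple LCC notation: {text!r}")
    letters, number = match.group(1), int(match.group(2))
    if letters[0] not in MAIN_CLASSES:
        raise ValueError(f"{letters[0]!r} is not a main class letter")
    return letters, number


def in_span(notation, span):
    """Check whether a notation falls inside a span such as 'HE 305-311'."""
    letters, number = parse_notation(notation)
    span_start, span_end = span.split("-")
    span_letters, low = parse_notation(span_start)
    return letters == span_letters and low <= number <= int(span_end)
```

For example, `in_span("HE 307", "HE 305-311")` is true, because HE 307 lies within the Urban Transport span quoted above, while a notation beginning with I, O, W, X or Y is rejected as a main class.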
The Library of Congress has developed a USMARC Format for Classification Data which allows classification data to interact with other USMARC bibliographic data and authority files (Guenther 1992; Guenther 1996). The format has been designed for use with DDC and LCC (the two major classification systems used in the US), but permits communication with other classification schemes, especially UDC and NLM. A machine-readable version of DDC has been available for some years as part of its Editorial Support System (ESS). Recognising the benefits of this, the Library of Congress decided in 1993 to convert its 46 schedules into machine-readable form for the USMARC classification format. When complete, the LCC database will hold in the region of 450,000 classification records (Vizine-Goetz 1996a).
In this review, the use of Library of Congress Subject Headings (LCSH) in an Internet context will also be investigated to a limited extent.
There are several services on the Internet which claim that they use the LCC for classification of resources.
CyberStacks <URL:http://www.public.iastate.edu/~CYBERSTACKS/> is a centralised, integrated, and unified collection of World Wide Web and other Internet resources categorised using the Library of Congress classification scheme (McKiernan 1997). Resources are organised under one or more relevant Library of Congress class numbers and an associated publication format and subject description. The person responsible for the service, Gerry McKiernan of Iowa State University, has chosen to provide information within the six main classes Q (Science), R (Medicine), S (Agriculture), T (Technology), U (Military Science) and V (Naval Science). The resources are first categorised within a broad classification (e.g. Chemistry, QD), then within narrower subclasses (e.g. Physical & Theoretical Chemistry, QD 450-731), and finally listed under a specific classification range (e.g. QD 467 Classification. Periodic Law). LC subject headings are not used, and searching is not possible as the service is designed for browsing. Within the selected classes, the entire LCC notation scheme is published, and as the service is still under construction, there is often a lack of content - no resources or links - behind the headings.
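The three-level browsing arrangement described for CyberStacks can be sketched as a lookup over nested notation ranges. This is only an illustration of the idea, not the way the service itself is built: the table holds just the three Chemistry captions quoted above, and the names `BROWSE_LEVELS` and `browse_path` are hypothetical.

```python
# Each entry: (class letters, first number, last number, caption).
# Entries are listed broadest first. A range of (None, None) stands for
# the whole subclass. Only the captions quoted in the text are included.
BROWSE_LEVELS = [
    ("QD", None, None, "Chemistry"),
    ("QD", 450, 731, "Physical & Theoretical Chemistry"),
    ("QD", 467, 467, "Classification. Periodic Law"),
]


def browse_path(letters, number):
    """Return the captions a user would pass through, broadest first."""
    path = []
    for level_letters, low, high, caption in BROWSE_LEVELS:
        if level_letters != letters:
            continue
        whole_subclass = low is None
        if whole_subclass or low <= number <= high:
            path.append(caption)
    return path
```

Calling `browse_path("QD", 467)` walks through all three levels, whereas `browse_path("QD", 500)` stops at the second, since no narrower range in this small table contains it.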
The WWW Virtual Library
The WWW Virtual Library <URL:http://www.w3.org/pub/DataSources/bySubject/LibraryOfCongress.html> is a quite comprehensive distributed subject catalogue which offers several ways to enter or browse the collected resources. One approach is through the LCC system, but this order is applied only at the very first level, to produce a global ordering of a site that is for the most part just an alphabetic list. All main classes in the LCC system are represented. The WWW Virtual Library does not use the classification notations themselves at all in its LCC structure, only their corresponding headings. The service cannot be searched, only browsed.
Two libraries trying to organise Internet resources in accordance with LCC
Seattle Pacific writes the following about their choice:
1. "The Library Web page is in LC order so that you can easily find additional sources of information in the same areas where that subject is found in the collection. For instance, if you are doing research on Education, you will find both books and Web references in L." (Seattle Pacific University Library 1996)
2. "At SPU, we decided to change from Dewey Decimal to the Library of Congress, because it provides for better subject access to the collection. LC has many more subjects than Dewey, which tends to put things in broad categories." (Seattle Pacific University Library 1996)
Both libraries use the classification system only as a way to organise resources at a first broad level (e.g. L Education, HA Statistics), and the classification notation disappears at the second level. All main classes and subclasses are meant to be covered, but this is done quite poorly and in a haphazard way. LC subject headings are not used and the sites are not searchable.
T.F. Mills Home Page - Some Humanities Links in Library of Congress Class Order
T.F. Mills <URL:http://www.du.edu/~tomills/> has organised his favourite sites according to classes in LCC on his private home page. This is done only at the first level and the subjects are chosen at random. It is not possible to search the site. He does not use LC subject headings.
NetFirst <URL:http://www.netfirst.ac.uk/> is a commercial search service run by OCLC composed of a growing collection with approximately 60,000 full bibliographic descriptions of Internet resources. The resources are compiled by OCLC staff and classified in Dewey Decimal Classification (DDC) and the opportunity is given to search for DDC notations. Until recently browsing has only been provided in indexes arranged alphabetically, but a browsing structure based on the DDC notations for each record has recently been made available (Oehler 1996). The service also uses LC Subject Headings to make the resources searchable (see also 2.1.1).
NISS: National Information Services and Systems
NISS <URL:http://www.niss.ac.uk/> is a service classifying Internet resources and offering several ways of access to them, through either searching or browsing. All resources in NISS are described by records in a Resource Descriptions Database (which underlies the Information Gateway). The records contain, among other information, UDC classifications; these form the basis for the browsing structure and are searchable as well. The goal is to cover all subjects.
NISS now wants people to give (four) LC subject headings when sending in a resource through the 'send-in-a-resource' form, but it remains optional to add that information. On their information page they write: "these headings represent a structured natural language thesaurus which has been applied to all records in the NetFirst service ... In order to make NISS resource descriptions more compatible with NetFirst records we are encouraging the use of Library of Congress Subject Headings as a supplement to the UDC classification ..." (NISS 1996) (see also section 2.2.1. on UDC).
INFOMINE <URL:http://lib-www.ucr.edu/> started in January 1994 as a project at the Library of the University of California, Riverside. It "currently enjoys participation from librarians at all nine University of California campuses and Stanford University and is a good example of a multiple-campus, shared Internet resource collection project" (INFOMINE, 1997).
It is a combined search and browse service and it offers descriptions of and access to about eight thousand Internet resources, all of which are said to be of academic interest. Browsing can be done by date ('what's new'), title of resource, table of contents (i.e. resource titles arranged under their subjects), subject and keyword. Searching is done in the following fields: title, subject and keyword. INFOMINE uses LCSH when cataloguing Internet resources, so each record in the database is given a number of Library of Congress Subject Headings, as is normally done for books in libraries. These subject headings are then used as the subject terms under which the resources are organised in the browsing structure, although they are displayed alphabetically. Before the user gets the opportunity to search or browse, he/she has to choose one of the following main subjects/classes on the top page of the service: 1. Biological, Agricultural, and Medical Sciences, 2. Government Info., 3. Instructional resources: K-12, 4. Instructional resources: University, 5. Internet Enabling Tools, 6. Maps & GIS, 7. Physical Sciences, Engineering, Computing, and Math, 8. Regional & General Interests, 9. Social Sciences and Humanities, 10. Visual and Performing Arts.
Each of these is called an INFOMINE, and they are not related to either LCC or LCSH practice. When the search results are presented, there is a list at the end from which the same subject heading can be searched for in the other INFOMINEs.
The minimum number of LCSH applied to each resource is 2-3, but many records have more than this, usually about 4-8 headings in addition to several keywords. There is no upper limit on the number of LCSH terms used <URL:http://lib-www.ucr.edu/pubs/postlcsh.html>.
Several of the examples presented here try to cover a wide range of subjects, "the total knowledge of the world": all (or most) of the classes in LCC. None of them aims to be a limited service covering only one subject. LCC is an international universal classification scheme, and is therefore unlikely to be the best choice for a service providing extensive information within one specific subject area. For that it would probably be more convenient to use an international subject-specific scheme. However, CyberStacks covers Science and Technology and seems to find LCC detailed enough for these subjects.
There is a difference between NetFirst and NISS on the one hand and the other services mentioned on the other. The former make database records of each resource, which are then organised mechanically in classification systems/structures, whereas the other services build systems of Web pages, match these to the classification scheme and link to the resources from the appropriate schedule.
The classification in these Web page based services is mostly produced in a rather superficial way and subject headings are not used to describe the resources, so there is no opportunity to build connections between subject headings and classification. The notations exist all the way through different levels in NetFirst and NISS, whereas they disappear rapidly in the other services, with the notable exception of CyberStacks. The possibility to search for classification numbers (or subject headings) is only offered in NISS and NetFirst. To navigate in the other services the user can only browse. Some of the services offer an alternative access to their collections by listing the titles of the resources alphabetically.
Often the services seem to have changed the original heading of the notation, and several of them have not used the precise notation belonging to LCC. As soon as this takes place, the services have altered the schemes, and to all intents and purposes, have developed a different classification scheme.
LCC is used extensively in the United States, Canada and Australia, principally in academic libraries in both card and online catalogues. LCC notations are present in many records in OCLC (Online Computer Library Centre) and RLIN (Research Libraries Information Network), but their presence depends on the institution inputting the bibliographic record. The Library of Congress, of course, always provides an LC notation, but other libraries have the option of supplying other classification numbers or none at all.
This can be compared with the use of LCC and LCSH in the Nordic countries. Out of the 40 largest libraries in Sweden one uses LCC, and another uses a local version of LCC. LCSH are not used at all in Sweden whereas in Norway the subject headings are used to some extent but LCC is not used at all.
Many library catalogues in the world have hidden LCC notations in their records, since cataloguers do not always make a point of erasing this field when importing LC records mechanically. One example is LIBRIS, the Union catalogue of Swedish research and special libraries.
Some OPAC systems can search on Library of Congress Classification numbers. The number is held in field 050 of the USMARC bibliographic record.
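As a rough illustration of such a search, the sketch below pulls the classification number out of field 050 and filters records by a class prefix. The record layout (a dictionary mapping a tag to a list of fields, each a dictionary of subfield codes to lists of values) is an assumption made for this example and not a real MARC library's API. The sketch reads the classification number from subfield $a of field 050.

```python
def lc_class_numbers(record):
    """Collect the $a subfields of every 050 field in one record."""
    numbers = []
    for field in record.get("050", []):
        numbers.extend(field.get("a", []))
    return numbers


def search_by_class(records, prefix):
    """Return records with at least one 050 $a starting with `prefix`.

    This is a plain string prefix match, so 'H' also matches 'HE...'.
    """
    return [
        record
        for record in records
        if any(number.startswith(prefix) for number in lc_class_numbers(record))
    ]


# Invented sample records, for illustration only.
SAMPLE_RECORDS = [
    {"245": [{"a": ["Invented title A"]}], "050": [{"a": ["HE305"], "b": [".X1"]}]},
    {"245": [{"a": ["Invented title B"]}], "050": [{"a": ["QD467"], "b": [".X2"]}]},
    {"245": [{"a": ["Invented title C"]}]},  # imported without an 050 field
]
```

With these invented records, `search_by_class(SAMPLE_RECORDS, "HE")` returns only the first record. A record lacking field 050 is never matched, which mirrors the point made above that the presence of an LCC notation depends on the library that supplied the record.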
LCC is an American system and has no multilingual capability. There is no well-known translation of the LCC schedules. The notation itself is not language dependent, since it is an enumerative system using letters (Latin) and numbers (Arabic) that are used in a considerable part of the world. Some classification numbers have captions in multiple languages, but these are primarily in the law schedules.
Since LCC is a general classification scheme it is likely to be less detailed in specific subjects than subject specific schemes. There are no examples of services using LCC for the classification of resources within a single subject area, like subject based information gateways.
This class is up-dated regularly and thus new subjects or aspects are added quite quickly. The fairly new and fast growing subject 'Computer Engineering, Computer Hardware' is covered by the notation span TK 7885-7895, a subclass of 'Electrical Engineering, Electronics, Nuclear Engineering' TK 1-9971.
Being a general scheme, LCC is not as good as Ei (Engineering Information Inc.'s subject specific classification scheme) and other subject specific classification schemes for detailed classification of large collections of engineering resources.
The class Fine Arts (N) still has many empty notations. It is organised by form (sculpture, painting etc.) first, then by chronology or nationality, and finally by artist.
The social sciences are covered in different ways in LCC depending on which country is being classified, e.g. sometimes Law and Economics are covered, sometimes they are not. LCC reflects an American way of understanding the social sciences and how to organise resources within the subject. This does not always fit the ideas or demands of other countries.
The social science subject service SOSIG chose UDC to be compatible with other national services in the UK such as NISS and BUBL. They did not take into account any other universal schemes such as LCC or DDC at the time that they made this choice.
Marcella and Newton note that "LC is the least international of the major general classification schemes. In its coverage it predominantly reflects a national collection; there is a distinct bias towards the social structure, history, law and cultural concerns of the United States. The notation is complex and not truly comprehensible internationally. In particular, the use of Cutter numbers, which has a linguistic dimension, is not likely to be consistently applied internationally." (Marcella and Newton 1994).
By contrast, the U.S. bias among LCSH in recent years has diminished, as a consequence of the inclusion of headings contributed by libraries other than the Library of Congress.
LCC's development history means that it can be seen more as a gathering of a whole range of special subject classification schemes. It is up-dated in subject areas that change regularly, but special libraries still do not seem to think that their subjects are covered well enough. The system is developed through continuous revision, though only in part: some schedules have not been up-dated since the nineteen-sixties, e.g. PB-PH Modern European Languages (1966).
LCC is sizeable and comprehensive, and there are hundreds of letter-number combinations left unused, making it suitable for expansion in the future.
The authority record for subject headings has a field for a classification number if there is a correlation. The USMARC bibliographic record format has fields for both classification data and for controlled subject headings. This provides one mechanism for linking the two.
The USMARC Format for Classification Data has fields for Index Terms (fields 700-754), so that LCC (or DDC) classifications can be expressly linked with subject or thesaurus terms like those in LCSH or MeSH (Guenther 1996, p. 190).
Vizine-Goetz (1996b) notes that "For LCC, explicit links between LC subject heading and class numbers occur in LC Subject Authority records that contain classification number fields. In an analysis of the LC Subject Authority file, Vizine-Goetz and Markey (1989) found that about 43% of topical subject heading records (MARC tag 150) contain LC classification number fields. Science and technology classes account for almost half (47.72%) of the class numbers. Efforts to improve the index to LCC may also lead to better links between LCC and LCSH."
"the LC Cataloging Policy and Support Office is reviewing the index structure of the LCC schedules and is consulting with classification expert Lois Chan on the design of a combined index to LCC. It is very likely that this work could lead to future efforts to form better links between LCC and LCSH." (Vizine-Goetz, 1996b)
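The explicit links discussed above, where a topical subject heading record (MARC tag 150) may or may not carry a classification number field, can be sketched as a simple index. The record layout is an assumption for illustration, as is the use of tag 053 for the classification number; the sample headings are invented placeholders, not entries from the LC Subject Authority file.

```python
def heading_to_classes(authority_records):
    """Map each topical heading (tag 150) to its class numbers, if any."""
    index = {}
    for record in authority_records:
        heading = record.get("150")
        if heading is None:
            continue  # not a topical subject heading record
        # Assumption: class numbers sit in tag 053 of the authority record.
        index[heading] = list(record.get("053", []))
    return index


def share_with_class_numbers(index):
    """Fraction of topical headings that carry at least one class number."""
    if not index:
        return 0.0
    linked = sum(1 for classes in index.values() if classes)
    return linked / len(index)


# Invented placeholder records, for illustration only.
SAMPLE_AUTHORITIES = [
    {"150": "Placeholder heading one", "053": ["HE305-HE311"]},
    {"150": "Placeholder heading two", "053": ["TK7885-TK7895"]},
    {"150": "Placeholder heading three"},
    {"150": "Placeholder heading four"},
    {"151": "Placeholder place name"},  # skipped: no tag 150
]
```

Run against a real authority file, `share_with_class_numbers` would yield the kind of proportion reported by Vizine-Goetz and Markey; on the invented sample it only demonstrates the calculation.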
The printed LCSH does provide examples of possible LC classes, but these are only suggestions. For some subject headings a range of LC classes is given, for others several different classes depending on the aspect of the subject matter, and for many no classes are mentioned at all.
LCC is very extensive and unfortunately there is no official comprehensive index. The printed edition of the Library of Congress Subject Headings is the publication most likely to serve as a substitute for such a comprehensive index. It contains references to one or more class numbers after entries and subdivisions but no effort is made to maintain class notations in LCSH.
LCC is linked to DDC and other classification schemes (including UDC and NLM) in many MARC catalogue records supplied by bibliographic utilities (for more details see also 2.1.5 and 2.1.6: DDC review).
With reference to the USMARC Format for Classification Data, LC made a commitment in 1993 to complete the conversion of all LCC schedules into machine-readable form (Guenther 1996, p. 178) and this classification database is still growing and being improved. This machine-readable version of LCC fulfils a need first identified by Chan (1986). Currently, only a test file is available: <URL:http://lcweb.loc.gov/cds/newform#lccr>
Many LCC schedules, though not all, are available on a CD-ROM product called Classification Plus, available from the Library of Congress. Information on this product is available at: <URL:http://lcweb.loc.gov/cds/cdroms1.html#classplus>
LC does not offer any schedules or manuals online, i.e. on the Internet, but LC is represented on the Net and their Cataloging Distribution Service (CDS) describes their product range at: <URL:http://lcweb.loc.gov/cds/cdsintro.html>
There is, however, a private initiative by Matt T. Rosenberg, who has published LCC at a page called 'Library of Congress Classification System' at: <URL:http://www.geocities.com/Athens/8459/lc.html>. This site presents LCC in hypertextual form and allows browsing of notation ranges on the World Wide Web. It includes only schedules and does not include resources classified in accordance with the system. The site is not an authorised official Library of Congress service, and as it is still under construction, not all classes are currently completed.
The classification system is not copyrighted in the United States, but it is elsewhere. As with DDC (see section 2.1.8), the classification numbers could be used without restriction, but not the accompanying textual material.
More information on LCC can be obtained from the Cataloging Distribution Service of the Library of Congress. <firstname.lastname@example.org>
LCC is generally not able to reflect the language of the material being catalogued, with some exceptions in some areas of the literature schedules (Wynar 1992).
Up-dating is carried out in committee and announced in a number of publications (both printed and online), most of which are available from the Library of Congress: revised editions of individual schedules; Library of Congress Classification - Additions and Changes; Library of Congress Classification Schedules: A Cumulation of Additions and Changes; Cataloging Service Bulletin; and Library of Congress Subject Headings. Information about developments and news in LCSH can also be found on the Internet. The LC Weekly Subject Headings List is at <URL:gopher://marvel.loc.gov:70/11/services/cataloging/weekly>. LCC is said to be flexible and "functionally up-to-date" since revisions are made independently within special classes and subclasses.
There are concordances from the Dutch national scheme NBC to and from LCC. These concordance tables are used in the Pica automated system.
Page maintained by: UKOLN Metadata Group
Last updated: 14-May-1997