The library, the catalogue, the broker

brokering access to information in the hybrid library

Lorcan Dempsey, UK Office for Library and Information Networking, University of Bath


This is a preprint version of an article which appeared in:
S. Criddle, L. Dempsey, R. Heseltine (eds). Information Landscapes for a Learning Society. London: Library Association, 1999.
A slightly revised version appeared in The New Review of Information Networking, Vol. 5, 1999, pp. 3-26.

Please refer to the print version in any citation.


Introduction

In his chapter on libraries and librarians in A history of reading, [1] Alberto Manguel calls librarians `ordainers of the universe', an epithet used, he tells us, by the Sumerians. He discusses the efforts of Callimachus to ordain the order of books at the Library of Alexandria and notes that `With Callimachus, the library became an organized reading-space'.

These phrases are useful handles on which to hang a view of what it is libraries do. Libraries may no longer aim to collect and classify all documented knowledge, but their selection and acquisition policies have `ordained' the view of knowledge and learning their readers have had, as well as which materials have become a part of the intellectual record libraries jointly create with archives, museums and others. The library has further organized these materials in ways that are useful to their users; it is not merely an unordered aggregation. In this paper I want to explore some aspects of this organization in a new environment. What `organized reading-spaces' will libraries create in a network society?

Organization, libraries and catalogues

Organization is central to what libraries do, and is a large part of the value they add. As such it is not surprising that the `organization of knowledge' has been central to the disciplinary claims of librarianship and an elaborate apparatus of cataloguing, classification and bibliography has developed. Indeed, Callimachus stands near the head of this tradition. He was responsible for the production of a work in 240 B.C. which Norris describes in her History of cataloguing as `a classified catalogue, a bibliography and a biographical dictionary all in one'. [2] The subsequent history of the catalogue provides interesting examples of discussion and argument about the value, and the cost, of such organizational skills and labour. Norris traces the development from systematic-classed lists of the ancient world (`ordainers' of the universe as they sought to capture all knowledge in their classifications), through the inventories of the medieval period, to the realization that the catalogue had to be more than a mere listing: it `was a key to the library'. The construction of such a key required a specialist art and governing rules. Of central interest here is the debate surrounding the catalogue of the British Museum, and the opposition between Anthony Panizzi, who developed his seminal 91 rules for the construction of the British Museum Catalogue, and various lay proponents of a simpler -- cheaper and quicker -- list-based approach.

Why is a simple list not enough? Because neither the effective disclosure of library materials nor the user's best interests are well served by such a list. Much discussion of the nature and function of the catalogue goes back to the celebrated `objects' described by Cutter. [3] Simply stated, these are that the catalogue supports the finding function (by known author, title or subject), the gathering function (by author, or subject or kind of literature), and the selecting function (as to edition or as to character). An inventory or list will not support these functions very well. Authors may have different names. Works appear in many manifestations (Hamlet is a work; there are many manifestations of this work in different editions, compilations, formats, and so on). Works may not have titles or authors, or their titles or authors may not be clear. In a world where the description and collocation of works, manifestations, authors and subjects may be quite complicated, the organizational investment represented in the catalogue, and the skills and rules which enable it, are required to support the realization of Cutter's objects. Going beyond Cutter, some argue that the role of the catalogue is to surface through such organizational devices the knowledge that is represented in the collection, by allowing the user to navigate by subjects, works and authors.
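
The difference between a simple list and a catalogue that supports the gathering function can be illustrated with a small sketch. It is only an illustration under stated assumptions: the authority file, the records and the function names (authorized, gather) are hypothetical and do not reproduce any cataloguing code or system. A list would hold the three records as unrelated entries; the sketch collocates them under one authorized author heading and, within that, under each work.

```python
from collections import defaultdict

# Hypothetical authority file: variant form of name -> authorized heading.
AUTHORITY = {
    "Shakespeare, W.": "Shakespeare, William",
    "Shakspere, William": "Shakespeare, William",
}

# Hypothetical manifestation records, as a simple list would hold them.
MANIFESTATIONS = [
    {"work": "Hamlet", "author": "Shakespeare, W.", "edition": "edition A"},
    {"work": "Hamlet", "author": "Shakspere, William", "edition": "edition B"},
    {"work": "King Lear", "author": "Shakespeare, William", "edition": "edition C"},
]


def authorized(name):
    """Map a variant name to its authorized heading (or leave it unchanged)."""
    return AUTHORITY.get(name, name)


def gather(records):
    """Gathering function: collocate manifestations by author, then by work."""
    gathered = defaultdict(lambda: defaultdict(list))
    for rec in records:
        gathered[authorized(rec["author"])][rec["work"]].append(rec["edition"])
    # Convert nested defaultdicts to plain dicts for a stable result.
    return {author: dict(works) for author, works in gathered.items()}
```

Calling gather(MANIFESTATIONS) brings the two editions of Hamlet together under a single heading even though the records spell the author's name differently; the selecting function then operates on the editions listed under each work.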

Libraries have also organized the materials themselves, and in open access libraries this typically follows a classified approach. The reader can browse in `ordained' ways. The format of the materials -- books and journals -- is well known and has co-evolved with the development of libraries and the reading patterns of their users. These materials present further organizational devices -- contents pages, indexes, bibliographies -- which provide other avenues into the literature. [4] However, alongside the book collections are others -- reports, slides, journals, special collections, archives -- which are typically not well integrated into the catalogue itself or, physically, into the classified sequence on the shelves. They may be internally organized islands, where such organization is not consistent across collections. Each island may involve a different pattern of organization and use. They may have separate catalogues or other finding tools, and may be arranged in various ways, slides one way, for example, EU documentation another.

Until recently, these collections have been physically co-located in library buildings. In the largely print-based world, readers are accustomed to the labour of interacting with the apparatus of different collection types. The special collections, or the archives, or the slide collection, or the standards collection, might be in different places, with different levels of catalogue. Readers recognise that the catalogue deals with books, and that to discover whether a particular journal article is available may involve several steps in different tools. They move between the collections themselves, finding tools, conversation with colleagues, and advice from the librarian -- that `living catalogue only waiting to be consulted'. [5] A large part of the sustained experience of any library comes from the relationship between the various finding tools, the collections, and the places they collectively occupy. [6] These exist in complex relation, and complex practices and behaviours have developed around them, often specific to particular libraries. As in any complicated system, much use is directed by tacit understanding developed through custom and experiment. In fact, we know surprisingly little about such behaviour in the round: research tends to focus on the use of the catalogue, or on browsing behaviour, or on some other individual component; similarly, the progress of automation has been piecemeal, task by task. This is one reason why, despite several years of attention, we do not have very well developed views of what digital or hybrid libraries will be like. It is also a reason for occasional misunderstanding between reader and library over a particular change: where the latter might see a specific improvement or saving, the former might see disruption in a pattern of behaviour or expectation.

As digital resources multiply, organizational and behavioural practices are being modified, typically, at present, in ad hoc ways. Such resources introduce new `islands': they are usually not part of the traditional organizational apparatus, whether realized in the catalogue or through physical arrangement. There may be an electronic reserve collection; a collection of CD-ROMs; document discovery and delivery services (FirstSearch, BIDS, etc.); access to collections of specialist data sets (through the Arts and Humanities Data Service, or MIDAS (Manchester Information Datasets and associated Services), or the Data Archive, for example). The multiplication of such islands has several drawbacks. For users, it means that the use of resources becomes complicated. There may be diversity of organization of resources (different collection types, different catalogues); diversity of user interface and interaction pattern; there may be diversity of logins and conditions of use. Such diversity is a barrier to use: it erects fences, wastes time and dampens demand. For libraries, it means that there are additional demands in terms of training, support and collection management: as diversity increases, scale economies diminish.

How do these new developments relate to the traditional organizational techniques, the catalogue and the physical arrangement of materials?

The catalogue has only ever provided access to a part of the collection. Typically, the non-book `islands' have not been represented in it. Of particular significance here is the absence of the journal literature. It may contain title information, but typically not article details. As the volume and variety of non-book material increases, the catalogue increasingly becomes one resource among many. It is less and less the complete `key to the library'. Terry Hanson has discussed the motivation for developing an `access catalogue' which describes a range of resources to users, [7] and the LASER (Library Access to Selected Electronic Resources) system at Leeds is an interesting example of this. [8] However, the user now also has access to other resource description databases: for journals, for Internet resources, for electronic texts, and so on.

Clearly, these new resources are not physically integrated as part of a physical classified sequence. The experience may be partly replicated by providing web pages which present resources organized in simple ways. Increasingly, as suggested above, there may be databases of resource descriptions which may be used to present views of what is available. In this way, the intellectual and the physical arrangements come together. But there are some interesting ways in which the arrangement of digital resources differs from the physical. I have suggested that electronic resources do not currently have the uniformity of treatment which has developed with print resources. Each may have to be `learned' or handled in special ways. Furthermore, materials in the print world arrive in discrete packages, which are managed and used individually. Because of their `physicality', the user or the library has to make the connections or links. In the new network space, resources may communicate, connections may be made, resources may change shape or be reconfigured. So, for example, documents, bibliographic references, or scientific data may be imported into a user workspace. An intermediate system may interact with several resources on behalf of a user, as happens with the `clump' projects, where several catalogues are queried in parallel. [9] Data may be automatically collected and indexed, as happens with the web robots. Processes may be automated through communication between applications, as where data is passed between a search service, a document delivery service, and an accounting system, without the need for manual intervention.
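
The idea of an intermediate system querying several catalogues in parallel can be sketched as follows. This is not the software used by the clump projects: the catalogue names, the records and the functions search_catalogue and cross_search are hypothetical, and in-memory lists stand in for remote catalogues that would in practice be reached over a network protocol.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory catalogues standing in for remote targets.
CATALOGUES = {
    "library_a": [
        {"title": "Hamlet", "author": "Shakespeare, William"},
        {"title": "A history of reading", "author": "Manguel, A."},
    ],
    "library_b": [
        {"title": "Hamlet", "author": "Shakespeare, W."},
        {"title": "Future libraries: future catalogues", "author": "Oddy, P."},
    ],
}


def search_catalogue(name, term):
    """Return records from one catalogue whose title contains the term."""
    term = term.lower()
    return [
        dict(record, source=name)  # tag each hit with its catalogue of origin
        for record in CATALOGUES[name]
        if term in record["title"].lower()
    ]


def cross_search(term):
    """Query every catalogue in parallel and merge the result sets."""
    names = sorted(CATALOGUES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # map() returns result sets in the same order as the names.
        result_sets = list(pool.map(lambda n: search_catalogue(n, term), names))
    merged = []
    for records in result_sets:
        merged.extend(records)
    return merged
```

A call such as cross_search("hamlet") returns one record from each catalogue. The merged set keeps the two different forms of the author's name, which is a small instance of the differences in practice that become visible when catalogues are searched together.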

We are only in early stages of such developments, but this variety of resource, and of organizational approach, is characteristic of what is coming to be known as the `hybrid library'. The hybrid library can be understood as an organized attempt to come to terms with the multiple islands that library services are increasingly becoming and to reduce the difference in patterns of access and management between those islands.

To finish this part of the discussion, I want to pick up two points. The first is to do with the implications of some of this change for the `organization of knowledge' tradition of libraries. The second takes forward the remark made earlier about value and cost.

I began by talking about the goals of the catalogue, as elaborated by Cutter. An apparatus of codes and practices has developed to realize these goals: the assignment of headings and, perhaps, further authorities work bring together names, works and subjects; descriptions allow retrieval of manifestations; and so on. Of course, these goals may be imperfectly and variably realized in today's catalogues. Furthermore, cataloguing theory has been developed in an environment where the catalogue typically exposed the content of a particular collection under a single organization's control, where that collection typically contained books, and where the catalogue was realized by manual means. The structure of the catalogue may not be apparent in some automated catalogues, and is not very effective across collections. In fact, the information retrieval approach, and its extension in network protocols such as Z39.50, involves a `flattening' of structure: the model is one of individual, unrelated records which describe information objects. So, the catalogue may not be well equipped to satisfy these theoretical cataloguing objectives. Increasingly, readers may be allowed to search across several catalogues, individually or as a federated resource, meaning that differences in practice between libraries and catalogues may be apparent. In a new development, the catalogue may be looked at alongside resources created within different curatorial traditions (archival finding aids) or from different sectors (abstracting and indexing services, online book sellers), where the same structuring apparatus may not be used. These issues raise serious questions for libraries and their practices -- both in terms of their theoretical basis and their practical application -- but further exploration is beyond my scope here.

Whatever the current state of the catalogue, a more general form of the argument advanced above in its defence remains relevant. The hybrid library cannot be a mere collocation of services, a listing on a web page. Where is the added value the library provides? The value of the library is that it saves users' time, that it releases the value of the resources it manages, that it effectively brings together users and resources over time. Organization continues to be central, but the techniques used need to develop with the needs of the user and the characteristics of the materials. In particular it is becoming clear that it is not enough for the library to provide access to a part of its collection through a catalogue; another through a set of annotated web links or resource database; another indirectly through abstracting and indexing services; and so on; with no relation between them. It is likely that the collections and services will be brought together at some higher level. Furthermore, it will be increasingly difficult to consider these services separately from the wider fabric of resources that is available. And it is becoming more important to support other services than just searching or discovery: network resources can communicate with each other, and this involves further thinking about how to present and support user services. I consider some of these issues further in later sections. The purpose of the comparison with the catalogue is to highlight that the `added value' may not be obvious, nor may it emerge directly from user requirements. Indeed, its value may need to be promoted and defended.

Organization, libraries and networked information

I want to consider some of the characteristics of network resources, and of the environment in which they are used. Let us take first, as an example, the resources that might be available to the reader who is interested in books and journals: catalogues; abstracting and indexing services; document delivery services; and so on. Some of the characteristics of available resources are:

This lack of organized access has some implications:

This is the situation within the area traditionally looked after by libraries. However, library users might expect organized access to a wider range of materials. Libraries now operate in a shared network space which brings together users, service providers and resources in new combinations and balances. So, there are new divisions of labour in the learning and information domains. Take the example of document supply, where publishers, libraries and aggregators are realigning the pattern of delivery; or the creation of new learning environments which may bring together learning technologists, library and information people, and so on, in the creation of a new type of service. There are new forms of user behaviour and expectation, as for example, where communication technologies are reaching into writing and learning environments. There are also cross domain convergences, where previously distinct activities may be brought into new relation. For example, we are likely to see much greater collaboration in relation to cultural heritage and local history where libraries, archives and museums recognise shared access and preservation concerns. Or again, greater link up between the library and booksellers, publishers, or other suppliers of materials, as they recognize a shared interest in providing services which meet the various needs of users. Some libraries have bookshops on their premises; why not link their online services also?

These influences again point to an environment in which heterogeneous, autonomously managed information resources continue to increase in volume and variety. We can suggest some characteristics of this wider environment and its relationship to library services:

I suggested above that the goal of the hybrid library was somehow to overcome the fragmentation of the library service into multiple islands. We can now recognize that there are also multiple other islands in the network space the user occupies, and that the library needs to work to provide effective access to some of these also. This is in the midst of rapid organizational, service and technical change which makes taking decisions difficult. As other communities are facing some of these issues, and as libraries may have to try to provide access to resources outside of their control, common approaches at various levels would be useful. However, at the moment, as discussed by Ray Lester elsewhere in this volume, there is limited opportunity to seek consensus across these communities.

How might we move forward? What does integration mean? Some examples of what it might cover are:

In recent years, `broker' services have emerged which aim to provide some or all of these integration services. In the next section I explore broker systems, with particular reference to the library issues. Typically a broker will provide support for projecting a unified service across some part of the independently produced, distributed resources we have been discussing.

Organization, libraries and brokers

Building blocks

In his discussion of networked heritage information, Mike Stapleton provides a useful list of building blocks. I repeat this here with my own gloss. Mike's text is in quotation marks.

Brokers

Some examples of brokers from the library and related domains are listed in this section. In each case, an application, or `external mediator', facilitates access to diverse resources and supports data flow. The broker will be designed to support a particular business need, which determines what types of integration it supports. The term `broker' may be familiar from a distributed object environment, but I intend no specialised meaning here. A broker may be a set of annotated web links; it may deploy a more sophisticated apparatus which supports a richer business model or quality of service. The examples given here do rather more than provide a set of web links or resource database, even if they are still rather early examples.

The focus of some of these developments is discovery: to varying degrees, they hide differences and collate the results from several different underlying discovery systems. Some go beyond this to address several functional areas and allow data to flow between them. For example, the hybrid library project Agora aims to pass data about selected articles from a discovery to a `locate' function where it may be matched against some holdings data; data may then be passed to a request function, where it forms the basis of a request message. It hides the difference between different discovery systems and different request systems.
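
The flow of data from discovery, to locate, to request can be sketched in a few lines. This is a minimal illustration and not the Agora implementation: the article descriptions, the holdings table and the functions discover, locate, make_request and run_flow are all hypothetical.

```python
# Hypothetical article descriptions held by a discovery service.
ARTICLES = [
    {"id": "a1", "title": "Distributed searching", "journal": "Journal A", "year": 1997},
    {"id": "a2", "title": "Distributed indexing", "journal": "Journal B", "year": 1998},
]

# Hypothetical holdings data: journal title -> years held locally.
HOLDINGS = {"Journal A": range(1990, 2000)}


def discover(term):
    """Discovery: find article descriptions whose title contains the term."""
    return [a for a in ARTICLES if term.lower() in a["title"].lower()]


def locate(article):
    """Locate: match an article description against the holdings data."""
    years = HOLDINGS.get(article["journal"], ())
    return "local" if article["year"] in years else "remote"


def make_request(article, location, user):
    """Request: turn a located article into a request message."""
    return {"user": user, "article_id": article["id"], "supply_from": location}


def run_flow(term, user):
    """Pass data from discovery, through locate, to request without re-keying."""
    return [make_request(a, locate(a), user) for a in discover(term)]
```

The point of the sketch is that the description selected at the discovery stage is reused by the later stages, so the reader does not have to carry details by hand from one system to the next.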

These examples prompt some comments, before discussing brokers in more general terms:

MODELS and brokers

This discussion draws on work still developing within our MODELS (Moving to Distributed Environments for Library Services) project, which is describing the MODELS Information Architecture (MIA). [16] MODELS started from the view that the development of a high-level architecture was a useful support to discussions of interworking in the library area. It has moved forward through a series of invitational workshops and desk research, and it has influenced the development of several systems and services in the UK and beyond. The MIA provides a common frame of reference and vocabulary which has achieved some currency. The current state of the MIA is described elsewhere and I do not propose to discuss it in detail here. [17] The initial work described a broad, high-level framework (see Figure 1.1); this is now being refined with a view to implementation, and will be further reported elsewhere.

A model of a broker

The broker provides infrastructure for managed access to physically distributed resources. MODELS has generalized the services provided by such `brokers' in the following way:

These are logical functional groups which have worked well when measured against a range of emerging developments. The advantage of such an approach is that it separates different aspects. So if a service is offered through a different protocol, or if a new service is added, it should not be necessary to change the user access level. The appropriate transformations will be effected in the middle layer. Similarly, users may see available resources through different landscapes without having to alter the way in which those resources are organized. We begin to see how new resources might be routinely `shelved' by being added to the lower layer. We can also see how the flexibility introduced in the user access layer makes it plausible to consider a variety of customized approaches into the available resources.
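
The separation of layers described above can be made concrete with a small sketch. It is not the MIA itself, and none of the names come from MODELS: the classes Resource and Broker, the two adapter functions and the protocol labels `keyword' and `fielded' are assumptions made for illustration. The user access layer is the single search method; the middle layer is a table of adapters, one per protocol; the lower layer is the list of shelved resources.

```python
class Resource:
    """A resource in the lower layer, reached through a named protocol."""

    def __init__(self, name, protocol, records):
        self.name = name
        self.protocol = protocol
        self.records = records


def keyword_adapter(resource, term):
    """Middle-layer adapter for resources holding plain title strings."""
    return [title for title in resource.records if term in title.lower()]


def fielded_adapter(resource, term):
    """Middle-layer adapter for resources holding fielded records."""
    return [rec["title"] for rec in resource.records if term in rec["title"].lower()]


class Broker:
    """Minimal three-layer broker: user access, mediation, resources."""

    def __init__(self):
        self.adapters = {}   # middle layer: protocol name -> adapter function
        self.resources = []  # lower layer: the shelved resources

    def register_adapter(self, protocol, adapter):
        self.adapters[protocol] = adapter

    def shelve(self, resource):
        """Add a resource to the lower layer."""
        self.resources.append(resource)

    def search(self, term):
        """User access layer: one call, whatever protocols lie beneath."""
        results = {}
        for resource in self.resources:
            adapter = self.adapters[resource.protocol]
            results[resource.name] = adapter(resource, term.lower())
        return results
```

Shelving a new resource that uses a different protocol requires a new adapter in the middle layer, but the call made at the user access layer, broker.search("Hamlet"), stays the same.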

We can also see the advantages of increasing modularization. The broker may be able to call service components from various places to provide its service. For example, terminology services (thesauri) might be used in query expansion.
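
As a minimal sketch of how a terminology service might be called for query expansion, consider the following. The thesaurus entries and the function expand_query are hypothetical; a real terminology service would be a separate component reached over the network, with far richer relationships than simple lists of alternative terms.

```python
# Hypothetical thesaurus: preferred term -> alternative terms.
THESAURUS = {
    "catalogue": ["catalog", "opac"],
    "journal": ["periodical", "serial"],
}


def expand_query(terms, thesaurus=THESAURUS):
    """Add thesaurus alternatives to each query term, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

A broker could apply such a function to a reader's query before passing it to the underlying resources, so that a search for `journal' also retrieves records indexed under `periodical' or `serial'.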

This is a somewhat idealised presentation, and current brokers in the areas under discussion are rather less well developed than this sketch suggests. The purpose of MIA has been to provide some common ground for discussion of these issues. It is also clear that in implementation many difficulties will be experienced. These brokers are typically working with large, diverse collections of legacy data.

A note on metadata

In the library domain, discussion has tended to focus on so-called `item' level metadata (descriptions of individual books, articles, and so on). The new environment has new requirements. The broker needs to have access to various types of metadata to support its operation. This is data about its environment and the resources in it. It should be clear that metadata is of central importance. Agility in a distributed environment depends on the ability to identify and use components, and this increasingly relies on metadata. Metadata will pervade distributed information environments. [19] Metadata will be associated with information objects, with applications, with people, with organizations. It will support operations by people and by programs, providing them with advance knowledge of the characteristics of objects of interest and supporting sensible behaviour. In relation to brokers we can note:

In the current environment it is likely that brokers may be configured with this type of data. In due course it will be stored in directory and registry services which the broker queries.
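
A broker configured with collection-level metadata might look something like the sketch below. Everything in it is an assumption made for illustration: the fields of CollectionDescription, the two sample descriptions, the example.org addresses and the function select_collections do not follow any particular collection description scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionDescription:
    """Hypothetical collection-level metadata held by a broker."""
    name: str
    subjects: tuple
    protocol: str
    address: str


# Hypothetical local configuration; a registry query could replace this list.
LOCAL_CONFIG = [
    CollectionDescription(
        "Main catalogue", ("general",), "z39.50", "catalogue.example.org:210"
    ),
    CollectionDescription(
        "Slide collection", ("art", "architecture"), "http", "http://slides.example.org/"
    ),
]


def select_collections(subject, descriptions=LOCAL_CONFIG):
    """Choose the collections worth searching for a given subject."""
    return [
        d for d in descriptions
        if subject in d.subjects or "general" in d.subjects
    ]
```

Because select_collections takes the descriptions as a parameter, the same selection logic would apply whether the data comes from local configuration, as here, or from a directory or registry service.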

There are various ways to create machine- and human-readable descriptions of collections, applications and user profiles at the moment. None commands universal assent. Approaches may be embedded in particular application and/or professional domains. A review of some current approaches to collection and service descriptions has been prepared as part of the MODELS project. [21]

Brokers, islands and hybrid libraries

I have suggested that one way of looking at the hybrid library is to see it as an organized attempt to come to terms with the multiple islands that library services are increasingly becoming and to reduce the difference in patterns of access and management between those islands. Islands may be collections which have their own catalogues, organizational patterns or access mechanisms. They may be network or CD-ROM databases or repositories, or print or other material collections. And so on. There may also be service islands with different procedures and forms of interaction (Inter Library Loan, for example).

Such islands will be material -- a part of the physical library collections -- or digital. Many, but not all, material collections will have digital catalogues. Digital resources -- image and document repositories, catalogues, and so on -- will be interfaced to the network in various ways, allowing different levels of interaction and data exchange.

Initial approaches to organizing such environments have focused on developing organized sets of links. Resource databases take this a step further, providing search and browse access to descriptions of collections and services. These might be seen as simple brokers which provide discovery services. However, they deliver us to the door of resources, rather than delivering the content of resources to us. Broker services are emerging which support a complex of services. These operate in particular domains (library, museum) and so far have limited production use. Other domains are developing similar approaches. Brokers may have different business aims. A common requirement is to allow cross searching of heterogeneous metadata resources (e.g. library catalogues). Another is to automate end-to-end processes (e.g. Agora will attempt to automate the whole chain of document discovery, locate, request, deliver). Brokers may integrate access to other service components, terminology services, authentication services, user directories, and so on. A developing pattern seems to be where an `information landscape' is presented which allows navigation, discovery and selection of collections and services, followed by resource specific interaction. Standards-based interaction between brokers and resources confers some benefits, but many resources are not made available through standard interfaces. The level of interaction possible between a broker and resources will vary. For this and other reasons the level of abstraction away from underlying resources the broker provides may not be very high in some cases. The ultimate ambition is to present a unified service over independently developed resources. Early experiences suggest that the construction of brokers is a complex undertaking, and it will be interesting to see how far they are developed. It should be noted that broker access to a resource is not incompatible with continued access direct to the resource itself. 
A fuller treatment of some of the problem issues that might be encountered in trying to build a system which brokers access to document discovery and delivery services is given elsewhere. [22]

Some final notes on brokers:

Conclusion

I began with some discussion of the catalogue, and moved on to a general discussion of brokers. They are different types of thing, and, indeed, one of the challenges for the library is to broker access to different catalogues, or to the catalogue alongside other resources. The association is for a different reason. Libraries add value by saving users' time and effort, by ensuring they are united with the resources most useful to them, by releasing the value of resources over time. To continue to provide these services, libraries must move to a new level of activity which involves integration, not merely collocation, of services. In the current environment, how to do this is not straightforward and we are witnessing a range of interesting experiment and exploration. Users may not expect libraries to `ordain the universe', but they do look to them to help them do useful things in a complicated network space. Thinking about how to do this brings us to the centre of what libraries are about.

References

1. Manguel, A. A history of reading. London: HarperCollins, 1996.

2. Norris, D. M. A history of cataloguing and cataloguing methods 1100-1850: with an introductory survey of ancient times. London: Grafton & Co, 1939.

3. Cutter, C. Rules for a dictionary catalogue. 4th ed. Washington: Government Printing Office, 1904.

4. This complementarity is discussed in: Oddy, P. Future libraries: future catalogues. London: Library Association Publishing, 1996.

5. These words are attributed to Sir Henry Ellis, Director of the British Museum in Panizzi's time, in Norris, op cit. 204.

6. This relationship is further briefly discussed in: Dempsey, L. Afterword: places and spaces. In: Towards the digital library: the British Library's Initiatives for Access programme. London: British Library, 1998.

7. Hanson, T. The access catalogue gateway to resources. Ariadne, 15, 1998. Available at <URL:http://www.ariadne.ac.uk/issue15/main/>.

8. <URL:http://www.leeds.ac.uk/library/laser/>

9. Dempsey, L and Russell, R. Clumps -- or organized access to the scholarly record. Program, 31(3), 1997, 239-249.

10. Bell, A. The impact of electronic information on the academic research community. The New Review of Academic Librarianship, 3, 1997, 1-24.

11. Winograd, T. From computing machinery to interaction design. In: Denning, P. and Metcalfe, R. (eds), Beyond calculation: the next fifty years of computing, Springer-Verlag, 1997, 149-162. Also available at <URL:http://hci.stanford.edu/~winograd/acm97.html>.

12. Blinco, K. LIDDAS -- an Australian document delivery project. Presentation at Information Ecologies: the impact of new information 'species'. A conference organized by the Electronic Libraries Programme and coordinated by UKOLN. 2-4 December 1998, Viking Moat House Hotel, York. Presentation is linked from conference report at: <URL:http://www.ukoln.ac.uk/services/elib/events/information-ecologies/>.

13. Dempsey, L. and Russell, R. `Clumps' -- or distributed access to scholarly material. Program, 31(3), July 1997, 239-249.

14. The ROADS cross searching service is available at <URL:http://www.ukoln.ac.uk/metadata/roads/crossroads/>.

15. Valkenburg, P. (ed). Standards in a distributed indexing architecture, draft version 1. 24 February 1998. <URL:http://www.terena.nl/projects/chic-pilot/standards_v1.html>

16. Further information about MODELS can be found at <URL:http://www.ukoln.ac.uk/dlis/models/>.

17. The discussion here and elsewhere throughout this paper draws on: Dempsey, L; Russell, R; Murray, R. A utopian place of criticism? Brokering access to network information. Journal of Documentation, 55(1), 1999, 33-70.

18. <URL:http://www.cdl.org>

19. Dempsey, L. and Heery, R. Metadata: a current view of practice and issues. Journal of Documentation, 54(2), 1998, 145-172.

20. Allen, J. and Mealling, M. The architecture of the Common Indexing Protocol (CIP). Request for Comments draft version 1. 1997. Available at <URL:ftp://ftp.ietf.org/internet-drafts/draft-ietf-find-cip-arch-01.txt>.

21. The MODELS collection description study is available at: <URL:http://www.ukoln.ac.uk/dlis/models/studies/>.

22. Dempsey, L; Russell, R; Murray, R. A utopian place of criticism? Brokering access to network information. Journal of Documentation, 55(1), 1999, 33-70.