Introduction

This document provides an overview of the technological issues surrounding the implementation of Clumps. Several possible types of implementation are discussed, and a list of related issues is presented along with some possible solutions. The document does not prescribe how Clumps should be built; rather, it suggests some possible implementations and assumes that individual Clump implementors will be best placed to provide a solution that most accurately matches their needs. However, one of the goals of the Clumping exercise is to provide interoperability between Clump implementations, so it is vital that the various groups produce solutions that will inter-work with one another by the end of the project.

Why clumps?

As more and more networked services come online, using different protocols to provide access to increasingly varied types of data, problems of service discovery and navigation become apparent. How does an individual searcher know which services are available? What type of data do they provide? How is that data accessed? How much does it cost? Which specialisations are best served by which services?

One possible solution to these problems is to produce "clumps" of data services that share common features. A set of databases may be "clumped" around a specific geographic location, or a subject specialisation, or an intellectual domain. Clumps can therefore help to organise the content of the network and help searchers to select the right databases for the particular search they are engaged upon.

Clumps may also represent networked services in their own right. In the library community, for instance, a Clump of databases might be formed to provide a logical union catalogue or a regional Inter-library Loan service.

What is a clump?

A clump may be no more than a list of databases that share some common features like:

• Regional location

• Content type

• Subject matter

• Domain type (archives, museums, libraries etc)

• Service type (ILL consortia, logical union catalogues, etc)

Clumps can be grouped into Physical and Virtual Clumps. A Physical Clump is a group of databases fronted by a single Target or Server, whilst a Virtual Clump is an ad hoc group of databases that are accessed individually through several servers but appear to the end user as if they were a single entity.

Virtual clumps may only exist for the duration of a particular search session, and they may only exist within the user's search client software. Most current Z39.50 clients allow the user to build up lists of known databases, so that for any particular search, the user can select a set of databases that seem appropriate to the search, and the client will search the selected list as if it were a single server and present the search results to the user.
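
The client-side behaviour described above, searching a user-selected list of databases as if it were a single server, might be sketched as follows. This is only an illustration: the names Record, SearchFn, search_virtual_clump and make_static_database are hypothetical, and the toy in-memory databases stand in for real Z39.50 sessions, which are not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    source: str  # name of the database that returned the record
    title: str


# A search function takes a query string and returns matching titles.
# In a real client it would wrap a protocol session; here it is a stand-in.
SearchFn = Callable[[str], List[str]]


def search_virtual_clump(
    databases: Dict[str, SearchFn], selected: List[str], query: str
) -> List[Record]:
    """Search each selected database in turn and merge the results into one list."""
    merged: List[Record] = []
    for name in selected:
        for title in databases[name](query):
            merged.append(Record(source=name, title=title))
    return merged


def make_static_database(titles: List[str]) -> SearchFn:
    """Build a toy in-memory database matching on case-insensitive substring."""
    def search(query: str) -> List[str]:
        return [t for t in titles if query.lower() in t.lower()]
    return search
```

The Virtual Clump here is nothing more than the selected list passed to the function. It exists only for the duration of the call, and each merged record keeps the name of the database it came from.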

Some "client-side" Clumping may be more permanent, in that the user may group commonly accessed databases under logical names, and then simply select the named group for a search session. Popular clients such as ZNavigator or Willow already provide this kind of functionality. One may even suggest that the classic Web Browsers also provide a similar concept with their hierarchical bookmark functions.
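
A more permanent client-side Clump of this kind amounts to a saved mapping from a logical name to a list of databases. The sketch below assumes a hypothetical ClumpBook class; it does not reflect how ZNavigator, Willow or any other client actually stores its groups.

```python
from typing import Dict, List


class ClumpBook:
    """Hypothetical client-side store of named database groups."""

    def __init__(self) -> None:
        self._groups: Dict[str, List[str]] = {}

    def save(self, name: str, databases: List[str]) -> None:
        # Copy the list so later changes by the caller do not alter the group.
        self._groups[name] = list(databases)

    def resolve(self, name: str) -> List[str]:
        """Return the databases in a named group; raises KeyError if unknown."""
        return list(self._groups[name])

    def names(self) -> List[str]:
        """Return all saved group names in alphabetical order."""
        return sorted(self._groups)
```

A user would save a group once under a logical name and then pass the result of resolve to whatever search routine the client provides.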

At the other end of the spectrum, Clumps based around a given service may represent Clumping at its most sophisticated. Here the Clump is no longer simply a figment of software but a physical organisation, with administrative staff and a group of member service providers who collectively form the Clump. End users of such a service may not care which databases actually provide the service; they are only interested in the services being offered.

Who creates, administers, and disseminates clump information?

There seem to be several types of player in the Clump community who may provide some or all of the information required:

• The database owners and database access providers

• The clump owners or service providers

• The end user's host organisation

• Third party information gateway services or agencies

• A national agency

And there are at least three levels of descriptive metadata required to describe a Clump and its components:

• The Clump itself

• The databases which contribute to a Clump

• The servers that provide access to the databases

The database owners are probably the best people to create metadata that describes their databases and servers. Clump level metadata may come from the Clump owners themselves if the Clump is a physical organisation. Otherwise, Clump information may be provided by third-party agencies or data gateway providers, either at the end user's site or at some data access service.
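
The three levels of descriptive metadata could be modelled as nested records, with a Clump referring to its databases and each database referring to the server that provides access to it. The sketch below is an assumption for illustration only: the class names and fields (ServerDescription, DatabaseDescription, ClumpDescription, basis, subject) are hypothetical and are not drawn from any agreed Clump metadata schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServerDescription:
    host: str
    port: int
    protocol: str = "Z39.50"


@dataclass
class DatabaseDescription:
    name: str
    subject: str
    server: ServerDescription  # the server that provides access to this database


@dataclass
class ClumpDescription:
    name: str
    basis: str  # what the members share, e.g. a region or a subject
    databases: List[DatabaseDescription] = field(default_factory=list)

    def servers(self) -> List[ServerDescription]:
        """Return the distinct servers behind this Clump, in first-seen order."""
        seen: List[ServerDescription] = []
        for db in self.databases:
            if db.server not in seen:
                seen.append(db.server)
        return seen
```

Keeping the levels separate reflects the division of labour suggested above: database owners could maintain the database and server records, while a Clump owner or third-party agency maintains only the Clump record that refers to them.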

Some unanswered questions are:

• Should Clump and Database information be held centrally, or should it be distributed throughout the network? If it is held centrally, should there be a single national centre, or should it be concentrated in several places (i.e. end user organisations, Clump providers, etc.)? If it is distributed, how does one locate it, and how does one navigate from one distributed Clump metadata site to another? How are the various caches of Clump information kept in sync and up to date?

• Should Clump and Database information be officially registered and, if so, by whom and how?