Information gateways: collaboration on content

Rachel Heery
UK Office for Library and Information Networking (UKOLN), University of Bath
Bath, UK
r.heery@ukoln.ac.uk

11 December 1999

This is a preprint version of an article of the same name appearing in Online Information Review, 24(1), 2000, 40-45. Please refer to the print version in any citation.

Introduction

Content is key for information gateways: the effective selection of high quality content forms the chief rationale for the gateway approach. It is therefore no surprise that discussion of content issues formed a central theme for participants at the first IMesh Workshop reported in this issue. In the workshop, considerations of content ranged over a broad spectrum of topics including selection criteria, technical and policy issues, guidelines for managing the gateway, and recommended standards and conventions. Indeed, discussion of content quickly merges into consideration of business and service issues: the close relationship between the content of the gateway and its business model, delivery of service, and marketing direction cannot be ignored.

Within this report we will consider some of the issues raised at the IMesh Workshop, and in particular look at the implications of collaboration for the way content is created and managed. This piece is informed by discussions at the workshop but is not intended as an exact report, which can be found elsewhere [Dempsey et al. 2000]. As regards terminology, we use the term 'subject gateway' to indicate search services to high quality web resources focusing on a particular discipline, and 'information gateway' for gateways that select material according to other criteria, often a national or regional approach, and that also offer subject access.

1 Gateways in context

Subject gateways do not exist in isolation. For the user they form part of the wider experience of resource discovery. The searcher, whether child or professor, is faced with the compelling option of using the global services, such as Yahoo and Google, as a first step. The undifferentiated experience offered by such services can be compared with the specialist view offered by information gateways. Gateways offer the user an alternative to the generalist approach of the commercial global search engines, but in order to optimise the gateway service we need to gain a better understanding of users' requirements for particular types of search during the research and learning process. It would be instructive to compare information seeking behaviour and success rates for a variety of uses of global search engines as compared to gateways. Likewise one could analyse the differences in users' search strategies within the context of the traditional library, hybrid library and subject gateway.

It may be helpful to liken the subject gateway approach to the traditional 'departmental library' as the user's first port of call, a place where the user feels comfortable in a known environment and is able to gain skills to navigate a limited area of information. It would be interesting to see how far we could draw parallels between the requirements of users of subject gateways and the users of 'subject based' libraries. The users of both services benefit from an understanding of the boundaries and content of the information space they are accessing. Much the same issues face managers of specialist subject collections whether they exist on the web or in a traditional service environment: in both there is a need to relate one particular subject area to wider cross disciplinary information, and to manage the inter-relation of services.

It is important to connect user behaviour regarding Internet resource discovery with wider issues relating to the use of information in the learning and research processes. What does the user want from the research experience? Understanding users' behaviour in relation to gateways will enable gateway managers to position themselves within the mesh of existing gateways and meet the needs of their target audiences. The undergraduate may want to identify 'key texts', perhaps the top ten resources in a gateway, whereas the subject expert may be investigating the 'borders' of their knowledge, perhaps looking at treatment of their research topic in other disciplines. Such considerations begin to suggest opportunities for gateways to develop targeted services, providing expert and naïve users with varied interfaces to available data.

Given the world of different audiences, how will gateways choose their audience and optimise their services? The IMesh workshop recognised the need for existing gateways to collaborate in building user studies, for example by using a common methodology for collecting information on searchers and patterns of use, by agreeing on a common approach to analysis of usage log files, and by reaching agreement on the common design of a user survey.

The workshop looked forward to some of this work being initiated within the activity of two projects. Within the EC funded Reynard project, which starts in January 2000, the Koninklijke Bibliotheek, the National Library of the Netherlands, hopes to co-ordinate some work on user surveys across various European gateways. In the international context, the IMesh Toolkit project intends to explore user requirements by building a number of scenarios covering different user groups [IMesh Toolkit]. This project is funded jointly by the UK Joint Information Systems Committee and the National Science Foundation as part of the Digital Libraries Initiative.

2 Collaboration between gateways

By definition gateways select particular material from the wealth of resources available, so each individual gateway must consider its own strategy for relating to other gateways covering 'the rest of the world'. Collaboration has been established as essential for the growth and sustainability of existing gateways. Collaboration on content issues can be viewed from a variety of aspects:

Coverage
    Region
    Language
    Subject areas
Co-operative metadata creation
Co-ordinated subject access
Cross searching facilities

Collaboration on this range of 'content activity' can extend from informal sharing of experience to formulation of shared policies and agreement on service levels. Sharing of experience might lead to a more formal recognition of 'best practice', while shared policies might encompass selection criteria and collection building. Service level agreements would be appropriate for managing the exchange of metadata and maintenance of cross searching services.

The IMesh workshop felt that the main focus of collaboration should be the level of service, stating in its conclusions that "the main motivation for collaboration is to improve user-services". It is clear that collaboration offers benefits from both a service and a user perspective. From the viewpoint of the user, collaboration offers easier access to a range of collections, more consistency, and a more user-friendly interface. Service providers benefit from opportunities to provide more sophisticated services from the 'point of entry', for example manipulation of retrieved lists using subject clustering techniques, as well as financial savings from sharing software development and metadata creation effort.

We shall deal in a little more detail with some of the options for collaboration between gateways.

2.1 Coverage

Gateways need to scope their coverage and to ensure there is a good fit between the needs of their target users and their coverage policy. To achieve the greatest efficiency in terms of costs it might seem best if there was little overlap between the coverage of separate gateways. But would such cost efficiencies at the service provider side actually meet the requirements of the user? Given the range of interests of researchers and students across disciplines, language and curatorial sectors (museums, archives, libraries) it would seem that overlapping coverage has some benefit. If users do tend to access a 'favoured' gateway it is important that this specialist gateway offers the potential of a 'wide view'.

An increase in co-operative metadata creation will help to minimise the duplicated effort involved in overlapping coverage. Also the layering of a brokerage service on top of collaborating gateway services, enabling searching across services, might go some way to relieve the need for gateways to cover overlapping areas. However, as discussed at the IMesh workshop, it is important to realise that the same resource would be described and indexed quite differently depending on the target audience of the gateway. In order to ensure that a searcher gets their own 'discipline-centric' view of the 'overlapping areas' it will be necessary to ensure metadata is enhanced to include descriptions appropriate for a variety of audiences.

Management of duplication between gateways and agreements on coverage are made more complicated by the difficulty in precisely identifying particular web resources by means of an unambiguous identifier or URI. Any attempts to manage and measure overlap between gateways, and to de-duplicate retrieved lists, require unambiguous identification of resources. This issue is being actively addressed by those gateways that are investigating the use of Digital Object Identifiers (DOIs) to identify their content.
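To illustrate why unambiguous identification matters for de-duplication, the following is a minimal sketch in Python that removes duplicates from a merged result list by comparing normalised URIs. The normalisation rules, the record layout (a dictionary with a "uri" key) and the function names are assumptions made for illustration; they are not drawn from any gateway's practice. The sketch only catches trivial variants of the same address. It cannot recognise one resource published at two different locations, which is the case that a persistent identifier such as a DOI is meant to address.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_uri(uri: str) -> str:
    """Reduce a URI to a comparison key (illustrative rules only)."""
    parts = urlsplit(uri.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep a port only when it is not the default for the scheme.
    default_ports = {"http": 80, "https": 443}
    port = parts.port
    if port in (None, default_ports.get(scheme)):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # Drop the fragment: it points inside the same resource.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def deduplicate(records):
    """Keep the first record seen for each normalised URI."""
    seen = set()
    unique = []
    for record in records:
        key = normalise_uri(record["uri"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Keeping the first record seen is an arbitrary choice; a broker might equally prefer the record from the gateway closest to the searcher's discipline.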

2.2 Co-operative metadata creation

Creating resource descriptions is one of the most time-consuming and costly tasks for gateways, and opportunities for making this process more efficient are attractive to service providers. This might be by shared cataloguing models using a union catalogue or by means of 'bi-lateral' agreements between gateways for exchanging metadata. Both approaches have a long history in the co-operative cataloguing models used for library MARC records. Services could agree to create metadata into a central or distributed 'union catalogue' of records, following a similar model to the OCLC CORC project [CORC]. On the other hand, services might arrange with each other to define the scope of their areas of resource coverage and extend this by exchanging metadata with other services.

Co-operative metadata creation has the big advantage of reducing the need for several service providers to create metadata for the same resource. Co-operation enables 'inheritance' of metadata: a service can take metadata created elsewhere and enhance its quality by adding further data (e.g. subject terms), applying authority control, or adding multi-lingual data. Following the same pattern as MARC union catalogues, algorithms can be introduced for ensuring the 'best' or 'richest' resource description is preserved.
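One possible reading of preserving the 'richest' description can be sketched as follows. The measure of richness used here (a simple count of populated values) and the record layout (element names mapped to lists of values) are assumptions chosen for illustration; union catalogues use more elaborate rules, and nothing here is taken from CORC or any MARC system.

```python
def richness(record: dict) -> int:
    """Count populated values across all elements of a description."""
    return sum(len(values) for values in record.values() if values)


def merge_descriptions(records: list[dict]) -> dict:
    """Start from the richest record and inherit missing values from the rest."""
    ranked = sorted(records, key=richness, reverse=True)
    # Copy the lists so the input records are left untouched.
    merged = {element: list(values) for element, values in ranked[0].items()}
    for record in ranked[1:]:
        for element, values in record.items():
            target = merged.setdefault(element, [])
            for value in values:
                if value not in target:
                    target.append(value)
    return merged
```

A gateway receiving a sparse record for a resource it has already described would in this way keep its own fuller description while inheriting, for example, a subject term or a translated title supplied elsewhere.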

Given the potential of the Resource Description Framework [RDF], there may be possibilities in future to link together metadata that describes the same resource, whether it is distributed in different locations or created using different schemas. The benefits in terms of saved cataloguing effort would be huge, with the possibility of sharing a single occurrence of metadata globally.

In order to achieve the goals of co-operative metadata creation there is a need for:

Compatible technical solutions
Shared semantics (common metadata sets)
Shared syntax (HTML, RDF/XML)
Consistency of content (cataloguing rules)

The best way for services to explore the issues around sharing metadata is to participate in pilot systems. It is hoped that the Reynard and IMesh Toolkit projects will provide test-beds for metadata sharing activity.

2.3 Co-ordinated subject access

Provision of subject based searching options and effective subject-indexing are crucial. In this area, collaboration between gateways can aid multi-disciplinary access, in particular cross-searching across different gateways and cross browsing of different collections.

It is inevitable that gateways will choose different schemes for classification, controlled vocabularies, and thesauri, according to their users' requirements. There has been some work exploring the advantages and disadvantages of various classification schemes within the DESIRE project [Koch 1997]. In order to aid collaboration it would be useful to make an inventory of services using existing schemes.

How can gateways manage to offer cross searching and cross browsing when different schemes are in use? Various possibilities have been debated both within the workshop and in the context of national initiatives such as the UK Resource Discovery Network. These include:

Introducing agreed mappings between different schemes and controlled lists.

This involves significant human resource and would be a time-consuming effort, even if it were confined to the higher levels of controlled schemes.

Agreeing a top level list to be used in common by all collaborating gateways in addition to their own subject schemes.

This is an attractive option as it is limited in scope and might be achievable. It depends on reaching consensus amongst gateways on an existing high level list which needs only limited additional work. Would such a list be useful to searchers? Experience of using such schemes in the past is limited, and it would be useful to know whether end-users will approach cross searching at a broad subject level or will enter more precise terms.

Applying a common scheme across collaborating gateways by using a process of automatic classification.

Methods and techniques for automatic classification have been developed within the DESIRE project [Ardö 1999] and by OCLC [Scorpion].
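The first two options above, mappings between schemes and a shared top level list, can be combined in a small sketch: each gateway keeps its own subject terms, and a hand-built table maps the higher levels of each local scheme to an agreed heading. The headings, gateway names and mappings below are all hypothetical, and the record layout is an assumption; the sketch is meant only to show the shape of the data involved, not a recommended scheme.

```python
# Hypothetical shared top-level list agreed by the collaborating gateways.
TOP_LEVEL = {"Engineering", "Medicine", "Social sciences"}

# Hypothetical mappings from (gateway, local subject term) to a shared heading.
# Only the higher levels of each local scheme are mapped.
MAPPINGS = {
    ("gateway-a", "Civil engineering"): "Engineering",
    ("gateway-a", "Bridges"): "Engineering",
    ("gateway-b", "Clinical medicine"): "Medicine",
}


def shared_headings(gateway, local_terms):
    """Return the shared headings reachable from a record's local terms."""
    found = {MAPPINGS.get((gateway, term)) for term in local_terms}
    # Unmapped terms yield None, which is filtered out here.
    return {heading for heading in found if heading in TOP_LEVEL}


def cross_browse(records, heading):
    """List titles from any gateway whose local terms map to the heading."""
    return [
        record["title"]
        for record in records
        if heading in shared_headings(record["gateway"], record["subjects"])
    ]
```

Records whose local terms have no mapping simply drop out of cross browsing, which makes visible the cost noted above: the usefulness of the shared list depends on how much mapping effort each gateway can afford.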

2.4 Cross searching facilities

Provision of basic cross searching services comes with compliance to the Z39.50 protocol. As most gateways are compliant with the Z39.50 protocol, some level of cross searching is possible from any configurable Z39.50 client. However, to achieve a more effective cross searching service, additional agreements are needed between collaborating services, not only on technical matters regarding a Z39.50 profile, but also on common indexing and resource description practices. It is hoped that work now taking place on an 'International Z39.50 Specification for Library Applications and Resource Discovery', currently known as the Bath profile, will provide the means for effective searching across OPACs, gateways and other information services [Lunau 1999].
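The brokering step in cross searching can be sketched independently of the protocol. In the following Python sketch each gateway is represented by a plain function that takes a query string and returns records; these functions are stand-ins for whatever Z39.50 client a broker would actually use, and no real Z39.50 library is assumed. The sketch shows two things a broker needs to do whatever the transport: label each hit with its source, and carry on when one target fails.

```python
from typing import Callable, Iterable

# A search target is any callable taking a query and yielding record dicts.
SearchFn = Callable[[str], Iterable[dict]]


def cross_search(query: str, targets: dict[str, SearchFn]):
    """Send one query to every target; return (hits, failures)."""
    hits = []
    failures = {}
    for name, search in targets.items():
        try:
            for record in search(query):
                # Label each hit so the searcher can see where it came from.
                hits.append({**record, "source": name})
        except Exception as exc:
            # One unavailable gateway should not spoil the whole search.
            failures[name] = str(exc)
    return hits, failures
```

The results are only as comparable as the indexing behind them, which is why agreement on a profile and on resource description practice matters more than the mechanics shown here.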

In order to provide meaningful cross searching there will need to be sensible clusters of subject gateways. Such subject clusters will be international and include services from different sectors. Inevitably clusters will be formed from services funded by more than one agency. It is hoped that IMesh will offer a framework by which such diverse services can work together. There will be a need for organisational and business agreements to underpin such collaboration.

3 Collaboration with information providers

The trend has been for subject gateways to create their own metadata, whether by centrally located gateway staff or by remotely located experts, typically librarians or academics, inputting metadata to the gateway. An alternative, or additional, source of metadata would be from collaboration with the providers of resources, in effect the 'publishers' of web resources [Heery 1999]. The process of metadata creation could be enhanced by building up a relationship with information providers. Not only is this cost effective as it lessens duplication of effort, it assists in ensuring the currency of the information held by the gateway. The benefit to the information provider lies in improved dissemination of their information.

Metadata could be derived from direct input by the publisher, or from any metadata created when a resource is first made available. Such a shared metadata environment with publishers is being developed now for national bibliographic agencies [BIBLINK]. National agencies traditionally have responsibility for recording all hard copy material that is published within their country. This responsibility is being extended (to a greater or lesser extent depending on national policy) to digital information [Byrum 1999]. In the European library world the business model for bibliographic record-exchange is to a great extent based on the re-use of these national records. In the future, subject gateways could make use of the records produced by national bibliographic agencies as a basis for their own descriptions.

4 Registry and directory

As the number of gateways increases it is important to provide access to a maintained source of information about services and gateway activities. An initial inventory of services was gathered prior to the IMesh workshop. It would be useful for such a list to be maintained over time. This might be done through a registry of service profiles. In a similar way to a registry of resource description metadata, where resource discovery metadata schemas are disclosed, there would be benefits in disclosing information about individual gateway services in an authoritative registry.

The IMesh workshop suggested that a gateway service profile or schema could be drawn up. For each service the service profile would cover details of use of metadata standards and protocols, coverage and selection policies, conditions of use etc. The service profile would be broader than a collection description, so for example it would include details of compliance with metadata standards and protocols.

A registry of such service profiles would facilitate users' selection of gateways, as well as enabling brokers to negotiate and select gateways for satisfying queries. Human agents, people working within gateways or those accessing gateways, could use such a service registry to gain information about services of interest. If the service profile was expressed in a machine readable form such as RDF/XML, then software agents could use such a registry in response to a query to broker access to suitable services.
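As a rough illustration of how a software agent might use such a registry, the sketch below models a service profile as a small Python data structure and selects the services able to answer a query. The field names are assumptions based on the kinds of detail mentioned above (coverage, standards, protocols); no agreed IMesh profile schema exists to draw on, and a real registry would hold the profiles in a serialised form rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceProfile:
    """Hypothetical gateway service profile; field names are illustrative."""
    name: str
    subjects: set[str] = field(default_factory=set)
    languages: set[str] = field(default_factory=set)
    protocols: set[str] = field(default_factory=set)
    metadata_formats: set[str] = field(default_factory=set)


def select_services(registry, subject, protocol):
    """Return names of services covering the subject over the given protocol."""
    return [
        profile.name
        for profile in registry
        if subject in profile.subjects and protocol in profile.protocols
    ]
```

A broker would call select_services with the subject of the query and the protocol it speaks, then pass the query only to the services returned.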

The IMesh Toolkit project intends to develop registry software which could support IMesh registry activity. Some activity in this area is already taking place. The DESIRE project is currently involved in producing a registry of schemas and profiles in use by DESIRE services [DESIRE]. The IMesh Toolkit registry work will build on this work. This registry activity will take place in association with the SCHEMAS European Fifth Framework accompanying measure, a project that intends to provide a forum for implementers of schemas.

A definition of minimum requirements for collaboration might arise from consideration of such individual service characteristics. If services are to collaborate then for each collaborating cluster a minimum level service profile with 'minimum entry requirements' will need to be established.
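A check against 'minimum entry requirements' could be as simple as comparing a service profile with the list a cluster has agreed. In the sketch below the minimum requirements (support for Z39.50 and for Dublin Core records) are invented for illustration, as are the key names; each cluster would need to settle its own.

```python
# Hypothetical minimum entry requirements for one collaborating cluster.
MINIMUM = {
    "protocols": {"Z39.50"},
    "metadata_formats": {"Dublin Core"},
}


def missing_requirements(profile: dict, minimum: dict) -> dict:
    """Return, per requirement, the values the profile does not yet offer."""
    gaps = {}
    for key, required in minimum.items():
        missing = set(required) - set(profile.get(key, ()))
        if missing:
            gaps[key] = missing
    return gaps
```

An empty result means the service meets the cluster's minimum; a non-empty result tells the service exactly what it would need to add before joining.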

5 Conclusion

The IMesh workshop provided a focus for discussion of collaboration on content. Participants hoped that such activity could be taken forward on an international basis. A number of European initiatives have been mentioned in this article; there are many others taking place world wide, particularly in Australia and the USA. Now that a basis for co-operation has been established, we need to realise outputs and initiate joint activities.

The popularity of establishing gateway services has been proven; now that a number are in existence, it is likely that gateways will quickly evolve. They will take on a national role, become part of the wider research and learning experience, and become embedded in institutional roles. It is important to encourage this development while retaining the distinctive features of gateways that have made them successful.

6 References

[Ardö 1999] ARDÖ, A. and KOCH, T., (1999), "Automatic classification applied to the full-text Internet documents in a robot-generated subject index." In: Proceedings of the Online Information 99 Conference. Manuscript at: <URL:http://www.lub.lu.se/~traugott/online99.htm>

[BIBLINK] BIBLINK project website. <URL:http://hosted.ukoln.ac.uk/biblink/>

[Byrum 1999] BYRUM, J.D., (1999), "Inclusion of information covering electronic resources in national bibliographies: results of a survey conducted May-June 1998." In: Conference proceedings of the 65th IFLA Council and General Conference, Bangkok. <URL:http://www.ifla.org/IV/ifla65/papers/124-153e.htm>

[CORC] CORC. Cooperative Online Resource Catalog, OCLC. <URL:http://purl.oclc.org/corc>

[Dempsey et al. 2000] DEMPSEY, L., GARDNER, T., DAY, M. and VAN DER WERF, T., (2000), "The First IMesh Workshop Report." D-Lib, forthcoming.
[Note: this article has since been published in: D-Lib Magazine, 5 (12), December 1999. <URL:http://www.dlib.org/dlib/december99/12dempsey.html>]

[DESIRE] DESIRE project website. <URL:http://www.desire.org/>

[Heery 1999] HEERY, R., (1999), "Working with information providers." In: Belcher, M., Knight, V. and Place, E. (Eds.), DESIRE subject gateways handbook. <URL:http://www.desire.org/handbook/>

[IMesh Toolkit] IMesh Toolkit project website. <URL:http://www.imesh.org/toolkit/>

[Koch 1997] KOCH, T. and DAY, M., (1997), "Specification for resource description methods." Deliverable of Telematics for Research project DESIRE -- Part 3: The role of classification schemes in Internet resource description and discovery. <URL:http://www.ukoln.ac.uk/metadata/DESIRE/classification/>

[Lunau 1999] LUNAU, C., MILLER, P. and MOEN, W.E. (Eds.), (1999), BATH Profile: An international Z39.50 specification for library applications and resource discovery. Draft for public comment. <URL:http://www.ukoln.ac.uk/interop-focus/activities/z3950/int_profile/bath/draft/>
[Note: available from new URL: <URL:http://www.ukoln.ac.uk/interop-focus/bath/>]

[RDF] RDF home page, W3C. <URL:http://www.w3.org/RDF/>

[Scorpion] SCORPION project home page, OCLC. <URL:http://purl.oclc.org/scorpion>

Maintained by: Rachel Heery of UKOLN: The UK Office for Library and Information Networking, University of Bath.
Last updated: 21-Aug-2000.