Implementing a Subject Based Information Gateway

Product Comparison:
Information Gateway Software

Subject based information gateways are one approach to Internet resource discovery that is increasing in popularity. Such gateways are supported by a variety of software packages. For example:

Social Science Information Gateway using ROADS,
Syracuse University Gateway to Educational Materials using Blue Angel Technologies,
Eco Companion Australasia using ASF,
Astrophysical Journal on-line using CNIDR.

This report investigates the availability of resource discovery software for setting up such gateways. It is being carried out as part of the ROADS Project and is influenced by the requirements of the eLib Subject Gateways. The selection of products is to some degree arbitrary, and the products differ enough in nature, particularly in how far some of them scale, that direct comparison is not entirely possible. It is intended to update this comparison as new products come to our attention.

Software decisions will always be specific to the needs of a project, but with such a rich choice available it may be useful to specify a simple, generic gateway project against which to work through the comparison.

Search Service Requirements:
A search service is to be established which will provide a gateway to an index of a specialised subject area. The index is expected to contain several thousand references initially and is unlikely to exceed 20,000. The resources reside primarily on international higher education web sites. Metadata will be created partly by hand, partly by automated gathering techniques, and partly by a combination of these methods.

Since the widest possible access is desired, with particular emphasis on the library community, the gateway must be accessible from the WWW via standard browsers as well as provide minimal Z39.50 compatibility. Internet standard technology should be used wherever and whenever possible. Standard metadata formats need to be supported.

The initial resource descriptions will be added by the database administrator. Thereafter the remote resource owner will be able to administer the entry, so the implementation must include remote user write access along with some form of basic password protection.
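As an illustration of this last requirement, the following is a minimal sketch of owner-based write access. It is not taken from any of the products described in this report: the `Gateway` class, its method names and the in-memory storage are all hypothetical, and a salted PBKDF2 hash from the Python standard library is assumed as the form of password protection.

```python
import hashlib
import hmac
import os


class Gateway:
    """Toy record store in which each entry has one password-protected owner."""

    def __init__(self):
        self._records = {}  # record id -> description text
        self._owners = {}   # record id -> (salt, password hash)

    @staticmethod
    def _hash(password, salt):
        # Salted hash, so the plain password is never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def add_record(self, record_id, description, owner_password):
        """Called by the database administrator to create the initial entry."""
        salt = os.urandom(16)
        self._records[record_id] = description
        self._owners[record_id] = (salt, self._hash(owner_password, salt))

    def update_record(self, record_id, description, password):
        """Called by the remote resource owner; returns True if the write is accepted."""
        if record_id not in self._owners:
            return False
        salt, expected = self._owners[record_id]
        # Constant-time comparison of the supplied and stored hashes.
        if not hmac.compare_digest(self._hash(password, salt), expected):
            return False
        self._records[record_id] = description
        return True

    def get_record(self, record_id):
        return self._records[record_id]
```

In this sketch the administrator calls `add_record` once, and every later `update_record` call from the resource owner is accepted only when the supplied password matches.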

Product Comparison:
Information Gateway Software
Both "free" and commercial software packages will be considered. The products described here represent a sample of what is available. The information is based on their distributors' web-site descriptions. In general, for each product, there is a browsable/searchable implementation base.

Brief descriptions and links to the product suppliers' web-sites follow. The products can be associated with the three common search and retrieve protocols. A closer look at these products is planned, perhaps with some consideration and comparison of their inherent advantages and limitations.

Z39.50	Whois++	LDAP
MetaStar (c)	Harvest (f)	ISAAC (f)
ASF (f)	ROADS (f)
Index+ (c)	Meta Web (c)
Isite (f)
SiteSearch (c)
Cheshire II (f)
c = commercial, f = freeware

These product descriptions have been compiled by UKOLN for the ROADS Project as an aid to gateway implementors in identifying ROADS' position among other software choices. Every effort has been made to provide accurate information; however, no liability is accepted for errors, omissions or inaccuracies in the presentation.

MetaStar Product Suite
Provider	Blue Angel Technologies Inc.
Purpose	A commercial package aiming at a complete solution. Components may be integrated with other products.
Approach	Able to supply "turn-key" solutions that will run on all popular platforms.
*functionality*
Server:	Z39.50 version 3 compliant. Comes with integrated off-the-shelf search engines (e.g., AltaVista, Fulcrum); configurable for both full-text and structured searches; allows simultaneous distributed search by a single query with merged results; available in Java and NT implementations and compatible with most Web servers.
Database:	A metadata database management tool that works with standard off-the-shelf ODBC databases and provides configurable input and output to support metadata formats. By default, files are imported in XML format.
Search Tools:	Z39.50 gateway provides the ability to simultaneously search distributed servers with a single query using a Web-based query interface. Gateway is implemented in Java and provides an ability to customize the look and feel of the interface.
Special Capabilities:	Data Entry permits users to enter XML data from their web browsers directly into a relational database. Once the updates clear an optional workflow approval process, they are incrementally indexed and available for discovery in real time. MetaStar Harvester is a crawler that gathers and parses XML and HTML and extracts designated data elements such as HTML tags. Extracted information is translated into either a structured or a custom file format.
API:	Software Development Kits in C++ and Java for both the Client and the Server. Translation SDK for customising metadata for import and export from and to external systems.
Metadata Formats Supported:	XML is used internally; output mappings produce standard metadata formats (GILS, FGDC, DIF, Dublin Core, ...).
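
The distributed search with merged results described under Server and Search Tools can be sketched in general terms as follows. This is not MetaStar code and does not speak Z39.50: `make_server` and `distributed_search` are hypothetical names, each "server" is an in-memory function, and a simple term count is assumed as the relevance score.

```python
from concurrent.futures import ThreadPoolExecutor


def make_server(name, documents):
    """Return a search function standing in for one remote server (hypothetical)."""
    def search(query):
        term = query.lower()
        hits = []
        for title in documents:
            # Score = number of times the query term occurs in the title.
            score = title.lower().split().count(term)
            if score:
                hits.append({"server": name, "title": title, "score": score})
        return hits
    return search


def distributed_search(servers, query):
    """Send one query to every server at once and merge the hits by score."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        result_sets = list(pool.map(lambda search: search(query), servers))
    merged = [hit for hits in result_sets for hit in hits]
    # Highest score first; ties broken by title so the order is deterministic.
    merged.sort(key=lambda hit: (-hit["score"], hit["title"]))
    return merged
```

A real gateway would replace each in-memory function with a protocol client and would need a scoring scheme that is comparable across servers, which this sketch simply assumes.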

Harvest
Provider	University of Edinburgh
Purpose	Efficient indexing for Web resources with a focus on automated gathering.
Approach	Freely available software used widely.
*functionality*
Server:	The Broker subsystem takes information from one or more Gatherers, suppresses duplicate information, indexes the collection, and provides a WWW query interface to it.
Database:	Along with its own harvested collection, its Replicator subsystem replicates brokers (can be web sites) around the Internet allowing for very efficient use of Internet bandwidth.
Search Tool:	Web browser.
Special Capability:	The Harvest Gatherer uses the Essence subsystem to extract indexing information. Depending on the file type, Essence extracts different data to index using a type-specific indexing algorithm (summarizer). The default summarizers can be customized if required.
API:	The software is written in C and much functionality has been extended with Perl. There is a lot of freely available software to accomplish a variety of tasks.
Metadata Formats supported:	Internally, SOIF records. Other formats are supported through mapping between standards.
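
The division of labour between type-specific summarizers and a duplicate-suppressing Broker can be illustrated with a short sketch. None of this is Harvest code: the function names, the dispatch on file extension and the choice of the document title as the only extracted field are assumptions made for the example, and plain dictionaries stand in for SOIF records.

```python
import posixpath
import re
from urllib.parse import urlparse


def summarize_html(text):
    """Keep only the <title> element of an HTML page."""
    match = re.search(r"<title>(.*?)</title>", text, re.IGNORECASE | re.DOTALL)
    return {"title": match.group(1).strip() if match else ""}


def summarize_text(text):
    """Use the first non-empty line of a plain-text file as its title."""
    for line in text.splitlines():
        if line.strip():
            return {"title": line.strip()}
    return {"title": ""}


# Hypothetical dispatch table: file extension -> summarizer.
SUMMARIZERS = {".html": summarize_html, ".txt": summarize_text}


def gather(documents):
    """Gatherer step: summarize each (url, text) pair according to its type."""
    records = []
    for url, text in documents:
        extension = posixpath.splitext(urlparse(url).path)[1].lower()
        summarizer = SUMMARIZERS.get(extension, summarize_text)
        record = summarizer(text)
        record["url"] = url
        records.append(record)
    return records


def broker(*gatherer_outputs):
    """Broker step: merge records from several gatherers, dropping duplicate URLs."""
    seen = {}
    for records in gatherer_outputs:
        for record in records:
            seen.setdefault(record["url"], record)  # first copy wins
    return list(seen.values())
```

Customizing a summarizer then amounts to replacing one entry in the dispatch table, which is the design point the Essence description makes.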

ASF
Provider	GILS and partners
Purpose	To provide index and search capability for local and distributed document sets. Intended to be a complete solution.
Approach	Using existing and enhanced "freeware" components, a minimal complete package is provided.
*functionality*
Server:	ASFserv, Z39.50 version 3. Ships with the Isearch search engine. Supports both full-text and structured search. The API allows the server to be configured for use with popular search engines.
Database:	Supports ODBC compliant databases.
Search Tool:	Web browser and standard Z39.50 clients.
Special Capability:	Automated gathering: HtDig crawls remote document locations to provide data for indexing.
API:	SAI is the programmable interface to the server and allows for different search engines.
Metadata Formats Supported:	Compliance with GILS specification. Should be configurable to output well-known metadata formats.

Index+
Provider	Systems Simulation Ltd.
Purpose	To provide complete commercial solutions for collection management tasks including "multimedia" collections.
Approach	Modular toolkit approach that is customised to the individual application. Will work on all common platforms.
*functionality*
Server:	Index+ Server is the database engine for storage, search and retrieval, allowing Web and Z39.50 access to the Index+ collections.
Database:	Index+ database by default but access to varied data stores is enabled through individual APIs.
Search Tool:	Purpose-built clients for specific environments customized for each application. Visual tools for creating and configuring clients for specific applications.
Special Capability:	Multilingual support in both client and internal storage. Support for multimedia resource storage and indexing.
API:	C programming interface to basic services as well as libraries for particular environments: C/C++ and Visual Basic where applicable.
Metadata Formats Supported:	Configurable internal storage allows for all common metadata types to be represented. Custom output formats are configurable and programmable.

Isite
Provider	CNIDR
Purpose	To provide Z39.50 version 3 capabilities.
Approach	Uses freely available software to assemble complete scalable gateways.
*functionality*
Server:	zserve is a Z39.50 server that works through the Isearch search engine.
Database:	Accesses the Isearch database by default, with flexible access to others via calls to scripts in a database initialization file.
Search Tool:	Web browsers and Z39.50 clients.
Special Capability:
API:	Called SearchAPI, used to configure Isearch to work with other storage devices.
Metadata Formats supported:	GILS, FGDC, IAFA Templates, USMARC.

Project ISAAC
Provider	Internet Scout Research Team at University of Wisconsin-Madison
Purpose	Research and development into information subject gateway issues aiming at linking separately maintained and geographically distributed metadata collections.
Approach	To establish a distributed test environment through collaboration with a limited number of institutions with similar aims. Efforts are based on the Lightweight Directory Access Protocol (LDAP) for local collection storage and the Common Indexing Protocol (CIP) for the collection of remote collection descriptions.
*functionality*
Server:	Test implementations will use stand-alone LDAP servers. There will be two types of installation: Full-Node and Collection-Node. Full-Node will store metadata collections (accessible by the LDAP server) and accept queries (via the LDAP client). Collection-Node will store metadata and allow itself to be searched (via its LDAP server) by any Full-Node installation (using the Full-Node's LDAP client) on behalf of its clients. The Collection-Node will be lightweight in that it will not implement a large part of the whole package (no LDAP client, CIP client, HTTP server, or index of other remote sites).
Database:	The API is defined to allow use of most relational databases.
Search Tool:	Web browser.
Special Capability:	It is supported in the LDAP protocol.
API:	The LDAP server is an enhanced version of the Michigan version 3 software, which has a well-documented C language API and a set of command-line tools that can be used with scripting languages. Perl modules are also available.
Metadata Formats Supported:	Directory Information Format (DIF) that easily maps to most formats. Present Internet Scout applications are using DC.

ROADS
Provider	eLib funded project with software written at Loughborough University.
Purpose	To provide a complete solution for subject gateway development. It is intended to be easy to use.
Approach	A toolkit that installs quickly, and is complete for small to medium sized applications.
*functionality*
Server:	Offers Keyword and Boolean searching with customizable result format. Cross-search capability with other whois++ based sites. Z39.50 gateway functionality provided by Index Data's Zebra server.
Database:	ROADS generates a Unix file system based inverted index from a set of template based IAFA records.
Search Tool:	Web browser and standard clients when Z39.50 is enabled.
Special Capability:	Remote record entry by cataloguers via HTML FORM input to the database. Automated gathering has been achieved by collecting SOIF records with Harvest and then converting them to ROADS records.
API:	Built from Perl modules. Extra functionality, including ODBC-type database support, is gained by adding or changing Perl scripts.
Metadata Formats Supported:	IAFA based with conversion routines to generate Z39.50 compatible records. Many tools exist to map to common metadata formats.
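
The idea of building an inverted index from template-based records and answering keyword and Boolean queries against it can be sketched as follows. ROADS itself is written in Perl and stores its index in the Unix file system; this Python example holds everything in memory, and the record layout ("Attribute: value" lines with `Handle`, `Title`, `Description` and `Keywords`) is a simplified assumption rather than the actual ROADS template definition.

```python
import re
from collections import defaultdict


def parse_template(text):
    """Parse 'Attribute: value' lines into a dict (a simplified template)."""
    record = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        record[name.strip()] = value.strip()
    return record


def build_index(records):
    """Map each lower-cased word to the set of record handles containing it."""
    index = defaultdict(set)
    for record in records:
        for field in ("Title", "Description", "Keywords"):
            for word in re.findall(r"\w+", record.get(field, "").lower()):
                index[word].add(record["Handle"])
    return index


def search(index, terms, operator="AND"):
    """Boolean search: AND intersects the posting sets, OR unites them."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if operator == "AND":
        return set.intersection(*postings)
    return set.union(*postings)
```

A single-term call to `search` behaves as a keyword search, while passing several terms with `"AND"` or `"OR"` gives the Boolean variants.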

Meta Web Tools
Provider	DSTC and a consortium of Australian partners.
Purpose	To promote and enhance the use of metadata by Australian institutions.
Approach	To supply (freely to everybody) a set of software tools that enable information gateways.
*functionality*
Server:	Broker/Gatherer system similar to Harvest (see above) that indexes web resources, storing metadata in SOIF records. This Harvest-like functionality is well suited to the advanced capabilities of the Java language.
Database:	JDBC (Java DataBase Connectivity) drivers for mSQL and ORACLE (for the URL and main metadata databases) are included in the download. JDBC drivers generally are available whenever ODBC drivers exist.
Search Tool:	Web browser. Applet-based search tools?
Special Capability:	Remote entry: HTTP FORM for a remote user to enter the URL of a site to be indexed via a Web browser. The URL is stored in a database which remembers the history of the indexing that is carried out. Automated collection: the Gatherer selects the next URL to be indexed from the URL database. The site index is then stored in the main metadata database.
API:	Java programming interface.
Metadata Formats Supported:	SOIF records that are mapped to standard metadata formats.
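
The interplay between remote URL entry, the indexing history and the Gatherer's choice of the next URL can be sketched as below. This is not Meta Web code: the product uses Java with JDBC drivers for mSQL or ORACLE, whereas this example uses Python's built-in `sqlite3` module as a stand-in, and the table layout, function names and "stalest first" selection rule are all assumptions.

```python
import sqlite3


def open_url_db():
    """Create an in-memory URL database (sqlite3 stands in for mSQL or ORACLE)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE urls ("
        "url TEXT PRIMARY KEY, last_indexed INTEGER, times_indexed INTEGER)"
    )
    return db


def submit_url(db, url):
    """Remote entry: register a URL; resubmitting a known URL changes nothing."""
    db.execute("INSERT OR IGNORE INTO urls VALUES (?, NULL, 0)", (url,))


def next_url(db):
    """Gatherer: pick a never-indexed URL first, otherwise the stalest one."""
    row = db.execute(
        "SELECT url FROM urls "
        "ORDER BY last_indexed IS NOT NULL, last_indexed, url LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def mark_indexed(db, url, timestamp):
    """Record in the history that the URL was indexed at the given time."""
    db.execute(
        "UPDATE urls SET last_indexed = ?, times_indexed = times_indexed + 1 "
        "WHERE url = ?",
        (timestamp, url),
    )
```

A gathering loop would call `next_url`, index the site, store the result in the main metadata database, and then call `mark_indexed` so that the same URL moves to the back of the queue.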

SiteSearch Suite
Provider	OCLC Inc.
Purpose	To provide libraries with Z39.50 (version 3) capability with access from the WWW, including image collection support.
Approach	Commercial solution running on UNIX and NT platforms focused on library collections.
*functionality*
Server:	WebZ serves Web clients and integrates with the search engine to reach stored resources.
Database:	The package includes Database Builder for building collections that will be searched with OCLC's Newton Search Engine (also used by "FirstSearch"). Standard JDBC/ODBC compliant databases are supported.
Search Tool:	Web browser and Z39.50 clients.
Special Capability:	Many advanced features: Support for ISO ILL; user authentication; and multiple simultaneous database search; image collection support.
API:	Java-based interface to allow Z39.50 and other resources to be searchable through WebZ.
Metadata Formats Supported:	MARC record formats, Dublin Core. Will support common metadata formats.

Cheshire II
Provider	SIMS, UC Berkeley
Purpose	To provide access to library and other varied Z39.50 searchable resources. Focus extends to "multimedia" resources.
Approach	This product is being designed and implemented in a working library environment with ongoing evaluation of user response. Advanced Z39.50 capabilities are supported and extended. Cross-searching of diverse Cheshire sites is already offered.
*functionality*
Server:	The Cheshire II search engine/server supports full-text and structured searching, with "closest match" and results ranking.
Database:	Uses Berkeley Database.
Search Client:	Has purpose built Z39.50 client. Supports standard search clients.
Special Capability:	Much advanced work on search algorithms among other things (commercial companies are using these new designs).
API:	Source code has been made available. C/C++ with Z39.50 clients created in Tcl/Tk.
Metadata Formats Supported:	SGML for internal use with conversion to all common metadata formats available.
