Product Comparison:
Information Gateway Software


Subject-based information gateways are an increasingly popular approach to Internet resource discovery. Such gateways are supported by a variety of software packages. For example:

Social Science Information Gateway using ROADS,
Syracuse University Gateway to Educational Materials using Blue Angel Technologies,
Eco Companion Australasia using ASF,
Astrophysical Journal on-line using CNIDR.

This report investigates the availability of resource discovery software for setting up such gateways. It is being carried out as part of the ROADS Project and is influenced by the requirements of the eLib Subject Gateways. The selection of products is to some degree arbitrary, and the products differ enough in nature, particularly in the high degree of scalability some of them exhibit, that a direct comparison is not entirely possible. It is intended to update this comparison as new products come to our attention.

Software decisions will always be specific to the needs of a project, but with such a rich choice available it may be useful to specify a simple, generic gateway project as a basis for working through the comparison.


Search Service Requirements:
A search service is to be established which will provide a gateway to an index of a specialised subject area. The index is expected to contain several thousand references initially and is unlikely to exceed 20,000. The resources reside primarily on international higher education web sites. Metadata will be created partly by hand, partly by automated gathering techniques, and partly by a combination of these methods.

Since the widest possible access is desired, with particular emphasis on the library community, the gateway must be accessible from the WWW via standard browsers as well as provide minimal Z39.50 compatibility. Internet standard technology should be used wherever and whenever possible.  Standard metadata formats need to be supported.

The initial resource descriptions will be added by the database administrator, but thereafter the remote resource owner will be able to administer the entry. The implementation must therefore include remote write access for users along with some form of basic password protection.



Both "free" and commercial software packages are considered. The products described here are a sample of what is available. The information is based on the descriptions on their distributors' web-sites. In general, each product has a browsable/searchable implementation base.

Brief descriptions and links to the product suppliers' web-sites follow. The products can be associated with the three common search and retrieve protocols. A closer look at these products is planned with perhaps some consideration and comparison of  inherent advantages or limitations based on this background. 

Protocols: Z39.50, Whois++, LDAP

Product        Licence
MetaStar       c
ASF            f
Index+         c
Isite          f
SiteSearch     c
Cheshire II    f
Harvest        f
ROADS          f
Meta Web       c
ISAAC          f

c = commercial, f = freeware

 

These product descriptions have been compiled by UKOLN for the ROADS Project as an aid to gateway implementors in identifying ROADS' position among other software choices. Every effort has been made to provide accurate information; however, no liability is accepted for errors, omissions or inaccuracies in the presentation.

MetaStar Product Suite
Provider: Blue Angel Technologies Inc.
Purpose: A commercial package aiming at a complete solution. Components may be integrated with other products.
Approach: Able to supply "turn-key" solutions that will run on all popular platforms.
Functionality:
Server:
Z39.50 version 3 compliant;
comes with integrated off-the-shelf search engines (e.g., AltaVista, Fulcrum, etc.);
configurable for both full-text and structured searches;
allows simultaneous distributed search by a single query with merged results;
implemented in Java and NT and is compatible with most Web servers.
Database: A metadata database management tool that works with standard off-the-shelf ODBC databases provides configurable input and output to support metadata formats. By default, files are imported in XML format.
Search Tools: The Z39.50 gateway provides the ability to search distributed servers simultaneously with a single query using a Web-based query interface. The gateway is implemented in Java and allows the look and feel of the interface to be customized.
Special Capabilities: Data Entry permits users to enter XML data from their web browsers directly into a relational database. Once the updates clear an optional workflow approval process, they are incrementally indexed and available for discovery in real time.
MetaStar Harvester is a crawler that gathers and parses XML and HTML and extracts designated data elements such as HTML tags. Extracted information is translated into either a structured or a custom file format.
API: Software Development Kits in C++ and Java for both the Client and the Server. Translation SDK for customising metadata for import and export from and to external systems.
Metadata Formats Supported: XML is used internally; output mappings produce standard metadata formats (GILS, FGDC, DIF, Dublin Core...).
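
The "simultaneous distributed search by a single query with merged results" described above can be illustrated with a minimal Python sketch. This is not MetaStar's implementation: the two in-memory collections, the `search_server` and `distributed_search` functions, and the term-count scoring are hypothetical stand-ins for real Z39.50 targets and their result handling.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory collections standing in for remote Z39.50 servers.
SERVERS = {
    "server_a": [
        {"title": "Metadata for gateways", "url": "http://a.example/1"},
        {"title": "Social science resources", "url": "http://a.example/2"},
    ],
    "server_b": [
        {"title": "Gateways and metadata harvesting", "url": "http://b.example/1"},
        {"title": "Metadata for gateways", "url": "http://a.example/1"},
    ],
}


def search_server(name, query):
    """Return (score, record) pairs from one server.

    The score is simply the number of query terms found in the title.
    """
    terms = query.lower().split()
    hits = []
    for record in SERVERS[name]:
        words = record["title"].lower().split()
        score = sum(1 for term in terms if term in words)
        if score:
            hits.append((score, record))
    return hits


def distributed_search(query):
    """Send one query to every server in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        per_server = list(
            pool.map(lambda name: search_server(name, query), sorted(SERVERS))
        )
    merged = {}
    for hits in per_server:
        for score, record in hits:
            # Keep one copy of each URL, with its best score.
            best = merged.get(record["url"])
            if best is None or score > best[0]:
                merged[record["url"]] = (score, record)
    # Highest score first; ties are broken by URL to give a stable order.
    return sorted(merged.values(), key=lambda pair: (-pair[0], pair[1]["url"]))
```

Calling `distributed_search("metadata gateways")` queries both stand-in servers and returns a single list in which the record held by both servers appears only once.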


Harvest
Provider: University of Edinburgh
Purpose: Efficient indexing for Web resources with a focus on automated gathering.
Approach: Freely available software used widely.
Functionality:
Server: The Broker subsystem takes information from one or more Gatherers, suppresses duplicate information, indexes the collection, and provides a WWW query interface to it.
Database: Along with its own harvested collection, its Replicator subsystem replicates brokers (can be web sites) around the Internet allowing for very efficient use of Internet bandwidth.
Search Tool: Web browser.
Special Capability: The Harvest Gatherer uses the Essence subsystem to extract indexing information. Depending on the file type, Essence extracts different data to index using a type-specific indexing algorithm (summarizer). The default summarizers can be customized if required.
API: The software is written in C and much functionality has been extended with Perl. There is a lot of freely available software to accomplish a variety of tasks.
Metadata Formats Supported: Internally, SOIF records. Other formats are supported through mapping between standards.
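
As an illustration of the SOIF records mentioned above, the following Python sketch parses a single record. It is a simplified reading of the format rather than Harvest's own parser: it assumes the common layout in which each attribute carries the length of its value in braces, and it counts characters instead of bytes, which only matches for ASCII data. The sample record, the attribute names in it, and the `parse_soif` function are illustrative.

```python
import re

# "@FILE { url" header line, then "Name{size}:<tab>value" attributes.
HEADER = re.compile(r"@(\S+)\s*\{\s*(\S+)\s*\n")
ATTRIBUTE = re.compile(r"([^\s{}]+)\{(\d+)\}:[ \t]")

# Illustrative record; the sizes in braces give the length of each value.
EXAMPLE = (
    "@FILE { http://www.example.org/index.html\n"
    "Title{15}:\tExample Gateway\n"
    "Keywords{17}:\tmetadata\ngateways\n"
    "}\n"
)


def parse_soif(text):
    """Parse one SOIF-style record into (template_type, url, attributes)."""
    header = HEADER.match(text)
    if header is None:
        raise ValueError("not a SOIF record")
    template_type, url = header.group(1), header.group(2)
    attributes = {}
    pos = header.end()
    while True:
        # Skip the line breaks that separate attributes.
        while pos < len(text) and text[pos] in "\r\n":
            pos += 1
        if pos >= len(text) or text[pos] == "}":
            break
        match = ATTRIBUTE.match(text, pos)
        if match is None:
            raise ValueError(f"malformed attribute at offset {pos}")
        size = int(match.group(2))
        start = match.end()
        # The declared size, not the line ending, delimits the value,
        # so values may themselves contain line breaks.
        attributes[match.group(1)] = text[start:start + size]
        pos = start + size
    return template_type, url, attributes
```

Because values are delimited by their declared size, the multi-line `Keywords` value in `EXAMPLE` is read back intact by `parse_soif(EXAMPLE)`.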

 

ASF
Provider: GILS and partners
Purpose: To provide index and search capability for local and distributed document sets. Intended to be a complete solution.
Approach: Using existing and enhanced "freeware" components, a minimal complete package is provided.
Functionality:
Server: ASFserv, Z39.50 version 3. Ships with the Isearch search engine. Supports both full-text and structured search. The API allows for configuration for use with popular search engines.
Database: Supports ODBC compliant databases.
Search Tool: Web browser and standard Z39.50 clients.
Special Capability: Automated gathering: HtDig crawls remote document locations to provide data for indexing.
API: SAI is the programmable interface to the server and allows for different search engines.
Metadata Formats Supported: Compliance with the GILS specification. Should be configurable to output well-known metadata formats.

 

Index+
Provider: Systems Simulation Ltd.
Purpose: To provide complete commercial solutions for collection management tasks including "multimedia" collections.
Approach: Modular toolkit approach that is customised to the individual application. Will work on all common platforms.
Functionality:
Server: Index+ Server is the database engine for storage, search and retrieval, allowing Web and Z39.50 access to the Index+ collections.
Database: Index+ database by default but access to varied data stores is enabled through individual APIs.
Search Tool: Purpose-built clients for specific environments, customized for each application. Visual tools for creating and configuring clients for specific applications.
Special Capability: Multilingual support in both client and internal storage. Support for multimedia resource storage and indexing.
API: C programming interface to basic services as well as libraries for particular environments: C/C++ and Visual Basic where applicable.
Metadata Formats Supported: Configurable internal storage allows for all common metadata types to be represented. Custom output formats are configurable and programmable.

 

Isite
Provider: CNIDR
Purpose: To provide Z39.50 version 3 capabilities.
Approach: Uses freely available software to assemble complete scalable gateways.
Functionality:
Server: zserve is a Z39.50 server that works through the Isearch search engine.
Database: Accesses the Isearch database by default, with flexible access to others via calls to scripts in a database initialization file.
Search Tool: Web browsers and Z39.50 clients.
Special Capability:
API: Called SearchAPI, used to configure Isearch to work with other storage devices.
Metadata Formats Supported: GILS, FGDC, IAFA Templates, USMARC.

 

Project ISAAC
Provider: Internet Scout Research Team at University of Wisconsin-Madison
Purpose: Research and development into information subject gateway issues, aiming at linking separately maintained and geographically distributed metadata collections.
Approach: To establish a distributed test environment through collaboration with a limited number of institutions with similar aims. Efforts are based on the Lightweight Directory Access Protocol (LDAP) for local collection storage and the Common Indexing Protocol (CIP) for gathering descriptions of remote collections.
Functionality:
Server: Test implementations will use stand-alone LDAP servers. There will be two types of installation: Full-Node and Collection-Node. A Full-Node will store metadata collections (accessible by the LDAP server) and accept queries (via the LDAP client). A Collection-Node will store metadata and allow itself to be searched (via its LDAP server) by any Full-Node installation (using the Full-Node's LDAP client) on behalf of its clients. The Collection-Node will be lightweight in that it will not implement a large part of the whole package (no LDAP client, CIP client, HTTP server, or index of other remote sites).
Database: The API is defined to allow use of most relational databases.
Search Tool: Web browser.
Special Capability: It is supported in the LDAP protocol.
API: The LDAP software is an enhanced version of the Michigan version 3 software, which has a well-documented C language API and a set of command line tools that can be used with scripting languages. Perl modules are also available.
Metadata Formats Supported: Directory Information Format (DIF), which maps easily to most formats. Present Internet Scout applications are using Dublin Core (DC).

 

ROADS
Provider: eLib-funded project with software written at Loughborough University.
Purpose: To provide a complete solution for subject gateway development. It is intended to be easy to use.
Approach: A toolkit that installs quickly and is complete for small to medium-sized applications.
Functionality:
Server: Offers keyword and Boolean searching with customizable result format. Cross-search capability with other Whois++ based sites. Z39.50 gateway functionality provided by Index Data's Zebra server.
Database: ROADS generates a Unix file-system-based inverted index from a set of template-based IAFA records.
Search Tool: Web browser, and standard clients when Z39.50 is enabled.
Special Capability: Remote record entry via HTML FORM input to the database by cataloguers. Automated gathering has been achieved by collecting SOIF records with Harvest and then converting them to ROADS records.
API: Made from Perl modules. Extra functionality, including ODBC-type database support, is gained by adding or changing Perl scripts.
Metadata Formats Supported: IAFA based, with conversion routines to generate Z39.50 compatible records. Many tools exist to map to common metadata formats.
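
The idea of building an inverted index from template-based records, with keyword and Boolean searching over it, can be sketched in a few lines of Python. This is only an illustration, not the ROADS code (which is written in Perl and keeps its index on the Unix file system): the index here is held in memory, and the sample templates, their field names, and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Illustrative attribute/value templates in the style of IAFA records.
TEMPLATES = [
    """Template-Type: DOCUMENT
Handle: rec-001
Title: Social science data archives
Keywords: statistics, surveys""",
    """Template-Type: DOCUMENT
Handle: rec-002
Title: Economics statistics online
Keywords: economics, data""",
]


def parse_template(text):
    """Turn 'Name: value' lines into a dictionary."""
    record = {}
    for line in text.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            record[name.strip()] = value.strip()
    return record


def build_index(records, fields=("Title", "Keywords")):
    """Map each word in the chosen fields to the handles that contain it."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            for word in re.findall(r"[a-z0-9]+", record.get(field, "").lower()):
                index[word].add(record["Handle"])
    return index


def search_and(index, query):
    """Boolean AND search: handles of records containing every query word."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

With the two sample templates, `search_and(index, "surveys data")` returns only `rec-001`, while `search_and(index, "data")` returns both handles.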

 

Meta Web Tools
Provider: DSTC and a consortium of Australian partners.
Purpose: To promote and enhance the use of metadata by Australian institutions.
Approach: To supply (freely to everybody) a set of software tools that enable information gateways.
Functionality:
Server: Broker/Gatherer system similar to Harvest (see above) that indexes web resources, storing metadata in SOIF records. This Harvest-like functionality is well suited to the advanced capabilities of the Java language.
Database: JDBC (Java DataBase Connectivity) drivers for mSQL and ORACLE (for the URL and main metadata databases) are included in the download. JDBC drivers are generally available wherever ODBC drivers exist.
Search Tool: Web browser. Applet-based search tools?
Special Capability: Remote entry: an HTTP FORM lets a remote user enter the URL of a site to be indexed via a Web browser. The URL is stored in a database which remembers the history of the indexing that is carried out.
Automated collection: The Gatherer selects the next URL to be indexed from the URL database. The site index is then stored in the main metadata database.
API: Java programming interface.
Metadata Formats Supported: SOIF records that are mapped to standard metadata formats.
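
The remote-entry and automated-collection steps described above (a URL database that remembers indexing history, from which the Gatherer picks the next URL) can be sketched as follows. This is not the Meta Web code, which is Java with JDBC: the sketch uses Python's built-in `sqlite3` module, and the table layout, the function names, and the "never indexed first, then least recently indexed" selection rule are all assumptions made for illustration.

```python
import sqlite3


def open_url_database():
    """Create an in-memory URL database with an indexing history table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE urls (url TEXT PRIMARY KEY, last_indexed REAL)")
    db.execute("CREATE TABLE history (url TEXT, indexed_at REAL, status TEXT)")
    return db


def submit_url(db, url):
    """Remote entry: register a URL; resubmitting a known URL is a no-op."""
    db.execute(
        "INSERT OR IGNORE INTO urls (url, last_indexed) VALUES (?, NULL)", (url,)
    )


def next_url(db):
    """Pick the next URL to gather: never-indexed first, then the oldest."""
    row = db.execute(
        "SELECT url FROM urls "
        "ORDER BY last_indexed IS NOT NULL, last_indexed, url LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def record_indexing(db, url, when, status="ok"):
    """Remember that a URL was indexed at time 'when' (a caller-supplied number)."""
    db.execute("UPDATE urls SET last_indexed = ? WHERE url = ?", (when, url))
    db.execute("INSERT INTO history VALUES (?, ?, ?)", (url, when, status))
```

A gatherer loop would call `next_url`, fetch and summarise that site, store the result in the main metadata database, and then call `record_indexing` so that the same URL moves to the back of the queue.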

 

SiteSearch Suite
Provider: OCLC Inc.
Purpose: To provide libraries with Z39.50 (version 3) capability accessible from the WWW, including image collection support.
Approach: Commercial solution running on UNIX and NT platforms, focused on library collections.
Functionality:
Server: WebZ serves Web clients and integrates with the search engine to reach stored resources.
Database: The package includes Database Builder for building collections that will be searched with OCLC's Newton Search Engine (also used by "FirstSearch"). Standard JDBC/ODBC compliant databases are supported.
Search Tool: Web browser and Z39.50 clients.
Special Capability: Many advanced features: support for ISO ILL, user authentication, multiple simultaneous database search, and image collection support.
API: Java-based interface that allows Z39.50 and other resources to be searched through WebZ.
Metadata Formats Supported: MARC record formats, Dublin Core. Will support common metadata formats.

 

Cheshire II
Provider: SIMS, UC Berkeley
Purpose: To provide access to library and other varied Z39.50 searchable resources. Focus extends to "multimedia" resources.
Approach: This product is being designed and implemented in a working library environment with ongoing evaluation of user response. Advanced Z39.50 capabilities are supported and extended. Cross-searching of diverse Cheshire sites is already offered.
Functionality:
Server: Cheshire II search engine/server supports full-text and structured searching, with "closest match" and results ranking.
Database: Uses Berkeley Database.
Search Client: Has a purpose-built Z39.50 client. Supports standard search clients.
Special Capability: Much advanced work on search algorithms among other things (commercial companies are using these new designs).
API: Source code has been made available. C/C++ with Z39.50 clients created in Tcl/Tk.
Metadata Formats Supported: SGML for internal use, with conversion to all common metadata formats available.
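
To show what "closest match" searching with results ranking means in practice, the Python sketch below scores documents that share any term with the query and returns them best first. It uses a plain term-frequency and inverse-document-frequency weight, chosen only because it is simple; this report does not describe Cheshire II's own ranking algorithms, and the sketch should not be read as reproducing them. The function names and the sample documents in the note that follows are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(documents, query):
    """Return (doc_id, score) pairs, best first.

    A document qualifies if it shares at least one term with the query,
    so partial ("closest") matches are ranked rather than excluded.
    """
    counts_by_doc = {
        doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()
    }
    total = len(documents)
    scores = {}
    for term in set(tokenize(query)):
        matching = sum(1 for counts in counts_by_doc.values() if term in counts)
        if matching == 0:
            continue
        # Rarer terms get a higher weight.
        weight = math.log(1 + total / matching)
        for doc_id, counts in counts_by_doc.items():
            if term in counts:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[term] * weight
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For a small collection such as `{"d1": "library catalogue search", "d2": "library multimedia search search", "d3": "astronomy journal"}`, the query "multimedia search" ranks `d2` ahead of `d1` and leaves `d3` out, even though `d1` matches only one of the two query terms.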