Subject based information gateways are one approach to Internet resource discovery that
is increasing in popularity. Such gateways are supported by a variety of software
packages. For example:
Social Science Information Gateway using ROADS,
Syracuse University Gateway to Educational Materials using Blue Angel Technologies,
Eco Companion Australasia using ASF,
Astrophysical Journal on-line using CNIDR.
This report investigates the availability of resource discovery software for setting up such gateways. It is being carried out as part of the ROADS Project and is influenced by the requirements of the eLib Subject Gateways. The selection of products is to some extent arbitrary, and the products differ enough in nature, with some of the software being highly scalable, that direct comparison is not entirely possible. It is intended to update this comparison as new products come to our attention.
Software decisions will always be specific to the needs of a project, but with such a rich choice it may be useful to specify a simple, generic gateway project against which to work through the comparison.
Search Service Requirements:
Information Gateway Software
Both "free" and commercial software packages will be considered. The products described here represent an example of what is available. The information is based on the distributors' web-site descriptions. In general, for each product, there is a browsable/searchable implementation base.
Brief descriptions and links to the product suppliers' web-sites follow. The products can be associated with the three common search and retrieve protocols. A closer look at these products is planned with perhaps some consideration and comparison of inherent advantages or limitations based on this background.
Cheshire II f
Meta Web c
These product descriptions have been compiled by UKOLN for the ROADS Project as an aid to gateway implementors in identifying ROADS' position among other software choices. Every effort has been made to provide accurate information; however, no liability is accepted for errors, omissions or inaccuracies in the presentation.
|MetaStar Product Suite|
|Provider||Blue Angel Technologies Inc.|
|Purpose||A commercial package aiming at a complete solution. Components may be integrated with other products.|
|Approach||Able to supply "turn-key" solutions that will run on all popular platforms.|
version 3 compliant,
|Comes with integrated
off-the-shelf search engines (e.g., AltaVista, Fulcrum, etc.);
configurable for both full-text and structured searches;
allows simultaneous distributed search by a single query with merged results;
implemented in Java and NT and is compatible with most Web servers.
|Database:||A metadata database management tool that works with standard off-the-shelf ODBC databases provides configurable input and output to support metadata formats. By default, files are imported in XML format.|
|Search Tools:||Z39.50 gateway provides the ability to simultaneously search distributed servers with a single query using a Web-based query interface. Gateway is implemented in Java and provides an ability to customize the look and feel of the interface.|
|Special Capabilities:|| Data Entry
permits users to enter XML data from their web browsers directly into a relational database. Once the updates clear an optional workflow approval process, they are incrementally indexed and available for discovery in real time.
MetaStar Harvester is a crawler that gathers and parses XML and HTML and extracts designated data elements such as HTML tags. Extracted information is translated into either a structured or a custom file format.
|API:||Software Development Kits in C++ and Java for both the Client and the Server. Translation SDK for customising metadata for import and export from and to external systems.|
|Using XML internally, output mappings produce standard metadata formats (GILS, FGDC, DIF, Dublin Core...).|
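The single-query distributed search with merged results described for the MetaStar gateway can be illustrated in general terms. The following is a minimal sketch only, not MetaStar's implementation: the function name `search_all`, the idea of representing each server as a callable, and the (score, title) result shape are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, servers):
    """Send one query to every server concurrently and merge the hits.

    `servers` maps a server name to a callable that takes the query and
    returns a list of (score, title) pairs. Both are hypothetical
    stand-ins for real remote search targets.
    """
    merged = []
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        # Submit every search first so that the servers work in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in servers.items()}
        for name, future in futures.items():
            for score, title in future.result():
                merged.append((score, title, name))
    # Highest score first; ties are broken by title for a stable order.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return merged
```

Each merged hit keeps the name of the server it came from, so a result page can show the origin of every record. Scores from different servers are assumed here to be directly comparable, which is rarely true in practice without some normalisation.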
|Provider||University of Edinburgh|
|Purpose||Efficient indexing for Web resources with a focus on automated gathering.|
|Approach||Freely available software used widely.|
subsystem takes information from one or more Gatherers, suppresses
indexes the collection, and provides a WWW query interface to it.
|Database:||Along with its own harvested collection, its Replicator subsystem replicates brokers (can be web sites) around the Internet allowing for very efficient use of Internet bandwidth.|
|Search Tool:||Web browser.|
|Special Capability:||The Harvest Gatherer
uses the Essence subsystem to extract indexing information. Depending on
the file type, Essence
different data to index using a type-specific indexing algorithm (summarizer). The default summarisers can be customized if required.
|API:||The software is
written in C, and much functionality has been extended with Perl. A lot of freely available
software exists to accomplish a variety of tasks.
|Internally, SOIF records. Other formats are supported through mapping between standards.|
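Because SOIF records recur in several of the products described here, a short illustration of the format may help. A SOIF object opens with `@TYPE { URL`, lists attributes as `Name{size}:<tab>value`, and closes with `}`. The sketch below is a simplified reader written for this report, not code from Harvest: it handles one object only and assumes ASCII text, so that the declared size can be treated as a character count.

```python
import re

HEADER = re.compile(r"@(\w+)\s*\{\s*(\S+)\s*\n")
ATTRIBUTE = re.compile(r"([\w-]+)\{(\d+)\}:\t")

# Illustrative record; the URL and values are made up for this example.
SAMPLE = (
    "@FILE { http://example.org/index.html\n"
    "Title{12}:\tHello\nworld!\n"
    "Type{4}:\tHTML\n"
    "}\n"
)


def parse_soif(text):
    """Parse a single SOIF object into (template_type, url, attributes)."""
    header = HEADER.match(text)
    if header is None:
        raise ValueError("not a SOIF object")
    template_type, url = header.group(1), header.group(2)
    attributes = {}
    pos = header.end()
    while True:
        match = ATTRIBUTE.match(text, pos)
        if match is None:
            break  # reached the closing brace or unrecognised content
        name, size = match.group(1), int(match.group(2))
        start = match.end()
        # The declared size, not a newline, marks where the value ends,
        # so a value may span several lines.
        attributes[name] = text[start:start + size]
        pos = start + size + 1  # skip the newline that follows the value
    return template_type, url, attributes
```

Calling `parse_soif(SAMPLE)` yields the template type, the URL and a dictionary of attributes. The explicit sizes are what allow a value such as the two-line title above to contain newlines without any escaping.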
|Provider||GILS and partners|
|Purpose||To provide index and search capability for local and distributed document sets. Intended to be a complete solution.|
|Approach||Using existing and enhanced "freeware" components, a minimal complete package is provided.|
| Ships with Isearch
search engine. Supports both full-text and structured search.
The API allows for configuration for use with popular search engines.
|Database:||Supports ODBC compliant databases.|
|Search Tool:||Web browser and standard Z39.50 clients.|
|Special Capability:||Automated gathering: HtDig crawls remote document locations to provide data for indexing.|
|API:||SAI is the programmable interface to the server and allows different search engines to be used.|
|Compliance with GILS specification.
Should be configurable to output well-known metadata formats.
|Provider||Systems Simulation Ltd.|
|Purpose||To provide complete commercial solutions for collection management tasks including "multimedia" collections.|
|Approach||Modular toolkit approach that is customised to the individual application. Will work on all common platforms.|
|Server:||Index+ Server is the database engine for storage, search and retrieval, allowing Web and Z39.50 access to the Index+ collections.|
|Database:||Index+ database by default but access to varied data stores is enabled through individual APIs.|
clients for specific environments customized for each application. Visual tools for
creating and configuring
clients for specific applications.
|Special Capability:||Multilingual support in both client and internal storage.
Support for multimedia resource storage and indexing.
|API:||C programming interface to basic services as well as libraries for particular environments: C/C++ and Visual Basic where applicable.|
|Configurable internal storage allows for all common metadata types to be
Custom output formats are configurable and programmable.
|Purpose||To provide Z39.50 version 3 capabilities.|
|Approach||Uses freely available software to assemble complete scalable gateways.|
||Working through the Isearch search engine, zserve is a Z39.50 server.
|Database:||Access Isearch database by default with flexible access to others via call to scripts in a database initialization file.|
|Search Tool:||Web browsers and Z39.50 clients.|
|API:||Called SearchAPI, used to configure Isearch to work with other storage devices.|
|GILS, FGDC, IAFA Templates, USMARC.|
|Project ISAAC|
|Provider||Internet Scout Research Team at University of Wisconsin-Madison|
|Purpose||Research and development into information subject gateway issues aiming at linking separately maintained and geographically distributed metadata collections.|
|Approach||To establish a distributed test environment through collaboration with a
limited number of institutions with similar aims.
Efforts are based on the Lightweight Directory Access Protocol (LDAP) for local collection storage and the Common Indexing Protocol (CIP) for gathering descriptions of remote collections.
||Test implementations will use stand-alone LDAP servers. There will be two types of installation: Full-Node and Collection-Node. Full-Node will store metadata collections (accessible by the LDAP server) and accept queries (via the LDAP client). Collection-Node will store metadata and allow itself to be searched (via its LDAP server) by any Full-Node installation (using the Full-Node's LDAP client) on behalf of its clients. The Collection-Node will be lightweight in that it will not implement a large part of the whole package (no LDAP client, CIP client, HTTP server, or index of other remote sites).|
|Database:||The API is defined to allow use of most relational databases.|
|Search Tool:||Web browser.|
|Special Capability:||It is supported in the LDAP protocol.|
|API:||The LDAP is an enhanced version of Michigan version 3 software which has a well documented C language API and a set of command line tools that can be used with scripting languages. Perl modules are also available.|
|Directory Information Format (DIF) that easily maps to most formats. Present Internet Scout applications are using DC.|
|Provider||eLib funded project with software written at Loughborough University.|
|Purpose||To provide a complete solution for subject gateway development. It is intended to be easy to use.|
|Approach||A toolkit that installs quickly, and is complete for small to medium sized applications.|
|Server:||Offers Keyword and
Boolean searching with customizable result format.
Cross-search capability with other whois++ based sites.
Z39.50 gateway functionality provided by Index Data's Zebra server.
|Database:||ROADS generates a Unix file system based inverted index from a set of template based IAFA records.|
|Search Tool:||Web browser and standard clients when Z39.50 is enabled.|
|Special Capability:||Remote record
entry via HTML FORM input to the database by cataloguers.
Automated gathering has been achieved using Harvest-collected SOIF records, which are then converted to ROADS records.
|API:||Built from Perl modules. Extra functionality, including ODBC-type database support, is gained by adding or changing Perl scripts.|
|IAFA based with conversion routines to generate Z39.50 compatible records.
Many tools exist to map to common metadata formats.
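The idea of building an inverted index from template-based records, and answering Boolean queries from it, can be shown in a few lines. This is a simplified in-memory sketch and not ROADS code: ROADS is written in Perl and keeps its index on the file system, and the attribute names and handles used here are only illustrative of IAFA-style "Attribute: value" templates.

```python
import re
from collections import defaultdict


def parse_template(text):
    """Read 'Attribute: value' lines into a dict (simplified template record)."""
    record = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        record[name.strip()] = value.strip()
    return record


def build_index(records):
    """Map each lower-cased word to the set of record handles containing it."""
    index = defaultdict(set)
    for record in records:
        handle = record["Handle"]
        for name, value in record.items():
            if name == "Handle":
                continue  # the handle identifies the record; it is not indexed
            for word in re.findall(r"[a-z0-9]+", value.lower()):
                index[word].add(handle)
    return index


def search_and(index, *words):
    """Boolean AND: handles of the records that contain every query word."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()
```

A Boolean OR search would take the union of the same sets instead of the intersection. Keeping one set of handles per word is what makes both operations cheap once the index has been built.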
|Meta Web Tools|
|Provider||DSTC and a consortium of Australian partners.|
|Purpose||To promote and enhance the use of metadata by Australian institutions.|
|Approach||To supply (freely to everybody) a set of software tools that enable information gateways.|
system similar to Harvest (see above) that indexes web resources storing metadata in SOIF
This Harvest-like functionality lends itself to the advanced capabilities inherent in the JAVA language.
|Database:||JDBC (Java DataBase Connectivity) drivers for mSQL and ORACLE (for the URL and main metadata databases) are included in the download. JDBC drivers generally are available whenever ODBC drivers exist.|
|Search Tool:||Web browser. Applet-based search tools?|
|Special Capability:||Remote entry:
HTTP FORM for a remote user to enter the URL of site to be indexed via a Web browser.
The URL is stored in a database which remembers the history of the indexing that is carried out.
Automated collection: The Gatherer selects the next URL to be indexed from the URL database. The site index is then stored in the main metadata database.
|API:||JAVA programming interface.|
|SOIF records that are mapped to standard metadata formats.|
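The interplay between remote URL entry, the URL database and the Gatherer can be sketched as a small queue that remembers when each URL was last indexed. This is an assumption-laden illustration, not the Meta Web design: the table layout, the function names and the "longest-waiting first" selection rule are invented here, and SQLite stands in for the mSQL or ORACLE databases that the real tools reach through JDBC.

```python
import sqlite3


def open_url_db():
    """Create an in-memory URL database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE urls (url TEXT PRIMARY KEY, last_indexed REAL)")
    return db


def submit_url(db, url):
    """Record a URL entered by a remote user; repeat submissions are ignored."""
    db.execute(
        "INSERT OR IGNORE INTO urls (url, last_indexed) VALUES (?, NULL)", (url,)
    )


def next_url(db):
    """Pick the URL that has waited longest: never-indexed first, then oldest."""
    row = db.execute(
        "SELECT url FROM urls "
        "ORDER BY last_indexed IS NOT NULL, last_indexed, rowid LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def mark_indexed(db, url, when):
    """Store the time (seconds, supplied by the caller) of the latest indexing run."""
    db.execute("UPDATE urls SET last_indexed = ? WHERE url = ?", (when, url))
```

A gatherer loop would call `next_url`, index the site, store the result in the main metadata database, and then call `mark_indexed` so that the URL moves to the back of the queue.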
|SiteSearch Suite|
|Purpose||To provide libraries with Z39.50 (version 3) capability with access from the WWW, including image collection support.|
|Approach||Commercial solution running on UNIX and NT platforms focused on library collections.|
|Server:||WebZ serves Web clients and integrates with search engine to stored resources.|
|Database:||The package includes Database Builder for building collections that will be searched with OCLC's Newton Search Engine (also used by "FirstSearch"). Standard JDBC/ODBC compliant databases are supported.|
|Search Tool:||Web browser and Z39.50 clients.|
|Special Capability:||Many advanced features: support for ISO ILL, user authentication, multiple simultaneous database search, and image collection support.|
|API:||JAVA based interface to allow Z39.50 and other resources to be searchable through WebZ.|
|MARC record formats, Dublin Core. Will support common metadata formats.|
|Cheshire II|
|Provider||SIMS, UC Berkeley|
|Purpose||To provide access to library and other varied Z39.50 searchable resources. Focus extends to "multimedia" resources.|
|Approach||This product is being designed and implemented in a working library environment with ongoing evaluation of user response. Advanced Z39.50 capabilities are supported and extended. Cross-searching of diverse Cheshire sites is already offered.|
|Server:||Cheshire II search engine/server supports full-text and structured searching with "closest match" and results ranking.|
|Database:||Uses Berkeley Database.|
|Search Client:||Has purpose built Z39.50 client. Supports standard search clients.|
|Special Capability:||Much advanced work on search algorithms among other things (commercial companies are using these new designs).|
|API:||Source code has been made available. C/C++ with Z39.50 clients created in Tcl/Tk.|
|SGML for internal use with conversion to all common metadata formats available.|