UKOLN submitted a proposal to the British Library Research and Innovation Centre to provide a small amount of support to a WebWatch initiative at UKOLN. The aim of the project was to develop a set of tools to audit and monitor design practice and use of technologies on the Web, and to produce reports outlining the results obtained from applying the tools. The reports provided data on, for example, which servers were in use, the deployment of applications based on ActiveX or Java, and the characteristics of Web servers. This information should be useful for those responsible for the management of Web-based information services, for those responsible for making strategic technology choices, and for vendors, educators and developers.
WebWatch aims to improve Web information management by providing a systematic approach to the collection, analysis and reporting of data about Web practices and technologies.
Specific objectives include:
* Develop tools and techniques to assist in the auditing of Web practice and technologies
* Assist the UK Web Focus to:
  * develop a body of experience and example which will guide best practice in Web information management
  * advise the UK library and information communities
* Set up and maintain a Web-based information resource which reports findings
* Document its findings in a research report.
After a first phase which has seen the Web become a ubiquitous and simple medium for information searching and dissemination, we are beginning to see the emergence of a range of tools, services and formats required to support a more mature information space. These include site and document management software, document formats such as style sheets, SGML and Adobe PDF, validation tools and services, metadata for richer indexing, searching and document management, mobile code and so on.
Although this rapid growth is enhancing the functionality of the Web, the variety of tools, services and architectures available to providers of Web services is increasing the cost of providing those services. In addition, the resulting diversity may lead to further complications and expense in the future. This makes it critical to have better information available about what is in use, what the trends are, and what commonality exists.
Benefits include better information about Web practice and technologies for information providers, network information managers, user support staff and others.
Work proceeded as follows:
* Development of a robot to retrieve institutional Web pages from a database of institutional URLs
An automated robot was developed which retrieves the Web resources listed in an input file. The robot is written in Perl and makes use of the libwww Perl library. Originally a tailored version of the Harvest Gatherer was used. As limitations in this robot became apparent, a number of its modules were modified, until eventually a locally developed robot suite was in use.
* Development of a suite of programs to analyse results retrieved by the robot
Software to analyse the results from the robot was used. This included locally developed Unix utilities to process the data into a format suitable for use with other applications, and desktop applications including Microsoft Excel and SPSS.
* Production of reports
Examples of areas covered by the reports included:
* Report on the quality and type of HTML used in institutional home pages (e.g. conformance to HTML standard, use of HTML extensions such as frames, use of mobile code such as Java) and size of the institutional home page (e.g. number and size of images).
* Report on the use of metadata.
* Report on the numbers of hypertext links in pages.
* Liaison with specific communities:
The robot was used to retrieve resources from particular domains and subject areas, including public libraries and library services within Higher Education. Reports on these trawls enabled:
* The uptake of new Web facilities within a subject area to be monitored, for example the use of metadata tags within the library community.
* Periodic surveys to be carried out to observe trends within the communities.
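The retrieve-then-analyse workflow described above can be illustrated with a minimal sketch. It is written in Python rather than the Perl and libwww used by the project, and it is not the WebWatch robot itself: the names `PageAudit`, `audit_html`, `read_urls`, `fetch` and `trawl` are hypothetical, and the features counted (links, images, named `meta` tags, frames, Java applets) are only a small sample of what the reports covered. A real robot would also need to respect robots.txt and pause between requests, which this sketch omits.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageAudit(HTMLParser):
    """Count a few page features of the kind mentioned in the reports."""

    def __init__(self):
        super().__init__()
        self.counts = {"links": 0, "images": 0, "meta": 0,
                       "frames": 0, "applets": 0}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.counts["links"] += 1      # hypertext links only, not anchors
        elif tag == "img":
            self.counts["images"] += 1
        elif tag == "meta" and "name" in attrs:
            self.counts["meta"] += 1       # named metadata, not http-equiv
        elif tag == "frame":
            self.counts["frames"] += 1
        elif tag == "applet":
            self.counts["applets"] += 1    # Java applets as mobile code


def audit_html(html):
    """Return the feature counts for one page of HTML text."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return parser.counts


def read_urls(path):
    """Read one URL per line, skipping blank lines and '#' comments."""
    with open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


def fetch(url, timeout=10):
    """Retrieve a page over the network and decode it leniently."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def trawl(urls, fetcher=fetch):
    """Audit each URL; record an error entry instead of stopping on failure."""
    results = {}
    for url in urls:
        try:
            results[url] = audit_html(fetcher(url))
        except (OSError, ValueError) as error:
            results[url] = {"error": str(error)}
    return results
```

Under these assumptions, a trawl would be run as `trawl(read_urls("urls.txt"))`, where `urls.txt` is a hypothetical input file, and the resulting dictionary could be written out as tabular data for a spreadsheet or statistics package. Passing the fetcher as a parameter keeps the analysis step testable without network access.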