Brian Kelly
UKOLN, University of Bath, Bath, UK
e-mail: B.Kelly@ukoln.ac.uk
This paper describes an automated approach to the benchmarking of Web sites. The author reviews UKOLN's work in auditing and evaluating Web sites in a number of public sector communities through use of Web-based tools. Although useful information has been provided to members of the communities, comparisons of results from different tools have revealed significant differences. The paper argues the need for standard definitions to allow benchmarking results provided by different services to be comparable. The paper concludes by summarising the technical limitations of the approach described and suggesting that a "Web Service" approach is needed.
Keywords: web auditing, web benchmarking, accessibility, standards
The approach to automated benchmarking of Web sites described in this paper involves the use of Web-based tools to analyse Web site characteristics across a community. Measurements of various aspects of the Web sites can be carried out to establish compliance with standards and guidelines, trends within a community and comparisons with related communities.
Web site benchmarking is likely to grow in importance as public sector organisations increase the range of services they provide and seek to maximise the coverage of their services. We are likely to see the development of compliance services which will measure compliance of public sector Web sites with national and international standards and guidelines - for example see the UK Government's e-GIF guidelines [EGIF].
Manual benchmarking will be needed in areas in which automated techniques are not suitable. However the cost of manual benchmarking can be reduced by complementing such surveys with automated approaches. Although one might expect automated benchmarking surveys to produce consistent and reproducible results, in practice this is not the case. This paper reviews a number of approaches to automated benchmarking and highlights areas in which inconsistencies may occur.
Note that the focus of the paper is on the technical approaches to use of automated Web site benchmarking - the paper does not address the statistical validity of the approach.
UKOLN's WebWatch work began with a project to develop and make use of robot software to gather information on usage of Web technologies within a number of communities within the UK. The project developed software which was used to analyse several communities including UK Public Library Web sites [Kelly1997a] and UK University and College entry points [Kelly1997b]. After the project funding ceased it was decided that the WebWatch software should no longer be developed. However the feedback from the surveys had been very positive and it was felt that we should continue to carry out benchmarking surveys across our user communities. It was decided to make use of freely-available Web-based analysis tools to carry out further surveys. Typically these tools are implemented as CGI scripts with a Web interface. Although these services primarily allow users to evaluate their own Web sites, many of them can be used indirectly (by using a URL string which provides a set of input parameters). This approach can be used to evaluate multiple Web sites.
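The indirect use of such a service can be sketched as follows. This is a minimal illustration in Python, not the scripts used in the surveys: the service address (analysis.example.org) and the parameter names `url` and `output` are hypothetical, and each real service defines its own.

```python
from urllib.parse import urlencode

# Hypothetical CGI endpoint; a real analysis service documents its own address.
SERVICE = "http://analysis.example.org/cgi-bin/check"


def build_request_url(site_url, service=SERVICE, **extra_params):
    """Return the URL which asks the analysis service to examine site_url."""
    # "url" is an assumed parameter name, used here for illustration only.
    params = {"url": site_url}
    params.update(extra_params)
    return service + "?" + urlencode(params)


def build_survey(site_urls, **extra_params):
    """Map each Web site in a community to its analysis request URL."""
    return {site: build_request_url(site, **extra_params) for site in site_urls}


if __name__ == "__main__":
    community = ["http://www.example.gov.uk/", "http://www.example.ac.uk/"]
    for site, request_url in build_survey(community, output="summary").items():
        print(site, "->", request_url)
```

Publishing the list of request URLs alongside the survey findings would allow readers to rerun each analysis from a Web browser.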
There are many desktop tools which can be used for benchmarking Web sites. Desktop tools may provide richer functionality than Web-based alternatives. However this paper focusses on Web-based services since these services can be used by everyone with access to a Web browser - software does not have to be installed locally. This ensures that the benchmarking process is open: anyone can see the tools used, try out the benchmarks for themselves in order to obtain current results, and test Web sites which were not included in the initial benchmarking survey and compare the results.
The approach described above has been used to benchmark several communities. It has been noticed that the results provided by different services are not always consistent. A small sample of UK Local Authority Web sites has therefore been benchmarked in order to explore the reasons for the differences and as a pilot to establish whether a more comprehensive survey of UK Local Authority Web sites would be useful.
The sample used in this benchmarking study consisted of the 16 candidates for the 2001 SPIN-SOCITM Website Awards [SPIN]. This sample was selected as it was small enough to be manageable while covering a diverse geographical range and type of local authority. In addition since the sample Web sites had been selected as candidates for a national award, it was felt that the Web sites may make use of innovative features which may test the capabilities of the benchmarking tools.
A number of Web-based analysis tools were used including Dr HTML [DrHTML], NetMechanic [NetMechanic], Bobby [CAST], LinkPopularity [LinkPopularity] and Netcraft [Netcraft].
The detailed findings and a discussion of the findings are available at [Kelly2001].
How valid are the results which have been obtained? How can differences in the results obtained from use of different services be explained?
There were several inconsistencies between the tools. Some tools respected the Robot Exclusion Protocol (REP) [Koster] and would not analyse files which the REP excluded robots from accessing. Some tools would follow redirects whereas others would simply analyse the redirect headers or page. Similarly some tools would measure an initial "splash screen" whereas others would measure the final page. The tools also treated pages which made use of frames differently. None of the tools appeared to measure combinations of pages.
This raises the issue of how benchmarking services should analyse personalised or dynamic pages. Techniques such as use of "cookies" and environmental factors such as the time of day, the page viewed previously, preferred language, etc. can be used to deliver tailored content. The contents of a page can also change through inclusion of news items, which may vary from day to day or more frequently.
It should also be noted that in general the tools do not make it easy for the numerical results to be processed. Typically the results are displayed in a format intended for viewing and it is difficult to reuse the information.
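The effect of one of these differences, whether or not a tool respects the REP, can be illustrated with a short Python sketch. It uses the standard library's `urllib.robotparser` module; the robots.txt content, the user-agent name "BenchmarkBot" and the example URLs are invented for illustration and do not describe any of the tools surveyed.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_rep(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def pages_to_analyse(urls, robots_txt, user_agent, respect_rep=True):
    """Return the pages a tool would analyse, depending on its REP policy."""
    if not respect_rep:
        return list(urls)
    return [url for url in urls if allowed_by_rep(robots_txt, user_agent, url)]


# Invented example data (assumptions, not taken from the survey).
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
URLS = [
    "http://www.example.gov.uk/index.html",
    "http://www.example.gov.uk/private/report.html",
]

if __name__ == "__main__":
    print(pages_to_analyse(URLS, ROBOTS_TXT, "BenchmarkBot", respect_rep=True))
    print(pages_to_analyse(URLS, ROBOTS_TXT, "BenchmarkBot", respect_rep=False))
```

Two tools which differ only in this setting would analyse different sets of pages for the same Web site, and so could report different totals.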
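The kind of post-processing this forces on a survey can be sketched in Python. The report layout assumed here (a label ending in a colon in one table cell, followed by a number in the next cell) is invented for illustration; each real tool presents its results differently, which is precisely the difficulty.

```python
import re
from html.parser import HTMLParser

# A number such as "3", "45,312" or "2.5" at the start of a piece of text.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


class _TextChunks(HTMLParser):
    """Collect the non-empty pieces of text found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def extract_metrics(report_html):
    """Return {label: value} for each 'Label:' chunk followed by a number."""
    collector = _TextChunks()
    collector.feed(report_html)
    collector.close()
    metrics = {}
    for label, value in zip(collector.chunks, collector.chunks[1:]):
        match = NUMBER.match(value)
        if label.endswith(":") and match:
            name = label[:-1].strip().lower()
            metrics[name] = float(match.group().replace(",", ""))
    return metrics


if __name__ == "__main__":
    # Hypothetical report fragment, written for viewing rather than reuse.
    report = (
        "<table><tr><td><b>Total size:</b></td><td>45,312 bytes</td></tr>"
        "<tr><td>Broken links:</td><td>3</td></tr></table>"
    )
    print(extract_metrics(report))
```

Code of this kind has to be written separately for each tool and is liable to break whenever the layout of a report changes.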
How useful is this type of survey? Similar surveys have been carried out, such as Barry's survey of Australian Library Web sites [Barry2000]. Web site benchmarking companies are being set up including commercial [WebAudits] and community-based [MAPIT] services. However in order for automated surveys to be useful (especially if they will be used to monitor compliance with standards and may influence funding), it is important that advice is provided to providers and users of benchmarking services.
There is a clear need for standard definitions to enable benchmarking surveys carried out using different applications to provide comparable results. For example there is a need to define the term "Web page". The definition should cater for embedded resources (e.g. style sheets, JavaScript files, etc.), redirects, "splash screens", user-agent negotiation, etc.
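The effect of the definition on a simple measurement such as page size can be sketched in Python. The choice of elements treated as embedded resources, the example page and the resource sizes below are assumptions made for illustration; they are not a proposed standard definition.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedResourceFinder(HTMLParser):
    """Collect the URLs of resources embedded in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script", "frame", "iframe") and attributes.get("src"):
            self.resources.append(urljoin(self.base_url, attributes["src"]))
        elif tag == "link" and attributes.get("href"):
            # Only style sheets are counted here; other link types are ignored.
            if "stylesheet" in (attributes.get("rel") or "").lower().split():
                self.resources.append(urljoin(self.base_url, attributes["href"]))


def embedded_resources(html, base_url):
    """Return the distinct embedded resource URLs in document order."""
    finder = EmbeddedResourceFinder(base_url)
    finder.feed(html)
    finder.close()
    return list(dict.fromkeys(finder.resources))


def page_size(html, base_url, resource_sizes, include_embedded):
    """Size in bytes of a 'Web page' under one of two possible definitions.

    resource_sizes maps resource URLs to sizes measured elsewhere.
    """
    size = len(html.encode("utf-8"))
    if include_embedded:
        for url in embedded_resources(html, base_url):
            size += resource_sizes.get(url, 0)
    return size


if __name__ == "__main__":
    base = "http://www.example.gov.uk/index.html"
    page = ('<html><head><link rel="stylesheet" href="style.css">'
            '<script src="/js/menu.js"></script></head>'
            '<body><img src="logo.gif"></body></html>')
    sizes = {  # hypothetical measured sizes in bytes
        "http://www.example.gov.uk/style.css": 2000,
        "http://www.example.gov.uk/js/menu.js": 5000,
        "http://www.example.gov.uk/logo.gif": 12000,
    }
    print(page_size(page, base, sizes, include_embedded=False))
    print(page_size(page, base, sizes, include_embedded=True))
```

The same page yields two quite different sizes depending on the definition adopted, so results from tools which make different choices cannot be compared directly.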
Definitions should be provided by an international standards body (e.g. W3C). Although W3C's Web Characterization Activity [W3Ca] is no longer active, the work on EARL (Evaluation And Report Language) [W3Cb], which aims to provide a framework for describing data on the evaluation of resources, may be of relevance.
Other bodies which may have a role to play in the standardisation work include the Interactive Advertising Bureau [IAB], ABCE [ABCE] and JICWEBS [JICWEBS] - bodies which represent the Web advertising and auditing communities.
Once the definitions have been standardised, software vendors will be in a position to provide control over their analyses, in order to comply with the agreed standards. The software vendors should also look into the mechanisms for providing control over their services, in order to provide modularisation, to facilitate change control and to allow results to be easily reproduced. There may be a need to move towards use of "Web Service" technologies such as SOAP [W3Cc]. The development of applications which are designed to be used by other applications will help to address some of the limitations of the approach described in this paper, including change control and providing management control over access to services.
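What a result intended for use by other applications might contain can be sketched in Python. The sketch uses JSON as a simple stand-in for a SOAP message, and all of the field names, the tool name and the version string are hypothetical; the point is only that the tool version and the analysis settings travel with the measurements.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class AnalysisSettings:
    """Choices which affect what is measured (hypothetical field names)."""
    follow_redirects: bool = True
    respect_robot_exclusion: bool = True
    include_embedded_resources: bool = True


@dataclass
class BenchmarkResult:
    """A machine-readable record of one analysis of one Web site."""
    tool: str
    tool_version: str
    target_url: str
    settings: AnalysisSettings
    metrics: dict = field(default_factory=dict)

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


def comparable(first, second):
    """Results are treated as comparable only if the settings were identical."""
    return first.settings == second.settings


if __name__ == "__main__":
    result = BenchmarkResult(
        tool="ExampleChecker",  # hypothetical tool name
        tool_version="1.0",
        target_url="http://www.example.gov.uk/",
        settings=AnalysisSettings(follow_redirects=False),
        metrics={"page_size_bytes": 45312},
    )
    print(result.to_json())
```

Recording the version and settings in each result supports change control, since a survey can be rerun later under the same conditions, and it allows a consuming application to reject comparisons between results which were produced under different definitions.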
Commercial software vendors will, of course, make their decisions on whether to provide enhanced access to their services based on commercial considerations. However it could be argued that there is a need for freely available, open source services which will provide Web benchmarking facilities, along the lines of the W3C validation services [W3Cd].
We are likely to see further benchmarking surveys carried out across e-government communities. In addition to addressing the inconsistencies in results which have been identified in this paper, there is a need to address the statistical validity of survey results.
Comparisons across different countries may provide useful insights. The author invites interested parties to make contact.
[ABCE] ABC Electronic <http://www.abce.org.uk/>
[Barry2000] The User Interface: glitz versus content, A. Barry, October 2000 <http://tony-barry.emu.id.au/pubs/2000/alia2000/>
[CAST] Bobby, CAST <http://www.cast.org/bobby/>
[DrHTML] Dr HTML <http://www.drhtml.com/>
[EGIF] e-Government Interoperability Framework, Office of the E-Envoy, UK Online, <http://www.iagchampions.gov.uk/publications/frameworks/egif2/egif2.htm>
[IAB] Internet Advertising Bureau, <http://www.iab.net/>
[JICWEBS] JICWEBS (The Joint Industry Committee for Web Standards in the UK and Ireland), <http://www.jicwebs.org/>
[Kelly1997a] Robot Seeks Public Library Websites, LA Record, December 1997 Vol. 99(12). Also see <http://www.ukoln.ac.uk/web-focus/webwatch/articles/la-record-dec1997/>
[Kelly1997b] WebWatching UK Universities and Colleges, B. Kelly, Ariadne, Issue 12, November 1997, <http://www.ariadne.ac.uk/issue12/web-focus/>
[Kelly2001] Survey Of Shortlist for 2001 SPIN-SOCITM Website Awards, B. Kelly, UKOLN <http://www.ukoln.ac.uk/web-focus/events/conferences/euroweb-2001/>
[Koster] A Standard for Robot Exclusion, <http://www.robotstxt.org/wc/norobots.html>
[LinkPopularity] LinkPopularity <http://www.linkpopularity.com/>
[MAPIT] Better Connected 2001, MAPIT, SOCITM <http://www.socitm.gov.uk/mapp/w2001.htm>
[Netcraft] Netcraft <http://www.netcraft.com/>
[NetMechanic] NetMechanic <http://www.netmechanic.com/>
[SPIN] EPI 2001: About The Conference: SPIN Awards, SPIN Web Site <http://www.spin.org.uk/epi2001/about.html#spinawards>
[WebAudits] Welcome to WebAudits, WebAudits <http://www.webaudits.net/>
[W3Ca] Web Characterization Activity, W3C <http://www.w3.org/WCA>
[W3Cb] EARL - the Evaluation And Report Language, W3C, <http://www.w3.org/2001/03/earl/>
[W3Cc] SOAP Version 1.2, W3C <http://www.w3.org/TR/soap12/>
[W3Cd] HTML Validator, W3C <http://validator.w3.org/>