UKOLN Good Practice Guide for Developers of Cultural Heritage Web Services

Sector Statistics


Author: Brian Kelly, UKOLN


This section provides advice and guidance on Web site usage statistics intended to provide information on usage across a community of Web site providers.

Proposed Approaches

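The four measurements that a Web site owner should record and supply to appropriate bodies are user sessions, average duration of user session, page impressions, and average page impressions per user session.

Explanations of these measurements are provided below: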

User Sessions

The number of distinct user sessions recorded over the reporting period.

User sessions are determined by grouping together all requests that come from the same IP address with no more than 30 minutes between consecutive requests. A gap of more than 30 minutes starts a new session. A figure of 30 minutes is widely used and is the default in many Web analysis packages.

Note: This is NOT the "unique users" value. User sessions include repeat visitors to your pages.
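
The grouping rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used by any particular analysis package: the function name build_sessions, the (IP address, timestamp) input format and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# The 30-minute figure cited in the text; sites may configure another value.
SESSION_TIMEOUT = timedelta(minutes=30)

def build_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (ip, timestamp) pairs into user sessions.

    Hypothetical helper: returns a list of dicts, each holding the IP
    address and the sorted request times of one session.
    """
    by_ip = defaultdict(list)
    for ip, when in requests:
        by_ip[ip].append(when)

    sessions = []
    for ip, times in by_ip.items():
        times.sort()
        current = [times[0]]
        for when in times[1:]:
            if when - current[-1] > timeout:
                # Gap longer than the timeout: close this session, start another.
                sessions.append({"ip": ip, "times": current})
                current = [when]
            else:
                current.append(when)
        sessions.append({"ip": ip, "times": current})
    return sessions

# Invented sample data: one address returns after a long gap.
sample_requests = [
    ("192.0.2.1", datetime(2005, 10, 18, 9, 0)),
    ("192.0.2.1", datetime(2005, 10, 18, 9, 10)),
    ("192.0.2.1", datetime(2005, 10, 18, 11, 0)),
    ("192.0.2.7", datetime(2005, 10, 18, 9, 5)),
]
print(len(build_sessions(sample_requests)))  # 3 sessions from 2 addresses
```

The sample shows why this figure differs from a "unique users" value: the address that returns after a long gap contributes two sessions.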

Average Duration of User Session

By examining the time of the first and the last request made during a user session, a figure for the length of the user session can be obtained. The average duration of a user session is the average length of all user sessions found.
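
As a rough sketch of this calculation, assuming each user session is already available as a list of request times (the function name and sample data below are invented for illustration):

```python
from datetime import datetime, timedelta

def average_session_duration(sessions):
    """Average of (last request - first request) over all sessions.

    `sessions` is a list of lists of datetime objects, one list per
    user session. Returns None when there are no sessions.
    """
    if not sessions:
        return None
    total = sum((max(times) - min(times) for times in sessions), timedelta())
    return total / len(sessions)

# Invented sample data: sessions lasting 10, 0 and 20 minutes.
sample_sessions = [
    [datetime(2005, 10, 18, 9, 0), datetime(2005, 10, 18, 9, 10)],
    [datetime(2005, 10, 18, 11, 0)],
    [datetime(2005, 10, 18, 9, 5), datetime(2005, 10, 18, 9, 25)],
]
print(average_session_duration(sample_sessions))  # 0:10:00
```

Under this definition a session consisting of a single request has a length of zero, because the first and last requests are the same.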

Note: Some good usage statistics analysis packages (notably Analog) are not able to calculate the user session length, so cannot provide this information. Where this is the case, you should answer with "n/a : " followed by details of the log analysis package in use.

Page Impressions

The total number of requests for files that are defined as pages.

Generally these are files that have extensions such as .htm, .html, .shtml, .php, .asp, .pl and .cgi. The exact set may differ across Web sites, so Web site owners will be expected to configure their analysis packages so that the relevant page type files are measured.

Note: You should not include images, graphics, stylesheets, external script files or other "component" files that together comprise one page.

Note: Requests that can be positively identified as emanating from non-human sources should, as a matter of standard practice, be excluded from the analysis. You should seek to ensure that your reports do not include data coming from search engine spiders (robots), network monitors, benchmark tests and other sources which do not reflect usage by end users.
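
The filtering described in this section might be sketched as follows. The extension set comes from the list above; the robot markers, function names and sample requests are assumptions made for the example, and a real site would need its own, more complete rules for recognising non-human traffic.

```python
import os
from urllib.parse import urlsplit

# Extensions listed in the text; each site should adjust this set.
PAGE_EXTENSIONS = {".htm", ".html", ".shtml", ".php", ".asp", ".pl", ".cgi"}

# Illustrative substrings only (an assumption, not a complete robot list).
ROBOT_MARKERS = ("bot", "spider", "crawler")

def is_page(url, page_extensions=PAGE_EXTENSIONS):
    """True if the path part of the URL ends in a page extension."""
    path = urlsplit(url).path
    return os.path.splitext(path)[1].lower() in page_extensions

def is_robot(user_agent, markers=ROBOT_MARKERS):
    """Crude check of the user agent string against known markers."""
    agent = user_agent.lower()
    return any(marker in agent for marker in markers)

def count_page_impressions(requests):
    """Count (url, user_agent) requests that are pages from human users."""
    return sum(1 for url, agent in requests
               if is_page(url) and not is_robot(agent))

browser = "Mozilla/4.0 (compatible; MSIE 6.0)"
sample_requests = [
    ("/index.html", browser),
    ("/images/logo.gif", browser),   # component file, not a page
    ("/style.css", browser),         # component file, not a page
    ("/about.php?lang=en", browser),
    ("/index.html", "ExampleBot/1.0"),  # excluded as a robot
]
print(count_page_impressions(sample_requests))  # 2
```

This sketch does not count requests for a bare directory such as "/", which have no extension; how those are treated is a decision for each site's configuration.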

Note: Some Web sites may generate dynamic URLs where the base URL remains the same and the page delivered is determined by the contents of the query string. In this case, special steps may have to be taken to convert log files to sensible page impression values.
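
One possible "special step" is to treat the base URL plus the page-selecting query parameters as the identity of the page. The sketch below assumes a hypothetical site that selects its pages with a parameter named page; the URLs, the parameter name and the function name are all invented for illustration.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

def page_key(url, page_params=("page",)):
    """Reduce a dynamic URL to the parts that identify the page.

    `page_params` names the query parameters that select the page
    (hypothetical here); all other parameters are ignored.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    chosen = [(name, query[name][0]) for name in page_params if name in query]
    if not chosen:
        return parts.path
    return parts.path + "?" + "&".join(f"{n}={v}" for n, v in chosen)

# Invented log entries for a site that serves every page from index.php.
urls = [
    "/index.php?page=about",
    "/index.php?page=contact&sessionid=abc123",
    "/index.php?page=about&sessionid=xyz789",
]
impressions = Counter(page_key(u) for u in urls)
print(impressions["/index.php?page=about"])    # 2
print(impressions["/index.php?page=contact"])  # 1
```

Which parameters select a page is specific to each Web site, so the list passed as page_params would have to be set by the site owner.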

Average Page Impressions per User Session

The average is obtained by dividing total page impressions by the total number of user sessions recorded.
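
As a small worked example with invented figures, a site recording 1,200 page impressions over 300 user sessions would report an average of 4 page impressions per session:

```python
def average_pages_per_session(page_impressions, user_sessions):
    """Total page impressions divided by total user sessions.

    Returns None when no sessions were recorded, to avoid dividing by zero.
    """
    if user_sessions == 0:
        return None
    return page_impressions / user_sessions

print(average_pages_per_session(1200, 300))  # 4.0
```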

Limitations Of Aggregated Usage Statistics

It should be noted that, for a variety of technical reasons, Web usage statistics will not necessarily give an accurate indication of the usage of a Web site by end users. In addition, errors can be introduced when aggregating summaries of usage figures (for example, it would not be valid to determine the total number of unique visitors by adding the numbers reported for each month, as a visitor in January may well visit the Web site in other months). Finally, it should be noted that aggregating usage statistics across different Web sites may also give misleading results since, for example, different methodologies and tools may be used.
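
The unique visitor example above can be illustrated with a short sketch. The data is invented, and identifying a visitor by IP address is itself only an approximation adopted here for simplicity.

```python
def sum_of_monthly_uniques(monthly_visitors):
    """Add up each month's unique visitor count (the invalid method)."""
    return sum(len(visitors) for visitors in monthly_visitors.values())

def distinct_visitors(monthly_visitors):
    """Count visitors who are distinct across the whole period."""
    return len(set().union(*monthly_visitors.values()))

# Invented data: two of January's visitors return in February.
monthly_visitors = {
    "January": {"192.0.2.1", "192.0.2.7", "192.0.2.9"},
    "February": {"192.0.2.1", "192.0.2.9", "192.0.2.20"},
}
print(sum_of_monthly_uniques(monthly_visitors))  # 6
print(distinct_visitors(monthly_visitors))       # 4
```

Adding the monthly figures counts the returning visitors twice, so the summed value overstates the number of distinct visitors. The correct figure can only be derived from the underlying data, not from the monthly summaries.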

Despite such limitations, usage statistics can be useful in identifying trends and in giving an indication of overall usage. It should be noted that limitations in obtaining usage data exist in other fields (e.g. TV viewing figures) and yet, despite such limitations, the data is collected and used for a variety of purposes.


To summarise:

Further Information

The following background information may be useful.

Comments On This Document

This section will be used to provide notes on the section, including details of any changes.

18 October 2005
Document published.