UKOLN Good Practice Guide for Developers of Cultural Heritage Web Services
 
 
 

Guidelines for Setting Up Web Sites

Acknowledgements

Author: Brian Kelly, UK Web Focus, UKOLN
Originally commissioned as part of the Information Paper series from the NOF-digi Technical Advisory Service.

Introduction

This section provides advice and guidance on best practices for setting up Web sites. The information provided in this section is intended to be used during the planning stages for a Web site.

Purpose of the Web Site

Before setting up your Web site it is important to identify its purpose(s). The primary purpose of many cultural heritage Web sites will be delivering cultural heritage and learning materials to end-users. However, organisations may also choose to make information about the project's development activities, remit, etc. available to partners, funders and other interested parties. The site could be structured so that specific sections target different audiences, whilst ensuring that all the digitised materials are freely accessible to the public.

Having a clear idea of the purpose of the Web site at the planning stage will help you to plan the structure of the Web site and to identify the likely costs, the management support and technical resources required, and the staffing numbers and skill levels needed.

Domain Name for your Web Site

Users often find it easier to remember a short project- or organisation-specific URL, rather than a long and non-intuitive one. Organisations might wish to consider registering their own domain name (<www.my-project.org.uk>, say) rather than requiring users to remember a less friendly URL (such as <www.my-organisation.gov.uk/projects/nof/my-project/>).

If your Web site is hosted on an organisational server, there are several options available. Web sites are often set up within a deep hierarchical structure (e.g. <www.my-organisation.gov.uk/depts/library/projects/NOF-digitise/bar/>). Such URLs can be difficult to remember and are prone to errors when typed. In addition, many search engines are believed not to index deeply within a Web site.

Another alternative is to make use of the ~ (tilde) convention (e.g. <www.my-organisation.gov.uk/~bar/>). Although this approach can provide a short URL, novice users may find the ~ key difficult to locate. Users may also regard Web sites whose URLs contain a ~ as personal home pages rather than quality services.

If you change the domain name for your Web site, you should inform relevant parties, such as the funders of the Web site.

Lifetime of the Web Site

What is the expected lifetime of the Web site? The information contained on a Web site is likely to remain valuable even after the project has finished. You should try to ensure that the Web site will continue to exist after the funding has ended and that bookmarks and published URLs will continue to function.

Directory Structures

Resources on your Web site should be located within its directory (e.g. if the directory is <www.my-project.gov.uk/about/> this should be the project entry point and not <www.my-project.gov.uk/about.html>). This allows the Web site to be treated as a unified collection - e.g. to allow the Web site to be downloaded onto a PDA or by an offline browser, without having to download the entire organisation Web site.
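To illustrate why a self-contained directory matters, the following minimal Python sketch shows the kind of rule an offline browser or mirroring tool might apply: resolve each link on a page and keep only those that fall inside the project directory. The function names and example URLs are hypothetical, and real mirroring tools provide their own, more elaborate options.

```python
# Illustrative sketch: function names and example URLs are hypothetical.
from urllib.parse import urljoin, urlparse


def within_site(url: str, site_root: str) -> bool:
    """Return True if url lies inside the directory named by site_root."""
    u, root = urlparse(url), urlparse(site_root)
    # Force a trailing "/" so that /my-project/ does not also match /my-project-old/.
    root_path = root.path if root.path.endswith("/") else root.path + "/"
    same_server = (u.scheme, u.netloc) == (root.scheme, root.netloc)
    return same_server and u.path.startswith(root_path)


def links_to_mirror(page_url: str, hrefs: list[str], site_root: str) -> list[str]:
    """Resolve each href against the page URL and keep only links inside the site."""
    resolved = (urljoin(page_url, href) for href in hrefs)
    return [url for url in resolved if within_site(url, site_root)]
```

For a page at <www.my-organisation.gov.uk/projects/my-project/index.html>, a link to collections/ would be kept, while ../other-project/ and /about.html would be skipped because they lie outside the project directory.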

The Pre-Release Web Site

A preliminary site may be up and running before development work is complete and the full version of the Web site is launched. Note that once a Web site has been set up it may be indexed by search engines or linked to from other Web sites. Since search engine entries and links may persist after the official project Web site has been launched, it may be desirable to manage the dissemination of the pre-release Web site. For example, you may wish to provide text on the Web site informing users of its status. You should also consider using a robots.txt file or equivalent <meta> tags in HTML pages to prevent robots from indexing the site until it is officially launched [1].
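A minimal sketch of both approaches follows, assuming the whole pre-release site should be hidden from all robots. It uses Python's standard urllib.robotparser module to confirm how a robot that honours the Robot Exclusion Protocol would read the file. The constant and function names are hypothetical.

```python
# Illustrative sketch: constant and function names are hypothetical.
from urllib.robotparser import RobotFileParser

# robots.txt asking all compliant robots not to crawl any part of the site.
PRE_RELEASE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Per-page equivalent, placed in the <head> of each HTML page.
NOINDEX_META = '<meta name="robots" content="noindex, nofollow">'


def robot_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, robot_may_fetch(PRE_RELEASE_ROBOTS_TXT, "Googlebot", "http://www.my-project.org.uk/index.html") returns False. robots.txt is advisory: it keeps out well-behaved robots but does not restrict access by people. The rules also need to be removed at launch, or the live site will remain excluded from search engines.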

Promoting the Web Site

Once your Web site has been developed, checked for compliance with appropriate standards and best practices, and is ready for use, it should be promoted to ensure that end users know about the service.

You should make use of a robots.txt file to ensure that only the quality areas of your Web site are indexed: for example, you may wish to exclude draft documents or personal pages from being indexed by search engines such as Google.
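A selective robots.txt file might look like the one in the following sketch, which assumes drafts live under /drafts/ and personal pages under /personal/ (both hypothetical directory names). The helper uses Python's standard urllib.robotparser to report which paths a compliant robot would be allowed to fetch.

```python
# Illustrative sketch: directory names and the helper name are hypothetical.
from urllib.robotparser import RobotFileParser

# Keep drafts and personal pages out of search engines; leave the rest open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /personal/
"""


def crawlable(robots_txt: str, user_agent: str, paths: list[str]) -> dict[str, bool]:
    """Report, for each path, whether robots.txt lets user_agent fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}
```

Running a check like this against a list of your key pages before publishing a new robots.txt helps to catch rules that accidentally exclude the quality areas you want indexed.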

You are advised to be proactive in submitting your Web site to key search engines (e.g. Google) and directory services (e.g. Yahoo). You should consider the use of submission software and services [2].

You should be aware of the difficulties which search engine software can have in indexing your Web site if you provide frames-based interfaces (especially if you do not provide a no-frames alternative for accessing the full content of your Web site), if you use 'splash screens' or if you use proprietary file formats such as Flash.

The use of metadata on key areas of your Web site may be a requirement of the funding body. Metadata (information about a Web resource, such as the author, keywords, brief description, etc.) may be used by search engines such as Google and also by local search engines.
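As a sketch of what such metadata can look like in a page, the following Python function generates author, description and keywords <meta> elements for the <head> of an HTML page. The function name and its xhtml parameter are hypothetical; any metadata scheme required by your funding body (such as a specific element set) would replace these generic names.

```python
# Illustrative sketch: the function name and xhtml parameter are hypothetical.
from html import escape


def metadata_tags(author: str, description: str, keywords: list[str],
                  xhtml: bool = False) -> str:
    """Build <meta> elements describing a page, for inclusion in its <head>."""
    close = " />" if xhtml else ">"  # XHTML requires empty elements to be closed
    fields = [
        ("author", author),
        ("description", description),
        ("keywords", ", ".join(keywords)),
    ]
    # escape(..., quote=True) makes values containing '&', '<' or '"' safe in attributes.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}"{close}'
        for name, value in fields
    )
```

Generating these elements from one place (for example a template or a content management system) makes it easier to keep the metadata consistent across key areas of the site.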

During the lifetime of the Web site there are likely to be several deliverables and items of news which you would like your users to be aware of. You should consider providing a news area on your Web site. You may wish to use an automated notification service such as Trackengine [3] so that users will receive an automated message when the news page is updated. You should also consider use of RSS [4] to allow your news items to be automatically syndicated to remote Web sites.
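The sketch below shows how news items could be published as a feed, assuming the RSS 2.0 format (other RSS versions use a different structure). It uses Python's standard xml.etree.ElementTree module; the function name and the dictionary layout of each news item are hypothetical.

```python
# Illustrative sketch: assumes RSS 2.0; function name and item layout are hypothetical.
import xml.etree.ElementTree as ET


def build_rss(channel_title: str, channel_link: str, channel_description: str,
              items: list[dict[str, str]]) -> str:
    """Return an RSS 2.0 document as a string.

    Each news item is a dict with 'title', 'link' and 'description' keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", channel_title), ("link", channel_link),
                      ("description", channel_description)):
        ET.SubElement(channel, tag).text = text
    for item in items:
        node = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            ET.SubElement(node, tag).text = item[tag]
    # ElementTree escapes special characters in the text automatically.
    return ET.tostring(rss, encoding="unicode")
```

Regenerating the feed whenever the news page is updated keeps the two in step, so remote sites syndicating the feed see the same items as visitors to the news page.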

For further information on Web site promotion see [5].

Content on Your Web Site

The content that your Web site will provide will influence the design of the site, particularly its navigational elements. In addition, the nature of the content will help in structuring it and in defining the underlying directory structure.

It may be useful to produce a diagram showing the key areas of content and how they will be grouped: try to think about how the Web site will be structured in a couple of years' time. The Web site should be designed so that it will not have to be reorganised as it grows, as reorganisation can be very time-consuming and can lead to many broken links.

File Formats

Standards and Proprietary Formats

Cultural heritage Web sites should make use of open standard file formats wherever possible in order to maximise access to resources. Open standards (such as HTML and XML) are developed by consortia (such as W3C) and are designed to be platform- and application-independent. There will often be freely available tools to create and view formats based on open standards.

However, in some cases there may be no relevant open standards, or the relevant standards may be so new that conformant tools are not widely available. In such cases the use of proprietary formats may be acceptable. Ideally you should document the reasons for selecting the proprietary formats and outline a strategy for migrating to open solutions if they become available. Proprietary formats (such as Macromedia Flash and Adobe PDF) are owned by commercial companies and may only work on a limited range of platforms. Their use may require licensed products, and even when their use is free, there can be no guarantee that this will continue.

You may be required to follow technical standards mandated by a funding body. For example, projects funded by the NOF-digitise programme were required to follow the NOF-digitise Technical Standards and Guidelines [6], which described the recommended file formats for the creation and storage of digital resources.

HTML

HTML will be used extensively on your Web site. In order to maximise the range of browsers which can access your Web site, you are advised to ensure that your Web site conforms to HTML standards (currently HTML 4.01 or XHTML 1.0 [7]) and that you avoid the use of proprietary extensions.

Stylesheets

HTML or XHTML should be used to describe the main structural elements on your Web site. Cascading Style Sheets (CSS) [8] should be used to define how those elements appear in a browser. Separating the structure of resources on the Web site from their appearance will enable the appearance to be changed more easily and will ensure that resources can be accessed by a variety of devices (digital TV, PDAs, etc.).
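When moving existing pages towards this separation, it can help to find presentational markup that belongs in a style sheet instead. The following minimal sketch uses Python's standard html.parser module to flag elements and attributes that HTML 4.01 deprecates in favour of style sheets. The lists are a hypothetical starting point rather than a complete inventory, and the class and function names are likewise hypothetical.

```python
# Illustrative sketch: the tag/attribute lists and names are hypothetical and incomplete.
from html.parser import HTMLParser

# Presentational elements and attributes deprecated in HTML 4.01 in favour of CSS.
PRESENTATIONAL_TAGS = {"font", "basefont", "center", "strike", "s", "u"}
PRESENTATIONAL_ATTRS = {"align", "bgcolor", "color", "face"}


class PresentationalMarkupFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.findings = []  # (line number, description) pairs

    def handle_starttag(self, tag, attrs):
        line, _offset = self.getpos()  # position of the tag being reported
        if tag in PRESENTATIONAL_TAGS:
            self.findings.append((line, f"<{tag}> element"))
        for name, _value in attrs:
            if name in PRESENTATIONAL_ATTRS:
                self.findings.append((line, f'{name}="..." on <{tag}>'))


def find_presentational_markup(html_text: str) -> list[tuple[int, str]]:
    finder = PresentationalMarkupFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings
```

For example, <p align="center"><font color="red">Hi</font></p> yields three findings: the align attribute, the <font> element and its color attribute, each of which can be replaced by a CSS rule.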

Browser Support

Although HTML/XHTML and CSS are the recommended formats for Web sites, unfortunately many older browsers fail to support CSS adequately. Until standards-compliant browsers are widely deployed you may wish to consider use of 'safe' CSS features which can be used with all browsers and which degrade gracefully (see [9]).

Validation

Validation of HTML/XHTML and CSS resources will help to detect errors. Your Web pages may not be displayed correctly or function correctly in all browsers if they contain errors. You should ensure that you systematically check Web pages using validation tools, which may be built into the authoring tool, may be independent applications or may be provided on the Web, such as W3C's validation services [10] and [11].
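Full validation needs a validator such as the W3C services, but a quick local check can catch many errors in XHTML pages, since XHTML must be well-formed XML. The sketch below uses Python's standard xml.etree.ElementTree parser and is only a well-formedness check, not validation against the XHTML DTD; it will also report named entities such as &nbsp;, which are defined only in the DTD, as errors. The function name is hypothetical.

```python
# Illustrative sketch: a well-formedness check only, not DTD validation.
import xml.etree.ElementTree as ET


def xhtml_well_formedness_errors(xhtml_text: str) -> list[str]:
    """Return [] if the text is well-formed XML, else a one-item error list."""
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return [f"line {line}, column {column}: {err}"]
    return []
```

A check like this is fast enough to run on every page before upload, leaving the slower, more thorough validation services for regular audits.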

Audio-Visual File Formats

Your Web site may contain a variety of images, such as navigational icons, photographs, flow-charts and organisational diagrams, etc., and may also contain sound and video resources. The following requirements may apply to the delivery of resources funded by public sector funding bodies.

Other File Formats

Browser plugin technologies allow a range of other file formats, such as Macromedia Flash and Adobe PDF, to be provided on the Web. It should be noted that such formats are proprietary and there is no guarantee that plugins will continue to be free. There may also be accessibility considerations for plugin technologies: the content may not be accessible to speaking browsers, digital TVs, etc. For these reasons plugins should be avoided if possible. However, their use may be permitted in certain circumstances (e.g. in the design of games) if a case for their use is made in your project's business plan, provided this case does not contradict any of the 'must' requirements in any technical guidelines document provided by your funder.

HTML Tools

A number of approaches to creating HTML documents can be taken. Experienced HTML authors may make use of text editors to create HTML markup manually. However this approach can be time-consuming and is prone to errors. Many HTML authors prefer to make use of dedicated HTML authoring tools such as FrontPage or Dreamweaver.

If you have large numbers of documents in proprietary formats, such as a word processing format, you may wish to make use of a conversion tool. Some dedicated HTML authoring tools allow formats such as MS Word to be imported and converted, although the quality of the conversion may be poor. Dedicated conversion tools may do a better job, or enable large numbers of documents to be converted in bulk.

Another way of providing access to documents which use a proprietary file format is to use 'on-the-fly' conversion software, typically running on the Web server, which converts files dynamically when they are requested.

Another alternative is to make use of a content management system. A content management system may be regarded as a database which provides management functionality for a Web site. Content management systems normally provide facilities such as reuse of resources, automated removal of expired resources, personalised interfaces, etc. They may provide a dedicated data entry system which does not require knowledge of HTML. Content management systems should also support new file formats which may supersede or extend HTML (e.g. WML, to provide access for users of mobile phones). Further information on content management systems is given elsewhere in the document [13].

Future Developments

As the underlying Web technologies and file formats are constantly being developed it is important to keep up-to-date with developments in order to be in a position to exploit new developments in a timely manner. Important developments to be aware of include XML, XHTML and XSLT.

XML, the Extensible Markup Language, will act as the basis for new file formats [14]. It enables richly structured resources to be described in an open and extensible format. XML is already widely used in many large-scale commercial Web sites and will grow in importance as new browsers become available which will provide native support for XML. In addition to XML itself, there are many related developments which will enhance the functionality of services based on XML.

XHTML [15] is an XML version of HTML. It is designed to provide the benefits of XML (i.e. structured, reusable documents) while allowing resources to be accessed by existing browsers.

XSLT [16] is a transformation language which allows XML resources (such as XHTML pages) to be transformed into other formats, such as WML pages for use by mobile phones.
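To show the idea of transforming one XML format into another, the following sketch converts a simple XML record into an XHTML fragment. Python's standard library has no XSLT processor, so this sketch swaps in Python's xml.etree.ElementTree and expresses the mapping in code; an XSLT stylesheet would express the same mapping declaratively. The record format and function name are hypothetical.

```python
# Illustrative sketch using ElementTree in place of XSLT; the record format is hypothetical.
import xml.etree.ElementTree as ET

RECORD = """\
<record>
  <title>Harbour at dusk</title>
  <creator>Unknown photographer</creator>
  <date>1905</date>
</record>"""


def record_to_xhtml(record_xml: str) -> str:
    """Transform a simple XML record into an XHTML fragment."""
    record = ET.fromstring(record_xml)
    div = ET.Element("div", {"class": "record"})
    ET.SubElement(div, "h2").text = record.findtext("title")
    dl = ET.SubElement(div, "dl")
    for field in ("creator", "date"):
        ET.SubElement(dl, "dt").text = field.capitalize()
        ET.SubElement(dl, "dd").text = record.findtext(field, default="")
    return ET.tostring(div, encoding="unicode")
```

record_to_xhtml(RECORD) produces a <div class="record"> containing an <h2> heading and a definition list of the creator and date. Because the source stays in XML, a second transformation could target another format, such as WML, from the same record.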

Design Issues

A well-designed Web site will be quick and easy to use and will reflect positively on the organisation. A poorly designed Web site is likely to be difficult to use and will give a poor impression of the organisation. When designing your Web site think about the following issues:

Who designs? Who will be responsible for designing the Web site? Will it be done in-house or by an external designer? What skills do they have? Are their design skills relevant to the Web?

The design brief. It is important to produce a thorough design brief and an agreed methodology for approving the proposed designs.

Technologies. What technologies will be used to implement the design? Will the use of technologies such as Shockwave or Flash be acceptable? Are the technologies backwards compatible?

Accessibility. Is the design accessible to people with disabilities or users of older browsers or specialist devices?

Navigation

Browsing

The navigational aids on a Web site should be part of the overall site design. Consistent navigational aids should be available throughout the Web site. They should enable users to quickly access the key areas of the Web site, such as the 'home page', a search facility or site map, and help information or frequently asked questions.

Searching

A search facility is essential for most Web sites. A wide range of search engines is available, many free of charge. If you cannot install a search facility for your project Web site (for example, if the institution hosting your Web site does not provide searching across specified areas), you can make use of an externally hosted search engine. For further information see [17].
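Most search engines are built around an inverted index, which maps each word to the pages that contain it. The following minimal Python sketch illustrates the idea with a simple AND search over page text; it is not a substitute for a production search engine, which adds ranking, stemming and incremental updates. The page data and function names are hypothetical.

```python
# Illustrative sketch of an inverted index; names and page data are hypothetical.
import re
from collections import defaultdict


def tokenise(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenise(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return URLs containing every word in the query (AND search), sorted."""
    words = tokenise(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)
```

Building the index once and answering queries from it is what makes searching fast: each query only looks up a handful of words rather than rereading every page.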

Error Pages

Your Web site's 404 error page (the message which is displayed when a user selects a Web page which does not exist) can play an important role in helping users navigate. A well-designed 404 page will provide access to a search facility or a site map. For further information see [18].
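One way a 404 page can help users, if it is generated by a script, is by suggesting existing pages whose addresses resemble the one requested. The sketch below uses Python's standard difflib.get_close_matches for this; the function name is hypothetical, and how the requested path and the list of known paths reach the script depends on your Web server.

```python
# Illustrative sketch: the function name is hypothetical; server integration is not shown.
import difflib


def suggest_pages(requested_path: str, known_paths: list[str], limit: int = 3) -> list[str]:
    """Suggest existing pages whose paths resemble the one that was not found."""
    # cutoff=0.6 is difflib's default similarity threshold (0 = anything, 1 = identical).
    return difflib.get_close_matches(requested_path, known_paths, n=limit, cutoff=0.6)
```

A request for /collections/photograph.html (a missing "s") would list /collections/photographs.html first, which the 404 page could offer alongside the search box and site map.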

Quality Assurance

In order for a Web site to continue to provide a quality service after it has been launched it will be necessary to maintain the service.

The content on the Web site will need maintaining to ensure that it remains up-to-date and relevant. The maintenance process can be assisted by including contact details or by clearly identifying the person or group responsible for the information content. User feedback mechanisms, such as email links or Web forms, can help to encourage users to report inaccuracies. The inclusion of a user feedback form on your Web site may be a requirement of your funding body in order to enable users to comment on the quality of your service.

Broken links on a Web site are always irritating. You should ensure that you carry out systematic link checking, covering both internal links to resources within your Web site and links to external resources.
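The core of an internal link checker can be sketched in a few lines of Python using the standard html.parser module. This sketch collects links from <a> and <img> elements and also from stylesheet <link>, <script>, <form> and <frame> elements, then reports same-server links whose paths are not in a supplied set of existing paths (for example, a listing of the files on the server). External links would need an HTTP request each and are not checked here. The class, function and constant names are hypothetical.

```python
# Illustrative sketch: names are hypothetical; only internal links are checked.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Element -> attribute holding a reference to another resource.
LINK_ATTRIBUTES = {
    "a": "href", "img": "src", "link": "href",
    "script": "src", "form": "action", "frame": "src",
}


class LinkCollector(HTMLParser):
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRIBUTES.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(urljoin(self.base_url, value))


def broken_internal_links(page_url: str, html_text: str, existing_paths: set[str]) -> list[str]:
    """Return links on the page, on the same server, whose paths do not exist."""
    collector = LinkCollector(page_url)
    collector.feed(html_text)
    collector.close()
    host = urlparse(page_url).netloc
    return [link for link in collector.links
            if urlparse(link).netloc == host and urlparse(link).path not in existing_paths]
```

Running this over every page on a regular schedule, rather than only after major changes, is what makes the checking systematic.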

Although there are many link checking tools available, you should bear in mind that broken links can be caused not only by the <a> and <img> elements used to link to resources and images, but also by other technologies such as style sheets, forms, etc. You can check for other broken links by analysing your Web server's log files: the error log (which may be a separate file) will give more complete information on errors.
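As a sketch of log analysis, the following Python function counts requests that received a 404 (Not Found) response, assuming the server writes its access log in the Common Log Format. Servers configured for the Combined Log Format also record the referring page, which shows where each broken link lives; this sketch ignores those extra fields. The names are hypothetical.

```python
# Illustrative sketch: assumes Common Log Format; names are hypothetical.
import re
from collections import Counter

# host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def not_found_counts(log_lines: list[str]) -> Counter:
    """Count, per requested path, how many requests received a 404 response."""
    counts = Counter()
    for line in log_lines:
        match = CLF.match(line)
        if match and match["status"] == "404":
            counts[match["path"]] += 1
    return counts
```

Sorting the result with counts.most_common() puts the most frequently requested missing resources first, which is usually the best order in which to fix them.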

You should ensure that you have procedures for monitoring the availability of your Web service. If you do not have procedures available locally you may wish to make use of remote services such as WatchMyServer [19] and InternetSeer [20].
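If you monitor availability locally, the basic building block is a periodic request to the site, ideally made from a machine other than the Web server. The following sketch uses Python's standard urllib.request to make one such check and summarises a series of results as a percentage. The function names are hypothetical, and scheduling (for example, running the check every few minutes) and alerting are left out.

```python
# Illustrative sketch: function names are hypothetical; scheduling and alerts not shown.
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 10.0) -> tuple[bool, Optional[int]]:
    """Request url once; return (available, HTTP status or None)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 500).
        return False, err.code
    except (urllib.error.URLError, OSError):
        # No usable answer: DNS failure, refused connection, timeout, etc.
        return False, None


def availability(results: list[bool]) -> float:
    """Percentage of probes that succeeded."""
    if not results:
        raise ValueError("no probe results")
    return 100.0 * sum(results) / len(results)
```

Keeping the individual results, not just the percentage, lets you see when outages occurred as well as how much of the time the service was available.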

Performance Indicators

You may be expected to provide performance indicators for your Web site to your funders. You may wish to record performance indicators for your own use and to give information to your management group, to help with future planning for growth of the service.

Web statistics can provide a useful performance indicator, although they should be treated with caution. For example, a substantial growth in the number of hits on your Web site may simply reflect a redesign that uses more images per page, or indicate that your Web site is being accessed regularly by robot software rather than by users. Information on the number of page impressions or user sessions is probably more useful than the number of hits, but this too can be misleading. For example, growth in the number of page impressions and user sessions may be the result of large numbers of users finding your Web site through a search engine, reading one page, deciding it is not relevant and leaving.
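The difference between hits and page impressions can be seen in a small worked example. The sketch below counts every request as a hit but counts a page impression only for an HTML page requested by something that does not look like a robot. The rules for recognising pages and robots are hypothetical heuristics for illustration; log analysis packages use fuller lists.

```python
# Illustrative sketch: the page and robot heuristics below are hypothetical.
PAGE_SUFFIXES = (".html", ".htm", "/")
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def summarise(requests: list[tuple[str, str]]) -> dict[str, int]:
    """requests: (path, user_agent) pairs taken from a server log."""
    hits = 0
    page_impressions = 0
    for path, user_agent in requests:
        hits += 1  # every request is a hit, including images and style sheets
        is_page = path.endswith(PAGE_SUFFIXES)
        is_robot = any(marker in user_agent.lower() for marker in ROBOT_MARKERS)
        if is_page and not is_robot:
            page_impressions += 1
    return {"hits": hits, "page_impressions": page_impressions}
```

In a log with one visitor viewing two pages (loading an image and a style sheet along the way) and one robot fetching a page, this reports five hits but only two page impressions. Whatever rules you choose, applying the same ones throughout the project is what keeps the figures comparable.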

Despite these reservations, Web usage statistics should be collected and summaries produced. There are parallels with the published statistics on TV viewing. Audience viewing figures for TV programmes are also difficult to define rigorously: for example, the TV may be on with nobody watching, or with many people watching; the figures may be skewed by programmes being videoed for later viewing, etc. Despite these reservations, TV viewing statistics are collected and published and form the basis for decision making.

The important point is that any figures must be regarded with some scepticism, and care must be taken before using usage figures to make decisions or comparisons with other services. It is also important to take a consistent approach to collecting and processing the data throughout the lifetime of your project Web site. For example, should you record only accesses from outside the organisation? Should you remove access data generated by robots? As well as processing the data consistently within your organisation, it is also desirable that similar approaches are taken by your peers, such as other projects funded by the same funding body.

You must also take care if you aggregate summaries of the usage statistics. For example, if monthly Web log analysis reports show that you get 1,000 unique visitors per month, you cannot say that over a year you have received 12,000 unique visitors. You may, in fact, have received only 1,000 unique visitors over the year, if the same visitors returned every month. On the other hand, you may indeed have received 12,000 unique visitors (although this is unlikely).
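The aggregation problem can be shown directly with sets of visitor identifiers. In this minimal sketch, adding up the monthly unique counts counts a returning visitor once per month, whereas the union of the monthly sets counts each visitor once. The function name and the notion of a visitor identifier (for example, a host name) are assumptions for illustration.

```python
# Illustrative sketch: the function name and visitor identifiers are hypothetical.
def unique_visitor_figures(monthly_visitors: dict[str, set[str]]) -> tuple[int, int]:
    """Return (sum of monthly unique counts, true unique count over the period).

    monthly_visitors maps a month label to the set of visitor identifiers seen.
    """
    summed = sum(len(visitors) for visitors in monthly_visitors.values())
    overall = len(set().union(*monthly_visitors.values()))
    return summed, overall
```

With January visitors {a, b, c}, February {a, b, d} and March {a, c}, the monthly figures add up to 8 but only 4 different visitors were seen. The annual figure therefore has to be computed from the underlying data, not from the monthly summaries.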

Other performance indicators are available, such as the number of links to your Web site, the coverage of your Web site by search engines, Web server uptime, user feedback, etc. For further information see [21].

Auditing Your Web Site

You may find it useful to carry out periodic auditing of your Web site. This can help in spotting errors, evaluating the accessibility of the Web site, evaluating the success of your dissemination, etc. For further information on approaches to monitoring and auditing Web sites see the survey of eLib project Web sites [22].

References

  1. Robot Exclusion Protocol
    http://info.webcrawler.com/mak/projects/robots/exclusion.html
  2. Submitting to Search Engines Using "Scrub The Web", Exploit Interactive, issue 6
    http://www.exploit-lib.org/issue6/software-used/
  3. Trackengine
    http://www.trackengine.com/
  4. Rich Site Summary Resources, UKOLN
    http://www.ukoln.ac.uk/metadata/resources/rss/
  5. Promoting Your Project Web Site, Exploit Interactive, issue 4
    http://www.exploit-lib.org/issue4/promotion/
  6. NOF-digitise Technical Standards and Guidelines
    http://www.peoplesnetwork.gov.uk/content/technical.asp
  7. Hypertext Markup Language, W3C
    http://www.w3.org/MarkUp/
  8. CSS, W3C
    http://www.w3.org/Style/CSS/
  9. CSS Support Table, RichInStyle
    http://richinstyle.com/bugs/table.html
  10. W3C HTML Validation Service, W3C
    http://validator.w3.org/
  11. W3C CSS Validation Service, W3C
    http://jigsaw.w3.org/css-validator/
  12. PNG, W3C
    http://www.w3c.org/Graphics/PNG/
  13. Good Practice Guide for Developers of Cultural Heritage Web Services: Content Management Systems
    http://www.ukoln.ac.uk/nof/support/gpg/ContentManagementSystems/
  14. The XML FAQ
    http://www.ucc.ie/xml/
  15. XHTML, W3C
    http://www.w3.org/TR/xhtml1/
  16. Extensible Stylesheet Language, W3C
    http://www.w3.org/Style/XSL/
  17. UK University Search Engines, Ariadne, issue 21
    http://www.ariadne.ac.uk/issue21/webwatch/
  18. 404s: What's Missing?, Ariadne, issue 20
    http://www.ariadne.ac.uk/issue20/404/
  19. WatchMyServer
    http://www.watchmyserver.com/
  20. InternetSeer
    http://www.internetseer.com/
  21. Performance Indicators For Web Sites, Exploit Interactive, issue 5
    http://www.exploit-lib.org/issue5/indicators/
  22. WebWatching eLib Project Web Sites, Ariadne, issue 26
    http://www.ariadne.ac.uk/issue26/web-watch/

You may also find the Advice to Content Providers document useful (note it is in PDF format).

Comments On This Document

This section will be used to provide notes on the section, including details of any changes.

2 Dec 2004
Document made available to MLA staff for comments.
January 2005
Document added.