This page is for printing out all of the briefing papers. The briefing papers are given in numerical order. Note that some of the internal links may not work.

Compliance with HTML Standards

Why Bother?

Compliance with HTML standards is needed for a number of reasons:

HTML compliant resources are more likely to be accessible to a wide range of Web browsers including desktop browsers such as Internet Explorer, FireFox, Opera, Lynx and specialist browsers on a PDAs, digital TVs, kiosks, etc.
HTML compliant resources are more easily processed and repurposed by other applications.
HTML compliant resources will be rendered more quickly by modern browsers.
HTML compliance is required for AAA compliance with the W3C WAI accessibility guidelines.

Which Standards?

The World Wide Web Consortium, W3C, recommend use of the XHTML 1.0 (or higher) standard. This has the advantage of being an XML application (allowing use of XML tools) and can be rendered by most browsers. However authoring tools in use may not yet produce XHTML. Therefore HTML 4.0 may be used.

Cascading style sheets (CSS) should be used in conjunction with XHTML/HTML to describe the appearance of Web resources.

Approaches To Creating Resources

Web resources may be created in a number of ways. Often HTML authoring tools such as DreamWeaver, FrontPage, etc. are used, although experienced HTML authors may prefer to use a simple editing tool. Another approach is to make use of a Content Management System. An alternative approach is to convert proprietary file formats (e.g. MS Word or PowerPoint). In addition sometimes proprietary formats are not converted but are stored in their native format.

Monitoring Compliance

A number of approaches may be taken to monitoring compliance with HTML standards. For example you can make use of validation features provided by modern HTML authoring tools, use desktop compliance tools or Web-based compliance tools.

The different types of tools can be used in different ways. Tools which are integrated with a HTML authoring tool should be used by the page author. It is important that the author is trained to use such tools on a regular basis. It should be noted that it may be difficult to address systematic errors (e.g. all files missing the DOCTYPE declaration) with this approach.

A popular approach is to make use of SSIs (server-side includes) to retrieve common features (such as headers, footers, navigation bars, etc.). This can be useful for storing HTML elements (such as the DOCTYPE declaration) in a manageable form. However this may cause validation problems if the SSI is not processed.

Another approach is to make use of a Content Management System (CMS) or similar server-side technique, such as retrieving resources from a database. In this case it is essential that the template used by the CMS complies with standards.

It may be felt necessary to separate the compliance process from the page authoring. In such cases use of a dedicated HTML checker may be needed. Such tools are often used in batch, to validate multiple files. In many cases voluminous warnings and error messages may be provided. This information may provide indications of systematic errors which should be addressed in workflow processes.

An alternative approach is to use Web-based checking services. An advantage with this approach is that the service may be used in a number of ways: the service may be used directly by entering the URL of a resource to be validated or live access to the checking service may be provided by including a link from a validation icon as used at <http://www.ukoln.ac.uk/qa-focus/> as shown in Figure 1 (this approach could be combined with use of cookies or other techniques so that the icon is only displayed to an administrator).

Figure 1: Using icons as link to validation service

Another approach is to configure your Web server so that users can access the validation service by appending an option to the URL. For further information on this technique see the QA Focus briefing document A URI Interface To Web Testing Tools> at <http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-59/>. This technique can be deployed with a simple option on your Web server's configuration file.

Briefing 02

Use of Automated Tools For Testing Web Site Accessibility

Accessibility And Web Sites

It is desirable to maximise the accessibility of Web sites in order to ensure that Web resources can be accessed by people who may suffer from a range of disabilities and who may need to use specialist browsers (such as speaking browsers) or configure their browser to enhance the usability of Web sites (e.g. change font sizes, colours, etc.).

Web sites which are designed to maximise accessibility should also be more usable generally, (e.g. for use with PDAs) and are likely to be more easily processed by automated tools.

Accessibility Testing Tools

Although the development of accessible Web sites will be helped by use of appropriate templates and can be managed by Content Management Systems (CMSs), there will still be a need to test the accessibility of Web sites.

Full testing of accessibility with require manual testing, ideally making use of users who have disabilities. The testing should also address the usability of Web sites as well as its accessibility.

Manual testing can however be complemented with use of automated accessibility checking tools. This document covers the use of automated accessibility checking tools.

Accessibility Guidelines

The W3C WAI (Web Accessibility Initiative) have developed guidelines on the accessibility of Web resources. Many institutions are making use of the WAI guidelines and will seek to achieve compliance with the guidelines to A, AA or AAA standards. Many testing tools will measure the compliance of resources with these guidelines.

Examples Of Automated Accessibility Checking Tools

The best-known accessibility checking tool was Bobby which was renamed as WebXact, a Web-based tool for reporting on the accessibility of a Web page and its compliance with W3C's WAI guidelines. However this tool is no longer available.

HiSoft's Cynthia Says provides an alternative accessibility and Web site checking facility - see <http://www.contentquality.com/>.

Note that it can be useful to make use of a multiple checking tools. W3C WAI provides a list of accessibility testing tools at <http://www.w3.org/WAI/ER/tools/>.

Typical Errors Flagged By Automated Tools

When you use testing tools warnings and errors will be provided about the accessibility of your Web site. A summary of the most common messages is given below.

No DOCTYPE: HTML resources must contain a DOCTYPE at the top of the HTML file, which defines the version of HTML used. To provide compliance with HTML standards this must be provided. Ideally this will be provided in the HTML template used by authors.
No Character Encoding: HTML resources should describe the character encoding of the document. This information should be provided. Ideally this will be provided in the HTML template used by authors.
No ALT Tags: ALT tags are used to provide a textual description of images. In order to comply with HTML standards ALT tags must be provided for all images.
Use Relative Sizes And Positioning Rather Than Absolute: Many HTML features can accept relative or absolute size units. In order to ensure that resources can be sized properly on non-standard devices relative values should be used.
Link Phrase Used More Than Once: If multiple links on a page have the same link text the links should point to the same resource.

Caveats

As mentioned previously, automated testing tools cannot confirm that a resource is accessible by itself - manual testing will be required to complement an automated approach. However automated tools can be used to provide an overall picture, to identify areas in which manual testing many be required and to identify problems in templates or in the workflow process for producing HTML resources and areas in which training and education may be needed.

Note that automated tools may sometimes give inaccurate or misleading results. In particular:

Use Of Frames: HTML resources which use frames may be incorrectly analysed by automated tools. You should ensure that the frameset page itself and all individual framed pages are accessible.
Use Of Redirects: HTML resources which use redirects may be incorrectly analysed by automated tools. You should ensure that the original page itself and the destination are accessible. Remember that redirects can be implemented in a number of ways, including server configuration options and use of JavaScript, <meta> tags, etc. on a Web page.
Use Of JavaScript: HTML resources which use JavaScript may be incorrectly analysed by automated tools. You should ensure that the source page itself and the output from the JavaScript are accessible.

Briefing 03

Use Of Proprietary Formats On Web Sites

Use Of Proprietary Formats

Although it is desirable to make use of open standards such as HTML when providing access to resources on Web sites there may be occasions when it is felt necessary to use proprietary formats. For example:

It is necessary to provide control over the page appearance which can only be achieved easily using formats such as MS Word, Adobe PDF, etc.
Open standards are not yet available or tools are not yet widely deployed.
Web access is used as a mechanism for delivering files.
Conversion to an open format would be resource-intensive.
The resource was created by a third-party and you do not have access to tools to convert the resource.

URL Naming Conventions For Access To Proprietary Formats

If it is necessary to provide access to a proprietary file format you should not cite the URL of the proprietary file format directly. Instead you should give the URL of a native Web resource, typically a HTML page. The HTML page can provide additional information about the proprietary format, such as the format type, version details, file size, etc. If the resource is made available in an open format at a later date, the HTML page can be updated to provide access to the open format - this would not be possible if the URL of the proprietary file was used.

An example of this approach is illustrated. In this case access to MS PowerPoint slides are available from a HTML page. The link to the file contains information on the PowerPoint version details.

Converting Proprietary Formats

Various tools may be available to convert resources from a proprietary format to HTML. Many authoring tools nowadays will enable resources to be exported to HTML format. However the HTML may not comply with HTML standards or use CSS and it may not be possible to control the look-and-feel of the generated resource.

Another approach is to use a specialist conversion tool which may provide greater control over the appearance of the output, ensure compliance with HTML standards, make use of CSS, etc.

If you use a tool to convert a resource to HTML it is advisable to store the generated resource in its own directory in order to be able to manage the master resource and its surrogate separately.

You should also note that some conversion tools can be used dynamically, allowing a proprietary format to be converted to HTML on-the-fly.

MS Word

MS Word files can be saved as HTML from within MS Word itself. However the HTML that is created is of poor quality, often including proprietary or deprecated HTML elements and using CSS in a form which is difficult to reuse.

MS PowerPoint

MS PowerPoint files can be saved as HTML from within MS PowerPoint itself. However the Save As option provides little control over the output. The recommended approach is to use the Save As Web Page option and then to chose the Publish button. You should then ensure that the HTML can be read by all browsers (and not just IE 4.0 or later). You should also ensure that the file has a meaningful title and the output is stored in its own directory.

Dynamic Conversion

In some circumstances it may be possible to provide a link to an online conversion service. Use of Adobe's online conversion service for converting files from PDF is illustrated.

It should be noted that this approach may result in a loss of quality from the original resource and is dependent on the availability of the remote service. However in certain circumstances it may be useful.

Briefing 04

Mothballing Your Web Site

About This Document

When the funding for a project finishes it is normally expected that the project's Web site will continue to be available in order to ensure that information about the project, the project's findings, reports, deliverables, etc. are still available.

This document provides advice on "mothballing" a project Web site.

Web Site Content

The entry point for the project Web site should make it clear that the project has finished and that there is no guarantee that the Web site will be maintained.

You should seek to ensure that dates on the Web site include the year - avoid content which says, for example, "The next project meeting will be held on 22 May".

You may also find it useful to make use of cascading style sheets (CSS) which could be used to, say, provide a watermark on all resources which indicate that the Web site is no longer being maintained.

Technologies

Although software is not subject to deterioration due to aging, overuse, etc. software products can cease to work over time. Operating systems upgrades, upgrades to software libraries, conflicts with newly installed software, etc. can all result in software products used on a project Web site to cease working.

It is advisable to adopt a defensive approach to software used on a Web site.

There are a number of areas to be aware of:

If you are using unusual configuration features for the Web server software the Web site may stop working if the server software is upgraded (e.g. a new version of the server software is installed) or replaced (you move from Microsoft's IIS software to Apache).
If you are using special features of the Web site's search engine software aspects of the Web site may cease to work if the search engine software is upgraded or replaced.
If you are using online forms on your Web site these may cease to work if the backend scripts are updated.
If you are using a Content Management System or server-side scripting technologies (e.g. PHP, ASP, etc.) on your Web site these may cease to work if the backend technologies are updated.
If you provide automated feedback, annotation tools, etc. which allow users to give comments on resources on your Web site there is a danger that the tools may be used to submit spam or obscene messages. With popular feedback tools there may be automated devices which will submit inappropriate messages automatically.

Process For Mothballing

We have outlined a number of areas in which a project Web site may degrade in quality once the project Web site has been "mothballed".

In order to minimise the likelihood of this happening and to ensure that problems can be addressed with the minimum of effort it can be useful to adopt a systematic set of procedures when mothballing a Web site.

It can be helpful to run a link checker across your Web site. You should seek to ensure that all internal links (links to resources on your own Web site) work correctly. Ideally links to external resources will also work, but it is recognised that this may be difficult to achieve. It may be useful to provide a link to a report of the link check on your Web site.

It would be helpful to provide documentation on the technical architecture of your Web site, which describes the server software used (including use of any unusual features), use of server-side scripting technologies, content management systems, etc.

It may also be useful to provide a mirror of your Web site by using a mirroring package or off-line browser. This will ensure that there is a static version of your Web site available which is not dependent on server-side technologies.

Contacts

You should give some thought to contact details provided on the Web site. You will probably wish to include details of the project staff, partners, etc. However you may wish to give an indication if staff have left the organisation.

Ideally you will provide contact details which are not tied down to a particular person. This may be needed if, for example, your project Web site has been hacked and the CERT security team need to make contact.

Planning For Mothballing

Ideally you will ensure that your plans for mothballing your Web site are developed when you are preparing to launch your Web site!

Briefing 05

Accessing Your Web Site On A PDA

About This Document

With the growing popularity in use of mobile devices and pervasive networking on the horizon we can expect to see greater use of PDAs (Personal Digital Assistants) for accessing Web resources.

This document describes a method for accessing a Web site on a PDA. In addition this document highlights issues which may make access on a PDA more difficult.

AvantGo

About

AvantGo is a well-known Web based service which provides access to Web resources on a PDA such as a Palm or Pocket PC.

The AvantGo service is freely available from <http://www.avantgo.com/>.

Once you have registered on the service you can provide access to a number of dedicated AvantGo channels. In addition you can use an AvantGo wizard to provide access to any publicly available Web resources on your PDA.

An example of two Web sites showing the interface on a Palm is illustrated.

Benefits

If you have a PDA you may find it useful to use it to provide access to your Web site, as this will enable you to access resources when you are away from your desktop PC. This may also be useful for your project partners. In addition you may wish to encourage users of your Web site to access it in this way.

Other Benefits

AvantGo uses robot software to access your Web site and process it in a format suitable for viewing on a PDA, which typically has more limited functionality, memory, and viewing area than a desktop PC. The robot software may not process a number of features which may be regarded as standard on desktop browsers, such as frames, JavaScript, cookies, plugins, etc.

The ability to access a simplified version of your Web site can provide a useful mechanism for evaluating the ease with which your Web site can be repurposed and for testing the user interface under non-standard environments.

You should be aware of the following potential problem areas:

Entry Point Not Contained In Project Directory: If the project entry point is not contained in the project's directory, it is likely that the AvantGo robot will attempt to download an entire Web site and not just the project area.
Frames: If your Web site contains frames and you do not use the appropriate option to ensure that the full content can be accessed by user agents which do not support frames (such as the AvantGo robot software) resources on your Web site will not be accessible.
Plugin Technologies: If your Web site contains technologies which require plugins (such as Flash, Java, etc.) you will not be able to access the resources.

Summary

As well as providing enhanced access to your Web site use of tools such as AvantGo can assist in testing access to your Web site. If your Web site makes use of open standards and follows best practices it is more likely that it will be usable on a PDA and by other specialist devices.

You should note, however, that use of open standards and best practices will not guarantee that a Web site will be accessible on a PDA.

Briefing 06

404 Error Pages On Web Sites

Importance Of 404 Error Pages

A Web sites 404 error page can be one of the most widely accessed pages on a Web site. The 404 error page can also act as an important navigational tool, helping users to quickly find the resource they were looking for. It is therefore important that 404 error pages provide adequate navigational facilities. In addition, since the page is likely to be accessed by many users, it is desirable that the page has an attractive design which reflects the Web sites look-and-feel.

Types Of 404 Error Pages

Web servers will be configured with a default 404 error page. This default is typically very basic.

In the example shown the 404 page provides no branding, help information, navigational bars, etc.

Figure 1: A Basic 404 Error Message

An example of a richer 404 error page is illustrated. In this example the 404 page is branded with the Web site's colour scheme, contains the Web site's standard navigational facility and provide help information.

Figure 2: A Richer 404 Error Message

Functionality Of 404 Error Pages

It is possible to define a number of types of 404 error pages:

Server Default: The server default 404 message is very basic. It will not carry any branding or navigational features which are relevant to the Web site.
Simple Branding, Navigational Features Or Help Information: The simplest approach to configuring a 404 page is to add some simple branding (such as the name of the Web site) or basic navigation features (link to the home page) or help information (an email address).
Richer Branding, Navigational Features, Help Information Or Additional Features: Some 404 pages will make use of the Web sites visual identity (such as a logo) and will contain a navigational bar which provides access to several areas of the Web site. In addition more complete help information may be provided as well as additional features such as a search facility.
Full Branding, Navigational Features, Help Information And Additional Features: A comprehensive 404 page will ensure that all aspects of branding, navigational features, help information and additional features such as a search facility are provided.
As Above Plus Enhanced Functionality: It is possible to provide enhanced functionality for 404 pages such as context sensitive help information or navigational facilities, feedback mechanisms to the page author, etc.

Further Information

An article on 404 error pages, based on a survey of 404 pages in UK Universities is available at <http://www.ariadne.ac.uk/issue20/404/>. An update is available at <http://www.ariadne.ac.uk/issue32/web-watch/>.

Briefing 07

Approaches To Link Checking

Why Bother?

There are several reasons why it is important to ensure that links on Web sites work correctly:

Web sites are based on hyperlinking, and if hyperlinks fail to work, the Web site can be regarded as not working correctly.
Broken links reflect badly on the body hosting the Web sites.
Hyperlinks are increasingly being used to deliver the functionality of Web sites, through links to JavaScript resources, style sheets files, metadata, etc. Broken links to these resources will result in the Web site not functioning as desired.

However there are resource implications in maintaining link integrity.

Approaches To Link Checking

A number of approaches can be taken to checking broken links.

Web site maintainer may run a link checking tool.
A server-based link checking tool may send email notification of broken links.
A remote link checking service may send email notification of broken links.
Web server error log files may be analysed for requests for non-existent resources.
Web server 404 error pages may provide a mechanism for users notifying the Web site maintainer of broken links.

Note that these approaches are not exclusive: Web site maintainers may choose to make use of several approaches.

Policy Issues

There is a need to implement a policy on link checking. The policy could be that links will not be checked or fixed - this policy might be implemented for a project Web site once the funding has finished. For a small-scale project Web site the policy may be to check links when resources are added or updated or if broken links are brought to the project's attention, but not to check existing resources - this is likely to be an implicit policy for some projects.

For a Web site one which has a high visibility or gives a high priority to the effectiveness of the Web site, a pro-active link checking policy will be needed. Such a policy is likely to document the frequency of link checking, and the procedures for fixing broken links. As an example of approaches taken to link checking by a JISC service, see the article about the SOSIG subject gateway [1].

Tools

Experienced Web developers will be familiar with desktop link-checking tools, and many lists of such tools are available [2] [3]. However desktop tools normally need to be used manually. An alternative approach is to use server-based link-checking software which send email notification of broken links.

Externally-hosted link-checking tools may also be used. Tools such as LinkValet [4] can be used interactively or in batch. Such tools may provide limited checking for free, with a licence fee for more comprehensive checking.

A popular approach is to make use of SSIs (server-side includes) to retrieve common features (such as headers, footers, navigation bars, etc.). This can be useful for storing HTML elements (such as the DOCTYPE declaration) in a manageable form. However this may cause validation problems if the SSI is not processed.

Another approach is to use a browser interface to tools, possibly using a Bookmarklet [5] although UKOLN's server-based ,tools approach [6] is more manageable.

Other Issues

It is important to ensure that link checkers check for links other than <a href=""...> and <img src="...">. There is a need to check external JavaScript, CSS, etc. files (linked to by the <link> tag) and that checks are carried out on personalised interfaces to resources.

It should also be noted that erroneous link error reports may sometimes be produced (e.g. due to misconfigured Web servers).

References

Approaches To 'Spring Cleaning' At SOSIG, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-25/>
Open Directory,
<http://dmoz.org/Computers/Software/Internet/Site_Management/Link_Management/>
Google Directory,
<http://directory.google.com/Top/Computers/Software/Internet/Site_Management/
Link_Management/>
LinkValet,
<http://www.htmlhelp.com/tools/valet/>
Bookmarklets,
<http://www.bookmarklets.com/>
A URI Interface To Web Testing Tools, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-59/>

Briefing 08

Search Facilities For Your Web Site

Background

Web sites which contain more than a handful of pages should provide a search facility. This is important for several reasons:

End users will not be aware of the size of your Web site, and so will wish to use a search tool to explore it.
End users will not necessary understand the browse structure of your Web site.
Search engines are one of the most widely used tools by end users.

Approaches To Providing Search Facilities

The two main approaches to the provision of search engines on a Web site are to host a search engine locally or to make use of an externally-hosted search engine.

Local Search Engine

The traditional approach is to install search engine software locally. The software may be open source (such as ht://Dig [1]) or licensed software (such as Inktomi [2]). It should be noted that the search engine software does not have to be installed on the same system as the Web server. This means that you are not constrained to using the same operating system environment for your search engine as your Web server.

Because the search engine software can hosted separately from the main Web server it may be possible to make use of an existing search engine service within the organisation which can be extended to index a new Web site.

Externally-Hosted Search Engines

An alternative approach is to allow a third party to index your Web site. There are a number of companies which provide such services. Some of these services are free: they may be funded by advertising revenue. Such services include Google [3], Atomz [4] and FreeFind [5].

Pros And Cons

Using a locally-installed search engine gives you control over the software. You can control the resources to be indexed and those to be excluded, the indexing frequency, the user interface, etc. However such control may have a price: you may need to have technical expertise in order to install, configure and maintain the software.

Using an externally-hosted search engine can remove the need for technical expertise: installing an externally-hosted search engine typically requires simply completing a Web form and then adding some HTML code to your Web site. However this ease-of-use has its disadvantages: typically you will lose the control over the resources to be indexed, the indexing frequency, the user interfaces, etc. In addition there is the dependency on a third party, and the dangers of a loss of service if the organisation changes its usage conditions, goes out of business, etc.

Trends

Surveys of search facilities used on UK University Web sites have been carried out since 1998 [6]. This provides information not only on the search engines tools used, but also to spot trends.

Since the surveys began the most widely used tool has been ht://Dig - an open source product. In recent years the licensed product Inktomi has shown a growth in usage. Interestingly, use of home-grown software and specialist products has decreased - search engine software appears now to be a commodity product.

Another interesting trend appears to be in the provision of two search facilities; a locally-hosted search engine and a remote one - e.g. see the University of Lancaster [7].

References

ht://Dig,
<http://www.htdig.org/>
Inktomi,
<http://www.inktomi.com/>
Google,
<http://www.google.com/>
Atomz,
<http://www.atomz.com/>
FreeFind,
<http://www.freefind.com/>
Surveys of Search Engines on UK University Web Sites,
<http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/>
University of Lancaster Search Page,
<http://www.lancs.ac.uk/search.htm>

Briefing 09

Image QA in the Digitisation Workflow

Introduction

Producing an archive of high-quality images with a server full of associated delivery images is not an easy task. The workflow consists of many interwoven stages, each building on the foundations laid before. If, at any stage, image quality is compromised within the workflow, it has been totally lost and can never be redeemed.

It is therefore important that image quality is given paramount consideration at all stages of a project from initial project planning through to exit strategy.

Once the workflow is underway, quality can only be lost and the workflow must be designed to capture the required quality right from the start and then safeguard it.

Image QA

Image QA within a digitisation project's workflow can be considered a 4-stage process.

1 Strategic QA

Strategic QA is undertaken in the initial planning stages of the project when the best methodology to create and support your images, now and into the future will be established. This will include:

Local Search Engine

Choosing the correct file types and establishing required sizes
Sourcing and benchmarking all equipment
Establishing capture guidelines
Selecting technical metadata

2 Process QA

Process QA is establishing quality control methods within the image production workflow that support the highest quality of capture and image processing, including:

Establishing best 'image capture' and 'image processing' methodology and then standardising and documenting this best practice
Regularly calibrating & servicing all image capture and processing equipment
Training operators and encouraging a pride in quality of work
Accurate capture of metadata

3 Sign-off QA

Sign-off QA is implementing an audited system to assure that all images and their associated metadata are created to the established quality standard. A QA audit history is made to record all actions undertaken on the image files.

Every image must be visually checked and signed off with name and time recorded within audit history
All metadata must be reviewed by operator and signed off with name and time
Equipment must be calibrated and checked regularly
All workflow procedures reviewed and updated as necessary

4 On-going QA

On-going QA is implementing a system to safeguard the value and reliability of the images into the future. However good the initial QA, it will be necessary to have a system that can report, check and fix any faults found within the images and associated metadata after the project has finished. This system should include:

Fault report system that allows faults to be checked and then if possible fixed
Provision for ongoing digital preservation (including migration of image data)
Ownership and responsibility for images, metadata and IMS
A reliable system for the on-going creation of surrogate images as required

QA in the Digitisation Workflow

Much of the final quality of a delivered image will be decided, long before, in the initial 'Strategic' and 'Process' QA stages where the digitisation methodology is planned and equipment sourced. However, once the process and infrastructure are in place it will be the operator who needs to manually evaluate each image within the 'Sign-off' QA stage. This evaluation will have a largely subjective nature and can only be as good as the operator doing it. The project team is the first and last line of defence against any drop in quality. All operators must be encouraged to take pride in their work and be aware of their responsibility for its quality.

It is however impossible for any operator to work at 100% accuracy for 100% of the time and faults are always present within a productive workflow. What is more important is that the system is able to accurately find the faults before it moves away from the operator. This will enable the operator to work at full speed without having to worry that they have made a mistake that might not be noticed.

The image digitisation workflow diagram in this document shows one possible answer to this problem.

Acknowledgements:

This document was written by TASI, the Technical Advisory Service For Images.

Briefing 10

Enhancing Web Site Navigation Using The LINK Element

Introduction

This document provides advice on how the HTML <link> element can be used to improve the navigation of Web sites.

The LINK Element

About

The purpose of the HTML <link> element is to specify relationships with other documents. Although not widely used the <link> element provides a mechanism for improving the navigation of Web sites.

The <link> element should be included in the <head> of HTML documents. The syntax of the element is: <link rel=”relation” href=”url”>. The key relationships which can improve navigation are listed below.

**Table 1: Key Link Relations**
Relation	Function
next	Refers to the next document in a linear sequence of documents.
prev	Refers to the previous document in a linear sequence of documents.
home	Refers to the home page or the top of some hierarchy.
first	Refers to the first document in a collection of documents.
contents	Refers to a document serving as a table of contents.
help	Refers to a document offering help.
glossary	Refers to 1 document providing a glossary of terms that pertain to the current document.

Benefits

Use of the <link> element enables navigation to be provided in a consistent manner as part of the browser navigation area rather than being located in an arbitrary location in the Web page. This has accessibility benefits. In addition browsers can potential enhance the performance by pre-fetching the next page in a sequence.

Browser Support

A reason why <link> is not widely used has been the lack of browser support. This has changed recently and support is now provided in the latest versions of the Opera and Netscape/Mozilla browsers and by specialist browsers (e.g. iCab and Lynx).

Since the <link> element degrades gracefully (it does not cause problems for old browser) use of the <link> element will cause no problems for users of old browsers.

An illustration of how the <link> element is implemented in Opera is shown below.

Browser Support For The <link>Element
Figure 1: Browser Support For The <link> Element

In Figure 1 a menu of navigational aids is available. The highlighted options (Home, Contents, Previous and Next) are based on the relationships which have been defined in the document. Users can use these navigational options to access the appropriate pages, even though there may be no corresponding links provided in the HTML document.

Information Management Challenges

It is important that the link relationships are provided in a manageable way. It would not be advisable to create link relationships by manually embedding them in HTML pages if the information is liable to change.

It is advisable to spend time in defining the on key navigational locations, such as the Home page (is it the Web site entry point, or the top of a sub-area of the Web site). Such relationships may be added to templates included in SSIs. Server-side scripts are a useful mechanism for exploiting other relationships, such as Next and Previous - for example in search results pages.

Further Information

Additional information is provided at
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-10/>.

Briefing 11

What Are Open Standards?

Background

The use of open standards can help provide interoperability and maximise access to resources and services. However this raises two questions: "Why open standards?" and "What are open standards?".

Why Open Standards?

Open standards can provide several benefits:

Application Independence: To ensure that access to resources is not dependent on a single application.
Platform Independence: To ensure that access to resources is not restricted to particular hardware platforms.
Long-term Access: To ensure that quality scholarly resources can be preserved and accessed over a long time frame.
Architectural Integrity: To ensure that the architectural framework for IT developments is robust and can be further developed in the future.

What Are Open Standards?

The term "open standards" is ambiguous. As described in Wikipedia "There is no single definition and interpretations vary with usage" [1]. The EU's definition is [2]:

The standard is adopted and will be maintained by a not-for-profit organisation, and its ongoing development occurs on the basis of an open decision-making procedure available to all interested parties (consensus or majority decision etc.).
The standard is adopted and will be maintained by a not-for-profit organisation, and its ongoing development occurs on the basis of an open decision-making procedure available to all interested parties (consensus or majority decision etc.).
The intellectual property - i.e. patents possibly present - of (parts of) the standard is made irrevocably available on a royalty-free basis.
There are no constraints on the re-use of the standard.

Some examples of recognised open standards bodies are given in Table 1.

**Table 1: Examples Of Independent Standards Organisations**
Standards Body	Comments
W3C	World Wide Web Consortium (W3C). Responsible for the development of Web standards (recommendations). See <http://www.w3.org/TR/>. Relevant standards include HTML, XML, CSS, SMIL, SVG, etc.
IETF	Internet Engineering Task Force (IETF). Responsible for the development of Internet standards (known as IETF RFCs). See list of IETF RFCs at <http://www.ietf.org/rfc.html>. Relevant standards include HTTP, MIME, etc.
ISO	International Organisation For Standardization (ISO). See <http://www.iso.org/iso/en/stdsdevelopment/whowhenhow/how.html>. Relevant standards areas include character sets, networking, etc.
NISO	National Information Standards Organization (NISO). See <http://www.niso.org/>. Relevant standards include Z39.50.
IEEE	Institute of Electrical and Electronics Engineers (IEEE). See <http://www.ieee.org/>.
ECMA	ECMA International. Association responsible for standardisation of Information and Communication Technology Systems (such as JavaScript). See <http://www.ecma-international.org/>.

Other Types Of Standards

The term proprietary refers to formats which are owned by an organisation, group, etc. The term industry standard is often used to refer to a widely used proprietary standard. For example, the proprietary Microsoft Excel format is sometimes referred to as an industry standard for spreadsheets. To make matters even more confusing, the prefix is sometime omitted and MS Excel can be referred to as a standard.

To further confuse matters, companies which own proprietary formats may choose to make the specification freely available. Alternatively third parties may reverse engineer the specification and publish the specification. In addition tools which can view or create proprietary formats may be available on multiple platforms or as open source.

In all these cases, although there may appear to be no obvious barriers to use of the proprietary format, such formats should not be classed as open standards as they have not been approved by a neutral standards body. The organisation owning the format may chose to change the format or the usage conditions at any time. File formats in this category include Microsoft Office formats, Macromedia Flash and Java.

References

Open Standard, Wikipedia, <http://en.wikipedia.org/wiki/Open_standard>
Open Standard: European Union definition, Wikipedia, <http://en.wikipedia.org/wiki/Open_standard#European_Union_definition>

Briefing 12

How To Evaluate A Web Site's Accessibility Level

Background

Many Web developers and administrators are conscious of the need to ensure that their Web sites reach as high a level of accessibility as possible. But how do you actually find out if a site has accessibility problems? Certainly, you cannot assume that if no complaints have been received through the site feedback facility (assuming you have one), there are no problems. Many people affected by accessibility problems will just give up and go somewhere else.

So you must be proactive in rooting out any problems as soon as possible. Fortunately there are a number of handy ways to help you get an idea of the level of accessibility of the site which do not require an in-depth understanding of Web design or accessibility issues. It may be impractical to test every page, but try to make sure you check the Home page plus as many high traffic pages as possible.

Get A Disabled Person To Look At The Site

If you have a disability, you have no doubt already discovered whether your site has accessibility problems which affect you. If you know someone with a disability which might prevent them accessing information in the site, then ask them to browse the site, and tell you of any problems. Particularly affected groups include visually impaired people (blind, colour blind, short or long sighted), dyslexic people and people with motor disabilities (who may not be able to use a mouse). If you are in Higher Education your local Access Centre [1] may be able to help.

View The Site Through A Text Browser

Get hold of a text browser such as Lynx [2] and use it to browse your site. Problems you might uncover include those caused by images with no, or misleading, alternative text, confusing navigation systems, reliance on scripting or poor use of frames.

Browse The Site Using A Speech Browser

You can get a free evaluation version of IBM's Homepage Reader [3] or pwWebSpeak [4], speech browsers used by many visually impaired users of the Web. The browsers "speak" the page to you, so shut your eyes and try to comprehend what you are hearing.

Alternatively, try asking a colleague to read you the Web page out loud. Without seeing the page, can you understand what you're hearing?

Look At The Site Under Different Conditions

As suggested by the World Wide Web Consortium (W3C) Web Accessibility Initiative (WAI) [5], you should test your site under various conditions to see if there are any problems including (a) graphics not loaded (b) frames, scripts and style sheets turned off and (c) browsing without using a mouse. Also, try using bookmarklets or favelets to test your Web site under different conditions: further information on accessibility bookmarklets can be found at [6].

Check With Automatic Validation Tools

There are a number of Web-based tools which can provide valuable information on potential accessibility problems such as Rational Policy Tester Accessibility [7] and The Wave tools [8]. You should also check whether the underlying HTML of your site validates to accepted standards using the World Wide Web Consortium's MarkUp Validation Service [9] as non-standard HTML can also cause accessibility problems

Acting on Your Observations

Details of any problems found should be noted: the effect of the problem, which page was affected, plus why you think the problem was caused. You are unlikely to catch all accessibility problems in the site, but the tests described here will give you an indication of whether the site requires immediate attention to raise accessibility. Remember that improving accessibility for specific groups, such as visually impaired people, will often have usability benefits for all users.

Commission an Accessibility Audit

Since it is unlikely you will catch all accessibility problems and the learning curve is steep, it may be advisable to commission an expert accessibility audit. In this way, you can receive a comprehensive audit of the subject site, complete with detailed prioritised recommendations for upgrading the level of accessibility of the site. Groups which provide such audits include the Digital Media Access Group, based at the University of Dundee or the RNIB, who also audit Web sites for access to the blind.

Further Information

Additional information is provided at
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-12/>.

Acknowledgments

This document was written by David Sloan, DMAG, University of Dundee and originally published at by the JISC TechDis service We are grateful for permission to republish this document.

References

Access Centres,
http://www.nfac.org.uk/
Lynx,
http://lynx.isc.org/release/
Homepage Reader, IBM,
http://www-3.ibm.com/able/solution_offerings/hpr.html
pwWebSpeak,
http://www.soundlinks.com/pwgen.htm
Web Content Accessibility Guidelines, Appendix A, W3C WAI,
http://www.w3.org/TR/WAI-WEBCONTENT/
Bookmarklets: An aid to checking the accessibility of your website, Nicola McIlroy
http://www.dmag.org.uk/resources/design_articles/bookmarklets.asp
Rational Policy Tester Accessibility,
http://www-306.ibm.com/software/awdtools/tester/policy/accessibility/
WAVE,
http://www.wave.webaim.org/
W3C HTML Validator, W3C,
http://validator.w3.org/

Briefing 13

Software Code Development

About

This document gives high-level advice for people who develop software for use either internally within a project or for use externally as a project deliverable.

Background

Each computer programming language has its own coding conventions. However there are a number of general points that you can follow to ensure that your code is well organised and can be easily understood by others. These guidelines are not in any way mandatory but attempt to formalise code so that reading, reusing and maintaining code is easier. Most coding standards are arbitrary but adopting some level of consistency will help create better software.

The key point to remember is that good QA practice involves deciding on and recording a number of factors with your programming team before the onset of your project. Having such a record will allow you to be consistent.

Documentation

In order for programmers to use your software it is important that you include clear documentation. This documentation will take the form of internal and external documentation.

External documentation explains how the software will be used. Internal documentation explains how to implement the code.
Comments - Comments should be added to your code to explain implementation details of the source code. Avoid adding obvious or lengthy information. Prior to your project you should agree on how frequent comments should be and their location, format and length in the file. These conventions may need to be agreed on for block, single-line and end-of-line comments.
Readme file - Every software package should contain a readme file describing and the purpose and functionality of the software and information on external dependencies.

Naming Conventions

Naming conventions of files, procedures and variables etc. should be sensible and meaningful and agreed on before the projects starts. Use of capitalisation may vary in different programming languages but it is sensible to avoid names that differ only in case or look very similar. Also avoid names that conflict with standard library names.

Code

There are a number of key points to remember when writing your code:

Linearity - If using a procedural language make sure your code is linear and starts at the first executable statement and continues to a final return or end of block statement.
If constructs - Avoid complicated "if" constructs. It is better to use several simpler nested "if" constructs rather than a complicated compound one. Generally keep it simple.
Layout - Code layout is very important. It should be formatted to provide visual clues to the flow of the implementation. It is useful to agree on factors such as indentation, location of brackets, use of white space and line spacing used before the project starts. For example how long will your lines be? Will you use tabs or spaces?
External Constants - Define constant values outside of the code as this makes maintenance easier. Changing hard-coded constants can be time-consuming and prone to human error.
Error Handling - It is also important that you write in some form of error handling into the code.
Portability - Portable code allows the source file to be compiled with any compiler and executed on any machines and operating system. Creating portable code is fairly complex. It is useful to keep machine dependent and machine independent code in separate files.

Standards

Standards are "documented agreements containing technical specifications or other precise criteria to be used consistently as rules, guidelines, or definitions of characteristics, to ensure that materials, products, processes and services are fit for their purpose" (ISO 1997). The aim of international standards is to encapsulate most appropriate current practice. The International Organization for Standardization (ISO) [1] is the head of all national standardisation bodies. The most relevant ISO standard for software code development is ISO 9000-3: 1997 (QA for the development, supply, installation and maintenance of computer software). For other relevant standards also check the Institute of Electrical and Electronics Engineers [2] and the American National Standards Institute [3].

Project QA

At the start of development it may help to ask your team the following questions:

Do you have local guidelines for writing code?
Are your software staff aware of the conventions to be used?
Do you have procedures in place for use when creating local software?

References

The International Organization for Standardization (ISO)
<http://www.iso.ch/>
Institute of Electrical and Electronics Engineers (IEEE)
<http://www.ieee.org/
American National Standards Institute (ANSI)
<http://www.ansi.org/

Further information on Software QA at Sticky Minds
<http://www.stickyminds.com/>

Briefing 14

Creating and Testing Web Forms

Background

A Web form is not dissimilar in appearance and use to a paper form. It appears on a Web page and contains special elements called controls along with normal text and mark up. These controls can take the form of checkboxes, text boxes, radio buttons, menus, etc. Users generally fill in the form by entering text and selecting menu items and then submit the form for processing. The agent could be an email or Web server.

Web forms have a variety of uses and are a way to get specific pieces of information from the user. Web sites with forms have their own specific set of problems and need vigorous testing to ensure that they work.

Designing Forms

Some of the key things to consider when designing your form are:

Mandatory Fields

Making fields compulsory can cause problems. Occasionally a user may feel that the question you have asked is inappropriate in context or they just can't provide the information. You need to decide if the information needed is absolutely necessary. Users will be more likely to give information if you explain why the data that you're asking for is needed. It is acceptable to ask the user if they meant to leave out a piece of information and then accept their answer.

Validation of forms can be carried out either client side or by processing on the server. Client side validation requires the use of a scripting language like JavaScript and can be problematic if the user's browser disallows scripting. However server side validation can be more complicated to set up.

Drop Down Lists

Sometimes the categories you offer in a drop down list do not match the answer that the user wants to give you. Sites from the USA often ask for states, which UK users cannot provide. If you want to use a drop down list make sure that your error messages are helpful rather than negative and allow users to select an 'other' option. If you have given a good selection of categories then you should rarely get users picking this.

Also consider if the categories you have provided are subjective enough. There may be issues over the terms used to refer to particular countries (for example if a land area is disputed. If you have to provide a long drop down list then it might be worth offering the common categories first. You could also try subdividing the categories into two-drop downs where the selection from the first dynamically creates the options in the second.

Separate Display

You may wish to have the user see a new page or sidebar when filling in a form. A new page may be easier to look at but can be annoying if it is perceived as a diversion or, even worse, an advertisement. It may also be prevented from opening by window blocking software available on newer browsers.

User Errors

Users will often make typing or transcription errors when filling a form in. These errors can occur in any free text fields on the form.

Occasionally users will press the submit or send button either deliberately or inadvertently when only part-way through the form. Make sure that you have an appropriate error message for this and allow users to go back to the unfinished form. Users also often fill in part of a form and then click on the back button. They may be doing this to lose the data they have filled in, to check previous data or because they think they have finished. These activities suggest poor user interface design.

It is important to provide a helpful message on the submission screen explaining the form has been submitted successfully. You could also give replicate the details inputted for users to print out as hard copy.

Testing Forms

Once you have created your Web form you need to test it thoroughly before release. There are a number of different free software products available that will help you with your testing. Tools such as Roboform [1] are freely available and can be used to store test data in and automatically fill in your forms with data.

When testing your form it is worth bearing in mind some problem areas:

Character sets: If you require users to fill in their names you will have to be ready to deal with different character sets. Creating different characters to test with can be problematic but services such as BabelMap [2] can help with this.
Checking Scripts: Be sure to check you common gateway interface (CGI), server-side scripts and client-side scripts by submitting and resetting form data.
Tab Order: Often when creating a form information is moved about. That is why it is important that you check the tab order of your form. Tab order is especially important for people using screen readers.

References

Roboform,
<http://www.roboform.com/>
BabelMap,
<http://www.babelstone.co.uk/Software/BabelMap.html>

Briefing 15

The Purpose Of Your Project Web Site

Background

Before creating a Web site for your project you should give some thought to the purpose of the Web site, including the aims of the Web site, the target audiences, the lifetime, resources available to develop and maintain the Web site and the technical architecture to be used. You should also think about what will happen to the Web site once project funding has finished.

Purposes

Your project Web site could have a number of purposes. For example:

The Web site could provide information about the project.
The Web site could provide access to the project deliverables.
The Web site could be used to support communications with members of the project team.
The Web site could act as a repository of information about the management of the projects, including minutes of meetings, reports to funders, etc.

Your Web site could, of course, fulfill more than a single role. Alternatively you may choose to provide more than one Web site.

Why You Need To Think About The Different Purposes

You should have an idea of the purposes of your project Web site before creating it for a number of reasons:

You may wish to have more stringent QA procedures for Web sites which are intended for a broad audience and which is intended to have a longer lifetime.
You may wish to be proactive in promoting a Web site intended for public use.
You may wish to be proactive in ensuring that a Web site which is not intended for public use does not become indexing by search engines or is not linked to be mistake.
You may wish to allow a public Web site to be indexed and archived by third parties.
You may wish to ensure that a Web site intended for use by project partners or other closed communities is not indexed or archived, especially if there may be confidentiality or data protection issues.

Web Site For Information About The Project

Once funding has been approved for your project Web site you may wish to provide information about the project, often prior to the official launch of the project and before project staff are in post. There is a potential danger that this information will be indexed by search engines or treated as the official project page. You should therefore ensure that the page is updated once an official project Web site is launched so that a link is provided to the official project page. You may also wish to consider stopping search engines from indexing such pages by use of the Standard For Robot Exclusion [1].

Web Site For Access To Project Deliverables

Many projects will have an official project Web site. This is likely to provide information about the project such as details of funding, project timescales and deliverables, contact addresses, etc. The Web site may also provide access to project deliverables, or provide links to project deliverables if they are deployed elsewhere or are available from a repository. Usually you will be proactive in ensuring that the official project Web site is easily found. You may wish to submit the project Web site to search engines.

Web Site To Support Communications With Project Partners

Projects with several partners may have a Web site which is used to support communications with project partners. The Web site may provide access to mailing lists, realtime communications, decision-making support, etc. The JISCMail service may be used or commercial equivalents such as YahooGroups. Alternatively this function may be provided by a Web site which also provides a repository for project resources.

Web Site As Repository For Project Resources

Projects with several partners may have a Web site which is used to provide a repository for project resources. The Web site may contain project plans, specifications, minutes of meetings, reports to funders, financial information, etc. The Web site may be part of the main project Web site, may be a separate Web site (possibly hosted by one of the project partners) or may be provided by a third party. You will need to think about the mechanisms for allowing access to authorised users, especially if the Web site contains confidential or sensitive information.

References

robots.txt Robots Exclusion Standard,
<http://www.robotstxt.org/>

Briefing 16

URI Naming Conventions For Your Project Web Site

Background

Once you have agreed on the purpose(s) of your project Web site(s) [1] you will need to choose a domain name for your Web site and conventions for URIs. It is necessary to do this since this can affect (a) The memorability of the Web site and the ease with which it can be cited; (b) The ease with which resources can be indexed by search engines and (c) The ease with which resources can be managed and repurposed.

Domain Name

You may wish to make use of a separate domain name for your project Web site. If you wish to use a .ac.uk domain name you will need to ask UKERNA. You should first check the UKERNA rules [2]. A separate domain name has advantages (memorability, ease of indexing and repurposing, etc) but t his may not be appropriate, especially for short-term projects. Your organisation may prefer to use an existing Web site domain.

URI Naming Conventions

You should develop a policy for URIs for your Web site which may include:

Conventions on use of case (e.g. specifying that all resources should be in lower case), separators (e.g. a hyphen should be used to separate components of a URI) and permitted characters (e.g. spaces should not be used in URIs).
Conventions on the directory structure. The directory structure may be based on the main functions provided by your Web site.
Conventions on dates and version control. You may wish to agree on a convention for including dates in URIs. You may also wish to agree on a convention for version control (which could make use of date information).
Conventions for file names and formats.

Issues

Grouping Of Resources

It is strongly recommended that you make use of directories to group related resources. This is particularly important for the project Web site itself and for key areas of the Web site. The entry point for the Web site and key areas should be contained in the directory itself: e.g. use http://www.foo.ac.uk/bar/ to refer to project BAR and not http://www.foo.ac.uk/bar.html) as this allows the bar/ directory to be processed in its entirety, independently or other directories. Without this approach automated tools such as indexing software, and tools for auditing, mirroring, preservation, etc. would process other directories.

URI Persistency

You should seek to ensure that URIs are persistent. If you reorganise your Web site you are likely to find that internal links may be broken, that external links and bookmarks to your resources are broken, that citations to resources case to work. Y ou way wish to provide a policy on the persistency of URIs on your Web site.

File Names and Formats

Ideally the address of a resource (the URI) will be independent of the format of the resource. Using appropriate Web server configuration options it is possible to cite resources in a way which is independent of the format of the resource. This should allow easy of migration to new formats (e.g. HTML to XHTML) and, using a technology known as Transparent Content Negotiation [3] provide access to alternative formats (e.g. HTML or PDF) or even alternative language versions.

File Names and Server-Side Technologies

Ideally URIs will be independent of the technology used to provide access to the resource. If server-side scripting technologies are given in the file extension for URIs (e.g. use of .asp, .jsp, .php, .cfm, etc. extensions) changing the server-side scripting technology would probably require changing URIs. This may also make mirroring and repurposing of resources more difficult.

Static URIs Or Query Strings?

Ideally URIs will be memorable and allow resources to be easily indexed and repurposed. However use of Content Management Systems or databases to store resources often necessitates use of URIs which contain query strings containing input parameters to server-side applications. As described above this can cause problems.

Possible Solutions

You should consider the following approaches which address some of the concerns:

Using file extensions: e.g. foo refers to foo.html or foo.asp
Using directory defaults: e.g. foo/ refers to foo/intro.html or foo/intro.asp
Rewriting dynamic URIs to static URIs

References

The Purpose Of Your Project Web Site
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-15/html/>
UKERNA
<http://www.ukerna.ac.uk>
Transparent Content Negotiation
<http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html>

Briefing 17

Performance Indicators For Your Project Web Site

Background

It is desirable to measure usage of your project Web site as this can give an indication of its effectiveness. Measuring how the Web site is being used can also help in identifying the usability of the Web site. Monitoring errors when users access your Web site can also help in identifying problem areas which need to be fixed.

However, as described in this document, usage statistics can be misleading. Care must be taken in interpreting statistics. As well as usage statistics there are a number of other types of performance indicators which can be measured.

It is also important that consistent approaches are taken in measuring performance indicators in order to ensure that valid comparisons can be made with other Web sites.

Web Statistics

Web statistics are produced by the Web server software. The raw data will normally be produced by default - no additional configuration will be needed to produce the server's default set of usage data.

The server log file records information on requests (normally referred to as a "hit") for a resource on the web server. Information included in the server log file includes the name of the resource, the IP address (or domain name) of the user making the request, the name of the browser (more correctly, referred to as the "user agent") issuing the request, the size of the resource, date and time information and whether the request was successful or not (and an error code if it was not). In addition many servers will be configured to store additional information, such as the "referer" (sic) field, the URL of the page the user was viewing before clicking on a link to get to the resource.

Tools

A wide range of Web statistical analysis packages are available to analyse Web server log files [1]. A widely used package in the UK HE sector is WebTrends [2].

An alternative approach to using Web statistical analysis packages is to make use of externally-hosted statistical analysis services [3]. This approach may be worth considering for projects which have limited access to server log files and to Web statistical analysis software.

Configuration Issues

In order to ensure that Web usage figures are consistent it is necessary to ensure that Web servers are configured in a consistent manner, that Web statistical analysis packages process the data consistently and that the project Web site is clearly defined.

You should ensure that (a) the Web server is configured so that appropriate information is recorded and (b) that changes to relevant server options or data processing are documented.

Limitations

You should be aware that the Web usage data does not necessarily give a true indication of usage due to several factors:

Effects of caching.
Effects of access from robots, off-line browsers, auditing tools, etc.
Difficulties of measuring unique visitors, browser types, etc. accurately.
Difficulties of defining terms such as sessions.

Despite these reservations collecting and analysing usage data can provide valuable information.

Other Types Of Indicators

Web usage statistics are not the only type of performance indicator which can be used. You may also wish to consider:

Monitoring the number of links to your Web site: tools such as LinkPopularity.com can report on the number of links to your Web site.
Numbers of resources indexed: You can analyse the numbers of resources indexed by search engines such as Google.
Error log analysis: Analysis of your server log error file can indicate problem areas.

With all of the indicators periodic reporting will allow trends to be detected.

Conclusions

It may be useful to determine a policy on collection and analysis of performance indicators for your Web site prior to its launch.

References

Web server log files, UKOLN,
<http://www.ukoln.ac.uk/nof/support/help/papers/performance/>
WebTrends,
<http://www.netiq.com/webtrends/>
Externally-hosted statistical analysis services, Exploit Interactive, Issue 5, April 2000,
<http://www.exploit-lib.org/issue5/indicators/>

Briefing 18

QA Procedures For The Design Of CAD Data Models

Background

The creation of CAD (Computer Aided Design) models is an often complex and confusing procedure. To reduce long-term manageability and interoperability problems, the designer should establish procedures that will monitor and guide system checks.

Establish CAD Layout Standards

Interoperability problems are often caused by poorly understood or non-existent operating procedures for CAD. It is wise to establish and document your own CAD procedures, or adopt one of the national standards developed by the BSI (British Standards Institution) or NIBS (National Institute of Building Sciences). These may be used to train new members in the house-style of a project, provide essential information when sharing CAD data among different users, or provide background material when depositing the designs with a preservation repository. Particular areas to standardize include:

Drawing sheet templates
Paper layouts
Text fonts, dimensions, line types and line weights
Layer naming conventions
File naming conventions

Procedures on constructing your own CAD standard can be found in the Construct IT guidelines (see references).

Be Consistent With Layers And Naming Conventions

When creating CAD data models, a consistent approach to layer creation and naming conventions is useful. This will avoid confusion and increases the likelihood that the designer will be able to manipulate and search the data model at a later date.

The designer has two options to ensure interoperability:

Create layers that divide the object according to pre-defined criteria. E.g. a model building may be divided into building part, building phase, site stratum, material, chronological standing, etc. The placement of too many objects on a single layer will increase the computational requirements to process the model and cause unexpected problems when moving objects between layers.
Establish a layer name convention that is consistent and appropriate to avoid confusion in complex CAD model. Many users use 'wall', 'wall1', 'door', etc. to describe specific objects. This is likely to become confusing and difficult to identify when the design becomes complex. Layer names should be short and descriptive. A possible solution is the CSA layer-naming convention that uses each character in the layer name to describe its position within the model.

Ensure Tolerances Are Consistent

When exporting designs between different CAD applications it is common for model relationships to disintegrate, causing entities to appear disconnected or disappear from the design altogether. A common cause is the use of different tolerance levels - a method of placing limits on gaps between geometric entities. The method of calculating tolerance often varies in different applications: some use absolute tolerance levels (e.g. 0.005mm), others work to a tolerance level relative to the model size (e.g. 10-4 the size), while others have different tolerances according to the units used. When considering moving a design between different applications it is useful to ensure the tolerance level can be set to the same value and identify potential problem areas that may be corrupted when the data model is reopened.

Check For Illegal Geometry Definitions

Interoperability problems are also caused by differences in how the system identifies invalid geometry definitions, such as the three-sided degenerate NURBS surfaces. Some systems allow the creation of such entities, others will reject them, whereas some systems know that they are not permissible and in an effort to prevent them from being created, generate twisted four sided surfaces.

Further Information

AHDS Guides to Good Practice: CAD, AHDS,
<http://ads.ahds.ac.uk/project/goodguides/cad/>
National CAD Standard,
<http://www.nationalcadstandard.org/>
CAD Standards: Develop and Document,
<http://www.arch.vuw.ac.nz/papers/bbsc303/assign2/198mc.htm>
Construct I.T: Construct Your Own CAD Standard, (Note direct URL to document is not available)
<http://www.construct-it.org.uk/>
Common Interoperability Problems in CAD,
<http://www.caduser.com/reviews/reviews.asp?a_id=148>

Briefing 19

Making Software Changes to a Web Site

Background

It is desirable to minimise the time a Web site is unavailable. However it may be necessary to bring a Web server down in order to carry out essential maintenance. This document lists some areas to consider if you wish to minimum down time.

Planning

The key to any form of critical path situation is planning. Planning involves being sure about what needs to be done and being clear about how it can be done. Quality Assurance is vital at this stage and final 'quality' checking is often the last act before a site goes live or a new release is launched.

Prior to Down Time

Collect statistics on your site and find out what time and on what day it has the least number of visitors and activity. If it is possible (staff allowing) choose this time to take the site offline.
Before bringing the site down hold a meeting. At this draw up a schedule of what needs to be done and who will complete each task. Create a checklist of testing to be done.
If the whole site will be offline post a maintenance page stating that your site will be down temporarily. Make sure this page is created in advance.

During Down Time

Make all modifications to the site such as installing new software, changing databases etc.
Review all configuration settings and check that the correct files are in place.
Check all server services are running. Ensuring services such as secure sockets layer (SSL) are running is vital if you are working in a business environment.
Check any INI, property files, or other configuration files that may have been changed. A list of configuration file that may change could be created prior to down time.
Use the checklist to check all other relevant change areas. You should look at:
- Visual changes - you could add icons to new pages
- Functionality changes
Run some general user tests, such as ordering a book, retrieving information from the database, submitting a form. It is worth anticipating in advance some scenario-specific check areas that can be looked at.

After Down Time

Again run some general user tests. These should be run from inside and outside your firewall and on a variety of PCs, browsers, etc.
Check that all links to third parties are working correctly.
Keep an eye on how the site runs for the next few days and watch for cracks.
It is important that all the technical support team are notified of any changes that have been made and a problem reporting system is in place.
Have some form of software or system installed that can inform you of unexpected down time or errors [1].

Conclusions

Advance preparation is vital if you want to minimise time your site downtime and avoid confusion when installing new releases.

References

Error Detection on the UKOLN Web site, QA Focus, UKOLN,
< http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-14/>

Briefing 20

Documenting Digitisation Workflow

Background

By documenting the workflow of digitisation, a life history can be built-up for each digitised item. This information is an important way of recording decisions, tracking problems and helping to maintain consistency and give users confidence in the quality of your work.

What to Record

Workflow documentation should enable us to tell what the current status of an item is, and how it has reached that point. To do this the documentation needs to include important details about each stage in the digitisation process and its outcome.

What action was performed at a specific stage? Identify the action performed. For example, resizing an image.
Why was the action performed? Establish the reason that a change was made. For example, a photograph was resized to meet pre-agreed image standards.
When was the action performed? Indicate the specific date the action was performed. This will enable project development to be tracked through the system.
How was the action performed? Ascertain the method used to perform the action. A description may include the application in use, the machine ID, or the operating system.
Who performed the action? Identify the individual responsible for the action. This enables actions to be tracked and identify similar problems in related data.

The actual digitisation of an item is clearly the key point in the workflow and therefore formal capture metadata (metadata about the actual digitisation of the item) is particularly important.

Where to Record the Information

Where possible, select an existing schema with a binding to XML:

TEI (Text Encoding Initiative) and EAD (Encoded Archival Description) for textual documents.
NISO Z39.87 for digital still images.
SMIL (Synchronized Multimedia Integration Language), MPEG-7 or the Library of Congress' METS A/V extension for Audio/Video.

Quality Assurance

To check your XML document for errors, QA techniques should be applied:

Validate XML against your schema or an XML parser.
Check that free text entries follow local rules and style guidelines.

Further Information

Encoded Archival Description,
<http://www.loc.gov/ead/>
A Metadata Primer,
<http://www.cmswatch.com/Features/TopicWatch/FeaturedTopic/?feature_id=85>
Dublin Core Metadata Initiative,
<http://dublincore.org/>
MARC Standards,
<http://www.loc.gov/marc/>
MPEG Standard,
<http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm>
Synchronized Multimedia,
<http://www.w3.org/AudioVideo/>
TEI Consortium,
<http://www.tei-c.org/>
Three SGML metadata formats: TEI, EAD, and CIMI,
<http://hosted.ukoln.ac.uk/biblink/wp1/sgml/tei.rtf>
Z39.87: Technical Metadata For Still Digital Images,
<http://www.niso.org/standards/resources/Z39_87_trial_use.pdf>

Briefing 21

QA for GIS Interoperability

Background

Quality assurance is essential to ensure GIS (Geographic Information System) data is accurate and can be manipulated easily. To ensure data is interoperable, the designer should audit the GIS records and check them for incompatibilities and errors.

Ensure Content Is Available In An Appropriate GIS Standard

Interoperability between GIS standards is encouraged, enabling complex data types to be compared in unexpected methods. However, the varying standards can limit the potential uses of the data. Designers are often limited by the formats available in different tools. When possible, it is advisable to use OpenGIS - an open, multi-subject standard constructed by an international standard consortium.

Resolve Differences In The Data Structures

To integrate data from multiple databases, the data must be stored in a compatible field structure. Complementary fields in the source and target databases must be of a compatible type (Integer, Floating Point, Date, a Character field of an appropriate length etc.) to avoid the loss of data during the integration process. Checks should also be made that specific fields that are incompatible with similar products (e.g. dBase memo fields) are exported correctly. Specialist advice should be taken to ensure the memo information is not lost.

Ensure Data Meet The Required Standards

Databases are often created in an ad hoc manner without consideration of later requirements. To improve interoperability the designer should ensure data complies with relevant standards. Examples include the BS7666 standard for British postal addresses and the RCHME Thesauri of Architectural Types, Monument Types, and Building Materials.

Compensate For Different Measurement Systems

The merging of two different data sources is likely to present specific problems. When combining two GIS tables, the designer should consider the possibility that they have been constructed using different projection measurement systems (a method of representing the Earth's three-dimensional form on a two-dimensional plane and locate landmarks by a set of co-ordinates). The projection co-ordinate systems vary across nations and through time: the US has five primary co-ordinate systems in use that significantly differ with each other. The British National Grid removes this confusion by using a single co-ordinate, but can cause problems when merging contemporary with pre-1940 maps that were based upon Cassini projection. This may produce incompatibilities and unexpected results when plotted, such as moving boundaries and landmarks to different locations that will need to be rectified before any real benefits can be gained. The designer should understand the project system used for each layer to compensate for inaccuracies.

Ensure Precise Measurements Are Accurate

When recreating real-world objects created by two different people, the designer should note the degree of accuracy. One person may measure to the nearest millimetre, while the other measures to the centimetre. To check this, the designer should answer the following questions:

How many numbers are shown after the point (e.g. 2:12 cm)?
Is this figure consistent with the second designers' measurement methods?
Has the value been rounded up or down, or has a third figure been removed?

These subtle differences may influence the resulting model, particularly when designing smaller objects.

Further Information

AHDS Guides to Good Practice, AHDS,
<http://ads.ahds.ac.uk/project/goodguides/gis/>
Geoplace - The Authoritative Source For Spatial Information,
<http://www.geoplace.com/>
GIS - Computational Problems,
<http://www.cise.ufl.edu/~mssz/GIS/GIS-Basics.html>
Using GIS to help solve real-world problems,
<http://www.cadsystems.com/archive/0006f01.html>
Open GIS Consortium, Inc.,
<http://www.opengis.org/>

Briefing 22

Choosing A Suitable Digital Rights Solution

Background

Digital Rights Management (DRM) refers to any method for a software developer to monitor, control, and protect digital content. It was developed primarily as an advanced anti-piracy method to prevent illegal or unauthorised distribution of content. Common examples of DRM include watermarks, licensing, and user registration. It is in use by Microsoft and other businesses to prevent unauthorised copying and use of their software (obviously, the different protection methods do not always work!).

For institutions, DRM can have limited application. Academia actively encourages free dissemination of work, so stringent restrictive measures are unnecessary. However, it will have use in limiting plagiarism of work. An institution is able to distribute lecture notes or software without allowing the user to reuse text or images within their own work. Usage of software packages or site can also be tracked, enabling specific content to be displayed to different users. To achieve these goals different methodologies are available.

Why do I need Digital Rights Management?

As stated above, Digital Rights Management is not appropriate for all organisations. It can introduce additional complexity into the development process, limit use and cause unforeseen problems at a later date. The following questions will assess your needs:

Do you trust your users to use your work without plagiarising or stealing it?
If the answer to question 1 is yes, do you wish to track unauthorised distribution or impose rules to prevent it?
Will you be financially affected if your work is distributed without permission?
Will digital rights restrictions interfere with the project goals and legitimate usage?
In terms of cost and time management, can you afford to implement DRM restrictions?
If the answer to question 5 is yes, can you afford a strong and costly level of protection (restrictive digital rights) or weak protection (supportive) that costs significantly less?

What types of DRM Methodologies Exist?

Digital Rights methodologies can be divided into two types supportive and restrictive. The first relies upon the user's honesty to register or acquire a license for specific functionality. In contrast, the restrictive method assumes the user is untrustworthy, placing barriers (e.g. encryption and other preventive methods) that will thwart casual users who attempt to infringe their work.

1) Supportive digital rights

The simplest and most cost effective DRM strategy is supportive digital rights. This requires the user to register for data before they are allowed access, blocking all non-authorised users. This assumes that individuals will be less likely to distribute content if they can be identified as the source of the leak. Web sites are the most common use of this protection method. For example, Athens, the NYTimes and other portals provide registration forms or license agreement that the user must complete before access is allowed. The disadvantage of this protection method is the individual can easily copy or distribute data once they have it. Support digital rights is suited to organisations that want to place restrictions upon who can access specific content, but do not wish to restrict content being used by legitimate users.

2) Restrictive digital rights

Restrictive digital rights are more costly, but place more stringent controls over the user. It operates by checking if the user is authorised to perform a specific action and, if not, prevents them from doing it. Unlike supportive rights management, it ensures that content cannot be used at a later date, even if it has been saved to hard disk. This is achieved by incorporating watermarks and other identification methods into the content.

Restrictive digital can be divided into two sub-categories:

Self-Protecting - any digital rights process that involves encryption to prevent unauthorized access to confidential data. For viewing purposes, the content is decrypted, but enforces strict licence terms to limit its usage. Common examples include DVD player (and computer DVD-ROMs) that contain a region code. This ensures that discs can only be played on machines that contain the same code. In the event that an exact duplicate is made, the user cannot decrypt the content without the decryption key, rendering the file useless. Self-protecting is the most costly form of DRM and, for services that have a large market, is likely to be defeated by dedicated hackers. In this event, the work will require significant additional funding to establish a new method of self-protection.
Self-describing - Self-describing protection places less obvious restrictions upon the content. The content is unencrypted, but contains a watermark or unique code hidden within the file that identifies the original source of the material. For example, Windows XP identifies the machine ID to prevent hard disks bring transferred to a different PCs

The requirements for Digital rights implementations are costly and time-consuming, making them potentially unobtainable by the majority of service providers. For a data archive it is easier to prevent unauthorised access to resources than it is to limit use when they actually possess the information.

Ensuring Interoperability

Digital rights is reliant upon the need to record information and store it in a standard layout format that others can use to identify copyrighted work. Current digital rights establish a standard metadata schema to identify ownership

Two options are available to achieve this goal: create a bespoke solution or use an established rights schema. An established rights schema provides a detailed list of identification criteria that can be used to catalogue a collection and establish copyright holders at different stages. Two possible choices for multiple media types are:

XrML (eXtensible rights Markup Language) - a general-purpose XML-based specification grammar for defining digital rights and conditions to be associated with digital content, services, or other resources. The language is utilized as the basis for the MPEG-21 and Open eBook rights specifications.
XMCL (eXtensible Media Commerce Language) - a rights management system for media exchanged between businesses (such as web stores, customer tracking, etc.) and trusted delivery and playback systems. It is intended as a consumer-focussed schema that allows businesses to provide rental, subscription, ownership, and video on demand/pay-per-view services.

Summary

Digital rights are an important issue that allows an institution to establish intellectual property rights. However, it can be costly for small organizations that simply wish to protect their image collection. Therefore the choice of supportive or restrictive digital rights is likely to be influenced by value of data in relation to the implementation cost.

Further Information

Athens Access Management Systems
<http://www.athens.ac.uk/>
Directory for Social Issues in computing Copy Protection
<http://www.rajivshah.com/directory/Content/Protection_Schemes/Windows/>
How much is stronger DRM worth?, Lewis
<http://www.cpppe.umd.edu/rhsmith3/papers/Final_session1_lewis.pdf>
XMCL
<http://www.w3.org/TR/2002/NOTE-xmcl-20020919/>
XrML
<http://www.xrml.org/>

Briefing 23

Recording Digital Sound

Background

The digitisation of digital audio can be a complex process. This document contains quality assurance techniques for producing effective audio content, taking into consideration the impact of sample rate, bit-rate and file format.

Sample Rates

Sample rate defines the number of samples that are recorded per second. It is measured in Hertz (cycles per second) or Kilohertz (thousand cycles per second). The following table describes four common benchmarks for audio quality. These offer gradually improving quality, at the expense of file size.

**Table 1: Description of the various sample frequencies available**
Samples per second	Description
8kHz	Telephone quality
11kHz	At 8 bits, mono produces passable voice at a reasonable size.
22kHz	22k, half of the CD sampling rate. At 8 bits, mono, good for a mix of speech and music.
44.1kHz	Standard audio CD sampling rate. A standard for 16-bit linear signed mono and stereo file formats.

The audio quality will improve as the number of samples per second increases. A higher sample rate enables a more accurate reconstruction of a complex sound wave to be created from the digital audio file. To record high quality audio a sample rate of 44.1kHz should be used.

Bit-rate

Bit-rate indicates the amount of audio data being transferred at a given time. The bit-rate can be recorded in two ways - variable or constant. A variable bit-rate creates smaller files by removing inaudible sound. It is therefore suited to Internet distribution in which bandwidth is a consideration. A constant bit-rate, in comparison, records audio data at a set rate irrespective of the content. This produces a replica of an analogue recording, even reproducing potentially unnecessary sounds. As a result, file size is significantly larger than those encoded with variable bit-rates.

Table 2 indicates how a constant bit-rate affects the quality and file size of an audio file.

**Table 2 Indication of audio quality expected with different bit-rates**
Bit rate	Quality	MB/min
1411	CD quality	10.584
192	Good CD quality	1.440
128	Near CD quality	0.960
112	Near CD quality	0.840
64	FM quality	0.480
32	AM quality	0.240
16	Short-wave quality	0.120

Digital Audio Formats

The majority of audio formats use lossy compression to reduce file size by removing superfluous audio data. Master audio files should ideally be stored in a lossless format to preserve all audio data.

**Table 3 Common Digital Audio Formats**
Format	Compression	Streaming support	Bit-rate	Popularity
MPEG Audio Layer III (MP3)	Lossy	Yes	Variable	Common on all platforms
Mp3PRO (MP3)	Lossy	Yes	Variable	Limited support
Ogg Vorbis (OGG)	Lossy	Yes	Variable	Limited support
RealAudio (RA)	Lossy	Yes	Variable	Popular for streaming
Microsoft wave (WAV)	Lossless	Yes	Constant	Primarily for Windows
Windows Media (WMA)	Lossy	Yes	Variable	Primarily for Windows

Conversion between digital audio formats can be complex. If you are producing audio content for Internet distribution, a lossless-to-lossy (e.g. WAV to MP3) conversion will significantly reduce bandwidth usage. Only lossless-to-lossy conversion is advised. The conversion process of lossy-to-lossy will further degrade audio quality by removing additional data, producing unexpected results.

What Is The Best Solution?

Whether digitising analogue recordings or converting digital sound into another format, sample rate, bit rate and format compression will affect the resulting output. Quality assurance processes should compare the technical and subjective quality of the digital audio against the requirements of its intended purpose.

A simple suite of subjective criteria should be developed to check the quality of the digital audio. Specific checks may include the following questions:

Can listeners understand voices in recording?
Can listeners hear quiet sounds?
Can listener hear loud sounds without distortion?
Can the listener distinguish between digitised audio and original recording?

Objective technical criteria should also be measured to ensure each digital audio file is of consistent or appropriate quality:

Is there a documented workflow for creating the digital audio files?
Is the file format and software used to compress the audio documented?
Is the bit rate equal to or less than the available bandwidth?
Does the sample and bit-rate of the digital audio match or exceed that of the original analogue recording (or is the loss of quality acceptable, see subjective tests above)?
For accurate reproduction of an original analogue recording, is the digital audio master file stored in a lossless format?
For accurate reproduction of the original sound is the sample rate at least twice that of the highest frequency sound?

Further Information

MP3Pro Zone,
<http://www.mp3prozone.com/>
Measuring Audio Quality,
<http://www.itworld.com/AppDev/1469/NWW1204revside4/>
Ogg Vorbis,
<http://www.vorbis.com/>
PC Recording,
<http://www.pcrecording.com/>
Real Networks,
<http://www.real.com/>
Slicing and Dicing MP3 bit rates,
<http://www.digitalprosound.com/Htm/WebAudio/2000/Oct/MP3bitrates.htm>
Xorys' MP3 FAQ,
<http://webhome.idirect.com/~nuzhathl/mp3-faq.html>

Briefing 24

Handling International Text

Background

Digital text is one of the oldest description methods, but remains divided by differing file format, encoding methods, schemas, and encoding methods. When choosing a digital text format it is necessary to establish the project needs. Is plain text suitable for the task and are text markup and formatting required? How will the information be displayed and where? This document describes these issues and provides some guidelines for their use.

What is the Best Tool for the Job?

Digital text has existed in one form or another since the 1960s. Many computer users take for granted that they can quickly write a letter without restriction or technical considerations. A commercial project, however, requires consideration of long-term needs and goals. To avoid complications at a later date, the developer must ensure the tools in use are the most appropriate for the task and, if not, what can be used in their place. To achieve this three questions should be answered:

How will textual information be viewable for the user?
What problems may I encounter if textual information is stored incorrectly?
How will textual information be organized?

File Formats

It is often assumed that everyone can read text. However, this is not always the case. Digital text imposes restrictions upon the content that can have a significant impact upon the project.

In particular, there are two main issues:

File format
Character encoding

The choice of format will be dependent upon the following factors:

The platform/application for which the work is intended - A complex recipe stored in MS Word XP will only be useful for Word XP users. Any attempt to open it in earlier MS Word iterations or popular alternatives (such as Open Office) may result in formatting or layout issues, fonts being unavailable and substituted for an alternative, or binary being introduced into the document, appearing as random characters on the screen.

Special formatting required to enhance the document - Does the document require specific formatting, such as headings, bullet points, or tables to be understood? If so, a file format that supports these capabilities (RTF, PDF) can be used. If not, plain text may be useful for maximising the potential audience.
Editing - Does the document require editing by the user? If so, an editable format, such as Rich Text Format (RTF) or text is recommended. If not, the designer can protect their document using PDF.

Character Encoding

For allowing universal information access, plain text remains useful. It has the advantage of being simple to interpret and small in file size. However, there are some differences in the method that is used to encode text characters. The most common variations are ASCII (American Standard Code for Information Interchange) and Unicode.

ASCII - ASCII is a 7-bit code that assigns 128 decimal numbers (0-127) to letters, numbers, punctuation marks and other common characters. The limited character set restricts characters that can be displayed, preventing the use of foreign descriptions within the same document.
Unicode - Unicode resolves the ASCII restrictions by supporting a 16-bit character set. This enables it to store multiple languages in a standard format and display them in a single document. At the time of writing there are three encoding forms that can be used to represent 1,000,000+characters.

Problems

Several problems may be encountered when storing textual information. For text files it is a simple process to convert the file to Unicode. However, for more complex data, such as databases, the conversion process will become more difficult. Problems may include:

Corrupted characters - Foreign or exotic characters saved in ASCII (used by older applications) are likely to be missing when reloading the file. To resolve the issue, install the correct language and save the file to Unicode in a later version of the application.
Layout - Inter-format conversion can cause numerous layout issues. To avoid these problems save the document in the dissemination format from the beginning of the project. For example, avoid the default MS Word format and choose Rich Text or HTML. For existing documents, the editor will be required to manually restructure the converted document so it resembles the original.

Structural Mark-up

Although ASCII and Unicode are useful for storing information, they are only able describe each character, not the method they should be displayed or organized. Structural mark-up languages enable the designer to dictate how information will appear and establish a structure to its layout. For example, the user can define a tag to store book author information and publication date.

The use of structural mark-up can provide many organizational benefits:

Easier to maintain - allows modification to document structure without the need to directly edit the content. An entire site can be updated by changing a single CSS file.
Code reduction - by abstracting the structural element to a separate file, the structural information can be used by multiple documents, reducing the amount of code required.
Portable - The creation of well-formed documents will ensure the document will display correctly on browsers/viewers that support the markup language.
Interoperable - Structural data can be utilized to access information stored in a third party database.

The most common markup languages are SGML and XML. Based upon these languages, several schemas have been developed to organize and define data relationships. This allows certain elements to have specific attributes that define its method of use (see Digital Rights document for more information). To ensure interoperability, XML is advised due to its support for contemporary Internet standards (such as Unicode).

SGML (Standard Generalized Markup Language) - One of the earliest markup languages that enables content to be structured through an external DTD (Document Type Definition). Similar to XML, SGML is not a markup language in the true sense. Instead, it provides the foundation for specialists to create their own markup language that is customized to their area of study. SGML is efficient in design, but unreadable unless you have learnt the language. A 20-line XML document can be expressed in 5 lines using SGML.
XML - Extensible Markup Language - Promoted as the SGML successor, XML offers improved portability and simpler syntax. It offers improved support for Unicode and other internet protocols, enhancing interoperability between resources. Unlike SGML, tags can be understood by non-experts through the use of HTML-like tags and standard English.

Further Information

Dublin Core Metadata Initiative
<http://dublincore.org/>
TEI Home site
<http://www.tei-c.org/>

Briefing 25

Choosing A Suitable Digital Video Format

Background

Digital video can have a dramatic impact upon the user. It can reflect information that is difficult to describe in words alone, and can be used within an interactive learning process. This document contains guidelines to best practice when manipulating video. When considering the recording of digital video, the digitiser should be aware of the influence of file format, bit-depth, bit-rate and frame size upon the quality of the resulting video.

Composition of a Digital Video File

Digital video consists of a series of images played in rapid succession to create the illusion of movement. It is commonly accompanied by an audio track. Unlike graphics and sound that are relatively small in size, video data can be hundreds of megabytes, or even gigabytes, in size.

The visual and audio information are individually stored within a digital 'wrapper' an umbrella structure consisting of the video and audio data, as well as information to playback and resynchronise the data.

What is the Best Solution?

Digital video remains a complex area that combines the problems of audio and graphic data. When choosing to encode video the designer must consider several issues:

Are there any existing procedures to guide the encoding process?
What type of delivery method will be used to distribute the video?
What video quality is acceptable to the user?
What type of problems are likely to be encountered?

Distribution Methods

The distribution method will have a significant influence upon the file format, encoding type and compression used in the project.

Removeable media - Video distributed on CD-ROM or DVD are suited to progressive encoding methods that do not conduct extensive error checking. Although file size is not as critical in comparison to Internet streaming, it continues to have some influence.

The compression type is dependent upon the need of the user and the type of removeable media:

Editing - Video that requires editing should be stored using MJPEG spatial compression on a CD-ROM or, preferrably, a DVD-ROM.
Playback - Video intended for playback only have a more diverse range of options. If the intent is to create video for playback on DVD players, the MPEG-2 encoder and DVD-ROM is the only option. For computer playback, the designer can use a range of file formats. The suitability of each format is shown in Figure 1.

Windows user - Microsoft formats (ASF and WMV) are primarily aimed at Windows users, with limited Mac and Linux support. If providing content intended for Windows users exclusively, these formats are useable. However, they will limit the potential market.
Multiple-platforms - Alternative formats have cross platform support, providing players for Apple MacOS, Windows and Linux users. These include QuickTime, QuickTime Pro and RealMedia. The choice of these formats will be dependent upon the platform used by the organisation and licence costs.

NAME	PURPOSE OF MEDIA			Compression
NAME	Streaming	Progressive	Media	Compression
Advanced Streaming Format (ASF)	Y			Temporal
Audio Video Interleave (AVI)		Y		Temporal
MPEG-1		Y	VideoCD	Temporal
MPEG-2		Y	DVD	Temporal
QuickTime (QT)	Y	Y		Temporal
QuickTime Pro	Y	Y		Temporal
RealMedia (RM)	Y	Y		Temporal
Windows Media Video (WMV)	Y	Y		Temporal
DivX		Y	Amateur CD distribution	Temporal
MJPEG		Y		Spatial

Table 1: A comparison list of the different file formats, highlighting their intended purpose and compression method.

Video Quality

The provision of video data for an Internet-based audience places specific restrictions upon the content. Quality of the video output is dependent upon three factors:

Frame size - the height and width of the video window according to the number of pixels. Higher resolutions produce an equivalent increase in file size and require a greater amount of bandwidth to download.
Frame rate - The number of frames per second. Video encoded at a low frame rate (particularly below 15 frames per second) will appear jerky and unprofessional to the eye.
Bit Depth - determines the number of colours that will be used to view the movie. The balance between image quality and file size should be considered.

Screen Size	Pixels per frame	Bit depth (bits)	Frames per second	Bandwidth required (megabits)
640 x 480	307,200	24	30	221.184
320 x 240	76,800	16	25	30.72
320 x 240	76,800	8	15	9.216
160 x 120	19,200	8	10	1.536
160 x 120	19,200	8	5	0.768

Table 2: Indication of the influence of screen size, bit-depth and frames per second has upon required bandwidth

When creating video, the designer must balance the video quality with the facilities available to the end user. As an example, an 8-bit screen of 160 x 120 pixels, and 10-15 frames per second is used for the majority of content found on the Internet.

Problems

Video presents numerous problems for the designer caused by the complexity of formats and structure. Problems may include:

Synchronicity - Audio and video is stored as two separate data streames and may become out of sync- a character will move their mouth, but the words are delayed by two seconds. To resolve the problem, editing software must be used to resynchronise the data.
Unable to decode video/audio stream - the rapid update of video/audio codecs often results in the user encountering videos they are unable to play. Characteristics include error messages, audio playback without the video, and corrupted treacle-like video. The only solution is to find the relevant decoder required to decompress the file.
File size - File size can be a significant problem when manipulating video data. When encoding large video files, a large hard disk and 700Mb+ memory is recommended.
Editing - A particular issue of current video formats is the inability to edit video files. The majority of video formats use temporal encoding (see definition) to compress video, which cannot be edited. Only the MJPEG format allows the storage of digital video that can be edited at a later date.

Definitions

Temporal Compression - Reduces the amount of data stored over a sequence of frames. Rather than describing every pixel in each frame, temporal compression stores a key frame, followed by descriptive information on changes.

Spatial Compression - Condenses each frame independently by mapping similar pixels within a frame. For example, two shades of red will be merged. This results in a reduction in image quality, but enables the file to be edited in its original form.

Progressive Encoding - Refers to any format where the user is required to download the entire video before they are allowed to watch it.

Internet Streaming - Enables the viewer to watch sections of video without downloading the entire thing, allowing users to evaluate video content after just a few seconds. Quality is significantly lower than progressive formats due to compression being used.

Further Information

Advanced Streaming Format (ASF)
<http://www.microsoft.com/windows/windowsmedia/format/asfspec.aspx>
Apple QuickTime
<http://www.apple.com/quicktime/>
DIVX
<http://www.divx.com/>
Macromedia Flash
<http://www.macromedia.com/>
MPEG Working Group
<http://www.chiariglione.org/mpeg/index.htm>
Real Networks
<http://www.real.com/>
Microsoft Windows Media
<http://www.microsoft.com/windows/windowsmedia/default.aspx>

Briefing 26

Intellectual Property Rights

Introduction

Internet IPR is inherently complex, breaking across geographical boundaries, creating situations that are illegal in one country, yet not in another, or contradict existing laws on Intellectual Property. Copyright is a subset of IPR, which applies to all artistic works. It is automatically assigned to the creator of original material, allowing them to control all public usage (copying, adaptation, performance and broadcasting).

Ensuring that your organization complies with Intellectual Property rights requires a detailed understanding of two processes:

Managing copyright on own work.
Establishing ownership of 3rd party copyright.

Managing Copyright on Own Work

Unless indicated, copyright is assigned to the author of an original work. When producing work it is essential that it be established who will own the resulting product the individual or the institution. Objects produced at work or university may belong to the institution, depending upon the contract signed by the author. For example, the copyright for this document belongs to the AHDS, not the author. When approaching the subject, the author should consider several issues:

Can I establish that I am the author of this work? - At this point the author should provide evidence they produced the work on a specific date. One commonly used method is to post a sealed envelope to yourself or request that a solicitor store evidence within a safe. If ownership is challenged at a later date, the document can be opened in the presence of a solicitor.
Am I using unaccredited copyrighted material produced by others? - Published work that contains unaccredited material infringe upon the intellectual property of others. The results of such discovery will vary: the unaccredited author may request they are credited or a correction is published; the author may request their work is removed; or they make take legal action against the author. To avoid such issues, document all research made during investigation.

When producing work as an individual that is intended for later publication, the author should establish ownership rights to indicate how work can be used after initial publication:

Ownership after publication - Authors are encouraged to retain as many rights as possible to enable the continued use of articles in hard copy and electronic form.
Ownership in different mediums - In addition, where publication in a specific form (e.g. hard-copy) is the intention, rights to publish in other forms (e.g. electronic) should, if possible, be retained.

Copyright Clearance

Copyright is an automatically assigned right. It is therefore likely that the majority of works in a digital collection will be covered by copyright, unless explicitly stated. The copyright clearance process requires the digitiser to check the copyright status of:

Published, unpublished and Web site articles
Photographs and illustrations
Dynamic media (sound, video)
Software components
Database usage

Copyright clearance should be established at the beginning of a project. If clearance is denied after the work has been included in the collection, it will require additional effort to remove it and may result in legal action from the author.

Maintain a negotiation log - A log will document all meetings, outlining subjects of discussion, objections and agreements by either party. This will enable the organization to refer to the relevant section to establish they have gained copyright clearance and refer to a detailed description of the meetings that took place.
Identify who the author is and when it was produced - Current copyright law indicates the author's lifespan plus 70 years as the limit for copyright. Therefore it is possible that a collection may consist of works that are outside current copyright laws (such as the entire works of Shakespeare, Conan Doyle, etc.). If the author is still alive, they must be contacted to gain permission to use their work.
Establish long-term access rights - Internet content may appear in a site archive for several years after it was published. When meeting the author, establish any time factors in use of their work, indicating the length of time that work can be used. If the goal of the project is to enable long-term preservation of work, persuade the individual/s to allow the repository to host work indefinitely and translate it to modern formats when required.

In the event that an author, or authors, is unobtainable, the project is required to demonstrate they have taken steps to contact them. Digital preservation projects are particularly difficult in this aspect, separating the researcher and the copyright owner by many years. In many cases, more recently the 1986 Domesday project, it has proven difficult to trace authorship of 1000+ pieces of work to individuals. In this project, the designers created a method of establishing permission and registering objections by providing contact details that an author could use to identify their work.

Indicating IPR through Metadata

If permission has been granted to reproduce copyright work, the institution is required by law to indicate intellectual property status. Metadata is commonly used for this purpose, storing and distributing IP data for online content. Several metadata bodies provide standardized schemas for copyright information. For example, IP information for a book could be stored in the following format.

<book id="bk112"> <author>Galos, Mike</author> <title>Visual Studio 7: A Comprehensive Guide</title> <publish_date>2001-04-16</publish_date> <publisher>Addison Press</publisher> <copyright>Galos, M. 2001</copyright> </book>

Access inhibitors can also be set to identify copyright limitations and the methods necessary to overcome them. For example, limiting e-book use to IP addresses within a university environment.

Further Information

PADI - Intellectual property rights management
<http://www.nla.gov.au/padi/topics/28.html>
TASI - Looking after Copyright, IPR, Ethics and Data Protection
<http://www.tasi.ac.uk/advice/managing/copyrights.html>

Briefing 27

Implementing Quality Assurance For Digitisation

Background

Digitisation often involves working with hundreds or thousands of images, documents, audio clips or other types of source material. Ensuring these objects are consistently digitised and to a standard that ensures they are suitable for their intended purpose can be complex. Rather than being considered as an afterthought, quality assurance should be considered as an integral part of the digitisation process, and used to monitor progress against quality benchmarks.

Quality Assurance Within Your Project

The majority of formal quality assurance standards, such as ISO9001, are intended for large organisations with complex structures. A smaller project will benefit from establishing its own quality assurance procedures, using these standards as a guide. The key is to understand how work is performed and identify key points at which quality checks should be made. A simple quality assurance system can then be implemented that will enable you to monitor the quality of your work, spot problems and ensure the final digitised object is suitable for its intended use.

The ISO 9001 identifies three steps to the introduction of a quality assurance system:

Brainstorm: Identify specific processes that should be monitored for quality and develop ways of measuring the quality of these processes. You may want to think about:
- Project goals: who will use the digitised objects and what function will they serve.
- Delivery strategy: how will the digitised objects be delivered to the user? (Web site, Intranet, multimedia presentation, CD-ROM).
- Digitisation: how will data be analysed or created. To ensure consistency throughout the project, all techniques should be standardized.
Education: Ensure that everyone is familiar with the use of the system.
Improve: Monitor your quality assurance system and looks for problems that require correction or other ways it may be improved.

Key Requirements For A Quality Assurance System

First and foremost, any system for assuring quality in the digitisation process should be straightforward and not impede the actual digitisation work. Effective quality assurance can be achieved by performing four processes during the digitisation lifecycle:

The key to a successful QA process is to establish a clear and concise work timeline and, using a step-by-step process, document on how this can be achieved. This will provide a baseline against which actual work can be checked, promoting consistency, and making it easier to spot when digitisation is not going according to plan.
Compare the digital copy with the physical original to identify changes and ensure accuracy. This may include, but is not limited to, colour comparisons, accuracy of text that has been scanned through OCR software, and reproduction of significant characteristics that give meaning to the digitised data (e.g. italicised text, colours).
Perform regular audit checks to ensure consistency throughout the resource. Qualitative checks can be performed upon the original and modified digital work to ensure that any changes were intentional and processing errors have not been introduced. Subtle differences may appear in a project that takes place over a significant time period or is divided between different people. Technical checks may include spell checkers and the use of a controlled vocabulary to allow only certain specifically designed descriptions to be used. These checks will highlight potential problems at an early stage, ensuring that staff are aware of inconsistencies and can take steps to remove them. In extreme cases this may require the re-digitisation of the source data.
Finally, measures should be taken to establish some form of audit trail that tracks progress on each piece of work. Each stage of work should be 'signed off' by the person responsible, and any unusual circumstances or decisions made should be recorded.

The ISO 9001 system is particularly useful in identifying clear guidelines for quality management.

Summary

Digitisation projects should implement a simple quality assurance system. Implementing internal quality assurance checks within the workflow allows mistakes to be spotted and corrected early-on, and also provides points at which work can be reviewed, and improvements to the digitisation process implemented.

Further Information

TASI Quality Assurance,
<http://www.tasi.ac.uk/advice/creating/pdf/qassurance.pdf>

Briefing 28

Choosing An Appropriate Raster Image Format

Background

Any image that is to be archived for future use requires specific storage considerations. However, the choice of file format is diverse, offering advantages and disadvantages that make them better suited to a specific environment. When digitising images a standards-based and best practice approach should be taken, using images that are appropriate to the medium they are used within. For disseminating the work to others, a multi-tier approach is necessary, to enable the storing of a preservation and dissemination copy. This document will discuss the formats available, highlighting the different compression types, advantages and limitations of raster images.

Factors to Consider when Choosing Image Formats

When creating raster-based images for distribution file size is the primary consideration. As a general rule, the storage requirements increase in proportion to the improvement in image quality. A side effect of this process is that network delivery speed is halved, limiting the amount that can be delivered to the user. For Internet delivery it is advised that designers provide a small image (30-100k) that can be accessed quickly for mainstream users, and provide a higher quality copy as a link or available on a CD for professional usage.

When digitising the designer must consider three factors:

File format
Bit-depth
Compression type

Distribution Methods

The distribution method will have a significant influence upon the file format, encoding type and compression used in the project.

Photograph archival - For photographs, the 24-bit lossless TIFF format is recommended to allow the image to be reproduced accurately. The side-effect is that file sizes will begin at 10Mb for simpler images and increase dramatically. This is intended for storage only, not distribution.
Photograph distribution - For photographs intended for Internet distribution, the lossy JPEG format is recommended. This uses compression to reduce file size dramatically. However, image quality will decrease.
Simpler images - Simpler images, such as cartoons, buttons, maps or thumbnails, which do not require 16.8 million colours should be stored in an 8-bit format, such as GIF or PNG-8. Though 256 colours images can be stored correctly in a 24-bit format, a side effect of this process is the 8-bit file size is often equal or higher than 24-bit images.

To summarise, Table 1 shows the appropriateness of different file formats for streaming or progressive recording.

	Maximum no. of colours	Compression Type	Suited for:	Issues
BMP	16,777,216	None	General usage. Common on Windows platforms	A Windows format rather than an Internet format. Unsupported by most browsers.
GIF87a	256	Lossless	High quality images that do not require photographic details	File sizes can be quite large, even with compression
GIF89a	256	Lossless	Same as GIF87a, animation facilities are also popular	See above
JPEG	16,777,216	Lossy	High quality photographs delivered in limited bandwidth environment.	Degrades image quality and produces wave-like artefacts on image.
PNG-8	256	Lossless	Developed to replace GIF. Produces 10-30% smaller than GIF files.	File sizes can be large, even with compression.
PNG-24	16,777,216	Lossless	Preserves photograph information	File sizes larger than JPG.
TIFF	16,777,216	Lossless	Used by professionals. Redundant file information provides space for specialist uses (e.g. colorimetry calibration). Suitable for archival material.	Unsuitable for Internet-delivery

Table 1: Comparison table of image file formats

Once chosen, the file format will, to a limited extent, dictate the possible file size, bit depth and compression method available to the user.

Compression Type

Compression type is a third important consideration for image delivery. As the name suggests, compression reduces file size by using specific algorithms. Two compression types exist:

Lossless compression - Lossless compression stores colour information and the location of the pixel with which the colour is associated. The major advantage of this compression method is the image can be restored to its original state without loss of information (hence lossless). However, the compression ratios are not as high as lossy formats, typically reducing file sizes by half. File formats that use this compression type include PNG and GIF.
Lossy compression - Offers significantly improved compression ratio, at the expense of image quality. Lossy compression removes superfluous image information that cannot be regained. The degree of quality loss will depend upon the amount of compression applied to the image (e.g., JPEG uses a percentage system to determine the amount of compression). Therefore it is possible to create an image that is 1:100 the size of the original file.

As an archival format, lossy compression is unsuitable for long-term preservation. However, its small file size is used in many archives as a method of displaying lower resolution images for Internet users.

Bit-depth

Bit-depth refers to the maximum number of colours that can be displayed in an image. The number of colours available will rise when the bit depth is increased. Table 2 describes the relationship between the bit depth and number of colours.

Bit depth	1	4	8	8	16	24	32
Maximum No. of colours	2	16	256	256	65,536	16,777,216	16,777,216

Table 2: A conversion table showing the relationship between bit-depth and maximum number of colours

The reduction of bit-depth will have a significant effect upon image quality. Figure 3 demonstrates the quality loss that will be encountered when saving at a low bit-depth.


24-bit	8-bit	4-bit	1-bit
Original image	Some loss of colour around edges. Suitable for thumbnail images	Major reduction in colours. Petals consist almost solely of a single yellow colour.	Only basic layout data remains.

Figure 3: Visual comparison of different bit modes

Image Conversion Between Different Formats

Image conversion is possible using a range of applications (Photoshop, Paint Shop Pro, etc.). Lossless-to-lossless conversion (e.g. PNG-8 to GIF89a) can be performed without quality loss. However, lossless-to-lossy (PNG-8 to JPEG) or lossy-to-lossy conversion will result in a quality loss, dependent upon the degree of compression used. For dissemination of high-quality images, a lossy format is recommended to reduce file size. Smaller images can be stored in a lossless format.

Further Information

Gimp-Savvy,
<http://gimp-savvy.com/cgi-bin/img.cgi?ufwsEFfOvPeIueo740>
Raster Image Files,
<http://scv.bu.edu/Tutorials/ImageFiles/image101.html>

Briefing 29

Choosing A Vector Graphics Format For The Internet

Background

The market for vector graphics has grown considerably, in part, as a result of improved processing and rendering capabilities of modern hardware. Vector-based images consist of multiple objects (lines, ellipses, polygons, and other shapes) constructed through a sequence of commands or mathematical statements to plot lines and shapes in a two-dimensional or three-dimensional space. For Internet usage, this enables graphics to be resized to ever increasing screen resolutions without concern that an image will become 'jaggy' or unrecognisable.

File Formats

Several vector formats exist for use on the Internet. These construct information in the same way yet provide different functionality. The table below provides a breakdown of the main formats.

Name	Developer	Availability	Viewers	Uses
Scalable Vector Graphics (SVG)	W3C	Open standard	Internet browser	Internet-based graphics
Shockwave/Flash	Macromedia	Proprietary	Flash plugin for browser	Video media and multimedia presentation
Vector Markup Language (VML)	W3C	Open standard	MS Office, Internet Explorer, etc.	XML-based format.

For Internet delivery of static images, the W3 recommend SVG as a standard open format for vector diagrams. VML is also common, being the XML language exported by Microsoft products. For text-based vector files, such as SVG and VML, the user is recommended to save content in Unicode.

If the vector graphics are to be integrated into a multimedia presentation or animation, Shockwave and Flash offer significant benefits, enabling vector animation to be combined with audio.

Creating Vector Graphics

A major feature of vector graphics is its ability to construct detailed objects that can be resized without quality loss. XML (Extensible Markup Language) syntax the basis of the SVG and VML languages is understandable by non-technical users who wish to understand the object being constructed. The example below demonstrates the ability to create shapes using a few commands. The circle, shown on the left, was created by the textual data on the right.

<svg width="8in" height="8in"> <desc>This is a red circle with a black outline</desc> <g><circle style="fill: red; stroke: black" cx="200" cy="200" r="100"/> <text x="2in" y="2in">Hello World</text></g> </svg>

Figure 1: SVG graphics and associated code

XML Conventions

Although XML enables the creation of a diversity of data types it is extremely meticulous regarding syntax usage. To remain consistent throughout multiple documents and avoid future problems, several conventions are recommended:

Lower case should be used through. Capitalisation can be used for tags if it is consistent throughout the document.
Indent buried tags to reduce the time required for a user to recognise groups of information.
Avoid the use of acronyms or other tags that will be unintelligible for anyone outside the project. XML is intended as a human readable format, so obvious descriptions should be used whenever possible.
Avoid the use of white space when defining tags. If two word descriptions are necessary, join them via a hyphen (-). Otherwise concatenate the words by typing the first word in lower case, and capitalising subsequent words. For example, a creation date property would be called 'fileDateCreated'.

The use of XML enables a high level of interoperability between formats. When converting for a target audience, the designer has two options:

Vector-to-Raster conversion - Raster conversion should be used for illustrative purposes only. The removal of all coordination data eliminates the ability to edit files at a later date.
Vector-to-Vector conversion - Vector-to-vector conversion enables data to be converted into different languages. The use of XML enables the user to manually convert between two different formats (e.g. SVG to VML).

At the start of development it may help to ask your team the following questions:

What type of information will the graphics convey? (Still images, animation and sound, etc.)
What type of browser/operating system will be used to access the content? (Older browsers and non Mac/PC browsers have limited or no support for XML-based languages.)

Further Information

Official W3 SVG site,
<http://www.w3.org/Graphics/SVG/Overview.htm8>
An Introduction to VML,
<http://www.infoloom.com/gcaconfs/WEB/chicago98/wu.HTM>
Flash and Shockwave,
<http://www.macromedia.com/>

Briefing 30

Summary of the QA Focus Methodology

Background

In order to provide value for money and a return on investment from the funders there is a need for project deliverables not only to be functional in their own right but also to be widely accessible, easily repurposed and deployed in a service environment.

To achieve these aims projects should ensure that their deliverables comply with appropriate standards and best practices. Although it may be easy to require compliance, it may not always be easy to implement appropriate standards and best practices. In order to ensure that best endeavours are made it is recommended that projects should implement quality assurance (QA) procedures.

QA Focus's Methodology

Projects may be concerned that implementation of QA procedures can be time-consuming. The approach recommended by QA Focus is designed to be lightweight and to avoid unnecessary bureaucracy, while still providing a mechanism for implementation of best practices.

The QA Focus methodology is based on the following:

Documented policies on standards and best practices: if the standards and best practices are not documented it will be difficult to ensure best practices are implemented, especially in light of staff turnover, changing environments, etc.
Documentation of the architecture used: to ensure that the architecture used to implement the system is capable of complying with the standards.
Documented exceptions: There may be occasions when deviations from standards may be allowed. Such deviations should be documented and responsibility for this agreed.
Systematic checking: It is necessary to document systematic procedures for ensuring compliance with standards.
Audit trails: It can be helpful to provide audit trails which can help spotting trends.

It is felt that use of this methodology should not only be beneficial to the projects themselves, but also help to minimise problems when project deliverables are re-used.

Example: QA For Web Sites

As an example of implementation of this approach the QA policy for standards for the QA Focus Web site is given below.

Area: Web site standards

Standards: The Web site will be based on the XHTML 1.0 and CSS 2.0 standards.

Architecture: The Web site will make use of PHP. XHTML 1.0 templates will be provided for use by authors, who will use simple HTML tools such as HTML-kit. Web site will provide access to an MS Access database. This will also comply with XHTML 1.0 and CSS 2.0 standards. The Web site will also host MS Word and MS PowerPoint files. These documents will also be available in HTML.

Exceptions: Resources converted from proprietary formats (such as MS Word and PowerPoint) need not necessarily comply with XHTML and CSS standards if doing so would be too time-consuming.

Responsibilities: The QA Focus project manager is responsible for changing this policy and addressing serious deviations from the policy.

Checking: Resources should be validated when they are created or updated usually using the ,validate tool. When several resources are updated the ,rvalidate tool should be used.

Audit trail: A full audit should be carried out at least quarterly. The findings should be published on the QA Focus Web site, and deviations from the policy documented.

A second example describes the QA policy for link checking of the QA Focus Web site.

Area: Web site: link checking

Best Practice: There should be no internal broken links and links to external resources should work when a page is created. We should seek to fix broken links to external resources. Exceptions: There may be broken links in historical documents or surveys. In addition, if remote Web sites are updated it may be too time-consuming to update the links.

Change Control: The QA Focus project manager is responsible for changing this policy and addressing serious deviations from the policy.

Checking: When resources are created or updated the resource should be link-checked, usually using the ,checklink tool. When several resources are updated the ,rchecklink tool should be used.

Audit trail: A full audit should be carried out at least quarterly. Initially two tools should be used to spot deficiencies in the link-checking software. The findings should be published on the QA Focus Web site, and deviations from the policy documented.

These two examples illustrate that developing QA policies need not be time-consuming. In addition implementation of these policies need not be time-consuming and can improve the quality of the Web site.

Briefing 31

Matrix for Selection of Standards

Background

JISC and the JISC advisory services provide advice on a wide range of standards and best practices which seek to ensure that project deliverables are platform and application-independent, accessibility, interoperable and are suitable for re-purposing.

The standards and best practices which JISC advisory service recommend have been developed with these aims in mind.

Challenges

Although use of recommended standards and best practices is encouraged, there may be occasions when this is not possible:

Building on existing systems: Projects may be based on development of existing systems, which do not use appropriate standards.

Standards immature: Some standards may be new, and there is a lack of experience in their use. Although some organisations may relish the opportunity to be early adopters of new standards, others may prefer to wait until the benefits of the new standards have been established and many teething problems resolved.

Functionality of the standard: Does the new standard provide functionality which is required for the service to be provided?

Limited support for standards: There may be limited support for the new standards. For example, there may be a limited range of tools for creating resources based on the new standards or for viewing the resources.

Limited expertise: There may be limited expertise for developing services based on new standards or there may be limited assistance to call on in case of problems.

Limited timescales: There may be insufficient time to gain an understanding of new standards and gain experience in use of tools.

In many cases standards will be mature and expertise readily available. The selection of the standards to be deployed can be easily made. What should be done when this isn't the case?

A Matrix Approach

In light of the challenges which may be faced when wishing to make use of recommended standards and best practices it is suggested that projects use a matrix approach to resolving these issues.

Area	Your Comments
Standard
How mature is the standard?
Does the standard provide required functionality?
Implementation
Are authoring tools which support the standard readily available?
Are viewing tools which support the standard readily available?
Organisation
Is the organisation culture suitable for deployment of new standards?
Are there strategies in place to continue development in case of staffing changes?

Individual projects will need to formulate their own matrix which covers issues relevant to their particular project, funding, organisation, etc.

Implementation

This matrix approach is not intended to provide a definitive solution to the selection of standards. Rather it is intended as a tool which can assist projects when they go through the process of choosing the standards they intend to use. It is envisaged that projects will document their comments on issues such as those listed above. These comments should inform a discussion within the project team, and possibly with the project's advisory or steering group. Once a decision has been made the rationale for the decision should be documented. This will help to ensure that the reasonings are still available if project teams members leave.

For examples of how projects have addressed the selection of standards can see:

ESDS Web Standards Policy case study:
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-16/>
Standards for e-learning: The e-MapScholar Experience case study:
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-05/>

Briefing 32

Changing A Project's Web Site Address

Background

A project's Web site address will provide, for many, the best means for finding out about the project, reading abouts its activities and using the facilities which the projects provides. It is therefore highly desirable that a project's Web site address remains stable. However there may be occasions when it is felt necessary to change a project's Web site address. This document provides advice on best practices which should help to minimise problems.

Best Practices For A Project Web Site Address

Ideally the entry point for project's Web site will be short and memorable. However this ideal is not always achievable. In practice we are likely to find that institutional or UKERNA guidelines on Web addresses preclude this option.

The entry point should be a simple domain name such as <http://www.project.ac.uk/> or a directory such as <http://www.university.ac.uk/depts/library/project/>. Avoid use of a file name such as <http://www.university.ac.uk/depts/library/project/index.html> as this makes the entry point longer and less memorable and can cause problems if the underlying technologies change.

Reasons For Changing

If the address of a project Web site is determined by institutional policies, it is still desirable to avoid changing the address unnecessarily. However there may be reasons why a change to the address is needed.

Implementing Best Practices:: There may be an opportunity to implement best practices for the address which could not be done when the Web site was launched.
Changes In Organisation's Name:: The name of an institution may change e.g. the institution is taken over or merges with another institution.
Changes In Organisational Structure:: The organisational structure may change e.g. departments may merge or change their name.
Changes In Project Partners:: The project partner hosting the Web site may leave the project.
Project Becomes Embedded In Organisation:: The project may become embedded within the host institution and this requires a change in the address.
Project Is Developed With Other Funding Streams:: The project may continue to be developed through additional funding streams and this requires a change in the address.
Project Becomes Obsolete:: The project may be felt to be obsolete.
Technical Changes:: Technological changes may necessitate a change in the address.
Changes In Policies:: Institutional policy changes may necessitate a change in the address.
Changes In Web Site Function:: The project Web site may change its function or additional Web sites may be needed. For example, the main Web site may initially be about the project and a new Web site is to be launched which provides access to the project deliverables.

Advice On Changing Addresses

Projects should consider potential changes to the Web site address before the initial launch and seek to avoid future changes or to minimise their effect. However if this is not possible the following advice is provided:

Monitor Links:: Prior to planning a change use the www.linkpopularity.com (or equivalent) service to estimate the numbers of links to you Web sites.
Monitor Search Engines:: Examine the numbers of resources from your Web site which are indexed by popular search engines.

This information will give you an indication of the impact a change to your Web site address may have. If you intend to change the address you should:

Consider Technical Issues:: How will the new Web site be managed? How will resources be migrated?
Consider Migration:: How will the change of address be implemented? How will links to the old address be dealt with? How will you inform users of the change?
Inform Stakeholders:: Seek to inform relevant stakeholders, such as funding bodies, partners and others affected by the change.

Checking Processes

It is advisable to check links prior to the change and afterwards, to ensure that no links are broken during the change. You should seek to ensure that links on your Web site go to the new address.

Briefing 33

Implementing A Technical Review

Background

When projects submit an initial proposal the project partners will probably have an idea as to the approaches which will be taken in order to provide the project deliverables. During the project's life it may be desirable to review the approaches which were initially envisaged and, if necessary to make changes. This document describes possible approaches to periodic reviews.

Reasons For A Review

There are a number of reasons why a technical review may be necessary:

Technological issues:: There may be changes with underlying technologies. For example the software which was initially envisaged being used may be found to be inappropriate or alternative software may be felt to provide advantages.
Staffing issues:: There may be staffing changes. For example key technical staff may leave and are difficult to replace.
Organisational issues:: There may be changes within the organisation which is providing the project.
Changing requirements:: There may be changes in the requirements for the project, following, say, a user needs requirements survey.
Ensure that deliverables comply with standards and best practices:: It may be necessary to ensure that the project has implemented quality assurance processes to ensure that project deliverables comply with appropriate standards and best practices.

A project review may, of course, also address non-technical issues.

Approaches To A Review

Projects may find it useful to allocate some time during the project life span to a technical review of the project.

Review by development team:: The project development team may wish to reflect on the approaches they have taken. They may be encouraged to provide a report to the project manager.
Review by project partners:: The project partners may be involved in the review process.
Review involving third parties:: The project team may wish to invite external bodies to participate in the review.
Comparison with one's peers:: You may chose to compare your deliverables with your peers, such as similar projects. This approach is particular suited for reviewing publicly available deliverables such as Web sites.

When organising a project review you should take care to ensure that the review is handled in a constructive manner.

Outputs From A Review

It is important to note that any improvements or changes which may have been identified during a view need not necessarily be implemented. There may be a temptation to implement best practices when good practices are sufficient, and that implementation of best practices may take longer to implement than envisaged. The outputs from a review may be:

Better understanding:: The review may have an educational role and allow project partners to gain a better understanding of issues.
Enhanced workflow practices:: Rather than implementing technical changes the review may identify the need for improvements to workflow practices.
Documenting lessons:: The review may provide an opportunity to document limitations of the existing approach. The documentation could be produced for use by project partners, or could be made more widely available (e.g. as a QA Focus Case Study).
Deployed in other areas:: The recommendations may be implemented in other areas which the project partners are involved in.
Implemented within project:: The recommendations may be implemented within the project itself. If this is the case it is important that the change is driven by project needs and not purely on technical grounds. The project manager should normally approve significant changes and other stakeholders may need to be informed.

Conclusions

It can be useful to allocate time for a mid-project review to ensure that project work is proceeding satisfactorily. This can also provide an opportunity to reassess the project's technical architecture.

Briefing 34

Use Of Cascading Style Sheets (CSS)

Background

This document reviews the importance of Cascading Style Sheets (CSS) and highlights the importance of ensuring that use of CSS complies with CSS standards.

Why Use CSS?

Use of CSS is the recommended way of defining how HTML pages are displayed. You should use HTML to define the basic structure (using elements such as <h1>, <p>, <li>, etc.) and CSS to define how these elements should appear (e.g. heading should be in bold Arial font, paragraphs should be indented, etc.).

This approach has several advantages:

Maintenance:: It is much easier to maintain the appearance of a Web site. If you use a single CSS file updating this file allows the Web site look-and-feel to be altered easily; in contrast use of HTML formatting elements would require every file to be updated to change the appearance.
Functionality:: CSS provides rich functionality, including defining the appearance of HTML pages when they are printed.
Accessibility:: Use of CSS provides much greater accessibility, allowing users with special needs to alter the appearance of a Web page to suit their requirements. CSS also allows Web pages to be more easily rendered by special devices, such as speaking browsers, PDAs, etc.

There are disadvantages to use of CSS. In particular legacy browsers such as Netscape 4 have difficulty in processing CSS. However, since such legacy browsers are now in a minority the biggest barrier to deployment of CSS is probably a lack of understand or inertia.

Approaches To Use Of CSS

There are a number of ways in which CSS can be deployed:

External CSS Files:: The best way to use CSS is to store the CSS data in an external file and link to this file using the <link> HTML element. This approach allows the CSS definitions to be used by every page on your Web site.
Internal CSS:: You can store CSS within a HTML by including it using the <style> element within the <head> section at the top of your HTML file. However this approach means the style definitions cannot be applied to other files. This approach is not normally recommended.
Inline CSS:: You can embed your CSS inline with HTML elements: for example <p style="font-color: red" > uses CSS to specify that text in the current paragraph is red. However this approach means that the style definitions cannot be applied to other paragraphs. This approach is discouraged.

Ensure That You Validate Your CSS

As with HTML, it is important that you validate your CSS to ensure that it complies with appropriate CSS standards. There are a number of approaches you can take:

Within your HTML editor:: Your HTML editing tool may allow you to create CSS. If it does, it may also have a CSS validator.
Within a dedicated CSS editor:: If you use a dedicated CSS editor, the tool may have a validator.
Using an external CSS validator:: You may wish to use an external CSS validator. This could be a tool installed locally or a Web-based tool such as those available at W3C [1] and the Web Design Group [2] .

Note that if you use external CSS files, you should also ensure that you check that the link to the file works.

Systematic CSS Validation

You should ensure that you have systematic procedures for validating your CSS. If, for example, you make use of internal or inline CSS you will need to validate the CSS whenever you create or edit an HTML file. If, however, you use a small number of external CSS files and never embed CSS in individual HTML files you need only validate your CSS when you create or update one of the external CSS files.

References

Validator CSS, W3C, <http://jigsaw.w3.org/css-validator/>
CSSCheck, WDG, <http://www.htmlhelp.com/tools/csscheck/>

Briefing 35

Deployment Of XHTML 1.0

Background

This document describes the current recommended versions of HTML. The advantages of XHTML 1.0 are given together with potential challenges in deploying XHTML 1.0 so that it follows best practices.

Versions Of HTML

HTML has evolved since it was first created, responding to the need to provide richer functionality, maximise its accessibility and allow it to integrate with other architectural developments. The final version of the HTML language is HTML 4.0. This version is mature and widely supported, with a wide range of authoring tools available and support provided in Web browsers.

However HTML has limitation: HTML resources cannot easily be reused; it is difficult to add new features to the HTML language; it is difficult to integrate HTML pages with other markup languages (e.g. MathML for including mathematical expressions, SVG for including scalable vector graphics, etc).

XHTML 1.0

XHTML was developed address these concerns. XHTML is the HTML language described in the XML language. This means that the many advantages of XML (ability to reuse resources using the XSLT language; ability to integrate other XML application, etc.) are available for authors creating conventional Web pages.

In order to support migration from HTML to a richer XHTML world, XHTML has been designed so that it is backwards compatible with the current Web browsers.

Since XHTML 1.0 provides many advantages and can be accessed by current browsers it would seem that use of XHTML 1.0 is recommended. However there are a number of issues which need to be addressed before deploying XHTML 1.0 for your Web site.

Deployment Issues

Compliance

Although HTML pages should comply with the HTML standard, browsers are expected to be tolerant of errors. Unfortunately this has led to an environment in which many HTML resources are non-compliant. This environment makes it difficult to repurpose HTML by other applications. It also makes rendering of HTML resources more time-consuming than it should, since browsers have to identify errors and seek to render them in a sensible way.

The XML language, by contrast, mandates that XML resources comply with the standard. This has several advantages: XML resources will be clean enabling the resources to be more easily reused by other applications; applications will be able to process the resources more rapidly; etc. Since XHTML is an XML application an XHTML resource must be compliant in order for it to be processed as XML.

XHTML 1.0 And MIME Types

Web browsers identify file formats by checking the resource's MIME type. HTML resources use a text/html MIME type. XHTML resources may use this MIME type; however the resources will not be processed as XML, therefore losing the benefits provided by XML. Use of the application/xhtml+xml MIME type allows resources to be processed as XML. This MIME type is therefore recommended if you wish to exploit XML's potential.

Implementation Issues

You should be aware of implementation issues before deploying XHTML 1.0:

Guaranteeing Compliance:: You must ensure that your resources are compliant. Unlike HTML, non-compliant resources should not be processed by XML tools. This may be difficult to achieve if you do not have appropriate tools and processed.
Browser Rendering:: Although use of the application/xhtml+xml MIME type is recommended to maximise the potential of a more structured XML world, this environment is not tolerant of errors. Use of the text/html MIME type will allow non-compliant XHTML resources to be viewed, but exploiting this feature simply perpetuates the problems of a HTML-based Web.
Resource Management:: It is very import that you give thought to the management of a Web site which uses XHTML. You will need to ensure that you have publishing processed which avoids resources becoming non-compliant. You will also need to think about the approaches of allocating MIME types.

Conclusions

Use of XHTML 1.0 and the application/xhtml+xml MIME type provides a richer, more reusable Web environment. However there are challenges to consider in deploying this approach. Before deploying XHTML you must ensure that you have addressed the implementation difficulties.

Briefing 36

IMS Question And Test Interoperability

Introduction

This document describes an international specification for computer based questions and tests, suitable for those wishing to use computer based assessments in courses.

What Is IMS Question And Test Interoperability?

Computers are increasingly being used to help assess learning, knowledge and understanding. IMS Question and Test Interoperability (QTI) [1] is an international specification for a standard way of sharing such test and assessment data. It is one of a number of such specifications being produced by the IMS Global Learning Consortium to support the sharing of computer based educational material such as assessments, learning objects and learner information.

This new specification is now being implemented within a number of assessment systems and Virtual Learning Environments. Some systems store the data in their own formats but support the export and import of question data in IMS QTI format. Other systems operate directly on IMS QTI format data. Having alternative systems conforming to this standard format means that questions can be shared between institutions that do not use the same testing systems. It also means that banks of questions can be created that will be usable by many departments.

Technical Details

The QTI specification uses XML (Extensible Markup Language) to record the information about assessments. XML is a powerful and flexible markup language that uses 'tags' rather like HTML. The IMS QTI specification was designed to be pedagogy and subject neutral. It supports five different type of user response (item selection, text input, numeric input, xy-position selection and group selection) that can be combined with several different input techniques (radio button, check box, text entry box, mouse xy position dragging or clicking, slider bar and others). It is able to display formatted text, pictures, sound files, video clips and even interactive applications or applets. How any particular question appears on the screen and what the user has to do to answer it may vary between different systems, but the question itself, the knowledge or understanding required to answer it, the marks awarded and the feedback provided should all remain the same.

The specification is relatively new. Version 1.2 was made public in 2002, and a minor upgrade to Version 1.2.1 was made early in 2003, that corrected some errors and ambiguities. The specification is complex comprising nine separate documents. Various commercial assessment systems (e.g. Questionmark [2], MedWeb, Canvas Learning [3]) have implemented some aspect of IMS QTI compatibility for their assessments. A number of academic systems are also being developed to comply with the specification. These include the TOIA project [4] which will have editing and course management facilities, the SToMP system [5], which was used with students for the first time in 2002, and a Scottish Enterprise system called Oghma which is currently being developed.

Discipline Specific Features

A disadvantage of such a standard system is that particular features required by some disciplines are likely to be missing. For example, engineering and the sciences need to be able to deal with algebraic expressions, the handling of both accuracy and precision of numbers, the use of alternative number bases, the provision of randomised values, and graphical input. Language tests need better textual support such as the presetting of text entry boxes with specific text and more sophisticated text based conditions. Some of these features are being addressed by groups such as the CETIS assessment SIG [6].

What This Means To You

If you are starting or planning to start using computer based tests, then you need to be aware of the advantages of using a standard-compliant system. It is clearly a good idea to choose a system that will allow you to move your assessments to another system at a later time with the minimum of effort or to be able to import assessments authored elsewhere.

A consideration to bear in mind, however, is that at this early stage in the life of the specification there will be a range of legacy differences between various implementations. It will also remain possible with some 'compliant' systems to create non-standard question formats if implementation specific extensions are used. The degree of conformity of any one system is a parameter that is difficult to assess at any time. Tools to assist with this are now beginning to be discussed, but it will be some time before objective measures of conformance will be available. In view of this it is a good idea to keep in touch with those interested in the development of the specification, and the best way within UK HE is probably via the CETIS Assessment Special Interest Group Web site [7].

It is important that the specification should have subject specific input from academics. The needs of different disciplines are not always well known and the lack of specific features can make adoption difficult. Look at the examples on the CETIS Web site and give feedback on areas where your needs are not being met.

References And Further Information

QTI Specification,
<http://www.imsglobal.org/>
Questionmark,
<http://www.questionmark.com/>
Canvas Learning Author and Player,
<http://www.canvaslearning.com/>
TOIA,
<http://www.toia.ac.uk>
SToMP,
<http://www.stomp.ac.uk/>
CETIS Assessment Special Interest Group,
<http://www.cetis.ac.uk/assessment>
CETIS,
<http://www.cetis.ac.uk/>

The following URLs may also be of interest.

SCAAN (Scottish Computer Assisted Assessment Network),
<http://www.scaan.ac.uk/ims.html>
Scottish Centre for Research into On-Line Learning and Assessment,
<http://www.scrolla.ac.uk/>
QTI Ready (XDL Soft),
<http://www.xdlsoft.com/products.html>
Assesst Designer,
<http://www.xdlsoft.com/ad/ad.html>
Learn eXact (Giunti Labs),
<http://www.learnexact.com/>
Act Elearning,
<http://www.actelearning.com/>
Respondus,
<http://www.respondus.com/>
Intrallect,
<http://www.intrallect.com/>
WebMCQ,
<http://www.webmcq.com/>
MAP: Monitoring Assessment and Provision,
<http://alto.aber.ac.uk/caa/>
Riva e.test ,
<http://www.riva.com/etest/etest.asp>

Acknowledgments

This document was originally written by Niall Sclater and Rowin Cross of CETIS and adapted by Dick Bacon, Department of Physics, University of Surrey, consultant to the LTSN Physical Sciences Centre.

The original briefing paper (PDF format) is available on the CETIS Web site. The version available on this Web site was originally published in the LTSN Physical Science News (Centre News issue 10).

Briefing 37

Top 10 Quality Assurance Tips

The Top 10 Tips

1 Document Your Policies: You should ensure that you document policies for your project - remember that it can be difficult to implement quality if there isn't a shared understanding across your project of what you are seeking to achieve. For example, see the QA Focus policies on Web standards and link checking [1] [2].
2 Ensure Your Technical Infrastructure Is Capable Of Implementing Your Policies: You should ensure that your technical infrastucture which is capable of implementing your policies. For example, if you wish to make use of XHTML on your Web site you are unlikely to be able to achieve this if you are using Microsoft Word as your authoring tool.
3 Ensure That You Have The Resources Necessary To Implement Your Policies: You should ensure that you have the resources needed to implement your policies. This can include technical expertise, investment in software and hardware, investment in training and staff development, etc.
4 Implement Systematic Checking Procedures To Ensure Your Policies Are Being Implemented: Without systematic checking procedures there is a danger that your policies are not implemented in practice. For example, see the QA Focus checking procedures for Web standards and link [3] [4].
5 Keep Audit Trails: You should seek to provide audit trails which provide a record of results of your checking procedures. This can help to spot trends which may indicate failures in your procedures (for example, a sudden growth in the numbers of non-compliant HTML resources may be due to deployment of a new authoring tool, or a lack of adequate training for new members of the project team).
6 Learn From Others: Rather than seeking to develop quality assurance policies and procedures from scratch you should seek to learn from others. You may find that the QA Focus case studies [5] provide useful advice which you can learn from.
7 Share Your Experiences: If you are in the position of having deployed effective quality assurance procedures it can be helpful for the wider community if you share your approaches. For example, consider writing a QA Focus case study [6].
8 Seek 'Fitness For Purpose' - Not Perfection: You should seek to implement 'fitness for purpose' which is based on the levels of funding available and the expertise and resources you have available. Note that perfection is not necessarily a useful goal to aim for - indeed, there is a danger that 'seeking the best may drive out the good'.
9 Remember That QA Is For You To Implement: Although the QA Focus Web site provides a wide range of resources which can help you to ensure that your project deliverables are interoperable and widely accessible you should remember that you will need to implement quality assurance within your project.
10 Seek To Deploy QA Procedures More Extensively: Rather than seeking to implement quality assurance across your project, it can be beneficial if quality assurance is implemented at a higher level, such as within you department or organisation. If you have an interest in more widespread deployment of quality assurance, you should read about the ISO 9000 QA standards [7].

References

Policy on Web Standards, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/qa/policies/web/>
Policy on Linking, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/qa/policies/links/>
Procedures for Web Standards, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/qa/procedures/web/>
Procedures for Linking, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/qa/procedures/links/>
Case Studies, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/>
Contributing To Case Studies, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/#contributing>
Selection and Use of the ISO 9000:2000 family of standards, ISO,
<http://www.iso.org/iso/en/iso9000-14000/understand/selection_use/selection_use.html>

Briefing 38

From Project To Production Service

Background

Projects deliverables are normally expected to be deployed in a service environment. The deliverables could be passed on to an existing JISC service provider. In some cases, however, a project may evolve into a service. This document outlines some of the issues that need to be considered to facilitate such a transition.

If evolving to a service is not relevant to your project, the issues services need to address when deploying your project deliverables may still be of interest.

Hosting

Hosting of your project deliverables is one of the first issues to be considered. A prototype service may be developed on in-house equipment and in an environment which may not be appropriate for long term production. Issues to consider include:

Host in-house or on a national data centre or other host? A factor to consider is the speed of your connection to JANET. If your service is likely to be delivering large volumes of data (e.g. large graphics files, moving images etc), it may be important to locate the service at a site directly connected to JANET.
If you are not going to host the service yourself, is the software environment of the data centre/other host compatible with the development environment? This includes versions of operating systems and other infrastructure products.

Data Feeds

Your service may require regular updates of the raw data which the service is delivering to users. Issues to consider when moving into a production environment include:

How frequently will the database be updated. Data which is updated on a regular basis (e.g. weekly, monthly) should be updated according to a published schedule if at all possible. Users then know when to revisit the site for new information.
H" Will the data be 'pushed' to the service site, or 'pulled' from the source? Each has its advantages and disadvantages. If the data is pushed to the service, a service operator has to check that it has arrived and loaded correctly. Setting up processes to automatically pull the data from the source and load it (for example overnight) is more satisfactory, but an alerting process needs to be built in for instances when the data load fails.

Gateway Links

The JISC supports a range of subject-specific gateway services. Decide which gateway, if any, your service fits into. The subject matter of your service may span more than one area and therefore need to be incorporated in more than one gateway.

Review the RDN [1] and the services within it and see where a description and link to your service may fit. Arrange for your service to be made visible. The more links that are established to your service, the more likely it is to become visible to search engines such as Google and the more successful it is likely to be in terms of awareness and take-up.

Legal Issues

When an experimental or development system is turned into a production service, there are a number of copyright, licensing and other legal issues that need to be carefully considered.

Does your service contain any material that is subject to copyright or IPR legislation? This could include such things as images, artwork, extracts from publications, sound or movie clips and so on. If it does, you will need to get permission before you can 'publish' your site.

Have you considered how accessible your service is to those with special needs or disabilities? There are now legal obligations that need to be taken into account before releasing a new system.

The JISC TechDis service [2] provides information on how to make your Web site conform. Also consult the appropriate QA Focus document on Accessibility Testing [3] and the JISC Legal Information Service [4] for a range of advice on issues such as Data Protection, Freedom of Information, Disability and the Law, Intellectual Property and much else.

Managing Expectations

As soon as you have a reliable release date, publicise the fact on relevant JISCMail and other lists. Keep people informed as to the progress of the new service as launch day approaches.

As soon as delays appear inevitable, let people know, even if a revised date hasn't been fixed. This will help front-line staff, who will have to support your service, decide on their own local information strategy.

Launching the Service

The move of an experimental or development service into a full production service provides a 'hook' for raising its profile. Things to consider include:

Think about marking the start of a new service with a launch event. Try to get a relevant high profile personality, chair of a relevant JISC committee or some other significant person to make a presentation. Make the event worth a day out of the office by including hard information and if possible include live demonstrations. The event should be free, but take bookings so you can estimate numbers for accommodation and catering purposes. Try to get a report into relevant parts of the press by inviting journalists. Remember that academics read newspapers too
Alternatively launch the service within the context of another event such as a national conference or exhibition.
Try to time the launch so it doesn't clash with other events, national holidays or the start of the academic year.

Support and Publicity

Consider the kind of support and publicity materials that are appropriate for your service. Examples include:

Promotional flyers
Posters
User guides
PowerPoint files
Examples of using the service to answer specific questions (self-help guides)
More detailed reference material
Other promotional items (pens, mugs, etc.)

Think about the target audience for the material. You may want to produce different versions for users from different backgrounds and experience. Consider which items may be worth printing (as opposed to being made available on the Web). For example posters and flyers are useful for distribution at events such as conferences and exhibitions. Review what other JISC services have done and discuss their experiences with them.

You should also seek advice from the JISC's Communications and Marketing Team [5] who maintain a register of key events and are able to help with such things as preparing and issuing press releases.

Service Development

Once your service is in production there will be a requirement to improve or update the service and to fix problems. User feedback on suggested service improvements or errors should be gathered through a contact publicised on the service's Web site.

Presentations and demonstrations provide forums for discussion and constructive criticism. Find out if there is an existing user group who will extend their remit to cover your service.

When changes are identified and implemented, ensure that the change is publicised well in advance. Unless the change is an important bug fix, try to make the changes infrequently, preferably to coincide with term-breaks.

Service Monitoring

Check if your service will come under the remit of the JISC's Monitoring Unit [6]. If it does, you will need to agree a service level definition with them. Typically you will also need to:

Provide quarterly reports in an agreed format
Measure usage levels
Record availability and downtime (scheduled and unscheduled)
Measure turnaways
Measure helpdesk performance
Record how quickly updates are added to the service

References

Resource Discovery Network (RDN),
<http://www.rdn.ac.uk/>
TechDis,
<http://www.techdis.ac.uk/>
Accessibility Testing, QA Focus, briefing paper no. 2,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-02/>
JISC Legal Information Service,
<http://www.jisclegal.ac.uk/>
Communications and Marketing Team, JISC,
<http://www.jisc.ac.uk/index.cfm?name=people_masterlist#outreach>
JISC Monitoring Unit,
<http://www.mu.jisc.ac.uk/>

Briefing 39

Planning An End User Service

Background

For some projects, it will be clear from the start that the intention is to transition the project into an end-user service, either hosted by the project itself, or by another host such as a national data centre.

Other projects may have the potential for development into a production service, but without this being a declared aim of the project.

In both cases, it is sensible to think carefully about how the system might fit into a service environment at the planning and design stage, to avoid costly re-engineering and retro-fitting of features later on.

Software Environment

The software regime that may seem most appropriate for an experimental development environment may not be the best choice when running a large-scale end-user service. Issues to think about include:

Software versions; does the software development environment you are intending to use match the versions that are in general use on service delivery platforms?
Do you have a strong reason for using commercial products (e.g. database management systems)? If so, check to see if there are likely to be high costs when employed in an environment with large numbers of concurrent users. Try to select industry standard public domain products for preference.
Are the systems you are intending to use supported by the major national service providers? This will be an important issue if they are expected to adopt your project and host it for you.
Is your project likely to have to integrate with a family of similar products or services? If so, try to ensure that they have compatible operating environments.
When in doubt, consult with others. Make sure that you do this before development work starts to avoid costly reversals of policy at a later stage.

Consultations

A key factor in the success of any project is careful preparation and planning. If you intend your project to develop into an end-user production service, it is worth spending time and effort in the early stages of the project testing your ideas and designs. It is easier to rewrite a specification document than to re-engineer a software product.

Depending on the nature of the project, some of the following may be worth considering:

Surveys: Carrying out a survey of needs and expectations from typical potential users/customers.
Brainstorming sessions: Getting together a group of people interested in the outcome (representing your customers) and carrying out exercises to identify the features and facilities that are most important to them.
Consult other JISC service creators: Their experience may help you avoid pitfalls.
Wireframes: Mocking up a series of screens with active links so that the general functionality of the service can be demonstrated is a powerful way of testing the structure of the service before committing to a full implementation.
Prioritising: Not all functions and features will be equally important. Rank them so that you ensure that the most important ones are implemented first. You may decide to relegate some of them to a 'further development' phase.
Document your design: Ensure that all parties concerned agree to a written specification of what you are aiming to create.

Authentication and Authorisation

Controlling access to your service may not be an issue when it is in an experimental or development phase, but will become an important consideration if it is released into service.

Some issues to review include:

Is your service likely to be free or charged for? If it is likely to be free, is this open-ended, or does it depend on central funding? If the latter, what will happen when the funding stops? Will you then need to introduce a subscription fee and, therefore, access controls?
Even if you expect your service to be free, there may be restrictions on who can use it. For example, the funding of the project may require you to limit access to UK only or higher and further education only.
Bear in mind that, even if the service is free to UK users, there may be an option for charging for access by non-UK education sector users.
If you decide you do need to build in access control mechanisms, are you going to use Athens?Athens [1] now supports single sign on (AthensSSO), meaning that users can access several different compliant services with only one password challenge.
This is a developing area and, depending on the timescale of your project, you may need to keep a watching brief on issues such as the potential for using digital certificates, and Internet 2 related activities such as Shibboleth [2].
Are any of the remote resources your service depends on IP authenticated? This can cause confusion for users, especially if they are accessing the service from off-campus.
Even if you don't expect your project to become a service with controlled access, it would be wise to bear in mind that this could happen in the future, and to structure your service so that an access control mechanism can be easily fitted in at a later stage.

Legal Issues

When your project reaches the stage of being turned into an production service with large numbers of users, consideration will need to be given to issues which are less important during the development phase.

It is helpful to be aware of these at an early stage in the planning and design of the project to avoid difficult problems later. Some things you should think about include:

Copyright material: are you thinking of incorporating items such as images, artwork, extracts from publications, sound or movie clips. If so, are there going to be copyright or IPR implications of making this material more generally available? This can apply not only to Web site material but also printed promotional material. If you have a choice, try to select clipart etc that is in the public domain.
Accessibility: there is government legislation that needs to be taken into account when designing a new system. You need to think carefully about how your system might be used by those with special needs or disabilities. The JISC TechDis service [3] provides information on how to make your Web site conform.
Consult the appropriate QA Focus document on Accessibility Testing [4] and the JISC Legal Information Service [5] for a range of advice on issues such as Data Protection, Freedom of Information, Disability and the Law, Intellectual Property and much else.

Planning for Maintenance

It is to be expected that a Web-based user service will require maintenance, revision and updating during its lifetime. There may be requests for new features, or for modifications to the way existing facilities work.

Bear in mind that the people doing this work may not be the original project team that created the service. It is important that the end-products are designed and structured in such a way as to allow parts of the system to be modified and updated by others who are less familiar with the system without unexpected consequences.

Therefore, when starting to develop a new system:

Ensure that you structure the system in a modular fashion.
Document as the work proceeds, not after the project is complete.
Note any software environment dependencies or support products including versions/releases.

References

Athens access management services,
<http://www.athens.ac.uk/>
Internet 2,
<http://middleware.internet2.edu/>
TechDis,
<http://www.techdis.ac.uk/>
Accessibility Testing, QA Focus briefing paper no. 2, UKOLN
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-02/>
JISC Legal Information Service,
<http://www.jisclegal.ac.uk/>

Briefing 40

Top 10 Tips For Service Deployment

About This Document

This document provides top tips which can help to ensure that project deliverables can be deployed into a service environment with the minimum of difficulties.

The Top 10 Tips

1 Document The Technical Architecture For Your Project: Provide a description of the technical architecture of aspects of your project which are intended for deployment into service. The description will be helpful for the service provider. In addition it can help the funders in gaining an appreciation of the technical approaches being taken by projects across a digital library programme as well as being of value to your project team (especially if staff leave).
2 Document Any Deviations From Use Of Recommended Standards Or Best Practices: You should ensure that your technical infrastructure which is capable of implementing your policies. For example, if you wish to make use of XHTML on your Web site you are unlikely to be able to achieve this if you are using Microsoft Word as your authoring tool.
3 Document Use Of Unusual Or Innovative Aspects Of Your Project: If you are making use of any new standards or unusual technologies you should document this, and explain the reasons for your choice. This could include use of emerging standards (e.g. SVG, SMIL), use of Content Management Systems, etc.
4 Have An Idea Of Where You Envisage Your Project Deliverables Being Deployed: Give some thought to where your deliverables will be deployed. This could be by a JISC Service, within your institution, within other institutions or elsewhere.
5 Seek To Make The Service Provider Aware Of Your Project: You should seek to make contact with the service provider for your deliverables. You should seek to gain an understanding of their requirements (e.g. see [1] [2]). In addition it can help if the service provider is aware of your work and any special requirements associated with your project.
6 Be Aware Of Legal, IPR, etc. Barriers To Service Deployment: The service provider will need to ensure that there are no legal barriers to the deployment of your deliverables. This can include clarifying copyright, IPR and accessibility issues.
7 Ensure Your Have Any Documentation Which Is Necessary To Assist Service Deployment: You should ensure that you provide installation documentation which should list dependencies on other software and cover any security or performance issues. As well as the installation documentation you should also provide user documentation which can help the service provide support for end users.
8 Remember To 'Let Go': Although it can be helpful of your project team is in a position to provide advice to the service provider after the end of the project, the project team should also be willing to relinquish control over the project if, for example, the service provider needs to make changes to your deliverables.
9 Learn From Others: Learn from the experiences of others. For example, read the case studies which provide various examples of porting systems into a service environment [3] [4].
10 Share Your Experiences: Be willing to share your experiences. For example, consider writing a case study for QA Focus [5] [5].

References

From Project To Production Service, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-38/>
Planning An End User Service, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-39/>
Launching New Database Services: The BIDS Experience, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-27/>
Providing Access To Full Text Journal Articles, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-28/>
Contributing To Case Studies, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/#contributing>

Briefing 41

Introduction To Metadata

What is Metadata?

Metadata is often described as "data about data". The concept of metadata is not new - a Library catalogue contains metadata about the books held in the Library. What is new is the potential that metadata provides in developing rich digital library services.

The term metadata has come to mean structured information that is used by automated processes. This is probably the most useful way to think about metadata [1].

The Classic Metadata Example

The classic example of metadata is the library catalogue. A catalogue record normally contains information about a book (title, format, ISBN, author, etc.). Such information is stored in a structured, standardised form, often using an international standard known as MARC. Use of this international standard allows catalogue records to be shared across organisations.

Why is Metadata So Important?

Although metadata is nothing new, the importance of metadata has grown with the development of the World Wide Web. As is well-known the Web seeks to provide universal access to distributed resources. In order to develop richly functional Web applications which can exploit the Web's global information environment it is becoming increasingly necessary to make use of metadata which describes the resources in some formal standardised manner.

Metadata Standards

In order to allow metadata to be processed in a consistent manner by computer software it is necessary for metadata to be described in a standard way. There are many metadata standards available. However in the Web environment the best known standard is the Dublin Core standard which provides an agreed set of core metadata elements for use in resource discovery.

The Dublin Core standard (formally known as the Dublin Core Metadata Element Set) has defined 15 core elements: Title, Creator, Subject, Description, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights [2].

The core element set is clearly very basic. A mechanism for extending Dublin Core elements has been developed. This allows what is known as Qualified Dublin Core elements to refine the core elements. For example DC.Date.Created refines the DC.Date element by allowing the date of creation of the resource to be described. DC.Date.Modified can be used to describe the date on which the resource was changed. Without the qualifiers, it would not be possible to tell which date related to which event. Work is in progress in defining a common framework for qualifiers.

Using Metadata

The Dublin Core standard defines a set of core elements. The standard does not specify how these elements should be deployed on the Web. Initially consideration was given to using Dublin Core by embedding it within HTML pages using the <meta> element e.g. <meta name="DC.Creator" content="John Smith">. However this approach has limitations: initially HTML was not rich enough to all metadata schemes to be including (which could specify that a list of keywords are taken from the Library Of Congress list); it is not possible to define relationships for metadata elements (which may be needed if, for example, there are multiple creators of a resource) and processing the metadata requires the entire HTML document to be downloaded.

In order to address these concerns a number of alternative approaches for using metadata have been developed. RDF (Resource Description Framework) [3], for example, has been developed by W3C as a framework for describing a wide range of metadata applications. In addition OAI (Open Archives Initiative) [4] is an initiative to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.

In addition to selecting the appropriate standards use of metadata may also require use of a metadata management system and a metadata repository.

References

Metadata Demystified, NISO,
<http://www.niso.org/standards/resources/Metadata_Demystified.pdf>
Dublin Core Metadata Element Set, DCMI,
<http://dublincore.org/documents/dces/>
Resource Description Framework (RDF), W3C,
<http://www.w3.org/RDF/>
Open Archives Initiative (OAI),
<http://www.openarchives.org/>
Information Environment Home, JISC,
<http://www.jisc.ac.uk/index.cfm?name=ie_home>

Briefing 42

Metadata Deployment

Introduction

This document describes the issues you will need to address in order to ensure that you make use of appropriate approaches for the deployment of metadata within your project.

Why Do You Wish To Use Metadata?

The first question you should address is "Why do you wish to use metadata?". You may have heard that metadata is important. You may have heard that metadata will help solve many problems you have with your project. You may have heard that others are using metadata and you don't wish to be left behind. Although all of these points have some validity, they are not sufficient in isolation to justify the time and effort needed in order to deploy metadata effectively.

You should first specify the problem you wish to address using metadata. It may be that you wish to allow resources on your Web site to be found more easily from search engines such as Google. It may be that you wish to improve local searching on your Web site. It may be that you wish to interoperate with other projects and services. Or it may be that you wish to improve the maintenance of resources on your Web site. In all of these cases metadata may have a role to play; however different approaches may be needed to tackle these different problem and, indeed, approaches other than use of metadata may be more effective (for example, Google makes only limited use of metadata so an alternative approach may be needed).

Identifying The Functionality To Be Provided

Once you have clarified the reasons you wish to make use of metadata you should identify the end user functionality you wish to provide. This is needed in order to define the metadata you will need, how it should be represented and how it should be created, managed and deployed.

Choosing The Metadata Standard

You will need to choose the metadata standard which is relevant for your purpose. In many cases this may be self-evident - for example, your project may be funded to develop resources for use in an OAI environment, in which case you will be using the OAI application.

Metadata Modelling

It may be necessary for you to decide how to model your metadata. For example if you wish to use qualified Dublin Core metadata you will have to chose the qualifiers you wish to use. A QA Focus case study illustrates the decision-making process [1].

Metadata Management

It is important that you give thought to the management of the metadata. If you don't you are likely to find that your metadata becomes out-of-date. Since metadata is not normally displayed to end users but processed by software you won't even be able to use visual checking of the metadata. Poor quality metadata is likely to be a major barrier to the deployment of interoperable services.

If, for example, you embed metadata directly into a file, you may find it difficult to maintain the metadata (e.g. the creator changes their name or contact details). A better approach may be use of a database (sometimes referred to as a metadata repository) which provides management capabilities.

Example Of Use Of This Approach

The Exploit Interactive [2] e-journal was developed by UKOLN with EU funding. Metadata was required in order to provide enhanced searching for the end user. The specific functionality required was the ability to search by issue, article type, author and title and by funding body. In addition metadata was needed in order to assist the project manager producing reports, such as the numbers of different types of articles. This functionality helped to identify the qualified Dublin Core elements required.

The MS SiteServer software used to provide the service provided an indexing and searching capability for processing arbitrary metadata. It was therefore decided to provide Dublin Core metadata stored in <meta> tags in HTML pages. In order to allow the metadata to be more easily converted into other formats (e.g. XHTML) the metadata was held externally and converted to HTML by server-side scripts.

A case study which gives further information (and describes the limitations of the metadata management approach) is available [3].

References

Gathering the Jewels: Creating a Dublin Core Metadata Strategy, QA Focus,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-13/>
Exploit Interactive,
<http://www.exploit-lib.org/>
Managing And Using Metadata In An E-Journal, QA Focus,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-01/>

Briefing 43

Quality Assurance For Metadata

Introduction

Once you have decided to make use of metadata in your project, you then need to agree on the functionality to be provided, the metadata standards to be used and the architecture for managing and deploying your metadata. However this is not the end of the matter. You will also need to ensure that you have appropriate quality assurance procedures to ensure that your metadata provides fitness for its purposes.

What Can Go Wrong?

There are a number of ways in which services based on metadata can go wrong, such as:

Incorrect content:: The content of the metadata may be incorrect or out-of-date. There is a danger that metadata content is even more likely to be out-of-date than normal content, as content is normally visible, unlike metadata which is not normally displayed on, say, a Web page. In addition humans can be tolerant of errors, ambiguities, etc. in ways that software tools normally aren't.
Inconsistent content:: The metadata content may be inconsistent due to a lack of cataloguing rules and inconsistent approaches if multiple people are involved in creating metadata.
Non-interoperable content:: Even if metadata is consistent within a project, other projects may apply different cataloguing rules. For example the date 01/12/2003 could be interpreted as 1 December or 12 January if projects based in the UK and USA make assumptions about the date format.
Incorrect format:: The metadata may be stored in a non-valid format. Again, although Web browsers are normally tolerant of HTML errors, formats such as XML insist on compliance with standards.
Errors with metadata management tools:: Metadata creation and management tools could output metadata in invalid formats.
Errors with the workflow process:: Data processed by metadata or other tools could become corrupted through the workflow. As a simple example a MS Windows character such as © could be entered into a database and then output as an invalid character in a XML file.

QA For Metadata Content

You should have procedures to ensure that the metadata content is correct when created and is maintained as appropriate. This could involve ensuring that you have cataloguing rules, ensuring that you have mechanisms for ensuring the cataloguing rules are implemented (possibly in software when the metadata is created). You may also need systematic procedures for periodic checking of the metadata.

QA For Metadata Formats

As metadata which is to be reused by other applications is increasingly being stored in XML it is essential that the format is compliant (otherwise tools will not be able to process the metadata). XML compliance checking can be implemented fairly easily. More difficult will be to ensure that metadata makes use of appropriate XML schemas.

QA For Metadata Tools

You should ensure that the output from metadata creation and management tools is compliant with appropriate standards. You should expect that such tools have a rich set of test suites to validate a wide range of environments. You will need to consider such issues if you develop your own metadata management system.

QA For Metadata Workflow

You should ensure that metadata does not become corrupted as it flows through a workflow system.

A Fictitious Nightmare Scenario

A multimedia e-journal project is set up. Dublin Core metadata is used for articles which are published. Unfortunately there are documented cataloguing rules and, due to a high staff turnover (staff are on short term contracts) there are many inconsistencies in the metadata (John Smith & Smith, J.; University of Bath and Bath University; etc.)

The metadata is managed by a home-grown tool. Unfortunately the author metadata is output in HTML as DC.Author rather than DC.Creator. In addition the tool output the metadata in XHTML 1.0 format which is embedded in HTML 4.0 documents.

The metadata is created by hand and is not checked. This results in a large number of typos and use of characters which are not permitted in XML without further processing (e.g. £, — and &).

Rights metadata for images which describes which images can be published freely and which is restricted to local use becomes separated from the images during the workflow process.

Briefing 44

Metadata Harvesting

Background

As the number of available digital resources increases so does the need for quick and accurate resource discovery. In order to allow users to search more effectively many resource discovery services now operate across the resources of multiple distributed content providers. There are two possible ways to do this. Either by distributed searching across many metadata databases or by searching harvested metadata.

Metadata harvesting is the aggregation of metadata records from multiple providers into a single database. Building applications or services that use these aggregated records provides additional views of those resources, assisting in access across sectors and greater exposure of those resources to the wider community.

Open Archives Initiative Protocol for Metadata Harvesting

When metadata harvesting is carried out within the JISC Information Environment the Open Archives Initiative Protocol for Metadata Harvesting (OAI PMH) [1] version 2.0 is recommended. The Open Archives Initiative [2] had it roots in the e-prints community who were trying to improve access to scholarly resources. The OAI PMH was developed initially by an international technical committee in 1999. It is a light-weight low cost protocol that is built on HTTP and XML. The protocol defines six requests, known as verbs:

GetRecord Identify
ListIdentifiers
ListMetadataFormats
ListRecords
ListSets

In order for metadata to be shared effectively two things need to happen:

Content/data providers need to make metadata records available in a commonly understood form.
Service providers need to obtain these metadata records from the content providers and hold them in a repository.

OAI PMH provides a means of doing the above.

Record Format

At the lowest level a data provider must support the simple Dublin Core [3] record format ('oai_dc'). This format is defined by the OAI-PMH DC XML schema [4]. Data providers may also provide metadata records in other formats. Within the JISC Information Environment if the repository is of value to the learning and teaching community projects should also consider exposing metadata records that conform to the UK Common Metadata Framework [5] in line with the IMS Digital Repositories Specification using the IEEE LOM XML schemas [6] .

OAI-PMH also provides a number of facilities to supply metadata about metadata records for example rights and/or provenance information can be provided in the <about> element of the GetRecord response. Also collection-level descriptions can be provided in the <description> element of the Identify response.

Example OAI DC metadata record

The following example is taken from the Library of Congress Repository 1).


<oai_dc:dc>
<dc:title>Empire State Building. [View from], to Central Park</dc:title>
<dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
<dc:date>1932 Jan. 19</dc:date>
<dc:type>image</dc:type>
<dc:type>two-dimensional nonprojectible graphic</dc:type>
<dc:type>Cityscape photographs.</dc:type>
<dc:type>Acetate negatives.</dc:type>
<dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
<dc:coverage>United States--New York (State)--New York.</dc:coverage>
<dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>

Conformance Testing for Basic Functionality

The OAI gives information on tests an OAI repository must successfully complete in order to be entered in the registry. For example:

For every protocol request, the repository return a response that is valid XML (the XML successfully passes through an XML parser) and conforms to the XML schema defined for the response (passes the XSV XML schema validator).
For the ListMetadataFormats request, the repository must return at least one metadata format and in the list of metadata formats, return the mandatory oai_dc metadata format with the URL of the OAI-defined XML schema.

More information on the tests necessary is available from the OAI Web site [7]. Projects could use the tests listed to create a checklist to measure their repository's conformance.

References

The Open Archives Initiative Protocol for Metadata Harvesting,
<http://www.openarchives.org/OAI/openarchivesprotocol.html>
Open Archives Initiative,
<http://www.openarchives.org/>
Dublin Core,
<http://dublincore.org/>
OAI-PMH DC XML Schema,
<http://www.openarchives.org/OAI/2.0/oai_dc.xsd>
UK Common Metadata Framework,
<http://metadata.cetis.ac.uk/guides/>
IMS Digital Repositories Specification,
<http://www.imsglobal.org/digitalrepositories/>
Registering as a Data Provider,
<http://www.openarchives.org/data/registerasprovider.html>

Further Information

OAI FAQ,
<http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/faq/oai/>
OAI Software Tools,
<http://www.openarchives.org/tools/>

Briefing 45

Top 10 Tips For Preserving Web Sites

About This Document

This document provides top tips which can help to ensure that project Web sites can be preserved.

The Top 10 Tips

1 Make Use Of Open Standards: You should seek to make use of open standard formats for your Web site. This will help you to avoid lock-in to proprietary formats for which access may not be available in the future.
2 Define The Purpose(s) Of Your Web Site: You should have a clear idea of the purpose(s) of your project Web site, and you should document the purposes. Your Web site could, for example, provide access to project deliverables for end users; could provide information about the project; could be for use by project partners; etc. A policy for preservation will be dependent of the role of the Web site.
3 Have A URI Naming Policy: Before launching your Web site you should develop a URI naming policy. Ideally you should contain the project Web site within its own directory, which will allow the project Web site to be processed (e.g. harvested) separately from other resources on the Web site.
4 Think Carefully Before Having Split Web Sites: The preservation of a Web site which is split across several locations may be difficult to implement.
5 Think About Separating Web Site Functionality: On the other hand it may be desirable to separate the functionality of the Web site, to allow, for example, information resources to be processed independently of other aspects of the Web site. For example, the search functionality of the Web site could have its own sub-domain,(e.g. search.foo.ac.uk) which could allow the information resources (under www.foo.ac.uk) to be processed separately.
6 Explore Potential For Exporting Resources From A CMS: You should explore the possibility of exporting resources from a backend database or Content Management Systems in a form suitable for preservation.
7 Be Aware Of Legal, IPR, etc. Barriers To Preservation: You need to be aware of various legal barriers to preservation. For example, do you own the copyright of resources to be preserved; are there IPR issues to consider; are confidential documents (such as project budgets, minutes of meetings, mailing list archives, etc.) to be preserved; etc.
8 Test Mirroring Of Your Web Site: You should test the mirroring of your project Web site to see if there are technical difficulties which could make preservation difficult. See, for example, the QA Focus document on Accessing Your Web Site On A PDA [1].
9 Provide Documentation: You should provide technical documentation on your Web site which will allow others to preserve your Web site and to understand any potential problem areas. You should also provide documentation on your policy of preservation.
10 Share Your Experiences: Learn from the experiences of others. For example read the case study on Providing Access to an EU-funded Project Web Site after Completion of Funding [2] and the briefing document on Mothballing Web Sites [3].

References

Accessing Your Web Site On A PDA, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-05/>
Providing Access to an EU-funded Project Web Site after Completion of Funding, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-17/>
Mothballing Web Sites , QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-04/>

Briefing 46

QA for Web Sites: Useful Pointers

Quality Assurance

Below are some key pointers that can help you enhance the quality assurance procedures used for your Web site.

Useful Pointers

1 Authoring Tools

Are the tools that you use to create your Web site appropriate for their tasks? Do they produce compliant and accessible code? Can the tools be configured to incorporate QA processes such as HTML validation, link checking, spell checking, etc? If not, perhaps you should consider evaluating other authoring tools or alternative approaches to creating and maintaining your content.

2 Tracking Problems

How do you deal with problem reporting? Consider implementing a fault reporting log. Make sure that all defects are reported, that ownership is assigned, details are passed on to the appropriate person, a schedule for fixes is decided upon, progress made is recorded and the resolution of problem is noted. There could also be a formal signing off procedure.

3 Use A QA Model

A model such as the QA Focus Timescale Model will help you to plan the QA you will need to implement over the course of your project:

Strategic QA:: Carried out before development takes place. This involves establishing best methodology for your Web site, the choice of standards, etc.
Workflow QA:: Carried out as formative QA before and during development. This involves establishing and documenting a workflow, processes etc.
Sign-off QA:: Carried out as summative QA once one stage of development has been carried out. This involves establishing an auditing system where everything is reviewed.
On-going QA:: Carried out as summative QA once one stage of development has been carried out. This involves establishing a system to report check, fix any faults found etc.

4 Use Automated Testing Tools

There are a variety of tools out there for use and a number are open source or free to use. These can be used for HTML and CSS validation, link checking, measuring load times, etc.

5 Don't Forget Manual Approaches

Manual approaches to Web site testing can address areas which will not be detecting through use of automated tools. You should aim to test key areas of your Web site and ensure that systematic errors which are found are addressed in areas of the Web site which are not tested.

6 Use A Benchmarking Approach

A benchmarking approach involves comparisons of the findings for your Web site with your peers. This enables comparisons to be made which can help you identify areas in which you may be successful and also areas in which you may be lagging behind your peers.

7 Rate The Severity Of Problems

You could give a severity rating to problems found to decide whether the work be done now or it can wait till the next phase of changes. An example rating system might be:

Level 1:: There is a failure in the infrastructure or functionality essential to the Web site.
Level 2:: The functionality is broken, pages are missing, links are broken, graphics are missing, there are navigation problems, etc.
Level 3:: There are browser compatibility problems, page formatting problems, etc.
Level 4:: There are display issues, for example with the font, or text issues such as grammar.

8 Learn From The Problems You Find

Make sure that you do not just fix problems you find. Recognising why the problems have occurred allows you to improve your publishing processes so that the errors do not reoccur.

Useful URLs

The following resources provide additional advice on quality assurance for Web sites.

Understanding Quality, <http://www.philosophe.com/qa/quality.html>
Testing Without A Formal Test Plan, <http://www.philosophe.com/testing/without_testplans.html>
" Quality Assurance Tips For Webmasters, W3C, <http://www.w3.org/QA/Tips/>
My Web site is standard! And yours?, W3C, <http://www.w3.org/QA/2002/04/Web-Quality>
" Web Quality Assurance, W3Schools, <http://www.w3schools.com/quality/>

Briefing 47

Transcribing Documents

Digitising Text by Transcription

Transcription is a very simple but effective way of digitising small to medium volumes of text. It is particularly appropriate when the documents to be digitised have a complex layout (columns, variable margins, overlaid images etc.) or other features that will make automatic digitisation using OCR (Optical Character Recognition) software difficult. Transcription remains the best way to digitise hand written documents.

Representing the Original Document

All projects planning to transcribe documents should establish a set of transcription guidelines to help ensure that the transcriptions are complete, consistent and correct.

Key issues that transcription guidelines need to cover are:

What to do about illegible text
How to record important information indicated by position, size, italics, bold or other visual features of the text
What to do about accents, non-Latin characters and other language issues

It is generally good practice to not correcting factual errors or mistakes of grammar or spelling in the original.

Avoiding Errors

Double-entry is the best solution - where two people separately transcribe the same document and the results are then compared. Two people are unlikely to make the same errors, so this technique should reveal most errors. It is, however often impractical because of the time and expense involved. Running a grammar and spell checker over the transcribed document is a simpler way of finding many errors (but assumes the original document was spelt and written according to modern usage).

Transcribing Structured Documents

Structured documents, such as census returns or similar tabular material may be better transcribed into a spreadsheet package rather than a text editor. When transcribing tables of numbers, a simple but effective check on accuracy is to use a spreadsheet to calculate row and column totals that can be compared with the original table. Transcriber guidelines for this type of document will need to consider issues such as:

What to do about 'ditto' and other ways of referring to an earlier entry in a list or table - should the value or the placeholder be transcribed?
Should incorrect values be transcribed 'as is'

It is good practice to record values, such as weights, distances, money and ages as they are found, but also to include a standardised representation to permit calculations (e.g. 'baby, 6m' should be transcribed verbatum, but an addition entry of 0.5, the age in years, could also be entered)

Further Information

Many genealogical groups transcribe documents, and provide detailed instructions. Examples include:

The USGenWeb Census Project,
<http://www.us-census.org/help/1910.html>
The Immigrant Ships Transcribers Guild,
<http://www.immigrantships.net/guild/typing2.html>

Briefing 48

Top 10 Tips For Database Design

About This Document

This document provides 10 tips which can help to ensure that databases can be easily exported and manipulated with the minimum of difficulties.

The Top 10 Tips

1 Develop A Prototype: Significant time can be saved by creating the structure in a simple desktop database (such as Microsoft Access) before finalising the design in one of the enterprise databases. The developer will be able to recognise simple faults and makes changes more rapidly than would be possible at a later date.
2 Split database structure into multiple tables: Unlike paper-based structures, databases do not require the storage of all fields in a single table. For large databases it is useful to split essential information into multiple tables. Before creating a database, ensure that the data has been normalised to avoid duplication.
3 Use understandable field names: The developer should avoid field names that are not instantly recognisable. Acronyms or internal references will confuse users and future developers who are not completely familiar with the database.
4 Avoid illegal file names: It is considered good practice to avoid exotic characters in file or field names. Exotic characters would include ampersands, percentages, asterisks, brackets and quotation marks. You should also avoid spaces in field and table names.
5 Ensure Consistency: Remain consistent with data entry. If including title (Mr, Miss, etc.) include it for all records. Similarly, if you have established that house number and address belong in different fields, always split them.
6 Avoid blank fields: Blank fields can cause problems when interpreting the data at a later date. Does it mean that you have no information, or you have forgotten to enter the information? If information is unavailable it is better to provide a standard response (e.g. unknown).
7 Use standard descriptors for date and time: Date and time can be easily confused when exporting database fields in a text file. A date that reads '12/04/2003' can have two meanings, referring to April 12^th or December ^th, 2003. To avoid ambiguity always enter and store dates with a four-digit century and times of day using the 24 hour clock. The ISO format (yyyy-mm-dd) is useful for absolute clarity, particularly when mixing databases at a later date.
8 Use currency fields if appropriate: Currency data types are designed for modern decimal currencies and can cause problems when handling old style currency systems, such as Britain's currency system prior to 1971 that divided currency into pounds, shillings and pence.
9 Avoid proprietary extensions: Care should be taken when using proprietary extensions, as their use will tie your database to a particular software package. Examples of proprietary extensions include the user interface and application-specific commands.
10 Avoid the use of field dividers: Commas, quotation marks and semi-colons are all used as methods of separating fields when databases are exported to a plain text file and subsequently re-imported into another database. When entering data into a database you should choose an alternative character that represents these characters.

Further Information

SQL Course,
<http://www.sqlcourse.com/>
A Tutorial on Basic Normalization, Part 1-9,
<http://www.datawarehouse.com/article/?articleid=2983>

Briefing 49

Top Tips For Resolving Poor Performance in Database Design

About This Document

This document provides top tips which can help to ensure that databases are created that can be easily exported and manipulated with the minimum of difficulties.

The Top Tips

1 Normalise database structure: The majority of database performance issues are caused by un-normalised or partially normalised data. Normalisation is the technique used to simplify the design of a database in a way that removes redundant data and improves the efficiency of the database design. It consists of three levels (1st, 2nd and 3rd) normal forms that require the removal of duplicate information, removal of partial (when the value in a field is dependent on part of the primary key) and transitive (when the value in one non-key field is dependent on the value in another non-key field) dependencies.
2 Create an index: About 70% of good SQL performance can be attributed to proper and efficient indexes. Indexes are used to provide fast and efficient access paths to data, to enforce uniqueness on column values, to contain primary key values, to cluster data, and to partition tables.
3 Are indexes being used consistently?: Indexes have many benefits, but they have disadvantages. Each index requires storage space and must be modified each time a new row is inserted or deleted, as well as each time a column value in the key is updated. You should ensure that indexes are only used when necessary. In many circumstances it may be more appropriate to modify the structure of an existing one? Use the EXPLAIN statement to find the access path chosen by the optimiser.
4 Check the query: Ensure the query is structured correctly by rechecking the WHERE clause. Are the host variables defined correctly and are the predicates designed to use existing indexes?
5 Avoid unnecessary sorting of data: Unnecessary data sorting can also have a detrimental impact upon processing speed. You should ensure that all sorts (ORDER BY, GROUP BY, UNION, UNION ALL, joins, DISTINCT) only refer to the necessary data.
6 Avoid unnecessary row counts: When developing stored procedures (a series of SQL commands), use the SET NOCOUNT ON option at the start of your procedure. This will prevent the superfluous "row count" messages from being generated, and will improve performance by eliminating wasted network traffic.
7 Check table JOINs: Remove unnecessary JOINS and sub queries - Would the application be more efficient without the join or sub-query? Are simple or multiple queries more efficient?
8 8. Check connection delays when connecting to an external database: Many problems can be encountered when connecting to an organisational database from home or anywhere outside the faculty. Many delays are caused by DNS lookup timeout. Check that the database server can resolve the IP address. If the intervening firewall uses NAT, then the IP address will match the firewall's interface closest to the database server. If you are troubleshooting the connection, gather more information using 'tcpdump' and examine the packet timings to determine where the delay is occurring.
9 Think about the database location: Many performance issues are caused by the host application rather than the database itself. When identifying performance issues it is useful to perform an Internet search using application keywords to identify problematic combinations. For example, tests have found that the use of a MS Access database run from a NetWare server can dramatically increase the query time if the database is not stored in the drive root.
10 Export queries in desktop databases if necessary: Though it is theoretically possible to use SQL (Structured Query Language) script files between databases, the range of implementations in desktop databases differ. This may cause significant delays. In practice code needs to be recreated altered to account for implementation differences.

Further Information

SQL Course,
<http://www.sqlcourse.com/>
A Tutorial on Basic Normalization, Part 1-9,
<http://www.datawarehouse.com/iknowledge/articles/>
MS Access Database Run From Netware Server Excessively Slow,
<http://support.novell.com/cgi-bin/search/searchtid.cgi?/10065312.htm>

Briefing 50

Improving Interoperability Between Multiple Databases

About This Document

A relational database is a set of structured data, organised according to a data model. When exporting data from one application to another, it is a simple process to export the data as an ASCII text file that will describe every field within a table. However, many problems can be encountered that will increase the amount of effort and time required to import the data. This paper describes specific quality-based techniques that should be used in the development process to minimise the difficulty encountered at a later date.

Documenting The Database Structure

The key to continued access of a digital resource is documentation. This avoids the problems that arise when an administrator leaves the project and essential knowledge is lost. Before exporting data you should make a note of the table relationships and primary keys. This will allow the data to be recombined using the same structure in an alternative package. You should also identify specific requirements of each field. For example, the field size, import mask, validation rules, default value, indexing, etc.

Use Appropriate Descriptors

Two problems relating to the database organisation can be avoided by the use of appropriate descriptors. The first is to understand the importance of table and field names when identifying information. A row of numbers has little meaning until we identify the context, i.e. payroll numbers, lottery numbers, etc. This will make it easier to interpret and recombine the data at a later date.

The second issue to consider when choosing fields names is the possibility that this data will become corrupt at a later date or will be misinterpreted by the application. This is caused when specific reserved characters used for distinguishing between field (commas, semi-colons, tabs, quotations, etc.), or system-illegal characters (ampersands, asterisks, hash, or other mathematical symbols) are used.

It is important to avoid such issues by restricting yourself to the English alphabet or numerical values, and avoiding other symbols.

Ensure Consistency

When handling data from multiple databases it is good practice to standardise the responses so that they can be understood and manipulated more easily. This may involve a simple process of replacing all reference to one value with another (e.g. changing 1,2,3 to Mr, Mrs, Miss). In other circumstances you may need to write a query to split the postcode from the main address field.

You should also ensure that date and time are referenced correctly. These can be easily confused when exporting database fields in a text file. For example, a date that reads '12/04/2003' can be interpreted as April 12th or December 4th, 2003. To avoid ambiguity always enter and store dates with a four-digit century and times of day using the 24 hour clock. The ISO format (yyyy-mm-dd) is useful for absolute clarity, particularly when mixing databases at a later date.

Proprietary Extensions

Care should be taken when using proprietary extensions, as their use will tie your database to a particular software package. Unlike SQL commands, these application-specific elements cannot be exported to other applications without extensive work to convert or recreate the resource. Examples of proprietary extensions include the user interface and application-specific commands.

Further Information

SQL Course,
<http://www.sqlcourse.com/>
A Tutorial on Basic Normalization, Part 1-9,
<http://www.datawarehouse.com/iknowledge/articles/>

Briefing 51

Intellectual Rights Clearance On The Internet

Introduction

The Internet contains an assortment of copyrighted work owned by millions of people or organisations throughout the world. It can be a legal minefield for anyone attempting to establish intellectual rights to specific works. In most cases it is extremely difficult to establish the author or owner to gain permission for its use.

One way of addressing IPR (Intellectual Property Rights) issues is to describe ownership in as much depth as possible: establishing who is responsible for specific works can help a producer protect themselves from potential legal difficulties.

This document provides guidelines on gaining copyright clearance for using third party works within your own project. It encourages the use of standard practices that will simplify the process and improve the quality of copyright clearance information stored, providing a protection against future legal action.

Copyright Clearance

Copyright is an automatically assigned right. It is therefore likely that the majority of works in a digital collection will be covered by copyright, unless explicitly stated. The copyright clearance process includes requiring the digitiser to check the copyright status of:

Published, unpublished and Web site articles
Photographs and illustrations
Dynamic media (sound, video)
Software components
Database usage

Copyright clearance should be undertaken at the beginning of a project. If clearance is denied after the work has been included in the collection, it will require additional effort to remove it and may result in legal action from the author. Therefore:

Maintain a negotiation log: A log will document all meetings, outlining subjects of discussion, objections and agreements by either party. This will enable the organization to refer to the relevant section to establish they have gained copyright clearance and refer to a detailed description of the meetings that took place.
Identify who the author is and when it was produced: Current copyright law in the UK indicates the author's lifespan plus 70 years as the limit for copyright. Therefore it is possible that a collection may consist of works that are outside current copyright laws (such as the entire works of Shakespeare, Conan Doyle, etc.). If the author is still alive, they must be contacted to gain permission to use their work.
Establish long-term access rights: Internet content may appear in a site archive for several years after it was published. When reaching agreement with the author, establish any time limits for the use of their work, indicating the length of time that work can be used. If the goal of the project is to enable long-term preservation of work, persuade the individual/s to allow the repository to host work indefinitely and to allow the conversion of it to modern formats when required.

Indicating IPR through Metadata

If permission has been granted to reproduce copyright work, the institution should take measures to reflect intellectual property. Metadata is commonly used for this purpose, storing and distributing IP data for online content. Several metadata bodies provide standardized schemas for copyright information. For example, IP information for a book could be stored in the following format:


<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
      <publish_date>2001-04-16</publish_date>
      <publisher>Addison Press</publisher>
      <copyright>Galos, M. 2001</copyright>
</book>

Access inhibitors can also be set to identify copyright limitations and the methods necessary to overcome them. For example, limiting e-book use to IP addresses within a university environment.

Further Information

PADI - Intellectual property rights management,
<http://www.nla.gov.au/padi/topics/28.html>
TASI - Looking after Copyright, IPR, Ethics and Data Protection, TASI,
<http://www.tasi.ac.uk/advice/managing/copyrights.html>
An Introduction to the Law of Copyright, JISC Legal Information Service,
<http://www.jisclegal.ac.uk/>

Briefing 52

Protecting Copyright On Your Own Work

Introduction

The Internet contains an assortment of copyrighted work owned by millions of people or organisations throughout the world. The ease of publication and availability of text, graphics and video allow anyone to become their own publisher. As an effect, modern web sites contain a jigsaw of copyrighted works produced by multiple authors.

This free attitude to copyright presents a challenge to authors - what measures can be taken for authors to protect their own work? More accurately, can copyrighted work be protected in some way?

This document provides guidelines for protecting your own work. It describes methods of establishing authorship, possible licencing models that meet your needs, and methods of reflecting copyright on the Internet.

Choosing the Correct Licence

The Internet has forced an increasing debate on the role of IPR and copyright. This has resulted in alternatives to traditional intellectual property rights appearing.

To protect your work it is important that the distribution license is considered before you release your work. This can be achieved by answering several questions:

Do I wish to allow others to improve my work without permission?
Do I wish for others to distribute my work without permission?
Do I wish for others to create derivative works without my permission?

If the answer to these questions is no, you are automatically assigned rights to copyright your work. However, if the answer is yes, you should seek alternative license agreements that preserve your right to place your work into the public domain or allow the user to perform certain actions. Popular variants include CopyLeft, notably the GPL, and Collective Commons - two different license agreements that avoid traditional copyright restrictions, by establishing permission to distribute content without restriction. More information can be found on these subjects in the QA Focus document 'Choosing Alternative Licences For Digital Content' [1].

Managing Copyright on Own Work

Unless indicated, copyright is assigned to the author of an original work. When producing work it is essential that it be established who will own the resulting product - the individual or the institution. Objects produced at work or university may belong to the institution, depending upon the contract signed by the author. For example, the copyright for this document belongs to the AHDS, not the author. When approaching the subject the author should consider several issues:

Can I establish that I am the author of this work?: At this point the author should provide evidence they produced the work on a specific date. One commonly used method is to post a sealed envelope to yourself or request that a solicitor store evidence within a safe. If ownership is challenged at a later date, the document can be opened in the presence of a solicitor.
Am I using unaccredited copyrighted material produced by others?: Published work that contains unaccredited material infringe upon the intellectual property of others. The results of such discovery will vary: the unaccredited author may request they are credited or a correction is published; the author may request their work is removed; or they make take legal action against the author. To avoid such issues, document all research made during investigation.

When producing work as an individual that is intended for later publication, the author should establish ownership rights to indicate how work can be used after initial publication:

Ownership after publication: Authors are encouraged to retain as many rights as possible to enable the continued use of articles in hard copy and electronic form.
Ownership in different media: Where publication in a specific form (e.g. hard-copy) is the intention, rights to publish in other forms (e.g. electronic) should, if possible, be retained.

Indicating IPR through Metadata


<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
      <publish_date>2001-04-16</publish_date>
      <publisher>Addison Press</publisher>
      <copyright>Galos, M. 2001</copyright>
</book>

Access inhibitors can also be set to identify copyright limitations and the methods necessary to overcome them. For example, limiting e-book use to IP addresses within a university environment.

References

Choosing Alternative Licences For Digital Content, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-54/>

Further Information

PADI - Intellectual Property Rights Management,
<http://www.nla.gov.au/padi/topics/28.html>
TASI - Looking after Copyright, IPR, Ethics and Data Protection, TASI,
<http://www.tasi.ac.uk/advice/managing/copyrights.html>
An Introduction to the Law of Copyright, JISC Legal Information Service,
<http://www.jisclegal.ac.uk/>

Briefing 53

How To Protect Your Rights With A Licence Agreement

Introduction

The Internet is often promoted as a means of getting information to the widest possible audience at the lowest possible cost. Barriers to the flow of information are not encouraged, and few repositories establish formal agreements with depositing authors.

Although mutual benefit is the primary goal of many collaborative projects, some method of formalizing the relationship between author and distributor is useful. A deposit agreement can be used to define a consensual contract between the depositing author and the repository, clarifying the rights and obligations of both.

The deposit agreement dictates several requirements of both parties:

Defines how the repository will manage the content.
Protection and ensure your rights
Specifies what rights, if any, the repository gains over the content
Indicates ownership of content and rights of the author if the repository closes

Licencing Terms

The first aspect of a licence agreement that should be determined is the licencing terms. This indicates the distribution type permitted. Two types exist:

Exclusive distribution Exclusive licenses impose specific and wide-ranging restrictions upon distribution. They are primarily used for commercial repositories that are restricted by copyright or charges that non-exclusive distribution will devalue the content.
Non-exclusive licenses Non-exclusive licenses, typically found in academic-orientated repositories offer a useful alternative to commercial distribution that encourages the author to voluntarily submit work as a method of gaining wider public exposure. These non-exclusive licenses establish the rights of the depositor to submit work to differing repositories at a subsequent date without legal restrictions.

Requirements

To protect the organization from legal threats at a later date the licence agreement requires several issues to be considered during the submission lifetime. In the initial stages the repository should establish content ownership, audience and potential use, migration and distribution rights. In the long-term the repository should consider withdrawal criteria.

Initial Stages of Development

Establish ownership: A licence agreement must first establish whom the owners are, and if it differs from the author. This may help to minimize the repository's legal liability by formally establishing that the depositor holds the necessary copyright to deposit the material and is able to do so without infringement.
Confirm ownership: The licence agreement should clearly indicate that the depositor retains ownership. This is a particularly important inclusion in a deposit agreement, designed to protect the repository from potential legal action taken as a result of the actions of the author. Equally, the deposit agreement can help establish that the author is not legally responsible for ensuring the accuracy of the information they have provided if, for example, it later becomes out-of-date.
Audience and potential use: In some circumstances, particularly exclusive distribution, a licence agreement will need to establish terms permitted by the author relating to potential usage. This may be prompted by concerns that wide dissemination will damage the long-term value of the content. Institutional repositories may wish to clarify with depositors that deposited e-prints will only be used for non-commercial or academic uses.

Mid-term Considerations

Migration Strategy: For repositories such as the AHDS a migration strategy will be particularly important. This enables the repository to migrate the content to a different file format if the original submitted format becomes obsolete.

Long-term Considerations

Withdrawal criteria: The licence agreement should establish the situations under which the author may withdraw their work from the repository and whether the repository can continue to hold relevant metadata records after it is withdrawn.

Licence agreements should be considered an essential part of an e-print repository's operation. They can resolve many of the potential problems that might arise. For the repository, it provides a formal framework that defines what the repository can and cannot do, making it easier to manage the e-print in the long-term while helping to reduce its legal liabilities.

Further Information

Common Deposit Agreement Form, AHDS,
<http://ahds.ac.uk/depositform.htm>
American Chemical Society. - Copyright Status,
<http://pubs.acs.org/instruct/copyright.pdf>
Copyright Assignment, Science Magazine,
<http://www.sciencemag.org/feature/contribinfo/prep/copyright.pdf>
Deposit Agreement, Tardis, University of Southampton,
<http://tardis.eprints.org/discussion/e-Prints%20Soton%20-deposit%20agreeement.doc>

Briefing 54

Choosing Alternative Licences For Digital Content

Introduction

Licences are a core part of intellectual property rights management. Licences allow the copyright holder to devolve specific rights to use, store, copy and disseminate work to another party.

Licences are typically restrictive, and acceptable uses of the licensed work are carefully delineated. However, copyright holders may wish to encourage widespread sharing and use of their work. In these situations an alternative licensing model may be appropriate.

Should I Choose an Alternative Licence?

To identify if an alternative licence is appropriate, the following questions should be addressed:

Do I wish to allow others to use and improve my work without restriction?
Do I wish for others to distribute my work without restriction?
Do I wish for others to create derivative works without restriction?

If the answer to these questions is yes, then an alternative licence agreement may be appropriate.

The developer has a number of options when planning to release their work: including creating their own licence or using an existing one. Both options have recognisable benefits. The bespoke licence allows the developer to define their own terms and conditions, while rejecting conditions with which they disagree. However, the creation of a licence can be a long process that may result in the licence containing legal loopholes.

An alternative is to use an existing 'copyleft' licence. Copyleft is an umbrella term that may refer to several similar licences. When choosing a licence, the developer must consider their own needs:

What rights do I want to give to a user?
What rights do I wish to retain for myself?
What development process will the software follow under this licence?
How do I wish to distribute my work?

Licensing Digital Works

Many authors argue the traditional copyright restrictions opposes the free distribution of digital works, whether they are text, graphics, or sound, on the Internet. This could be for a variety of reasons; the author wishes to spread their ideas; they wish to attract feedback on their work, etc. For these purposes, traditional copyright and public domain licences are unsuitable.

Creative Commons is a particularly popular licencing model available to all creative works. It is therefore usual to find it applied to Web sites, scholarship, music, film, photography and literature that are not traditionally covered by similar distribution schemes.

Similar to the GNU General Public Licence, Creative Commons licences allow the author to give the reader specific rights, such as permission to distribute the work and make derivatives, without resolving copyright of the original work. Though these freedoms encourage comparisons to the public domain, the Creative Commons licences are more restrictive, placing specific provisos upon the work:

The author must be credited.
Any derivative works must meet the licensing criteria established by the author. Derivatives cannot be place in public domain without permission.

To encourage authors to use Creative Commons [1] the developers provide an online multiple-choice form to choose the most appropriate licence model. At the time of writing, eleven variations exist that differ according to four different values (attribution, non-commercial, derivative works and share alike). These can be found on the Creative Commons licence agreement page [2]. In addition, a specifically developed metadata set is provided, allowing an individual to easily find and use work.

The Creative Commons licence may be suitable for individuals who wish their work to be seen and used by as many people as possible, but do not want to give away their rights. It may be unsuitable for businesses that wish to charge for access or restrict content in some way.

Dual-Licencing

Copyleft licences [3] such as the Creative Commons may promote free dissemination, however there is little encouragement for businesses that wish to make a profit to use them. The solution is to categorise your software under a dual-licence; one for free open-source distribution, the other for proprietary commercial distribution. This model allows a business to take contributions made in the open source version, apply it to their for-cost version and sell it at retail price.

References

Creative Commons,
<http://creativecommons.org/>
Creative Commons Licences,
<http://creativecommons.org/license/>
What is Copyleft?,
<http://www.gnu.org/copyleft/copyleft.html>

Briefing 55

Top 10 Tips For Web Sites

The Top 10 Tips

1 Ensure Your Web Site Complies With HTML Standards: You should ensure that your Web site complies with HTML standards. This will involve selecting the standard for your Web site (which currently should be either HTML 4.0 or XHTML 1.0); implementing publishing procedures which will ensure that your Web pages comply with the standard and quality assurance procedures to ensure that your publishing processes work correctly [1] [2].
2 Make Use Of CSS - And Ensure The CSS Is Compliant: You should make use of CSS (Cascading Style Sheets) to define the appearance of your HTML pages. You should seek to avoid use of HTML formatting elements (e.g. avoid spacer GIFs, <font> tags, etc.) - although it is recognised that use of tables for formatting may be necessary in order to address the poor support for CSS-positioning in some Web browsers. You should also ensure that your CSS is compliant with appropriate standards [3].
3 Provide A Search Facility For Your Web Site: You should provide a search facility for your Web site if it contains more than a few pages [4].
4 Ensure Your 404 Error Page Is Tailored: You should aim to ensure that the 404 error page for your Web site is not the default page but has been configured with appropriate branding, advice and links to appropriate resources, such as the search facility [5].
5 Have A URI Naming Policy For Your Web Site: You should ensure that you have a URI naming policy for your Web site [6].
6 Check Your Links - And Have a Link-Checking Policy: You should ensure that you check for broken links on your Web site. You should ensure that links work correctly when pages are created or updated. You should also ensure that you have a link checking policy which defines the frequency for checking links and your policy when broken links are detected [7].
7 Think About Accessibility: You should address the accessibility of your Web site from the initial planning stages. You should ensure that you carry out appropriate accessibility testing and that you have an accessibility policy [8].
8 Think About Usability: You should address the usability of your Web site from the initial planning stages. You should ensure that you carry out appropriate usability testing and that you have a usability policy.
9 Use Multiple Browsers For Checking: You should make use of several browsers for testing the accessibility, usability and functionality of your Web site. You should consider making use of mainstream browsers (Internet Explorer and FireFox) together with more specialist browsers such as Opera.
10 Implement QA Policies For Your Web Site: You should ensure that you have appropriate quality assurance procedures for your Web site [9] [10].

References

Compliance with HTML Standards, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-01/>
Deployment Of XHTML 1.0, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-35/>
Use Of Cascading Style Sheets (CSS), QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-34/>
Search Facilities For Your Web Site, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-08/>
404 Error Pages On Web Sites, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-06/>
URI Naming Conventions For Your Project Web Site, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-16/>
Approaches To Link Checking, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-07/>
Accessibility Testing, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-02/>
Summary of the QA Focus Methodology, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-30/>
Implementing Your Own QA, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-58/>

Briefing 56

Using Instant Messaging Software

About Instant Messaging

Instant messaging (IM) is growing in popularity as the Internet becomes more widely used in a social context. The popularity of IM in a social context is leading to consideration of its potential for work purposes in providing real time communications with colleagues and co-workers.

Popular IM applications include MSN Messenger, Yahoo Messenger and AOL Messenger [1]. In addition to these dedicated applications a number of Web-based services also provide instant messaging facilities within the Web site, such as YahooGroups [2]. The JISCMail list management service also provides a Web-based instant messaging facility [3].

The Benefits

Instant Messaging software can provide several benefits:

The immediacy provided by instant communications
Avoiding swamping list members with unnecessary messages
Various value-added features, such as sharing desktop applications

Instant messaging fans appreciate the immediacy of communications it provides, which can be particularly valuable when working on small-scale concrete tasks.

Possible Problems

There is a need to be aware of potential problems which can be encountered when using instant messaging software:

Need to install an appropriate IM client
Lack of interoperability across IM clients from different vendors
Dealing with interruptions
Lack of an archive of discussions, missing messages when away, etc.
Difficulties in following discussions when used by several people

Critics of instant messaging argue that, although IM may have a role to play for social purposes, for professional use email should be preferred.

Policies For Effective Use of Instant Messaging

Instant messaging may prove particularly useful when working with remote workers or if you are involved in project work with remote partners. However in order to make effective use of instant messaging tools there is a need to implement a policy governing its usage which addresses the problem areas described above.

Software:: You will have to select the IM software. Note you may find that users already have an ID for a particular IM application and may be reluctant to change. There are multi-protocol IM tools available, such as gaim [4] and IM+ [5] although you should be aware that these may have limited functionality. In addition to these desktop applications, there are also Web-based tools such as JWChat [6].
Usage:: You will need to define how instant messaging is to be used and how it will complement other communications channels, such as email.
Privacy, security, etc issues:: You will need to define a policy on dealing with interruptions, privacy and security issues.
It is important to note that different IM environments (e.g. Jabber and MSN) work in different ways and this can affect privacy issues.
Records:: You will need to define a policy on recording instant messaging discussions. Note that a number of IM clients have built-in message archiving capabilities.

As an example of a policy on use of instant messaging software see the policy produced for the QA Focus project [7] together with the QA Focus case study [8]. As an example of use of IM in an online meeting see the transcript and the accompanying guidelines at [9].

References

Instant Messenger FAQs, University of Liverpool,
<http://www.liv.ac.uk/CSD/helpdesk/faqs/instant/>
YahooGroups,
<http://groups.yahoo.com/>
DISCUSS Discussion Room at JISCMail, JISCMail,
<http://www.jiscmail.ac.uk/lists/discuss.html>
GAIM,
<http://gaim.sourceforge.net/>
IM+, Shape Services,
<http://www.shapeservices.de/eng/im/>
Jabber Web Chat, JWChat,
<http://jwchat.sourceforge.net/>
Policy on Instant Messaging, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/qa/policies/instant-messaging/>
Implementing A Communications Infrastructure, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-12/>
Approaches To Web Development: Online Discussion, Web Focus, UKOLN,
<http://www.ukoln.ac.uk/web-focus/events/online/VLS-aug-2001/>

Briefing 57

Accessibility Testing In Web Browsers

About This Document

This document provides advice on configuring popular Web browsers in order to ensure your Web site is widely accessible. The document covers Internet Explorer 7.0, Firefox 3 and Opera 9.6 running on Microsoft Windows.

Disabling JavaScript

Some browsers do not support JavaScript. Some organisations / individuals will disable JavaScript due to security concerns.

Browser	Technique
Internet Explorer	Select `Tools` menu and `Internet Options` option. Select the `Security` tab, choose the `Internet` icon choose the `Custom` level option. Scroll to the `Scripting` option and choose the `Disable` (or `Prompt`) option.
Firefox	Select `Tools` menu and `Options` option. Open `Content`, unselect the `Enable Javascript` option and select `OK`.
Opera	Select `File` menu and choose `Preferences` option. Choose the `Multimedia` option, disable JavaScript option and select `OK`.

Resizing Text

Some individuals will need to resize the display in order to read the information provided.

Browser	Technique
Internet Explorer	Select `View` menu and choose `Text Size` option.
Firefox	Select `View` menu and choose the `Zoom` option. Choose the option to Zoom in. Repeat using Zoom out
Opera	Select `View` menu and choose `Zoom` option. Then zoom by a factor or, say, 50% and 150%.

Disabling Images

Some people cannot see images and some may disable images for performance or privacy reasons.

Browser	Technique
Internet Explorer	Select `Tools` menu and `Internet Options` option. Uncheck the `Show pictures automatically` option.
Firefox	Select `View` menu and `Options` option. Open the `Content` tab, uncheck the `Load images automatically` tab and select `OK`.
Opera	Select `File` menu and choose `Preferences` option. Choose `Multimedia` option, select the `Show images` pull-down menu and choose the `Show no images` option and select `OK`.

Disabling Popup Windows

Some browsers and assistive technologies may not support pop-up windows. Individuals may disable pop-up windows due to their misuse by some commercial sites.

Browser	Technique
Internet Explorer	Select the `Tools` tab and `Pop-Up Blocker` option. Ensure that the `Pop-Up Blocker` option is selected.
Firefox	Select `Tools` menu and `Options` option. Select the `Content` tab and click on the `Block pop-up windows` option.
Opera	Select `File` menu and choose `Preferences` option. Choose `Windows` option in the `Pop-ups` pull-down menu and choose the `Refuse Pop-ups` option and select `OK`.

Systematic Testing

You should use the procedures in a systematic way: for example as part of a formal testing procedure in which specific tasks are carried out.

Use of Bookmarklets And FireFox Extensions

Bookmarklets are browser extension may extend the functionality of a browser. Many accessibility bookmarklets are available (known as Firefox Extensions for the Firefox browser). It is suggested that such tools are used in accessibility testing. See Interfaces To Web Testing Tools at <http://www.ariadne.ac.uk/issue34/web-focus/>

Briefing 58

Implementing Your Own QA

About This Document

This document describes how you can implement your own quality assurance policies and procedures to support your development work.

The QA Focus Methodology

The QA Focus methodology aims to ensure that IT development work produces services which are widely accessible and interoperable. It seeks to do this by developing a quality assurance framework which developers can make use of.

As described in the QA Focus briefing document "Summary of the QA Focus Methodology" [1] the QA Focus methodology is based on:

Documented policies on standards and best practices:: If the standards and best practices are not documented it will be difficult to ensure best practices are implemented, especially in light of staff turnover, changing environments, etc.
Documentation of the architecture used:: A description of the architecture is needed to ensure that the architecture used to implement the system is capable of complying with the standards.
Documented exceptions:: There may be occasions when deviations from standards may be allowed. Such deviations should be documented and responsibility for this agreed.
Systematic checking:: It is necessary to document systematic procedures for ensuring compliance with standards.
Audit trails:: It can be helpful to provide audit trails which can help spotting trends.

Implementing Your Own QA

The QA Focus briefing document "Summary of the QA Focus Methodology" [1] provides examples of implementing QA in the areas of Web standards and link checking. In this document we provide a template which can be used for any relevant aspect of IT development work.

QA Template

The following template can be used for developing your own QA framework.

Area:: The area covered by the QA (e.g. Web, software development, usability, ...)
Standards:: The standards which are relevant to the area and which you intend to make use of.
Best Practises:: The best practices which are relevant to the area and which you intend to make use of.
Architecture:: The architecture you intend to use.
Exceptions:: A summary of the exceptions to best practices and recommended standards and a justification for the exceptions.
Change Control:: A description of the responsibility for changing this QA document and the process for changing the policy.
Checking:: A description of the systematic checking procedures which will ensure that you are complying with the policies you have established.
Audit trail:: A description of audit trails (if any) which provide a record your compliance checking, in order to identify any trends.

As can be seen this QA template is simple and straightforward to use. The QA Focus methodology recognises the lack of resources which can hinder the deployment of more comprehensive QA frameworks and so has developed a more light-weight approach.

Examples

Examples of use of this approach can be found on the QA Focus Web site, which includes details of QA policies and procedures in the areas of Web standards [2], linking [3], usage statistics [4] and instant messaging [5].

References

Summary of the QA Focus Methodology, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-30/>
Policy On Web Standards, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/qa/policies/web/>
Policy On Linking, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/qa/policies/links/>
Policy On Usage Statistics, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/qa/policies/statistics/>
Policy On Instant Messaging, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/qa/policies/instant-messaging/>

Briefing 59

A URI Interface To Web Testing Tools

Background

As described in other QA Focus briefing document [1] [2] it is important to ensure that Web sites comply with standards and best practices in order to ensure that Web sites function correctly, to provide widespread access to resources and to provide interoperability. It is therefore important to check Web resources for compliance with standards such as HTML, CSS, accessibility guidelines, etc.

This document summarises different models for such testing tools and describes a model which is based on provided an interface to testing tools through a Web browsers address bar.

Models For Testing Tools

There are a variety of models for testing tools:

Desktop checking tools: Tools installed on a desktop computer, such as link-checking software. Such tools will be familiar to many Web developers.
Server-based tools: Tools installed on a server computer. Such tools normally require systems administrators privileges.

Although a variety of models are available, they all suffer from the lack of integration will the normal Web viewing and publishing process. There is a need to launch a new application or go to a new Web resource in order to perform the checking.

A URI Interface To Testing Tools

A URI interface to testing tools avoids the barrier on having to launch an application or move to a new Web page. With this approach if you wish to validate a page on your Web site you could simply append an argument (such as ,validate) in the URL bar when you are viewing the page. The page being viewed will then be submitted to a HTML validation service. This approach can be extended to recursive checking: appending ,rvalidate to a URI will validate pages beneath the current page.

This approach is illustrated. Note that this technique can be applied to a wide range of Web-based checking services including:

CSS compliance
Link checking
Automated accessibility checking
HTTP header analysis
...

This approach has been implemented on the QA Focus Web site (and on UKOLN's Web site). For a complete list of tools available append ,tools to any URL on the UKOLN Web site or see [3].

Implementing The URI Interface

This approach is implemented using a simple Web server redirect. This has the advantage of being implemented in a single place and being available for use by all visitors to the Web site.

For example to implement the ,validate URI tool the following line should be added to the Apache configuration file:

RewriteRule /(.*),validate http://validator.w3.org/check?uri=http://www.foo.ac.uk/$1 [R=301]

where www.foo.ac.uk should be replaced by the domain name of your Web server (note that the configuration details should be given in a single line).

This approach can also be implemented on a Microsoft IIS platform, as described at [3].

References

Compliance with HTML Standards, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-01/>
Use Of Cascading Style Sheets (CSS), QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-34/>
Web Site Validation and Auditing Tools, UKOLN,
<http://www.ukoln.ac.uk/site/tools/>

Briefing 60

Top Tips For Selecting Open Source Software

About this document

Performance and reliability are the principal criteria for selecting software. In most procurement exercises however, price is also a determining factor when comparing quotes from multiple vendors. Price comparisons do have a role, but usually not in terms of a simple comparison of purchase prices. Rather, price tends to arise when comparing "total cost of ownership" (TCO), which includes both the purchase price and ongoing costs for support (and licence renewal) over the real life span of the product. This document provides tips about selecting open source software.

The Top Tips

Consider The Reputation: Does the software have a good reputation for performance and reliability? Here, word of mouth reports from people whose opinion you trust is often key. Some open source software has a very good reputation in the industry, e.g. Apache Web server, GNU Compiler Collection (GCC), Linux, Samba, etc. You should be comparing "best of breed" open source software against its proprietary peers. Discussing your plans with someone with experience using open source software and an awareness of the packages you are proposing to use is vital.
Monitor Ongoing Effort: Is there clear evidence of ongoing effort to develop the open source software you are considering? Has there been recent work to fix bugs and meet user needs? Active projects usually have regularly updated web pages and busy development email lists. They usually encourage the participation of those who use the software in its further development. If everything is quiet on the development front, it might be that work has been suspended or even stopped.
Look At Support For Standards And Interoperability: Choose software which implements open standards. Interoperability with other software is an important way of getting more from your investment. Good software does not reinvent the wheel, or force you to learn new languages or complex data formats.
Is There Support From The User Community?: Does the project have an active support community ready to answer your questions concerning deployment? Look at the project's mailing list archive, if available. If you post a message to the list and receive a reasonably prompt and helpful reply, this may be a sign that there is an active community of users out there ready to help. Good practice suggests that if you wish to avail yourself of such support, you should also be willing to provide support for other members of the community when you are able.
Is Commercial Support Available?: Third party commercial support is available from a diversity of companies, ranging from large corporations such as IBM and Sun Microsystems, to specialist open source organizations such as Red Hat and MySQL, to local firms and independent contractors. Commercial support is most commonly available for more widely used products or from specialist companies who will support any product within their particular specialism.
Check Versions: When was the last stable version of the software released? Virtually no software, proprietary or open source, is completely bug free. If there is an active development community, newly discovered bugs will be fixed and patches to the software or a new version will be released. For enterprise use, you need the most recent stable release of the software, be aware that there may have been many more recent releases in the unstable branch of development. There is, of course, always the option of fixing bugs yourself, since the source code of the software will be available to you. But that rather depends on your (or your team's) skill set and time commitments.
Think Carefully About Version 1.0: Open source projects usually follow the "release early and often" motto. While in development they may have very low version numbers. Typically a product needs to reach its 1.0 release prior to being considered for enterprise use. (This is not to say that many pre-"1.0" versions of software are not very good indeed, e.g. Mozilla's 0.8 release of its Firefox browser).
Check The Documentation: Open source software projects may lag behind in their documentation for end users, but they are typically very good with their development documentation. You should be able to trace a clear history of bug fixes, feature changes, etc. This may provide the best insight into whether the product, at its current point in development, is fit for your purposes.
Do You Have The Required Skill Set?: Consider the skill set of yourself and your colleagues. Do you have the appropriate skills to deploy and maintain this software? If not, what training plan will you put in place to match your skills to the task? Remember, this is not simply true for open source software, but also for proprietary software. These training costs should be included when comparing TCOs for different products.
What Licence Is Available?: Arguably, open source software is as much about the license as it is about the development methodology. Read the licence. Well-known licenses such as the General Public License (GPL) and the Lesser General Public License (LGPL) have well defined conditions for your contribution of code to the ongoing development of the software or the incorporation of the code into other packages. If you are not familiar with these licenses or with the one used by the software you are considering, take the time to clarify conditions of use.
What Functionality Does The Software Provide?: Many open source products are generalist and must be specialised before use. Generally speaking the more effort required to specialise a product, the greater is its generality. A more narrowly focused product will reduce the effort require to deploy it, but may lack flexibility. An example of the former is GNU Compiler Collection (GCC), and an example of the latter might be Evolution email client, which works well "out of the box" but is only suitable for the narrow range of tasks for which it was intended.

Further Information

The Open Source Software Definition,
<http://www.opensource.org/docs/definition.php> which sets out the distribution terms for software to count as open source.
The Free Software Definition,
<http://www.gnu.org/philosophy/free-sw.html> which clarifies the sense of "free" that relates to the freedom to run, distribute and change the software.
The Cathedral and the Bazaar,
<http://www.catb.org/~esr/writings/cathedral-bazaar/> the classic text on open source development methodologies.
Open Sources: Voices from the Open Source Revolution
<http://www.oreilly.com/catalog/opensources/book/toc.html> with essays from many of the important figures in the free software and open source movements.
SourceForge.net,
<http://sourceforge.net/> which is a repository of thousands of open source projects.

Acknowledgements

This document was written by Randy Metcalfe of OSS Watch. OSS Watch is the open source software advisory service for UK higher and further education. It provides neutral and authoritative guidance on free and open source software, and about related open standards.

The OSS Watch Web site ia available at http://www.oss-watch.ac.uk/.

Briefing 61

Deployment Of Software Into Service

Background

The start of your project will involve a great deal of planning and work scheduling. If you will be developing software, this is also the best time to consider and plan for its long-term future and viability. Decisions on software development made in the early stages of a project are important as they will often govern the options open to you for deployment beyond the life of the project. Although some choices may be influenced by the current technical environment of your institution, early consideration of a range of deployment issues will allow the possibility of a greater number of hosting options at the end of project, so ensuring continued existence of the software you have developed, and long-term access to it.

Careful choices will also reduce the cost of the work required for deployment, and allow you to minimize the portion of your budget that you have to allocate to the Service Provider.

Choice of Platform

If possible, software should be developed on the same platform that will eventually be used for service delivery. Microsoft Windows and Unix (especially Solaris) servers are the main options.

Porting software developed on one platform to another may not be straightforward, even if the chosen application software is claimed to run on both platforms. Proven technical solutions are preferred - do you have examples where your chosen application software has been used on both platforms?

Development Environment

Software and Licensing Issues

If software licenses are required by the Service Provider, these must be available at a cost within the service budget. Be aware of licensing conditions: a Service Provider may require a commercial license if a charge is to be made for the service, whereas the project may be able to use an educational license. The cost of the various types of license may vary.

Care is also needed when choosing software that is free at the point of use to project staff, such as a University site licence for a commercial database system. Even though the project itself may incur no additional costs, licences could be prohibitively expensive for the Service Provider.

Consider the use of open source software [1] to avoid most licence problems! Good quality open source software can greatly reduce the cost of software development. Developers should be aware, however, that some 'open source' software is poorly written, inadequately documented, and entirely unsupported. Be aware that the costs of ongoing software maintenance, often undertaken by staff outside the original project, should also be factored in.

Best Practice

Good programming practice and documentation is very important. Well-written and structured software with comprehensive documentation will ease transition to a service environment and aid the work of the Service Provider [2]. It is better for a project to recruit a good engineer used to working in a professional development environment, than to recruit purely on the basis of specific technical skills. Also, try to code in languages commonly adopted in your application area: for example Java or Perl for Web programming. You can write Web applications in Fortran, but don't.

If possible a modular architecture is best. It will maximise the options for transfer to a Service Provider and also any future development. For example, if one application were used for a Web user interface and another for a database back end then, provided these communicate using open standards (Z39.50, standard SQL, for example), Web Services might be added to the service at a future date. A service built with a fully integrated single package of components that use proprietary native protocols might have to be scrapped and rebuilt to satisfy even fairly minor new requirements.

Use of Open Standards [3] should ensure portability, but there will still need to be technical structures supporting their use and deployment, whether in a project or service environment. You will need to document all the technical layers that need to be reproduced by the Service Provider in order for your software to run. Open standards can also give flexibility; for example the project and the service provider do not necessarily need to use the same SQL database, provided the standard is followed.

Usability

Be aware of your intended user base. Ensure that any user interface developed during the project has been through usability tests and allow time for any feedback to be incorporated into the final design. A well-designed interface will mean less support calls for the Service Provider.

When designing your user interface remember that there are legal requirements to fulfil with regard to disability access which the Service Provider will need to be satisfied are met. The JISC TechDis [4] service provides information and advice. You may wish to consider provision of user documentation and training documentation in support of the service, which the Service Provider could use and make available.

Monitoring & Auditing

Comprehensive error reporting should be a feature of the deployed application. This will aid the Service Provider in identifying and solving problems. You should consider building comprehensive error reporting mechanisms into your software from the beginning, along with various mechanisms for escalating the severity of reported errors that threaten the viability of the service. These may range from simply logging errors to a file, through to emailing key personnel.

Services must be monitored. It should be possible to use a simple HTTP request (or equivalent for non-Web interfaces) to test the service is available and running, without requiring a multi-step process (such as log in, initiate session and run a search).

Logging is crucial for services, especially subscription services where customers need to monitor usage to assess value for money. Project COUNTER [5] defines best practice in this area. If project staff are still available, the Service Provider will then be able to provide you with logging information and potentially seek your input on future activity and development.

Authentication

Authentication and authorisation should be flexible since requirements are subject to change. Enable the service provider to execute an external script, or at least write their own module or object, rather than embedding the authentication mechanism in the user interface code.

Machine to machine connections

Where the product makes use of external middleware services (an example being for OpenURL support), ensure these are totally configurable by the service provider. Configuration files are good; but the ability to add modules or objects for these features is better.

Legal Issues

Although not a technical consideration, it is important and worth emphasising that the Service Provider will require that all copyright and IPR issues be clarified. Where software has been developed, does the institution at which project staff work have any IPR guidelines that must be followed? What provision is needed to allow the Service Provider to make changes to the software? Is a formal agreement between the project institution and the Service Provider needed?

Service Environment

If you have identified where your software could be hosted then make early contact with the Service Provider to discuss costs and any constraints that may arise in deployment.

The Service Provider will need to be confident that your application will be stable, will scale, and will perform acceptably in response to user demand. If this is not the case then the application will eventually bottleneck and tie up machine resources unproductively which will lead to unresponsiveness.

You should ensure that the application is stress tested by an appropriate number of users issuing a representative number of service requests. There are also several tools available to stress test an application, but a prerequisite to this step is that the project team should be aware of their intended user base and the anticipated number of users and requests. The project team should also be aware of project and service machine architectures as divergence in architecture will affect the viability of any stress testing metrics generated. The Service Provider will want estimates of memory and processor use scaled by the number of simultaneous users. Performance and scalability will remain unresolved issues unless the project software can be tested in a service environment. If this is the case it is especially important to stick to proven technical solutions. You should discuss stress-testing results with your Service Provider as soon as possible.

Adopting best practices is a good start to ensuring that your application will be stable. The discipline of producing well-written and properly documented code is one safeguard against the generation of bugs within code.

If there are likely to be service updates you will need to consider the procedures involved and detail how the new data will be made available and incorporated into the service. Service Providers will generally wish to store two copies of databases that require updates; one being used for service with the other instance being used for updates and testing. Updated databases will also require frequent backups whilst static data requires only one copy of the database without regular backups. Consider splitting large data sets into separate segments: a portion that is static (for example archive data added prior to 2001) and a smaller portion that is updated (data added since 2001). Also, aim to keep data and application software as separate as possible. Again, this will aid a backup regime in a service environment.

You should anticipate that the Service Provider may need to make changes to your software. This may be due to possible technical conflicts with other services hosted by the Service Provider, or may be due to their implementation policy or house style. Again, early contact with a possible Service Provider will highlight these issues and help avoid potential difficulties. Also consider if project staff will be available for referrals of errors or omissions in functionality. If not, you will need to allow the Service Provider to make changes to your software.

If further development of the software beyond the project is feasible you should agree a development schedule and a timetable for transfer to production, as provision of a continued and stable service will be of prime importance to the Service Provider. Major changes to a user interface will also have implications for support and user documentation. If no continued development is planned, the Service Provider may still wish to introduce bug fixes or new versions of any software you have used. Again, good documentation and well-documented code will ensure that problems are minimised.

You should consider under what circumstances your software should be withdrawn and cease to be made available through a Service Provider. If you would expect to be involved in the decision to withdraw the service then contact with project personnel will need to be maintained, or you will need to provide guidance at time of transfer to service about the possible lifetime of the hosting agreement.

Moving Your Software

Allow time before the end of the project to work with the Service Provider. The availability and expertise of project staff will influence the success of moving to service deployment and ultimately the associated costs.

A complete handover of the software without good contact with the project team and without support may well cause problems and will also take longer. This is particularly true if the application contains technologies unfamiliar to the Service Provider. The project team should be prepared to assist the Service Provider in areas where it has specialist expertise and, if possible, factor in continued access to project personnel beyond the end of the project.

Complete and full documentation detailing the necessary steps for installation and deployment of your software and the service architecture, will aid an optimum transition to hosting by a Service Provider. The Service Provider may not have exactly the same understanding and skill set as the project team itself, and will require explicit instructions. Alternatively, the Service Provider may request help from the project team in identifying a particular aspect of the service architecture that could be replaced with a preferred and known component. Deploying technologies that are unfamiliar to the Service Provider will reduce their responsiveness and effectiveness in handling problems with the application.

Consideration should be given to development of a test bed and test scripts that will allow the Service Provider to confirm correct operation of your software once installed in the service environment.

Things Not To Worry About

Backup and disaster recovery procedures are the responsibility of the Service Provider; do not waste project time on defining specific procedures for the service (but do, of course, back up your project work for your own benefit).

So You Want To Be Different...

Project activity is by its nature about exploring possibilities to develop new service functionality, and you may choose, or need, to utilise emerging tools and technologies. This approach to development and the software it produces may not fit comfortably with the desire of the Service Provider. A Service Provider wants software of service quality based on known solutions that ensures good use of resources and sustainability in a service environment. It is recognised that these opposing drives may be inevitable and that projects must be allowed to explore new technologies and methods, even at the expense of placing additional demands on Service Providers to resolve the problems of deployment.

If relatively immature technologies are being used it is very important that modular development procedures are used as much as possible. Where software has been developed in a modular fashion it will often be relatively straightforward to replace individual components; for example, to change to a different database application or servlet container. During the development process this means competing technologies, which may be at different stages of maturity, can be benchmarked against each other. At the deployment stage it means the option that provides best 'service quality' can be adopted.

Whatever choice of software environment is made, it is always wise to follow best practise by producing well-written and documented code.

It is worth stressing the benefits of contacting possible Service Providers to explore options at the start of a project: they too may be considering future strategy and it is possible that both your and their plans might benefit from co-operation.

References

Top Tips For Selecting Open Source Software, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-60/>
Software Code Development, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-13/>
What Are Open Standards?, QA Focus, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-11/>
TechDis,
<http://www.techdis.ac.uk>
Project COUNTER,
<http://www.projectcounter.org>

Briefing 62

Digitising Data For Preservation

Background

Digitisation is a production process. Large numbers of analogue items, such as documents, images, audio and video recordings, are captured and transformed into the digital masters that a project will subsequently work with. Understanding the many variables and tasks in this process - for example the method of capturing digital images in a collection (scanning or digital photography) and the conversion processes performed (resizing, decreasing bit depth, convert file formats, etc.) - is vital if the results are to remain consistent and reliable. By documenting the workflow of digitisation, a life history can be built-up for each digitised item. This information is an important way of recording decisions, tracking problems and helping to maintain consistency and give users confidence in the quality of your work.

What to Record

What action was performed at a specific stage? Identify the action performed. For example, resizing an image.
Why was the action performed? Establish the reason that a change was made. For example, a photograph was resized to meet pre-agreed image standards.
When was the action performed? Indicate the specific date the action was performed. This will enable project development to be tracked through the system.
How was the action performed? Ascertain the method used to perform the action. A description may include the application in use, the machine ID, or the operating system.
Who performed the action? Identify the individual responsible for the action. This enables actions to be tracked and identify similar problems in related data.

By recording the answers to these five questions at each stage of the digitisation process, the progress of each item can be tracked, providing a detailed breakdown of its history. This is particularly useful for tracking errors and locating similar problems in other items. The actual digitisation of an item is clearly the key point in the workflow, and therefore formal capture metadata (metadata about the actual digitisation of the item) is particularly important.

Where to Record the Information

Where possible, select an existing schema with a binding to XML:

TEI (Text Encoding Initiative) and EAD (Encoded Archival Description) for textual documents
NISO Z39.87 for digital still images.
SMIL (Synchronized Multimedia Integration Language), MPEG-7 or the Library of Congress' METS A/V extension for Audio/Video.

Quality Assurance

To check your XML document for errors, QA techniques should be applied:

Validate XML against your schema or an XML parser
Check that free text entries follow local rules and style guidelines

Further Information

Encoded Archival Description,
<http://www.loc.gov/ead/>
A Metadata Primer,
<http://www.cmswatch.com/Features/TopicWatch/FeaturedTopic/?feature_id=85>
Dublin Core Metadata Initiative,
<http://dublincore.org/>
MARC Standards,
http://www.loc.gov/marc/>
MPEG- 7 Standard,
<http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm>
Synchronized Multimedia,
<http://www.w3.org/AudioVideo/>
TEI Consortium,
<http://www.tei-c.org/>
Three SGML metadata formats: TEI, EAD, and CIMI,
<http://hosted.ukoln.ac.uk/biblink/wp1/sgml/tei.rtf>
Z39.87: Technical metadata for still digital images,
<http://www.niso.org/standards/resources/Z39_87_trial_use.pdf>

Briefing 63

Choosing a Metadata Standard For Resource Discovery

Background

Resource discovery metadata is an essential part of any digital resource. If resources are to be retrieved and understood in the distributed environment of the World Wide Web, they must be described in a consistent, structured manner suitable for processing by computer software. There are now many formal standards. They range from simple to rich formats, from the loosely structured to the highly structured, and from proprietary, emerging standards, to international standards.

There is no set decision-making procedure to follow but here are some factors that should normally be considered:

Purpose of metadata: A well-articulated definition of purposes at the outset can act as a benchmark against which to compare standards. Metadata may be for:

Choosing the correct file types and establishing required sizes
Retrieval: Can I find the resource?
Identification: Can I distinguish the resource from other similar resources (e.g. similar titles, or other editions or versions)?
Access: Can I use the resource (e.g. are there legal restrictions on access and usage and is it in a format I can handle)?

Attributes of resource: It is important that you also identify your resource type (e.g. text, image), its domain of origin (e.g. library, archive or museum), subject (e.g. visual arts, history) and the specific features that are essential to an understanding of it. Datasets, digital texts, images and multimedia objects, for instance, clearly have very different attributes. Does your resource have pagination or is it three-dimensional? Was it born digital or does it have a hard-copy source? Which attributes will the user need to know to understand the resource?

Design of standard: Metadata standards have generally been developed in response to the needs of specific resource types, domains or subjects. Therefore, once you know the type, domain and broad subject of your resource, you should be able to draw up a shortlist of likely standards. Here are some of the better-known ones:

Anglo-American Cataloguing Rules (AACR2) Library resources
See <http://www.aacr2.org/>
Data Documentation Initiative (DDI) Social sciences, datasets
See <http://www.icpsr.umich.edu/DDI/>
Dublin Core (DC) All domains, resource types, and subjects
See <http://dublincore.org/>
Encoded Archival Description (EAD) Archives.
See <http://www.loc.gov/ead/>
ISAD (G) Guidelines for the preparation of archival descriptions
See <http://www.hmc.gov.uk/icacds/eng/ISAD(G).pdf>
MARC 21 - Libraries, bibliographic records
See <http://www.loc.gov/marc/>
RSLP Collection-level description Collections of all subjects, domains and types.
See <http://www.ukoln.ac.uk/metadata/cld/>
SPECTRUM Museum objects
See <http://www.mda.org.uk/spectrum.htm>
Text Encoding Initiative (TEI) Digital texts
See <http://www.tei-c.org/>
VRA Core 3.0 Visual art images
See <http://www.vraweb.org/vracore3.htm>

The key attributes of your resource can be matched against each standard in turn to find the best fit. Is there a dedicated element for each attribute? Are the categories of information relevant and at a suitable level of detail?

Granularity: At this point it is worth considering whether your metadata should (as is usual) be created at the level of the text, image or other such item or at collection level. Collection-level description may be provided where item-level metadata is not feasible or as an additional layer providing an overview of the resource. This could be valuable for large-scale digitisation projects or portals where item-level searching may retrieve an unmanageable number of 'hits'. Digital reproductions may be grouped like their real world sources e.g. by subject or provenance - or be assigned to multiple 'virtual collections'. The RSLP Collection Level Description is emerging as the leading format in this area.

Interoperability: It is important, wherever possible, to choose one of the leading standards (such as those listed above) from within your subject community or domain. This should help to make your resource accessible beyond the confines of your own project. Metadata that is in a recognisable common format may be harvested by subject or domain-wide portals and cross-searched with resources from many other institutions. In-house standards may be tailored to your precise needs but are unlikely to be compatible with other standards and should be used only where nothing suitable already exists. If your over-riding need is for interoperability across all domains or subjects, Dublin Core may be the most suitable standard but it may lack the richness required for other purposes. Care should be taken to ensure that in-house standards at least map to Dublin Core or one of the DC Application profiles.

Support: Using a standard that is well supported by a leading institution can also bring cost benefits. Implementation guidance, user guidance, examples, XML/RDF schemas, crosswalks, multi-lingual capacity, and software tools may pre-exist, thus easing the process of development, customisation and update.

Growth: Consider too whether the standard is capable of further development? Are there regular working groups and workshops devoted to the task?

Extensibility: Also, does the standard permit the inclusion of data elements drawn from other schemas and the description of new object types? It may be necessary to 'mix and match' elements from more than one standard.

Reputation: Funding bodies will be familiar with established, international standards - something, perhaps, to remember when applying for digitisation grants.

Ease of use: Be aware that the required level of expertise can vary greatly between standards. AACR2 and MARC 21, for instance, may produce rich bibliographic description but require the learning of rules. The simpler Dublin Core may allow creators to produce their own metadata records with no extensive training.

Existing experience: Have staff at your organisation used the metadata standard before? If so, the implementation time may be reduced.

Summary

There is no single standard that is best for all circumstances. Each is designed to meet a need and has its own strengths and weaknesses. Start by considering the circumstances of the individual digital project and identify the need(s) or purpose(s) that the metadata will need to satisfy. Once that is done, one can evaluate rival metadata schemas and find the best match. A trade-off will normally have to be made between the priorities listed above.

Summary

Further Information

Digital libraries: metadata resources, Sophie Felfoldi, International Federation of Library Associations and Institutions.
<http://www.ifla.org/II/metadata.htm>
Application profiles: mixing and matching metadata schemas, Rachel Heery and Manjula Patel. In: Ariadne, No. 25, 24 September 2000,
<http://www.ariadne.ac.uk/issue25/app-profiles/>

Briefing 64

Metadata And Subject Searching

Introduction

Digital collections are only likely to make an impact on the Web if they are presented in such a way that users can retrieve their component parts quickly and easily. This is true even if they have been well selected, digitised to a suitable standard and have appropriate metadata formats. Subject-based access to the collection through searching and/or browsing a tree-like structure can greatly enhance the value of your resource.

Subject Access - Some Options

Subject-based access can be provided in several ways:

Keywords: A simple but crude method is to anticipate the terms that an unguided searcher might intuitively choose and insert them into a keyword field within relevant records. For instance, the text of Ten days that shook the world [1], a classic narrative of the events of 1917, is more likely to be retrieved if the keywords Russian Revolution are added by the cataloguer (based on his/her analysis of the resource and subject knowledge) and if the keyword field is included in the search. In the absence of an agreed vocabulary, however, variant spellings (labor versus labour), and synonyms or near synonyms (Marxist versus Communist) that distort retrieval are likely to proliferate.

Thesauri and subject schemes: Controlled vocabularies, known as thesauri, can prevent inconsistent description and their use is recommended. They define preferred terms and their spelling. If the thesaurus structure is shown on the search interface, users may be guided through broader-term, narrower-term and associated-term relationships to choose the most appropriate keyword with which to search. Take care to choose a vocabulary appropriate to the scope of your resource. A broad and general collection might require a correspondingly universal vocabulary, such as the Library of Congress Subject Headings (>LCSH) [2]. A subject-specific vocabulary, such as the Getty Art and Architecture Thesaurus (AAT) [3], may provide a more limited but detailed range of terms appropriate for a tightly focused collection.

Classification schemes: Keywords and thesauri are primarily aids to searching but browsing can often be a more rewarding approach - particularly for users new to a given subject area. Thesauri are not always structured ideally for browsing as when related or narrower terms are listed alphabetically rather by topical proximity. Truly effective browsing requires the use of a subject classification scheme. A classification scheme arranges resources into a hierarchy on the basis of their subject but differs from a thesaurus in using a sophisticated alphanumeric notation to ensure that related subjects will be displayed in close, browsable, proximity. A well-designed classification scheme should present a navigable continuum of topics from one broad subject area to another and in this way guide the user related items that might otherwise be missed, as in this example from the Dewey Decimal Classification (DDC) [4].

700	Arts, fine and decorative
740	Drawing and decorative arts
745	Decorative arts
745.6	Calligraphy, heraldic design, illumination
745.66	Heraldic design

The notation does not necessarily have to be displayed on screen, however. The subject terms, rather than their respective numbers, may mean more to the user. Another tip is to assign multiple classification numbers to any item that crosses subjects. Digital items can have several 'virtual' locations, unlike a book, which is tied to a single position on a shelf.

Keywords, thesauri and classification can be used in combination or individually.

Choosing a Classification Scheme

The most important consideration when choosing a classification scheme is to select the one that best fits the subject, scope and intended audience of your resource.

Universal classification schemes: These are particularly appropriate where collections and their audiences span continents, subjects and languages. Dewey Decimal Classification (DDC) [5], for instance, is the most widely recognised scheme worldwide, whilst UDC (Universal Decimal Classification) [6] is predominant in Europe and Asia. Well-established schemes of this sort are most likely to have user-friendly online implementation tools.

National or subject-specific schemes: More specific collections are usually best served by schemes tailored to a single country (e.g. BC Nederlandse Basisclassificatie) [7], language, or subject (e.g. NLM National Library of Medicine) [8]. If nothing suitable exists, a scheme can be created in-house.

Homegrown schemes: Project-specific schemes can be flexible, easy to change and suited wholly to one's own needs so that there are no empty categories or illogical subject groupings to hinder browsing. However, the development process is costly, time-consuming and requires expert subject-knowledge. Users are sure to be unfamiliar with your categories and, perhaps worst of all, such schemes are unlikely to be interoperable with the broader information world and will hinder wider cross searching. They should be regarded very much as a last resort.

Adapting an existing scheme: A better approach is normally to adapt an existing scheme by rearranging empty classes, raising or lowering branches of the hierarchy, renaming captions, or extending the scheme. Be aware, though, that recurring notation may be found within a scheme at its various hierarchical levels or the scheme might officially be modified over time, both of which can lead to conflict between the official and customised versions. Take care to document your changes to ensure consistency through the lifetime of the project. Some well-known Internet search-services (e.g. Yahoo!) [9] have developed their own classifications but there is no guarantee that they will remain stable or even survive into the medium term.

Double classification: It may be worthwhile classifying your resource using a universal scheme for cross-searching and interoperability in the wider information environment and at the same time using a more focused scheme for use within the context of your own Web site. Cost is likely to be an issue that underpins all of these decisions. For instance, the scheme you wish to use may be freely available for use on the Internet or alternatively you may need to pay for a licence.

References

Ten Days That Shook the World, Reed, J. New York: Boni & Liveright, 1922; Bartleby.com, 2000,
<http://www.bartleby.com/79/>
Library of Congress Authorities,
<http://authorities.loc.gov/>
Art & Architecture Thesaurus Online,
<http://www.getty.edu/research/conducting_research/vocabularies/aat/>
Dewey Decimal Classification and Relative Index, Dewey, M. in Joan S. Mitchell et al (ed.), 4 vols, (Dublin, Ohio: OCLC, 2003), Vol. 3, p. 610
Dewey Decimal Classification,
<http://www.oclc.org/dewey/>
Universal Decimal Classification,
<http://www.udcc.org/>
Nederlandse Basisclassificatie (Dutch Basic Classification),
<http://www.kb.nl/dutchess/>
National Library of Medicine
<http://wwwcf.nlm.nih.gov/class/>
Yahoo!
<http://www.yahoo.com/>

Further Information

DESIRE Information Gateways Handbook, Cross, P. et al
<http://www.desire.org/handbook/2-5.html>
Controlled Vocabularies, Thesauri and Classification Systems Available in the WWW, DC Subject (1996 and ongoing), Koch, T.
<http://www.ub2.lu.se/metadata/subject-help.html>
The Role of Classification Schemes in Internet Resource Description and Discovery, Koch, T. & Day, M.
<http://www.ukoln.ac.uk/metadata/desire/classification/>
Beyond Bookmarks: Schemes For Organising The Web, McKiernan, G.
<http://www.iastate.edu/~CYBERSTACKS/CTW.htm>

Briefing 65

Audio For Low-Bandwidth Environments

Background

Audio quality is surprisingly difficult to predict in a digital environment. Quality and file size can depend upon a range of factors, including vocal type, encoding method and file format. This document provides guidelines on the most effective method of handling audio.

Factors To Consider

When creating content for the Internet it is important to consider the hardware the target audience will be using. Although the number of users with a broadband connection is growing, the majority of Internet users utilise a dial-up connection to access the Internet, limiting them to a theoretical 56kbps (kilobytes per second). To cater for these users, it is useful to offer smaller files that can be downloaded faster.

The file size and quality of digital audio is dependent upon two factors:

File format
Type of audio

By understanding how these three factors contribute to the actual file size, it is possible to create digital audio that requires less bandwidth, but provides sufficient quality to be understood.

File Format

File format denotes the structure and capabilities of digital audio. When choosing an audio format for Internet distribution, a lossy format that encodes using a variable bit-rate is recommended. Streaming support is also useful for delivering audio data over a sustained period without the need for an initial download. These formats use mathematical calculations to remove superfluous data and compress it into a smaller file size. Several popular formats exist, many of which are household names. MP3 (MPEG Audio Layer III) is popular for Internet radio and non-commercial use. Larger organisations, such as the BBC, use Real Audio (RA) or Windows Media Audio (WMA), based upon its digital rights support. Table 1 shows a few of the options that are available.

Format	Compression	Streaming	Bit-rate
MP3	Lossy	Yes	Variable
Mp3PRO	Lossy	Yes	Variable
Ogg Vorbis	Lossy	Yes	Variable
RealAudio	Lossy	Yes	Variable
Windows Media Audio	Lossy	Yes	Variable

Figure 1: File Formats Suitable For Low-Bandwidth Delivery

Once recorded audio is saved in a lossy format, it is wise to listen to the audio data to ensure it is audible and that essential information has been retained.

Finally, it is recommended that a variable bit-rate is used. For speech, this will usually vary between 8 and 32kbp as needed, adjusting the variable rate accordingly if incidental music occurs during a presentation.

Choosing An Appropriate Encoding Method

The audio quality required, in terms of bit-rate, to record audio data is influenced significantly by the type of audio that you wish to record: music or voice.

Music - Music data is commonly transmitted in stereo and will vary significantly from one second to the next. A sampling rate of 32-64khz is appropriate for low-bandwidth environments, allowing users to listen to streamed audio without significant disruption to other tasks.
Voice - Voice is less demanding than music data. The human voice has a limited range, usually reaching 3-4khz. Therefore, an 8-15khz sampling rate and 8-32kbps bit-rate is enough to maintain good quality. Mono audio, transmitted through a single speaker, will also be suitable for most purposes. Common audio players 'double' the audio content, transmitting mono channel data as stereo audio through two speakers. This is equivalent to a short-range or AM radio, providing a good indication of the audio quality you can expect. By using these methods, the user can reduce file size for voice content by 60%+ in comparison to recording at a higher bit-rate without loss of quality.

Assessing Quality Of Audio Data

The creation of audio data for low-bandwidth environments does not necessitate a significant loss in quality. The audio should remain audible in its compressed state. Specific checks may include the following questions:

Can listeners understand voices in recording?
Can listeners hear quiet sounds?
Can listener hear loud sounds without distortion?

Further Information

IaWiki: MP3, <http://www.infoanarchy.org/wiki/wiki.pl?MP3>
MP3Pro Zone, <http://www.mp3prozone.com/>
Measuring Audio Quality, <http://www.itworld.com/AppDev/1469/NWW1204revside4/>
Ogg Vorbis, <http://www.vorbis.com/>
PC Recording, <http://www.pcrecording.com/>
Real Audio: Producing Music, <http://service.real.com/help/library/guides/production/htmfiles/audio.htm>
Xorys' MP3 FAQ, <http://webhome.idirect.com/~nuzhathl/mp3-faq.html>

Briefing 66

Producing And Improving The Quality Of Digitised Images

Introduction

To produce high-quality digital images you should follow certain rules to ensure that the image quality is sufficient for the purpose. This document presents guidance on digitising and improving image quality when producing a project Web site.

Choose Suitable Source Material

Quality scans start with quality originals - high-contrast photos and crisp B&W line art will produce the best-printed results. Muddy photos and light-coloured line art can be compensated for, but the results will never be as good as with high-quality originals. The use of bad photos, damaged drawings, or tear sheets - pages that have been torn from books, brochures, and magazines - will have a detrimental effect upon the resultant digital copy. If multiple copies of a single image exist, it is advisable to choose the one that has the highest quality.

Scan at a Suitable Resolution

It is often difficult to improve scan quality at a later stage. It is therefore wise to scan the source according to consistent, pre-defined specifications. Criteria should be based upon the type of material being scanned and the intended use. Table 1 indicates the minimum quality that projects should choose:

Use	Type	Dots Per Inch (dpi)
Professional	Text	200
Professional	Graphics	600
Non-professional	Text	150
Non-professional	Graphics	300

Table 1: Guidelines To Scanning Source Documents

Since most scans require subsequent processing, (e.g. rotate an image to align it correctly) that will degrade image quality, it is advisable to work at a higher resolution and resize the scans later.

Once the image has been scanned and saved to in an appropriate file format, measures should be taken to improve the image quality.

Straighten Images

For best results, an image should lay with its sides parallel to the edge of the scanner glass. Although it is possible to straighten images that have been incorrectly digitised, it may introduce unnecessary distortion of the digital image.

Sharpen the Image

To reduce the amount of subtle blur (or 'fuzziness') and improve visual quality, processing tools may be used to sharpen, smooth, improve the contrast level or perform gamma correction. Most professional image editing software contains filters that perform this function automatically.

Correct Obvious Faults

Scanned images are often affected by many problems. Software tools can be used to remove the most common faults:

Remove "red-eye" from a picture.
Correct the colour balance
Repair a tear or crease in a picture, or
Remove a moiré pattern from a picture scanned from a book.

Be careful you do not apply the same effect twice. This can create unusual effects that distract the observer when viewer the picture.

Further Information

JPEG Image Compression FAQ, part 1/2,
<http://www.faqs.org/faqs/jpeg-faq/part1/preamble.html>
How to Design Like a Pro,
<http://www.prographicsllc.com/Digi/Scans.html>
Scanning 101: Getting Great-Looking Line Art from Not-So-Great Sources,
<http://www.creativepro.com/story/feature/6718.html>

Briefing 67

Implementing and Improving Structural Markup

Background

Digital text has existed in one form or another since the 1960s. Many computer users take for granted that they can quickly write a letter without restriction or technical considerations. This document provides advice for improving the quality of structural mark-up, emphasising the importance of good documentation, use of recognised standards and providing mappings to these standards.

Why Should I Use Structural Mark-Up?

The use of structural mark-up can provide many organizational benefits:

Easier to maintain - allows modification to document structure without the need to directly edit the content. An entire site can be updated by changing a single CSS file.
Code reduction - by abstracting the structural element to a separate file, the structural information can be used by multiple documents, reducing the amount of code required.
Portable - The creation of well-formed documents will ensure the document will display correctly on browsers/viewers that support the markup language.
Interoperable - Structural data can be utilized to access information stored in a third party database.

Improving The Quality Of Structural Mark-Up

For organisations that already utilise structural mark-up the benefits are already apparent. However, some consideration should be made on improving the quality of descriptive data. The key to improving data quality is twofold: utilise recognised standards whenever possible; and establish detailed documentation on all aspects of the schema.

Documentation Documentation is an important, if often ignored, aspect of software development. Good documentation should establish the purpose of structural data, examples, and the source of the data. Good documentation will allow others to understand the XML without ambiguity.

Use recognised standards Although there are many circumstances where recognised schemas are insufficient for the required task, the designer should investigate relevant standards and attempt to merge their own bespoke solution with the various standard. In the long-term this will have several benefits:

The project can take advantage of existing knowledge in the field, allowing them to cover areas where they have limited or no experience.
Improve access to content by supporting proven standards, such as SVG.
The time required to map their data to alternative schemas used by other organisations will be reduced significantly.

TEI, Dublin Core and others provide cross-subject metadata elements that can be combined with subject specific languages.

Provide mappings to recognised standards Through the creation of different mappings the developer will standardise and enhance their approach to schema creation, removing potential ambiguities and other problems that may arise. In an organisational standpoint, the mappings will also allow improved relations between cooperating organisations and diversify the options available to use information in new ways.

Follow implementation conventions In addition to implementing recognised standards, it is important that the developer follow existing rules to construct existing elements. In varying circumstances this will involve the use of an existing data dictionary, an examination of XML naming rules. Controlled languages (for example, RDF, SMIL, MathML and SVG) use these conventions to implement specific localised knowledge.

Further Information

Dublin Core Metadata Initiative,
< http://dublincore.org/>
TEI Home site,
<http://www.tei-c.org/>

Briefing 68

Techniques To Assist The Location And Retrieval Of Local Images

Summary

Use of a consistent naming scheme and directory structure, as well as controlled vocabulary or thesaurus improve the likelihood that digitised content captured by many people over an extended period will be organized in a consistent manner that avoid ambiguity and can be quickly located.

This QA paper describes techniques to aid the storage and successful location of digital images.

Storing local images

Effective categorization of images stored on a local drive can be equally as important as storing them in an image management system. Digitisation projects that involve the scanning and manipulating of a large number of images will benefit from a consistent approach to file naming and directory structure.

An effective naming convention should identify the categories that will aid the user when finding a specific file. To achieve this, the digitisers should ask themselves:

What type of information should be identified?
What is the most effective method of describing this information in shorthand?

This can be better described with an example. A digitisation project is capturing photographs taken during wartime Britain. They have identified location, year and photographer as search criteria for locating images. To organize this information in a consistent manner the project team should establish a directory structure, common vocabulary and shorthand terms for describing specific locations. Figure 1 outlines a common description framework:

A sample naming convention

Potential Problems

To avoid problems that may occur when the image collection expands or is transferred to a different system, the naming convention should also take account the possibility that:

Some or all of this information may not be available (e.g. the year may be unknown)
Several photographs are likely to exist that possess the same criteria, same location, year and photographer.
Operating systems (OS) and Content Management Systems (CMS) treat lower case, upper case, and filename spaces in a different manner. To maintain consistency, filenames should be written in lower case and spaces should be avoided or replaced with underscores.
Older operating systems or filing systems (e.g. ISO 9660) use the 8.3 DOS filename restrictions, which may cause problems when accessing these files.
Some characters are illegal on different operating systems. Mac OS cannot use a colon in a filename, while DOS/Windows identifies ?[]/\=+<>:;", as illegal.

Naming conventions will allow the project to avoid the majority of these problems. For example, a placeholder may be chosen if one of the identifiers is unknown (e.g. 'ukn' for unknown location, 9999 for year). Special care should be taken to ensure this placeholder is not easily mistaken for a known location or date. Additional criteria, such as other photo attributes or a numbering system, may also be used to distinguish images taken by the same person, in the same year, at the same location.

Identification of Digital Derivatives

Digital derivatives (i.e. images that have been altered in some way and saved under a different name) introduce further complications in how you distinguish the original from the altered version. This will vary according to the type of changes made. On a simple level, you may simply choose a different file extension or store files in two different directories (Original and modified). Alternatively you may append additional criteria onto the filename (e.g. _sm for smaller images or thumbnails, _orig and _modif for original and modified).

Further Information

FILTER - Focusing Images for Learning and Teaching,
<http://www.filter.ac.uk/>
MacWindows Tutorial, John Rizzo, 2001,
<http://www.macwindows.com/tutfiles.html>
Controlling your language - links to metadata vocabularies, TASI,
<http://www.tasi.ac.uk/resources/vocabs.html>
File Naming, TASI
<http://www.tasi.ac.uk/advice/creating/filenaming.html>

Briefing 69

QA In The Construction Of A TEI Header

Background

Since the TEI header is still a relatively recent development, there has been a lack of clear guidelines as to its implementation; with the result that metadata has tended to be poor and sometimes erroneous. The implementation of a standard approach to metadata will improve the quality of data and increase the likelihood of locating relevant information.

Structure of a TEI header

The TEI header has a well-defined structure that may provide information analogous to that of a title page for printed text. The <teiHeader> element contains four major components:

FileDesc: The mandatory <fileDesc> element contains a full bibliographic description of an electronic file.
EncodingDesc: The <encodingDesc> element details the relationship between the electronic text and the source (or sources) from which it was derived. Its use is highly recommended.
ProfileDesc: The <profileDesc> element provides a detailed description of any non-bibliographic aspects of a text. Specifically the languages and sublanguages used, the situation in which it was produced, or the participants and their setting.
RevisionDesc: The <revisionDesc> element provides a change log in which each change made to a text may be recorded. The log may be recorded as a sequence of <change> elements each of which contains a corpus or collection of texts, that share many characteristics, or you may use one header for the corpus and individual headers for each component of the corpus.

A corpus or collection of texts, which share many characteristics, may have one header for the corpus and individual headers for each component of the corpus. In this case the type attribute indicates the type of header. For example, <teiHeader type-"corpus"> indicates the header for corpus-level information.

Some of the header elements contain running prose that consists of one or more <p>s. Others are grouped:

Elements whose names end in S:nu (for statement) usually enclose a group of elements recording some structured information.
Elements whose names end in Decl (for declaration) enclose information about specific encoding practices.
Elements whose names end in Desc (for description) contain a prose description.

What Standards Should I Conform To?

The cataloguer should observe the Anglo-american cataloguing rules 2nd ed. (rev), AACR2, and the international standard bibliographic for electronic resources, ISBD (ER) when creating new headers. AACR2 is used in the Source Description of the header, which is primarily concerned with printed material, whereas ISBD (ER) is used more heavily in the rest of the File Description in which the electronic file is being described.

Further Information

Documenting resources at the Oxford Text Archive, 2002, Colley, G and Anton S. Webb. 2nd Rev. ed.
The Electronic Title Page, Text Encoding Initiative, 2002,
<http://www.tei-c.org/Lite/U5-header.html>

Briefing 70

Establishing a Digital Repository

Background

Digital repositories are often thought of primarily as a computer system, consisting of hardware, software and networks, but they are more than this. Digital repositories are organisations similar in purpose to libraries or archives and, just as it does for these organisations, quality assurance should form an integral part of the work of a digital repository.

Repository Requirements

A digital repository should:

Operate according to documented policies and procedures that are monitored and can be externally assessed
Follow relevant standards and best practices in its work
Obtain sufficient control over its holdings to permit management and preservation actions such as storage, duplication, delivery to users and migration
Operate within a sustainable organisational setting that includes appropriate legal and financial mandates
Provide information to its users in a suitable form that they can understand and use.

A digital repository intent on long-term retention of its holdings should conform to the Reference Model for an Open Archival Information System (OAIS) [1].

Useful information is available in the QA Focus Briefing papers on "From Project To Production Service" [2] and "Planning An End User Service" [3].

Collections Management Policy and Procedures

Quality assurance can be incorporated into the work of a digital repository through the establishment of formal (but not necessarily complex) policies and procedures.

The CEDARS project suggested that collections management policies should cover: selection, acquisition, organisation, storage, access (user registration and authentication, delivery of masters versions), de-selection, and preservation. Policies developed to cover these topics should be subject to internal and external review as part of a formal approval process. Policies should be reviewed at regular intervals.

Policies should be written to conform to the requirements of relevant legislation, notably the Data Protection Act, 1998.

The day-to-day operation of the repository should be connected to its overall policy framework through the development of procedures. Procedures should be:

Consistent with repository policy
Tested by staff who will be responsible for implementing them
Checked for clarity and comprehensiveness by staff who are not familiar with the topic of the procedure Standards and Best Practices

Digital repositories need to make use of a wide range of standards and best practices for data creation, metadata creation, data storage, transmission and for many other areas. Many of these topics are discussed in more detail in other QA Focus documents. Selection of technical standards should take particular account of the guidance in QA Focus briefing papers on "Matrix for Selection of Standards" [4] and "Top Tips For Selecting Open Source Software" [5].

Rights and Responsibilities

A digital repository should operate within a clear legal framework that establishes the rights and responsibilities of the repository, its depositors and its users. A formal agreement should be established between each depositor and the repository, by way of a signed licence form or other technique (unavoidable online licence agreement).

This agreement should limit the liability of the repository (e.g. where a depositor does not have copyright), while conferring the repository with rights to manage or withdraw content. Otherwise, the depositor's rights should be protected and any limits on the service provided by the repository should be made clear (such as limits on how long data will be stored, and whether migration or other preservation actions will be undertaken).

The depositor agreement should be checked by an appropriate legal service.
The repository should establish who is legally entitled to sign such agreements for the repository.

References

Reference Model for an Open Archival Information System (OAIS), Consultative Committee for Space Data Systems, January 2002
<http://ssdoo.gsfc.nasa.gov/nost/wwwclassic/documents/pdf/CCSDS-650.0-B-1.pdf>
From Project To Production Service, QA Focus,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-38/>
Planning An End User Service, QA Focus,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-39/>
Matrix for Selection of Standards, QA Focus,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-31/>
Top Tips For Selecting Open Source Software, QA Focus,
http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-60/>

Further Information

CEDARS: Curl Exemplars in Digital Archives
<http://www.leeds.ac.uk/cedars/>
SPARC Institutional Repository Checklist & Resource Guide, Crow, R. 2002
<http://www.arl.org/sparc/IR/IR_Guide.html >
Trusted Digital Repositories: Attributes and Responsibilities, RLG, May 2002
<http://www.rlg.org/longterm/repositories.pdf>

Briefing 71

QA Techniques For The Storage Of Image Metadata

Background

The archival of digital images requires the consideration of the most effective method of storing technical and life cycle information. Metadata is a common method used to describe digital resources, however the different approaches may confuse many users.

This paper describes QA techniques for choosing a suitable method of metadata storage that takes into account the need for interoperability and retrieval.

Choosing a Suitable Metadata Association Model

Metadata may be associated with an image in three ways:

Internal Model:: Metadata is stored within the image file itself, either through an existing metadata mapping or attached to the end of an image file in an ad hoc manner. Therefore, it is simple to transfer metadata alongside image data without special requirements or considerations. However, support for a metadata structure differs between file formats and assignment of the same metadata record to multiple images causes inefficient duplication in comparison to a single metadata record associated with a group of images.
External Model:: A unique identifier is used to associate external metadata with an image file e.g. an image may be stored on a local machine while the metadata is stored on a server. This is better suited to a repository and is more efficient when storing duplicate information on a large number of objects. However, broken links may occur if the metadata record is not modified when an image is moved, or visa versa. Intellectual Property data and other information may be lost as a result.
Hybrid Model:: Uses both internal and externally associated metadata. Some metadata (file headers/tags) are stored directly in the image file while additional workflow metadata is stored in an external database. The deliberate design of external record offers a common application profile between file formats and provides a method of incorporating format-specific metadata into the image file itself. However, it shares the disadvantages of internal & external models in terms of duplication and broken links.

When considering the storage of image metadata, the designer should consider three questions:

What type of metadata do you wish to store?
Is the file format capable of storing metadata?
What environment is the metadata intended to be stored and used within?

The answer to these questions should guide the choice of the metadata storage model. Some file formats are not designed to store metadata and will require supplementation through the external model; other formats may not store data in sufficient detail for your requirements (e.g. lifecycle data). Alternatively, you may require IP (Intellectual Property) data to be stored internally, which will require a file format that supports these elements.

Ensuring Interoperability

Metadata is intended for the storage and retrieval of essential information regarding the image. In many circumstances, it is not possible to store internal metadata in a format that may be read by different applications. This may be for a number of reasons:

The file format does not define metadata placeholders (e.g. BMP), or does not use a metadata profile that the application uses.
A standard image metadata definition and interchange format model does not exist (e.g. JPEG). As a result, the storage mechanism and metadata structure must be defined by each application.
The metadata is stored in a proprietary file format that is not publicly defined.

Before choosing a specific image format, you should ensure the repository software is able to extract metadata and that editing software does not corrupt the data if changes are made at a later date. To increase the likelihood of this, you should take one of the following approaches:

Convert image data to a file format that supports a known metadata structure (e.g. Exif, TIFF, SPIFF and Flashpix).
Use a vendor-neutral, and technology-independent, well-documented metadata standard, preferably one written in XML (e.g. DIG35, Z39.87 & MIX).
Investigate the solutions offered by the DIG35 [1] and the FILTER [2] projects, which are developing a set of templates for consistent description of images.

Although this will not guarantee interoperability, these measures will increase the likelihood that it may be achieved.

Structuring Your Image Collection

To organise your image collection into a defined structure, it is advisable to develop a controlled vocabulary. If providing an online resource, it is useful to identify your potential users, the academic discipline from which they originate, and the language they will use to locate images. Many repositories have a well-defined user community (archaeology, physics, sociology) that share a common language and similar goals. In a multi-discipline collection it is much more difficult to predict the terms a user will use to locate images. The US Library of Congress [3], the New Zealand Time Frames [4] and International Press Telecommunications Council (IPTC) [5] provide online examples of how a controlled vocabulary hierarchy may be used to catalogue images.

References

DIG35 Specification: Metadata for Digital Images, Version 1.0, August 30, 2000,
<http://xml.coverpages.org/FU-Berlin-DIG35-v10-Sept00.pdf>
FILTER,
<http://www.filter.ac.uk/>
Library of Congress Thesauri,
<http://www.loc.gov/lexico/servlet/lexico/>
New Zealand Time Frames, New Zealand National Library
<http://timeframes1.natlib.govt.nz/nlnz-browse>
International Press Telecommunications Council
<http://www.iptc.org/>

Briefing 72

Using The QA For Web Toolkit

About The QA Focus Toolkits

The QA Focus Toolkits are an online resource which can be used as a checklist to ensure that your project or service has addressed key areas which can help to ensure that your deliverables are fit for its intended purpose, widely accessible and interoperable and can be easily repurposed.

The QA For Web Toolkit is one of several toolkits which have been developed by the QA Focus project to support JISC's digital library programmes. This toolkit addresses compliance with standards and best practices for Web resources.

Accessing The QA For Web Toolkit

The QA For Web Toolkit is available from <http://www.ukoln.ac.uk/qa-focus/toolkit/>. The toolkit is illustrated in Figure 1:

Figure 1: The QA For Web Toolkit

Coverage

The toolkit addresses the following key areas

HTML compliance of a page
HTML compliance of a page and pages beneath it
CSS compliance of a page
Checking links on a page
Checking links on a page and pages beneath it
Checking HTTP for a page
Dublin Core metadata which may be embedded within a HTML page

Embedding The Toolkit In Your Work

The toolkit can provide access to a set of online checking services.

You should seek to ensure that systematic checking is embedded within your work. If you simply make occasional use of such tools you may fail to spot significant errors. Ideally you will develop a systematic set of workflow procedures which will ensure that appropriate checks are carried out consistently.

You should also seek to ensure that you implement systematic checks in areas in which automated tools are not appropriate or available.

You may wish to use the results you have found for audit trails of compliance of resources on your Web site.

About The QA For Web Toolkit Resource

The QA For Web Toolkit described in this document provides a single interface to several online checking services hosted elsewhere. The QA Focus project and its host organisations (UKOLN and AHDS) have no control over the remote online checking services. We cannot guarantee that the remote services will continue to be available.

Further Information

Further toolkits are available at <http://www.ukoln.ac.uk/qa-focus/toolkit/>

Briefing 73

Using The QA For Metadata Toolkit

About The QA Focus Toolkits

The QA For Metadata Toolkit is an online resources which can be used as a checklist to ensure that your project or service has addressed key areas which can help to ensure that the metadata you use in your service is fit for its intended purpose and will ensure that your application will be interoperable.

The QA For Metadata Toolkit is one of several toolkits which have been developed by the QA Focus project to support JISC's digital library programmes.

Accessing The QA For Metadata Toolkit

The QA For Metadata Toolkit is available from <http://www.ukoln.ac.uk/qa-focus/toolkit/>. The toolkit is illustrated in Figure 1:

Figure 1: The QA For Metadata Toolkit

Coverage

The toolkit addresses the following key areas

The purpose of your metadata
The availability of cataloguing rules
The technical architecture for creating and managing your metadata
The approaches for checking the content of your metadata
The approaches for checking the syntax of your metadata
The interoperability of your metadata
Training and development for staff involved in creating and managing your metadata
The provision of documented policies
The provision of systematic procedures which ensure that the policies are complied with

Embedding The Toolkit In Your Project Activities

The toolkit can provide access to a set of online checking services.

The toolkit can provide a simple checklist for ensuring that your project has addressed key areas in the development and deployment of metadata. As well as providing an aide memoire for projects the toolkit may also be useful in a more formal context. For example the answers could be used in initial scoping work at the early stages of a project or in reports to the project funders. In addition answers to the issues raised may be helpful for other potential users of the metadata or the final service provider of the project deliverables.

About The QA For Metadata Toolkit Resource

Further Information

Further toolkits are available at <http://www.ukoln.ac.uk/qa-focus/toolkit/>

Briefing 74

Improving The Quality Of Digitised Images

Summary

A digitised image requires careful preparation before it is suitable for distribution. This document describes a workflow for improving the quality of scanned images by correcting faults and avoiding common errors.

Preparing your master image

The sequence in which modifications are made will have a significant contribution to the quality of the final image. Although conformance to a strict sequence is not always necessary, inconsistencies may be introduced if the order varies dramatically between images. The Technical Advisory Service for Images (TASI) recommends the following order:

Does the image require rotation or cropping?
In many circumstances, the digitiser will not require the entire image. Cropping an image to a specific size, shape or orientation will reduce the time required for the computer to manipulate the image and prioritise errors to those considered important.
Are shades and colours difficult to distinguish?
Scanners and digital cameras often group colours into a specific density range. This makes it difficult to differentiate shades of the same colour. Use the Histogram function with Photoshop (or other software) and adjust the different levels to best use the range of available tones.
Is the colour balance accurate in comparison to the original?
Some colours may change when digitised, e.g. bright orange may change to pink. Adjust the colour balance by modifying the Red, Green & Blue settings. Decreasing one colour increases its opposite.
Are there faults or artefacts on the image?
Visual checks should be performed on each image, or a selection of images, to identify faults, such as dust specks or scratches on the image.

Once you are satisfied with the results, the master image should be saved in a lossless image format - RGB Baseline TIFF Rev 6 or PNG are acceptable for this purpose.

Improving image quality

Subsequent improvements by resizing or sharpening the image should be performed on a derivative.

Store work-in-progress images in a lossless format
Digitisers often get into the habit of making modifications to a derivative image saved in a 'lossy' format, i.e. a format that simplifies detail to reduce file size. This is considered bad practice, will reduce quality and cause compression 'artefacts' to appear over subsequent edits. When repeatedly altering an image it is advisable to save the image in a lossless format (e.g. TIFF, PNG) until the image is ready for dissemination. Once all changes have been made it can be output in a lossy format.
Filter the image
Digitised images often appear 'noisy' or contain dust and scratches. Professional graphic manipulation (Photoshop, PaintShop Pro, etc.) possesses graphic processors that can be useful in removing these effects. Common filters include 'Despeckle' that subtly blurs an image to reduce the amount of 'noise' in an image and 'median' that blends the brightness of pixels and discards pixels that are radically different from adjacent pixels.
Remove distracting effect
If you are funded to digitise printed works, moiré (pronounced more-ray) effects may be a problem. Magazine or newspaper illustrations that print an image as thousands of small coloured dots produce a noticeable repeating pattern when scanned. Blur effects, such as the Gaussian blur, are an effective method of reducing noticeable moiré effects, however these also reduce image quality. Resizing the image is also an effective strategy that forces the image-processing tool to re-interpolate colours, which will soften the image slightly. Although these effects will degrade image to an extent, the results are often better than a moiré.

Further Information

Image Manipulation and Preparation, TASI,
<http://www.tasi.ac.uk/advice/using/dimpmanipulation.html>
Using Raster Images, QA Focus briefing document,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-28/>
Digital Imaging Basics, TASI,
<http://www.tasi.ac.uk/advice/using/basics.html>

Briefing 75

Digitisation Of Still Images Using A Flat-Bed Scanner

Preparing For A Large-Scale Digitisation Project

The key to the development of a successful digitisation project is to separate it into a series of stages. All projects planning to digitise documents should establish a set of guidelines to help ensure that the scanned images are complete, consistent and correct. This process should consider the proposed input and output of the project, and then find a method of moving from the first to the second.

This document provides preparatory guidance to consider when approaching the digitisation of many still images using a flatbed scanner.

Choose Appropriate Scanning Software

Before the digitisation process may begin, the digitiser requires suitable tools to scan & manipulate the image. It is possible to scan a graphic using any image processing software that supports TWAIN (an interface to connect to a scanner, digital camera, or other imaging device from within a software application), however the software package should be chosen carefully to ensure it is appropriate for the task. Possible criteria for measuring the suitability of image processing software include:

The ability to perform batch operations upon many different images
The ability to perform image processing upon an image.
The digitisers familiarity with the software package

A timesaving may be found by utilizing a common application, such as Adobe Photoshop, Paintshop Pro, or GIMP. For most purposes, these offer functionality that is rarely provided by editing software included with the scanner.

Check The Condition Of The Object To Be Scanned

Image distortion and dark shading at page edges are common problems encountered during the digitisation process, particularly when handling spine-bound books. To avoid these and similar issues, the digitiser should ensure that:

The document is uniformly flat against the document table.
The document is not accidentally moved during scanning.
The scanner is on a flat, stable surface.
The edges of the scanner are covered by paper to block external light, caused when the object does not lay completely flat against the scanner.

Scanning large objects that prevent the scanner lid being closed (e.g. a thick book) often causes discolouration or blurred graphics. Removing the spine will allow each page to be scanned individually, however this is not always an option (i.e. when handling valuable books). In these circumstances you should consider a planetary camera as an alternative scanning method.

Identification Of A Suitable Policy For Digitisation

It is often costly and time-consuming to rescan the image or improve the level of detail in an image at a later stage. Therefore, the digitiser should ensure that a consistent approach to digitisation is taken in the initial stages. This will include the choice of a suitable resolution, file format and filename scheme.

Establish a consistent quality threshold for scanned images

It is difficult to improve low quality scans at a later date. It is therefore important to digitise images at a at a slightly higher resolution (measured in pixels per inch) and scan type (24-bit or higher for colour, or 8-bit or higher for grey scale) than required and rescale the image at a later date.

Choose an appropriate image format

Before scanning the image, the digitiser should consider the file format in which it will be saved. RGB Baseline TIFF Rev 6 is the accepted format of master copies for archival and preservation (although PNG is a possible alternative file format). To preserve the quality, it is advisable to avoid compression where possible. If compression must be used (e.g. for storing data on CD-ROM), the compression format should be noted (Packbits, LZW, Huffman encoding, FAX-CCITT 3 or 4). This will avoid incompatibilities in certain image processing applications.

Data intended for dissemination should be stored in one of the more common image formats to ensure compatibility with older or limited browsers. JPEG (Joint Photographic Experts Group) is suitable for photographs, realistic scenes, or other images with subtle changes in tone, however its use of 'lossy' compression causes sharp lines or letterings are likely to become blurred. When modifying an image, the digitiser should return to the master TIFF image, make the appropriate changes and resave it as a JPEG.

Choose an appropriate filename scheme

Digitisation projects will benefit from a consistent approach to file naming and directory structure that allows images to be organized in a manner that avoids confusion and can be quickly located. An effective naming convention should identify the categories that will aid the user when finding a specific file. For example, the author, year it was created, thematic similarities, or other notable factors. The digitiser should also consider the possibility that multiple documents will have the same filename or may lack specific information and consider methods of resolving these problems. Guidance on this issue can be found in related QA Focus documents.

Further Information

Acquiring Images, MicroFrontier Online
<http://www.microfrontier.com/support_area/faq031/qa002.html>
Scanner Quality Problems, Epson
<http://support2.epson.net/manuals/english/scanner/perfection2450photo/REF_G/TRBLE_6.HTM>
TWAIN Standard, Twain
<http://www.twain.org/>
Resolving the Units of Resolution, TASI,
<http://www.tasi.ac.uk/advice/creating/dpi.html>
A Rough Guide to Image Sizes, TASI,
<http://www.tasi.ac.uk/advice/creating/roughguide.html>
Generic Image Workflow: TASI Recommended Best Practice for Digitisation Projects, TASI,
<http://www.tasi.ac.uk/advice/managing/workflow_generic.html>
The Book Scanning and Digisation Process
<http://www.rod-neep.co.uk/books/production/scan/>

Briefing 76

Choosing A Suitable Digital Watermark

Summary

Watermarking is an effective technology that solves many problems within a digitisation project. By embedding Intellectual Property data (e.g. the creator, licence model, creation date or other copyright information) within the digital object, the digitiser can demonstrate they are the creator and disseminate this information with every copy, even when the digital object has been uploaded to a third party site. It can also be used to determine if a work has been tampered with or copied.

This paper describes methods for establishing if a project requires watermarking techniques and criteria for choosing the most suitable type.

Purpose Of A Watermark

Before implementing watermarking within your workflow, you should consider its proposed purpose. Are you creating watermarks to indicate your copyright, using it as a method of authentication to establish if the content has been modified, or doing so because everyone else has a watermarking policy? The creation of a watermark requires significant thought and modification to the project workflow that may be unnecessary if you do not have a specific reason for implementing it.

For most projects, digital watermarks are an effective method of identifying the copyright holder. Identification of copyright is encouraged, particularly when the work makes a significant contribution to the field. However, the capabilities of watermarks should not be overstated. It is useful in identifying copyright, but is incapable of preventing use of copyrighted works. The watermark may be ignored or, given sufficient time and effort, removed entirely from the image. If the intent is to restrict content reuse, a watermark may not be the most effective strategy.

Required Attributes Of A Watermark

To assist the choice of a watermark, the project team should identify the required attributes of a watermark by answering two questions:

To whom do I wish to identify my copyright?
What characteristics do I wish the watermark to possess?

The answer to the first question is influenced by the skills and requirements of your target audience. If the copyright information is intended for non-technical and technical users, a visible watermark is the most appropriate. However, if the copyright information is intended for technical users only or the target audience is critical of visible watermarks (e.g. artist may criticise the watermark for impairing the original image), an invisible watermark may be the best option.

To answer the second question, the project team should consider the purpose of the watermark. If the intent is to use it as an authentication method (i.e. establish if any attempt to modify the content has been made), a fragile watermark will be a valued attribute. A fragile watermark is less robust towards modifications where even small change of the content will destroy embedded information. In contrast, if the aim is to reflect the owner's copyright, a more robust watermark may be preferential. This will ensure that copyright information is not lost if an image is altered (through cropping, skewing, warp rotation, or smoothing of an image).

Choosing A Resilient Watermark

If resilience is a required attribute of a digital watermark, the project team has two options: invisible or visible watermark. Each option has different considerations that make it suitable for specific purposes.

Invisible Watermarks
Invisible watermarks operate by embedding copyright information within the image itself. As a rule, watermarks that are less visible are weaker and easier to remove. When choosing a variant it is important to consider the interaction between watermark invisibility and resilience. Some examples are shown in Table 1:

Name	Description	Resilience
Bit-wise	Makes minor alterations to the spatial relation of an image	Weak
Noise Insertion	Embed watermark within image noise	Weak
Masking and filtering	Similar to paper watermarks on a bank note, it provides a subtle, though recognisable evidence of a watermark.	Strong
Transform domain	Uses dithering, luminance, or lossy techniques (similar to JPEG compression) on the entire or section of an image.	Strong

Table 1: Indication of resilience for invisible watermarks

'Bit-wise' & 'noise insertion' may be desirable if the purpose is to determine whether the medium has been altered. In contrast, 'transform domain' and 'masking' techniques are highly integrated into the image and therefore more robust to deliberate or accidental removal (caused by compression, cropping, and image processing techniques) in which significant bits are changed. However, these are often noticeable to the naked eye.

Visible Watermarks
A bird A visible watermark is more resilient and may be used to immediately identify copyright without significant effort by the user. However, these are, by design, more intrusive to the media. When creating a visible watermark, the project team should consider its placement. Projects funded with public money should be particularly conscious that the copyright notice does not interfere with the purpose of the project. A balance should be reached between the need to make the watermark difficult to remove and its use to the user.

Both watermarks make them suitable for specific situations. If handling a small image collection, it may be feasible (in terms of time and effort) to use both watermarks as a redundant protection measure - in the event that one is removed, the second is likely to remain.

Information Stored within the Watermark

If the project is using a watermark to establish their copyright, some thought should be made on the static information you wish to provide. For example:

Creator: The forename and surname of the person who created the image, either as full text or their initials.
Organisation: The project or organisation that holds copyright for the work.
Creation date: The date of creation, either the exact date (e.g. 24/03/2004) or year (2004)
Identifiers: A unique identifier to distinguish the image, distributor, creator, transaction, and other attributes.

Some content management systems are also able to generate dynamic watermarks and embed them within the image. This may record the file information (file format, image dimensions, etc.) and details about the download transaction (transaction identifier, download date, etc.). This may be useful for tracking usage, but may annoy the user if the data is visible.

Implementing Watermarks in the Project Workflow

To avoid unnecessary corruption of a watermark by the digitiser/creator themselves, the watermark creation process should be delayed until the final steps of the digitisation workflow. Watermarks can be easily removed when the digitiser is modifying the image in any way (e.g. through cropping, skewing, adjustment of the RGB settings, or through use of lossy compression). If an image is processed to the degree that the watermark cannot be recognized, then reconstruction of the image properties may be possible through the use of an original image.

Further Information

DigiMarc,
<http://www.digimarc.com/>
Digital Image Archive User Issues: Digital Watermarking, TASI,
<http://www.tasi.ac.uk/advice/using/uissues.html#ui6>
Paint Shop Pro 5 - Watermarking Photographs,
<http://www.grafx-design.com/24psp.html>

Briefing 77

An Introduction To RSS And News Feeds

Background

RSS is increasingly being used to provide news services and for syndication of content. The document provides a brief description of RSS news feed technologies which can be used as part of a communications strategy by projects and within institutions. The document summarises the main challenges to be faced when considering deployment of news feeds.

What Are News Feeds?

News feeds are an example of automated syndication. News feed technologies allow information to be automatically provided and updated on Web sites, emailed to users, etc. As the name implies news feeds are normally used to provide news; however the technology can be used to syndicate a wide range of information.

Standards for News Feeds

The BBC ticker [1] is an example of a news feed application. A major limitation with this approach is that the ticker can only be used with information provided by the BBC.

The RSS standard was developed as an open standard for news syndication, allowing applications to display news supplied by any RSS provider.

RSS is a lightweight XML application (see RSS fragment). Ironically the RSS standard proved so popular that it led to two different approaches to its standardisation. So RSS now stands for RDF Site Summary and Really Simple Syndication (in addition to the original phrase Rich Site Summary).

<title>BBC News</title>
<url>http://news.bbc.co.uk/nol/shared/img/bbc_news_120x60.gif</url>
<link>http://news.bbc.co.uk/</link>
<item>
<title>Legal challenge to ban on hunting</title>
<description>The Countryside Alliance prepares a legal challenge to Parliament Act ... </description>
<link>http://news.bbc.co.uk/go/click/rss/0.91/public/-/1/hi/... </link>.

Figure 1: Example Of An RSS File

Despite this confusion, in practice many RSS viewers will display both versions of RSS (and the emerging new standard, Atom).

News Feeds Readers

There are a large number of RSS reader software applications available [2] and several different models. An example of a scrolling RSS ticker is also shown above [3]. RSSxpress [4] (illustrated below) is an example of a Web-based reader which embeds an RSS feed in a Web page.

RSSxpress

In addition to these two approaches, RSS readers are available with an email-style approach for the Opera Web browser [5] and Outlook [6] and as extensions for Web browsers [7] [8].

Creating News Feeds

There are several approaches to the creation of RSS news feeds. Software such as RSSxpress can also be used to create and edit RSS files. In addition there are a number of dedicated RSS authoring tools, including standalone applications and browser extensions (see [9]). However a better approach may be to generate RSS and HTML files using a CMS or to transform between RSS and HTML using languages such as XSLT.

Issues

Issues which need to be addressed when considering use of RSS include:

The architecture for reading and creating RSS feeds
The procedures needed in order to guarantee the quality of the news feed content
How news feeds fits in with your organisation's communications strategy

Further Information

Desktop Ticker, BBC,
<http://news.bbc.co.uk/1/hi/help/3223354.stm>
RSS Readers, Weblogs Compendium,
<http://www.lights.com/weblogs/rss.html>
RSSxpress, UKOLN
<http://rssxpress.ukoln.ac.uk/>
ENewsBar
<http://www.enewsbar.com/>
RSS Newsfeeds In Opera Mail, Opera
<http://www.opera.com/products/desktop/m2/rss/>
Read RSS In Outlook, intraVnews,
<http://www.intravnews.com/>
RSS Extension for Firefox, Sage,
<http://sage.mozdev.org/>
RSS Reader, Pluck,
<http://www.pluck.com/product/rssreader.aspx>
Web / Authoring / Languages / XML / RSS, Webreference.com,
<http://www.webreference.com/authoring/languages/xml/rss/>

Briefing 78

An Introduction To Wikis

Background

Wiki technologies are increasingly being used to support development work across distributed teams. This document aims to give a brief description of Wikis and to summarise the main challenges to be faced when considering the deployment of Wiki technologies.

What is A Wiki?

A Wiki or wiki (pronounced "wicky" or "weekee") is a Web site (or other hypertext document collection) that allows a user to add content. The term Wiki can also refer to the collaborative software used to create such a Web site [1].

The key characteristics of typical Wikis are:

The ability to create and edit content within a Web environment without the need to download any special software.
The use of a simple markup language which is designed to simplify the process of creating and editing documents.
The ability to easily create and edit content, often without the need for special privileges.

Wikipedia - The Largest Wiki

The Wikipedia is the largest and best-known Wiki - see <http://www.wikipedia.org/>.

Wikipedia

The Wikipedia provides a good example of a community Wiki in which content is provided by contributors around the world.

The Wikipedia appears to have succeeded in providing an environment and culture which has minimised the dangers of misuse. Details of the approaches taken on the Wikipedia are given on the Wikipedia Web site [2].

What Can Wikis Be Used For?

Wikis can be used for a number of purposes:

On public Web sites to enable end users to easily contribute information.
In teaching. Wikis can provide an opportunity to learn about team working, trust, etc. A good example is provided by Queen's University Belfast [3].
By researchers. Wikis are by Web researchers to make it easier to develop collaborative documents e.g. the FOAF Wiki [4].
On Intranets, where departmental administrators with minimal HTML experience may be able to manage departmental content.
Wikis can be used at events for note-taking e.g. in discussion groups [5].

Wikis - The Pros And Cons

As described in [6] advantages of Wikis may include:

No need to install HTML authoring tools.
Minimal training may be needed.
Can help develop a culture of sharing and working together (cf. open source).
Useful for joint working when there are agreed shared goals.

Disadvantages of Wikis include:

The success of the Wikipedia may not necessarily be replicated elsewhere.
There is not (yet) a standard lightweight Wiki markup language.
A collaborative Wiki may suffer from a lack of a strong vision or leadership.
Can be ineffective when there is a lack of consensus.
There may be copyright and other legal issues regarding collaborative content.
It may be difficult for Wikis to gain momentum.

Further Information

Wiki, Wikipedia,
<http://en.wikipedia.org/wiki/Wiki>
Wikimedia principles, Wikimedia,
<http://meta.wikimedia.org/wiki/Wikimedia_principles>
IT and Society Wiki, Queen's University Belfast
<http://itsoc.mgt.qub.ac.uk/ITandSociety>
FOAF Wiki, FoafProject,
<http://rdfweb.org/topic/FoafProject>
Experiences of Using a Wiki for Note-taking at a Workshop, B. Kelly, Ariadne 42, Jan 2005,
<http://www.ariadne.ac.uk/issue42/web-focus/>
, E. Tonkin, Ariadne 42, Jan 2005,
<http://www.ariadne.ac.uk/issue42/tonkin/>

Briefing 79

An Introduction To Audio And Video Communication Tools

Background

Audio and video applications are being increasingly used to support project working across distributed project teams. This document aims to give a brief description of audio and video tools which can be used to support such collaborative work within our institutions and to summarise the main challenges to be faced when considering their deployment across organisations.

The Potential For Audio And Video Tools

The growth in broadband is leading to renewed interest in audio and video-conferencing systems. In the past such services often required use of specialist hardware and software. However tools are now being developed for home use. This briefing document explores some of the issues concerning use of such technologies within an institution.

An Example Of An Audio Tool

Skype The Skype Internet telephony system [1] is growing in popularity. Skype is popular because it can provide free calls to other Skype users. In addition Skype has potential for use in an academic context:

High quality sound is often experienced between broadband users.
It can provide cheap calls.
It can be used to set up conferences calls
It is integrated with instant messaging.
It can be used to provide a sense of presence - e.g. remote users can participate in seminars, etc.
It can provide accessibility benefits.
" Video is supported in the latest version of the software.

It should be noted, however, that Skype is a proprietary application and concerns over its use have been raised.

Examples Of Video Tools

Instant Messaging clients such as MSN Messenger [2] also provide audio and video capabilities. Such tools can raise expectations of student users who may wish to use such tools for their own use.

It should be noted, however, that there are interoperability problems with such tools (e.g. both users may need to be running the latest version of the MS Windows operating system). In addition the management of user IDs and setting up areas for group discussions may be issues.

An alternative approach is use of software such as VRVS [3], an Access Grid application. This Web-based system provides managed access to virtual rooms, etc. VRVS is intended for use by GRID users and not be appropriate for certain uses. However it illustrates an alternative approach.

VRVS

Issues

Issues which need to be addressed when considering use of such tools include

Are such tools needed? How will it co-exist with services such as email?
How should the potentially disruptive aspect of audio tools be addressed?
If felt to be needed, in what areas?
What support infrastructure needs to be provided?
Are there any information management, technical, security, performance or other barriers to be addressed?

Further Information

Skype,
<http://www.skype.com/>
MSN Messenger, Microsoft,
<http://messenger.msn.com/>
Virtual Rooms, Virtual Meetings, A. Powell, Ariadne, issue 41, Oct 2004,
<http://www.ariadne.ac.uk/issue41/powell/>

Briefing 80

An Introduction To Persistent Identifiers

What are Persistent Identifiers?

An identifier is any label that allows us to find a resource. One of the best-known identifiers is the International Standard Book Number (ISBN), a unique ten-digit number assigned to books and other publications. On the Internet the most widely known identifier is the Uniform Resource Locator (URL), which allows users to find a resource by listing a protocol, domain name and, in many cases, file location.

A persistent identifier is, as the name suggests, an identifier that exists for a very long time. It should at the very least be globally unique and be used as a reference to the resource beyond the resource's lifetime. URLs, although useful, are not very persistent. They only provide a link to the resource's location at the moment in time they are cited, if the resource moves they no longer apply. The issue of 'linkrot' on the Internet (broken links to resources), along with the need for further interoperability has led to the search for more persistent identifiers for digital resources.

Principles for Persistent Identification

The International Digital Object Identifier (DOI) Foundation [1] states that there are two principles for persistent identification:

Assign an ID to a resource: Once assigned the number must identify the same resource beyond the lifetime of the resource or identifier.
Assign a resource to an ID: The resource should persistently continue to be the same thing.

Uniform Resource Identifiers

A Uniform Resource Identifier (URI) is the string that is used to identify anything on the Internet. URLs along with Uniform Resource Names (URNs) are both types of URI. A URN is a name with global scope and does not necessarily imply a location. A URN will include a Namespace Identifier (NID) Code and a Namespace Specific String (NSS). The NID specifies the identification system used (e.g. ISBN) and the NSS is local code that identifies a resource. For someone to find a resource using a URN they must use a resolver service.

Persistent URLs

Persistent URLs (PURLs) [2] have been developed by the Online Computer Library Centre (OCLC) as an interim measure for Internet resources until the URN framework is well established. A PURL is functionally a URL, but rather than pointing at a location points at a resolution service, which redirects the user to the appropriate URL. If the URL changes it just needs to be amended in the PURL resolution service

Example: http://purl.oclc.org/OCLC/PURL/summary
This is made up of the protocol (http), the resolver address (http://purl.oclc.org/) and the user-assigned name (OCLC/PURL/summary).

Digital Object Identifiers

The Digital Object Identifier (DOI) system was initiated by the Association of American publishers in an attempt to assist the publishing community with copyright and electronic commerce. DOIs are described by the International DOI Foundation, who manage them, as persistent, interoperable, actionable identifiers. They are persistent because they identify an object as a first-class entity (not just the location), they are interoperable because they are designed with the future in mind and they are actionable because they allow a user to locate a resource by resolution using the Handle System. The Handle System, developed by the Corporation for National Research Initiatives (CNRI) includes protocols that enable a distributed computer system to store handles of digital resources and resolve them into a location. DOIs can be assigned by a Registration Agency (RA), which provides services for a specific user community and may charge fees. The main RA for the publishing community is CrossRef [3].

Example: 10.1000/123456
This is made up of the prefix (10.1000) which is the string assigned to an organisation that registering DOIs and the suffix (123456) which is a unique (to a given prefix) alphanumeric string, which could be an existing identifier.

Using Persistent Identifiers

While DOIs hold great potential for helping many information communities enhance interoperability they have yet to reach full maturity. There are still many unresolved issues, such as their resolution (how users use them in to receive a Web page), registration of the DOI system, the persistence of the International DOI Foundation as an organisation and what exactly their advantages are over handles or PURLs. Until these matters are resolved they will remain little more than a good idea for most communities.

However the concept of persistent identifiers is still imperative to a working Internet. While effort is put into finding the best approach there is much that those creating Web pages can do to ensure that their URIs are persistent. In 1998 Tim Berners-Lee coined the phrase Cool URIs to describe URIs which do not change. His article explains the methods a Webmaster would use to design a URI that will stand the test of time. As Berners-Lee elucidates "URIs don't change: people change them." [4].

References

International DOI Foundation,
<http://doi.org/>
PURL,
<http://purl.org/>
CrossRef,
<http://www.crossref.org/>
Cool URIs Don't Change, W3C,
<http://www.w3.org/Provider/Style/URI.html>

Briefing 81

An Introduction To Folksonomies

What is a Folksonomy?

A folksonomy is a decentralised, social approach to creating metadata for digital resources. It is usually created by a group of individuals, typically the resource users, who add natural language tags to online items, such as images, videos, bookmarks and text. These tags are then shared and sometimes refined. Folksonomies can be divided into broad folksonomies, when lots of users tag one object, and narrow folksonomies, when a small number of users tag individual items. This new social approach to creating online metadata has sparked much discussion in the cataloguing world.

Note that despite its name a folksonomy is not a taxonomy. A taxonomy is the process, within subject-based classification, of arranging the terms given in a controlled vocabulary into a hierarchy. Folksonomies move away from the hierarchical approach to an approach more akin to that taken by faceted classification or other flat systems.

The History of Folksonomies

With the rise of the Internet and increased use of digital networks it has become easier to both work in an informal and adhoc manner, and as part of a community. In the late 1990s Weblogs (or blogs), a Web application similar to an online diary, became popular and user centred metadata was first created. In late 2003 delicious, an online bookmark manager, went live. The ability to add tags using a non-hierarchical keyword categorisation system was appended in early 2004.Tagging was quickly replicated by other social software and in late 2004 the Folksonomy name, a portmanteau of folk and taxonomy, was coined by Thomas Vander Wal.

Strengths and Weaknesses of Folksonomies

Robin Good is quoted as saying that "a folksonomy represents simultaneously some of the best and worst in the organization of information." There is clearly a lot to be learnt from this new method of classification as long as you remain aware of the strengths and weaknesses.

Strengths

Serendipity: Folksonomies at this point in time are more about browsing than finding and a great deal of useful information can be found in this way.
Cheap and extendable: Folksonomies are created by users. This makes them relatively cheap and highly scalable, unlike more formal methods of adding metadata. Often users find that it is not a case of 'folksonomy or professional classification' but 'folksonomy or nothing'.
Community: The key to folksonomies success is community and feedback. The metadata creation process is quick and responsive to user needs, new words can become well used in days. If studied they can allow more formal classification systems to emerge and demonstrate clear desire lines (the paths users will want to follow).

Weaknesses

Imprecision of terms: Folksonomy terms are added by users which means that they can be ambiguous, overly personalised and imprecise. Some sites only allow single word metadata resulting in many compound terms, many tags are single use and at present there is little or no synonym control.
Searching: The uncontrolled set of terms created can mean that folksonomies may not support searching as well as services using controlled vocabularies.

The Future for Folksonomies

Over time users of the Internet have come to realise that old methods of categorisation do not sit comfortably in a digital space, where physical constraints no longer apply and there is a huge amount to be organised. Search services like Yahoo's directory, where items are divided into a hierarchy, often seem unwieldy and users appear happier with the Google search box approach. With the rise of communities on the Web there has also come about a feeling that meaning comes best from our common view of the world, rather than a professional's view.

While there is no doubt that the professional cataloguing will continues to have a place, both off the Internet and on, there has been recent acceptance that new ways of adding metadata, such as folksonomies, need more exploration, alongside other areas like the semantic Web. The two models of categorisation (formal and informal) are not mutually exclusive and further investigation could only help us improve the way we organise and search for information. If nothing else folksonomies have achieved the once believed unachievable task of getting people to talk about metadata!

Further Information

The following additional resources may be useful:

An Introduction To Creative Commons

What is a Creative Commons?

Creative Commons (CC) [1] refers to a movement started in 2001 by US lawyer Lawrence Lessig that aims to expand the collection of creative work available for others to build upon and share. The Creative Commons model makes a distinction between the big C (Copyright) meaning All Rights Reserved and CC meaning Some Rights Reserved. It does so by offering copyright holders licences to assign to their work, which will clarify the conditions of use and avoid many of the problems current copyright laws pose when attempting to share information.

What Licences?

There are a series of eleven Creative Commons licences available to download from the Web site. They enable copyright holders to allow display, public performance, reproduction and distribution of their work while assigning specific restrictions. The six main licences combine the four following conditions:

	Attribution - Users of your work must credit you.
	Non-commercial - Users of your work can make no financial gain from it.
	Non-derivative - Only verbatim copies of your work can be used.
	Share-alike - Subsequent works have to be made available under the same licence as the original.

The other licences available are the Sampling licence, the Public Domain Dedication, Founders Copyright, the Music Sharing licence and the CC Zero licence. Creative Commons also recommends two open source software licences for those licensing software: the GNU General Public licence and the GNU Lesser Public licence.

Each license is expressed in three ways: (1) legal code, (2) a commons deed explaining what it means in lay person's terms and (3) a machine-readable description in the form of RDF/XML (Resource Description Framework/Extensible Mark up Language) metadata. Copyright holders can embed the metadata in HTML pages.

International Creative Commons

The Creative Commons licences were originally written using an American legal model but through the Creative Common international (CCi) have since been adapted for use in a number of different jurisdictions. The regional complexities of UK law has meant that two different set of licences have had to be drafted for use of the licenses the UK. Creative Commons works with the Arts and Humanities Research Board Centre for Studies in Intellectual Property and Technology Law at Edinburgh University on the Scotland jurisdiction-specific licenses and the Information Systems and Innovation Group (ISIG) to create the England and Wales jurisdiction-specific licenses.

Why Use Creative Commons Licences?

There are many benefits to be had in clarifying the rights status of a work. When dealing with Creative Commons licenced work, it is known if the work can be used without having to contact the author, thus allowing the work to be exploited more effectively, more quickly and more widely, whilst also increasing the impact of the work. Also in the past clarification of IPR has taken a huge amount of time and effort, Creative Commons could save some projects a considerable amount of money and aid their preservation strategies. More recently, because Creative Commons offers its licence in a machine-readable format, search engines can now search only CC licenced resources allowing users easier access to 'free materials'.

Issues

Although Creative Commons has now been in existence for a while there are still issues to be resolved. For example in the UK academic world the question of who currently holds copyright is a complex one with little commonality across institutions. A study looking at the applicability of Creative Commons licences to public sector organisations in the UK has been carried out [2].

Another key area for consideration is the tension between allowing resources to be freely available and the need for income generation. Although use of a Creative Commons license is principally about allowing resources to be used by all, this does not mean that there has to be no commercial use. One option is dual licensing, which is fairly common in the open source software environment.

References

Creative Commons,
<http://creativecommons.org/>
Creative Commons Licensing Solutions for the Common Information Environment, Intrallect,
<http://www.intrallect.com/cie-study/>

Briefing 83

An Introduction To Podcasting

What Is Podcasting?

Podcasting has been described as "a method of publishing files to the internet, often allowing users to subscribe to a feed and receive new files automatically by subscription, usually at no cost." [1].

Podcasting is a relatively new phenomena becoming popular in late 2004. Some of the early adopters regard Podcasting as a democratising technology, allowing users to easily create and publish their own radio shows which can be easily accessed within the need for a broadcasting infrastructure. From a technical perspective, Podcasting is an application of the RSS 2.0 format [2]. RSS can be used to syndicate Web content, allowing Web resources to be automatically embedded in third party Web sites or processed by dedicated RSS viewers. The same approach is used by Podcasting, allowing audio files (typically in MP3 format) to be automatically processed by third party applications - however rather than embedding the content in Web pages, the audio files are transferred to a computer hard disk or to an MP3 player - such as an iPod.

The strength of Podcasting is the ease of use it provides rather than any radical new functionality. If, for example, you subscribe to a Podcast provided by the BBC, new episodes will appear automatically on your chosen device - you will not have to go to the BBC Web site to see if new files are available and then download them.

Note that providing MP3 files to be downloaded from Web sites is sometimes described as Podcasting, but the term strictly refers to automated distribution using RSS.

What Can Podcasting Be Used For?

There are several potential applications for Podcasting in an educational context:

Recording of lectures, allowing students to easily access the recording as a revision aid, to catch up on missed lectures, etc.
Asking students to record their own Podcasts on, for example, project reports.
Automated conversion of text files, email messages, RSS feeds, etc. to MP3 format, allowing the content to be accessed on mobile MP3 players.
Maximising the impact of talks by allowing seminars, lectures, conference presentations, etc. to be listened to by a wider audience.
Recordings of meetings to provide access for people who could not attend.
Enhancing the accessibility of talks to people with disabilities.

Possible Problems

Although there is much interest in the potential for Podcasting, there are potential problem areas which will need to be considered:

Recording lectures, presentations, etc. may infringe copyright or undermine the business model for the copyright owners.
Making recordings available to a wider audience could mean that comments could be taken out of context or speakers may feel inhibited when giving presentations.
The technical quality of recordings may not be to the standard expected.
Although appealing to the publisher, end users may not make use of the Podcasts.

It would be advisable to seek permission before making recordings or making recordings available as Podcasts.

Podcasting Software

Listening To Podcasts

It is advisable to gain experiences of Podcasting initially as a recipient, before seeking to create Podcasts. Details of Podcasting software is given at [3] and [4]. Note that support for Podcasts in iTunes v. 5 [5] has helped enhance the popularity of Podcasts. You should note that you do not need a portable MP3 player to listen to Podcasts - however the ability to listen to Podcasts while on the move is one of its strengths.

Creating Podcasts

When creating a Podcast you first need to create your MP3 (or similar) audio file. Many recording tools are available, such as the open source Audacity software [6]. You may also wish to make use of audio editing software to edit files, include sound effects, etc.

You will then need to create the RSS file which accompanies your audio file, enabling users to subscribe to your recording and automate the download. An increasing number of Podcasting authoring tools and Web services are being developed [7] .

References

Podcasting, Wikipedia,
<http://en.wikipedia.org/wiki/Podcasting>
RSS 2.0, Wikipedia,
<http://en.wikipedia.org/wiki/Really_Simple_Syndication>
iPodder Software,
<http://www.ipodder.org/directory/4/ipodderSoftware>
iTunes - Podcasting,
<http://www.apple.com/podcasting/>
Podcasting Software (Clients), Podcasting News,
<http://www.podcastingnews.com/topics/Podcast_Software.html>
Audacity,
<http://audacity.sourceforge.net/>
Podcasting Software (Publishing), Podcasting News,
<http://www.podcastingnews.com/topics/Podcasting_Software.html>

Briefing 84

Usage Statistics For Web Sites

About This Document

Information on performance indicators for Web sites has been published elsewhere [1] [2]. This document provides additional information on the specific need for usage statistics for Web sites and provides guidance on ways of ensuring the usage statistics can be comparable across Web sites.

About Usage Statistics For Web Sites

When a user accesses a Web page several resources will normally be downloaded to the user (the HTML file, any embedded images, external style sheet and JavaScript files, etc.). The Web server will keep a record of this, including the names of the files requested and the date and time, together with some information about the user's environment (e.g. type of browser being used).

Web usage analysis software can then be used to provide overall statistics on usage of the Web site. As well as giving an indication of the overall usage of a Web site, information can be provided on the most popular pages, the most popular entry points, etc.

What Can Usage Statistics Be Used For?

Usage statistics can be used to give an indication of the popularity of Web resources. Usage statistics can be useful if identifying successes or failures in dissemination strategies or in the usability of a Web site.

Usage statistics can also be useful to system administrators who may be able to use the information (and associated trends) in capacity planning for server hardware and network bandwidth.

Aggregation of usage statistics across a community can also be useful in profiling the impact of Web services within the community.

Limitations Of Usage Statistics

Although Web site usage statistics can be useful in a number of areas, it is important to be aware of the limitations of usage statistics. Although initially it may seem that such statistics should be objective and unambiguous, in reality this is not the case.

Some of the limitations of usage statistics include:

The numbers may be under-reported due to caches - which improve the performance of Web sites by keeping a copy of Web resources.
The numbers may be over-reported due to use of off-line browsers which can download Web resources which are not viewed.
The numbers may be over-reported due to reported on accessed by indexing software (e.g. the Google robot software).
Aggregation of usage statistics may be flawed due to organisations processing the data in-consistently (e.g. some removing data from robots when others do not).
Errors may be introduced when merging statistical data from a variety of sources.

Recommendations

Although Web site usage statistics cannot be guaranteed to provide a clear and unambiguous summary of Web site usage, this does not mean that the data should not be collected and used. There are parallels with TV viewing figures which are affected by factors such as video recording. Despite such known limitations, this data is collected and used in determining advertising rates.

The following advice may be useful:

Document Your Approaches And Be Consistent

You should ensure that you document the approaches taken (e.g. details of the analysis tool used) and any processing carried out on the data (e.g. removing robot traffic or access from within the organisation). Ideally you will make any changes to the processing, but if you do you should document this.

Consider Use Of Externally Hosted Usage Services

Traditional analysis packages process server log files. An alternative approach is to make use of an externally-hosted usage analysis service. These services function by providing a small graphical image (which may be invisible) which is embedded on pages on your Web site. Accessing a page causes the graphic and associated JavaScript code, which is hosted by a commercial company, to be retrieved. Since the graphic is configured to be non-cachable, the usage data should be more reliable. In addition the JavaScript code can allow additional data to be provided, such as additional information about the end users PC environment.

References

Performance Indicators For Your Project Web Site, QA Focus briefing document No. 17,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-17/>
Performance Indicators For Web Sites, Exploit Interactive (5), 2000,
<http://www.exploit-lib.org/issue5/indicators/>

Briefing 85

An Introduction To Web Services

What Are Web Services?

Web services are a class of Web application, published, located and accessed via the Web, that communicates via an XML (eXtensible Markup Language) interface [1]. As they are accessed using Internet protocols, they are available for use in a distributed environment, by applications on other computers.

What's The Innovation?

The idea of Internet-accessible programmatic interfaces, services intended to be used by other software rather than as an end product, is not new. Web services are a development of this idea. The name refers to a set of standards and essential specifications that simplify the creation and use of such service interfaces, thus addressing interoperability issues and promoting ease of use.

Well-specified services are simple to integrate into larger applications, and once published, can be used and reused very effectively and quickly in many different scenarios. They may even be aggregated, grouped together to produce sophisticated functionality.

Example: Google Spellchecker And Search Services

The Google spellchecker service, used by the Google search engine, suggests a replacement for misspelt words. This is a useful standard task; simply hand it a word, and it will respond with a suggested spelling correction if one is available. One might easily imagine using the service in one's own search engine, or in any other scenario in which user input is taken, perhaps in an intelligent "Page not found" error page, that attempts to guess at the correct link. The spellchecker's availability as a Web service simplifies testing and adoption of these ideas.

Furthermore, the use of Web services is not limited to Web-based applications. They may also usefully be integrated into a broad spectrum of other applications, such as desktop software or applets. Effectively transparent to the user, Web service integration permits additional functionality or information to be accessed over the Web. As the user base continues to grow, many development suites focus specifically on enabling the reuse and aggregation of Web services.

What Are The Standards Underlying Web Services?

'Web services' refers to a potentially huge collection of available standards, so only a brief overview is possible here. The exchange of XML data uses a protocol such as SOAP or XML-RPC. Once published, the functionality of the Web service may be documented using one of a number of emerging standards, such as WSDL, the Web Service Description Language.

WSDL provides a format for description of a Web service interface, including parameters, data types and options, in sufficient detail for a programmer to write a client application for that service. That description may be added to a searchable registry of Web services.

A proposed standard for this purpose is UDDI (Universal Description, Discovery and Integration), described as a large central registry for businesses and services. Web services are often seen as having the potential to 'flatten the playing field', and simplify business-to-business operations between geographically diverse entities.

Using Web Services

Due to the popularity of the architecture, many resources exist to support the development and use of Web services in a variety of languages and environments. The plethora of available standards may pose a problem, in that a variety of protocols and competing standards are available and in simultaneous use. Making that choice depends very much on platform, requirements and technical details.

Although Web services promise many advantages, there are still ongoing discussions regarding the best approaches to the underlying technologies and their scope.

References

The JISC Information Environment and Web Services, A. Powell and E. Lyon, Ariadne, issue 31, April 2002,
<http://www.ariadne.ac.uk/issue31/information-environments/>
World Wide Web Consortium Technical Reports, W3C,
<http://www.w3.org/TR/>

Further Information

Web Services Description Working Group, W3C, <http://www.w3.org/2002/ws/desc/>
Top Ten FAQs for Web Services, XML.COM, <http://webservices.xml.com/pub/a/ws/2002/02/12/webservicefaqs.html>
Web Services, Wikipedia, <http://en.wikipedia.org/wiki/Web_services>

Briefing 86

Usability and the Web

Background

Usability refers to a quality attribute that assesses how easy user interfaces are to use. The term is also used to refer to a number of techniques and methods for improving usability during the various stages of design and development.

What Does Usability Include?

Usability can be separated into several components [1] such as:

Efficiency: How quickly an experienced user can perform a given task Memorability: Once familiar with an interface, is it easily forgettable? Errors: How easy is it to make mistakes/recover from mistakes? Satisfaction: Is the design enjoyable to use?

Learnability:: How easy it is to get to grips with an unfamiliar interface?
Efficiency:: How quickly an experienced user can perform a given task?
Memorability:: Once familiar with an interface, is it easily forgettable?
Errors:: How easy is it to make mistakes/recover from mistakes?
Satisfaction:: Is the design enjoyable to use?

These characteristics are all useful metrics, although the importance of each one depends on the expected uses of the interface in question. In some circumstances, such as software designed for a telephone switchboard operator, the time it takes for a skilled user to complete a task is rather more important than learnability or satisfaction. For an occasional web user, a web site's designers may wish to focus principally on providing a site that is learnable, supports the user, and is enjoyable to use. Designing a usable site therefore requires a designer to learn about the needs of the site's intended users, and to test that their design meets the criteria mentioned above.

Why Does Usability Matter?

More attention is paid to accessibility than to usability in legislation, perhaps because accessibility is perceived as a clearly defined set of guidelines, whilst usability itself is a large and rather nebulous set of ideas and techniques. However, a Web site can easily pass accessibility certification, and yet have low usability; accessibility is to usability what legible handwriting is to authorship. Interfaces with low usability are often frustrating, causing mistakes to be made, time to be wasted, and perhaps impede the user from successfully reaching their intended goal at all. Web sites with low usability will not attract or retain a large audience, since if a site is perceived as too difficult to use, visitors will simply prefer to take their business elsewhere.

Usability Testing

User testing is traditionally an expensive and complicated business. Fortunately, modern discount ('quick and dirty') methods have changed this, so that it is now possible to quickly test the usability of a web site at any stage in its development. This process, of designing with the user in mind at all times, is known as user-centred design. At the earliest stages, an interface may be tested using paper prototypes or simple mockups of the design. It is advisable to test early and often, to ensure that potential problems with a design are caught early enough to solve cheaply and easily. However, completed Web sites also benefit from usability testing, since many such problems are easily solved.

User testing can be as simple as asking a group of users, chosen as representative of the expected user demographic, to perform several representative tasks using the Web site. This often reveals domain-specific problems, such as vocabulary or language that is not commonly used by that group of users. Sometimes user testing can be difficult or expensive, so discount techniques such as heuristic evaluation [2], where evaluators compare the interface with a list of recommended rules of thumb, may be used. Other discount techniques include cognitive walkthrough in which an evaluator role-plays the part of a user trying to complete a task. These techniques may be applied to functional interfaces, to paper prototypes, or other mockups of the interface.

A common method to help designers is the development of user personas, written profiles of fictitious individuals who are designed to be representative of the site's intended users. These individuals' requirements are then used to inform the design process and to guide the design process.

Conclusions

Considering the usability of a web site not only helps users, but also tends to improve the popularity of the site in general. Visitors are likely to get a better impression from usable sites. Quick and simple techniques such as heuristic evaluation can be used to find usability problems; frequent testing of a developing design is ideal, since problems can be found and solved early on. Several methods of usability testing can be used to expose different types of usability problems.

References And Further Information

Usability 101: Introduction to Usability, J. Nielsen,
<http://hcibib.org/tcuid/chap-4.html>
Heuristic Evaluation, J. Nielsen,
<http://portal.acm.org/citation.cfm?id=142869>

Ten Usability Heuristics, J. Nielsen,
<http://www.useit.com/papers/heuristic/heuristic_list.html>
Personas: Matching a design to the users' goals,
<http://www.uie.com/articles/personas/>

Briefing 87

Introduction to Cognitive Walkthroughs

Introduction To Cognitive Walkthroughs

The cognitive walkthrough is a method of discount ("quick and dirty") usability testing requiring several expert evaluators. A set of appropriate or characteristic tasks to be completed is compiled. The evaluators then "walk" through each task, noting down problems or difficulties as they go.

Since cognitive walkthroughs are often applied very early in development, the evaluators will often be working with mockups of interfaces such as paper prototypes and role-playing the part of a typical user. This is made much simpler if user personas, detailed descriptions of fictitious users, have been developed, because these simplify the role-playing element of cognitive walkthrough. These are often developed at the beginning of a user-centred design process, because designers often find it much easier to design to the needs of a specific user.

Evaluators are typically experts such as usability specialists, but the same basic technique can also be applied successfully in many different situations.

The Method

Once you have a relatively detailed prototype, paper or otherwise, you are ready to try a cognitive walkthrough.

Start off by listing the tasks that you expect users to be able to perform using your Web site or program. To do this, think about the possible uses of the site; perhaps you are expecting users to be able to book rooms or organise tours, or find out what events your organisation is running in the next month, or find opening times and contact details for your organisation. Write down each of these tasks.

Secondly, separate these tasks into two parts: the user's purpose (their intention) and the goals that they must achieve in order to complete this. Take the example of organising a tour; the user begins with the purpose of finding out what tours are available. In order to achieve this, they look for a link on your Web site leading to a Web page detailing possible tours. Having chosen a tour, they gain a new purpose - organising a tour date - and a new set of goals, such as finding a Web page that lets them book a tour date and filling it out appropriately.

Separating tasks into tiny steps in this way is known as decomposition, and it is mostly helpful because it allows you to see exactly where and when the interface fails to work with the user's expectations. It is important to do this in advance, because otherwise you find yourself evaluating your own trial-and-error exploration of the interface! Following these steps "wearing the users' shoes" by trying out each step on a prototype version of the interface shows you where the user might reach an impasse or a roadblock and have to retrace his or her steps to get back on track. As a result, you will gain a good idea of places where the interface could be made simpler or organised in a more appropriate manner.

To help this process, a Walkthrough Evaluation Sheet is filled in for each step taken. An example is shown below [1]:

Will the users be trying to produce whatever effect the action has?
Will users see the control (button, menu, switch, etc.) for the action?
Once users find the control, will they recognize that it produces the effect they want?
After the action is taken, will users understand the feedback they get, so they can go on to the next action with confidence?

Advantages and Disadvantages

Cognitive walkthroughs are often very good at identifying certain classes of problems with a Web site, especially showing how easy or difficult a system is to learn or explore effectively - how difficult it will be to start using that system without reading the documentation, and how many false moves will be made in the meantime.

The downside is principally that on larger or more complex tasks they can sometimes be time-consuming to perform, so the technique is often used in some altered form. For example, instead of filling out an evaluation sheet at each step, the evaluation can be recorded on video [2]; the evaluator can then verbally explain the actions at each step.

Conclusions

'Cognitive walkthroughs are helpful in picking out interface problems at an early stage, and works particularly well together with a user-centred design approach and the development of user personas. However, the approach can sometimes be time-consuming, and since reorganising the interface is often expensive and difficult at later stages in development, the cognitive walkthrough is usually applied early in development.

References

Evaluating the design without users, from Task-Centered User Interface Design,
<http://hcibib.org/tcuid/chap-4.html>
The Cognitive Jogthrough,
<http://portal.acm.org/citation.cfm?id=142869>

Briefing 88

Task Analysis and Usability

Background

A key issue in usability is that of understanding users, and a key part of user-centred design is that of describing the tasks that the users expect to be able to accomplish using the software you design [1]. Because of the origins of usability as a discipline, a lot of the terminology used when discussing this issue comes from fields such as task analysis. This briefing paper defines some of these terms and explains the relationship between usability and task analysis.

What Is Task Analysis?

Within the usability and human-computer interaction communities, the term is generally used to describe study of the way people perform tasks - that is, the way in which a task is currently performed in real-life situations. Task analysis does not describe the optimal or ideal procedure for solving a problem. It simply describes the way in which the problem is currently solved.

Gathering Data For Task Analysis

Since the intent of task analysis is description of an existing system, the ideal starting point is data gathered from direct observation. In some cases, this is carried out in a controlled situation such as a usability laboratory. In others, it is more appropriate to carry out the observation "in the field" - in a real-life context. These may yield very different results!

Observational data can be gathered on the basis of set exercises, combined with the "think-aloud" technique, in which subjects are asked to describe their actions and their reasoning as they work through the exercise. Alternatively, observations can be taken by simply observing subjects in the workplace as they go through a usual day's activities. The advantage of this latter method is principally that the observer influences events as little as possible, but the corresponding disadvantage is that the observations are likely to take longer to conclude.

Unfortunately, there are significant drawbacks of direct observation, principally cost and time constraints. For this reason, task analysis is sometimes carried out using secondary sources such as manuals and guidebooks. This, too, has drawbacks - such sources often provide an idealised or unrealistic description of the task.

A third possibility is conducting interviews - experts, themselves very familiar with a task, can easily answer questions about that task. While this can be a useful way of solving unanswered questions quickly, experts are not always capable of precisely explaining their own actions as they can be too familiar with the problem domain, meaning that they are not aware on a conscious level of the steps involved in the task.

Analysing Observations

There are several methods of analysing observational data, such as knowledge-based analysis, procedural [2] or hierarchical task analysis, goal decomposition (the separation of each goal, or step, into its component elements) and entity-relationship based analysis. Data can also be visualised by charting or display as a network. Some methods are better suited to certain types of task - e.g. highly parallel tasks are difficult to describe using hierarchical task analysis (HTA). On the other hand, this method is easy for non-experts to learn and use. Each answers a slightly different question - for example, HTA describes the knowledge and abilities required to complete a task, while procedural task analysis describes the steps required to complete a task.

A simple procedural task analysis is completed as follows:

Choose the appropriate procedure to complete the task that is being analysed.
Determine and write down each step in that procedure; break down each step as far as possible.
Complete every step of the procedure.
Check that the procedure gave the correct result.

These steps can be charted as a flowchart for a clear and easy to read visual representation.

Conclusions

Task analysis provides a helpful toolkit for understanding everyday processes and for describing how human beings solve problems. It is not appropriate to perform detailed task analysis in every situation, due to cost and complexity concerns. However, the results of a task analysis can usefully inform design or pinpoint usability problems, particularly differences between the system designer's assumptions and the users' "mental models" - ways of looking at - the task to be performed.

References

Task Analysis and Human-Computer Interaction, Crystal & Ellington,
<http://www.ils.unc.edu/~acrystal/AMCIS04_crystal_ellington_final.pdf>
Procedural Task Analysis,
<http://classweb.gmu.edu/ndabbagh/Resources/Resources2/procedural_analysis.htm>

Briefing 89

Heuristic Evaluation

Background

Heuristic evaluation is a method of user testing, which enables a product to be assessed in order to identify usability problems - that is, places where the product is not easy to use. It is a discount ("quick and dirty") method, which means that it is cheap and requires relatively little expertise.

What's Involved In Heuristic Evaluation?

In this technique, a number of evaluators are first introduced to the heuristics, then given some tasks to complete and invited to report the problems - where the system fails to comply with the heuristics - either verbally or in some form of written report or checklist. Unlike many forms of usability testing, the evaluators do not have to be representative of the system's expected users (although they can be!), nor do the evaluators have to be experts, as the heuristics can be read and understood in a few minutes. Just three to five evaluators are needed to find the majority of usability problems, so the technique is quite efficient and inexpensive.

The problems found in heuristic evaluation essentially represent subjective opinions about the system. Evaluators will frequently disagree (there are no absolute right or wrong answers) but these opinions are useful input to be considered in interface design.

What Heuristics Should I Use?

There are several sets of possible heuristics available on the Web and elsewhere. This reflects the fact that they are "rules of thumb", designed to pick out as many flaws as possible, and various sets of usability evaluators have found different formalisations to be most useful for their needs, e.g. [1]. Probably the most commonly used is Nielsen's set of ten usability heuristics [2] given below with a sample question after each one:

Visibility of system status: Does the system give timely & appropriate feedback?
Match between system and the real world: Is it speaking the users' language?
User control and freedom: How hard is it to undo unwanted actions?
Consistency and standards: Does it follow conventions and expectations?
Error prevention: Are potential errors recognised before becoming a problem?
Recognition rather than recall: Does the system rely on the users' memory?
Aesthetic & minimalist design: Are dialogs cluttered with information?
Help users recognise, diagnose & recover from errors: Are error messages useful?
Help and documentation: Is there online help? Is it useful?

An excellent resource to help you choose a set of heuristics is the Interactive Heuristic Evaluation Toolkit [3] which offers heuristics tailored to your expected user group, type of device, and class of application.

When Should Heuristic Evaluation Be Carried Out?

As heuristic evaluation is simple and cheap, it is possible to use it to quickly test the usability of a web site at any stage in its development. Waiting until a fully functional prototype Web site exists is not necessary; interface ideas can be sketched out onto paper or mocked up using graphics software or Flash. These mockups can be tested before any actual development takes place.

Most projects will benefit from a user-centred design process, an approach that focuses on supporting every stage of the development process with user-centred activities. It is advisable to test early and often, in order to ensure that potential problems with a design are caught early enough that they can be solved cheaply. However, even web sites that are already active can benefit from usability testing, since many such problems are easily solved, but some problems are difficult or expensive to solve at a late stage.

Conclusions

If a developing design is tested frequently, most usability problems can be found and solved at an early stage. Heuristic evaluation is a simple and cheap technique that finds the majority of usability problems. An existing Web site or application will often benefit from usability testing, but testing early and often provides the best results. Finally, it is useful to alternate use of heuristic evaluation with use of other methods of usability testing, such as user testing, since the two techniques often reveal different sets of usability problems.

References

Heuristic Evaluation - A System Checklist, Deniese Pierotti, Xerox Corp.
<http://www.stcsig.org/usability/topics/articles/he-checklist.html>
Heuristic Evaluation, Jakob Nielsen,
<http://www.useit.com/papers/heuristic/>
Interactive Heuristic Evaluation Toolkit,
<http://www.id-book.com/catherb/>

Further Information

Ten Usability Heuristics, Jakob Nielsen,
<http://www.useit.com/papers/heuristic/heuristic_list.html>
Getting Started With User Centred Design, UsabilityNet,
<http://www.usabilitynet.org/tools/gettingstarted.htm>

Briefing 90

Developing User Personas

Background

When designing a Web site or program, the obvious question to ask at once is, "who are my audience?" It seems natural to design with users in mind, and just as natural to wish to build a product that is satisfactory to all one's users - however, experience shows that it is difficult to design something that appeals to everybody [1]. Instead, it is useful to start with a few sample profiles of users, typical examples of the audience to whom the design should appeal, and design to their needs. Not only is it easier for the designer, but the result is usually more appealing to the user community.

Researching A User Persona

The first step in developing a user persona is to learn a little about your users; qualitative research techniques like one-to-one interviews are a good place to start. It's best to talk to several types of users; don't just focus on the single demographic you're expecting to appeal to, but consider other groups as well. Focusing on one demographic to the exclusion of others may mean that others do not feel comfortable with the resulting design, perhaps feeling alienated or confused. The expected result of each interview is a list of behaviour, experience and skills. After a few interviews, you should see some trends emerging; once you feel confident with those, it's time to stop interviewing and start to build personas [2].

Developing A User Persona

Once you have an idea of each type of persona, write down the details for each one. It may help to write a sort of biography, including the following information:

Vital statistics: name, age, gender and personality details (shy, timid, outgoing?)
Interests and hobbies
Experience and education
Motivation

You can even find a photograph or sketch that you feel fits the personality and add it to the persona's description.

Why User Personas?

The intent behind a user persona is to create a shared vocabulary for yourself and your team when discussing design questions and decisions. User personas provide easy-to-remember shorthand for user types and behaviour, and can be used to refer to some complex issues in a simple and generally understood way. Sharing them between management and development teams, perhaps even with funders, also provides a useful avenue for effective communication of technical subjects. Furthermore, it is much easier to design for a persona with whom one can empathise than for a brief, dry description of user demographics.

It is good practice, when making design decisions, to consider each user persona's likely reaction to the result of the decision. Which option would each user persona prefer?

User personas can also feed in to discount usability testing methods such as the cognitive walkthrough, saving time and increasing the effectiveness of the approach.

Finally, the research required to create a user persona is an important first step in beginning a user-centred design process, an approach that focuses on supporting every stage of the development process with user-centred activities, which is strongly recommended in designing for a diverse user group.

Conclusions

User personas are a useful resource with which to begin a design process, which allow the designers to gain understanding of their users' expectations and needs in a cheap and simple manner, and can be useful when conducting discount usability testing methods. Additionally, they make helpful conversational tools when discussing design decisions.

References

The Inmates are Running the Asylum, Alan Cooper, ISBN: 0672316498
5 Minute Whitepaper: Which persona are you targeting?,
<http://newsletter.refinery.com/e_article000334332.cfm?x=b11,0,w>

Further Information

Ten Usability Heuristics, Jakob Nielsen,
<http://www.useit.com/papers/heuristic/heuristic_list.html>
Getting Started With User Centred Design, UsabilityNet,
<http://www.usabilitynet.org/tools/gettingstarted.htm>

Briefing 91

The e-Framework for Education and Research

The e-Framework is an initiative by the UK's Joint Information Systems Committee (JISC), Australia's Department of Education, Science and Training (DEST) and partners to produce an evolving and sustainable, open standards based, service oriented technical framework to support the education and research communities.

The e-Framework supports a service oriented approach to developing and delivering education, research and management information systems. Such an approach maximises the flexibility and cost effectiveness with which systems can be deployed, both in an institutional context, nationally and internationally.

The e-Framework allows the community to document its requirements and processes in a coherent way, and to use these to derive a set of interoperable network services that conform to appropriate open standards. By documenting requirements, processes, services, protocol bindings and standards in the form of 'reference models' members of the community are better able to collaborate on the development of service components that meet their needs (both within the community and with commercial and other international partners). The 'e-Framework' also functions as a strategic planning tool for the e-Framework partners.

The initiative builds on the e-Learning Framework [1] and the JISC Information Environment [2] as well as other service oriented initiatives in the areas of scholarly information, research support and educational administration. A briefing paper that provides an overview of the e-Framework [3] and how the partners intend to use it can be found in the resources section [4].

Guiding Principles For The e-Framework

The e-Framework Partnership intends to operate in accordance with the following guiding principles:

The Adoption of a Service Oriented Approach to System and Process Integration

A service-oriented framework provides significant benefits to stakeholders including policy makers, managers, institutions, suppliers and developers and is a business driven approach for developing ICT infrastructure that encourages innovation by being agile and adaptive.

A service-oriented framework currently provides the best means of addressing systems integration issues within institutions, between institutions and across the domains within education and research.

The definition of services is driven by business requirements and processes. The factoring of the services is a key to the effectiveness of the framework.

A high level 'abstract' service definition should not duplicate or overlap another service. An abstract service definition is a description of a service that is independent of the language or platform that may be used to implement the service.

The e-Framework activities will strive for technical excellence and adoption of co-developed good practices.

The Development, Promotion and Adoption of Open Standards

Open standards are key to achieving integration between systems, institutions and between domains in the education and research communities. Open standards are defined for the e-Framework as those standards that are developed collaboratively through due process, are platform independent, vendor neutral, extensible, reusable, publicly accessible and not encumbered by royalties. In order to achieve impact open standards require international collaboration and consensus.

Community Involvement in the Development of the e-Framework

Framework. Collaboration between technical and domain experts, practitioners, developers and vendors will be essential to the evolution and uptake of the e-Framework approach. Capacity and capability will need to be developed

Open Collaborative Development Activities

In order to support evolution of the e-Framework, results will be publicly available. Engagement with communities of use will be essential in the development of the e-Framework. Sustained international development of the e-Framework cannot be undertaken by a single organisation and collaboration between organisations is required. Where possible and appropriate, Open Intellectual Property licensing approaches (such as open source, Creative Commons, royalty free patent licences) will be adopted.

Flexible and Incremental Deployment of the e-Framework

The e-Framework supports and promotes flexible deployment by institutions and facilitates incremental deployment and change. The e-Framework will accommodate both open source and proprietary implementations. Institutions will decide whether to use open or closed source implementations in deploying the e-Framework

The e-Framework's founding organisations, DEST and JISC, have devised a temporary model for the management of and engagement with the e-Framework designed to support the incubation and nurture of the e-Framework, throughout the critical early years of development. The governance and stewardship structures will be iteratively refined as part of the e-Framework work plan.

About This Document

This document is a modified version of a document on "The e-Framework for Education and Research" published on the E-Framework Web site at <http://www.e-framework.org/about/> (version last modified on 2005-10-16 11:05 PM).

The document was originally written by Wilbert Kraan, CETIS and has been republished as a QA Focus briefing document. We are grateful to Wilbert for permission to reprint this document.

Briefing 92

An Introduction to Web 2.0

Web 2.0

The term 'Web 2.0' was coined to define an emerging pattern of new uses of the Web and approaches to the Web development, rather than a formal upgrade of Web technologies as the 2.0 version number may appear to signify. The key Web 2.0 concepts include:

It's an attitude, not a technology:: An acknowledgement that Web 2.0 is not primarily about a set of standards or applications, but a new mindset to how the Web can be used.
A network effect:: This describes applications which are more effective as the numbers of users increase. This effect is well-known in computer networks, with the Internet providing an example of how network traffic can be more resilient as the numbers of devices on the Internet grows.
Openness:: The development of more liberal licences (such copyright licences such Creative Commons; open sources licences for software) can allow integration of data and reuse of software without encountering legal barriers.
Trust Your Users:: Rather than having to develop complex access regimes, a more liberal approach can be taken who can make it easier for users to make use of services.
Network as a platform:: The Web can now be used to provide access to Web applications, and not just informational resources. This allows users to make use of applications without having to go through the cumbersome exercise of installing software on their local PC.
Always beta:: With Web applications being managed on a small number of central servers, rather on large numbers of desktop computers, it becomes possible for the applications to be enhanced in an incremental fashion, with no requirements for the user of the application to upgrade their system.
The long tail:: As the numbers of users of the Web grows, this can provide business opportunities for niche markets which previously it may not have been cost-effective to reach.
Small pieces, loosely coupled:: As the technical infrastructure of the Web stabilises, it becomes possible to integrate small applications. This enables services to be developed more rapidly and can avoid the difficulties ort developing and maintaining more complex and cumbersome systems.

Web 2.0 Application Areas

The key application areas which embody the Web 2.0 concepts include:

Blogs: A Web site which is commonly used to provide diaries, with entries provided in chronological order. Blogs can be used for a variety of purposes, ranging from reflective learning by students and researchers through to dissemination channels for organisations.
Wikis: A wiki refers to a collaborative Web-based authoring environment. The term wiki comes from an Hawaiian word meaning 'quick' and the origins of the name reflect the aims of the original design of wikis to provide a very simple authoring environment which allows Web content to be created with the need to learn the HTML language or to install and master HTML authoring tools.
Syndicated Content: RSS and Atom formats have been developed to enable content to be automatically embedded elsewhere. RSS was initially developed to support reuse of blog content produced. RSS's success led to the format being used in other areas (initially for the syndication of news feeds and then for other alerting purposes and general syndication of content). The Atom format was developed as an alternative to RSS.
Mashups: A mashup is a service which contains data and services combined from multiple sources. A common example of a mashup is a Google Maps mashup which integrated location data was a map provided by the Google Maps service.
Podcasts: A podcast initially referred to syndicated audio content, which can be transferred automatically to portable MP3 players, such as iPods. However the term is sometimes misused to describe a simple audio file.
Social sharing services: Applications which provide sharing of various types of resources such as bookmarks, photographs, etc. Popular examples of social sharing services include del.icio.us and Flickr.
Social networks: Communal spaces which can be used for group discussions and sharing of resources.
Folksonomies and tagging: A bottom-up approach to providing labels for resources, to allow them to be retrieved.

Further Information

An Introduction to RSS and News Feeds, QA Focus briefing document 77, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-77/>
An Introduction to Wikis, QA Focus briefing document 78, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-78/>
Web 2.0: Supporting Library Users, QA Focus briefing document 102, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-102/>

Briefing 93

An Introduction to AJAX

What Is AJAX?

Asynchronous JavaScript and XML (AJAX) is an umbrella term for a collection of Web development technologies used to create interactive Web applications, mostly W3C standards (the XMLHttpRequest specification is developed by WHATWG [1]:

XHTML - a stricter, cleaner rendering of HTML into XML.
CSS for marking up and adding styles.
The Javascript Document Object Model (DOM) which allows the content, structure and style of a document to be dynamically accessed and updated.
The XMLHttpRequest object which exchanges data asynchronously with the Web server reducing the need to continually fetch resources from the server.

Since data can be sent and retrieved without requiring the user to reload an entire Web page, small amounts of data can be transferred as and when required. Moreover, page elements can be dynamically refreshed at any level of granularity to reflect this. An AJAX application performs in a similar way to local applications residing on a user's machine, resulting in a user experience that may differ from traditional Web browsing.

The Origins of AJAX

Recent examples of AJAX usage include Gmail [2], Flickr [3] and 24SevenOffice [4]. It is largely due to these and other prominent sites that AJAX has become popular only relatively recently - the technology has been available for some time. One precursor was dynamic HTML (DHTML), which twinned HTML with CSS and JavaScript but suffered from cross-browser compatibility issues. The major technical barrier was a common method for asynchronous data exchange; many variations are possible, such as the use of an "iframe" for data storage or JavaScript Object Notation for data transmission, but the wide availability of the XMLHttpRequest object has made it a popular solution. AJAX is not a technology, rather, the term refers to a proposed set of methods using a number of existing technologies. As yet, there is no firm AJAX standard, although the recent establishment of the Open AJAX group [5], supported by major industry figures such as IBM and Google, suggests that one will become available soon.

Using AJAX

AJAX applications can benefit both the user and the developer. Web applications can respond much more quickly to many types of user interaction and avoid repeatedly sending unchanged information across the network. Also, because AJAX technologies are open, they are supported in all JavaScript-enabled browsers, regardless of operating system - however, implementation differences of the XMLHttpRequest between browsers cause some issues, some using an ActiveX object, others providing a native implementation. The upcoming W3C 'Document Object Model (DOM) Level 3 Load and Save Specification' [6] provides a standardised solution, but the current solution has become a de facto standard and is therefore likely to be supported in future browsers.

Although the techniques within AJAX are relatively mature, the overall approach is still fairly new and there has been criticism of the usability of its applications; further information on this subject is available in the Ajax and Usability QA Focus briefing document [7]. One of the major causes for concern is that JavaScript needs to be enabled in the browser for AJAX applications to work. This setting is out of the developer's control and statistics show that currently 10% of browsers have JavaScript turned off [8]. This is often for accessibility reasons or to avoid scripted viruses.

Conclusions

The popularity of AJAX is due to the many advantages of the technology, but several pitfalls remain related to the informality of the standard, its disadvantages and limitations, potential usability issues and the idiosyncrasies of various browsers and platforms. However, the level of interest from industry groups and communities means that it is undergoing active and rapid development in all these areas.

References

Web Hypertext Application Technology Working Group,
<http://www.whatwg.org/>
GMail,
<http://gmail.google.com/>
Flickr,
<http://www.flickr.com/>
24SevenOffice,
<http://www.24sevenoffice.com/>
The Open AJAX group,
<http://www.siliconbeat.com/entries/ajax.pdf>
Document Object Model (DOM) Level 3 Load and Save Specification, W3C,
<http://www.w3.org/TR/DOM-Level-3-LS/>
AJAX and Usability, QA Focus briefing document,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-94/>
W3Schools Browser Statistics,
<http://www.w3schools.com/browsers/browsers_stats.asp>

Briefing 94

AJAX And Usability Issues

Introducing AJAX

The term Asynchronous JavaScript and XML (AJAX) refers to a method by which a number of technologies can be combined to enable web applications to communicate in an asynchronous manner with services - that is, they can dynamically send and receive information without forcing page reloads, in a manner that is transparent to the user. This allows for a user experience similar to that of using local applications on a desktop PC. The background to AJAX is discussed in more depth in a QA Focus briefing paper [1].

Using AJAX

AJAX applications potentially offer a number of benefits, such as speed and efficiency in terms of bandwidth and time, and consistency in terms of appearance and behaviour across browser platforms. However, there are several potential disadvantages, a number of which are covered in this briefing paper. Some are related to security, such as limitations set as a response to cross-scripting security flaws, and the deactivation of JavaScript by many users, although this may also be due to accessibility issues. Others involve implementation issues, design issues or mismatch with user expectations.

AJAX and Usability

Certain usability issues have been identified as particularly common:

Concept of State

The Web is built according to a very specific content of state; once a page has been downloaded, it is usually expected to remain static. AJAX uses dynamic Web page updates, which means that state transition (the move from one page view to another) is now much more complex, as separate elements may update asynchronously. AJAX applications frequently do not store application state information; this breaks the 'back' button functionality of the browser. Many Web users use the back button as their primary means of navigation and struggle to control the system without it. Supporting undo and redo is one of the key usability rules and vital in allowing users to recover from errors - that said, it is not always possible or advisable, such as in the case of the sale event in e-commerce.

AJAX requires developers to explicitly support this functionality in their software, or use a framework that supports it natively. Various solutions to this problem have been proposed or implemented, such as the use of invisible IFRAME elements that invoke changes which populate the history originally used by the browser's back button.

A related issue is that because AJAX allows asynchronous data exchange with the server, it is difficult for users to bookmark a particular state of the application. Solutions to this problem have also started to appear. Some developers use the URL anchor or fragment identifier (the identifier after the hash '#') to keep track of state, and therefore allow users to return to the application in a given state. Some AJAX applications also include specially constructed permalinks.

The asynchronous nature of AJAX can also confound search engines and web spiders, which traditionally record only the content of the page. Since these usually disregard JavaScript entirely, an alternative access must be provided if it is desirable for the web page in question to be indexed by search engines. Many AJAX applications will not benefit from indexing by external search engines, such as email, mapping services or online chat clients; however, as the popularity of AJAX grows, it is likely that more informational sites will begin to apply the technology.

User Expectations

The Web is no longer in its infancy and most users have now become fairly familiar with its conventions. When entering a Web site there are certain expectations of how information will be served up and dealt with. Without explicit visual clues to the contrary, users are unlikely to realise that the content of a page is being modified dynamically. AJAX applications often do not offer visual clues if, for example, a change is being made to the page or content is being preloaded. The usual clues (such as the loading icon) are not always available. Again, solving this problem requires designers to explicitly support this functionality, using traditional user interface conventions wherever possible or alternative clues where necessary.

Response Time

AJAX has the potential to reduce the amount of traffic between the browser and the server, as information can be sent or requested as and when required. However, this ability can easily be misused, such as by polling the server for updates excessively frequently. Since data transfer is asynchronous, a lack of bandwidth should not be perceivable to the user; however, ensuring this is the case requires smart preloading of data.

Design Issues

AJAX makes many techniques available to developers that, previously, were available only by using DHTML or a technology like Flash. There is therefore concern that, as with these previous technologies, designers have access to a plethora of techniques that bring unfamiliar usability or accessibility problems. Gratuitous animation, pop ups, blinking text and other distractions all have accessibility implications and stop the user from fully focussing on the task at hand. When creating AJAX applications developers should fully consider the ramifications from the user's perspective.

Accessibility

Most methods of AJAX implementation rely heavily on features only present in desktop graphical browsers and not in text-only readers [2]. Developers using AJAX technologies in Web applications will find attempting to adhere to WAI accessibility guidelines a challenge. They will need to make sure that alternate options for users on other platforms, or with older browsers and slow Internet connections, are available.

A more detailed explanation of usability and the Web is available as a briefing paper [3].

Conclusions

The concerns surrounding adoption of AJAX are not unfamiliar; many stem from user and developer experience of Flash. Like Flash, the technologies comprising AJAX may be used in many different ways; some are more prone to usability or accessibility issues than others. The establishment of standard frameworks, and the increasing standardisation of the technologies behind AJAX, is likely to improve the situation for the Web developer.

In the meantime, the key for developers is to remember is that despite the availability of new approaches, good design remains essential and Jacob Nielson's Ten Usability Heuristics [4] should be kept in mind. AJAX applications need to be rigorously tested to deal with the idiosyncrasies of different browsers, platforms and usability issues. Further, applications should degrade gracefully and offer alternative functionality for those users who do not have JavaScript enabled.

Note that as the use of AJAX increases and more programming libraries become available, many of the issues mentioned in this briefing paper will be resolved. In parallel it is likely that over time browsers will standardise and incorporate better support for new technologies.

References

An Introduction To AJAX, QA Focus briefing document,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-93/>
W3Schools Browser Statistics, W3Schools,
<http://www.w3schools.com/browsers/browsers_stats.asp>
Usability and the Web, QA Focus briefing document,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-86/>
Ten Usability Heuristics, Useit.com,
<http://www.useit.com/papers/heuristic/heuristic_list.html>

Briefing 95

Service Registries And UDDI

What are Service Registries?

There are a wealth of services available on the Web. The high-profile examples are Amazon and Google APIs, which cover services as diverse as ISBN lookups, book cover image retrieval, Web search, spell-checking and geographical mapping, but many other services are available, performing all sorts of task from library access to instant price quotes and reviews for shopping sites. Some services perform a large and complex task, others perform a single task, and all speak different 'languages', from SOAP (Simple Object Access Protocol) to REST (Representational State Transfer).

For any given Web-based problem, there is an excellent possibility that a service is available that offers useful functionality. Unfortunately, finding these services is not always easy, but the process is greatly facilitated by using a service registry, designed to allow developers to register their services and for anybody, developer or end-user, to locate useful services. A service registry is an important part of any service-oriented architecture.

Types of Service Registry

Various service registries already exist. The first major standard to appear was UDDI (Universal Description, Discovery and Integration), which was designed mostly with SOAP-based Web services in mind. UDDI was originally designed with the idea that there would be very few service registries needed - like the Yellow Pages, there would be one central source of information, which would be available from a number of places, but each one would offer the same content. Several other types of service registry exist, such as the JISC's IESR (Information Environment Service Registry) project [1], which focuses on improving resource discovery mechanisms for electronic resources, that is, to make it easier to find materials to support teaching, learning and research. This briefing paper focuses on UDDI, although it is important to realise that the original UDDI standard has now been replaced by UDDI v3 [2] and is no longer generally used as part of a centralised approach. Instead, many organisations use corporate UDDI servers that are not publicly accessible.

Why Use Service Registries?

Service registries can be accessed in a number of ways. Most can be accessed via a Web interface, so if one is looking for a service or type of service, one can use a service registry like a typical search engine, entering keywords and reading textual descriptions of each service. However, many registries are designed to permit a second mode of use, where services are described in a machine-readable way. This means that, should one service become unavailable, the systems that were using that service can search the service registry for a second, compatible service that could be used in its place.

Using a UDDI Service Registry

UDDI can be used in two ways, because it is both a service registry and a business registry. One can look up businesses or organisations in the business registry, or search for services by description or keyword - UDDI also supports a formal taxonomy system that can be used for formally classifying services. It is sometimes more effective to begin searching for a service by looking at known providers. When an appropriate service has been found, it can be used at once; the UDDI server provides not only the name and description of services, but also information about where the Web service can be found, what protocol should be used to communicate with it, and details of the functionality that it can provide. Adding new services to a UDDI service registry may be done either using a Web interface designed for administrative access, or through an API (application program interface). Each different type of service registry supports its own method or methods - for example, the IESR service registry provides a Web form through which services can be added, or an XML -based service submission function.

Quality Issues

When adding data to any sort of service registry, it is important to ensure that the data is provided in the form recommended by the registry. Malformed entries will not be searchable, and may confuse other users. Equally, once you have listed a service in a service registry, remember that if the service is moved, shut down or altered, it will also be necessary to update the listing in the registry.

Conclusions

Service registries are an important part of the service-oriented Internet, automating and simplifying resource discovery. However, there is no single standard for service registries; as of today, each group has its own resource discovery strategy. Taking part in a service registry will generally lead to additional exposure for the listed services and resources, but does convey the additional responsibility of ensuring that listings remain up-to-date.

References

JISC Information Environment Service Registry,
<http://iesr.ac.uk/>
UDDI V3 Specification, Oasis,
<http://uddi.org/pubs/uddi-v3.00-published-20020719.htm>

Briefing 96

Open Standards For JISC Development Programmes

The Importance of Open Standards

Open standards play an important role in JISC's development programmes in order to help ensure:

Device-independence
Application independence
Avoidance of patented technologies
Architectural integrity
Ease interoperability across products from multiple suppliers
Ease of curation and long term preservation of digital resources

Open standards are of particular importance in a JISC development environment, in order to enable developments in one institution to be reused in another, and to support the diversity of the UK's higher and further education environment.

Open Standards and JISC's Development Programmes

JISC funds a range of development programmes. Various procedures and policies may be developed for the different programme calls. This briefing document outlines some of the key principles related to the selection and use of open standards for project teams (a) preparing proposals and (b) developing project deliverables, once a proposal has been accepted.

The key areas for projects are:

Ensuring that you have knowledge of open standards relevant to your proposal, including advice provided by JISC and JISC advisory services.
Ensuring that you have an understanding of potential difficulties and limitations of relevant open standards, and are able to address such concerns.
Ensuring that you can provide documentation of the section of open standards.
Ensuring that you have quality assurance procedures in place to ensure you make use of open standards in an appropriate fashion.

Programme-Specific Advice and Procedures

Please note that projects should consult with Programme Managers and appropriate documentation for specific information related to individual programme calls.

Preparing Your Proposal

When preparing a proposal it is important that you are familiar with standards which are relevant to your proposal and to the particular programme you are involved with. You should ensure that you read relevant resources such as resources documented in the programme call, QA Focus advisory resources [1], JISC services such as UKOLN [2], CETIS [3], etc.

Your proposal should demonstrate that you have an understanding of appropriate open standards and that you are able to implement the relevant open standards, or can provide valid reasons if you are not able to do this.

Implementing Your Proposal

If your proposal is accepted you will need to address the issue of the selection and deployment of standards in more detail. For example, you may have a work package on the selection of standards and the technical architecture used to create, manage and use the standards.

It is important that you have appropriate quality assurance processes in place in order to ensure that you are making use of open standards in an appropriate way. Further advice on quality assurance is provided on the QA Focus Web site [4].

Sharing Your Experiences

Since projects may be involved in developing innovative applications and services, making use of standards in innovative ways, making use of emerging standards or prototyping new standards, it is important that feedback is provided on the experiences gained. The feedback may be provided in various ways: on project reports; by producing case studies; by providing feedback on emailing lists, on Wikis, etc.

Where possible it is desirable that the feedback is provided in an open way, allowing other projects, programmes and the wider community to benefits form such experiences. JISC Programmes may put in place structures to support such feedback.

References

Briefing documents, QA Focus,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/>
UKOLN,
<http://www.ukoln.ac.uk/>
CETIS,
<http://www.cetis.ac.uk/>
Quality Assurance, QA Focus briefing documents,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/#qa>

About This Document

This document was produced to support JISC's development programmes. We are grateful to JISC for their feedback on the document.

Briefing 97

Introduction To OPML

OPML

OPML stands for Outline Processor Markup Language. OPML was originally developed as an outlining application by Radio Userland. However it has been adopted for a range of other applications, in particular providing an exchange format for RSS.

This document describes the OPML specification and provides examples of use of OPML for the exchange of RSS feeds.

The OPML Specification

The OPML specification [1] defines an outline as a hierarchical, ordered list of arbitrary elements. The specification is fairly open which makes it suitable for many types of list data. The OPML specification is very simple, containing the following elements:

<opml version="1.0">: The root element which contains the version attribute and one head and one body element.
<head>: Contains metadata. May include any of these optional elements: title, dateCreated, dateModified, ownerName, ownerEmail, expansionState, vertScrollState, windowTop, windowLeft, windowBottom, windowRight.
<body>: Contains the content of the outline. Must have one or more outline elements.
<outline>: Represents a line in the outline. May contain any number of arbitrary attributes. Common attributes include text and type.

Limitations Of OPML

OPML has various shortcomings:

OPML stores data in XML attributes, which violates a XML design principle.
Information about OPML items cannot itself be hierarchically marked up (ironically), due to the use of attributes to store that information.
The RFC-822 date format used in OPML is considered obsolete.

OPML Applications

Import and Export of RSS Files

OPML can be used in a number of application areas. One area of particular interest is in the exchange of RSS files. OPML can be used to group together related RSS feeds. RSS viewers which provide support for OPML can then be used to read in the group, to avoid having to import RSS files individually. Similarly RSS viewers may also provide the ability to export groups of RSS files as a single OPML file.

OPML Viewers

OPML viewers can be used to view and explore OPML files. OPML viewers have similar functionality as RSS viewers, but allow groups of RSS files to be viewed.

The QA Focus Web site makes use of RSS and OPML to provide syndication of the key QA Focus resources [2]. This is illustrated in Figure 1, which shows use of the Grazr inline OPML viewer [3]. This application uses JavaScript to read and display the OPML data.

Other OPML viewers include Optimal OPML [4] and OPML Surfer [5].

Figure 1: Grazr

Risk Assessment

It should be noted that OPML is a relatively new format and only limited experiences have been gained in its usage. Organisations who wish to make exploit the benefits of OPML should seek to minimise any risks associated with use of the format and develop migration strategies if richer or more robust alternative formats become available.

Acknowledgments

This briefing document makes use of information published in the OPML section on Wikipedia [6].

References

OPML Specification,
<http://www.opml.org/spec>
RSS Feeds, QA Focus,
<http://www.ukoln.ac.uk/qa-focus/rss/#opml>
Grazr,
<http://www.grazr.com/>
Optimal OPML,
<http://www.optimalbrowser.com/optimal.php>
OPML Surfer,
<http://www.kbcafe.com/rss/opmlsurfer.aspx>
OPML, Wikipedia
<http://en.wikipedia.org/wiki/Outline_Processor_Markup_Language>

Briefing 98

Risk Assessment For Making Use Of Third Party Web 2.0 Services

Background

This briefing document provides advice for Web authors, developers and policy makers who are considering making use of Web 2.0 services which are hosted by external third party services. The document describes an approach to risk assessment and risk management which can allow the benefits of such services to be exploited, whilst minimising the risks and dangers of using such services.

Note that other examples of advice are also available [1] [2].

About Web 2.0 Services

This document covers use of third party Web services which can be used to provide additional functionality or services without requiring software to be installed locally. Such services include:

Search facilities, such as Google University Search and Atomz.
Social bookmarking services, such as del.icio.us.
Wiki services, such as WetPaint.
Usage analysis services, such Google Analytics and SiteMeter.
Chat services such as Gabbly and ToxBox.

Advantages and Disadvantages

Advantages of using such services include:

May not require scarce technical effort.
Facilitates experimentation and testing.
Enables a diversity of approaches to be taken.

Possible disadvantages of using such services include:

Potential security and legal concerns e.g. copyright, data protection, etc.
Potential for data loss or misuse.
Reliance on third parties with whom there may be no contractual agreements.

Risk Management and Web 2.0

A number of risks associated with making use of Web 2.0 services are given below, together with an approach to managing the dangers of such risks.

Risk	Assessment	Management
Loss of service (e.g. company becomes bankrupt, closed down, ...)	Implications if service becomes unavailable. Likelihood of service unavailability.	Use for non-mission critical services. Have alternatives readily available. Use trusted services.
Data loss	Likelihood of data loss. Lack of export capabilities.	Evaluation of service. Non-critical use. Testing of export.
Performance problems. Unreliability of service.	Slow performance	Testing. Non-critical use.
Lack of interoperability.	Likelihood of application lock-in. Loss of integration and reuse of data.	Evaluation of integration and export capabilities.
Format changes	New formats may not be stable.	Plan for migration or use on a small-scale.
User issues	User views on services.	Gain feedback.

Note that in addition to risk assessment of Web 2.0 services, there is also a need to assess the risks of failing to provide such services.

Example of a Risk Management Approach

A risk management approach [3] was taken to use of various Web 2.0 services on the Institutional Web Management Workshop 2006 Web site.

Use of established services:: Google and Google Analytics are used to provide searching and usage reports.
Alternatives available:: Web server log files can still be analysed if the hosted usage analysis services become unavailable.
Management of services:: Interfaces to various services were managed to allow them to be easily changed or withdrawn.
User Engagement:: Users are warned of possible dangers and invited to engage in a pilot study.
Learning:: Learning may be regarded as the aim, not provision of long term service.

References

Checklist for assessing third-party IT services, University of Oxford,
<http://www.oucs.ox.ac.uk/internal/3rdparty/checklist.xml>
Guidelines for Using External Services, University of Edinburgh,
<https://www.wiki.ed.ac.uk/download/attachments/8716376/GuidelinesForUsingExternalWeb2.0Services-20080801.pdf?version=1>
Risk Assessment, IWMW 2006, UKOLN,
<http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-2006/risk-assessment/>

Briefing 99

Impact Analysis For Web Sites

Background

This briefing document provides advice on approaches to measuring the impact of a service provided by a Web site.

The document describes an approach to risk assessment and risk management which can allow the benefits of such services to be exploited, whilst minimising the risks and dangers of using such services.

Traditional Approaches To Impact Analysis

A traditional approach to measuring the impact of a Web site is to report on Web server usage log files [1]. Such data can provide information such as an indication of trends and growth in usage; how visitors arrived at the Web site; how users viewed pages on your Web site and details on the browser technologies used by your visitors.

However although such information can be useful, it is important to recognise that the underlying data and the data analysis techniques used may be flawed [2]. For example:

As search engines may deliver significant traffic to Web sites, potentially they may result in visitors who find the wrong type of information, and immediately leave after viewing the first page.
Use of caches, offline browsers, etc may influence the findings.
Trends may be misleading, as they may be influenced by changes in the Web site architecture (for example, if visitors spend more time on your Web site after a Web site redesign, does this indicate that they are finding additional useful resources, or wasting time looking for resources?

It should also be noted that care must be taken when aggregating usage statistics:

You should not aggregate monthly findings to determine, say, annual number of unique visitors.
Aggregation of data from disparate Web sites may give misleading information.

So although analysis of Web site usage data may be useful, the findings need to be carefully interpreted.

Other Approaches To Impact Analysis

Although Web site usage analysis may have flaws, there are other approaches which can be used to measure the impact of a Web site: Such alternative can be used to complement Web usage analysis.

Link analysis: If other Web sites have links to your Web site, this can be an indication of the value placed on you Web site. Services such as LinkPopularity.com [3] can provide such data. Keeping a record of the numbers of sites linking to you can also help show trends.
Analysis of social bookmarking services:: Services such as del.icio.us [4] allow you to bookmark resources. A useful aspect of the service is the ability to observe others who are bookmarking the same resource. So bookmarking your own Web site will allow you to record the numbers of people who bookmark your site. This may be a useful indicator, if the social bookmarking service you use if popular with your target audience.
User comments:: Comments from your user community can provide a particularly valuable way of measuring impact. Feedback can be obtained in a variety of ways: focus groups; online questionnaires, online guest books, etc.
Analysis of Web sites, mailing lists, Blogs, etc.:: Search engines such as Google, Technorati [5]], etc. may enable you to find comments about your Web site, but also provide various metrics which may be useful.

Possible disadvantages of using such services include:

Potential security and legal concerns.
Potential for data loss.
Reliance on third parties with whom there may be no contractual agreements.

Embedding Impact Analysis

In order to maximise the benefits, you may find it useful to develop an Impact Analysis Strategy. This should ensure that you are aware of the strengths and weaknesses of the approaches you plan to use, have mechanisms for gathering information in a consistent and effective manner and that appropriate tools and services are available.

References

Usage Statistics For Web Sites, QA Focus briefing document no. 84, UKOLN,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-84/>
Performance Indicators for Web Sites, B. Kelly, Exploit Interactive, issue 5, April 2000,
<http://www.exploit-lib.org/issue5/indicators/>
LinkPopularity.com,
<http://www.linkpopularity.com/>
del.icio.us,
<http://del.icio.us/>
Technorati,
<http://www.technorati.com/>

Briefing 100

An Introduction To Microformats

Background

This document provides an introduction to microformats, with a description of what microformats are, the benefits they can provide and examples of their usage. In addition the document discusses some of the limitations of microformats and provides advice on best practices for use of microformats.

What Are Microformats?

"Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns (e.g. XHTML, blogging)." [1].

Microformats make use of existing HTML/XHTML markup: Typically the <span> and <div> elements and class attribute are used with agreed class name (such as vevent, dtstart and dtend to define an event and its start and end dates). Applications (including desktop applications, browser tools, harvesters, etc.) can then process this data.

Examples Of Microformats

Popular examples of microformats include:

hCard: Markup for contact details such as name, address, email, phone no., etc. Browser tools such as Tails Export [2] allow hCard microformats in HTML pages to be added to desktop applications (e.g. MS Outlook).
hCalendar: Markup for events such as event name, date and time, location, etc. Browser tools such as Tails Export and Google hCalendar [3] allow hCalendar microformats in HML pages to be added to desktop calendar applications (e.g. MS Outlook) and remote calendaring services such as Google Calendar.

An example which illustrates the commercial takeup of the hCalendar microformat is its use with the World Cup 2006 fixture list [4]. This application allows users to choose their preferred football team. The fixtures are marked up using hCalendar and can be easily added to the user's calendaring application.

Limitations Of Microformats

Microformats have been designed to make use of existing standards such as HTML. They have also been designed to be simple to use and exploit. However such simplicity means that microformats have limitations:

Possible conflicts with the Semantic Web approach: The Semantic Web seeks to provide a Web of meaning based on a robust underlying architecture and standards such as RDF. Some people feel that the simplicity of microformats lacks the robustness promised by the Semantic Web.
Governance: The definitions and ownership of microformats schemes (such as hCard and hCalendar) is governed by a small group of microformat enthusiasts.
Early Adopters: There are not yet well-established patterns of usage, advice on best practices or advice for developers of authoring, viewing and validation tools.

Best Practices for Using Microformats

Despite their limitations microformats can provide benefits to the user community. However in order to maximise the benefits and minimise the risks associated with using microformats it is advisable to make use of appropriate best practices. These include:

Getting it right from the start: Seek to ensure that microformats are used correctly. Ensure appropriate advice and training is available and that testing is carried out using a range of tools. Discuss the strengths and weaknesses of microformats with your peers.
Having a deployment strategy: Target use of microformats in appropriate areas. For example, simple scripts could allow microformats to be widely deployed, yet easily managed if the syntax changes.
Risk management: Have a risk assessment and management plan which identifies possible limitations of microformats and plans in case changes are needed [5].

References

About Microformats, Microformats.org,
<http://microformats.org/about/>
Tails Export: Overview, Firefox Addons,
<https://addons.mozilla.org/firefox/2240/>
Google hCalendar,
<http://greasemonkey.makedatamakesense.com/google_hcalendar/>
World Cup KickOff,
<http://www.worldcupkickoff.com/>
Risk Assessment For The IWMW 2006 Web Site, UKOLN,
<http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-2006/risk-assessment/#microformats>

Briefing 101

Tangram Model For Web Accessibility

Background

This document describes a user-focussed approach to Web accessibility in which the conventional approach to Web accessibility (based on use of WAI WCAG guidelines) can be applied within a wider context.

Traditional Approach To Web Accessibility

The conventional approach to Web accessibility is normally assumed to be provided by implementation of the Web Content Accessibility Guidelines (WCAG) which have been developed by the Web Accessibility Initiative (WAI).

In fact the WCAG guidelines are part of a set of three guidelines developed by WAI, the other guidelines being the Authoring Tools Accessibility Guidelines (ATAG) and the User Agent Accessibility Guidelines (UAAG). The WAI approach is reliant on full implementation of these three sets of guidelines.

Limitations Of The WAI Approach

Although WAI has been a political success, with an appreciation of the importance of Web accessibility now widely acknowledged, and has provided a useful set of guidelines which can help Web developers produce more accessible Web sites, the WAI model and the individual guidelines have their flaws, as described by Kelly et al [1]:

Flaws in WAI model: The WAI approach relies not only on WCAG-compliant content, but also on the availability and use of browsers which comply with UAAG guidelines and authoring tools which comply with ATAG guidelines. In practice, software which complies with these guidelines is not widely available.
WCAG Limitations: The WCAG 1.0 guidelines were published in 1999. Since then limitations and ambiguities in the guidelines have become apparent.
Alternative approaches: The WAI approach is based on universal accessibility of Web resources. However there will be circumstances in which this approach may not be applicable. For example in e-learning, the important factor is not necessarily the accessibility of e-learning resources but the accessibility of learning outcomes (see [2]). Such an approach encourages a diversity of solutions (including other IT solutions as well as real world alternatives).

The Tangram Model

Although the WAI approach has its flaws (which is understandable as this was an initial attempt to address a very difficult area) it needs to be recognised that WCAG guidelines are valuable. The challenge is to develop an approach which makes use of useful WCAG guidelines in a way which can be integrated with others areas of best practices (e.g. including usability, interoperability, etc.) and provides a richer and more usable and accessible experience to the target user community.

In the tangram model for Web accessibility (developed by Sloan, Kelly et al [3]) each piece in the tangram (see below left) represents guidelines in areas such as accessibility, usability, interoperability, etc. The challenge for the Web developer is to develop a solution which is 'pleasing' to the target user community (see below right).

Tangram model

The tangram model provides several benefits:

User-focussed: The aim is explicitly on provided appropriate solutions for the user, rather than compliance with guidelines.
Holistic: The approach explicitly covers several areas.
Technology-neutral: approach is not reliant on an individual technology.
Automated testing given context: : Individual pieces can represent best practices which can be tested by automated tools (e.g. present of ALT attributes). However the model makes it clear that a single 'piece' cannot provide the ideal solution.

References

Forcing Standardization or Accommodating Diversity? A Framework for Applying the WCAG in the Real World, Kelly, Sloan et al, Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility (W4A),
<http://www.ukoln.ac.uk/web-focus/papers/w4a-2005/>
Developing A Holistic Approach For E-Learning Accessibility, Kelly, Phipps and Swift, CJLT 2004, 3(1),
<http://www.ukoln.ac.uk/web-focus/papers/cjtl-2004/>
Contextual Web Accessibility - Maximizing the Benefit of Accessibility Guidelines, Sloan, et al, Proceedings of the 2006 International Cross-Disciplinary Workshop on Web Accessibility (W4A),
<http://www.ukoln.ac.uk/web-focus/papers/w4a-2006/>

Briefing 102

Web 2.0: Supporting Library Users

Introduction

Web 2.0 is described by Wikipedia as referring to "a second generation of services available on the web that lets people collaborate and share information online" [1].

Web 2.0 is essentially all about creating richer user experiences through providing interactive tools and services, which sit on top of static web sites. Web sites which utilise aspects of Web 2.0 are often personalisable, dynamically driven, and rich in community tools and sharing functions. The data from underlying systems, such as a Library Management System, can be exposed and shared, usually using XML. Web 2.0 developments are underpinned by open source software and open standards - and they often use widely available components such as RSS, blogging tools and social bookmarking services.

Recently, the term Library 2.0 has also come to prominence. Library 2.0 [2] follows the principles of Web 2.0, in that it promotes the evaluation and adoption of software and tools which were originally created outside of the Library environment. These are over-layed on traditional library services - such as the Library OPAC - in order to create a more dynamic, interactive and personalisable user experience.

Re-inventing the Library OPAC

Web 2.0 technologies can be used to vastly improve the user experience when searching the Library OPAC. Traditionally, many Library OPACs have been designed with librarians rather than users in mind, resulting in interfaces which are not intuitive or attractive to users.

An excellent example of Web 2.0 in action comes from Plymouth State University [3]. The Library OPAC has been redesigned to give it the look and feel of a blog. The home page consists of a new books feed, presented in an engaging and attractive fashion, in contrast to the more usual approach of providing a set of search options. Users can view items, or click on options to 'find more like this' 'comment' or 'get more details'. The site is powered by WPopac, which the developer describes as 'an OPAC 2.0 Testbed' [4]. WPopac is itself based around the WordPress blogging tool [5].

The site contains lots of other useful features such as a list of the most popular books, recent search terms used by other users, ability to tag items with user-chosen tags, user comments and book reviews. Many of these features are provided by the use of RSS feeds from the Library Management System itself. The tagging tools are provided by an external social tagging Web site called Technorati [6].

Book reviews and detailed descriptions are also provided from the Amazon Web site, and other Amazon services [7] such as 'search inside' and 'tables of contents' are also integrated into the site.

Another interesting approach to the OPAC comes from Ann Arbor District Library (a US public library), who have created an online 'wall of books' using book jacket images [8]. Each image in the 'wall' links back to an item record in the Library OPAC. This is a novel approach to presenting and promoting library services, using an attractive 'virtual' library display to entice people into the OPAC for further information.

Services Integration

Google have provided a number of solutions for Libraries wishing to encourage their users to access local electronic and print resources when searching in Google Scholar [9]. Libraries can link their collections to Google Scholar to ensure that users are directed to locally-held copies of resources. The OCLC WorldCat database has already been harvested by Google Scholar, so that all the records in this database (a large Union Catalogue) are searchable via Google Scholar.

Google is also offering its 'Library Links' programme which enables libraries using an OpenURL link resolver to include a link from Google Scholar to their local resources as part of the Google Scholar search results. Google Scholar users can personalise their searching by selecting their 'home' library as a preference. In order to set up integration with an OpenURL resolver, libraries simply need to export their holding from their link resolver database and send this to Google. Once set up, Google will harvest new links from the link resolver database on an ongoing basis.

Library 'Mash-ups'

A mash-up can be described as a web site which uses content from more than one source to create a completely new service. One example of a tool which can be used to support 'mash-ups' is Greasemonkey [10]. Greasemonkey is an extension for the Firefox web browser which enables the user to add scripts to any external web page to change its behaviour. The use of scripts enables the user to easily control and re-work the design of the external page to suit their own specific needs.

The University of Huddersfield have used Greasemonkey to enable local information to be displayed in Amazon when a user is searching Amazon [11]. The information is drawn from their Library OPAC - for example, users can be shown whether a particular title is available in the Library at Huddersfield, and can link back to the Library OPAC to see the full record. If the book is out on loan, a due date can be shown. This approach encourages users back into the Library environment to use Library services rather than making a purchase through Amazon.

A New Approach to Resource Lists

Intute is a free online service managed by JISC for the UK higher education community [12]. It provides access to a searchable catalogue of quality-stamped web resources suitable for use in education. It was previously known as the RDN (Resource Discovery Network). Intute can be used by libraries to create their own local lists of web resources, thus reducing the need for libraries to maintain their own lists of useful Web sites for their users. Libraries can use the MyIntute service to create their own lists of resources drawn from the Intute catalogue. These can be updated on a weekly basis, if required, by creating weekly email alerts which identify new records added to the database which meet a stated set of criteria. The records can then be exported in order to present them on the libraries' own Web site, using local branding and look and feel. Library staff can add new records to the Intute database if they find useful resources which have not already been catalogued in Intute.

Supporting Users

Blogs are increasingly used by libraries as promotional, alerting and marketing tools; providing a useful method of promoting new services, alerting users to changes and offering advice and support.

The Library at the Royal College of Midwives [13] runs a blog to keep users informed about new service developments. Typical postings include information about new books received by the Library, pointers to content from electronic journals, news about new projects and services and other items which might usually be found in a Library newsletter. The advantage of the blog over a newsletter is that it is quick and easy to maintain and update, can therefore be kept more up-to-date, and can be integrated into the other services that the Library offers, through the Library Web site. Users can also choose to receive it as an RSS feed.

At the University of Leeds, Angela Newton runs a blog which is aimed at supporting information literacy developments [14]. Angela's blog is focused mainly at academic and support staff at Leeds who are interested in developing information-literate students. She focuses on a range of topics, such as assessment and plagiarism, academic integrity, use of information in education, research methods and e-learning tools. Angela's blog is one of a number of 'practitioner' blogs at Leeds, and has been well received by the e-learning community within the institution.

Another example is the "Univ of Bath Library Science News" service. In this case a blog service hosted at Blogspot.com provides "Updates for the Faculty of Science from your Subject Librarians: Linda Humphreys (Chemistry, Physics, Pharmacy & Pharmacology, Natural Sciences) Kara Jones (Biology & Biochemistry, Mathematics and Computing Sciences)" [15].

References

Web 2.0, Wikipedia, Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility (W4A),
<http://en.wikipedia.org/wiki/Web_2.0>
Library, Wikipedia,
<http://en.wikipedia.org/wiki/Library_2.0>
Plymouth State University Library OPAC,
<http://www.plymouth.edu/library/opac/>
WPopac,
<http://maisonbisson.com/blog/post/11133/>
WordPress,
<http://wordpress.org/>
Technorati,
<http://technorati.com/>
Amazon Web Services,
<http://www.amazon.com/b/104-7926800-8152762?ie=UTF8&node=3435361>
Ann Arbor District Library wall of books,
<http://www.superpatron.com/wall-of-books/aadl/aadl-fiction-20060322.html>
Google Scholar Services for Libraries, Google, Google,
<http://scholar.google.com/scholar/libraries.html>
Greasemonkey,
<http://greasemonkey.mozdev.org/>
Greasemonkey example,
<http://library.hud.ac.uk/mediawiki/dughug2006/UsingFreeSoftware.zip>
Intute,
<http://www.intute.ac.uk/>
Royal Colleges of Midwives Library blog,
<http://midwifery-info.blogspot.com/>
Angela Newton's Information Literacy blog, University of Leeds,
<https://elgg.leeds.ac.uk/libajn/weblog/>
Univ of Bath Library Science News, University of Bath,
<http://bathsciencenews.wordpress.com/>

Briefing 103

Web 2.0: Addressing the Barriers to Implementation in a Library Context

Introduction

Web 2.0 is described by Wikipedia as referring to "a second generation of services available on the web that lets people collaborate and share information online" [1].

Web 2.0 is essentially about creating richer user experiences through providing interactive tools and services. Web sites which utilise aspects of Web 2.0 are often personalisable, dynamically driven, and rich in community tools and sharing functions. The data from underlying systems, such as a Library Management System, can be exposed and shared, usually using XML. Web 2.0 developments are underpinned by open source software and open standards - and they often use widely available components such as RSS, blogging tools and social bookmarking services.

Recently, the term Library 2.0 has also come to prominence [2]. Library 2.0 follows the principles of Web 2.0, in that it promotes the evaluation and adoption of software and tools which were originally created outside of the Library environment. These are over-layed on traditional library services - such as the Library OPAC - in order to create a more dynamic, interactive and personalisable user experience.

Barriers to Implementation

Information professionals often express concern about a number of issues relating to implementation of Web 2.0 within their Library service.

Scaleability

A significant concern centres on questions regarding scaleability of the Web 2.0 approach. Information professionals are usually concerned with finding institution-wide approaches to service provision - for example, in relation to reading lists it is important that a Library service is able to receive these from tutors in a timely fashion, that they are supplied in a consistent format and that they are made available in a standard and consistent way - perhaps through a central reading list management service with links to the Library Web site and the VLE. If other approaches start to be used it becomes difficult for these to be managed in a consistent way across the institution. For example, a tutor might decide to use a service such as Librarything [3] to create his or her own online reading list. Information professionals then face a potential headache in finding out about this list, synchronising it with other approaches across the institution, integrating it with other systems on campus such as the Library Sys tem, VLE or portal and presenting it in a consistent fashion to students.

Support Issues

Another concern centres on the supportability of Web 2.0 tools and services. Information professionals are often involved with training users to use library tools and services. They use a variety of approaches in order to achieve this ranging from hands-on training sessions, through to paper-based workbooks and interactive online tutorials. Information professionals are often concerned that users will struggle to use new tools and services. They are keen to develop training strategies to support implementation of new services. With Web 2.0 this can be difficult as users may be using a whole range of freely available tools and services, many of which the information professional may not themselves be familiar with. For example, if Tutor A is using Blogger [4] with her students, whereas Tutor B is using ELGG [5]], information professionals may find themselves being faced with expectations of support being provided for a wide variety of approaches. Students might encounter a different set of tools for each module that they take, and the support landscape quite quickly becomes cluttered. Information professionals can start to feel that they are losing control of the environment in which they are training and supporting users, and may also start to feel uneasy at their own inability to keep up to speed with the rapid changes in technology.

Longevity

Information professionals are also concerned about the longevity of Web 2.0 services. By its very nature, Web 2.0 is a dynamic and rapidly moving environment. Many of the tools currently available have been developed by individuals who are committed to the open source and free software movement who may not be backed by commercial funding; the individuals may have a tendency to lose interest and move on to other things once an exciting new piece of technology comes along. Some successful tools may end up being bought by commercial players which might result in their disappearance or incorporation into a commercial charging model. It appears risky to rely on services which may disappear at any time, where no support contract is available, no guarantee of bugs being fixed or formal processes for prioritisation of developments.

Commercialisation

Where Web 2.0 developments are backed by commercial organisations, this may also cause some concern. For example, Amazon provide many useful Web 2.0 services for enhancing the Library OPAC. However, information professionals may feel uneasy about appearing to be promoting the use of Amazon as a commercial service to their users. This might potentially damage relationships with on-campus bookshops, or leave the Library service open to criticism from users that the Library is encouraging students to purchase essential materials rather than ensuring sufficient copies are provided.

Web 2.0 technologies may also raise anxieties concerning strategy issues. Information professionals might worry that if Google Scholar [6] is really successful this would reduce use of Library services, potentially leading to cancellation of expensive bibliographic and full-text databases and a resulting decline in the perceived value of the Library within the institution. Library strategies for promoting information literacy might potentially be undermined by students' use of social tagging services which bypass traditional controlled vocabularies and keyword searching. The investment in purchase and set-up of tools such as federated search services and OpenURL resolvers might be wasted because users bypass these services in favour of freely available tools which they find easier to use.

Addressing the Barriers

Building On Web 2.0 Service

Information professionals can turn many of the perceived drawbacks of Web 2.0 to their advantage by taking a proactive approach to ensure that their own services are integrated with Web 2.0 tools where appropriate. Google, for example, offers a 'Library Links' programme which enables libraries using an OpenURL link resolver to include a link from Google Scholar to their local resources as part of the Google Scholar search results. Google Scholar users can personalise their searching by selecting their 'home' library as a preference. In order to set up integration with an OpenURL resolver, libraries simply need to export their holding from their link resolver database and send this to Google. Once set up, Google will harvest new links from the link resolver database on an ongoing basis. By using such tools, the information professional can put the Library back in the picture for users, who can then take advantage of content and services that they might not otherwise have come across because they had by-passed the Library Web site.

Working With Vendors

It is also important to work with Library Systems vendors to encourage them to take an open approach to Web 2.0 integration. Many Library Systems vendors are starting to use Web 2.0 approaches and services in their own systems. For example, the Innovative Interfaces' LMS [7] now provides RSS feed capability. RSS feeds can be surfaced within the OPAC, or can be driven by data from the Library system and surfaced in any other Web site. This provides a useful service for ensuring greater integration between Library content and other systems such as institutional VLEs or portals. Users can also utilise RSS feeds to develop their own preferred services. Information professionals should lobby their LMS partners to support a more open standards approach to service development and integration, and should ensure that they don't get too constrained by working with one particular vendor.

Working With National Services

Use of the national services can also help to ensure a more sustainable approach is taken to Web 2.0 developments. For example Intute [8], a nationally funded service working closely with the higher education community, can provide a range of Web 2.0 services which libraries can utilise in enhancing their own local services, with the advantage that the service is funded on a long-term basis and support can be provided by a dedicated team of staff employed full-time to resolve issues, develop services and fix bugs.

Working With Peers

Information professionals also need to work together to share ideas and experiences, implement developments and learn from each other. There are already lots of good examples of this kind of sharing of ideas and expertise - for example, Dave Pattern's blog at the University of Huddersfield [9] provides a useful resource for those interested in implementing Web 2.0 services in a library context. It is also important that information professionals are willing to work across professions - implementation of Web 2.0 services requires the contribution of IT professionals, e-learning specialists and academics.

Trying It

Finally, information professionals need to be willing to get their hands dirty and take risks. Web 2.0 is often concerned with rapid deployment of services which may still be in a beta state. The key is to get things out quickly to users, then develop what works and learn from things that doesn't. This requires a willingness to be flexible, a confidence in users' ability to cope with rapid change without requiring a lot of hand-holding and support, and the courage to step back from a particular approach that hasn't worked and move on to other things. Not everything will be successful or enthusiastically taken up by your users and you need to be prepared to cut your losses if something does not work. Users' views should be actively sought, and they should be involved in the development and testing process wherever possible.

References

Web 2.0, Wikipedia,
<http://en.wikipedia.org/wiki/Web_2.0>
Library, Wikipedia,
<http://en.wikipedia.org/wiki/Library_2.0>
Librarything,
<http://www.librarything.com/>
Blogger,
<http://www.blogger.com/start>
ELGG,
<http://elgg.net/>
Google Scholar Services for Libraries,
<http://scholar.google.com/scholar/libraries.html>
Innovative Interfaces Inc,
<http://www.iii.com/>
Intute,
<http://www.intute.ac.uk/>
Dave Pattern's blog,
<http://www.daveyp.com/blog/>

Briefing 104

Guide To The Use Of Wikis At Events

About This Document

This document describes how Wiki software can be used to enrich events such as conferences and workshops. The document provides examples of how Wikis can be used; advice on best practices and details of a number of case studies.

Use of Wikis at Events

Many events used to support the development of digital library services nowadays take place in venues in which WiFi network is available. Wikis (Web-based collaborative authoring tools) are well suited to exploit such networks in order to enrich such events in a variety of ways.

Examples of how Wikis can be used at events include:

Note-taking at discussion groups:: Wikis are ideal for use by reporters in discussion groups. They are easy to use and ensure that, unlike use of flip charts of desktop applications, the notes are available without any further processing being required.
Social support before & during the event:: Wikis can be used prior to or during events, to enable participants to find people with similar interests (e.g. those travelling from the same area; those interested in particular social events; etc.).

Best Practices

Issues which need to be addressed when planning a Wiki service at an event include:

AUP: It is advisable to develop an Acceptable Use Policy (AUP) covering use of the Wiki and other networked services. The AUP may cover acceptable content; responsibilities and preservation of the data.
Local or Hosted Services: There will be a need to decide whether to make use of a locally-hosted service or choose a third party service. If the latter option is preferred there is a need to decide whether to purchase a licences service or use a free service (which may have restructured functionality). In all cases there will be a need to establish the at Wiki software provides the functionality desired.
Registration Issues: You will need to decide whether registration is needed in order to edit the Wiki. Registration can help avoid misuse of the Wiki, but registration, especially if it requires manual approval, can act as a barrier to use of the Wiki.
Design Of The Wiki: Prior to the event you should establish the structure and appearance of the Wiki. You will need to provide appropriate navigational aids (e.g. the function of a 'Home' link; links between different areas; etc.).
After The Event: You will need to develop a policy of use of the Wiki after the event. You may, for example, wish to copy contents from the Wiki to another area (especially if the Wiki is provided by a third party service).

Examples Of Usage

A summary of use of Wikis at several UKOLN events are given below.

**Table 1: Use of Wikis at UKOLN Events**
Event	Comments
Joint UKOLN/UCISA Workshop, Nov 2004	A workshop on "Beyond Email: Strategies For Collaborative Working In The 21st Century" was held in Leeds in Nov 2004. The four discussion groups each used a Wiki provided by the externally-hosted Wikalong service. The Wiki pages were copied to the UKOLN Web site after the event. It was noted that almost 2 years after the event the original pages were still intact (although link spam was subsequently found). See <http://www.ukoln.ac.uk/web-focus/events/workshops/ucisa-wlf-2004-11/>
IWMW 2005 Workshop, Jul 2005	Wikalong pages were set up for the discussion groups at the Institutional Web Management Workshop 2005. As an example of the usage see <http://www.ukoln.ac.uk/web-focus/events/workshops/ webmaster-2005/discussion-groups/south-east/>
Joint UKOLN / CETIS / UCISA Workshop, Feb 2006	A workshop on Initiatives & Innovation: Managing Disruptive Technologies was held at the University of Warwick in Feb 2006. The MediaWiki software was installed locally for use by the participants to report on the discussion groups. See <http://www.ukoln.ac.uk/web-focus/events/workshops/ ucisa-ukoln-cetis-2006/wiki/>.
ALT Research Seminar 2006	The licensed Jot Wiki service was used during a day's workshop on Emerging Technologies and the 'Net generation'. The notes kept on the Wiki were used to form the basis of a white paper. See <http://altspring.jot.com/WikiHome>.
IWMW 2006 Workshop, Jun 2006	The MediaWiki software was used by the participants at the Institutional Web Management Workshop 2006 to report on the discussion groups and various sessions. It was also available for use prior to the event. See <http://www.ukoln.ac.uk/interop-focus/community/index/IWMW2006>

Briefing 105

Use Of Social Tagging Services At Events

About This Document

This document describes how social tagging services can be used to enrich events such as conferences and workshops. The document provides examples of how social tagging services can be used; advice on best practices and details of a number of case studies.

Use Of Social Tagging Services At Events

Social tagging services, such as social bookmarking services (e.g. del.icio.us) and photo sharing services (e.g. Flickr) would appear to have much to offer events such as conferences and workshops, as such events, by definition, are intended for groups of individuals with shared interests.

Since many events used to support the development of digital library services nowadays take place in venues in which WiFi network is available, such events are ideal for exploiting the potential of social tagging services.

Examples of how social tagging can be used at events include:

Proving Links To Resources:: Making use of social bookmarking services such as del.icio.us enables resources mentioned in presentations to be more easily accessed (no need for speakers to waste time spelling out long URLs).
Finding Related Resources:: Use of social bookmarking services such as del.icio.us enable resources related to those provided by the speaker to be more easily found.
Contributing Related Resources:: Use of social bookmarking services such as del.icio.us enable participants to provide additional resources related to those provided by the speaker.
Evaluation And Impact Analysis:: Use of a standard tag for an event can enable Blog postings, photographs and other resources related to the event to be quickly found, using services such as Technorati. This can help assist the evaluation and impact analysis for an event.
Community Building:: Use of photo sharing services such as Flickr can enable participants at an event to share their photographs of the event.

Best Practices

Issues to be addressed when using social networking services at events include:

Selecting The Services: You should inform participants of the recommend services and, ideally, provide an example of the benefits to the participants, especially if they are unfamiliar with them.
Defining Your Tags: You should provide a recommended tag for your event, and possibly a format for use of the tag if more than one tag is to be used. You should try to ensure that your tag is unambiguous and avoids possible clashes with other uses of the tag. Note that use of dates can help to disambiguate tags.
Be Aware Of Possible Dangers: You should be aware of possible dangers and limitations of the services, and inform your participants of such limitations. This may include possible spamming of the services; data protection and privacy issues; long term persistence of the services; etc.

Case Study - IWMW 2006

The Institutional Web Management Workshop 2006, held at the University of Bath on 14-16^th June 2006, made extensive use of social bookmarking services:

Standardised Tags: The tag iwmw2006 was recommended for use in social bookmarking services. Tags for plenary talks and parallel sessions had the format iwmw2006-plenary-speaker and iwmw2006-parallel-facilitator.
del.icio.us: The IWMW 2006 Web site was bookmarked on del.icio.us (this allowed the event organisers to observe other who had bookmarked the Web site in order to monitor communities of interest). In addition, speakers and workshop facilitators were invited to use del.icio.us for key resources mentioned in their talks.
Flickr: Participants who took photographs at the event were encouraged to tag the photographs with the tag iwmw2006 if they uploaded their photos to services such as Flickr.
Event Evaluation and Impact Assessment Using Technorati: The impact of the workshop was evaluated by using the Technorati service to find Blog postings which made use of the iwmw2006 tag.
Aggregation Using Suprglu: The various social bookmarking services which used the recommended tag were aggregated by the Suprglu service at the URL <http://iwmw2006.suprglu.com/>.

Briefing 106

Exploiting Networked Applications At Events

About This Document

Increasingly WiFi networks are available in lecture theatres [1]. With greater ownership of laptops, PDAs, etc. we can expect conference delegates to make use of the networks. There is a danger that this could lead to possible misuse (e.g. accessing inappropriate resources; reading email instead of listening; etc.) This document describes ways in which a proactive approach can be taken in order to exploit enhance learning at events. The information in this document can also be applied to lectures aimed at students.

Design Of PowerPoint Slides

A simple technique when PowerPoint slides are used is to make the slides available on the Web and embed hypertext links in the slides (as illustrated). This allows delegates to follow links which may be of interest.

Use of PowerPoint

Providing access to PowerPoint slides can also enhance the accessibility of the slides (e.g. visually impaired delegates can zoom in on areas of interest).

Using Bookmarking Tools

Social bookmarking tools such as del.icio.us can be used to record details of resources mentioned [2] and new resources added and shared.

Realtime Discussion Facilities

Providing discussion facilities such as instant messaging tools (e.g. MSN Messenger, Jabber or Gabbly) can enable groups in the lecture theatre to discuss topics of interest.

Support For Remote Users

VoIP (Voice over IP) software (such as Skype) and related audio and video-conferencing tools can be used to allow remote speakers to participate in a conference [3] and also to allow delegates to listen to talks without being physically present.

Using Blogs And Wikis

Delegates can make use of Blogs to take notes: This is being increasingly used at conferences, especially those with a technical focus, such as IWMW 2006 [4]. Note that Blogs are normally used by individuals. In order to allow several Blogs related to the same event to be brought together it is advisable to make use of an agreed tag [5].

Unlike Blogs, Wikis are normally used in a collaborative way. They may therefore be suitable for use by small groups at a conference [6]. An example of this can be seen at the WWW 2006 conference [7].

Challenges

Although WiFi networks can provide benefits there are several challenges to be addressed in order to ensure that the technologies do not act as a barrier to learning.

User Needs: Although successful at technology-focussed events, the benefits may not apply more widely. There is a need to be appreciative of the event environment and culture. There may also be a need to provide training in use of the technologies.
AUP: An Acceptable Use Policy (AUP) should be provided covering use of the technologies.
Performance Issues, Security, etc.: There is a need to estimate the bandwidth requirements, etc. in order to ensure that the technical infrastructure can support the demands of he event. There will also be a need to address security issues (e.g. use of firewalls; physical security of laptops, etc.).There is a need to estimate the bandwidth requirements, etc. in order to ensure that the technical infrastructure can support the demands of the event. There will also be a need to address security issues (e.g. use of firewalls; physical security of laptops, etc.).
Equal Opportunities: If not all delegates will possess a networked device, care should be taken to ensure that delegates without such access are not disenfranchised.

References

Using Networked Technologies To Support Conferences, Kelly, B. et al, EUNIS 2005,
<http://www.ukoln.ac.uk/web-focus/papers/eunis-2005/paper-1>
Use Of Social Tagging Services At Events, QA Focus briefing document no. 105,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-105/>
Interacting With Users, Remote In Time And Space, Phipps, L. et al, SOLSTICE 2006,
<http://www.ukoln.ac.uk/web-focus/events/conferences/solstice-2006/>
Workshop Blogs, IWMW 2006,
<http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-2006/blogs/>
Use Of Social Tagging Services At Events, QA Focus briefing document no. 105,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-105/>
Guide To The Use Of Wikis At Events, QA Focus briefing document no. 104,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-104/>
WWW2006 Social Wiki, WWW 2006,
<http://www2006.org/wiki/w/Main_Page>

Briefing 107

Guidelines For Exploiting WiFi Networks At Events

About This Document

Increasingly WiFi networks are available in lecture theatres, conference venues, etc. We are beginning to see various ways in which networked applications are being used to enhance conferences, workshops and lectures [1].

Availability Of The Network

If you are considering making use of a WiFi network to support an event you will need to ensure that (a) a WiFi network is available; (b) costs, if any, for use of the network and (c) limitations, if any, on use of the network. Note that even if a WiFi network is available, usage may restricted (e.g. to academic users; local users; etc.)

Demand From The Participants

There may be a danger in being driven by the technology (just because a WiFi network is available does not necessarily mean that the participants will want to make use of it). Different groups may have differing views on the benefits of such technologies (e.g. IT-focussed events or international events attracting participants from North America may be particularly interested in making use of WiFi networks).

If significant demand for use of the WiFi network is expected you may need to discuss this with local network support staff to ensure that (a) the network has sufficient bandwidth to cope with the expected traffic and (b) other networked services have sufficient capacity (e.g. servers handling logins to the network).

Proactive Or Reactive Approach?

You may choose to provide details of how to access the WiFi network and leave the participants to make use of it as they see fit. Alternatively you may wish to manage the way in which it is used, and provide details of networked applications to support the event, as described in [2], [3] and [4].

Financial And Administrative Issues

If there is a charge for use of the network you will have to decide how this should be paid for? You may choose to let the participants pay for it individually. Alternatively the event organisers may chose to cover the costs.

You will also have to set up a system for managing usernames and passwords for accessing the WiFi network. You may allocate usernames and passwords as participants register or they may have to sign a form before receiving such details.

Support Issues

There will be a need to address the support requirements to ensure that effective use is made of the technologies.

Participants: There may be a need to provide training and to ensure participants are aware of how the networked technologies are being used.
Event Organisers, Speakers, etc.: Event organisers, chairs or sessions, speakers, etc should also be informed of how the networked technologies may be used and may wish to give comments on whether this is appropriate.
AUP: An Acceptable Use Policy (AUP) should be provided which addresses issues such as privacy, copyright, distraction, policies imposed by others, etc.
Evaluation: It would be advisable to evaluate use of technologies in order to inform planning for future events.

Physical And Security Issues

You will need to address various issues related to the venue and the security of computers. For example, you may need to provide advice on where laptop users should sit (often next to a power supply and possibly away from people who do not wish to be distracted by noise). There will also be health and safety issues to consider. There will also be issues regarding the physical security of computers and the security against viruses, network attacks, etc.

References

Using Networked Technologies To Support Conferences, Kelly, B. et al, EUNIS 2005,
<http://www.ukoln.ac.uk/web-focus/papers/workshops/eunis-2005/paper-1>
Exploiting Networked Applications At Events, QA Focus briefing document no. 106,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-106/>
Use Of Social Tagging Services At Events, QA Focus briefing document no. 105,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-105/>
Guide To The Use Of Wikis At Events, QA Focus briefing document no. 104,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-104/>

Briefing 108

An Introduction to Secure Web Practice

About This Document

Since the early years of the Web, the process of designing and building a Web site has changed. The availability of pre-packaged software for many common tasks - content management, blogging, forum systems and so forth - has improved. Many third party services are now available. Web frameworks of varying complexity are available in almost any common programming language. Templating systems are commonplace - sites making use of advanced features such as AJAX functionality will often make use of a framework to simplify design. The fact that the design and complexity of the tools has increased does not mean that security is now out of the hands of the site developer. Some frameworks explicitly handle certain security issues, but it is good practice to work with security in mind. This document provides guidelines on some of these issues.

Platform Security Issues

Every component in your web site, including the Web server and underlying frameworks or platforms, may suffer from security flaws. As they are discovered, the developers behind that software package will issue advisories regarding security flaws, as well as software patches, upgrades or workarounds. Ongoing maintenance of a web site involves keeping an eye out for advisories, perhaps on appropriate mailing lists or RSS newsfeeds, and prompt action when such issues are discovered. Remember to plan for this essential ongoing maintenance work in your budget. Note that using popular components is likely to help security - issues are discovered and fixed quickly.

User Authentication

In general, user authentication is a difficult piece of functionality to write from scratch. If your project parameters permit you to use an existing system, or to tie your authentication into an existing system such as PHP's sessioning, consider the option. Ideally, do not store user passwords in plain text, because of the possibility that the passwords could be retrieved and used to compromise users' accounts elsewhere - in practice, users often maintain one password for several systems or sites. Make use of a hashing function, such as an md5sum s2], and store the result instead.

User Input

Many security flaws result from the assumption that user input is trustworthy, and that it contains what the programmer intended the user to input. Others result from the use of client-side code, such as JavaScript, to check the user's input before it is posted. The client's browser is not a secure environment - the user can alter browser behaviour. Even if client-side code is used, the check must be run again on the server.Examples of Common Vulnerabilities

Remote execution of code - a user, aware that their user input will be run as part of a shell or system command, tries to append additional code to their input so that a different command is run, or so that the command returns different information.
Denial of Service - a malicious user inputs data that causes your Web application to stop responding correctly.
Cross-site scripting - displaying unchecked user data provides malicious users with a method of injecting information into your web pages, providing an opportunity to confuse or mislead your users and perhaps gain private information or compromise user accounts.
SQL injection - many websites make use of a backend database. If unsafe user input is sent to the database - input to which is appended additional SQL commands - the malicious user can design commands to compromise other users' accounts, alter account information or delete information.

Conclusions

Make security a priority at all stages in the design and build process. Plan for maintenance in your budget. When specifying an application, page or component that takes user input, write down a list of possible vulnerabilities and ensure that you have addressed these during the design and build phases. Test your Web applications - provide unexpected input, and see if you can find any way to break them. Provide custom error pages - diagnostic information on errors can be useful for malicious users. Speak to a security expert if possible, and ensure that you have taken reasonable precautions. A secure application can only be ensured by an ongoing commitment to that aim.

References

Common Security Vulnerabilities in eCommerce Applications, K. K. Mookhey,
<http://www.securityfocus.com/infocus/1775>
2. Consider MD5 checksums for application passwords, Scott Stephens,
<http://www.builderau.com.au/architect/soa/Consider_MD5_checksums_for_application_passwords/0,39024564,39130325,00.htm>

Briefing 109

Web Security, Services and AJAX

About This Document

Web security is an ongoing concern of many - site developers, browser developers and everyday users. Some of the basics of Web security have been covered in a previous briefing paper [1]. However, the increasing use of Web Services and APIs in various contexts has led to some novel security concerns, as well as the resurgence of several older issues. This document will discuss a few of the security issues brought up by the use of AJAX and 'Web 2.0' service APIs.

Cross-domain Requests

Since Web services are not hosted locally, consuming data from such services involves making requests to services on other domains. Default security settings in JavaScript disallow this - most browsers will not permit it for security reasons. There are legitimate security reasons for this. If developers wish to overcome this restriction, they typically create a proxy for that service - a server-side script running on their local domain that forwards the request and returns the response. Proxies must be secured against unauthorised use. Many services will ask for an API key or password; be careful with this information and do not make it visible in the JavaScript of your page.

Multi-layered Security

Do not consider the browser as a secure environment; users can read JavaScript - even if it has been obfuscated - and alter it as it runs on their machine. In fact, they can tailor requests that do not make use of the page at all. Therefore, ensure that all the crucial business logic takes place on the server, rather than the browser. You may wish to perform it on the browser to avoid unnecessary crosstraffic between client and server, or for usability/interface reasons, but though this means that the logic has been duplicated, it is important to perform this step again when processing input requests.

According to research by Gartner Inc, 70% of attacks occur via the application layer [2]. Before deploying AJAX, consider the risks and ensure that your project is ready and able to meet the commitment required to complete the task securely. Access control in particular should be considered attentively, as well as communications channels - remember that by default XMLHTTPRequest does not encrypt the data it transmits.

Bandwidth and Speed

Several applications of AJAX have been designed to provide almost real-time input or feedback to the user, or for continuous small updates to be made to a page. When putting together an application using AJAX or frequent service calls in general, add up the amount of bandwidth that will be used during a typical interaction or use of the service.

Fallback Plans

If your service is slow or unavailable - or JavaScript is turned off on the browser - the majority of web applications should still work. The only exception to this is the relatively small subset of applications whose functionality depends directly on input taken from services, such as mash-ups based around maps taken from Google Earth.

Managing Complexity

Because AJAX applications share the business logic between the server and the client, the developers responsible for each element need to work together closely. Developers dealing with technology relatively new to them will need time to produce successive prototypes, since this is a learning process with few authoritative references from which to work.

Conclusions

As with traditional Web applications [3], make security a priority, and plan for maintenance in your budget. When specifying an application, page or component that takes user input, write down a list of possible vulnerabilities and ensure that you have addressed these during the design and build phases. Test your Web applications - both the JavaScript layer and the server-side layer(s). Speak to a security expert if possible, and ensure that you have taken reasonable precautions. A secure application can only be ensured by an ongoing commitment to that aim.

Finally, be conservative with application of novel technologies, and consider a 'feature-freeze' and testing cycle before deployment. New features will also require testing before deployment, so plan to add features only when there is time to audit the result.

References

An Introduction to Secure Web Practice, QA Focus briefing document no. 108,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-108/>
AJAX Security, Stewart Twynham, IT Observer,
<http://www.it-observer.com/articles/1062/ajax_security/>
AJAX Threats Worry Researchers, Bill Brenner,
<http://searchsecurity.techtarget.com/originalContent/0,289142,sid14_gci1207759,00.html>

Briefing 110

Publish On Demand

About Publish On Demand

Publish on demand, also known a print on demand or POD, is an increasingly common and accessible alternative (or supplement) to traditional publishing methods. Publish on demand (note that the term Print On Demand is a trademark of Cygnus Business Media, Inc. [1]) became possible due to the development of equipment capable of economically printing very small print runs of materials in book form. This briefing document discusses technical prerequisites and restrictions and provides an overview of the advantages and disadvantages of the methodology.

How does it work?

POD systems make use of digital technologies to enable an all-digital printing system. This reduces initial setup costs, meaning that small print runs can be published at a reasonable unit cost. The technology becomes less price-effective at large volumes, making POD systems particularly economical for publications with low or limited demand, or where the predictability of market demand is fairly low. The minimal setup costs also imply that publications can be made available on a speculative basis.

A 'Long-Tail' Technology

The advent of POD also coincides with recent discussions characterised by the phrase 'the long tail'. This phrase refers to the large number of moderately popular or less popular publications that are interesting to a moderately large segment of the population, but are not bestsellers or chart hits [2]. Such content is often very difficult to get hold of, since demand is too low for it to be marketable effectively according to traditional economics of large-scale publishing. Small-scale publication - the print-on-demand methodology- reduces this barrier, meaning that content can be digitally stored, browsed and made available for sale on virtual shopfronts, and a physical object such as a book or a CD is printed for the buyer.

Digital browsing and distribution also means that physical limits, such as the size of a shopfront and the amount of space available in the warehouse for storing a large print run of books, becomes a less significant factor in the decision of whether to publish.

Technical Prerequisites

Popular POD services include Xlibris, Lulu [3] and Blurb. Each has its own process; Blurb, for example, specialises in printing from Web applications such as blogs and wikis, whilst Lulu provides a number of modes of use - essentially, most PDF files with embedded fonts can be printed using the Lulu publishing system. Customised covers and so forth can also be created, although simple and effective defaults are also available. Several book formats are available, ranging from bound A4 to British or American paperback novel formats.

Advantages and Disadvantages

There remains a stigma associated to the use of print-on-demand for certain types of content, particularly fiction. It is associated in many peoples' minds with the use of small presses or vanity presses, which to many have the reputation of preying on the unwary novelist, demanding large setup fees for small print runs and providing no help in terms of marketing or distributing the work. ISBNs need to be bought separately, and distribution difficulties exist for works without an ISBN assigned to them. However, there are many valid uses of print-on-demand, ranging from publishing of local-interest or specialist books (particularly valid, therefore, for institutions such as museums) to print publication of conference proceedings.

Conclusions

Affordable small-scale publishing has opened many doors for the smaller organisation or the individual. However, whilst a useful methodology, other publication methodologies should be considered in parallel - in particular, there are many potential difficulties and responsibilities that should be considered in terms of distribution and marketing, which are not handled by POD.

References

Print on Demand, Wikipedia,
<http://en.wikipedia.org/wiki/Print_on_Demand>
The Long Tail, Chris Anderson,
<http://www.wired.com/wired/archive/12.10/tail_pr.html>
Lulu basics, Lulu,
<http://www.lulu.com/>

Briefing 111

Layout Testing with Greeked Pages

Background

Page layout, content and navigation are not always designed at the same time. It is often necessary to work through at least part of these processes separately. As a result, it may not be possible to test layouts with realistic content until a relatively late stage in the design process, meaning that usability problems relating to the layout may not be found at the appropriate time.

Various solutions exist for this problem. One is the possibility of testing early prototype layouts containing 'greeked' text - that is, the 'lorem imsum' placeholder text commonly used for layout design [1]. A method for testing the recognisability of page elements was discussed in Neilsen's Alertbox back in 1998 [2], though the concept originated with Thomas S. Tullis [3].

Technique

Testing will require several users - around six is helpful without being excessively time-consuming. Ensure that they have not seen or discussed the layouts before the test! First, create a list of elements that should be visible upon the layout. Nielsen provides a list of nine standard elements that are likely to present on all intranet pages - but in your particular case you may wish to alter this list a little to encompass all of the types of element present on this template.

Give each test user a copy of each page - in random sequence, to eliminate any systematic error that might result from carrying the experience with the first page through to the second. Ask the test user to draw labelled blocks around the parts of the page that correspond to the elements you have identified. Depending on circumstances, you may find that encouraging the user to 'think aloud' may provide useful information, but be careful not to 'lead' the user to a preferred solution.

Finally, ask the user to give a simple mark out of ten for 'appeal'. This is not a very scientific measure, but is nonetheless of interest since this allows you to contrast the user's subjective measure of preference against the data that you have gathered (the number of elements correctly identified). Nielsen points out that the less usable page is often given a higher average mark by the user.

Scoring The Test

With the information provided, draw a simple table:

Layout	Correctly Identified Page Elements	Subjective Appeal
1	N% (eg. 65%)	# (e.g. 5/10)
2	M% (eg. 75%)	# (e.g. 6/10)

This provides you with a basic score. You will probably also find your notes from think-aloud sessions to be very useful in identifying the causes of common misunderstandings and recommending potential solutions.

When Should Page Template Evaluation Be Carried Out?

This technique can be applied on example designs, so there is no need to create a prototype Web site; interface ideas can be mocked up using graphics software. These mockups can be tested before any actual development takes place. For this reason, the template testing approach can be helpful when commissioning layout template or graphical design work. Most projects will benefit from a user-centred design process, an approach that focuses on supporting every stage of the development process with user-centred activities, so consider building approaches like this one into your development plans where possible.

Conclusions

If a developing design is tested frequently, most usability problems can be found and solved at an early stage. The testing of prototype page layouts is a simple and cheap technique that can help to tease out problems with page layout and visual elements. Testing early and often can save money by finding these problems when they are still cheap and simple to solve.

It is useful to make use of various methods of usability testing during an iterative design and development cycle, since the various techniques often reveal different sets of usability problems - testing a greeked page template allows us to separate the usability of the layout itself and the usability of the content that will be placed within this content [2]. It is also important to evaluate issues such as content, navigation mechanisms and page functionality, by means such as heuristic evaluation and the cognitive walkthrough - see QA Focus documents on these subjects [4] [5]. Note that greeked template testing does look at several usability heuristics: Aesthetic & minimalist design and Consistency and standards are important factors in creating a layout that scores highly on this test.

Finally, running tests like this one can help you gain a detailed understanding of user reactions to the interface that you are designing or developing.

References

Lorem Ipsum Generator,
<http://lorem-ipsum.perbang.dk/>
Testing Greeked Page Templates, Jakob Nielsen,
<http://www.useit.com/alertbox/980517.html>
A method for evaluating Web page design concepts, T.S. Tullis. In ACM Conference on Computer-Human Interaction CHI 98 Summary (Los Angeles, CA, 18-23 April 1998), pp. 323-324.
Introduction To Cognitive Walkthroughs, QA Focus briefing document no. 87,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-87/>
Heuristic Evaluation, QA Focus briefing document no. 89,
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-89/>

Further Information

Getting Started With User Centred Design, UsabilityNet,
<http://www.usabilitynet.org/tools/gettingstarted.htm>

Briefing 112

An Introduction To Mashups

What Is A Mashup?

Wikipedia defines a mashup as "a web application that combines data from more than one source into a single integrated tool" [1]. Many popular examples of mashups make use of the Google Map service to provide a location display of data taken from another source.

Technical Concepts

As illustrated in a video clip on "What Is A Mashup?" [2] from a programmer's perspective a mashup is based on making use of APIs (application programmers interface). In a desktop PC environment, application programmers make use of operating system functions (e.g. drawing a shape on a screen, accessing a file on a hard disk drive, etc.) to make use of common functions within the application they are developing. A key characteristic of Web 2.0 is the notion of 'the network as the platform'. APIs provided by Web-based services (such as services provided by companies such as Google and Yahoo) can similarly be used by programmers to build new services, based on popular functions the companies may provide. APIs are available for, for example, the Google Maps service and the del.icio.us social book marking service.

Creating Mashups

Many mashups can be created by simply providing data to Web-based services. As an example, the UK Web Focus list of events is available as an RSS feed as well as a plain HTML page [3]. The RSS feed includes simple location data of the form:

<geo:lat>51.752747</geo:lat> <long>-1.267138</geo:long>

This RSS feed can be fed to mashup services, such as the Acme.com service, to provide a location map of the talks given by UK Web Focus, as illustrated.

Figure 1: Mashup Of Location Of UK Web Focus Events

Tools For The Developer

More sophisticated mashups will require programming expertise. The mashup illustrated which shows the location of UK Universities and data about the Universities [4] is likely to require access to a backend database.

Figure 2: A Google Maps Mashup Showing Location and Data About UK Universities

However a tools are being developed which will allow mashups to be created by people who may not consider themselves to be software developers. Such tools include Yahoo Pipes [5], PopFly [6] and Google Mashup Editor [7].

Allowing Your Service To Be 'Mashed Up'

Paul Walk commented that "The coolest thing to do with your data will be thought of by someone else" [8]. Mashups provide a good example of this concept: if you provide data which can be reused this will allow others to develop richer services which you may not have the resources or expertise to develop. It can be useful, therefore, to seek to both provide structured data for use by others and to avoid software development if existing tools already exist. However you will still need to consider issues such as copyright and other legal issues and service sustainability.

References

Mashup (web application hybrid, Wikipedia,
<http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)>
What is A Mashup?, ZDNet,
<http://news.zdnet.com/2422-13569_22-152729.html >
Forthcoming Events and Presentations, UK Web Focus, UKOLN,
<http://www.ukoln.ac.uk/web-focus/events/>
University Locator, University of Northumbria,
<http://northumbria.ac.uk/browse/unimapper/>
Yahoo Pipes, Yahoo,
<http://pipes.yahoo.com/pipes/>
Popfly, Microsoft,
<http://www.popfly.com/>
Google Mashup Editor, Google,
<http://editor.googlemashups.com/>
The coolest thing to do with your data will be thought of by someone else, Paul Walk, 23 July 2007,
<http://blog.paulwalk.net/2007/07/23/>

QA Focus Briefing Documents: Print All

Compliance with HTML Standards

Why Bother?

Which Standards?

Approaches To Creating Resources

Monitoring Compliance

Use of Automated Tools For Testing Web Site Accessibility

Accessibility And Web Sites

Accessibility Testing Tools

Accessibility Guidelines

Examples Of Automated Accessibility Checking Tools

Typical Errors Flagged By Automated Tools

Caveats

Use Of Proprietary Formats On Web Sites

Use Of Proprietary Formats

URL Naming Conventions For Access To Proprietary Formats

Converting Proprietary Formats

MS Word

MS PowerPoint

Dynamic Conversion

Mothballing Your Web Site

About This Document

Web Site Content

Technologies

Process For Mothballing

Contacts

Planning For Mothballing

Accessing Your Web Site On A PDA

About This Document

AvantGo

About

Benefits

Other Benefits

Summary

404 Error Pages On Web Sites

Importance Of 404 Error Pages

Types Of 404 Error Pages

Functionality Of 404 Error Pages

Further Information

Approaches To Link Checking

Why Bother?

Approaches To Link Checking

Policy Issues

Tools

Other Issues

References

Search Facilities For Your Web Site

Background

Approaches To Providing Search Facilities

Local Search Engine

Externally-Hosted Search Engines

Pros And Cons

Trends

References

Image QA in the Digitisation Workflow

Introduction

Image QA

1 Strategic QA

Local Search Engine

2 Process QA

3 Sign-off QA

4 On-going QA

QA in the Digitisation Workflow

Acknowledgements:

Enhancing Web Site Navigation Using The LINK Element

Introduction

The LINK Element

About

Benefits

Browser Support

Information Management Challenges

Further Information

What Are Open Standards?

Background

Why Open Standards?

What Are Open Standards?

Other Types Of Standards

References

How To Evaluate A Web Site's Accessibility Level

Background