nof-digitise Technical Advisory Service - Online Help

1. Will you be preparing advice on colour to ensure that people looking at their computer screens will actually see images with the same colours as the originals?

Ensuring exact consistency in how images are displayed across the range of machines to which the content will be delivered is difficult to achieve, given that there is a wide range of different types of display devices in use.

In practice existing projects do tend to create several different formats and sizes for images, e.g. SCRAN displays 72 dpi, 256 colour JPEGs on the Web, with 150 x 150 pixel thumbnails. Information about the SCRAN solution to this can be found at
http://www.tasi.ac.uk/resources/scran.html

2. Will NOF provide guidance on screen size, screen resolution and size of text?

It will be difficult to provide NOF projects with guidance on screen size, screen resolution and size of text. Although modern PCs provide high-resolution graphics capabilities, significant numbers of users may continue to use older PCs. We are also currently seeing the development of a range of devices for accessing Internet resources which have limited display and graphical capabilities, such as Internet-enabled TVs, PDAs, mobile phones, etc. In addition, people who access NOF resources may have particular requirements (visually-impaired, colour blind, etc.). NOF projects should ensure that their services can be accessed in a device-independent way, although enhanced services which exploit the end user's PC setup or personal preferences may be provided.

This is an area the NOF Technical Advisory Service is looking into. The following resources may be of use:

4. Can you provide advice on the advantages of static versus dynamic Web pages?

The terms "dynamic Web pages" and "dynamic Web sites" can be used in a number of senses, so it is important to clarify the meaning.

Movement on a Web page (example 1) may be useful in some cases. However, for accessibility purposes, the end user should be able to switch off scrolling text or moving images.

Access to search facilities, backend databases and legacy systems (example 2) is desirable on many Web sites.

Web sites which can be personalised for the end user (example 3) may be desirable in some cases.

Web sites which can be personalised for the end user's client environment (example 4) may be desirable. However users should not be disenfranchised if they have an unusual client environment.

Dynamic Web sites (example 5) may be desirable in some cases. However users should not be disenfranchised if their browser does not support ECMAScript, or if ECMAScript is disabled (e.g. for security purposes).

If you are considering developing a dynamic Web site you should consider the performance implications and the effect on caching. Will the performance of your service deteriorate if, for example, server-side scripting is used, or will the performance on the end user's PC deteriorate if it is not powerful enough to support Java or ECMAScript? In addition, will pages on the Web site fail to be cached, and what effect will this have on the performance for the end user?

You should also consider how easy it will be to cite and bookmark dynamic resources. Will resources have a meaningful URL which is easy to remember? Will bookmarked URLs return the same resource at a future date?

NOF projects should conform to W3C WAI guidelines. The Bobby Web service and Bobby Java application will help in monitoring compliance with the guidelines. See <http://www.w3.org/WAI> for further information.

6. I plan to use a piece of proprietary software to create the HTML for the web sites, and will be checking with the manufacturers to find out how well it fits the NOF technical standards. However, I do know that it does not make use of cascading style sheets to control the appearance of web pages. Instead it enables the web master to specify background/font colours and font sizes, page widths/heights etc. in a background "document" which is then referenced by every single new web page created - thereby splitting out web page content from design, achieving the same effect as CSS but in a different way. Will this still be acceptable to NOF?

We do recommend that CSS are used (see section 5.1.1) although it is not being insisted upon (this is a *should* rather than a *must*). If you can justify the reasons that you have decided against them then that's OK. We would say though that your solution does (at first sight) sound a bit cumbersome and potentially ties you to that piece of software for ever, and we do recommend against using proprietary software unless absolutely necessary.

7. Do you have any specific guidance about how we can ensure that web sites are compatible with PDAs? (Unfortunately our budget won't stretch to purchasing one so that we can work out what is needed.)

It is not necessary (or always desirable) to purchase and test Web pages against every combination of hardware device, browser version etc. Instead you should check that your Web pages are compliant with the version of HTML / XML / CSS that you use.

The testing should be carried out using an HTML / XML / CSS validator rather than relying on checking how it looks using a browser. A variety of validators is available - for example see the HTML and CSS validators at http://www.w3.org/.

In addition to these (and other) Web-based validators, many authoring tools will have their own validators.

There are a number of validators available for WAP phones. These may be bundled in with WAP / WML authoring tools.

There may be similar tools available for PDAs. However if the PDA supports HTML, you will be able to use an HTML validator.

Note that there are a number of WAP emulators available - e.g. see http://www.gelon.net/. These can be used to test out WAP sites. However, as stated above, it cannot be guaranteed that a site which works correctly in an emulator will work on the device itself.

8. Although we appreciate the importance of standards, we feel that we cannot implement them fully within our project. The NOF-digi Standards document mentions that if this is the case we should document a "migration strategy" in our project report. Can you explain what is meant by this?

You should provide full details of failure to comply with standards, and also any instances in which you feel the need to implement solutions which may not comply with best practices.

In your report you should provide detailed information which will inform your NOF-digi case manager and associated bodies. You should justify the decisions you have made and show that you are aware of potential problems by outlining the disadvantages (as well as summarising the advantages). You should also describe how you would move to a more standards-compliant solution or implement a better solution, the costs of doing this, and how the work could be funded.

You should be aware of the different approaches which can be taken in migrating resources to open file formats: you could use software to convert files from one format to another; you could provide an emulation environment; you could redigitise your resource; etc. In your migration strategy you should outline the approach you are likely to take.

Please document areas in which your project may deviate from compliance with the NOF Technical standards. In this section you must (a) describe the areas in which compliance will not be achieved; (b) explain why compliance will not be achieved (including research on appropriate open standards); (c) describe your migration strategies to ensure compliance in the future and (d) how the migration may be funded.

We will be providing an online game on our Web site. This is aimed at children. The game (on our Victoriana Web site) will allow users to dress images of Victorian dolls from a selection of costumes.

(b) Explain why compliance will not be achieved (including research on appropriate open standards)

The Standards document requests us to make use of appropriate open standards. In this area we believe the appropriate open standards are XHTML and ECMAScript (sometimes known as JavaScript).

However since our online game is only a very minor part of our NOF-digi project Web site and we have already developed a solution in Flash, we intend to make this available in the short term.

Our proposed Flash solution will be easy to implement as similar work has already been carried out. It will be usable by modern browsers which have a Flash plugin.

However (a) it is a proprietary solution; (b) it may not be accessible; (c) it will probably not work on non-standard devices, such as a digital TV.

As part of the NOF-digi work we will be building up our technical expertise. As we develop in-house technical expertise in client-side scripting (ECMAScript/JavaScript) we intend to migrate our online games to make use of Dynamic HTML.

This will be funded by our organisation, as part of our in-house resources which will be ensuring that our Web site conforms more fully with accessibility guidelines.

Please document areas in which your project may not implement best practices or in which a compromise solution is proposed. In this section you must (a) describe the areas in which best practices will not be achieved; (b) explain why best practices will not be achieved; (c) describe your migration strategies in case of problems and (d) how the migration may be funded.

(a) Area in which best practices will not be achieved: An externally-hosted service (company X at Web site http://www.xxx.com/) is proposed for providing realtime access to usage statistics.

(b) Explain why best practices will not be achieved: This is a risk associated with use of externally-hosted services (especially those which provide services for free): the service may go out of business; the service may introduce charging; etc. A worst case scenario is that the service goes out of business and its domain name is taken over by a porn company. A small pornographic icon could then be included on our Web site!

The best practice solution would be to provide analysis of usage statistics locally. This could be achieved by using a Web analysis package (e.g. Web Trends at <http://www.webtrends.com/>). There is a cost associated with purchasing the software and resource implications in using it (rotating log files, managing large files, etc.). We do intend to analyse our own Web server log files. However this will be for internal use and will not provide (a) realtime access and (b) detailed statistics such as browser functionality.

The use of an externally-hosted usage service: (a) is cheap (free); (b) requires no special technical expertise (HTML code has to be added to our pages); (c) requires no software or hardware to be installed and maintained locally and (d) provides usage statistics on browser and client machine functionality which are not provided by analysing our server log files.

This approach has the following disadvantages: (a) reliance on a third party, with which we have no formal contractual arrangements; (b) it only provides usage statistics for graphical browsers; (c) it does not allow statistics to be easily reused (unless the licensed version of the service is purchased).

The company we intend to use has been in business since 19xx. We have been in email contact with them and they have reassured us of the financial reliability of the company. They have agreed that if they change the conditions of their service they will give us at least one month's notice.

The links to the externally hosted service will be managed within our Content Management System (or through use of Server-Side Includes). This will enable links to the service to be switched off by editing a single file.
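The idea of a single switch can be sketched as follows. This is an illustration only: the statsService object, the statsSnippet function and the counter image path are hypothetical, and http://www.xxx.com/ is the placeholder company used in this example report.

```javascript
// Single place where the externally hosted statistics service is configured.
// The counter image path is hypothetical; www.xxx.com is the placeholder
// company name used in the example report.
var statsService = {
  enabled: true,
  counterUrl: "http://www.xxx.com/counter.gif"
};

// Returns the HTML fragment that every page template would include.
// Setting `enabled` to false removes the link from all pages at once.
function statsSnippet(config) {
  if (!config.enabled) {
    return "";
  }
  return '<img src="' + config.counterUrl + '" alt="" width="1" height="1">';
}
```

With Server-Side Includes the same effect would come from keeping the fragment in one included file and emptying that file to switch the service off.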

Access to realtime usage statistics is a value-added service. Our project will continue to provide its deliverables if this service becomes unavailable.

We would lose usage data which is held by the company. However (a) we could purchase a licence for this service which would allow us to access our data and import it to a spreadsheet and (b) we still have usage log files held on our Web server.

(e) Describe how the migration may be funded. If we feel that we still require realtime usage statistics we will probably want this across a range of our organisational Web sites. We would therefore purchase a licensed package such as Web Trends Live (see <http://www.webtrends.com/products/wtl/>).

The use of client-side scripting (including javascript and DHTML techniques) is acceptable; however, please take note of the following points:

1) The site must still be accessible to browsers which are not scriptable. Use <noscript> tags and "sniffer" routines to determine the client capabilities and provide content-equivalent pages to non-scriptable browsers.

2) Thoroughly test your pages for functionality under a wide range of browser / platform configurations.

There Is More Than One Javascript...
Javascript is supported in various flavours on browsers according to version and manufacturer. "Javascript" is the name given to the scripting language developed by Netscape and has also come to mean the generic client-side scripting language. JScript is the Microsoft equivalent. ECMAScript is the published open standard, to which proprietary flavours of the language _should_ adhere. Note however that ECMAScript defines the underlying structure of the language, but not how scripts interact with the page, which is addressed by the Document Object Model (DOM). The DOM too has a standardised structure, defined by the W3C.

Internet Explorer also supports VBScript, based on Visual Basic. This will not work in other browsers.

USE "language" Attributes in <script> Tags:
The version of javascript can be specified in the SCRIPT tag. This can be useful for branching code, particularly when used in conjunction with the "src" attribute.

DHTML: What Is It?:
The term DHTML (Dynamic HTML) is a marketing term which denotes the use of a client-side scripting language (normally JavaScript/ECMAScript) to manipulate HTML and CSS properties.

Use of DHTML is compatible with the open standards developed by W3C, and so its use is OK from a standards point of view.

<noscript> Tags:
Use <noscript> tags where appropriate to deliver equivalent content to browsers which do not support javascript or where script processing has been disabled.

Accessibility:
Regardless of how javascript is used on the page, the overriding principle for NOF projects is that the page must still be accessible to non-javascript enabled browsers.

Future Proofing:
As new versions of browsers are released and new ways of accessing web resources come into existence there is a need for projects to test their content against these new interfaces. This is true for all web content, but the situation is more acute when pages include script. Projects should build into their plans a provision for undertaking these checks and for making corrections to HTML and script as necessary.

Script Block Content Shielding:
Comment out the contents of <script> tags so that browsers which do not understand the <script> tag do not render the script. Wrapping the script text in HTML comment tags prevents these browsers from displaying raw javascript on the page. Admittedly this is highly unlikely now; however, it is good defensive programming practice.

Browser Detection Aka Sniffing:
It is better to test for the component of the DOM that you wish to use rather than parsing the navigator.userAgent property. For example, to create a rollover script for images, you would code:
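A minimal sketch of such a test follows. It assumes that the document.images collection is the DOM component being probed. The function names supportsImageSwap and swapImage are hypothetical, and the doc parameter stands in for the browser's document object so that the logic can also be exercised outside a browser.

```javascript
// Feature detection: probe for the DOM component itself instead of
// parsing navigator.userAgent.
// `doc` stands in for the browser's `document` object (an assumption made
// so the functions can also be run outside a browser).
function supportsImageSwap(doc) {
  return Boolean(doc && doc.images);
}

// Swap the source of a named image, doing nothing when the image
// collection or the named image is unavailable.
// Returns true when the swap happened, false otherwise.
function swapImage(doc, imageName, newSrc) {
  if (!supportsImageSwap(doc)) {
    return false; // this client does not expose the images collection
  }
  var image = doc.images[imageName];
  if (!image) {
    return false; // no image with that name on this page
  }
  image.src = newSrc;
  return true;
}
```

In a page this might be called from an onmouseover handler, for example swapImage(document, 'logo', 'logo-on.gif'), with a matching onmouseout call to restore the original (the image name and file names here are made up). Browsers without document.images simply keep the original image.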

Proportion Of Javascript-Enabled Browsers:
Figures for the numbers of javascript enabled browsers are difficult to interpret. These figures can be derived in two ways: by making assumptions based on the userAgent or by placing a testing script on a web page and recording results. Measuring from the userAgent value can be done from the server access log but it takes no account of the setting of the browser, so the assumption that, say IE5.5 is javascript enabled is not always true. The nimda virus outbreak last year highlights this. Many people were advised to turn off scripting in their internet settings to protect themselves from nimda.

10. What exactly constitutes an acceptable minimum level of browser for support purposes?

The Technical Standards currently state "Web services must be accessible to a wide range of browsers and hardware devices (e.g. Personal Digital Assistants (PDAs) as well as PCs). Web services must be usable by browsers that support W3C recommendations such as HTML, Cascading Style Sheets (CSS) and the Document Object Model (DOM)."

We have deliberately left this fairly open and avoided listing specific browser types and specs because we would like projects to try to make their sites/resources as accessible as is possible and this may differ from project to project. As a guideline I would recommend that you make sure that you support at the very least Netscape/IE 4.x upwards. I would also recommend that your CMS supplier has a look at the WAI site where a good list of alternative browsing methods is available. The resources available from there will allow you to see if your site works well with screen-readers, voice browsers etc.

This is an important part of setting up your Web site and it is probably useful to read some background information before you start any registering.

Registry: the organisation responsible for administering a top-level domain, for example Nominet for .co.uk names.

Agent: an affiliate or partner of the registry that accepts requests for domain names and administers ownership of a domain, also known as a registrar.

Domains are divided into TLDs (top-level domains) and ccTLDs (country code top-level domains). There is an administrative body - the "registry" - to oversee the TLDs, and each ccTLD. Terms and conditions vary amongst them.

ICANN (the Internet Corporation for Assigned Names and Numbers, http://www.icann.org/) administers the TLDs .com, .net, .org etc., and DNS in general.

It devolves authority for naming to the Internet Assigned Numbers Authority (IANA).

In the UK, .uk is administered by Nominet (http://www.nic.uk/). You can register a .co.uk site directly with Nominet or register through one of their agents - aka Nominet members. Direct registration is around £80+VAT for a two-year period. Nominet members receive a huge bulk discount - £5+VAT for 2 years.

General points and guidelines concerning domain registration and administration

To register a domain name it is necessary to approach an "agent" (aka member, registrar) of the registry that controls its TLD. Many "agents" can exist for each registry, with ICANN and its partners ensuring, behind the scenes, that the namespace remains unique - no duplicate names are allowed.

The registry is responsible for setting up your domain name on the Domain Name System (DNS). DNS generally describes the way that (domain) names are mapped to IP numbers. The DNS system has at its highest level in the namespace hierarchy a system of "root" servers. This is where a request for a domain name resolution is initiated. The root servers will give the location of the primary and secondary name servers that are "authoritative" for your domain. It is this link, from the root server to the name server, that the agent maintains and alone can modify.

For top level domains: the domain registration has 3 separate contact fieldsets. These are the "technical", "administrative" and "billing" contacts, alongside the owning individual or organisation's details. The same details can appear in all three contact types.

The content of a domain registration is different amongst ccTLDs. However they all have in common the owner (registrant) details and the location of the nameserver.

Requests to modify the nameservers for a particular domain are generally permitted only by the registrant (or owner) of the domain.

Transfer of a domain should be initiated by a request to the agent by the registrant. If the agent has gone bust or is causing trouble, you can approach the registry itself. The registry will not release the domain if you have signed a contract with the agent requiring you to pay a release fee.

The nameserver can be anywhere on the net. It is simply a machine that handles domain name lookups, translating them to the numeric, lower-level IP address necessary for transport over TCP/IP. The nameserver holds a "zone file" for the domain, and this file contains the mappings for various uses of the domain - the location of the web or mail server, amongst others.

Only the administrators of the nameserver can (directly) make changes to the zone file. They will generally only respond to the person identified as the technical contact for the domain, although in some cases these changes can be made online by the user, identifying themselves with a password.

You can often get nameserver services from the registry you buy domains from. This can be handy, as it means you will be able to make all your adjustments, including zone-file settings, through one point of contact. Easyspace.co.uk, for example, have a great web-based account administration allowing you to easily alter DNS details, change nameservers, buy new domains and obtain other related services.

Web-based management systems are very common with agents and will give you a lot of control over the administration of the names you control. Other places can have their own tedious in-house procedures of faxing or mailing company letter-headed paper to make a change.

Most agents will automatically inform you when your registration is due to expire, giving you the option of letting it lapse or automatically renewing it.

Take care when completing the application with your personal details. The details of the owners and contacts for a domain are publicly viewable, so make sure you use suitable names, addresses and email addresses. Make sure you keep these details up to date.

You can make a "WHOIS" search at most agents; this will allow you to look up the registration details for a domain. If you search with an agent who is not responsible for the domain, you may receive a reduced set of information. You will then have to locate the site of the agent (registrar) for the domain and make the WHOIS search again.

Generally the advice above applies to the TLDs and the ".uk" group of domains. If you are using other ccTLDs be sure to check carefully their terms and conditions. ".uk" is unusual amongst ccTLDs in that it allows registrations from non-UK residents, for example, and many other differences may exist between other ccTLDs.

Unfortunately NOF are unable to give you any direct recommendations of agents; however, we do recommend that you take some time to read around the Nominet (nic.uk) and ICANN sites to give yourself more background information. It is a murky area of the internet and many companies have pursued this to wring out extra money from their registrants: be sure to read the terms and conditions carefully and consider the advice given above.

Each domain you register will incur an ongoing cost to your organisation for renewals, redirection and aliasing of the different domains. The purpose of multiple registrations is to protect your name, to cover misspellings of your name and to prevent competitors from bagging domains that might siphon off legitimate users of your site. This is more of a concern for commercial organisations. However, having multiple domains makes your system administration more complex. Suppose your main domain is www.abc.com, and that site has a page called www.abc.com/search/. What will you do with a second domain www.abcd.com? Will www.abcd.com/search be the same as www.abc.com/search, or will your users be redirected to the main domain first? This can cause problems: you have to decide on a coherent strategy and you should discuss the options carefully with the systems administration people to avoid problems down the road. On the whole, people will reach your site either through a link, in which case the domain is, to the user, mostly irrelevant, or through printed publicity they receive, over which you have control!
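One possible strategy, redirecting every secondary domain to the same path on the main domain, can be sketched as follows. This is an illustration rather than a recommendation: the host names are the made-up ones from the example above, and the function name redirectTarget is hypothetical.

```javascript
// Hypothetical domain names: www.abc.com is the main site and
// www.abcd.com a secondary registration, as in the example in the text.
var CANONICAL_HOST = "www.abc.com";
var ALIAS_HOSTS = ["abc.com", "www.abcd.com", "abcd.com"];

// Decide what to do with a request for `host` + `path`.
// Returns the URL to redirect to, or null when the request is already
// on the main domain (or on a host that is not one of our aliases).
function redirectTarget(host, path) {
  var normalisedHost = String(host).toLowerCase();
  if (ALIAS_HOSTS.indexOf(normalisedHost) === -1) {
    return null;
  }
  // Keep the path so that www.abcd.com/search/ lands on www.abc.com/search/.
  return "http://" + CANONICAL_HOST + path;
}
```

Keeping the path in the redirect means bookmarks and links made against the secondary domain still reach the equivalent page, while every page has only one address that users will see and cite.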

Must I have a server set up and ready before I register a name or can I do it anytime?

Anytime; you will just need to link the two together when your server is ready. This involves configuring the webserver so that it knows about the domains it is to serve, and changing the pointer (to the IP address of the web server) in the name server. For this reason, it is great to have control over your name server: either host it yourself, or use an organisation that gives you your own web-based administration of the nameserver configuration for your domains.

Often, the registrar will also host the name server for the domain; this is fine as long as you are still able to make changes to the config - ideally online.

Domains that you do register that do not have a live webserver ready for them should have a relevant holding page giving info about your project, a contact perhaps and if possible a means to capture email addresses to build up a user base asap. Again, this should guide your choice of registrar as they should give you a couple of free, editable pages that you can use as holding pages. Alternatively, set up a temporary web server for these purposes.

Don't forget that changes to name server settings can take up to 48 hours to be distributed around the net: it isn't an instantaneous change over the whole web. When a computer requests a DNS lookup, in order to find the IP address to which a request for a URL should be sent, it will use its local nameserver, and that server will keep a cached copy of the DNS info for that domain for a period known as the "time to live". Only once this has expired will the nameserver go back to the authoritative nameserver to refresh its local copy of the record. (This keeps the traffic load on the net down, basically.)
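The caching behaviour described above can be illustrated with a toy model. This is a sketch of the idea only, not how any real nameserver is implemented: createCachingResolver, lookupAuthoritative and now are hypothetical names, and the authoritative lookup and the clock are passed in so that the behaviour can be observed.

```javascript
// A toy caching resolver. `lookupAuthoritative` and `now` are injected
// (hypothetical) functions: the first returns { address, ttlSeconds } for
// a name, the second returns the current time in seconds.
function createCachingResolver(lookupAuthoritative, now) {
  var cache = Object.create(null);
  return function resolve(name) {
    var entry = cache[name];
    if (entry && now() < entry.expiresAt) {
      // Still within the time to live: serve the cached copy,
      // even if the authoritative record has since changed.
      return entry.address;
    }
    // Cache miss or expired entry: go back to the authoritative server.
    var record = lookupAuthoritative(name);
    cache[name] = {
      address: record.address,
      expiresAt: now() + record.ttlSeconds
    };
    return record.address;
  };
}
```

If the authoritative record changes while a cached copy is still live, the resolver keeps returning the old address until the time to live runs out, which is why a change of web server address is not seen everywhere at once.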

There are no rules about the suffix (.com .org .co etc.) that you use for your NOF site but again as expense is an important factor it makes sense to use the most appropriate which is probably org.uk or co.uk. Some more information on the domain name you use is available on the NOF site at http://www.ukoln.ac.uk/nof/support/help/papers/website.htm#Domain.

12. To what degree must we provide support for web browsers, particularly where certain browsers, such as Netscape version 4 series, cause real difficulties when used on our website?

Netscape 4.x browsers, i.e. browsers in Netscape series 4, (e.g. Netscape 4.07, 4.08, 4.71, etc.) have very poor support for CSS and we are aware of the difficulties this can pose. However just as the difficulties posed by Netscape 4.x differ amongst those projects affected, so do their potential responses to the problem; hence there is no definitive one-size-fits-all answer.

13. Can you detail some simple checks one can perform to ensure that my website is compliant with the technical standards?

There are two issues which we have noticed on a fairly regular basis and which usually require attention:

We would like to remind all projects that it is important that the character encoding used in a text document delivered over the web is clearly identified. Best practice is to identify the character encoding in use BOTH in the HTTP header and within the <head> section of the document.

Where XML/XHTML is used, it is also good practice to identify the encoding in the "<?xml ... ?>" declaration at the start of the document.

Whilst the Technical Standard and Guidelines document currently mandates the use of UTF-8 encoding (section 2.1.2), this is under revision and it is permissible to use any appropriate encoding (e.g. ISO-8859-1), as long as this is explicitly stated as discussed above. The TS&G will be updated at the end of this year to reflect this change, and at that time section B of the quarterly monitoring reports will also be modified. In the meantime, please indicate the actual encoding your project is using when filling out your quarterly progress report.

We would also like to point out that it is necessary to include a DOCTYPE declaration at the top of all HTML or XHTML pages. This is necessary to allow your mark-up to be correctly validated, and perhaps more importantly can affect the way the user agent (browser) interprets the mark-up it finds within the page. Therefore, a DOCTYPE statement correctly identifying the version of (X)HTML that you are using on the page will improve the chances of your page rendering correctly across different browsers and clients. Please ensure that you include the DOCTYPE declaration on all your pages.
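Both checks can be roughly automated. The sketch below is a simplification and not a substitute for a validator: the function name checkPage is hypothetical, the caller is assumed to supply the value of the HTTP Content-Type header and the page source, and the regular expressions only look for the presence of the declarations rather than checking that they are correct.

```javascript
// Report which of the checks a page passes.
// `contentTypeHeader` is the value of the HTTP Content-Type header and
// `html` is the page source; both are supplied by the caller.
function checkPage(contentTypeHeader, html) {
  // Is a charset named in the HTTP header?
  var headerHasCharset = /charset\s*=\s*[\w-]+/i.test(contentTypeHeader || "");
  // Is there a <meta ...> tag that names a charset?
  var metaHasCharset = /<meta[^>]+charset\s*=\s*["']?[\w-]+/i.test(html);
  // The DOCTYPE must come first, allowing only whitespace or an
  // XML declaration before it.
  var hasDoctype = /^\s*(<\?xml[^>]*\?>\s*)?<!DOCTYPE\s+html/i.test(html);
  return {
    headerHasCharset: headerHasCharset,
    metaHasCharset: metaHasCharset,
    hasDoctype: hasDoctype
  };
}
```

A page that passes all three checks should still be run through an HTML validator, since only the validator can confirm that the mark-up matches the declared DOCTYPE.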

IPR and copyright is a very complex area and unfortunately there is no "one-size-fits-all" solution to these issues. Every resource or collection of resources may have its own IPR problems that will need to be solved before a digitisation project can go ahead. However, as it is an issue of such importance when working in a networked environment, a number of excellent resources have been produced to guide you through the process of clearing resources for use.

2. The technical standards document talks about certain institutions having access to additional resources by signing a licence committing them to non-commercial use (section 3.1.5). Who would be parties to these licences? Would these licences involve an exchange of money and if so between whom? Would the copyright owner be entitled to a fee for reproduction in the same way as with non-digital reproduction?

The principle is that the end user will be provided with access free at the point of use but that two issues have to be covered:

There is a variety of models of course but a proven model involves using contracts i.e. issuing a licence.

Let us define three parties. The contributor is the body which owns the IPR in the resource. The Service Provider is the body which stores and makes available the resource. The User Institution is the body which accesses the Service Provider under licence.

The Contributor and the Service Provider could be the same body. However, if that is not the case, there needs to be a licence setting out the conditions under which the Service Provider may make the material available.

There may also be a payment from the Service Provider to the Contributor in respect of the IPR to allow for non-profit, non-proliferation educational use from then on.

The User Institution will be licensed to use the resources by the Service Provider and may pay an annual fee to allow that access. The User Institution agrees in the licence to certain conditions - normally non-profit, non-proliferation use.

The User Institution then allows its user group - students, those visiting a library or museum - to access the resources.

It is expected that so long as the use was purely for non-profit, non-proliferation educational purposes, the Service Provider would not make a further IPR payment, the licence fee it charges being merely there to sustain the service.

4. Could you explain NOF's attitude to copyright ownership of bespoke software, custom databases and schemas developed for NOF-digi projects?

An explanation of NOF's IPR conditions and of issues around Open Source systems follows.

1. The NOF IPR conditions specify (page 4 under Definitions) that 'Material means any documentation or material (including without limitation software and databases) to be provided to the Fund etc..' This is further explained in the guidance letter (13 August 02 under 'Definitions' and under '2.2 Licences' on page 4) ...'in the case of material that is delivered with software or databases specially written for this project (including any adaptation of commercially available software or databases) The Fund would expect that any commercial exploitation would recognise the use of public funds in the generation of the material. Significant commercial exploitation might involve grant repayment.'

2. The IPR conditions give the Fund the right to use the materials developed for the programme but do not provide for any 'transfer' of IPR. This is an important difference. The Fund will not 'own' the IPR to materials created (including software) but through the conditions does have rights over the commercial exploitation of the material, as explained above.

3. If any grant holder is unclear on this point or has any query regarding the terms and conditions of NOF grants, please contact NOF directly, either through your case manager or by email at digitisation@nof.org.uk.

4. If you are a supplier contracted to a grant-holder please address your queries to the grant-holder who will raise them with NOF where necessary.

5. Please note that neither the IPR conditions nor the Technical Standards conditions require a commitment to open source software, but we welcome that debate on the nof-jiscmail list as it raises awareness of an important issue.

Summary

This information is not tendered as advice but does indicate which projects should most likely take note of regulations as they might apply to their situation. It also provides details on sources of information of possible use to projects.

Requirements

If either of the above applies, then there are requirements which must be met in respect of three possible areas:

1 Information requirements, i.e. the information that must be provided to end-users. These requirements include providing your end users with:

The above requirements will probably apply to you if you sell or advertise goods or services online (i.e. via the Internet, interactive television or mobile telephone).

2 Commercial communications, i.e. essential identifications and explanations that must be provided to end-users, for example if a project markets via email. These requirements include providing your end users with:

Note therefore that any form of electronic communication designed to promote your goods, services or image, such as an e-mail advertising your goods or services, must:

The above requirements will probably apply to you if you promote goods or services through any form of electronic communication (e.g. an e-mail advertising your goods or services).

3 Electronic contracting, i.e. information and explanations about the process of creating a contract electronically with an end-user. These requirements include providing your end users with:

The above requirements will probably apply to you if you enable end users to place orders online.

The requirements contained in the three categories above represent the basic situation. There may be other requirements in addition which can be ascertained from the sources of information given below.
Some exceptions do apply.

In conclusion the DTI guidance states:
"Action you may need to take:
If the Regulations apply to you, you may need to make textual or structural changes to the medium you use to advertise or sell your goods or services online, e.g. your website, in order to comply with the new requirements."

Sources of Information

The Electronic Commerce Directive (00/31/EC) & The Electronic Commerce (EC Directive) Regulations 2002 (SI 2002 No. 2013)
HTML source on DTI website.

Frequently Asked Questions on The Electronic Commerce (EC Directive) Regulations 2002
HTML source on DTI website

Contacts

1]
Paul Redwin
Bay 202
Department of Trade and Industry
151 Buckingham Palace Road
London SW1W 9SS
Telephone: +44 (0)20 7215 1853
Fax: +44 (0)20 7215 4161
Email: ecom@dti.gsi.gov.uk

2]
Trading Standards Offices

You will find the address and telephone number of your local Trading Standards Department for England, Scotland or Wales in the telephone book under "Local Authority" or on the Internet by visiting http://www.tradingstandards.gov.uk/ and entering your postcode.

The address for Northern Ireland is:
Trading Standards Service
Department of Enterprise, Trade and Investment
176 Newtownbreda Road
Belfast BT8 6QS
Tel: (028) 9025 3900
Fax: (028) 9025 3953
Email: tss@detini.gov.uk

3]
Office of Fair Trading
You can contact the Office of Fair Trading through its website, http://www.oft.gov.uk/ or at:
Office of Fair Trading
Fleetbank House
2-6 Salisbury Square
London EC4Y 8JX
Tel: (020) 7211 8000
Fax: (020) 7211 8800
Email: enquiries@oft.gov.uk

4]
Her Majesty's Stationery Office

To obtain copies of relevant Acts of Parliament and Statutory Instruments, you should contact Her Majesty's Stationery Office (HMSO) at their website address:
http://www.hmso.gov.uk
or phone HMSO's Regulations Unit on 020 7276 5216.

1. I am trying to find information on whether there is any OCR software that can cope reliably with 17th-19th century printed material, including material in columns. I would also like pointers to information on how existing OCR software would cope with 19th-century newspapers.

Although we do not have very much experience of individual products, most OCR software would still have problems recognising these types of text. Even apart from the likelihood of non-standard typefaces and awkward columns, most OCR software might have problems with background noise (e.g. print bleed-through or foxing) and non-standard characters. It's probably worth testing OCR software before rejecting it, as the main alternatives would be re-keying the whole text (horribly expensive) or just digital imaging.

The AHDS/OTA's guide to Creating and documenting electronic texts is worth a look:

http://ota.ahds.ac.uk/documents/creating/

Some other recent opinions:

From: Alan Morrison, Michael Popham and Karen Wikander, Creating and documenting electronic texts: a guide to good practice. Oxford: Oxbow Books, 2000 (forthcoming):

" ... the first thing you must consider if you decide to use OCR for the text source is the condition of the document to be scanned. If the characters in the text are not fully formed or there are instances of broken type or damaged plates, the software will have a difficult time reading the material. The implications of this are that late 19th and 20th-century texts have a much better chance of being read well by the scanning software. As you move further away from the present, with the differences in printing, the OCR becomes much less dependable. The changes in paper, moving from a bleached white to a yellowed, sometimes foxed, background creates noise that the software must sift through. Then the font differences wreak havoc on the recognition capabilities. The gothic and exotic type found in the hand-press period contrasts markedly with the computer-set texts of the late 20th century. It is critical that you anticipate type problems when dealing with texts that have such forms as long esses, sloping descenders, and ligatures. Taking sample scans with the source materials will help pinpoint some of these digitizing issues early on in the project.

http://ota.ahds.ac.uk/documents/creating/chap3.html

From: Steven Killings, 'Optical Character Recognition.' Connect, Spring 1999:

"In the final analysis, when you hear 98 percent accuracy rates quoted for OCR software packages, consider that these were most likely accomplished using laser printed business documents, where the degree of variation among characters is significantly small and where the orientations of characters is fixed and regular. An OCR operation on an average nineteenth century imprint will almost certainly be completed with less exactness.

http://www.nyu.edu/acf/connect/spring99/HumOCRSp99.html

Older PC Magazine article on OCR Software (20 January 1998):

http://www.zdnet.com/pcmag/features/ocr/_intro.htm

2. What is the situation regarding servers? Supplying video, for example, to many institutions simultaneously places huge demands upon UK Internet infrastructure as well as servers. Should we be looking to host our servers at a high capacity Super Janet node or will other provisions be made?

At this stage the NOF are not going to provide central servers on high speed networks, and the like. It will be down to the project to make arrangements to have their content connected to the Internet at speeds sufficient to deliver it to users in a useful fashion. Thus, a project delivering high bandwidth video will probably need a more robust (and faster) connection to the Net than one delivering small static images. The extra costs of this connection will need to be laid out - and justified - in the business plan.

Connection via one of the bigger SuperJANET nodes is one possibility that projects with HE partners might pursue, provided their use falls within JANET's Acceptable Use Guidelines http://www.ja.net/documents/use.html

The term Content Management System (CMS) is usually used to describe a database which organises and provides access to digital assets, from text and images to digital graphics, animation, sound and video. This type of product is relatively new and there are a few CMS products available as off-the-shelf packages. CMS products range from very basic databases to sophisticated tailor-made applications and can be used to carry out a wide range of tasks, such as holding digital content, holding information about digital content, publishing online and publishing on-the-fly.

The CMS provides mechanisms to support asset management, internal and external linking, validation, access control and other functionality. Typically, a CMS is built on an underlying database technology.

Content Management Systems range from very basic databases, to sophisticated tailor-made applications. They facilitate easier tracking of different parts of a Web site, enabling, for example, staff to easily see where changes have been made recently and - perhaps - where they might need to make changes (a 'News' page that hasn't been edited for 6 months?). They also ease the handling of routine updating/modifying of pages, where you want to change a logo or text on every page, for example.

A CMS can also simplify internal workflow processes and can ensure that you are working with a single master copy of each digital asset.

However there are other approaches which may be useable, such as making use of server-side scripting to manage resources.

Use of a dedicated CMS system. Note this may be expensive, and there may be costs in learning the system, using it, etc. In addition you should ensure that an 'off-the-shelf' CMS product supports the metadata standards one might expect to use.

Use of an open source CMS. This avoids licence costs, but there are still resource issues.

Use of a database. May manage the resources but will it address issues such as workflow?

Use of server-side scripting approaches, such as PHP (Unix) and ASP (NT). These may allow bespoke applications to be developed, and may sit on top of databases.

To summarise then, the issue to be aware of is the difficulty of maintaining resources in formats such as HTML. Using flat files and a CMS and/or database is a way of addressing this management issue. Whilst it is not an explicit requirement that projects manage their resources with a CMS and/or a database, if such tools are not used, the project must show how it intends to facilitate good management of its digital assets.

Suitable resolutions for digital master files for various media types are discussed in the HEDS Matrix [3], and the JIDI Feasibility Study [7] contains a useful table of baseline standards of minimum values of resolutions according to original material type.

A detailed discussion of resolution, binary and bit depth can be found on TASI's Web pages [8] and a good basic guide to colour capture can also be found on the EPIcentre Web pages [9].

MS Access was designed as a database system for small scale office use. It was not designed for use as a database server, although it can be used in this mode for simple use.

Although Access may be capable of handling the sorts of query volume you suggest, at least in the short term, you do need to consider scalability (SQLServer scales better), Web site integration (SQLServer *probably* integrates better), enterprise access to the data (SQLServer will better enable intranet access to the data, etc).

6. Would NOF recommend the web site to be hosted on its own dedicated server or on a shared server? What are the things I should bear in mind to take such a decision?

The issues to be considered in making this decision relate to performance, security and potential conflicts between software applications:

Performance - with a shared server, projects will need to ensure that the performance of their service is not impacted by the other things the server is doing. Peak access times (of both the project's service and the other services on the shared machine) need to be considered. Projects need to think about the performance of the server itself, as well as available network bandwidth to the machine.

Security - as a general rule, the more services offered by a machine, the harder it is to make that machine secure. Projects should ensure that any machine they use is operated in a secure manner.

Conflicts - on a shared server there are more likely to be software conflicts, e.g. to run package X, package Y needs to be installed, but this conflicts with package Z that is already installed for some other service. There are also issues associated with hosting more than one Web server on a single machine. Typically these are resolved by hosting multiple 'named virtual hosts' (though under some operating systems it is also possible to assign multiple 'virtual' IP addresses to a single network interface, or to install multiple network interfaces). Where 'named virtual hosts' are used it should be noted that the browser must support HTTP 1.1. However, this is not a significant problem. The Apache manual says:

"The main disadvantage [of name-based virtual hosts] is that the client must support this part of the protocol. Almost all browsers do, but there are still tiny numbers of very old browsers in use which do not."

7. Should the web server be protected by a firewall? (Or is it enough to have a firewall installed on our office network server where we will store all our digital mastercopies?)

"Machines should be placed behind a firewall if possible, with access to the Internet only on those ports that are required for the project being delivered."

This applies to all machines used to deliver the project. Projects are strongly encouraged to protect all machines used to deliver material (Web servers and back-end master storage servers) with a firewall.

8. Should we require back ups from the web hosting organisation? (We will have all back ups of mastercopies on our own network server in the office.)

Projects will need to be able to recover their Web service in the event of server failure, disk failure, or malicious hacking. Backups therefore need to be taken of all files that need to be restored to recover a service. I would anticipate that, in most cases, this means that projects will need to take backups of more than just master copies.

It is not possible to give a single answer to this kind of question. The issue is ensuring sufficient bandwidth, given anticipated levels of traffic. Traffic levels will depend on numbers of users and the kind of material being accessed. A project anticipating 10 concurrent users accessing text-based material will have significantly lower bandwidth requirements than a project anticipating 100 concurrent users accessing streaming video.

Access performance should be 'reasonable' for all resources served by the project, but it is difficult to provide guidance currently on what 'reasonable' means. Available bandwidth at the server end of less than 56K for any individual end-user is likely to have an impact on their perception of server performance. Image or video projects will probably want to aim much higher than this. Available bandwidth is total bandwidth divided by total numbers of users (but remember to allow for bandwidth being used for other things - e.g. some public library/council networks will have bandwidth reserved for CCTV, administrative computing and so on).
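
The rule of thumb above (total bandwidth, less anything reserved for other uses, divided by the number of users) can be sketched in a few lines of Python. This is only an illustration: the function name and the link and reservation figures are hypothetical, and real traffic is rarely shared out this evenly.

```python
def per_user_kbps(total_kbps: float, reserved_kbps: float, concurrent_users: int) -> float:
    """Bandwidth available to each user once reserved capacity is set aside."""
    if concurrent_users <= 0:
        raise ValueError("concurrent_users must be positive")
    # Never report negative capacity if the reservation exceeds the link.
    usable = max(total_kbps - reserved_kbps, 0.0)
    return usable / concurrent_users

# Hypothetical figures: a 2048 kbps link with 512 kbps reserved for other uses.
text_project = per_user_kbps(2048, 512, 10)    # 153.6 kbps per user
video_project = per_user_kbps(2048, 512, 100)  # 15.36 kbps per user, well under 56K
```

With these assumed figures the 100-user case falls well below the 56K threshold mentioned above, which is the kind of result that would need addressing in the business plan.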

10. What kinds of software are being used across the NOF-digitise programme?

11. What are the hardware and software issues involved in creating digital material for community languages?

The NOF technical standards and guidelines require that all text is encoded in a way that makes it compatible with Unicode UTF-8. This allows for the simultaneous use of languages that deploy different (e.g. Roman and non-Roman) character sets, including many of the community languages being used by NOF-digi projects. Project managers need to be aware of what hardware / software is required in order to use Unicode. Basic information on Unicode is available from
http://www.unicode.org/unicode/standard/WhatIsUnicode.html
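
As a small illustration of what UTF-8 encoding means in practice, the following Python sketch encodes text in Roman and non-Roman scripts to UTF-8 and checks that it decodes back unchanged. The sample words and the helper name are illustrative assumptions, not part of the NOF guidelines.

```python
# Sample strings in Roman and non-Roman scripts (illustrative only).
samples = {
    "English": "Library",
    "Welsh": "Llyfrgell",
    "Chinese": "图书馆",
}

def utf8_round_trip(text: str) -> bytes:
    """Encode text as UTF-8 and confirm it decodes back unchanged."""
    encoded = text.encode("utf-8")
    if encoded.decode("utf-8") != text:
        raise ValueError("text did not survive the UTF-8 round trip")
    return encoded

# One byte string per language; non-Roman characters take several bytes each.
encoded_samples = {lang: utf8_round_trip(text) for lang, text in samples.items()}
```

Plain Roman text is stored one byte per character, while each of the Chinese characters here occupies three bytes; all of them can sit side by side in the same file.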

Windows 2000 and XP currently both support Unicode, whereas earlier versions do not. However, *applications* running on the earlier Windows operating systems can still support Unicode.

Some web browsers are better than others at reading community language scripts but any browser which claims to support HTML4 should be able to support Unicode. Overall, Mozilla, the open-source browser, is the preferred choice, followed by Netscape Navigator, then Internet Explorer. See http://www.alanwood.net/unicode/browsers.html for more details on how browsers need to be configured to read Unicode.

It is necessary to obtain a Unicode font in order to display the different character sets. For a list of all the different Unicode fonts, Alan Wood's site is again a good source of information (see http://www.alanwood.net/unicode/fonts.html). Often a Unicode font comes embedded within particular applications. Many PCs have Microsoft's Arial Unicode font installed along with their copy of Microsoft Office 2000. Those without Office 2000 used to be able to download this font for free, but the font was removed from the Microsoft website in August 2002, leaving no suitable free Unicode font in existence. For a further discussion on this see http://slashdot.org/comments.pl?sid=38224&cid=4092943.

Many developers employing Unicode, however, prefer to use one of two software packages which act as multi-character set text editors. These come with their own rudimentary Unicode fonts. Unipad, available for free, can be downloaded from http://www.unipad.org/, with versions for Windows 95 and above. A trial version of Uniedit, which should run on Windows 3.1 and above, is available from http://www.humancomp.org/uniintro.htm. Both programs cater for built-in keyboards and a wide variety of character sets - although some of these sets require further downloading from the relevant websites.

1. Does anyone have any thoughts on the use of file formats such as Flash or SVG in projects? There is no mention of their use in the technical specifications so I wondered whether their suitability or otherwise had been considered.

The general advice is that where the job can be done effectively using non-proprietary solutions, and avoiding plug-ins, this should be done. If there is a compelling case for making use of proprietary formats or formats that require the user to have a plug-in then that case can be made in the business plan, provided this case does not contradict any of the MUST requirements of the NOF technical guidelines document.

Flash is a proprietary solution, which is owned by Macromedia. As with any proprietary solution there are dangers in adopting it: there is no guarantee that readers will remain free in the long term, readers (and authoring tools) may only be available on popular platforms, the future of the format would be uncertain if the company went out of business, was taken over, etc.

You should also note, for example, that indexing software, etc. often cannot index proprietary formats, so it can act as a barrier to resource discovery. Also there may be accessibility considerations, to users using old or specialist browsers.

SVG (Scalable Vector Graphics) is W3C's proposal for an open format in this area. SVG is an XML application, and technologies such as XSLT should allow structured information in XML format to be transformed into SVG images.
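
To show what transforming structured XML into an SVG image can look like, here is a minimal Python sketch. It uses the standard library's ElementTree rather than XSLT, purely to keep the example self-contained; the source data, its element and attribute names, and the `xml_to_svg` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical source data; the element and attribute names are made up.
SOURCE_XML = """<visits>
  <month name="Jan" count="30"/>
  <month name="Feb" count="60"/>
  <month name="Mar" count="90"/>
</visits>"""

def xml_to_svg(source: str, bar_width: int = 20, height: int = 100) -> str:
    """Render each <month> element as one bar in a simple SVG chart."""
    months = ET.fromstring(source).findall("month")
    # Serialise SVG elements with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(bar_width * len(months)),
        "height": str(height),
    })
    for i, month in enumerate(months):
        count = int(month.get("count"))
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(i * bar_width),
            "y": str(height - count),  # SVG's y axis runs downwards
            "width": str(bar_width - 2),
            "height": str(count),
        })
    return ET.tostring(svg, encoding="unicode")

svg_text = xml_to_svg(SOURCE_XML)
```

Because the output is itself XML, the same source data could equally be fed to an XSLT stylesheet to produce the same kind of image.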

The Scalable Vector Graphics (SVG) 1.0 Specification (http://www.w3.org/TR/2000/CR-SVG-20000802/) is now a Candidate Recommendation, i.e. W3C is saying that this specification is maturing and is now ready for more widespread implementation testing. However, authoring tools and readers are not yet widely available, so it will be some time before the format is accessible to large numbers of users.

Further information is available at
http://www.w3.org/Graphics/SVG/Overview.htm8

So, to summarise, if you *require* the functionality provided by Flash, you will need to be aware of the longer term dangers of adopting it. You should ensure that you have a migration strategy so that you can move to more open standards, once they become more widely deployed.

PDF (Portable Document Format) is a proprietary file format owned by Adobe, see <http://www.adobe.com/products/acrobat/adobepdf.html>. The format preserves the fonts, formatting, colours and graphics of the source document. PDF files are compact and can be viewed and printed with the freely available Adobe Acrobat Reader.

As with any proprietary solution there are dangers in adopting it as a solution: there is no guarantee that readers will remain free in the long term, readers (and authoring tools) may only be available on popular platforms, the future of the format would be uncertain if the company went out of business, was taken over, etc.

Other limitations of PDF include difficulties in defining the structure of documents (as opposed to the appearance), providing hyperlinking, and providing universal access to viewers with old or specialist browsers.

PDF does, however, have the advantage of being easy to create and preserving the appearance of source documents.

The recommended open formats for providing documents on the Web are HTML/XHTML (to define the document structure) together with CSS/XSL (to define the appearance of the document).

If PDF is used for a NOF project, the project holder should ensure that a case is made for its use in the business case and that a migration strategy has been established which will enable a transition to open standards to be made.

A few years ago VRML (Virtual Reality Modeling Language) was thought to be the emerging standard for virtual reality. However VRML has failed to gain widescale market acceptance. VRML is now evolving. Its successor X3D will make use of XML to provide the syntax for 3D worlds. The development of X3D is being coordinated by the Web3D Consortium - see http://www.web3d.org/

A range of browser plugins to render X3D worlds are available, see the Web3D Consortium web site for details.

The requirement that alternative format must be provided if a plug-in is required is intended primarily for accessibility purposes and to ensure that an open format is available if a project makes use of a proprietary format which requires a plugin. In the case of 3D visualisation it is recognised that a textual equivalent will probably not be appropriate and since X3D is an open standard which is currently accessible primarily through use of browser plugins, the use of these plugins is acceptable.

4. Can NOF provide advice on the use of Java in NOF-digitise project Web sites?

The Java language was developed by Sun. Although it has been submitted to several standards bodies, it has not been standardised and remains a proprietary solution. Java applets which run within the Web browser do not appear to have taken off, due to inconsistencies in the Java virtual machine used within Web browsers and resource and performance problems. In addition much of the functionality provided initially by Java applets can now be carried out using open W3C standards such as HTML, CSS and the DOM (often referred to as Dynamic HTML or DHTML).

Java can, however, provide a suitable resource at the server, as opposed to the client. Java Server Pages can provide server-side scripting facilities, and Java Beans can provide integration with other server-side services.

However, you would need to make a very good case for the NEED in your business case; a whizzy, jazzy interface does not count as a NEED.

A capability to run Java is not included, by default, with the latest version of Microsoft's Internet Explorer (IE6). However, Microsoft's own set of Frequently Asked Questions about IE6
(http://www.microsoft.com/Windows/ie/evaluation/faq/default.asp) includes the following:

A: Yes, Internet Explorer 6 supports Java. Java applets run in Internet Explorer 6 just as they run in older versions of Internet Explorer. The Java VM is not installed as part of the typical installation, but is installed on demand when a user encounters a page that uses a Java Applet.

It should be noted, though, that the download is not small, and that this may deter users visiting your site using a typical telephone modem, as they will need to download this additional plug-in before interacting with your Java-enabled site.

We would stress that NOF projects are encouraged to avoid non-standardised solutions such as Java and that - if Java is used - it is strongly recommended that this be implemented on the SERVER, rather than expecting users' web clients to handle any Java.

5. Would it be acceptable to store some original images as PSD (i.e. PhotoShop) formatted files?

The standards guidelines advise against using a proprietary format, which this is, unless you have a very good reason to do so. Photoshop is perfectly capable of saving files in the TIFF format so you would need to justify why there is any benefit in storing archival images in PSD. That's not to say, of course, that you can't take a copy from an archival TIFF and store it in PSD to work on. Try to create a 'master file' in a format that can be re-digitised from easily.

It is suggested that all projects work from a format that can be re-digitised from easily, e.g. DV or DV Cam for video. Media, particularly video, will need to be redigitised for delivery as technology advances. An equally important issue is probably copyright, and making sure that all footage is covered by "blood chits" which hand over all the rights to the projects.

Although the use of Realvideo is not particularly recommended, we accept that in some cases the use of proprietary or non-standard formats may be the most appropriate solution. However, where proprietary standards are used, the project must explore a migration strategy that will enable a transition to open standards to be made in the future.

With regard to Real, if you do use this you should check that the stringent conditions which encoding with Real implies are suitable for both the project and the programme.

7. Some of our material contains tables. Should we be treating it/them in the same way as for line-drawings i.e. provided on the web as GIFs?

Digitised materials must be accessible - for example people with visual impairments should be able to process information through use of a speaking browser. Tables are permitted if they are comprehensible when linearised.

The use of GIFs to display significant amounts of textual information is *not* acceptable as these cannot be interpreted through speaking browsers. Restricting tabulated information to an HTML format should not limit formatting possibilities given the use of cascading style sheets and that tables created in Excel, for example, can be saved as HTML.
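
One rough way to check whether a table is comprehensible when linearised is to read its cells out in order and see whether the result still makes sense. The Python sketch below assumes a simple row-by-row, left-to-right reading order (speaking browsers may differ) and uses a made-up table; the function names are hypothetical.

```python
from html import escape

# Hypothetical table: a header row followed by two data rows.
table = [
    ["Year", "Visitors"],
    ["2001", "1200"],
    ["2002", "1500"],
]

def linearise(rows):
    """Return the cells in row-by-row, left-to-right reading order."""
    return [cell for row in rows for cell in row]

def to_html(rows):
    """Emit a plain HTML table, marking the first row as header cells."""
    header, *body = rows
    lines = ["<table>"]
    lines.append("  <tr>" + "".join(f"<th>{escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        lines.append("  <tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)

reading_order = linearise(table)
table_html = to_html(table)
```

Here the linearised order is "Year, Visitors, 2001, 1200, 2002, 1500", which still reads sensibly; a table that relied on merged or nested cells would be much harder to follow this way.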

For delivery of moving video to mobile devices, it is likely that UMTS will be available by 2002, with a bandwidth approaching 2Mbps. See http://www.umts-forum.org/

In terms of delivering different versions of sites to multiple platforms, the use of XSLT to transform XML out of a database in order to display (X)HTML of different forms might be worth exploring. For further information see http://www.w3.org/TR/xslt

Standards for storage and playout may differ. Commonly an archive/library would wish to store (preserve) the highest quality possible - meaning uncompressed - but would deliver using a standard and datarate appropriate to user requirements.

Software which delivers at multiple datarates, according to an internet user's connection, is now available from Real and Quicktime, amongst others, but the 'master copy' should ordinarily be the 'best available', which would usually mean uncompressed, linear PCM with a sampling rate and quantisation appropriate to the bandwidth and dynamic range of the material. This form of audio is typically held in .WAV files (though there are over 100 registered forms of coded audio that are possible within WAV, including highly compressed).

Within European broadcasting, 16-bit quantisation and 48 kHz sampling are the EBU (European Broadcasting Union) recommendation for broadcast quality audio. The EBU has gone a step further and added metadata to WAV, to add information critical to broadcasting and broadcast archives, forming the "Broadcast Wave Format" standard: BWF.
http://www.ebu.ch/pmc_bwf.html
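
To give a feel for the storage implications of uncompressed masters, the data rate of linear PCM is simply sampling rate x bits per sample x number of channels. The Python sketch below applies this to the EBU figures quoted above; the choice of two channels (stereo) and the function name are assumptions made for the example.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Data rate of uncompressed linear PCM audio, in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8

# 48 kHz / 16-bit as recommended by the EBU; stereo (2 channels) is an assumption.
rate = pcm_bytes_per_second(48_000, 16, 2)  # 192,000 bytes per second
one_hour = rate * 3600                      # 691,200,000 bytes, about 659 MiB
```

This excludes the small overhead of the WAV or BWF header and metadata.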

The actual transfer of analogue material to digital format, especially in bulk or for unique items, is not simple. For European Radio Archives, standardisation and guidance is being developed within EBU Panel 'Future Radio Archives', http://www.ebu.ch/pmc_fra.html

In their public service role, the BBC would be pleased to offer advice to libraries / archives requiring help - providing it is for non-commercial purposes.

10. We will be creating digital images of objects containing text, such as handbills and tax returns. Do we have to re-key the text in these images as HTML? We are concerned as there are significant cost implications associated with this, especially as we don't think OCR software may be suitable for the objects that we want to digitise.

The use that you intend to make of your digitised objects that contain text is the deciding factor in whether or not you re-key the text into a machine-readable format.

If you are just digitising one or two such objects as examples of what they look like (i.e. you are interested in the visual appearance of the objects rather than the content of the text) it may not be necessary to produce the text as HTML as well.

However, if the idea is that the original text can be searched etc. then it *will* need to be produced in a machine-readable format. In any case, you will need to create metadata to describe the image.

We do recognise that if text does need to be re-keyed, significant technical effort will be required, particularly if OCR software cannot be used.

So re-keying/using OCR software to convert to a machine-readable format would be the preferred solution. It would be for the project to make a case to NOF if this is not done, on grounds of cost, for example.

11. We are digitising audio and video for streaming over the net. There are a number of differing proprietary architectures available, i.e. Realmedia, QuickTime, Windows Media.
As far as I understand it, MPEG have created standards that are recognised as de facto, and Realmedia, QuickTime and Windows Media are producing players and encoders that are compatible with the MPEG format, the most recent being MPEG-4.

MPEG is a standard only - Realmedia, QuickTime and Windows Media are attempting to put the MPEG standard into action - is this correct?

How does the MPEG standard compare in conjunction with the current players/encoders available?
In the Nof Technical manual it is stated:
'Video must be created and stored using the appropriate MPEG format (MPEG-1, MPEG-2 or MPEG-4) or the proprietary formats Microsoft AVI, ASF, or Quicktime' --1
From the above you would think that information can be stored as an MPEG file, is this true ? (as in a raw open source state and not proprietary)

Which encoding software would allow this and how would it be played?

MPEG is a set of international standards for audio and video compression. MPEG-4 is the newest MPEG standard, and is designed for delivery of interactive multimedia across networks. As such, it is more than a single codec, and includes specifications for audio, video and interactivity. Windows Media encodes and decodes MPEG-4. Quicktime and Realplayer are working on versions which will do the same.

MPEG produces high quality video which can be streamed over networks. Quicktime and Realmedia use the MPEG standards to improve the quality of their files for delivery on the web.

In answer to your questions, can information can stored as an MPEG file? Which encoding software would allow this and ... how would it be played?

It is possible to store audio or video in an MPEG format, and to play an MPEG file. This would be NOF's preferred solution, as proper MPEG files are open, non-proprietary, and should be readable by most audio and video player programs and plug-ins. Many/most current web browsers have the capability to play MPEG-1 video without any extra plug-ins.

RealPlayer, Windows Media Player et al support a variety of audio and video formats, including MPEG, and a range of proprietary formats such as AVI.

12. Is XHTML the only future-proof way of presenting multi-page documents? Can PDF be used?

As the W3C's XHTML webpage says (see http://www.w3.org/TR/2001/WD-xhtml1-20011004/) "This specification defines the Second Edition of XHTML 1.0, a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4". If you're familiar with HTML 4.0, then the benefit that XHTML brings is to apply the rigor of XML to a markup language with which many people and applications are already familiar.

Whilst there is no guarantee that anything is future-proof, the huge amount of information already available in (X)HTML, and the likelihood that future generations of web browsers will be able to read (X)HTML data, makes it highly likely that (X)HTML data will remain usable for many years to come -- and the added rigor of XHTML improves the situation greatly.

However, XHTML (like HTML) has no notion of "page", and it is left to the document author to decide how information should be broken down into usable chunks. For example, I could convert the text of a standard dictionary into XHTML, and then choose to present the data as a single (very, very large!) XHTML page with alphabetical subsections, or a series of 26 (very large) XHTML pages with head-word subsections, or many many thousands of XHTML pages each containing a separate head-word entry, or simply turn the text of each printed page into a separate XHTML page and organize the whole thing by naming pages after page numbers. How I choose to organize the information might depend on a number of factors, for example whether I was attempting to in some way model the existing print publication in electronic form (e.g. by structuring it on the basis of printed pages), model its organizational structure (e.g. by breaking the text into 26 subsections, or many entries), or make the electronic version easy to use (e.g. by presenting each entry as a separate XHTML page).

In other words, XHTML will allow you to model your information pretty much however you see fit: if capturing "pages" is important to you then that is possible -- but if usability is an overriding concern, you may decide to present the electronic information organized in some other way.

By comparison, PDF can provide you with an effective mechanism for capturing pages (or page images), and producing an electronic book which can be used and navigated in a very similar fashion to a conventional print publication. If visual fidelity to a non-electronic source is important to you, then this may be an important consideration. However, a similar effect could be achieved by creating XHTML pages containing transcriptions of each page from the printed source, and also links to scanned images of each page.

The concern about PDF stems from how much faith one is prepared to put in the future availability of software that can read PDF files, and whether or not the (proprietary) owners of PDF will continue to make future readers and/or versions of PDF backwardly compatible. That said, there is a vast amount of (mainly commercial) information already available in PDF, and several large public bodies are willing to consider PDF as a long-term preservation format (although not NOF).

Unless you already have a vast amount of PDF data (which you are unwilling to consider converting to XHTML), we would strongly recommend that you consider using XHTML to prepare your data -- assuming you are able to resolve the issue of pagination.

13. Can we store our digital masters as JPEGs instead of TIFFs? We have compared the quality of both at enormous magnification and discovered no difference at all. The NOF standards prefer TIFF but say that one could use JPEGs if necessary. We have used them with our lottery-funded digitisation project very successfully. It would make our work a lot easier...

We would need more information on what sort of images are being captured before giving a definite answer, as the lossy compression used in creating JPEGs has more of an impact on some sorts of imagery/colour variation than on others.

However, whilst a compressed JPEG may be more useful for everyday use and delivery, the quality of an uncompressed TIFF master image is your best bet both for long-term viability of the image, and in allowing you maximum flexibility in the future as to what you do to/with the images.

We would still therefore suggest that TIFF is the format which projects should aim to use, and any deviation from this should be supported by a stronger argument than 'it would be easier'...

14. Where can I find more information on alternatives to proprietary formats, such as SVG and SMIL?

The reason that TIFF is recommended over JPEG is that JPEG is an inherently lossy compression technique. This means that whenever an image file is converted to JPEG format, some detail is lost. However, as you have noticed, the changes that occur are very subtle at high "quality" settings of JPEG compression. You say that you cannot "see" any difference. Can I suggest that you try the following:
- Open your two specimen files, JPEG and TIFF, side by side.
- Blow up to maximum zoom the same area on both images. Select a portion of the image with a good range of colours: an edge of the object, for example.
- Select the Eyedropper tool (keystroke "I" for shortcut) and make sure the "Color" floating tool bar is open.
- Right click on any part of either image and select "Point Sample" (or adjust this setting in Eyedropper options on the "Options" floating tool bar).
- Now left click on a pixel in the TIFF image. In the "Color" tool bar you should see the colour value of the pixel you have selected. Note this value.
- Left click on the same pixel in the JPEG image. Note the displayed colour value.

You should observe that there is generally a slight difference in the colour value at any specific point in the image. Indeed, it is very difficult to "see" this difference with the eye, but I hope that this numerical demonstration will prove to you that the two images are not identical. The JPEG compression routine does not store the discrete value of each pixel in the image; it stores a mathematical function that is used to re-generate the colour values, and this process will result in approximate values for many of the pixels in the image.
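The same comparison can be made numerically in a few lines of code. The following is a minimal sketch in Python, assuming the pixel values for the same region of each file have already been read into lists of (red, green, blue) tuples. The function name compare_pixels and the sample values are made up for illustration and are not taken from any real image.

```python
def compare_pixels(original, recompressed):
    """Return (number of differing pixels, largest per-channel difference)."""
    differing = 0
    largest = 0
    for row_a, row_b in zip(original, recompressed):
        for pixel_a, pixel_b in zip(row_a, row_b):
            # Absolute difference for each of the red, green and blue channels.
            channel_gaps = [abs(a - b) for a, b in zip(pixel_a, pixel_b)]
            if any(channel_gaps):
                differing += 1
                largest = max(largest, max(channel_gaps))
    return differing, largest


# Made-up 2 x 2 RGB samples standing in for the same region of each file.
tiff_region = [[(120, 64, 30), (121, 64, 31)],
               [(200, 198, 190), (12, 12, 12)]]
jpeg_region = [[(119, 65, 30), (121, 64, 31)],
               [(202, 197, 191), (12, 12, 12)]]

differing, largest = compare_pixels(tiff_region, jpeg_region)
print(differing, largest)
```

Small per-channel differences of this kind are what the Eyedropper comparison would be expected to reveal, even where the two images look the same on screen.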

Note also that TIFF files can be stored with LZW compression enabled, reducing the size of the file dramatically. LZW compression does not result in any change to the values of any pixels in the image, so is suitable for archiving and preservation purposes.
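To illustrate what "lossless" means here, the sketch below implements a simplified LZW encoder and decoder in Python and confirms that the decoded data is identical to the input. It shows the principle only: the function names and sample data are our own, and the LZW variant used inside TIFF files differs in detail (for example in how codes are packed into bytes).

```python
def lzw_compress(data):
    """Encode a bytes object as a list of LZW dictionary codes."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            # Emit the longest known sequence and learn the new one.
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the sequence that is about to be defined.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


# Hypothetical pixel data: a run of one flat colour, which compresses well.
pixels = bytes([200, 180, 160] * 500)
codes = lzw_compress(pixels)
restored = lzw_decompress(codes)
print(len(pixels), len(codes), restored == pixels)
```

Because the decoder rebuilds exactly the bytes that were fed to the encoder, no pixel value changes, which is the property that matters for a master image.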

RealAudio is currently recommended in the technical standards as a format that can be used to create and store sound. However, it has recently come to light that creating and storing sound as RealAudio could create major problems (delivery is still OK). RealAudio is an encrypted closed format which no other software package can import. Real have gone as far as suing the manufacturer of a software package that did allow this. The Managing Agent and Advisory Services (MAAS), a new national service acquiring moving pictures and sound for delivery online to the higher and further education communities in the UK, are not recommending RealAudio. This is because Real also appear to retain IPR in the encodings.

17. Could you explain what is meant by Character encoding and DOCTYPE declarations?

This FAQ addresses two areas where many projects are failing to achieve compliance with NOF technical standards. Please take heed of the following comments to ensure that your project is fulfilling these requirements.

Where XML/XHTML is used, it is also good practice to identify the encoding in the "<?xml ..." declaration at the start of the document. However, this may cause problems with some client web browsers and can be safely omitted.

We would also like to point out that it is necessary to include a DOCTYPE declaration at the top of all HTML or XHTML pages. This is necessary to allow your mark-up to be correctly validated, and perhaps more importantly can affect the way the user agent (browser) interprets the tags it finds within the page. Therefore, a DOCTYPE statement correctly identifying the version of (X)HTML that you are using on the page will improve the chances of your page rendering correctly across different browsers and clients. Please ensure that you include the DOCTYPE declaration on all your pages.
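As a rough illustration, the Python sketch below holds a minimal XHTML 1.0 Strict page that begins with a DOCTYPE declaration and states its character encoding in a meta element, together with a simple check that a page starts with a DOCTYPE. The helper name has_doctype, the sample page and the choice of UTF-8 are assumptions made for the example, and the check is no substitute for running your pages through a proper validator.

```python
import re

# An optional XML declaration, then a DOCTYPE declaration for (X)HTML.
DOCTYPE_AT_TOP = re.compile(
    r"\A\s*(<\?xml[^>]*\?>\s*)?<!DOCTYPE\s+html\b", re.IGNORECASE
)


def has_doctype(page):
    """Return True if the page text starts with a DOCTYPE declaration."""
    return DOCTYPE_AT_TOP.match(page) is not None


# Sample page (illustrative only). The meta element states the encoding
# that the file is actually saved in.
XHTML_PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Sample page</title>
</head>
<body><p>Caf\u00e9</p></body>
</html>
"""

# The bytes written to disk must match the declared encoding.
ENCODED_PAGE = XHTML_PAGE.encode("utf-8")

print(has_doctype(XHTML_PAGE))
print(has_doctype("<html><body></body></html>"))
```

The point of the last step is that the declared encoding and the encoding actually used to save the file have to agree; declaring one and saving in another is a common cause of garbled characters.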

18. Why do I have to think about a preservation strategy if I am satisfied the file
formats I am using are stable and likely to persist well into the future?

The short answer is that even the short history of information technology is already littered with
obsolete formats, hardware and software. There is no absolute certainty attached to any
format and even less about securing the long-term preservation of digital information.

It is important to recall the requirement that all projects should be aware of the need for
preserving their digital resources and be able to demonstrate that awareness. Hence the
need for a preservation strategy.

It is essential to remember that the issue of preservation is not confined to arguments about
selecting the best file formats or media types. It is a major factor in the management and
long-term use of your digital information. The issue is influenced by factors that do not sit
narrowly inside the digitisation process, factors such as funding, intellectual property rights,
the stability of the hosting institution, etc.

Even within the digitisation process there is a need for a technical strategy from the outset
if projects are not to be confronted with the need for unexpected and even duplicated effort.
An example of this aspect would be the adoption of a strategy based on the 'digital master'.
Even where it seems suitable to adopt an existing approach such as this, it will nonetheless
require careful investigation of the implications of its adoption and the requirements that will
flow from that choice.

It is also important to accept that even once a preservation strategy has been developed, the shifting nature of technological development is such that any strategy has to be revisited if it
is to remain relevant to technological and other changes that occur after its adoption.

nof-digitise Technical Advisory Service Programme Manual: Section 2: Digital Preservation:
http://www.ukoln.ac.uk/nof/support/manual/digital-preservation/
This document explains the concept as well as describing different technical strategies for
digital preservation. It also covers procedures for the preparation of data and documentation
for storage and preservation.

nof-digitise Technical Advisory Service The Digitisation Process
http://www.ukoln.ac.uk/nof/support/help/papers/digitisation.htm
This document looks at the essential issues of the digitisation process that should be addressed
during the project planning stages and discusses techniques for creating digital files that will
conform to the guidelines. It includes material on the aforementioned digital master.

The Cedars Project
http://www.leeds.ac.uk/cedars/
The CEDARS Project gives in-depth information about digital preservation.

Creating A Viable Data Resource
http://www.ahds.ac.uk/viable.htm
This document looks at assessing the viability of resources and the steps one can take to ensure
that data resources created today are accessible now, and in the future, despite numerous and
unpredictable changes in computer technologies.

Emulation as Preservation Strategy
http://www.dlib.org/dlib/october00/granger/10granger.html
This document provides a view of the issues confronting organisations with respect to preservation
of digital information and looks at emulation as a technical approach to those issues.

Migration - a CAMiLEON discussion paper
http://www.ariadne.ac.uk/issue29/camileon/
This paper explores migration issues for the long-term preservation of digital materials and is a useful
adjunct to the material in the preceding document regarding the strategy of emulation.

TASI : Establishing a Digital Preservation Strategy
http://www.tasi.ac.uk/advice/delivering/digpres2.html
This document considers both technical and organisational strategies for digital preservation and the
relationship between them.

1. Is there a glossary or simplified version of the various metadata standards?

2. As we are developing sites for lifelong learners, do you have any views on whether we should use metadata appropriate for learning packages, e.g. the IMS Learning Resource Metadata Model or LOM (Learning Object Metadata)?

Although the IMS Learning Resource Metadata Model or IEEE Learning Object Metadata (LOM) would be relevant, both these place a significant overhead on the metadata creator; a LOM record could take an hour or more to complete in extreme cases, for example.

We feel that LOM/IMS is too big an overhead for what these projects are meant to be doing (although a LOM/IMS description of each project might be worth considering).

An alternative might be to use Dublin Core with the extensions proposed by the Education Working Group (DCEd) of the DCMI. They have proposed an "Audience" element, and suggest adopting "InteractivityType", "InteractivityLevel", and "TypicalLearningTime" elements from the IEEE LOM standard. More information is available at:
http://dublincore.org/news/pr-20001206.shtml

Also see the UK's Metadata for Education Group at http://www.ukoln.ac.uk/metadata/education/
and note that the UK Government Information Age Champions group are currently working on a metadata schema that is likely to use Dublin Core.

3. Are there recommended standards for the core and extended metadata attributes that should be created for digitised resources, especially images? Dublin Core provides one simple model but is very general; other possible approaches would presumably include MARC and CIMI, but some shared approach to this is presumably seen as valuable.

There are, in fact, quite a few relevant standards. For resource discovery, the nof-digitise guidelines (5.2.1) suggest that "item-level descriptions should be based on the Dublin Core and should be in line with developing e-government and UfI metadata standards." In a Dublin Core context, the specifics of using DCMES for images were discussed at DC-3 - the Image Metadata Workshop held in Dublin, Ohio in September 1996. This workshop resulted in the addition of two new elements to the original thirteen and made some changes to element descriptions.

There is some useful information on DC and other image metadata formats in section 4 of the VADS/TASI guide to creating digital resources in the AHDS Guides to Good Practice series:

This mentions things like the CIMI DTD, MARC, the CIDOC standards, etc. as well as more specialised things like the Visual Resources Association (VRA) Core Record.

There is information on more specialised administrative and structural metadata in the Making of America II project's final report:

Bernard J. Hurley, John Price-Wilkin, Merrilee Proffitt and Howard Besser, The Making of America II Testbed Project: a digital library service model. Washington, D.C.: Council on Library and Information Resources, 1999.
http://www.clir.org/pubs/abstract/pub87abst.html

A shorter list of elements with a primary focus on preservation is available at:

RLG Working Group on Preservation Issues of Metadata, Final report. Mountain View, Calif.: Research Libraries Group, 1998.
http://www.rlg.org/preserv/presmeta.html

This paper first outlines a multi-level video indexing approach based on Dublin Core extensions and the Resource Description Framework (RDF). The advantages and disadvantages of this approach are discussed in the context of the requirements of the proposed MPEG-7 ("Multimedia Content Description Interface") standard. The related work on SMIL (Synchronized Multimedia Integration Language) by the W3C SYMM working group is then described. Suggestions for how this work can be applied to video metadata are made. Finally a hybrid approach is proposed based on the combined use of Dublin Core and the currently undefined MPEG-7 standard within the RDF which will provide a solution to the problem of satisfying widely differing user requirements.

4. Can you advise on approaches to/chosen standards for metadata for sound files? Are there any recently developed models of good practice?

As a more practical example, Jon Maslin (J.Maslin@surrey.ac.uk) describes the approach taken to creating metadata for music recordings, scores and video in the performing arts at the University of Surrey:

We have adopted the Dublin Core as a basis for our metadata because we needed a clearly defined structure and wanted, if possible, to adopt a standard. It was adopted while it was still unclear in some respects, but we knew what we had to achieve so we selected only the relevant elements, expanded some and extended DC with new elements needed for the application.

So, while it was convenient to use it we had to extend it, but did not use all the elements.

We are using the same schema for music recordings, scores and video in the performing arts.

Creators and contributors: The roles of these are defined with their names. There can be an unlimited number. We have not adopted a dictionary of defined roles as the performing arts has a potentially unlimited number, but have taken the view that different applications will act upon the metadata and that retrieval software will be sufficiently intelligent to take care of interpreting different roles (hence an informal convention of adopting the terminology on the source and defining the instrument rather than the role, largely to avoid contortions such as guitarist). It is debatable, as is the difference between creator and contributor in some instances. We have tended to class producers and recording engineers as contributors. One of the benefits of defining a role is that the importance may not be terribly significant.

A similar approach has been adopted for other elements, such as the place and time of recording. We have limited this to a few attributes for our own convenience. There is no reason why this should not be expanded in the way that the creators element is used.

The location element is structured as a URL. In the example you will see it pointing to the patronserver. It can point to any other web server or to a direct file access.

In addition a number of patron elements have been added which relate to courses. Another element has been added to define uniquely a title, e.g. all scores and recordings of a piece have an id.

The most extensive addition has been to define the contents of a piece in a standard way regardless of type or medium. In effect this gives a multi-level table of contents. It has been designed to provide an objective series of access points which can be created without extensive subject knowledge. Typically a classical piece of music will list the movements with references to starting and stopping times. Scores have access points to movements and page numbers and repeats if required. There is no limit to the granularity (beyond time and patience).

It is important to remember that this is entirely independent of the application. The advantage of the XML implementation is that variations in application are relatively simple - in Patron the application displays these in cascading hierarchies. One of the objectives has been to include sufficient data and structure to allow the metadata to be exchanged and processed for the current implementation of Patron and possible enhancement, and also to be developed with more universal standards.

We have created the metadata from a MS Access database which also holds rights information. We have also developed a form builder which automatically creates an input form from the metadata schema. This enables metadata to be created and tested rapidly, and allows inputters to adopt previously entered data to reduce time and to ensure accuracy.

So, the answer is yes and it works, but that it has been application-driven: other applications would need to add to it.

Metadata should be capable of supporting the delivery of item-level DC descriptions of all project resources.

7. Is simple Dublin Core metadata sufficient or are qualifiers needed? If they are, which ones should be used and how will interoperability between different domains be handled?

The 15 Dublin Core metadata elements form a fairly basic cross-domain core that ensures a degree of commonality across domains and applications. In order to less ambiguously express richer or more structured information than is possible in the 15 elements, the Dublin Core community supports the notion of qualification, using element refinements and encoding schemes.

An initial set of these is defined by the Dublin Core community, in the Dublin Core Qualifiers, and these are a good place to start. Where the agreed qualifiers do not meet your needs, it is possible to define others, either within your project or as part of a broader domain-based interest group.

In defining new qualifiers it is important to ensure that:
- they REFINE, and do not EXTEND, the definition of one of the Dublin Core elements
- they do not OVERLAP with the function of an existing qualifier
- if the qualifier is IGNORED by a system/user that does not understand it, the value that is left should still make sense within the definition of the parent element

As an illustration, the DC-Government Working Group recently proposed 'previousAccessMarkingChangeDate' as a refinement of DC.Rights. This was rejected because the definition of DC.Rights is 'Information about rights held in and over the resource'.

A value of the proposed 'previousAccessMarkingChangeDate' element refinement would have been a simple date, which, on its own, does not constitute 'information about rights held in and over the resource'.
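The third test above (what is left when a qualifier is ignored) can be tried out mechanically. The following is a minimal sketch in Python, assuming a record is held as a list of (term, value) pairs; the function name dumb_down, the sample record and the date value are invented for illustration, and only a handful of the published element refinements are listed.

```python
# A small subset of the element refinements in the Dublin Core Qualifiers,
# mapped to the element each one refines.
REFINEMENT_PARENTS = {
    "created": "date",
    "modified": "date",
    "isPartOf": "relation",
    "spatial": "coverage",
    "temporal": "coverage",
}


def dumb_down(record, parents=REFINEMENT_PARENTS):
    """Replace each element refinement with its parent element."""
    return [(parents.get(term, term), value) for term, value in record]


# Hypothetical qualified record (values invented for illustration).
qualified = [
    ("title", "Banking during the Great Depression"),
    ("created", "2001-10-04"),
    ("isPartOf", "http://my.project/resource2"),
]
simple = dumb_down(qualified)

# The rejected proposal, treated as if it refined "rights": once the
# qualifier is ignored, a bare date is left as a statement about rights.
rejected = dumb_down(
    [("previousAccessMarkingChangeDate", "2001-05-01")],
    parents={"previousAccessMarkingChangeDate": "rights"},
)

print(simple)
print(rejected)
```

In the first case each remaining value still makes sense under its parent element (a date is still a date, a related resource is still a related resource). In the second, a date on its own says nothing about rights, which is why the proposal failed the test.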

8. We are planning to digitise and make accessible through a database 20,000 photographs. We are collecting enough detail at item level to create Dublin Core. Please can you give some examples of dynamic, database-driven sites which use this?

The AHDS gateway http://www.ahds.ac.uk visibly displays DC metadata.

However, a lot of large sites - including the ADS http://ads.ahds.ac.uk/ - are designed in a Dublin Core-aware fashion and could send a *computer* DC-marked up metadata. It often makes sense to do as ADS have and to display the content to human readers in a way that uses language and field names more intelligible to that audience. Just because the human-readable name has been changed doesn't mean it isn't a Dublin Core field.

9. We are cataloguing video clips and each item has approximately 20 metadata fields that need to be incorporated in the site, offering advanced search options. How would I incorporate a metadata structure that conforms to e-Government standards? What steps do I need to take to achieve this?

The Dublin Core (DC) metadata scheme is based on a set of 15 core elements that are generic enough to define individual digital objects, however and wherever they have been created. Elements included in the list include 'title', 'creator', 'date' etc. A full list of these elements is available from http://dublincore.org/documents/dces/.

In many cases, however, these 15 elements are not sufficient to define accurately the objects in question. The elements are then extended or qualified to define further the resource. For one type of digital resource, an HTML page, one often sees the date element extended to include fields called 'date.created' and 'date.lastmodified', i.e. the metadata includes two dates, one informing when the page was first created and a second informing when it was last updated. For a video collection the rights element may well need to be extended so as to record the various copyright issues involved. Sometimes DC elements can be qualified according to examples set by others trying to define similar digital objects; in other cases, projects need to develop their own qualifying terms.

For the criteria mentioned in the query, it would probably be best to have multiple qualifications of the creator and contributor elements to record details of interviewers, interviewees, gender etc. "Which tape" and "absolute address" could probably be slotted under the 'title', 'identifier' or 'source' elements.

It's important to note that there is no perfect metadata scheme for any one collection. How you qualify your DC metadata can depend on how your resources are being digitised or what soft- and hardware you are using. Perhaps most importantly, any metadata scheme depends on who will be searching for your resources. A metadata scheme has to be set up to allow users to find the information they need, so, in an ideal world, the creation of a metadata scheme will follow a period of research on user needs. Users must be thought of in the broadest terms, including not only a general public, for example, but future custodians of the collection. While members of the general public may want metadata fields which permit them to do advanced searches, future custodians may need to find detailed information on the copyright holders of the videos in question. This could be recorded in the 'rights' element.

There is a Dublin Core user group especially devoted to metadata issues surrounding moving images, although it is not particularly well developed at the moment. The user group is housed at http://dublincore.org/groups/moving-pictures/. One case study (at http://ahds.ac.uk/shakespeare.htm) gives an indication of how one digital project recording theatrical performances went about creating its metadata.

Dublin Core is recommended by the NOF-digi technical standards because its common takeup should allow digital collections around the country to be interoperable with one another, i.e. to allow users to search through more than one collection at the same time.

We would also point you to a (rather technical) paper looking at video metadata representation (mainly MPEG-7) at:
http://archive.dstc.edu.au/RDU/staff/jane-hunter/www8/paper.html
which you may find useful.

In addition, as this metadata seems to describe individuals there may also be important data protection problems that need to be solved.

The Identifier element has to be an unambiguous reference because it defines the actual item/resource being described. When describing the kind of resources you will be creating within your NOF project, the Identifier element will most likely need to include the Project's Image Number and any reference numbers used by the host institutions (e.g. accession numbers). It could be a URI (Uniform Resource Identifier) but should not be just the URL of the resource, though the URL could be included.

A few examples of the type of thing you should be putting in the Identifier element are listed below (these are taken from DC Assist)

You could look at some examples from dc-assist - http://www.ukoln.ac.uk/metadata/dcassist/
Bear in mind when you are looking at these in DC Assist that they are presented in a form for representing them in meta elements in HTML, but the values are still useful e.g.

For spatial coverage, values might be:
Columbus, Ohio, USA; Lat: 39 57 N Long: 082 59 W
(using encoding scheme TGN) Columbus (C,V)
(using encoding scheme DCMI Box) northlimit=23.5; southlimit=-23.5; name=The Tropics

For temporal coverage, values might be:
(using encoding scheme W3CDTF) 1945
(using encoding scheme DCMI Period) start=1929; end=1939; name=The Great Depression

Question - How does one use the Period encoding scheme for the element Coverage, Time? Can I just simply list the Period in a field called Coverage, Period? I found the explanation in the DCMI site difficult to understand.

Again, how you manage it in your database is up to you, but it probably makes sense to have separate fields for the start date, end date and name of the Period (I'd suggest you probably don't need to store the name of the date scheme in your database as that should be constant). You might need to make the group repeatable if you envisage multiple ranges for temporal coverage, but that does seem quite complex.

When you expose/export your metadata, the start date, end date, scheme and name of a range all form part of the value of an occurrence of the temporal coverage property. N.B. this is still a temporal coverage property: "DCMI Period" is the name of an encoding scheme. You might want to check the distinction DC makes between "element refinements" (like "spatial" and "temporal") and "encoding schemes" (like DCMI-Period, or a subject scheme). See the start of:
http://dublincore.org/documents/2000/07/11/dcmes-qualifiers/

Anyway, in the database you might have a record with fields like:
Identifier - Project 6789
Title - Banking during the Great Depression
Creator - John Smith
Subject - Economic history
Temporal coverage start - 1929
Temporal coverage end - 1939
Temporal coverage name - The Great Depression
etc etc etc

But when you expose/export the DC metadata record the value of the temporal coverage property would be encoded as
start=1929; end=1939; name=The Great Depression
i.e. a single property value with an internal "structure".
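The step from separate database fields to a single structured value can be sketched in a few lines of Python. This is an illustration under our own assumptions rather than a reference implementation: the function names encode_period and decode_period are invented, and the database record is represented as a plain dictionary using the field names from the example above.

```python
def encode_period(start=None, end=None, scheme=None, name=None):
    """Join the components that are present into one DCMI Period value."""
    components = [("start", start), ("end", end),
                  ("scheme", scheme), ("name", name)]
    return "; ".join(
        f"{label}={value}" for label, value in components if value is not None
    )


def decode_period(value):
    """Split a DCMI Period value back into its named components."""
    result = {}
    for part in value.split(";"):
        if "=" in part:
            label, _, text = part.partition("=")
            result[label.strip()] = text.strip()
    return result


# Database fields from the example record above.
record = {
    "Temporal coverage start": "1929",
    "Temporal coverage end": "1939",
    "Temporal coverage name": "The Great Depression",
}

exported = encode_period(
    start=record["Temporal coverage start"],
    end=record["Temporal coverage end"],
    name=record["Temporal coverage name"],
)
print(exported)
print(decode_period(exported))
```

Keeping the components in separate fields internally and joining them only on export means that the database can still sort and search on the start and end dates.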

12. Can NOF recommend or suggest any models for preservation metadata that we might use for our own projects?

The RLG Working Group suggests using 16 elements to capture crucial information about a digital file. Their elements are fairly 'lightweight' and would probably be OK for a digitisation project, assuming that some descriptive metadata (e.g. DCMES) is also available. It's a bit old now, and it might be worth looking at METS http://www.ukoln.ac.uk/metadata/resources/mets/ or the more detailed set of elements which can be found in the draft NISO Technical Metadata for Digital Still Images standard. This can be found (in PDF) at http://www.niso.org/committees/committee_au.html and is also mentioned in the NOF guidelines.

Other guidance would be available in:
Anne R. Kenney and Oya Y. Rieger, Moving theory into practice: digital imaging for libraries and archives. Mountain View, Calif.: Research Libraries Group, 2000.

You could also have a look at the OCLC/RLG Preservation Metadata Working Group which has published an overview (chiefly of the OAIS model, and the specifications developed by Cedars, NEDLIB and NLA) and recommendations for 'Content Information' and a forthcoming one on 'Preservation Description Information' (these are OAIS terms): http://www.oclc.org/research/pmwg/

The "Gathering the Jewels" NOF digitisation project in Wales has settled on what metadata elements and digitisation guidelines it is going to adopt. In the interests of sharing this information as widely as possible, they have put it up on their Web site - please see http://www.gtj.org.uk/technical_logo.html and scroll down the page.

It can be useful to embed metadata into the HTML meta elements on a Web page. However, when doing so, keep the points below in mind.

(a) it depends on a service provider (i) finding the document (search engines still have issues when harvesting dynamically created pages, for more information see Search Engine Watch and the NOF dissemination section of the programme manual) and (ii) extracting and using the metadata; and

(b) HTML meta elements are not the only way of exposing metadata. Further information on OAI and other ways of making your metadata available will follow.

The dc:relation element is used to encode a reference to a resource which is related to the resource being described. The value of the dc:relation element should be an identifier for the related resource.

In any DC metadata record, there may be multiple occurrences of the dc:relation element, expressing relationships between the current resource and a number of other resources.

In simple/unqualified Dublin Core, dc:relation allows you to express the fact that a relationship exists between the current resource and a related resource, but it does not permit you to say anything more about the nature of that relationship between the two resources.

Qualified Dublin Core introduces a number of element refinements to dc:relation, which allow you to express the nature of the relationship between the current resource and the related resource.

In both simple/unqualified DC and in qualified DC, the value of the dc:relation element (or the value of any of its element refinements) should be an identifier for the related resource.

e.g.
[Simple DC] dc:relation = http://my.project/resource2
[Qualified DC] dcterms:isPartOf = http://my.project/resource2
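The points above can be sketched in a few lines of Python. This is only an illustration, not a prescribed encoding: the record is modelled as a list of (property, value) pairs so that a relation can occur more than once, and the resource3 identifier, the record contents and the function names are all made up for the example. Because dcterms:isPartOf and dcterms:hasVersion are refinements of dc:relation, a qualified record can be reduced to simple DC by mapping each refinement back to the element it refines.

```python
# Element refinements mapped to the simple DC element they refine.
# Only two of the dc:relation refinements are listed, for illustration.
REFINES = {
    "dcterms:isPartOf": "dc:relation",
    "dcterms:hasVersion": "dc:relation",
}


def values_of(record, prop):
    """Return every value recorded for one property (elements may repeat)."""
    return [value for name, value in record if name == prop]


def to_simple_dc(record):
    """Replace each refinement with the simple DC element it refines."""
    return [(REFINES.get(name, name), value) for name, value in record]


# Hypothetical qualified record: the values are identifiers of related resources.
qualified_record = [
    ("dcterms:isPartOf", "http://my.project/resource2"),
    ("dcterms:hasVersion", "http://my.project/resource3"),  # invented identifier
]

simple_record = to_simple_dc(qualified_record)
print(values_of(simple_record, "dc:relation"))
```

In the simple record both relationships survive as dc:relation values, but the nature of each relationship is lost, which is the trade-off described above.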

16. How would you define the language of a bilingual item using Dublin Core?

Hopefully you are storing this metadata somehow in your database for your own use. If so it should be fairly straightforward to define the language of your pamphlet using Dublin Core. Although DC doesn't allow for a second language you can have multiple occurrences of one element (in fact all of the 15 elements allow unlimited occurrence). So for example in an HTML page this would appear as below (depending on which encoding scheme you choose to use - ISO 639 or RFC 1766, and how many characters).
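As a minimal sketch of repeating the language element, the Python below builds one HTML meta element per language. The choice of English and Welsh (two-letter codes "en" and "cy"), the scheme label "RFC1766" and the function name are assumptions made for illustration; check the DCMI guidance on encoding Dublin Core in HTML for the exact attribute values your project should use.

```python
from html import escape


def dc_meta_tags(element, values, scheme=None):
    """Build one HTML meta element per value of a Dublin Core element."""
    tags = []
    for value in values:
        scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
        tags.append(
            f'<meta name="DC.{escape(element)}"{scheme_attr} '
            f'content="{escape(value)}">'
        )
    return tags


# Hypothetical bilingual pamphlet in English and Welsh.
language_tags = dc_meta_tags("language", ["en", "cy"], scheme="RFC1766")
print("\n".join(language_tags))
```

The element is simply repeated once per language; nothing in the record marks either language as the primary one.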

17. NOF require that I submit a sample of my project's metadata. Could you show me an example of how to do this?

All projects are required to submit samples of their item-level metadata and indicate which fields are being used for Dublin Core metadata.

This is a fictional sample taken from the digitisation project of Sandfordshire Council. It is of a digitised image of an etching done by the artist John Shade. It gives an indication of the format that should be used when forwarding metadata samples to case managers. Many of the fields here are loosely based on the JIDI Metadata Guidelines.

The example shows what categories the project is using for its metadata, the actual descriptions used for one item and how the fields relate to the core element set of Dublin Core. Note that not every Dublin Core element needs to be mapped to; DC.RELATION and DC.SOURCE, for instance, were omitted from the example below. Other Dublin Core fields, in this case DC.COVERAGE, can be qualified to add extra descriptive richness. The notes on the right indicate what controlled vocabularies can be used, but there is no need for projects to indicate which ones they are utilising.

Projects will have developed different metadata schema according to their collection and content; some will have more detail in certain areas and some will have less. There is no need to replicate the schema shown here. What is important with the sample is to give a sense of how each of your metadata categories is being interpreted and how it is being mapped to Dublin Core.

Image management systems can be used for a wide range of purposes, including: managing workflows, storing metadata, maintaining relationships between images and their metadata, user access, etc., so you need to have a clear idea of what you would want from an image management system (e.g. some functional requirements), in order to decide whether the current library system would be able to do the job properly.

In general, many library systems have now started to have image management capabilities, but it is sometimes difficult to know how well these have been implemented. The vendor's information on DB/TextWorks at:
http://www.inmagic.com/prod_data_dbt.htm
implies that it could be used for image management.

Peter Hirtle also has a chapter on image management systems in Kenney and Rieger's book "Moving theory into practice" (RLG, 2000).

2. Can you advise if backup onto CD of digital resources is sufficient for preservation purposes?

Copying to CD might be useful for short-term backups but on its own it isn't a very sustainable *long-term* preservation strategy. The reason for this is that the dyes used by recordable CDs (CD-R) tend to break down over time.

For more information, see section 1.1.6 in Ross and Gow (1999). The same authors wrote in their executive summary that they felt that the stability of CD-R was over-rated and "far from being a secure medium it is unstable and prone to degradation under all but the best storage conditions." Best practice would be to keep an additional copy on some magnetic media. For more details see: Ross, S., Gow, A., 1999, Digital archaeology: rescuing neglected and damaged data resources. London: South Bank University, Library Information Technology Centre, February.
http://www.hatii.arts.gla.ac.uk/Projects/BrLibrary/

In practice, preservation is about managing information content over time. It is not enough just to make backups; you also need to create (at the time of digitisation) well-documented digital master files. Copies of these files should be stored on more than one media type, and (ideally) in more than one geographical location. These files should be used to derive other files used for user access (which may be in different formats) and would be the versions used for later format migration or for the repackaging of information content. If the files are images, the 'master' file format should be uncompressed, e.g. something like TIFF.

This is not to denigrate making backups in any way. Any service will need to generate these to facilitate its recovery in the case of disaster.

Creating a full 'digital master' with associated metadata will be a complex (and therefore expensive) task that should be done once only and at the time that the resource is being digitised. All equipment needs (or the choice of a digitisation bureau) should be considered with the creation of such digital masters in mind.

Projects will also need to decide where these digital masters should be kept for the duration of the project itself and where backup copies of them (and maybe other parts of the service) should be stored. Thought could be given to subscribing to a third-party storage service. An example is the National Data Repository at the University of London Computer Centre (ULCC). More information on the services offered by the National Data Repository Service is available at:

The various service providers of the Arts and Humanities Data Service (AHDS) will also provide a long-term storage service for digital resources. They have also published various guides to good practice. For more information, see:

3. Can you clarify the emphasis that NOF is placing on digital preservation?

The nof-digitise programme is linked to lifelong learning and the overriding objective is to fund the digitisation of a wide range of materials that will be available free of charge at the point of access on the People's Network and National Grid for Learning.

It is important to secure the long-term future of materials created, so that the benefit of the investment is maximised and the cultural record is maintained in its historical continuity and media diversity. Therefore preservation issues should be considered an integral part of the digital creation process.

Projects should consider the value in creating a fully documented high-quality ‘digital master’ from which all other versions (e.g. compressed versions for accessing via the Web) can be derived. This will help with the periodic migration of data and with the development of new products and resources.

NOF is providing guidance to applicants on digital preservation, further details of which can be found in the nof-digitise Technical Standards and Guidelines.

4. How do you set up shots of awkward items such as football shirts, international caps, medals?

Questions regarding your awkward objects are best answered by a good professional photographer. Photography of unusual 3D objects will need all the skills of a photographer, and it really makes no difference whether the images are shot with a digital camera or a film one. The main problems are those of lighting and photo-craft rather than anything to do with 'digital' imaging. Photographic skills at that level are hard to teach and hard to provide material for on a Web site; it really comes down to experience, as different objects will need different lighting setups and backgrounds. Standardising the lighting and background will make the images far easier to present and work with once they are part of a collection.

5. Is it really necessary to digitally watermark our images as those we are making available for the web are comparatively low quality and we would freely allow their use for non-profit, educational uses anyway?

The NOF Technical Standards state on the matter of watermarking and fingerprinting that "Projects should give consideration to watermarking and fingerprinting the digital material they produce." The word 'must' has not been used, so it is up to individual projects whether they want to use this method of protecting their copyright.

If your project aims to make your resources available for non-profit, educational use, your online images will be of comparatively low quality and you have a copyright statement on your site then watermarking will probably not be necessary for your images. If you feel that your money could be used more constructively in another way then you are free to do so. It may be worth recording this choice, and the reasons for it, in your Project Plan for future reference by your case manager.

6. What is archival or preservation quality? And what is the recommended capture resolution/file size for archived material?

First, 'archival' and 'preservation' are both just words... they are not standards and do not have any exact values.

Second, what is archival or preservation quality for one project or image will not necessarily be so for another. This depends entirely on the intended use of the image. The intention should be to archive or preserve all the quality that is required to present the image at the highest quality chosen as appropriate for delivery. This of course sounds like a chicken-and-egg situation, and for any project the first question is to decide what level of quality is appropriate for that image.

You could decide on the largest printed output size required for images: for instance an A3-quality print would mean 50MB or so, and an A4 print would mean about 24MB. If, like the National Gallery, you wish to create life-size copies, then you would need many hundreds of MB for each image.

You could decide on how much visual information is kept within the image and then choose appropriately. For instance a 20ft x 12ft painting is going to have more visual information in it than a postage stamp, but also a Turner watercolour will have much less visual information in it than a Turner engraving. The watercolour will be visually interesting, with acceptable quality to understand and appreciate the image, at about 800 x 600 pixels, but the engraving would be meaningless at this resolution and would need to be imaged at 3000 x 2000 pixels (at least) to show the vast amount of visual information contained within the image.

Of course this is also dependent upon who the user is. A teacher wanting to show a Turner watercolour wants to show the whole image on screen to her students......the paper historian wants to zoom right into the image to be able to see the paper detail. Both are legitimate but a decision has to be made as to what level of detail the project wishes to capture.

As a rule of thumb, to print an image at the same size as the original you need approx 300ppi for a continuous tone image and 600ppi for a bi-tonal image. It is these figures that are sometimes erroneously given as 'preservation' and 'archive' quality......but don't just grab these figures and stop thinking about the more fundamental issues.
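The arithmetic behind figures like these can be sketched in Python. This is an illustration under stated assumptions rather than a recommendation: it assumes an uncompressed image with no file-format overhead, ISO paper dimensions in millimetres, and sizes expressed in units of 2^20 bytes. The function names are invented for the example.

```python
MM_PER_INCH = 25.4


def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Size of the raw pixel data for an image printed at the given size."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


def megabytes(size_bytes):
    """Express a byte count in units of 2**20 bytes."""
    return size_bytes / 2**20


# ISO paper sizes converted from millimetres to inches.
A4 = (210 / MM_PER_INCH, 297 / MM_PER_INCH)
A3 = (297 / MM_PER_INCH, 420 / MM_PER_INCH)

a4_colour = uncompressed_bytes(*A4, ppi=300, bits_per_pixel=24)
a3_colour = uncompressed_bytes(*A3, ppi=300, bits_per_pixel=24)
a4_bitonal = uncompressed_bytes(*A4, ppi=600, bits_per_pixel=1)

print(f"A4, 24-bit colour, 300ppi: {megabytes(a4_colour):.1f}")
print(f"A3, 24-bit colour, 300ppi: {megabytes(a3_colour):.1f}")
print(f"A4, bi-tonal, 600ppi:      {megabytes(a4_bitonal):.1f}")
```

Under these assumptions the A4 and A3 colour results come out close to the print-size figures quoted earlier in this answer, while a bi-tonal scan at twice the resolution is far smaller because it stores only one bit per pixel.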

There are different influences as well......Do you wish to have a standard image quality within your collection (advisable normally!) so they are all the same size. This makes it much easier to work with as you always know how big every image is....because they are the same. On the other hand if you had a collection of maps and stamps, then what was right for one would be totally inappropriate for the other and you would need to at least have two standards...

How much visual information within the image needs to be captured
How big the original images are
What is the biggest size that you would wish to output (print)
How, if in any way, can you foresee these answers changing in the future... with new technology and larger expectations?

Having found those out, you can decide what size the captured image needs to be to fulfill your requirements......When you know what you need....it is easy enough to find out what the appropriate resolution is to give you that.

Having done that.....it is simple.....you just preserve and archive a file that is big enough to provide you with all the quality that fulfills your needs.

And what is the recommended capture resolution/file size for archived material?

We don't recommend absolute file sizes because they depend upon a host of factors, including physical image size, scan resolution, file format, and colour depth. There are just too many variables to enunciate sensibly in something like the Technical Standards and Guidelines document.

Instead, you should bear in mind the need for preservation, re-use, etc, and should scan as well as feasible, using your judgment and working conditions.

It is true that capture resolution is linked to scale. We would assume 1:1 for the scale in most cases, except where that wasn't feasible.

There is some very useful information on exactly this problem, and on the decision-making process, at the TASI Web site: http://www.tasi.ac.uk/advice/creating/creating.html. Another source that provides useful information is the AHDS Visual Arts Data Service Guide to Good Practice for creators of visual resources. The Guide can be found at http://vads.ahds.ac.uk; look under Guides to Good Practice.

7. Could you tell me more about cameras versus scanners and archival quality?

1) The better equipment for taking a digital copy of a sheet of paper is a scanner. A camera will distort the geometry of the page around the edges - perhaps not enough to be visible, but still it remains a fact of life of capturing an image through a glass lens.

2) Use of a camera would have to be controlled: I would not recommend that the camera was held by hand, it should be mounted on a support of some kind, and lighting conditions would have to be suitable.

3) Use of a camera may be appropriate if the nature of the material is delicate and subject to damage on a flatbed scanner.

4) The key parameters for judging the file size are:
- the dpi
- the colour depth
Neither of these can be universally specified. For preservation we are generally looking at a dpi of 300, but this is just a ballpark figure... we can say the higher the better, within practical limits.

5) With regard to the filesizes, TIFF is a suitable format for preservation. The difference in file size is probably because one is stored uncompressed and the other is probably compressed with the LZW algorithm. From a long-term perspective use of LZW is not recommended as the algorithm is patented (it is the same algorithm used in GIF), and there may be cost implications down the line. There are other compression schemes available for TIFF, and storing the collection of TIFF files in a compressed archive may also be effective (e.g. zip, gzipped tar, rar etc.).

6) With regard to filesizes, the other issue is the colour depth (bit depth) - ie the number of bits used to store colour information. This is generally 24 bit (ie 8 bit per RGB channel).

In conclusion I would recommend that you look carefully at the digitisation process and the results. Once you are satisfied that you have chosen the superior method of digitisation, then worry about the file storage problems.

8. A small museum has found during a trial of digitising its 5x8 b/w glass negatives at 600dpi and saving in TIFF files, that this level of resolution places far too great a demand on the museum in terms of system resources and processing time - files are being saved at up to 235MB and can take a long time to open. The guidance given by the NOF standards and other HE advisory agencies recommends scanning to 'best resolution within an organisation's budget and resources'.
Can NOF provide advice as to what a reasonable resolution (and resulting filesize) would be - or point to an agency that can? The HEDS matrix referred to in the Digitisation paper provides insufficient detail.

There is always a compromise between the quality that would be needed to use the image file for conservation and the cost of digitisation and storage of that file. Where that compromise is made is dependent upon the resources of the organisation.

There is no magic figure that can be given as a correct resolution as this will always depend upon the size of the item that is being digitised and its intended use.

That is why the recommendations state that scanning should be to 'best resolution within an organisation's budget and resources'.

For an organisation to work out what it can and cannot afford, it must start at the other end of the workflow by establishing what the organisation's budget and resources are. From that, it can establish what file size it can create within that budget.

The budget will control the amount of storage space available to the project. You can divide the available space by the number of items and get a maximum file-size possible within the 'budget and resources'.
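The division described above can be sketched as follows. The storage figure, the item count and the function names are hypothetical, chosen only to make the arithmetic visible; the `copies` parameter is an added assumption reflecting the advice elsewhere in this help that master files should be kept in more than one place.

```python
def max_file_size_bytes(storage_bytes, item_count, copies=1):
    """Largest average file size that fits the available storage."""
    if item_count <= 0 or copies <= 0:
        raise ValueError("item_count and copies must be positive")
    return storage_bytes // (item_count * copies)


def fits_budget(file_size_bytes, storage_bytes, item_count, copies=1):
    """True if files of this size can all be stored within the budget."""
    return file_size_bytes <= max_file_size_bytes(storage_bytes, item_count, copies)


# Hypothetical project: 100 GB of storage for 5,000 glass negatives.
storage = 100 * 10**9
items = 5_000

single_copy_limit = max_file_size_bytes(storage, items)
two_copy_limit = max_file_size_bytes(storage, items, copies=2)

print(single_copy_limit, two_copy_limit)
print(fits_budget(235 * 10**6, storage, items))
```

With these invented numbers the ceiling is 20 million bytes per item for one copy and half that for two, so files of 235MB would not fit. Storage for derived access copies and metadata would reduce the ceiling further.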

Whether the file size that you have created will be big enough will depend on what it is that you want to do with it. We would suggest that 235MB is certainly a very large file and could produce beautiful large prints, but it is most probably more than you need to produce for most digitisation projects.

If the goal of the project was to solely produce images to be used via a monitor delivery system then these images would indeed be quite vast!

As a rough rule of thumb we would recommend that you worked to something like these guidelines:

If your intention was to produce colour files (they would look nicer), then they would need to be 24.8MB (A4, 24-bit colour at 300ppi).

(Other possibilities would be to save the file in 16bit b/w which would certainly be better quality but giving larger files. Or the image could be digitised at a size big enough to create bigger output sizes, eg up to A3?)

Continuing on this basis, the 5x8 inch negatives would need to be scanned at 450ppi to create the filesize we have discussed.

This, however, is much smaller than the 235MB that they are getting, which would suggest to me that either:

They are scanning at a much higher res than 600ppi.
They are scanning at a very high colour depth, like 48-bit colour.

If space and money are an issue then we would question whether this is the best way forward.

We have to still point out that there is no magic "correct res" as it is totally dependent upon the size of the original and intended use of the images within the project.
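One way to sanity-check a surprisingly large file is to work backwards from its size to the resolution it implies. The sketch below does this for a 5 x 8 inch original; it assumes an uncompressed file with no format overhead and treats 235MB as 235 x 2^20 bytes, neither of which is confirmed by the enquiry, and the function name is invented for the example.

```python
import math


def implied_ppi(file_size_bytes, width_in, height_in, bits_per_pixel):
    """Resolution that would produce a file of this size for this original."""
    pixels = file_size_bytes * 8 / bits_per_pixel
    return math.sqrt(pixels / (width_in * height_in))


reported_size = 235 * 2**20  # assumed interpretation of "235MB"

for bits in (8, 16, 24, 48):
    ppi = implied_ppi(reported_size, 5, 8, bits)
    print(f"{bits}-bit: about {ppi:.0f} ppi")
```

Under these assumptions every bit depth tried, including 48-bit colour, implies a resolution well above 600ppi (roughly 1,000ppi at 48-bit), so it is worth checking the scanner settings as well as the colour depth.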

9. The 10,000 digitised images we hope to have on our Web site will have been either photographed or scanned. However, we would also like to include a small number of slides of historic costume that we believe it would be detrimental to handle again in order for them to be re-photographed. Having examined the slides, they are not as good a quality as we are now capturing, and although they will look fine on the web they will not make for good archival images. Is it possible for us to use these slides, or should we not include them and therefore leave out some important historical costume items that we would very much like to include?

It sounds like you have a fairly common problem on your hands - deciding what to digitise is never easy. You may have tried using the matrix approach already in selecting; if not, some of these sites may be of interest:

The best matrix I've seen is in Stuart Lee's book 'Digital Imaging: a practical handbook'. Lee does say that there isn't a weighting attached to categories, nor is it suggested that you just add up the number of ticks under each section and give items a score. The matrix is more for demonstrating that you've taken the selection criteria seriously and have made some analysis.

With the slides you talk about you really will just have to weigh up the pros (their important historical value) and cons (scanning slides may take longer and as you've explained the quality isn't that good - no archival master) of using them and decide what is the most important.

NOF-digitise would accept the use of these slides because there is significant historical value attached and they will be very useful for learning (and that is what the programme is all about). Also, the slides to be used make up only a small part of your overall collection. However I would advise adding something to your quarterly progress report about your intentions and mentioning them to your case manager.

You might also find the section on creating digital images from slides on the TASI site useful as you will need to optimise their quality.

1. Is there a definitive guide available about developing government and UfI standards?

2. Is there any general information or guidelines about how to choose the right web hosting company, ie what we should be looking for, what kind of questions should we ask a potential supplier...?

Backup arrangements - How will they back-up your data? How often? How quickly can you get it back?

Hardware support arrangements - What will they do when the disk that your data is on breaks? Do they run RAID of any kind? How long will it take to replace the disk and get the data back (see above)?

Security policies - How do they make their systems secure? What facilities do they offer to other users of the same machine?

Available bandwidth - What kind of connectivity do they have? What peak-time traffic do they have using that bandwidth?

3. What sort of bandwidth will be available for delivery of nof-digi projects?

Up to 20% of schools (mainly secondary) will be connected at 2Mbps by the end of 2002; the rest will be on ISDN2. There is an aspiration for all public libraries to be connected at 2Mbps by the end of 2002. Do remember, though, the needs of home users, who may well be using traditional modems. They don't necessarily need full functionality, but they do need to be catered for.

4. Could you explain what is meant by 'Web services' in para 3 of Section 5.1.1: Access to Resources, i.e. "Web services must be accessible to a wide range of browsers and hardware devices (e.g. PDAs)..."?

The resources which are being digitised through the NOF digitisation programme are expected to be available for a long period and to be accessible to new devices which may be developed in the future.

Even today PDAs can be used to access Web sites containing images, and in the near future we can expect PDAs to have multimedia capabilities. We are also hearing the term E-Book Library being used instead of E-Book Reader to convey a future in which large amounts of data can be stored on mobile devices.

In order to ensure that NOF projects can exploit such devices they should be based on open standards (such as XML) as opposed to using formats which are aimed at desktop devices.

Proprietary file formats (such as the Microsoft Word format, Adobe PDF, Macromedia Flash, etc.) are owned by a company or organisation. The company is at liberty to make changes to the format or to change the licence conditions governing use of the format (including software to create files and software to view files). Use of proprietary formats leaves the user hostage to fortune - for example the owner of a popular and widely-used format may increase the costs of its software (or introduce charges for viewing software).

Open formats are not owned by a company or organisation - instead they are owned by a national or international body which is independent of individual companies.

Many standards bodies exist. Within the Web community important standards organisations include the World Wide Web Consortium (W3C), the Internet Engineering Task Force (IETF), ECMA (European Computer Manufacturers Association), ISO, etc. These standards organisations have different cultures, working practices and coverage. W3C, for example, is a consortium of member organisations (who pay $5,000 to $50,000 per year to be members). W3C seeks to develop consensus amongst its members on the development of core Web standards. The IETF, in contrast with the W3C, is open to individuals. ISO probably has the most bureaucratic structure, but can develop robust standards. The bodies have different approaches to defining standards - ISO, for example, solicits comments from member organisations (national standards bodies) whereas W3C solicits comments from member organisations and from the general public.

You should not confuse "open standards" with "open source". Open source software means that the source code of the software is available for you to modify and the software is available for free. This is in contrast to licensed (or proprietary) software in which the source of the software is not normally available. Both open source and proprietary software can be used to create and view open standard formats.

The Technical Standards have recently changed their wording in this area. They now state "Projects should seek to provide maximum availability of their project Web site. Significant periods of unavailability should be brought to the attention of the NOF case manager and in addition should be reported to NOF through the standard quarterly reporting process."

Projects should aim for as high a level of availability as possible. However, 24/7 cover is likely to be quite expensive and the cost/benefit doesn't justify it. Being able to provide a fast response to breakdowns during office hours is acceptable.

7. Could you tell us how we can anticipate the eventual demand on our resources so we can tell our hosting service?

There is no simple answer to this question, and I am afraid that we do not have any figures that you could use to guide you. Each web resource is unique in this aspect and giving you the user figures for a different Web site would be of no use to you. Below are some pointers that might help you to come to a sensible estimate of the type of load to anticipate.

Do you have any existing web services within your organisation? What usage rates do they generate?

Do you have access to usage statistics from any other institutions, such as museums or libraries? These may give you a starting figure.

Do you have a feel for your target audience - perhaps you can derive a figure from the numbers of students who may use your resource?

However, in all cases these figures will leave you with but a very rough feel for what might happen, and the very nature of a web project means that usage rates depend on many things and are very unpredictable. However, it is likely that demand will build up over time as word of your site disseminates, search engines start to index pages and people start to link to your site. Monitor your site usage carefully to observe this happening. Have a look at the Web Site Performance Monitoring Information paper for advice on how to do this.

From a technical point of view you will need to establish some ballpark target figures to enable the system architecture to be specified: some of these figures would be:

From these figures the total bandwidth requirements can be calculated. Web hosts often will cap the monthly bandwidth available to you, so it is good to have a feel for this.

You also need to consider the maximum number of connections you will have to your Web server at one time, as this can affect system performance in many ways. The maximum number of simultaneous users will help you establish the maximum network throughput you will require. This too is usually restricted by a Web host, so you may be limited to a data rate of, say, 128 kbps.

However, that said, it is our experience that the average hosting deal, of say 15GB per month data transfer and a network connection of 128 kbps, connected to a bog-standard PC (say 1GHz / 256 MB RAM), will host a healthy, busy Web site - say a maximum of 50 simultaneous users and 10,000 page requests per day.
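A rough way to relate these figures to one another is sketched below. It is only back-of-envelope arithmetic under stated assumptions: a 30-day month, 1GB taken as 10^9 bytes, 1 kbps taken as 1,000 bits per second, and each page request counted together with all of its supporting files. The function names are invented for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_page_budget(monthly_cap_bytes, page_requests_per_day, days=30):
    """Average bytes each page request may use without exceeding the cap."""
    return monthly_cap_bytes // (page_requests_per_day * days)


def link_capacity_bytes(kbps, days=30):
    """Most data a link could carry in a month if it ran flat out."""
    bytes_per_second = kbps * 1000 // 8
    return bytes_per_second * SECONDS_PER_DAY * days


monthly_cap = 15 * 10**9                                   # 15GB transfer allowance
page_budget = bytes_per_page_budget(monthly_cap, 10_000)   # 10,000 requests a day
link_capacity = link_capacity_bytes(128)                   # 128 kbps connection

print(f"Average budget per page request: {page_budget} bytes")
print(f"Theoretical monthly link capacity: {link_capacity} bytes")
```

On these assumptions the transfer allowance works out at about 50,000 bytes per page request, images included, and the link could in theory carry more than the monthly allowance, so the cap rather than the line speed is likely to be the first limit reached. Real traffic is bursty, so peak-time throughput still needs checking separately.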

Whilst we would definitely recommend that you consider this issue and derive some target figures that you expect to achieve (this is a way for you to measure the success of your site as much as anything else), we would suggest that you work with your 3rd party suppliers to achieve this as they should be able to offer some good real-world experience of usage and help guide you in creating a system architecture.

Print journals you could try include the CILIP (formerly the Library Association) jobs pullout (http://www.cilip.org.uk/). CILIP also runs Lisjobnet online and INFOMatch. There's also the Guardian.

9. There is a lot of talk about making Web sites more accessible to those with disabilities through use of standards...but what about making your site more readable for these people?

Knowing how to write for the Web is an important area of accessibility. NOF believe that it is very important that sites are written in plain English for people with learning difficulties. It is felt that beneficiaries of funds should have to commit themselves to this, as well as providing a level of content accessibility. There is also a need for research to bring together existing and emerging good practice and develop tailor made guidance and this is something NOF is working on.

10. What other funding sources are there that we could apply to in order to sustain our project?

Funding streams are changing all the time so you need to keep your ear to the ground and watch out for messages to mailing lists and articles in magazines.

11. How have other projects dealt with the problem of documenting issues that arise?

An example of good practice in handling 'Issues' and documenting how they are being dealt with has been provided by the Cistercians in Yorkshire project. They create Word documents detailing each issue and how it is resolved. Such information could be held in a content management system.

1. All projects are expected to have at least some live content publicly available on a Web site by 31 December 2002.

2. From the point at which your Web site goes live you are expected to start recording a) Web usage statistics (see point 5 below) and b) responses from users through a user feedback form on your site. User forms should be prominently displayed or signposted from your home page. The Fund is not specifying the questions to be asked on the form, as this will vary depending on the information individual projects may want to collect to meet their own user evaluation targets. It would be helpful however for projects within each consortium to discuss and share views on the design of the forms, particularly where projects have had prior experience of evaluating online user involvement. We would encourage you to share your proposals widely with other nof-digi projects through the programme email list nof-projects@ukoln.ac.uk and the NOF-DIGI@jiscmail.ac.uk list.

3. Before the end of December 2002 the Fund will email you a form for the Annual Monitoring Report (AMR) which will contain questions related to the above information. The first AMR for each project is required to be completed and returned to NOF by 31 December 2003 (unless you have been requested to return the form earlier by your Case Manager).

4. To ensure that we are able to collect sufficient data and to certify that sites are operational throughout the monitoring period, the Fund will require at least three AMRs from each project, depending on when the content will be completed. Some of the longer projects may be expected to submit up to five AMRs. Your case manager will confirm the next due date each year when your report is requested. The Fund will also be carrying out independent checks on web site availability during the monitoring period.

5. The AMR will include (apart from standard information about your organisation and its finances) several questions about the usage of your site. You will be able to collect the figures needed to answer them through the use of Web statistics software such as WebTrends and Analog.

6. If you have any queries about how this affects your particular project please contact your designated NOF Case Manager (via the digitisation@nof.org.uk email).

Hits - A hit is a request to the server for a file. This includes all the images, audio, graphics, pages and other supporting files as well as the HTML files themselves.

Page Views - Pages are files with the extensions htm, html, shtml, asp and so on, as defined in WebTrends. This value gives the number of pages viewed and does not count the supporting files. This means that the total number of hits is always bigger than the number of page views. For example, if a Web page had five graphics files on it, every time a user visited that page 6 successful hits would be reported, but only 1 page view.
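
The difference between hits and page views can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not how WebTrends or Analog work internally: the function name count_hits_and_page_views and the PAGE_EXTENSIONS set are hypothetical, and the list of page extensions in a real statistics package is whatever that package has been configured to use.

```python
from pathlib import PurePosixPath

# Assumed list of "page" extensions, taken from the examples in the text.
# A real statistics package uses its own configurable list.
PAGE_EXTENSIONS = {".htm", ".html", ".shtml", ".asp"}


def count_hits_and_page_views(requested_paths):
    """Return (hits, page_views) for a list of successfully requested paths."""
    # Every requested file counts as one hit.
    hits = len(requested_paths)
    # Only files whose extension marks them as a page count as a page view.
    page_views = sum(
        1
        for path in requested_paths
        if PurePosixPath(path).suffix.lower() in PAGE_EXTENSIONS
    )
    return hits, page_views


# One visit to a page that has five graphics files on it.
requests = [
    "/index.html",
    "/img/a.gif",
    "/img/b.gif",
    "/img/c.gif",
    "/img/d.gif",
    "/img/e.gif",
]
hits, page_views = count_hits_and_page_views(requests)
print(hits, page_views)  # 6 1
```

The sample request list reproduces the example above: six hits are counted, but only one page view.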

User Sessions - A single user session is defined as one period of activity by a person accessing the site. If they are inactive for more than 30 minutes, their next request starts a new session.
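
The 30-minute rule can be sketched in Python as follows. This is a minimal illustration, assuming each request has already been reduced to a (visitor, time) pair; the function name count_user_sessions and the sample data are hypothetical, and a real statistics package may apply additional rules.

```python
from datetime import datetime, timedelta

# Inactivity period after which a new session is started (from the text).
SESSION_TIMEOUT = timedelta(minutes=30)


def count_user_sessions(requests):
    """Count sessions in an iterable of (visitor_id, datetime) pairs."""
    last_seen = {}  # visitor_id -> time of that visitor's previous request
    sessions = 0
    # Process requests in time order so that gaps are measured correctly.
    for visitor, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(visitor)
        # A first request, or a gap of more than 30 minutes, starts a session.
        if previous is None or when - previous > SESSION_TIMEOUT:
            sessions += 1
        last_seen[visitor] = when
    return sessions


# Hypothetical sample data: visitor A returns after a 50-minute gap.
example = [
    ("A", datetime(2002, 12, 1, 9, 0)),
    ("B", datetime(2002, 12, 1, 9, 5)),
    ("A", datetime(2002, 12, 1, 9, 10)),
    ("A", datetime(2002, 12, 1, 10, 0)),
]
print(count_user_sessions(example))  # 3
```

Visitor A accounts for two sessions (the 50-minute gap starts a new one) and visitor B for one, giving three in total.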

Visitors - This attribute is taken from the IP address or username in the log file. An individual visitor is identified by the IP address of the computer being used. For example, the number of Unique Visitors is the number of different IP addresses in the log file.
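
Counting unique visitors in this sense amounts to counting distinct IP addresses. The Python sketch below assumes a log file in which the IP address is the first whitespace-separated field on each line, as in the widely used Common Log Format; the function name count_unique_visitors and the sample log lines are hypothetical, and your own server's log layout may differ.

```python
def count_unique_visitors(log_lines):
    """Count the distinct IP addresses appearing in the given log lines."""
    addresses = set()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            # Assumption: the first field on each line is the IP address.
            addresses.add(fields[0])
    return len(addresses)


# Hypothetical sample log lines (addresses from a documentation-only range).
sample_log = [
    '192.0.2.1 - - [01/Dec/2002:09:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.1 - - [01/Dec/2002:09:00:01 +0000] "GET /img/a.gif HTTP/1.0" 200 512',
    '192.0.2.7 - - [01/Dec/2002:09:05:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
]
print(count_unique_visitors(sample_log))  # 2
```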