Technical Advisory Service
2001 - 2004 archive
Frequently Asked Questions
This is the full list of FAQs on the Technical Advisory site.
Web site design
Hardware and software
Web Site Design
1. Will you be preparing advice on colour to ensure that people looking at their computer screens will actually see images with the same colours as the originals?
Ensuring exact consistency in how images are displayed across the range of machines to which the content will be delivered is difficult to achieve, given that there is a wide range of different types of display devices in use.
In practice existing projects do tend to create several different formats and sizes for images, e.g. SCRAN display 72 dpi, 256 colour JPEGs on the Web, with 150 x 150 pixel thumbnails. Information about the SCRAN solution to this can be
2. Will NOF provide guidance on screen size, screen resolution and size of text?
It will be difficult to provide NOF projects with guidance on screen size, screen resolution and size of text. Although modern PCs provide high-resolution graphics capabilities, significant numbers of users may continue to use older PCs. In addition we are currently seeing development of a range of devices for accessing Internet resources which have limited display and graphical capabilities, such as Internet-enabled TVs, PDAs, mobile phones, etc. People who access NOF resources may also have particular requirements (visually-impaired, colour blind, etc.). NOF projects should ensure that their services can be accessed in a device-independent way, although enhanced services which exploit the end user's PC setup or personal preferences may be provided.
This is an area the NOF Technical Advisory Service is looking into. The following resources may be of use:
4. Can you provide advice on the advantages of static versus dynamic Web pages?
The terms "dynamic Web pages" and "dynamic Web sites" can be used in a number of senses, so it is important to clarify the meaning.
Movement on a Web page (example 1) may be useful in some cases. However, for accessibility purposes, the end user should be able to switch off scrolling text or moving images.
Access to search facilities, backend databases and legacy systems (example 2) is desirable on many Web sites.
Web sites which can be personalised for the end user (example 3) may be desirable in some cases.
Web sites which can be personalised for the end user's client environment (example 4) may be desirable. However users should not be disenfranchised if they have an unusual client environment.
Dynamic Web sites (example 5) may be desirable in some cases. However users should not be disenfranchised if their browser does not support ECMAscript, or if ECMAscript is disabled (e.g. for security purposes).
If you are considering developing a dynamic Web site you should consider the performance implications and the effect on caching. Will the performance of your service deteriorate if, for example, server-side scripting is used, or will the performance on the end user's PC deteriorate if it is not powerful enough to support Java or ECMAscript? In addition, will pages on the Web site fail to be cached, and what effect will this have on the performance for the end user?
You should also consider how easy it will be to cite and bookmark dynamic resources. Will resources have a meaningful URL which is easy to remember? Will bookmarked URLs return the same resource at a future date?
For further information on Dynamic HTML see DHTML Lab at http://www.webreference.com/dhtml/
For a definition of Dynamic HTML see the What Is site at http://www.whatis.com/WhatIs_Definition_Page/0,4152,212022,00.html
5. Are there any standards in relation to people with disabilities?
NOF projects should conform to W3C WAI guidelines. The Bobby Web service and Bobby Java application will help in monitoring compliance with the guidelines. See <http://www.w3.org/WAI> for further information.
6. I plan to use a piece of proprietary software to create the HTML for the web sites, and will be checking with the manufacturers to find out how well it fits the NOF technical standards. However, I do know that it does not make use of cascading style sheets to control the appearance of web pages. Instead it enables the web master to specify background/font colours and font sizes, page widths/heights etc. in a background "document" which is then referenced by every single new web page created - thereby splitting out web page content from design, achieving the same effect as CSS but in a different way. Will this still be acceptable to NOF?
We do recommend that CSS are used (see section 5.1.1) although it is not being insisted upon (this is a *should* rather than a *must*). If you can justify the reasons that you have decided against them then that's OK. We would say though that your solution does (at first sight) sound a bit cumbersome and potentially ties you to that piece of software for ever, and we do recommend against using proprietary software unless absolutely necessary.
7. Do you have any specific guidance about how we can ensure that web sites are compatible with PDAs? (Unfortunately our budget won't stretch to purchasing one so that we can work out what is needed.)
It is not necessary (or always desirable) to purchase and test Web pages against every combination of hardware device, browser version etc. Instead you should check that your Web pages are compliant with the version of HTML / XML / CSS that you use.
The testing should be carried out using an HTML / XML / CSS validator rather than relying on checking how it looks using a browser. A variety of validators is available - for example see the HTML and CSS validators at http://www.w3.org/.
In addition to these (and other) Web-based validators, many authoring tools will have their own validators.
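As a rough illustration of checking markup by program rather than by eye, the following Python sketch tests only whether an XML or XHTML document is well-formed. It is not a substitute for the validators mentioned above: it does not check the document against a DTD, and the function name is_well_formed is our own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML.

    This is only a pre-check: a well-formed page can still be invalid
    against the version of XHTML it claims to use.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


good = "<html><body><p>Hello</p></body></html>"
bad = "<html><body><p>Hello</body></html>"  # the <p> element is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

Pages which pass a check of this kind should still be submitted to a full validator.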
There are a number of validators available for WAP phones. These may be bundled in with WAP / WML authoring tools.
There may be similar tools available for PDAs. However if the PDA supports HTML, you will be able to use an HTML validator.
Note that there are a number of WAP emulators available - e.g. see http://www.gelon.net/. These can be used to test out WAP sites. However, as stated above, it cannot be guaranteed that a site which works correctly in an emulator will work on the device itself.
8. Although we appreciate the importance of standards, we feel that we cannot implement them fully within our project. The NOF-digi Standards document mentions that if this is the case we should document a "migration strategy" in our project report. Can you explain what is meant by this?
You should provide full details of failure to comply with standards, and also any instances in which you feel the need to implement solutions which may not comply with best practices.
In your report you should provide detailed information which will inform your NOF-digi case manager and associated bodies. You should justify the decisions you have made and show that you are aware of potential problems by outlining the disadvantages (as well as summarising the advantages). You should also describe how you would move to a more standards-compliant solution or implement a better solution, the costs of doing this, and how the work could be funded.
You should be aware of the different approaches which can be taken in migrating resources to open file formats: you could use software to convert files from one format to another; you could provide an emulation environment; you could redigitise your resource; etc. In your migration strategy you should outline the approach you are likely to take.
Two examples are given below.
Example 1 - Use of Flash
Compliance With Standards
Please document areas in which your project may deviate from compliance with the NOF Technical standards. In this section you must (a) describe the areas in which compliance will not be achieved; (b) explain why compliance will not be achieved (including research on appropriate open standards); (c) describe your migration strategies to ensure compliance in the future and (d) how the migration may be funded.
(a) Area in which compliance will not be achieved
We will be providing an online game on our Web site. This is aimed at children. The game (on our Victoriana Web site) will allow users to dress images of Victorian dolls from a selection of costumes.
(b) Explain why compliance will not be achieved (including research on appropriate open standards)
However since our online game is only a very minor part of our NOF-digi project Web site and we have already developed a solution in Flash, we intend to make this available in the short term.
(c) Describe the advantages and disadvantages of your proposed solution
Our proposed Flash solution will be easy to implement as similar work has already been carried out. It will be usable by modern browsers which have a Flash plugin.
However (a) it is a proprietary solution; (b) it may not be accessible; (c) it will probably not work on non-standard devices, such as a digital TV.
(d) Describe your migration strategies to ensure compliance in the future
(e) Describe how the migration may be funded
This will be funded by our organisation, as part of our inhouse resources which will be ensuring that our Web site conforms more fully with accessibility guidelines.
Example 2 - Use of An Externally-Hosted Service for Web Statistics
Compliance With Best Practices
Please document areas in which your project may not implement best practices or in which a compromise solution is proposed. In this section you must (a) describe the areas in which best practices will not be achieved; (b) explain why best practices will not be achieved; (c) describe your migration strategies in case of problems and (d) how the migration may be funded.
(a) Area in which best practices will not be achieved
An externally-hosted service (company X at Web site http://www.xxx.com/) is proposed for providing realtime access to usage statistics.
(b) Explain why best practices will not be achieved
This is a risk associated with use of externally-hosted services (especially those which provide services for free): the service may go out of business; the service may introduce charging; etc. A worst case scenario is that the service goes out of business and its domain name is taken over by a porn company. A small pornographic icon could then be included on our Web site!
The best practice solution would be to provide analysis of usage statistics locally. This could be achieved by using a Web analysis package (e.g. Web Trends at <http://www.webtrends.com/>). There is a cost associated with purchasing the software and resource implications in using it (rotating log files, managing large files, etc.). We do intend to analyse our own Web server log files. However this will be for internal use and will not provide (a) realtime access and (b) detailed statistics such as browser functionality.
(c) Describe the advantages and disadvantages of your proposed solution
The use of an externally-hosted usage service is: (a) cheap (free); (b) requires no special technical expertise (HTML code has to be added to our pages); (c) requires no software or hardware to be installed and maintained locally and (d) provides usage statistics on browser and client machine functionality which is not provided by analysing our server log files.
This approach has the following disadvantages: (a) reliance on a third party, with which we have no formal contractual arrangements; (b) only provides usage statistics for graphical browsers; (c) does not allow statistics to be easily reused (unless the licensed version of the service is purchased).
(d) Describe your migration strategies in case of problems
The company we intend to use has been in business since 19xx. We have been in email contact with them and they have reassured us of the financial reliability of the company. They have agreed that if they change the conditions of their service they will give us at least one month's notice.
The links to the externally hosted service will be managed within our Content Management System (or through use of Server-Side Includes). This will enable links to the service to be switched off by editing a single file.
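The idea of a single point of control can be sketched in Python as follows. This is an illustration only and is not taken from any particular Content Management System: the function name, the flag and the image path counter.gif are all hypothetical, and http://www.xxx.com/ is the placeholder address used in this example report.

```python
# Hypothetical snippet supplied by the externally-hosted statistics service.
STATS_SNIPPET = '<img src="http://www.xxx.com/counter.gif" alt="" />'


def render_page(body_html: str, stats_enabled: bool) -> str:
    """Wrap page content in a template, adding the statistics link only if enabled.

    Because every page is produced through this one function, changing the
    value passed as stats_enabled (e.g. read from a single settings file)
    switches the external link on or off across the whole site.
    """
    footer = STATS_SNIPPET if stats_enabled else ""
    return f"<html><body>{body_html}{footer}</body></html>"


print(render_page("<p>Welcome</p>", stats_enabled=True))
print(render_page("<p>Welcome</p>", stats_enabled=False))
```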
Access to realtime usage statistics is a value-added service. Our project will continue to provide its deliverables if this service becomes unavailable.
We would lose usage data which is held by the company. However (a) we could purchase a licence for this service which would allow us to access our data and import it to a spreadsheet and (b) we still have usage log files held on our Web server.
(e) Describe how the migration may be funded
If we feel that we still require realtime usage statistics we will probably want this across a range of our organisational Web sites. We would therefore purchase a licensed package such as Web Trends Live (see <http://www.webtrends.com/products/wtl/>).
Some Useful Links
Risk management of Digital Information: a File Format Investigation - http://www.clir.org/pubs/abstract/pub93abst.html
Avoiding technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation - http://www.clir.org/pubs/abstract/pub77.html
Migration: a Camileon discussion paper - http://www.ariadne.ac.uk/issue29/camileon/
Archiving and Preserving PDF files - http://www.rlg.org/preserv/diginews/diginews5-1.html#feature2
Flash and SVG - http://www.ep.cs.nott.ac.uk/projects/SVG/flash2svg/
1) The site must still be accessible to browsers which are not scriptable. Use <noscript> tags and "sniffer" routines to determine the client capabilities and provide content-equivalent pages to non-scriptable browsers.
2) Thoroughly test your pages for functionality under a wide range of browser / platform configurations.
A few general comments on client-side scripting:
Internet Explorer also supports VBscript, based on Visual Basic. This will not work in other browser versions.
(ECMA = European Computer Manufacturers Association)
USE "language" Attributes in <script> Tags:
DHTML: What Is It?:
Use of DHTML is compatible with the open standards developed by W3C, and so its use is OK from a standards point of view.
Script Block Content Shielding:
Browser Detection Aka Sniffing:
The Technical Standards currently state "Web services must be accessible to a wide range of browsers and hardware devices (e.g. Personal Digital Assistants (PDAs) as well as PCs). Web services must be usable by browsers that support W3C recommendations such as HTML, Cascading Style Sheets (CSS) and the Document Object Model (DOM)."
We have deliberately left this fairly open and avoided listing specific browser types and specs because we would like projects to try to make their sites/resources as accessible as is possible and this may differ from project to project. As a guideline I would recommend that you make sure that you support at the very least Netscape/IE 4.x upwards. I would also recommend that your CMS supplier has a look at the WAI site, where a good list of alternative browsing methods is available. The resources available from there will allow you to see if your site works well with screen-readers, voice browsers etc.
Some of the FAQs may also be of use. See Web site design no 7
This is an important part of setting up your Web site and it is probably useful to read some background information before you start any registering.
Background and Term Definitions
Registry: the organisation responsible for administering a top-level domain, for example Nominet for .co.uk names.
Agent: an affiliate or partner of the registry that accepts requests for domain names and administers ownership of a domain, also known as a registrar.
Registrant: the registered owner of a domain.
Domains are divided into TLDs (top-level domains) and ccTLDs (country code top-level domains). There is an administrative body - the "registry" - to oversee the TLDs, and each ccTLD. Terms and conditions vary amongst them.
ICANN (the Internet Corporation for Assigned Names and Numbers, at http://www.icann.org/) administers the TLDs (.com, .net, .org etc.) and DNS in general.
It devolves authority for naming to the Internet Assigned Numbers Authority (IANA).
In the UK, .uk is administered by http://www.nic.uk/. You can register a .co.uk site directly with Nominet or register through one of their agents - aka Nominet members. Direct registration is around £80+VAT for a two year period. Nominet members receive a huge bulk discount - £5+VAT for 2 years.
Terms and conditions for registering a .co.uk name are given at http://www.nominet.org.uk/ref/terms.html.
The ICANN and InterNIC sites both hold a lot of useful information which it would be helpful to read. It can be accessed from http://www.icann.org/general/faq1.htm and http://www.internic.com/faqs/domain-names.html.
General points and guidelines concerning domain registration and administration
To register a domain name it is necessary to approach an "agent" (aka member, registrar) of the registry that controls its TLD. Many "agents" can exist for each registry, with ICANN and its partners ensuring, behind the scenes, that the namespace remains unique - no duplicate names are allowed.
The registry is responsible for setting up your domain name on the Domain Name System (DNS). DNS generally describes the way that (domain) names are mapped to IP numbers. At the highest level of its namespace hierarchy, DNS has a system of "root" servers. This is where a request for a domain name resolution is initiated. The root servers will give the location of the primary and secondary name servers that are "authoritative" for your domain. It is this link, from the root server to the name server, that the agent maintains and alone can modify.
For top level domains: the domain registration has 3 separate contact fieldsets. These are the "technical", "administrative" and "billing" contacts, alongside the owning individual or organisation's details. The same details can appear in all three contact types.
The content of a domain registration is different amongst ccTLDs. However they all have in common the owner (registrant) details and the location of the nameserver.
Requests to modify the nameservers for a particular domain are generally permitted only by the registrant (or owner) of the domain.
Transfer of a domain should be initiated by a request from the registrant to the agent. If the agent has gone bust or is causing trouble, you can approach the registry itself. The registry will not release the domain if you have signed a contract with the agent requiring you to pay a release fee.
The nameserver can be anywhere on the net. It is simply a machine that handles domain name lookups, translating them to the numeric, lower-level IP address necessary for transport over TCP/IP. The nameserver holds a "zone file" for the domain, and this file contains the mappings for various uses of the domain - the location of the web or mail server, amongst others.
Only the administrators of the nameserver can (directly) make changes to the zone file. They will generally only respond to the person identified as the technical contact for the domain, although in some cases these changes can be made online by the user, identifying themselves with a password.
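The relationship described above between the root servers, the nameserver and the zone file can be pictured with a toy model in Python. This is a simplified sketch and not how a real resolver is implemented: all of the names and addresses below are made up for illustration, and the lookup follows the simplified description given in this FAQ rather than the full DNS protocol.

```python
# Hypothetical zone file held on the nameserver for the domain example.org:
# (name, record type) -> value.
EXAMPLE_ORG_ZONE = {
    ("www.example.org", "A"): "192.0.2.10",     # location of the web server
    ("mail.example.org", "A"): "192.0.2.20",    # location of the mail server
    ("example.org", "MX"): "mail.example.org",  # where mail for the domain goes
}

# The link maintained by the agent: which nameserver is authoritative for a domain.
DELEGATIONS = {"example.org": "ns1.example.net"}

# The zone files held by each nameserver.
NAMESERVERS = {"ns1.example.net": EXAMPLE_ORG_ZONE}


def resolve(name, record_type="A"):
    """Find the authoritative nameserver for a name, then look it up in the zone file."""
    name = name.lower().rstrip(".")
    labels = name.split(".")
    # Try the full name first, then each parent domain in turn.
    for i in range(len(labels)):
        domain = ".".join(labels[i:])
        nameserver = DELEGATIONS.get(domain)
        if nameserver is not None:
            return NAMESERVERS[nameserver].get((name, record_type))
    return None  # no nameserver is registered for this domain


print(resolve("www.example.org"))
print(resolve("example.org", "MX"))
```

Changing an entry in the zone dictionary corresponds to the nameserver administrator editing the zone file, while changing the delegation corresponds to the agent pointing the domain at a different nameserver.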
Some issues to consider about the DNS world
You can often get nameserver services from the registry you buy domains from. This can be handy, as it means you will be able to make all your adjustments, including zone-file settings, through one point of contact. Easyspace.co.uk, for example, have a great web-based account administration allowing you to easily alter DNS details, change nameservers, buy new domains and obtain other related services.
Web-based management systems are very common with agents and will give you a lot of control over the administration of the names you control. Other places can have their own tedious in-house procedures of faxing or mailing company letter-headed paper to make a change.
Most agents will automatically inform you when your registration is due to expire, giving you the option of letting it lapse or automatically renewing it.
Take care when completing the application with your personal details. The details of the owners and contacts for a domain are publicly viewable, so make sure you use suitable names, addresses and email addresses. Make sure you keep these details up to date.
You can make a "WHOIS" search at most agents. This will allow you to look up the registration details for a domain. If you search with an agent who is not responsible for the domain, you may receive a reduced set of information. You will then have to locate the site of the agent (registrar) for the domain and make the WHOIS search again.
Generally the advice above applies to the TLDs and the ".uk" group of domains. If you are using other ccTLDs be sure to check carefully their terms and conditions. ".uk" is unusual amongst ccTLDs in that it allows registrations from non-UK residents, for example, and many other differences may exist between other ccTLDs.
Unfortunately NOF are unable to give you any direct recommendations of agents. However we do recommend that you take some time to read around the Nominet (nic.uk) and ICANN sites to give yourself more background information. It is a murky area of the internet and many companies have exploited this to wring extra money out of their registrants: be sure to read the terms and conditions carefully and consider the advice given above.
Should you register the site with more than one TLD?
Each domain you register will incur an ongoing cost to your organisation for renewals, and will raise redirection and alias issues between the different domains. The purpose of multiple registrations is to protect your name, to cover misspellings of your name and to prevent competitors from bagging domains that might siphon off legitimate users of your site. This is more of a concern for commercial organisations. However, having multiple domains makes your system administration more complex. Suppose your main domain is www.abc.com, and that site has a page called www.abc.com/search/. What will you do with a second domain www.abcd.com? Will www.abcd.com/search be the same as www.abc.com/search, or will your users be redirected to the main domain first? This can cause problems. You have to decide on a coherent strategy for this, and you should discuss the options carefully with the systems administration people to avoid problems down the road. On the whole, people will reach your site either through a link, in which case the domain is, to the user, mostly irrelevant, or through printed publicity they receive, over which you have control!
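One possible strategy is to redirect every request for a secondary domain to the same path on the main domain. The following Python sketch shows the URL rewriting involved, using the www.abc.com and www.abcd.com names from the example above. The function name and the set of alias hosts are hypothetical, and in practice this would normally be configured in the Web server rather than written as code.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.abc.com"   # the main domain
ALIAS_HOSTS = {"www.abcd.com"}   # secondary domains registered to protect the name


def redirect_target(url):
    """Return the URL on the main domain to redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIAS_HOSTS:
        return None
    # Keep the scheme, path, query string and fragment; change only the host.
    return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment))


print(redirect_target("http://www.abcd.com/search/"))
print(redirect_target("http://www.abc.com/search/"))
```

With this approach every page has a single address, which also keeps bookmarks and citations consistent.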
Must I have a server set up and ready before I register a name or can I do it anytime?
You can do it at any time. You will just need to link the two together when your server is ready. This involves configuring the webserver so that it knows about the domains it is to serve, and changing the pointer (to the IP address of the web server) in the name server. For this reason, it is great to have control over your name server: either host it yourself, or use an organisation that gives you your own web-based administration of the nameserver configuration for your domains.
Often, the registrar will also host the name server for the domain. This is fine as long as you are still able to make changes to the config - ideally online.
Domains that you do register that do not have a live webserver ready for them should have a relevant holding page giving info about your project, a contact perhaps and if possible a means to capture email addresses to build up a user base asap. Again, this should guide your choice of registrar as they should give you a couple of free, editable pages that you can use as holding pages. Or set up a temporary web server for these purposes.
Don't forget that changes to name server settings can take up to 48 hours to be distributed around the net: it isn't an instantaneous change over the whole web. When a computer requests a DNS lookup, in order to find the IP address to which a request for a URL should be sent, it will use its local nameserver, and that server will keep a cached copy of the DNS info for that domain for a period known as the "time to live". Only once this has expired will the nameserver go back to the authoritative nameserver to refresh its local copy of the record. (This keeps the traffic load on the net down, basically.)
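The effect of the "time to live" can be illustrated with a small Python sketch of a caching nameserver. It is a toy model under stated assumptions: the class name, the addresses and the one-hour time to live are invented for the example, and the clock is passed in so that the passage of time can be simulated.

```python
class CachingNameserver:
    """Toy local nameserver which caches answers for a fixed time to live."""

    def __init__(self, authoritative_lookup, ttl_seconds, clock):
        self._authoritative_lookup = authoritative_lookup  # callable: name -> address
        self._ttl = ttl_seconds
        self._clock = clock                                # callable: () -> seconds
        self._cache = {}                                   # name -> (address, expiry time)

    def lookup(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within the time to live: answer from the cache
        # Expired or never seen: go back to the authoritative nameserver.
        address = self._authoritative_lookup(name)
        self._cache[name] = (address, now + self._ttl)
        return address


# Simulated authoritative records and a simulated clock.
records = {"www.example.org": "192.0.2.10"}
current_time = [0]
local = CachingNameserver(records.__getitem__, ttl_seconds=3600, clock=lambda: current_time[0])

first = local.lookup("www.example.org")    # fetched from the authoritative server
records["www.example.org"] = "192.0.2.99"  # the record is changed at the source
current_time[0] = 1800
stale = local.lookup("www.example.org")    # half an hour later: old answer from the cache
current_time[0] = 3600
fresh = local.lookup("www.example.org")    # time to live expired: new answer fetched
print(first, stale, fresh)
```

This is why a change made on the authoritative nameserver is not seen everywhere at once: each local nameserver keeps giving out its cached answer until its own copy expires.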
What suffix should we use?
There are no rules about the suffix (.com .org .co etc.) that you use for your NOF site but again as expense is an important factor it makes sense to use the most appropriate which is probably org.uk or co.uk. Some more information on the domain name you use is available on the NOF site at http://www.ukoln.ac.uk/nof/support/help/papers/website.htm#Domain.
Netscape 4.x browsers, i.e. browsers in Netscape series 4 (e.g. Netscape 4.07, 4.08, 4.71, etc.), have very poor support for CSS and we are aware of the difficulties this can pose. However just as the difficulties posed by Netscape 4.x differ amongst those projects affected, so do their potential responses to the problem; hence there is no definitive one-size-fits-all answer.
13. Can you detail some simple checks one can perform to ensure that my website is compliant with the technical standards?
There are two issues which we have noticed on a fairly regular basis and which usually require attention:
We would like to remind all projects that it is important that the character encoding used in a text document delivered over the web is clearly identified. Best practice is to identify the character encoding in use BOTH in the HTTP header and within the <head> section of the document.
Where XML/XHTML is used, it is also good practice to identify the encoding in the "<?xml ..." declaration at the start of the document.
Whilst the Technical Standard and Guidelines document currently mandates the use of UTF-8 encoding (section 2.1.2), this is under revision and it is permissible to use any appropriate encoding (e.g. ISO-8859-1), as long as this is explicitly stated as discussed above. The TS&G will be updated at the end of this year to reflect this change, and at that time section B of the quarterly monitoring reports will also be modified. In the meantime, please indicate the actual encoding your project is using when filling out your quarterly progress report.
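A quick check that the encoding is declared in each place, and that the declarations agree, can be sketched in Python as below. This is a rough illustration rather than a definitive tool: the function names are our own, and the regular expressions only cover the common forms of the HTTP Content-Type header, the <meta> element and the XML declaration.

```python
import re


def charset_from_http_header(content_type):
    """Extract the charset from a Content-Type header value, if present."""
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(document):
    """Extract the charset declared in a <meta> element, if present."""
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([^\s"\'/>;]+)', document, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_xml_declaration(document):
    """Extract the encoding from an XML declaration at the start of the document, if present."""
    match = re.match(r'\s*<\?xml[^>]*encoding\s*=\s*["\']([^"\']+)["\']', document, re.IGNORECASE)
    return match.group(1).lower() if match else None


def check_encoding(content_type, document):
    """Return (ok, declared): ok is True only if the HTTP header and the <meta>
    element both declare an encoding and every declaration found agrees."""
    declared = {
        "http_header": charset_from_http_header(content_type),
        "meta": charset_from_meta(document),
        "xml_declaration": charset_from_xml_declaration(document),
    }
    found = {value for value in declared.values() if value is not None}
    ok = (declared["http_header"] is not None
          and declared["meta"] is not None
          and len(found) == 1)
    return ok, declared


page = ('<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8" /></head><body></body></html>')
print(check_encoding("text/html; charset=UTF-8", page))
print(check_encoding("text/html; charset=ISO-8859-1", page))
```

The second call reports a mismatch, because the server announces ISO-8859-1 while the document itself declares UTF-8.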
We would also like to point out that it is necessary to include a DOCTYPE declaration at the top of all HTML or XHTML pages. This is necessary to allow your mark-up to be correctly validated, and perhaps more importantly can affect the way the user agent (browser) interprets the mark-up it finds within the page. Therefore, a DOCTYPE statement correctly identifying the version of (X)HTML that you are using on the page will improve the chances of your page rendering correctly across different browsers and clients. Please ensure that you include the DOCTYPE declaration on all your pages.
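A simple way to check a batch of pages for a missing DOCTYPE is sketched below in Python. It is a minimal illustration with a function name of our own choosing: it only tests that a DOCTYPE declaration for (X)HTML appears at the top of the page (after an optional XML declaration) and does not check that the declared version matches the mark-up actually used, which is a job for a validator.

```python
import re


def has_doctype(document):
    """Return True if the page starts with an (X)HTML DOCTYPE declaration."""
    # Skip a byte order mark and leading white space.
    text = document.lstrip("\ufeff \t\r\n")
    # Skip an XML declaration, which may precede the DOCTYPE in XHTML.
    text = re.sub(r'^<\?xml[^>]*\?>\s*', '', text)
    return re.match(r'<!DOCTYPE\s+html\b', text, re.IGNORECASE) is not None


html4_page = ('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
              '"http://www.w3.org/TR/html4/strict.dtd">\n<html></html>')
xhtml_page = ('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
              '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html></html>')

print(has_doctype(html4_page))
print(has_doctype(xhtml_page))
print(has_doctype("<html></html>"))
```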
1. Where can I get a draft form to use for copyright clearance?
IPR and copyright is a very complex area and unfortunately there is no "one-size-fits-all" solution to these issues. Every resource or collection of resources may have its own IPR problems that will need to be solved before a digitisation project can go ahead. However, as it is an issue of such importance when working in a networked environment, a number of excellent resources have been produced to guide you through the process of clearing resources for use.
VADS (The Visual Arts Data Service) and TASI (The Technical Advisory Service for Images) have produced a guide to creating digital resources:
2. The technical standards document talks about certain institutions having access to additional resources by signing a licence committing them to non-commercial use (section 3.1.5). Who would be parties to these licences? Would these licences involve an exchange of money and if so between whom? Would the copyright owner be entitled to a fee for reproduction in the same way as with non-digital reproduction?
The principle is that the end user will be provided with access free at the point of use but that two issues have to be covered:
There is a variety of models of course but a proven model involves using contracts i.e. issuing a licence.
Let us define three parties. The contributor is the body which owns the IPR in the resource. The Service Provider is the body which stores and makes available the resource. The User Institution is the body which accesses the Service Provider under licence.
The Contributor and the Service Provider could be the same body. However, if that is not the case, there needs to be a licence setting out the conditions under which the Service Provider may make the material available.
There may also be a payment from the Service Provider to the Contributor in respect of the IPR to allow for non-profit, non proliferation educational use from then on.
The User Institution will be licensed to use the resources by the Service Provider and may pay an annual fee to allow that access. The User Institution agrees in the licence to certain conditions - normally non-profit, non proliferation use.
The User institution then allows its user group - students, those visiting a library or museum - to access the resources.
It is expected that so long as the use was purely for non-profit, non-proliferation educational purposes, the Service Provider would not make a further IPR payment. The licence fee it charges is merely there to sustain the service.
An explanation of NOF's IPR conditions and of issues around Open Source systems follows.
1. The NOF IPR conditions specify (page 4 under Definitions) that 'Material means any documentation or material (including without limitation software and databases) to be provided to the Fund etc..' This is further explained in the guidance letter (13 August 02 under 'Definitions' and under '2.2 Licences' on page 4) ...'in the case of material that is delivered with software or databases specially written for this project (including any adaptation of commercially available software or databases) The Fund would expect that any commercial exploitation would recognise the use of public funds in the generation of the material. Significant commercial exploitation might involve grant repayment.'
2. The IPR conditions give the Fund the right to use the materials developed for the programme but do not provide for any 'transfer' of IPR. This is an important difference. The Fund will not 'own' the IPR to materials created (including software) but through the conditions does have rights over the commercial exploitation of the material, as explained above.
3. If any grant holder is unclear on this point and has any query regarding terms and conditions of NOF grants please contact NOF directly either through your case manager or to this email address at firstname.lastname@example.org.
4. If you are a supplier contracted to a grant-holder please address your queries to the grant-holder who will raise them with NOF where necessary.
5. Please note that neither the IPR conditions nor the Technical Standards conditions require a commitment to open source software, but we welcome that debate on the nof-jiscmail list as it raises awareness of an important issue.
This information is not tendered as advice but does indicate which projects should most likely take note of regulations as they might apply to their situation. It also provides details on sources of information of possible use to projects.
This FAQ is potentially of importance to any project which opts to:
If either of the above applies, then there are requirements which must be met in respect of three possible areas :
1 Information requirements, i.e. the information that must be provided to end-users. These requirements include providing your end users with:
The above requirements will probably apply to you if you sell or advertise goods or services online (i.e. via the Internet, interactive television or mobile telephone).
2 Commercial communications, i.e. essential identifications and explanations that must be provided to end-users, for example if a project markets via email. These requirements include providing your end users with:
Note therefore that any form of electronic communication designed to promote your goods, services or image, such as an e-mail advertising your goods or services, must:
The above requirements will probably apply to you if you promote goods or services through any form of electronic communication (e.g. an e-mail advertising your goods or services).
3 Electronic contracting, i.e. information and explanations about the process of creating a contract electronically with an end-user. These requirements include providing your end users with:
The above requirements will probably apply to you if you enable end users to place orders online.
The requirements contained in the three categories above represent the basic situation. There may
be other requirements in addition which can be ascertained from the sources of information given below.
In conclusion the DTI guidance states:
Sources of Information
The Electronic Commerce Directive (00/31/EC) & The Electronic Commerce (EC Directive)
Regulations 2002 (SI 2002 No. 2013)
Guidance on Electronic Commerce Regulations
Beginners Guide to the E-Commerce Regulations 2002
Frequently Asked Questions on The Electronic Commerce (EC Directive) Regulations 2002
The address for Northern Ireland is:
Hardware and Software
1. I am trying to find information on whether there is any OCR software that can cope reliably with 17th-19th century printed material, including material in columns. I would also like pointers to information on how existing OCR software would cope with 19th-century newspapers.
Although we do not have very much experience of individual products, most OCR software would still have problems recognising these types of text. Even apart from the likelihood of non-standard typefaces and awkward columns, most OCR software might have problems with background noise (e.g. print bleed-through or foxing) and non-standard characters. It's probably worth testing OCR software before rejecting it, as the main alternatives would be re-keying the whole text (horribly expensive) or just digital imaging.
The AHDS/OTA's guide to Creating and documenting
electronic texts is worth a look:
2. What is the situation regarding servers? Supplying video, for example, to many institutions simultaneously places huge demands upon UK Internet infrastructure as well as servers. Should we be looking to host our servers at a high capacity Super Janet node or will other provisions be made?
At this stage the NOF are not going to provide central servers on high speed networks, and the like. It will be down to the project to make arrangements to have their content connected to the Internet at speeds sufficient to deliver it to users in a useful fashion. Thus, a project delivering high bandwidth video will probably need a more robust (and faster) connection to the Net than one delivering small static images. The extra costs of this connection will need to be laid out - and justified - in the business plan.
Connection via one of the bigger SuperJANET nodes is one possibility that projects with HE partners might pursue, provided their use falls within JANET's Acceptable Use Guidelines http://www.ja.net/documents/use.html
The term Content Management System (CMS) is usually used to describe a database which organises and provides access to digital assets, from text and images to digital graphics, animation, sound and video. This type of product is relatively new and there are a few CMSs available as off-the-shelf packages. CMSs range from very basic databases to sophisticated tailor-made applications and can be used to carry out a wide range of tasks, such as holding digital content, holding information about digital content, publishing online and publishing on-the-fly.
For more information see http://www.ukoln.ac.uk/nof/support/help/papers/cms/
Do I need one for my project? Is a database sufficient?
The CMS provides mechanisms to support asset management, internal and external linking, validation, access control and other functionality. Typically, a CMS is built on an underlying database technology.
Content Management Systems range from very basic databases, to sophisticated tailor-made applications. They facilitate easier tracking of different parts of a Web site, enabling, for example, staff to easily see where changes have been made recently and - perhaps - where they might need to make changes (a 'News' page that hasn't been edited for 6 months?). They also ease the handling of routine updating/modifying of pages, where you want to change a logo or text on every page, for example.
A CMS can also simplify internal workflow processes and can ensure that you are working with a single master copy of each digital asset.
However there are other approaches which may be useable, such as making use of server-side scripting to manage resources.
Solutions may include:
Use of a dedicated CMS system. Note this may be expensive, and there may be costs in learning the system, using it, etc. In addition you should ensure that an 'off-the-shelf' CMS product supports the metadata standards one might expect to use.
Use of an open source CMS. This avoids licence costs, but there are still resource issues.
Use of a database. May manage the resources but will it address issues such as workflow?
Use of server-side scripting approaches, such as PHP (Unix) and ASP (NT). These may allow bespoke applications to be developed, and may sit on top of databases.
To summarise, the issue to be aware of is the difficulty of maintaining resources in formats such as HTML. Using flat files and a CMS and/or database is a way of addressing this management issue. Whilst it is not an explicit requirement that projects manage their resources with a CMS and/or a database, if such tools are not used, the project must show how it intends to facilitate good management of its digital assets.
Advice on selecting scanning equipment can be found in the
Digitisation Process information paper available at:
Suitable resolutions for digital master files for various media types are discussed in the HEDS Matrix, and the JIDI Feasibility Study contains a useful table of baseline standards of minimum values of resolutions according to original material type.
A detailed discussion of resolution, binary and bit depth can be found on TASI's Web pages and a good basic guide to colour capture can also be found on the EPIcentre Web pages.
References refer to :
Also both the HEDS and TASI sites, particularly at:
5. Which is better SQL Server or an Access Database?
MS Access was designed as a database system for small scale office use. It was not designed for use as a database server, although it can be used in this mode for simple use.
For further information we suggest you look at the Microsoft Web site.
Some comments on Access vs SQLServer are available at:
You may also wish to look at:
Although Access may be capable of handling the sorts of query volume you suggest, at least in the short term, you do need to consider scalability (SQLServer scales better), Web site integration (SQLServer *probably* integrates better), enterprise access to the data (SQLServer will better enable intranet access to the data, etc).
Data structures are unlikely to be affected by a move to SQLServer.
6. Would NOF recommend the web site to be hosted on its own dedicated server or on a shared server? What are the things I should bear in mind to take such a decision?
The issues to be considered in making this decision relate to performance, security and potential conflicts between software applications:
Performance - with a shared server, projects will need to ensure that the performance of their service is not impacted by the other things the server is doing. Peak access times (of both the project's service and the other services on the shared machine) need to be considered. Projects need to think about the performance of the server itself, as well as available network bandwidth to the machine.
Security - as a general rule, the more services offered by a machine, the harder it is to make that machine secure. Projects should ensure that any machine they use is operated in a secure manner.
Conflicts - on a shared server there are more likely to be software conflicts, e.g. to run package X, package Y needs to be installed, but this conflicts with package Z that is already installed for some other service. There are also issues associated with hosting more than one Web server on a single machine. Typically these are resolved by hosting multiple 'named virtual hosts' (though under some operating systems it is also possible to assign multiple 'virtual' IP addresses to a single network interface, or to install multiple network interfaces). Where 'named virtual hosts' are used it should be noted that the browser must support HTTP 1.1. However, this is not a significant problem. The Apache manual says:
"The main disadvantage [of name-based virtual hosts] is that the client must support this part of the protocol. Almost all browsers do, but there are still tiny numbers of very old browsers in use which do not."
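The way name-based virtual hosts depend on the HTTP 1.1 Host header can be illustrated with a minimal sketch. This is not how Apache is implemented; the host names, directory paths and the `select_site` function are hypothetical and chosen only for illustration.

```python
def select_site(request_headers, sites, default_site):
    """Pick a document root from the Host header sent by the browser."""
    host = request_headers.get("Host", "")
    # Drop an optional ":port" suffix and normalise case before the lookup.
    hostname = host.split(":", 1)[0].strip().lower()
    # A browser that sends no Host header can only be given the default site.
    return sites.get(hostname, default_site)


# Hypothetical services sharing one machine and one IP address.
SITES = {
    "project-a.example.org": "/var/www/project-a",
    "project-b.example.org": "/var/www/project-b",
}

print(select_site({"Host": "project-b.example.org:80"}, SITES, "/var/www/default"))
print(select_site({}, SITES, "/var/www/default"))
```

The second call shows the case the Apache manual warns about: a very old browser that omits the Host header always lands on the default site, whichever name the user typed.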
7. Should the web server be protected by a firewall? (Or is it enough to have a firewall installed on our office network server where we will store all our digital mastercopies?)
As the technical standards state (section 3.1.4):
"Machines should be placed behind a firewall if possible, with access to the Internet only on those ports that are required for the project being delivered."
This applies to all machines used to deliver the project. Projects are strongly encouraged to protect all machines used to deliver material (Web servers and back-end master storage servers) with a firewall.
8. Should we require back ups from the web hosting organisation? (We will have all back ups of mastercopies on our own network server in the office.)
Projects will need to be able to recover their Web service in the event of server failure, disk failure, or malicious hacking. Backups therefore need to be taken of all files that need to be restored to recover a service. I would anticipate that, in most cases, this means that projects will need to take backups of more than just master copies.
9. What type of connection and recommended speed should the web server have to the local ethernet (eg 100Mbits/sec)?
It is not possible to give a single answer to this kind of question. The issue is ensuring sufficient bandwidth, given anticipated levels of traffic. Traffic levels will depend on numbers of users and the kind of material being accessed. A project anticipating 10 concurrent users accessing text-based material will have significantly lower bandwidth requirements than a project anticipating 100 concurrent users accessing streaming video.
Access performance should be 'reasonable' for all resources served by the project, but it is difficult to provide guidance currently on what 'reasonable' means. Available bandwidth at the server end of less than 56K for any individual end-user is likely to have an impact on their perception of server performance. Image or video projects will probably want to aim much higher than this. Available bandwidth is total bandwidth divided by total numbers of users (but remember to allow for bandwidth being used for other things, e.g. some public library/council networks will have bandwidth reserved for CCTV, administrative computing and so on).
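The rule of thumb above (total bandwidth, less anything reserved for other uses, divided by the number of concurrent users) can be worked through as a small calculation. The link size and the reserved amount below are hypothetical figures, and the function name is an assumption made for this sketch.

```python
def per_user_bandwidth_kbps(total_kbps, reserved_kbps, concurrent_users):
    """Bandwidth left for each user once reserved traffic is subtracted."""
    if concurrent_users <= 0:
        raise ValueError("concurrent_users must be positive")
    usable = max(total_kbps - reserved_kbps, 0)
    return usable / concurrent_users


# Hypothetical example: a 2 Mbit/s link with 512 kbit/s reserved for other services.
for users in (10, 100):
    share = per_user_bandwidth_kbps(2048, 512, users)
    verdict = "ok" if share >= 56 else "below 56K per user"
    print(f"{users} concurrent users: {share:.1f} kbit/s each ({verdict})")
```

With these figures 10 users would each get well over 56K, whereas 100 users would each get far less, which is the situation the answer warns about.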
As part of the Technical Evaluation process run by
BECTa, the software that
each project is using is recorded.
The NOF technical standards and guidelines require that all text is encoded
in a way that makes it compatible with Unicode UTF-8. This allows for the
simultaneous use of languages that deploy different (e.g. Roman and
non-Roman) character sets, including many of the community languages being
used by NOF-digi projects. Project managers need to be aware of what
hardware / software is required in order to use Unicode. Basic information
on Unicode is available from
Windows 2000 and XP currently both support Unicode, whereas earlier versions do not. However, *applications* running on the earlier Windows operating systems can still support Unicode.
The upcoming Mac OS 10.2 is alleged to have better Unicode support than
previous versions. For information on Unicode and previous Mac operating
Some web browsers are better than others at reading community language scripts, but any browser which claims to support HTML4 should be able to support Unicode. Overall, Mozilla, the open-source browser, is the preferred choice, followed by Netscape Navigator, then Internet Explorer. See http://www.alanwood.net/unicode/browsers.html for more details on how browsers need to be configured to read Unicode.
It is necessary to obtain a Unicode font in order to display the different character sets. For a list of all the different Unicode fonts, Alan Wood's site is again a good source of information (see http://www.alanwood.net/unicode/fonts.html). Often a Unicode font comes embedded within particular applications. Many PCs have Microsoft's Arial Unicode font installed along with their copy of Microsoft Office 2000. Those without Office 2000 used to be able to download this font for free, but the font was removed from the Microsoft website in August 2002, leaving no suitable free Unicode font in existence. For a further discussion on this see http://slashdot.org/comments.pl?sid=38224&cid=4092943.
Many developers employing Unicode, however, prefer to use one of two software packages which act as multi-character set text editors. These come with their own rudimentary Unicode fonts. Unipad, available for free, can be downloaded from http://www.unipad.org/, with versions for Windows 95 and above. A trial version of Uniedit, which should run on Windows 3.1 and above, is available from http://www.humancomp.org/uniintro.htm. Both programs cater for built-in keyboards and a wide variety of character sets - although some of these sets require further downloading from the relevant websites.
1. Does anyone have any thoughts on the use of file formats such as Flash or SVG in projects? There is no mention of their use in the technical specifications so I wondered whether their suitability or otherwise had been considered.
The general advice is that where the job can be done effectively using non-proprietary solutions, and avoiding plug-ins, this should be done. If there is a compelling case for making use of proprietary formats or formats that require the user to have a plug-in then that case can be made in the business plan, provided this case does not contradict any of the MUST requirements of the nof technical guidelines document.
Flash is a proprietary solution, which is owned
by Macromedia. As with any proprietary solutions there are
dangers in adopting it as a solution: there is no guarantee that
readers will remain free in the long term, readers (and authoring
tools) may only be available on popular platforms, the future of
the format would be uncertain if the company went out of
business, was taken over, etc.
2. Will NOF allow information to be digitised using PDF as opposed to HTML?
PDF (Portable Document Format) is a proprietary file format owned by Adobe, see <http://www.adobe.com/products/acrobat/adobepdf.html>. The format preserves the fonts, formatting, colours and graphics of the source document. PDF files are compact and can be viewed and printed with the freely available Adobe Acrobat Reader.
As with any proprietary solution there are dangers in adopting it as a solution: there is no guarantee that readers will remain free in the long term, readers (and authoring tools) may only be available on popular platforms, the future of the format would be uncertain if the company went out of business, was taken over, etc.
Other limitations of PDF include difficulties in defining the structure of documents (as opposed to the appearance), providing hyperlinking, and providing universal access to viewers with old or specialist browsers.
PDF does, however, have the advantage of being easy to create and preserving the appearance of source documents.
The recommended open formats for providing documents on the Web are HTML/XHTML (to define the document structure) together with CSS/XSL (to define the appearance of the document).
If PDF is used for a NOF project, the project holder should ensure that a case is made for its use in the business case and that a migration strategy has been established which will enable a transition to open standards to be made.
A few years ago VRML (Virtual Reality Markup Language) was thought to be the emerging standard for virtual reality. However VRML has failed to gain widescale market acceptance. VRML is now evolving. Its successor X3D will make use of XML to provide the syntax for 3D worlds. The development of X3D is being coordinated by the Web3D Consortium - see http://www.web3d.org/
A range of browser plugins to render X3D worlds are available, see the Web3D Consortium web site for details.
The requirement that an alternative format must be provided if a plug-in is required is intended primarily for accessibility purposes and to ensure that an open format is available if a project makes use of a proprietary format which requires a plugin. In the case of 3D visualisation it is recognised that a textual equivalent will probably not be appropriate and, since X3D is an open standard which is currently accessible primarily through use of browser plugins, the use of these plugins is acceptable.
4. Can NOF provide advice on the use of Java in NOF-digitise project Web sites?
The Java language was developed by Sun. Although it has been submitted to several standards bodies, it has not been standardised and remains a proprietary solution. Java applets which run within the Web browser do not appear to have taken off, due to inconsistencies in the Java virtual machine used within Web browsers and resource and performance problems. In addition much of the functionality provided initially by Java applets can now be carried out using open W3C standards such as HTML, CSS and the DOM (often referred to as Dynamic HTML or DHTML).
Java can, however, provide a suitable resource at the server, as opposed to the client. Java Server Pages can provide server-side scripting facilities, and Java Beans can provide integration with other server-side services.
However, you would need to make a very good case for the NEED in your business case; a whizzy, jazzy interface does not count as a NEED.
We should like to use Java as part of the user interface but will Java continue to be supported in IE Version 6, etc.?
A capability to run Java is not included, by default, with the
latest version of Microsoft's Internet Explorer (IE6). However,
Microsoft's own set of Frequently Asked Questions about IE6
A: Yes, Internet Explorer 6 supports Java. Java applets run in Internet Explorer 6 just as they run in older versions of Internet Explorer. The Java VM is not installed as part of the typical installation, but is installed on demand when a user encounters a page that uses a Java Applet.
It should be noted, though, that the download is not small, and that this may deter users visiting your site using a typical telephone modem, as they will need to download this additional plug-in before interacting with your Java-enabled site.
We would stress that NOF projects are encouraged to avoid non-standardised solutions such as Java and that, if Java is used, it is strongly recommended that this be implemented on the SERVER, rather than expecting users' web clients to handle any Java.
5. Would it be acceptable to store some original images as PSD (i.e. PhotoShop) formatted files?
The standards guidelines advise against using a proprietary format, which this is, unless you have a very good reason to do so. Photoshop is perfectly capable of saving files in the TIFF format so you would need to justify why there is any benefit in storing archival images in PSD. That's not to say, of course, that you can't take a copy from an archival TIFF and store it in PSD to work on. Try to create a 'master file' in a format that can be re-digitised from easily.
6. Is Realvideo an acceptable format for video?
It is suggested that all projects work from a format that can be re-digitised from easily, e.g. for video DV or DV Cam. Media, particularly video, will need to be redigitised for delivery as technology advances. An equally important issue is probably the copyright, and making sure that all footage is covered by "blood chits" which hand over all the rights to the projects.
Although the use of Realvideo is not particularly recommended, we accept that in some cases the use of proprietary or non-standard formats may be the most appropriate solution. However, where proprietary standards are used, the project must explore a migration strategy that will enable a transition to open standards to be made in the future.
With regard to Real, if you do use this you should check that the stringent conditions which encoding with Real implies are suitable for both the project and the programme.
7. Some of our material contains tables. Should we be treating it/them in the same way as for line-drawings i.e. provided on the web as GIFs?
Digitised materials must be accessible - for example people with visual impairments should be able to process information through use of a speaking browser. Tables are permitted if they are comprehensible when linearised.
The use of GIFs to display significant amounts of textual information is *not* acceptable as these cannot be interpreted through speaking browsers. Restricting tabulated information to an HTML format should not limit formatting possibilities given the use of cascading style sheets and that tables created in Excel, for example, can be saved as HTML.
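What "comprehensible when linearised" means can be shown with a short sketch that reads an HTML table cell by cell in source order, roughly as a speaking browser might. The sample table and the `linearise` helper are hypothetical and written only for this illustration; real assistive software behaves in more sophisticated ways.

```python
from html.parser import HTMLParser


class TableLineariser(HTMLParser):
    """Collect table cells in source order, one list of cells per row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so each cell reads as plain text.
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None


def linearise(html):
    """Return one line of text per table row, cells separated by '; '."""
    parser = TableLineariser()
    parser.feed(html)
    parser.close()
    return ["; ".join(row) for row in parser.rows]


SAMPLE = """
<table>
  <tr><th>Year</th><th>Handbills</th></tr>
  <tr><td>1850</td><td>12</td></tr>
  <tr><td>1851</td><td>30</td></tr>
</table>
"""

for line in linearise(SAMPLE):
    print(line)
```

If the printed lines still make sense when read one after another, the table is likely to be usable in linearised form. The same information held in a GIF would give a speaking browser nothing to read.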
8. Can you advise on delivering video on mobile devices?
For delivery of moving video to mobile devices, it is likely that UMTS will be available by 2002, with a bandwidth approaching 2Mbps. See http://www.umts-forum.org/
In terms of delivering different versions of sites to multiple platforms, the use of XSLT to transform XML out of a database in order to display (X)HTML of different forms might be worth exploring. For further information see http://www.w3.org/TR/xslt
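The idea of keeping one XML record and producing different (X)HTML forms from it can be sketched as follows. Note that this sketch does not use XSLT itself: it uses Python's standard ElementTree module to show the same single-source, multiple-output principle. The record structure and the "mobile" and "desktop" profile names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record as it might be exported from a project database.
RECORD = (
    "<item>"
    "<title>Handbill, 1850</title>"
    "<description>Printed notice of a public meeting.</description>"
    "<image>handbill-1850.jpg</image>"
    "</item>"
)


def render(record_xml, profile):
    """Build an (X)HTML fragment suited to the named delivery profile."""
    item = ET.fromstring(record_xml)
    div = ET.Element("div")
    ET.SubElement(div, "h1").text = item.findtext("title")
    ET.SubElement(div, "p").text = item.findtext("description")
    if profile == "desktop":
        # Only the desktop form carries the image; limited devices get text only.
        ET.SubElement(div, "img", src=item.findtext("image"), alt=item.findtext("title"))
    return ET.tostring(div, encoding="unicode")


print(render(RECORD, "mobile"))
print(render(RECORD, "desktop"))
```

An XSLT stylesheet per device type would play the role that the `profile` argument plays here, with the advantage that the transformation rules live outside the program code.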
9. Do you have any advice on standards of digital audio?
Standards for storage and playout may differ. Commonly an archive/library would wish to store (preserve) the highest quality possible - meaning uncompressed - but would deliver using a standard and datarate appropriate to user requirements.
Electronic delivery could then involve compression.
Software which delivers at multiple datarates, according to an internet user's connection, is now available from Real and Quicktime, amongst others, but the 'master copy' should ordinarily be the 'best available', which would usually mean uncompressed, linear PCM with a sampling rate and quantisation appropriate to the bandwidth and dynamic range of the material. This form of audio is typically held in .WAV files (though there are over 100 registered forms of coded audio that are possible within WAV, including highly compressed).
Within European broadcasting, 16-bit quantisation and 48 kHz
sampling are the EBU (European Broadcasting Union) recommendation
for broadcast quality audio. The EBU has gone a step further
and added metadata to WAV, to add information critical to
broadcasting and broadcast archives, forming the "Broadcast Wave
Format" standard: BWF.
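As a rough illustration of what uncompressed linear PCM at 16-bit quantisation and 48 kHz sampling means in practice, the sketch below writes a one-second test tone to a WAV container using Python's standard `wave` module and works out the storage rate. The mono channel count, the 440 Hz tone and the function names are assumptions for the example, and the result is a plain WAV file without the extra BWF metadata.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 48000  # Hz, the sampling rate quoted above
SAMPLE_WIDTH = 2     # bytes per sample, i.e. 16-bit quantisation
CHANNELS = 1         # assumption: mono


def data_rate_bytes_per_second(sample_rate, sample_width, channels):
    """Storage needed for one second of uncompressed linear PCM."""
    return sample_rate * sample_width * channels


def write_test_tone(stream, seconds=1.0, frequency=440.0):
    """Write a sine tone as 16-bit little-endian PCM and return the frame count."""
    frame_count = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * frequency * n / SAMPLE_RATE)))
        for n in range(frame_count)
    )
    with wave.open(stream, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return frame_count


buffer = io.BytesIO()
frames_written = write_test_tone(buffer)
rate = data_rate_bytes_per_second(SAMPLE_RATE, SAMPLE_WIDTH, CHANNELS)
print(f"{rate} bytes per second, about {rate * 3600 / 1_000_000:.0f} MB per hour (mono)")
```

Figures like these help when estimating the storage a master archive will need before any compressed delivery copies are made.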
The actual transfer of analogue material to digital format, especially in bulk or for unique items, is not simple. For European Radio Archives, standardisation and guidance is being developed within EBU Panel 'Future Radio Archives', http://www.ebu.ch/pmc_fra.html
In their public service role, the BBC would be pleased to offer advice to libraries / archives requiring help - providing it is for non-commercial purposes.
10. We will be creating digital images of objects containing text, such as handbills and tax returns. Do we have to re-key the text in these images as HTML? We are concerned as there are significant cost implications associated with this, especially as we don't think OCR software may be suitable for the objects that we want to digitise.
The use that you intend to make of your digitised objects that contain text is the deciding factor in whether or not you re-key the text into a machine-readable format.
If you are just digitising one or two such objects as examples of what they look like (i.e. you are interested in the visual appearance of the objects rather than the content of the text) it may not be necessary to produce the text as HTML as well.
However, if the idea is that the original text can be searched etc. then it *will* need to be produced in a machine-readable format. In any case, you will need to create metadata to describe the image.
We do recognise that if text does need to be re-keyed, significant technical effort will be required, particularly if OCR software cannot be used.
So re-keying/using OCR software to convert to a machine-readable format would be the preferred solution. It would be for the project to make a case to NOF if this is not done, on grounds of cost, for example.
11. We are digitising audio and video for streaming over the net. There
are a number of differing proprietary architectures available, i.e.
Realmedia, QuickTime, Windows media.
MPEG is a set of international standards for audio and video compression. MPEG-4 is the newest MPEG standard, and is designed for delivery of interactive multimedia across networks. As such, it is more than a single codec, and includes specifications for audio, video and interactivity. Windows Media encodes and decodes MPEG4. Quicktime and Realplayer are working on versions which will do the same.
MPEG produces high quality video which can be streamed over networks. Quicktime and Realmedia use the MPEG standards to improve the quality of their files for delivery on the web.
For detailed information on codecs it is recommended that you
look at the following site
In answer to your questions: can information be stored as an MPEG file? Which encoding software would allow this and ... how would it be played?
It is possible to store audio or video in an MPEG format, and to play an MPEG file. This would be NOF's preferred solution, as proper MPEG files are open, non-proprietary, and should be readable by most audio and video player programs and plug-ins. Many/most current web browsers have the capability to play MPEG-1 video without any extra plug-ins.
RealPlayer, Windows Media Player et al support a variety of audio and video formats, including MPEG, and a range of proprietary formats such as AVI.
See, amongst many possibilities,
More info on software is available from
12. Is XHTML the only future-proof way of presenting multi-page document? Can PDF be used?
As the W3C's XHTML webpage says (see http://www.w3.org/TR/2001/WD-xhtml1-20011004/) "This specification defines the Second Edition of XHTML 1.0, a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4". If you're familiar with HTML 4.0, then the benefit that XHTML brings is to apply the rigor of XML to a markup language with which many people and applications are already familiar.
Whilst there is no guarantee that anything is future-proof, the huge amount of information already available in (X)HTML, and the likelihood that future generations of web browsers will be able to read (X)HTML data, makes it highly likely that (X)HTML data will remain usable for many years to come -- and the added rigor of XHTML improves the situation greatly.
However, XHTML (like HTML) has no notion of "page", and it is left to the document author to decide how information should be broken down into usable chunks. For example, I could convert the text of a standard dictionary into XHTML, and then choose to present the data as a single (very, very large!) XHTML page with alphabetical subsections, or a series of 26 (very large) XHTML pages with head-word subsections, or many many thousands of XHTML pages each containing a separate head-word entry, or simply turn the text of each printed page into a separate XHTML page and organize the whole thing by naming pages after page numbers. How I choose to organize the information might depend on a number of factors, for example whether I was attempting to in some way model the existing print publication in electronic form (e.g. by structuring it on the basis of printed pages), model its organizational structure (e.g. by breaking the text into 26 subsections, or many entries), or make the electronic version easy to use (e.g. by presenting each entry as a separate XHTML page).
In other words, XHTML will allow you to model your information pretty much however you see fit: if capturing "pages" is important to you then that is possible -- but if usability is an overriding concern, you may decide to present the electronic information organized in some other way.
By comparison, PDF can provide you with an effective mechanism for capturing pages (or page images), and producing an electronic book which can be used and navigated in a very similar fashion to a conventional print publication. If visual fidelity to a non-electronic source is important to you, then this may be an important consideration. However, a similar effect could be achieved by creating XHTML pages containing transcriptions of each page from the printed source, and also links to scanned images of each page.
The concern about PDF stems from how much faith one is prepared to put in the future availability of software that can read PDF files, and whether or not the (proprietary) owners of PDF will continue to make future readers and/or versions of PDF backwardly compatible. That said, there is a vast amount of (mainly commercial) information already available in PDF, and several large public bodies are willing to consider PDF as a long-term preservation format (although not NOF).
Unless you already have a vast amount of PDF data (which you are unwilling to consider converting to XHTML), then we would strongly recommend that you consider using XHTML to prepare your data -- assuming you are able to resolve the issue of pagination.
13. Can we store our digital masters as JPEGs instead of TIFFs? We have compared the quality of both at enormous magnification and discovered no difference at all. The NOF standards prefer TIFF but say that one could use JPEGs if necessary. We have used them with our lottery funded digitisation project very successfully. It would make our work a lot easier...
We would need more information on what sort of images are being captured before giving a definite answer, as the lossy compression used in creating JPEGs has more of an impact on some sorts of imagery/colour variation than on others.
However, whilst a compressed JPEG may be more useful for everyday use and delivery, the quality of an uncompressed TIFF master image is your best bet both for long-term viability of the image, and in allowing you maximum flexibility in the future as to what you do to/with the images.
We would still therefore suggest that TIFF is the format which projects should aim to use, and any deviation from this should be supported by a stronger argument than 'it would be easier'...
Have a look at the following resources...
Synchronised Multimedia Integration Language
Synchronized Multimedia On The Web
Scalable Vector Graphics (SVG)
Scalable Vector Graphics (SVG): Vector Graphics for the Web
Scalable Vector Graphics (SVG)
The reason that TIFF is recommended over JPEG is that JPEG is an inherently lossy compression technique. This means that whenever an image file is converted to JPEG format, some detail is lost. However, as you have noticed, the changes that occur are very subtle at high "quality" settings of JPEG compression. You say that you cannot "see" any difference; can I suggest that you try the following:
Open your two specimen files, JPEG and TIFF, side by side.
Blow up to maximum zoom the same area on both images. Select a portion of the image with a good range of colours: an edge of the object, for example.
Select the Eyedropper tool (keystroke "I" for shortcut) and make sure the "Color" floating tool bar is open.
Right click on any part of either image and select "Point Sample" (or adjust this setting in Eyedropper options on the "Options" floating tool bar).
Now left click on a pixel in the TIFF image. In the "Color" tool bar you should see the colour value of the pixel you have selected. Note this value.
Left click on the same pixel in the JPEG image. Note the displayed colour value.
You should observe that there is generally a slight difference in the colour value at any specific point in the image. Indeed, it is very difficult to "see" this difference with the eye, but I hope that this numerical demonstration will prove to you that the two images are not identical. The JPEG compression routine does not store the discrete value of each pixel in the image; it stores a mathematical function that is used to re-generate the colour values, and this process will result in approximate values for many of the pixels in the image.
Note also that TIFF files can be stored with LZW compression enabled, reducing the size of the file dramatically. LZW compression does not result in any change to the values of any pixels in the image, so is suitable for archiving and preservation purposes.
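The difference between lossless and lossy storage can be illustrated with a minimal Python sketch. It is not a JPEG or LZW implementation: zlib (DEFLATE) stands in for a lossless scheme such as LZW, and a crude quantisation step stands in for lossy compression. The pixel values and the step size are invented for illustration.

```python
import zlib

# A made-up row of 8-bit greyscale pixel values (illustrative only).
pixels = bytes([12, 13, 13, 14, 200, 201, 203, 202, 90, 91, 91, 92])


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib; every byte comes back unchanged."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(data: bytes, step: int = 8) -> bytes:
    """Crude stand-in for lossy compression: round each value down to a multiple of `step`."""
    return bytes((value // step) * step for value in data)


restored_lossless = lossless_round_trip(pixels)
restored_lossy = lossy_round_trip(pixels)

# Count the pixels whose value changed in each round trip.
changed_lossless = sum(a != b for a, b in zip(pixels, restored_lossless))
changed_lossy = sum(a != b for a, b in zip(pixels, restored_lossy))

print("pixels changed (lossless):", changed_lossless)
print("pixels changed (lossy):", changed_lossy)
```

The lossless round trip returns exactly the values that went in, which is the property that matters for a master file. The lossy round trip returns values that are close to, but not the same as, the originals.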
RealAudio is currently recommended in the technical standards as a format that can be used to create and store sound. However, it has recently come to light that creating and storing sound as RealAudio could create major problems (delivery is still OK). RealAudio is an encrypted closed format which no other software package can import. Real have gone as far as suing the manufacturer of a software package that did allow this. The Managing Agent and Advisory Services (MAAS), a new national service acquiring moving pictures and sound for delivery online to the higher and further education communities in the UK, are not recommending RealAudio. This is because Real also appear to retain IPR in the encodings.
For more information see the MAAS Site.
This FAQ addresses two areas where many projects are failing to achieve compliance with NOF technical standards. Please take heed of the following comments to ensure that your project is fulfilling these requirements.
1) CHARACTER ENCODING:
We would like to remind all projects that it is important that the character encoding used in a text document delivered over the web is clearly identified. Best practice is to identify the character encoding in use BOTH in the HTTP header and within the <head> section of the document.
Where XML/XHTML is used, it is also good practice to identify the encoding in the "<?xml ..." declaration at the start of the document. However, this may cause problems with some client web browsers and can safely be omitted.
Whilst the Technical Standard and Guidelines document currently mandates the use of UTF-8 encoding (section 2.1.2), this is under revision and it is permissible to use any appropriate encoding (e.g. iso-8859-1), as long as this is explicitly stated as discussed above. The TS&G will be updated at the end of this year to reflect this change, and at that time section B of the quarterly monitoring reports will also be modified. In the meantime, please indicate the actual encoding your project is using when filling out your quarterly progress report.
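As a rough illustration of the advice above, the following Python sketch checks that the encoding named in the HTTP Content-Type header matches the one named in the page's meta element. The function names and the sample page are invented for this example, and the regular expression is a simplification rather than a full HTML parser.

```python
import re
from email.message import Message


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()


# Simplified pattern for <meta http-equiv="Content-Type" content="...; charset=...">.
META_CHARSET = re.compile(
    r"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9._-]+)""", re.IGNORECASE
)


def charset_from_meta(html):
    """Return the charset named in the first matching meta element, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None


def encodings_agree(content_type, html):
    """True only if both places name an encoding and the names match."""
    header = charset_from_header(content_type)
    meta = charset_from_meta(html)
    return header is not None and header == meta


# Hypothetical page, for illustration only.
page = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Sample page</title>
</head>
<body></body>
</html>"""

print(encodings_agree("text/html; charset=ISO-8859-1", page))
print(encodings_agree("text/html; charset=UTF-8", page))
```

The comparison is case-insensitive because encoding names are conventionally treated that way. A mismatch, or a missing declaration in either place, is reported as a failure.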
2) DOCTYPE DECLARATION:
We would also like to point out that it is necessary to include a DOCTYPE declaration at the top of all HTML or XHTML pages. This is necessary to allow your mark-up to be correctly validated, and perhaps more importantly can affect the way the user agent (browser) interprets the tags it finds within the page. Therefore, a DOCTYPE statement correctly identifying the version of (X)HTML that you are using on the page will improve the chances of your page rendering correctly across different browsers and clients. Please ensure that you include the DOCTYPE declaration on all your pages.
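A simple automated check can catch pages that have been published without a DOCTYPE. The Python sketch below only tests that a declaration is present at the top of the page, optionally after an XML declaration; it does not validate the mark-up, for which a proper validator is still needed. The function name and sample strings are assumptions made for illustration.

```python
import re

# An optional XML declaration may precede the DOCTYPE in XHTML documents.
XML_DECLARATION = re.compile(r"\s*<\?xml[^>]*\?>")
DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\b[^>]*>", re.IGNORECASE)


def has_doctype(page):
    """Return True if the page begins with an (X)HTML DOCTYPE declaration."""
    declaration = XML_DECLARATION.match(page)
    start = declaration.end() if declaration else 0
    return DOCTYPE.match(page, start) is not None


xhtml_page = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    "<body></body></html>"
)
bare_page = "<html><head><title>t</title></head><body></body></html>"

print(has_doctype(xhtml_page))
print(has_doctype(bare_page))
```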
18. Why do I have to think about a preservation
strategy if I am satisfied the file
The short answer is that even the short history of information technology
is already littered with
It is important to recall the requirement that all projects should be
aware of the need for
It is essential to remember that the issue of preservation is not confined
to arguments about
Even within the digitisation process there is a need for a technical strategy
from the outset
It is also important to accept that even once a preservation strategy has been
developed, the shifting nature of technological development is such that any
strategy has to be revisited if it
nof-digitise Technical Advisory Service Programme Manual: Section 2: Digital
nof-digitise Technical Advisory Service The Digitisation Process
The Cedars Project
Creating A Viable Data Resource
Emulation as Preservation Strategy
Migration - a CAMiLEON discussion paper
TASI : Establishing a Digital Preservation Strategy
1. Is there a glossary or simplified version of the various metadata standards?
UKOLN has produced a metadata glossary at http://www.ukoln.ac.uk/metadata/glossary/
2. As we are developing sites for lifelong learners, do you have any views on whether we should use metadata appropriate for learning packages, e.g. the IMS Learning Resource Metadata Model or LOM (Learning Object Metadata)?
Although the IMS Learning Resource Metadata Model or IEEE Learning Object Metadata (LOM) would be relevant, both these place a significant overhead on the metadata creator; a LOM record could take an hour or more to complete in extreme cases, for example.
We feel that LOM/IMS is too big an overhead for what these projects are meant to be doing (although a LOM/IMS description of each project might be worth considering).
An alternative might be to use Dublin Core with the extensions
proposed by the Education Working Group (DCEd) of the DCMI. They
have proposed an "Audience" element, and suggest adopting
"InteractivityType", "InteractivityLevel", and
"TypicalLearningTime" elements from the IEEE LOM standard. More
information is available at:
More general information on educational metadata is available
in the two recent SCHEMAS Metadata Watch reports:
Also see the UK's Metadata for Education Group at http://www.ukoln.ac.uk/metadata/education/
3. Are there recommended standards for the core and extended metadata attributes that should be created for digitised resources, especially images? Dublin Core provides one simple model but is very general, other possible approaches would presumably include MARC and CIMI, but some shared approach to this is presumably seen as valuable.
There are, in fact, quite a few relevant standards. For resource discovery, the nof-digitise guidelines (5.2.1) suggest that "item-level descriptions should be based on the Dublin Core and should be in line with developing e-government and UfI metadata standards." In a Dublin Core context, the specifics of using DCMES for images were discussed at DC-3 - the Image Metadata Workshop held in Dublin, Ohio in September 1996. This workshop resulted in the addition of two new elements to the original thirteen and made some changes to element descriptions.
There is some useful information on DC and other image metadata formats in section 4 of the VADS/TASI guide to creating digital resources in the AHDS Guides to Good Practice series:
Catherine Grout, et al., Creating digital resources for the
visual arts: standards and good practice. Oxford: Oxbow Books,
This mentions things like the CIMI DTD, MARC, the CIDOC standards, etc. as well as more specialised things like the Visual Resources Association (VRA) Core Record.
There is information on more specialised administrative and structural metadata in the Making of America II project's final report:
Bernard J. Hurley, John Price-Wilkin, Merrilee Proffitt and
Howard Besser, The Making of America II Testbed Project: a
digital library service model. Washington, D.C.: Council on
Library and Information Resources, 1999.
A shorter list of elements with a primary focus on preservation is available at:
RLG Working Group on Preservation Issues of Metadata, Final
report. Mountain View, Calif.: Research Libraries Group,
There may be some useful background information in the
following conference paper: Michael Day, Metadata for images:
emerging practice and standards (1999).
Also, see The Application of Metadata Standards to Video
Indexing Jane Hunter, CITEC (email@example.com) and Renato
Iannella, DSTC Pty Ltd.
This paper first outlines a multi-level video indexing approach based on Dublin Core extensions and the Resource Description Framework (RDF). The advantages and disadvantages of this approach are discussed in the context of the requirements of the proposed MPEG-7 ("Multimedia Content Description Interface") standard. The related work on SMIL (Synchronized Multimedia Integration Language) by the W3C SYMM working group is then described. Suggestions for how this work can be applied to video metadata are made. Finally a hybrid approach is proposed based on the combined use of Dublin Core and the currently undefined MPEG-7 standard within the RDF which will provide a solution to the problem of satisfying widely differing user requirements.
CIMI Guide to Good Practice (DC)
4. Can you advise on approaches to/chosen standards for metadata for sound files. Are there any recently developed models of good practice?
The following resources may be useful
An evaluation of resource discovery metadata for sound
resources by the Performing Arts Data Service:
Outcomes of the Harmonica project might also be helpful:
As a more practical example, Jon Maslin (J.Maslin@surrey.ac.uk) describes the approach taken to creating metadata for music recordings, scores and video in the performing arts at the University of Surrey:
We have adopted the Dublin Core as a basis for our metadata because we needed a clearly defined structure and wanted, if possible, to adopt a standard. It was adopted while it was still unclear in some respects, but we knew what we had to achieve so we selected only the relevant elements, expanded some and extended DC with new elements needed for the application.
So, while it was convenient to use it we had to extend it, but did not use all the elements.
We are using the same schema for music recordings, scores and video in the performing arts.
A copy of the schema (DTD) and some samples are attached.
Dublin core elements are preceded by DC
The title conforms to a standard agreed with the music department.
Creators and contributors: The roles of these are defined with their names. There can be an unlimited number. We have not adopted a dictionary of defined roles as the performing arts has a potentially unlimited number, but have taken the view that different applications will act upon the metadata and that retrieval software will be sufficiently intelligent to take care of interpreting different roles (hence an informal convention of adopting the terminology on the source and defining the instrument rather than the role, largely to avoid contortions such as guitarist). It is debatable, as is the difference between creator and contributor in some instances. We have tended to class producers and recording engineers as contributors. One of the benefits of defining a role is that the importance may not be terribly significant.
A similar approach has been adopted for other elements, such as the place and time of recording. We have limited this to a few attributes for our own convenience. There is no reason why this should not be expanded in the way that the creators element is used.
The location element is structured as a URL. In the example you will see it pointing to the patronserver. It can be to any other web server or a direct file access.
In addition a number of patron elements have been added which relate to courses. Another element has been added to define uniquely a title, eg all scores and recordings of a piece have an id.
The most extensive addition has been to define the contents of a piece in a standard way regardless of type or medium. In effect this gives a multi-level table of contents. It has been designed to provide an objective series of access points which can be created without extensive subject knowledge. Typically a classical piece of music will list the movements with references to starting and stopping times. Scores have access points to movements and page numbers and repeats if required. There is no limit to the granularity (beyond time and patience).
Processing and exchange
It is important to remember that this is entirely independent of the application. The advantage of the XML implementation is that variations in application are relatively simple - in Patron the application displays these in cascading hierarchies. One of the objectives has been to include sufficient data and structure to allow the metadata to be exchanged and processed for the current implementation of Patron and possible enhancement, and also to be developed with more universal standards.
We have created the metadata from a MS Access database which also holds rights information. We have also developed a form builder which automatically creates an input form from the metadata schema. This enables metadata to be created and tested rapidly, and allows inputters to adopt previously entered data to reduce time and to ensure accuracy.
So, the answer is yes and it works, but that it has been application-driven: other applications would need to add to it.
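The schema and samples mentioned in the description above are not reproduced here. Purely as an illustration of the ideas described (contributors recorded with a free-text role, and a multi-level table of contents with start and stop points), a data structure along the following lines could be used. Every class name, field name and value in this Python sketch is invented and should not be read as the Surrey schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contributor:
    name: str
    role: str  # free text as given on the source, e.g. "guitar"


@dataclass
class ContentEntry:
    label: str
    start: Optional[str] = None  # a time for recordings, a page number for scores
    stop: Optional[str] = None
    children: List["ContentEntry"] = field(default_factory=list)


@dataclass
class Record:
    title: str
    location: str  # a URL pointing at the resource
    contributors: List[Contributor]
    contents: List[ContentEntry]


def access_points(entries, depth=0):
    """Yield (depth, label, start, stop) for every entry, parents before children."""
    for entry in entries:
        yield depth, entry.label, entry.start, entry.stop
        yield from access_points(entry.children, depth + 1)


# Fictional recording used only to exercise the structure.
record = Record(
    title="Example Quartet (fictional)",
    location="http://example.org/audio/example-quartet",
    contributors=[
        Contributor("A. Player", "violin"),
        Contributor("B. Engineer", "recording engineer"),
    ],
    contents=[
        ContentEntry("First movement", "00:00", "07:42", [
            ContentEntry("Exposition", "00:00", "02:10"),
            ContentEntry("Development", "02:10", "05:05"),
        ]),
        ContentEntry("Second movement", "07:42", "13:30"),
    ],
)

for depth, label, start, stop in access_points(record.contents):
    print("  " * depth + f"{label} ({start} to {stop})")
```

Because entries nest to any depth, the granularity of the table of contents is limited only by how much detail the cataloguer chooses to enter.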
5. Are there any software tools for handling various metadata formats?
Tools that may be of some assistance in describing Web sites
may be found at:
Metadata should be capable of supporting the delivery of item-level DC descriptions of all project resources.
7. Is simple Dublin Core metadata sufficient or are qualifiers needed? If they are, which ones should be used and how will interoperability between different domains be handled?
The 15 Dublin Core metadata elements form a fairly basic cross-domain core that ensures a degree of commonality across domains and applications. To express richer or more structured information less ambiguously than is possible in the 15 elements, the Dublin Core community supports the notion of qualification, using element refinements and encoding schemes.
An initial set of these is defined by the Dublin Core community, in the Dublin Core Qualifiers, and these are a good place to start. Where the agreed qualifiers do not meet your needs, it is possible to define others, either within your project or as part of a broader domain-based interest group.
In defining new qualifiers it is important to ensure
As an illustration, the DC-Government Working Group recently proposed 'previousAccessMarkingChangeDate' as a refinement of DC.Rights. This was rejected because the definition of DC.Rights is:
'Information about rights held in and over the resource.'
A value of the proposed 'previousAccessMarkingChangeDate' element refinement would have been a simple date, which, on its own, does not constitute 'information about rights held in and over the resource'.
8. We are planning to digitise and make accessible through a database 20,000 photographs. We are collecting enough detail at item level to create Dublin Core. Please can you give some examples of dynamic, database-driven sites which use this?
The AHDS gateway http://www.ahds.ac.uk visibly
displays DC metadata.
9. We are cataloguing video clips and each item has approximately 20 metadata fields that need to be incorporated in the site, offering advanced search options. How would I incorporate a metadata structure that conforms to e-Government standards? What steps do I need to take to achieve this?
The Dublin Core (DC) metadata scheme is based on a set of 15 core elements that are generic enough to define individual digital objects, however and wherever they have been created. Elements included in the list include 'title', 'creator', 'date' etc. A full list of these elements is available from http://dublincore.org/documents/dces/.
In many cases, however, these 15 elements are not sufficient to define accurately the objects in question. The elements are then extended or qualified to define further the resource. For one type of digital resource, an HTML page, one often sees the date element extended to include fields called 'date.created' and 'date.lastmodified', i.e. the metadata includes two dates, one informing when the page was first created and a second informing when it was last updated. For a video collection the rights element may well need to be extended so as to record the various copyright issues involved. Sometimes DC elements can be qualified according to examples set by others trying to define similar digital objects; in other cases, projects need to develop their own qualifying terms.
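To make the idea of qualified elements concrete, the Python sketch below renders a small record, including the two qualified dates mentioned above, as HTML meta elements. The function name, the record values and the exact spelling of the qualified names are assumptions for illustration; projects should follow whatever naming convention their chosen application profile specifies.

```python
from html import escape


def dc_meta_tags(record):
    """Render (element, value) pairs as HTML meta elements, one per line.

    Pairs rather than a dictionary are used so that an element can be repeated.
    """
    lines = []
    for element, value in record:
        lines.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}">')
    return "\n".join(lines)


# Fictional record for illustration only.
record = [
    ("title", 'Interview clip: "Market Day"'),
    ("date.created", "2002-03-01"),
    ("date.lastmodified", "2003-11-17"),
]

tags = dc_meta_tags(record)
print(tags)
```

Values are escaped before being written into the attribute, so that quotation marks in a title do not break the mark-up.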
For the criteria mentioned in the query, it would probably be best to have multiple qualifications of the creator and contributor elements to record details of interviewers, interviewees, gender etc. "Which tape" and "absolute address" could probably be slotted under the 'title', 'identifier' or 'source' elements.
It's important to note that there is no perfect metadata scheme for any one collection. How you qualify your DC metadata can depend on how your resources are being digitised or what soft- and hardware you are using. Perhaps most importantly, any metadata scheme depends on who will be searching for your resources. A metadata scheme has to be set up to allow users to find the information they need, so, in an ideal world, the creation of a metadata scheme will follow a period of research on user needs. Users must be thought of in the broadest terms, including not only the general public, for example, but also future custodians of the collection. While members of the general public may want metadata fields which permit them to do advanced searches, future custodians may need to find detailed information on the copyright holders of the videos in question. This could be recorded in the 'rights' element.
There is a Dublin Core user group especially devoted to metadata issues surrounding moving images, although it is not particularly well developed at the moment. The user group is housed at http://dublincore.org/groups/moving-pictures/. One case study (at http://ahds.ac.uk/shakespeare.htm) gives an indication of how one digital project recording theatrical performances went about creating its metadata.
Dublin Core is recommended by the NOF-digi technical standards because its common takeup should allow digital collections around the country to be interoperable with one another, i.e. to allow users to search through more than one collection at the same time.
We would also point you to a (rather technical) paper looking
at video metadata representation (mainly MPEG-7) at:
In addition, as this metadata seems to describe individuals there may also be important data protection problems that need to be solved.
The Identifier element has to be an unambiguous reference because it defines the actual item/resource being described. When describing the kind of resources you will be creating within your NOF project the Identifier element will most likely need to include the Project's Image Number and any reference numbers used by the host institutions (e.g. accession numbers). It could be a URI (Uniform Resource Identifier) but should not be just the URL of the resource, though the URL could be included.
A few examples of the type of thing you should be putting in the Identifier element are listed below (these are taken from DC Assist)
<meta name="DC.Identifier" scheme="URI" content="http://foo.bar.org/zaf/">
Just to clarify, the temporal coverage is a "date range" (1939/1945).
You could look at some examples from dc-assist - http://www.ukoln.ac.uk/metadata/dcassist/
For spatial coverage, values might be:
For temporal coverage, values might be:
Question - How does one use the Period encoding scheme for the element Coverage, Time? Can I simply list the Period in a field called Coverage, Period? I found the explanation on the DCMI site difficult to understand.
Again, how you manage it in your database is up to you, but it probably makes sense to have separate fields for the start date, end date and name of the Period (I'd suggest you probably don't need to store the name of the date scheme in your database as that should be constant). You might need to make the group repeatable if you envisage multiple ranges for temporal coverage, but that does seem quite complex.
When you expose/export your metadata, the start date, end date, scheme
and name of a range all form part of the value of an occurrence of the
temporal coverage property. N.B. this is still a temporal coverage
property: "DCMI Period" is the name of an encoding scheme. You might
want to check the distinction DC makes between "element refinements"
(like "spatial" and "temporal") and "encoding schemes" (like
DCMI-Period, or a subject scheme). See the start of:
Anyway....In the database, you might have a record with fields like:
But when you expose/export the DC metadata record the value of the
temporal coverage property would be encoded as
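The step from separate database fields to a single exported value can be sketched in Python as follows. The field names in the sample row are hypothetical, and the output follows the "name=...; start=...; end=...; scheme=...;" pattern of the DCMI Period encoding scheme; the DCMI specification should be checked before relying on the exact syntax.

```python
def encode_dcmi_period(start, end, name=None, scheme="W3C-DTF"):
    """Combine period components into a single DCMI Period style value."""
    parts = []
    if name:
        parts.append(f"name={name}")
    parts.append(f"start={start}")
    parts.append(f"end={end}")
    parts.append(f"scheme={scheme}")
    return "; ".join(parts) + ";"


# Hypothetical database row: the date scheme is constant, so it is not stored.
row = {
    "period_name": "Second World War",
    "period_start": "1939",
    "period_end": "1945",
}

value = encode_dcmi_period(
    row["period_start"], row["period_end"], name=row["period_name"]
)
print(value)
```

Keeping the start date, end date and name in separate fields makes them easy to search and sort internally, while the combined string is produced only at export time.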
The RLG Working Group suggests using 16 elements to capture crucial information about a digital file. Their elements are fairly 'lightweight' and would probably be OK for a digitisation project, assuming that some descriptive metadata (e.g. DCMES) is also available. The report is a bit old now, and it might be worth looking at METS http://www.ukoln.ac.uk/metadata/resources/mets/ or the more detailed set of elements which can be found in the draft NISO Technical Metadata for Digital Still Images standard. This can be found (in PDF) at http://www.niso.org/committees/committee_au.html and is also mentioned in the NOF guidelines.
Other guidance would be available in:
You could also have a look at the OCLC/RLG Preservation Metadata Working Group which has published an overview (chiefly of the OAIS model, and the specifications developed by Cedars, NEDLIB and NLA) and recommendations for 'Content Information' and a forthcoming one on 'Preservation Description Information' (these are OAIS terms): http://www.oclc.org/research/pmwg/
Or there is OCLC's own preservation metadata set at http://www.oclc.org/digitalpreservation/archiving/metadataset.pdf
The "Gathering the Jewels" NOF digitisation project in Wales has settled on what metadata elements and digitisation guidelines it is going to adopt. In the interests of sharing this information as widely as possible, they have put it up on their Web site - please see http://www.gtj.org.uk/technical_logo.html and scroll down the page.
It can be useful to embed metadata into the HTML meta elements on a Web page; however, when doing so, keep the points below in mind.
(a) it depends on a service provider (i) finding the document (search engines still have issues when harvesting dynamically created pages, for more information see Search Engine Watch and the NOF dissemination section of the programme manual) and (ii) extracting and using the metadata; and
(b) HTML meta elements are not the only way of exposing metadata. Further information on OAI and other ways of making your metadata available will follow.
The dc:relation element is used to encode a reference to a resource which is related to the resource being described. The value of the dc:relation element should be an identifier for the related resource.
In any DC metadata record, there may be multiple occurrences of the dc:relation element, expressing relationships between the current resource and a number of other resources.
In simple/unqualified Dublin Core, dc:relation allows you to express the fact that a relationship exists between the current resource and a related resource, but it does not permit you to say anything more about the nature of that relationship between the two resources.
Qualified Dublin Core introduces a number of element refinements to dc:relation, which allow you to express the nature of the relationship between the current resource and the related resource.
In both simple/unqualified DC and in qualified DC, the value of the dc:relation element (or the value of any of its element refinements) should be an identifier for the related resource.
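The difference between simple and qualified use of dc:relation can be shown with a short Python sketch that builds two XML records. The "record" wrapper element and the example.org identifiers are assumptions for illustration; the dc and dcterms namespace URIs are the standard Dublin Core ones, and isPartOf is one of the refinements of relation defined by DCMI.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def simple_record(identifier, related):
    """Unqualified DC: each related resource is a plain dc:relation."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC}}}identifier").text = identifier
    for target in related:
        ET.SubElement(record, f"{{{DC}}}relation").text = target
    return record


def qualified_record(identifier, related):
    """Qualified DC: `related` holds (refinement, target) pairs."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC}}}identifier").text = identifier
    for refinement, target in related:
        ET.SubElement(record, f"{{{DCTERMS}}}{refinement}").text = target
    return record


item = "http://example.org/item/42"
collection = "http://example.org/collection/posters"

simple = simple_record(item, [collection])
qualified = qualified_record(item, [("isPartOf", collection)])

print(ET.tostring(simple, encoding="unicode"))
print(ET.tostring(qualified, encoding="unicode"))
```

In both records the value is an identifier for the related resource. Only the qualified record says what the relationship is.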
Hopefully you are storing this metadata somehow in your database for your own use. If so, it should be fairly straightforward to define the language of your pamphlet using Dublin Core. Although DC doesn't allow for a second language as such, you can have multiple occurrences of one element (in fact all of the 15 elements allow unlimited occurrences). So, for example, in an HTML page this would appear as below (depending on which encoding scheme you choose to use, ISO 639 or RFC 1766, and how many characters).
<meta name="DC.Language" scheme="ISO639-2" content="eng">
No priority is given to either of the occurrences of the element.
All projects are required to submit samples of their item-level metadata and indicate which fields are being used for Dublin Core metadata.
This is a fictional sample taken from the digitisation project of Sandfordshire Council. It is of a digitised image of an etching done by the artist John Shade. It gives an indication of the format that should be used when forwarding metadata samples to case managers. Many of the fields here are loosely based on the JIDI Metadata Guidelines.
The example shows what categories the project is using for its metadata, the actual descriptions used for one item and how the fields relate to the core element set of Dublin Core. Note that not every Dublin Core element needs to be mapped to; DC.RELATION and DC.SOURCE, for instance, were omitted from the example below. Other Dublin Core fields, in this case DC.COVERAGE, can be qualified to add extra descriptive richness. The notes on the right indicate what controlled vocabularies can be used, but there is no need for projects to indicate which ones they are utilising.
Projects will have developed different metadata schemas according to their collection and content; some will have more detail in certain areas and some will have less. There is no need to replicate the schema shown here. What is important with the sample is to give a sense of how each of your metadata categories is being interpreted and how it is being mapped to Dublin Core.
1. What are suitable methods for storing electronic photos?
Image management systems can be used for a wide range of purposes, including: managing workflows, storing metadata, maintaining relationships between images and their metadata, user access, etc., so you need to have a clear idea of what you would want from an image management system (e.g. some functional requirements), in order to decide whether the current library system would be able to do the job properly.
In general, many library systems have now started to have
image management capabilities, but it is sometimes difficult to
know how well they have been implemented. The vendor's information
on DB/TextWorks at:
Some general advice on database management systems has been
produced by TASI:
Peter Hirtle also has a chapter on image management systems in Kenney and Reiger's book "Theory into practice" (RLG, 2000).
2. Can you advise if backup onto CD of digital resources is sufficient for preservation purposes?
Copying to CD might be useful for short-term backups, but on its own it isn't a very sustainable *long-term* preservation strategy. The reason for this is that the dyes used by recordable CDs (CD-R) tend to break down over time.
For more information, see section: 1.1.6 in Ross and Gow
(1999). The same authors wrote in their executive summary that
they felt that the stability of CD-R was over-rated and "far from
being a secure medium it is unstable and prone to degradation
under all but the best storage conditions." Best practice would
be to keep an additional copy on some magnetic media. For more
details see: Ross, S., Gow, A., 1999, Digital archaeology:
rescuing neglected and damaged data resources. London: South Bank
University, Library Information Technology Centre,
In practice, preservation is about managing information content over time. It is not enough just to make backups; projects also need to create (at the time of digitisation) well-documented digital master files. Copies of these files should be stored on more than one media type, and (ideally) in more than one geographical location. These files should be used to derive other files used for user access (which may be in different formats) and would be the versions used for later format migration or for the repackaging of information content. If the files are images, the 'master' file format should be uncompressed, e.g. something like TIFF.
This is not to denigrate making backups in any way. Any service will need to generate these to facilitate its recovery in the case of disaster.
Creating a full 'digital master' with associated metadata will be a complex (and therefore expensive) task that should be done once only and at the time that the resource is being digitised. All equipment needs (or the choice of a digitisation bureau) should be considered with the creation of such digital masters in mind.
Projects will also need to decide where these digital masters should be kept for the duration of the project itself and where backup copies of them (and maybe other parts of the service) should be stored. Thought could be given to subscribing to a third-party storage service. An example is the National Data Repository at the University of London Computer Centre (ULCC). More information on the services offered by the National Data Repository Service is available at:
The various service providers of the Arts and Humanities Data Service (AHDS) will also provide a long-term storage service for digital resources. They have also published various guides to good practice. For more information, see:
3. Can you clarify the emphasis that NOF is placing on digital preservation?
The nof-digitise programme is linked to lifelong learning and the overriding objective is to fund the digitisation of a wide range of materials that will be available free of charge at the point of access on the People's Network and National Grid for Learning.
It is important to secure the long-term future of materials created, so that the benefit of the investment is maximised and the cultural record is maintained in its historical continuity and media diversity. Therefore preservation issues should be considered an integral part of the digital creation process.
Projects should consider the value in creating a fully documented high-quality digital master from which all other versions (e.g. compressed versions for accessing via the Web) can be derived. This will help with the periodic migration of data and with the development of new products and resources.
NOF is providing guidance to applicants on digital preservation, further details of which can be found in the nof-digitise Technical Standards and Guidelines.
Have a look at the TASI site. They have a fair amount of information on Data Capture and Creation - http://www.tasi.ac.uk/advice/creating/creating.html.
However, questions regarding your awkward objects are best answered by a good professional photographer. Photography of strange 3D objects will need all the skills of a photographer, and it really does not make any difference whether the images are being shot with a digital camera or a film one. The main problems are those of lighting and photo-craft rather than anything to do with 'digital' imaging. Photographic skills at that level are hard to teach and hard to provide any material for on a Web site; it really just comes down to experience, as different things will need different lighting setups and backgrounds. Standardisation of lighting and background will make the images far easier to present and work with once they are part of a collection.
5. Is it really necessary to digitally watermark our images as those we are making available for the web are comparatively low quality and we would freely allow their use for non-profit, educational uses anyway?
The NOF technical standards state on the matter of watermarking and fingerprinting that "Projects should give consideration to watermarking and fingerprinting the digital material they produce." The word "must" has not been used, so it is up to individual projects whether they want to use this method of protecting copyright.
If your project aims to make its resources available for non-profit, educational use, your online images are of comparatively low quality, and you have a copyright statement on your site, then watermarking will probably not be necessary for your images. If you feel that your money could be used more constructively in another way, then you are free to do so. Note that it may be worth recording this choice and the reasons for it in your Project Plan, for future reference by your case manager.
For alternatives to Digimarc have a look at this page on the TASI Web site: http://www.tasi.ac.uk/advice/using/uissues.html
You may also find the information about Copyright, IPR, Ethics and Data Protection useful: http://www.tasi.ac.uk/advice/managing/copyright.html.
First, 'archival' and 'preservation' are both just words; they are not standards and do not have any exact values.
Second, what is archival or preservation quality for one project or image will not necessarily be so for another. This depends entirely on the intended use of the image. The intention should be to archive or preserve all the quality required to present the image at the highest quality chosen as appropriate for delivery. This of course sounds like a chicken-and-egg situation, so for any project the first step is to decide what level of quality is appropriate for that image.
There are many ways of going about making this decision.
You could decide on the largest printed output size required for images: for instance an A3 quality print would mean 50Mb or so, and an A4 print about 24Mb. If, like the National Gallery, you wish to create life-size copies, then you would need many hundreds of Mb for each image.
You could decide how much visual information is held within the image and then choose appropriately. For instance a 20ft x 12ft painting is going to have more visual information in it than a postage stamp, but equally a Turner watercolour will have much less visual information in it than a Turner engraving. The watercolour will be visually interesting, with acceptable quality to understand and appreciate the image, at about 800 x 600 pixels, but the engraving would be meaningless at this resolution and would need to be imaged at 3000 x 2000 pixels (at least) to show the vast amount of visual information contained within the image.
Of course this is also dependent upon who the user is. A teacher wanting to show a Turner watercolour wants to show the whole image on screen to her students, whereas the paper historian wants to zoom right into the image to be able to see the paper detail. Both are legitimate, but a decision has to be made as to what level of detail the project wishes to capture.
As a rule of thumb, to print an image at the same size as the original you need approx 300ppi for a continuous tone image and 600ppi for a bi-tonal image. It is these figures that are sometimes erroneously given as 'preservation' and 'archive' quality, but don't just grab these figures and stop thinking about the more fundamental issues.
There are other influences as well. Do you wish to have a standard image quality within your collection (normally advisable), so that all images are the same size? This makes the collection much easier to work with, as you always know how big every image is. On the other hand, if you had a collection of maps and stamps, then what was right for one would be totally inappropriate for the other and you would need at least two standards.
But the key to this is that you have to decide or research:
How much visual information within the image needs to be captured
Having found those out, you can decide what size the captured image needs to be to fulfil your requirements. When you know what you need, it is easy enough to find out what the appropriate resolution is to give you that.
Having done that, it is simple: you just preserve and archive a file that is big enough to provide you with all the quality that fulfils your needs.
For further information see http://www.tasi.ac.uk/advice/creating/digitalimage.html
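As a rough illustration of the print-size figures quoted above, the following Python sketch works out the uncompressed size of an image from its physical size, resolution and bit depth. It assumes 24-bit colour, the ISO A4 and A3 dimensions expressed in inches, and that "Mb" is read as binary megabytes; file headers and compression are ignored, and the function name is our own.

```python
# Illustrative sketch only: uncompressed image size, ignoring headers and compression.
MIB = 1024 * 1024  # assumption: "Mb" in the text is read as binary megabytes


def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Return the raw pixel data size in bytes for a scan of the given size."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# ISO paper sizes converted to inches (rounded to two decimal places).
PAPER_SIZES = {"A4": (8.27, 11.69), "A3": (11.69, 16.54)}

for name, (width_in, height_in) in PAPER_SIZES.items():
    size = uncompressed_bytes(width_in, height_in, ppi=300, bits_per_pixel=24)
    print(f"{name} at 300ppi, 24-bit colour: {size / MIB:.1f} MiB")
```

Under these assumptions the results come out at roughly 25 MiB for A4 and 50 MiB for A3, which is in line with the approximate figures given above.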
And what is the recommended capture resolution/file size for archived material?
We don't recommend absolute file sizes because they depend upon a host of factors, including physical image size, scan resolution, file format, and colour depth. There are just too many variables to enunciate sensibly in something like the Technical Standards and Guidelines document.
Instead, you should bear in mind the need for preservation, re-use, etc, and should scan as well as feasible, using your judgment and working conditions.
"Images should be 8Mb" is a totally useless statement in itself.
It is true that capture resolution is linked to scale. We would assume 1:1 for the scale in most cases, except where that wasn't feasible.
There is some very useful information on exactly this problem, including the decision-making process, at the TASI Web site: http://www.tasi.ac.uk/advice/creating/creating.html. Another useful source is the AHDS Visual Arts Data Service Guide to Good Practice for creators of visual resources. The Guide can be found at http://vads.ahds.ac.uk under Guides to Good Practice.
Some points to bear in mind when going for archival quality:
1) The best equipment for taking a digital copy of a sheet of paper is a scanner. A camera will distort the geometry of the page around the edges, perhaps not enough to be visible, but it remains a fact of life when capturing an image through a glass lens.
2) Use of a camera would have to be controlled: I would not recommend that the camera be held by hand; it should be mounted on a support of some kind, and lighting conditions would have to be suitable.
3) Use of a camera may be appropriate if the nature of the material is delicate and subject to damage on a flatbed scanner.
4) The key parameters for judging the file size are:
5) With regard to the file sizes, TIFF is a suitable format for preservation. The difference in file size is probably because one file is stored uncompressed and the other is compressed with the LZW algorithm. From a long-term perspective use of LZW is not recommended, as the algorithm is patented (it is the same codec as used in GIF) and there may be cost implications down the line. There are other compression schemes available for TIFF, and storing the collection of TIFF files in a compressed archive may also be effective (e.g. zip, tar, rar).
6) With regard to filesizes, the other issue is the colour depth (bit depth) - ie the number of bits used to store colour information. This is generally 24 bit (ie 8 bit per RGB channel).
In conclusion I would recommend that you look carefully at the digitisation process and the results. Once you are satisfied that you have chosen the superior method of digitisation, then worry about the file storage problems.
8. A small museum has found during a trial of digitising its 5x8 b/w glass negatives at 600dpi and saving in TIFF files, that this level of resolution places far too great a demand on the museum in terms of system resources and processing time - files are being saved at up to 235MB and can take a long time to open. The guidance given by the NOF standards and other HE advisory agencies recommends scanning to 'best resolution within an organisation's budget and resources'.
There is always a compromise between the quality that would be needed to use the image file for conservation and the cost of digitisation and storage of that file. Where that compromise is made is dependent upon the resources of the organisation.
There is no magic figure that can be given as a correct resolution as this will always depend upon the size of the item that is being digitised and its intended use.
That is why the recommendations state that scanning should be to 'best resolution within an organisation's budget and resources'.
To work out what it can and cannot afford, an organisation must start at the other end of the workflow by establishing what its budget and resources are. From that, it can establish what file size it can create within that budget.
The budget will control the amount of storage space available to the project. You can divide the available space by the number of items and get a maximum file-size possible within the 'budget and resources'.
Whether the file size that you have created will be big enough will depend on what it is that you want to do with it. We would suggest that 235Mb is certainly a very large file and could produce beautiful large prints but is most probably more than you need to produce for most digitisation projects.
If the goal of the project were solely to produce images for delivery on a monitor, then these images would indeed be quite vast!
As a rough rule of thumb we would recommend that you work to something like these guidelines:
Create files big enough to produce A4 b/w printed images at 300ppi:
this would need b/w files of approx 8.25Mb (A4 8bit b/w at 300ppi)
If your intention was to produce colour files, which would look nicer, then they would need to be 24.8Mb (A4 24bit col at 300ppi)
(Other possibilities would be to save the file in 16bit b/w which would certainly be better quality but giving larger files. Or the image could be digitised at a size big enough to create bigger output sizes, eg up to A3?)
Continuing on this basis, the 5x8 inch negatives would need to be scanned at 450ppi to create the filesize we have discussed.
This however is much smaller than the 235Mb that they are
They are scanning at a much higher res than 600ppi. They are scanning at a very high colour depth like 48bit colour.
If space and money are an issue then we would question whether this is the best way forward.
We must still point out that there is no magic "correct res", as it is totally dependent upon the size of the original and the intended use of the images within the project.
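The arithmetic in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sizes are for uncompressed pixel data, "Mb" is read as binary megabytes, the function names are our own, and the storage budget and item count in the last step are invented figures rather than anything taken from a real project.

```python
import math

MIB = 1024 * 1024  # assumption: "Mb" in the text is read as binary megabytes


def scan_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Uncompressed size in bytes of a scan of the given original."""
    return round(width_in * ppi) * round(height_in * ppi) * bits_per_pixel // 8


def ppi_for_target(width_in, height_in, target_bytes, bits_per_pixel):
    """Scan resolution needed for the original to yield a file of target_bytes."""
    pixels = target_bytes * 8 / bits_per_pixel
    return math.sqrt(pixels / (width_in * height_in))


def max_file_bytes(storage_bytes, item_count):
    """Largest file each item can have if storage is shared out equally."""
    return storage_bytes // item_count


# Resolution at which a 5x8 inch negative gives an 8.25Mb 8-bit b/w file.
print(f"Target resolution: {ppi_for_target(5, 8, 8.25 * MIB, 8):.0f} ppi")

# What a 5x8 inch negative yields at 600ppi for several bit depths.
for bits in (8, 16, 24, 48):
    print(f"600ppi, {bits}-bit: {scan_bytes(5, 8, 600, bits) / MIB:.1f} MiB")

# Hypothetical budget: 500 GiB of storage shared between 10,000 items.
print(f"Budget per item: {max_file_bytes(500 * 1024**3, 10_000) / MIB:.1f} MiB")
```

With these assumptions the target resolution comes out in the mid-400s ppi, close to the 450ppi figure above, and even a 48-bit scan at 600ppi stays well under 235Mb.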
9. The 10,000 digitised images we hope to have on our Web site will have been either photographed or scanned. However, we would also like to include a small number of slides of historic costume that we believe it would be detrimental to handle again in order for them to be re-photographed. Having examined the slides, they are not as good a quality as we are now capturing, and although they will look fine on the web they will not make for good archival images. Is it possible for us to use these slides, or should we not include them and therefore leave out some important historical costume items that we would very much like to include?
It sounds like you have a fairly common problem on your hands - deciding what to digitise is never easy. You may already have tried using the matrix approach to selection; if not, some of these sites may be of interest:
The best matrix I've seen is in Stuart Lee's book 'Digital Imaging: a practical handbook'. Lee does say that there is no weighting attached to categories, nor is it suggested that you just add up the number of ticks under each section and give items a score. The matrix is more for demonstrating that you have taken the selection criteria seriously and have made some analysis.
With the slides you talk about you really will just have to weigh up the pros (their important historical value) and cons (scanning slides may take longer and as you've explained the quality isn't that good - no archival master) of using them and decide what is the most important.
NOF-digitise would accept the use of these slides because there is significant historical value attached and they will be very useful for learning (and that is what the programme is all about). Also, the slides make up only a small part of your overall collection. However I would advise adding something to your quarterly progress report about your intentions and mentioning them to your case manager.
You might also find the section on creating digital images from slides on the TASI site useful as you will need to optimise their quality.
1. Is there a definitive guide available about developing government and UfI standards?
See http://www.govtalk.gov.uk/ and http://www.iagchampions.gov.uk/ for more information on e-Government and the Interoperability Framework and http://www.ufiltd.co.uk/materials/qualified.htm for UfI standards documents.
Also, Paul Miller provides an overview of Government Web standards in his presentation to the Public Library Web Managers Workshop, available at: http://www.ukoln.ac.uk/public/events/managing/miller/
2. Is there any general information or guidelines about how to choose the right web hosting company, ie what we should be looking for, what kind of questions should we ask a potential supplier...?
Questions to ask of potential suppliers are:
Backup arrangements - How will they back-up your data? How often? How quickly can you get it back?
Hardware support arrangements - What will they do when the disk that your data is on breaks? Do they run RAID of any kind? How long will it take to replace the disk and get your data back (see above)?
Security policies - How do they make their systems secure? What facilities do they offer to other users of the same machine?
Available bandwidth - What kind of connectivity do they have? What peak-time traffic do they have using that bandwidth?
3. What sort of bandwidth will be available for delivery of nof-digi projects?
Up to 20% of schools (mainly secondary) will be connected at 2Mbps by end of 2002; the rest will be ISDN2. Libraries have an aspiration for all public libraries to be connected at 2Mbps by end of 2002. Do remember, though, the needs of home users, who may well be using traditional modems. They don't necessarily need full functionality, but they do need to be catered for.
4. Could you explain what is meant by 'Web services' in para 3 of Section 5.1.1: Access to Resources, i.e. "Web services must be accessible to a wide range of browsers and hardware devices (e.g. PDAs)..."
The term "Web service" means, in this context, Web site.
The resources which are being digitised through the NOF digitisation programme are expected to be available for a long period and to be accessible to new devices which may be developed in the future.
Even today PDAs can be used to access Web sites containing images, and in the near future we can expect PDAs to have multimedia capabilities. We are also hearing the term E-Book Library being used instead of E-Book Reader to convey a future in which large amounts of data can be stored on mobile devices.
In order to ensure that NOF projects can exploit such devices they should be based on open standards (such as XML) as opposed to using formats which are aimed at desktop devices.
5. What is meant by open (as opposed to proprietary) standards? How does a standard become a standard? Are there standards awarding bodies?
Proprietary file formats (such as the Microsoft Word format, Adobe PDF, Macromedia Flash, etc.) are owned by a company or organisation. The company is at liberty to make changes to the format or to change the licence conditions governing use of the format (including software to create files and software to view files). Use of proprietary formats leaves the user hostage to fortune - for example the owner of a popular and widely-used format may increase the costs of its software (or introduce charges for viewing software).
Open formats are not owned by a company or organisation - instead they are owned by a national or international body which is independent of individual companies.
Many standards bodies exist. Within the Web community important standards organisations include the World Wide Web Consortium (W3C), the Internet Engineering Task Force (IETF), ECMA (European Computer Manufacturers Association), ISO, etc. These standards organisations have different cultures, working practices and coverage. W3C, for example, is a consortium of member organisations (who pay $5,000 to $50,000 per year to be members). W3C seeks to develop consensus amongst its members on the development of core Web standards. The IETF, in contrast with the W3C, is open to individuals. ISO probably has the most bureaucratic structure, but can develop robust standards. The bodies have different approaches to defining standards - ISO, for example, solicits comments from member organisations (national standards bodies) whereas W3C solicits comments from member organisations and from the general public.
Further information can be obtained from the following Web sites:
W3C - http://www.w3c.org/
IETF - http://www.ietf.org/
ISO - http://www.iso.ch/
ECMA - http://www.ecma.ch/
NISO - www.niso.org
IEEE - www.ieee.org
IMS - www.imsproject.org
The following resources have further links to standards bodies:
An overview of Web standards is available at:
You should not confuse "open standards" with "open source". Open source software means that the source code of the software is available for you to modify and the software is available for free. This is in contrast to licensed (or proprietary) software in which the source of the software is not normally available. Both open source and proprietary software can be used to create and view open standard formats.
6. Does our web service have to be available 24 hours a day, 7 days a week?
The Technical Standards have recently changed their wording in this area. They now state "Projects should seek to provide maximum availability of their project Web site. Significant periods of unavailability should be brought to the attention of the NOF case manager and in addition should be reported to NOF through the standard quarterly reporting process."
Projects should aim for as high a level of availability as possible. However, 24/7 cover is likely to be quite expensive and the cost/benefit does not justify it. Ensuring a fast response to breakdowns during office hours is acceptable.
There is no simple answer to this question, and I am afraid that we do not have any figures that you could use to guide you. Each web resource is unique in this aspect and giving you the user figures for a different Web site would be of no use to you. Below are some pointers that might help you to come to a sensible estimate of the type of load to anticipate.
Do you have any existing web services within your organisation? What usage rates do they generate?
Do you have any usage statistics from other institutions, such as museums or libraries? These may give you a starting figure.
Do you have a feel for your target audience - perhaps you can derive a figure from the numbers of students who may use your resource?
How many users would you *like* to have?
In all cases these figures will leave you with only a very rough feel for what might happen; the very nature of a web project means that usage rates depend on many things and are very unpredictable. It is likely, however, that demand will build up over time as word of your site disseminates, search engines start to index pages and people start to link to your site. Monitor your site usage carefully to observe this happening. Have a look at the Web Site Performance Monitoring Information paper for advice on how to do this.
From a technical point of view you will need to establish some ballpark target figures to enable the system architecture to be specified: some of these figures would be:
From these figures the total bandwidth requirements can be calculated. Web hosts often will cap the monthly bandwidth available to you, so it is good to have a feel for this.
You also need to consider the maximum number of connections you will have to your web server at one time. This can affect system performance in many ways. The maximum number of simultaneous users will help you establish the maximum network throughput you will require. This too is usually restricted by a web host, so that you may be limited to a data rate of, say, 128 kbps.
However, that said, it is our experience that the average hosting deal, of say 15GB per month data transfer and a network connection of 128 kbps, connected to a bog-standard PC (say 1GHz / 256 MB RAM), will host a healthy, busy website: say a maximum of 50 simultaneous users and 10,000 page requests per day.
Whilst we would definitely recommend that you consider this issue and derive some target figures that you expect to achieve (this is a way for you to measure the success of your site as much as anything else), we would suggest that you work with your 3rd party suppliers to achieve this as they should be able to offer some good real-world experience of usage and help guide you in creating a system architecture.
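To get a feel for how the ballpark hosting figures above relate to each other, a short Python sketch can help. It is only a back-of-envelope check, assuming decimal units (1GB = 1,000,000,000 bytes, 1 kbps = 1,000 bits per second), a 30-day month and function names of our own choosing; your own target figures should replace the example values.

```python
# Back-of-envelope sketch relating a monthly transfer cap, a link speed and usage.
SECONDS_PER_DAY = 24 * 60 * 60


def average_page_bytes(monthly_cap_bytes, page_requests_per_day, days=30):
    """Average bytes available per page request before the monthly cap is hit."""
    return monthly_cap_bytes / (page_requests_per_day * days)


def monthly_ceiling_bytes(link_kbps, days=30):
    """Most data a link of this speed could carry if saturated all month."""
    return link_kbps * 1000 / 8 * SECONDS_PER_DAY * days


def per_user_kbps(link_kbps, simultaneous_users):
    """Share of the link each user gets when all are downloading at once."""
    return link_kbps / simultaneous_users


# Example figures from the text: 15GB per month, 128 kbps, 50 users, 10,000 pages a day.
print(f"Budget per page request: {average_page_bytes(15e9, 10_000) / 1000:.0f} KB")
print(f"Ceiling of a 128 kbps link: {monthly_ceiling_bytes(128) / 1e9:.1f} GB per month")
print(f"Share per user at peak: {per_user_kbps(128, 50):.2f} kbps")
```

Figures like these are a starting point for the conversation with your hosting supplier rather than a specification.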
NOF-digi projects can advertise positions on the NOF-digi list - for information on how to join see http://www.jiscmail.ac.uk/lists/NOF-DIGI.html. There is also a job bank where you can advertise NOF-digitise vacancies on the People's Network site: http://www.peoplesnetwork.gov.uk/content/jobbank.asp.
Also the following Web sites may be useful:
Print journals you could try include the CILIP (formerly the Library Association) jobs pullout (http://www.cilip.org.uk/); CILIP also runs Lisjobnet online and INFOMatch. There's also the Guardian.
I would also have a look at the section on human resources available in the programme manual: http://www.ukoln.ac.uk/nof/support/manual/human-resources/
Knowing how to write for the Web is an important area of accessibility. NOF believes that it is very important that sites are written in plain English for people with learning difficulties. It is felt that beneficiaries of funds should have to commit themselves to this, as well as providing a level of content accessibility. There is also a need for research to bring together existing and emerging good practice and develop tailor-made guidance, and this is something NOF is working on.
Listed below are some relevant Web sites that you might find useful:
Writing for the Web - Jakob Nielsen
Accessible web design
Writing for the Web
Web site tips
Funding streams are changing all the time so you need to keep your ear to the ground and watch out for messages to mailing lists and articles in magazines.
Some of the possible main funding streams/bodies are listed below:
UK Funding Streams
European Funding Streams
Worldwide/Global Funding Streams
A useful document titled 'Overview of funding streams for libraries and learning in England report' is available from Resource:
This page also has some useful links http://www.ceangal.com/diglibs/funding.html
An example of good practice in handling 'Issues' and documenting how they are being dealt with has been provided by the Cistercians in Yorkshire project. They create Word documents detailing each issue and how it is resolved. Such information could be held in a content management system.
1. All projects are expected to have at least some live content publicly available on a Web site by 31 December 2002.
2. From the point at which your Web site goes live you are expected to start recording a) Web usage statistics (see point 5 below) and b) responses from users through a user feedback form on your site. User forms should be prominently displayed or signposted from your home page. The Fund is not specifying the questions to be asked on the form, as this will vary depending on the information individual projects may want to collect to meet their own user evaluation targets. It would be helpful however for projects within each consortium to discuss and share views on the design of the forms, particularly where projects have had prior experience of evaluating online user involvement. We would encourage you to share your proposals widely with other nof-digi projects through the programme email list firstname.lastname@example.org and the NOF-DIGI@jiscmail.ac.uk list.
3. Before the end of December 2002 the Fund will email you a form for the Annual Monitoring Report (AMR) which will contain questions related to the above information. The first AMR for each project is required to be completed and returned to NOF by 31 December 2003 (unless you have been requested to return the form earlier by your Case Manager).
4. To ensure that we are able to collect sufficient data and to certify that sites are operational throughout the monitoring period, the Fund will require at least three AMRs from each project, depending on when the content will be completed. Some of the longer projects may be expected to submit up to five AMRs. Your case manager will confirm the next due date each year when your report is requested. The Fund will also be carrying out independent checks on web site availability during the monitoring period.
5. The AMR will include (apart from standard information about your organisation and its finances) several questions about the usage of your site, which you will be able to collect through the use of Web statistics software such as Web Trends and Analog.
Further information is available here:
6. If you have any queries about how this affects your particular project please contact your designated NOF Case Manager (via the email@example.com email).
Definitions - taken from Web Trends
A hit - a hit is a request to the server for a file. This includes all the images, audio, graphics, pages and other supporting files as well as the HTML files themselves.
Page Views - Pages are files with the extensions htm, html, shtml, asp and so on, as defined in Web Trends. This value gives the number of pages viewed, not all the supporting files. This means that the total number of hits is always bigger than the number of page views. For example, if a web page had five graphics files on it, every time a user visited that page 6 successful hits would be reported, and only 1 page view.
User Sessions - A single user session is defined as a person accessing the site. If they are inactive for more than 30 minutes then a new session is started.
Visitors - This attribute is taken from the IP address or username in the logfile. An individual visitor is identified by the IP address of the computer being used. For example, the number of Unique Visitors is the number of different IP addresses in the log file.
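The definitions above can be illustrated with a small Python sketch. It is not how Web Trends or Analog work internally; it simply applies the definitions to a simplified list of requests, where each entry is an assumed (IP address, time in seconds, path) tuple rather than a real log format, and the sample data is invented.

```python
import os

PAGE_EXTENSIONS = {".htm", ".html", ".shtml", ".asp"}  # page types named in the text
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity before a new session starts


def summarise(requests):
    """Return (hits, page_views, unique_visitors, sessions) for the requests."""
    hits = len(requests)
    page_views = sum(
        1 for _ip, _ts, path in requests
        if os.path.splitext(path)[1].lower() in PAGE_EXTENSIONS
    )
    unique_visitors = len({ip for ip, _ts, _path in requests})

    # Count a new session whenever an IP address is seen for the first time
    # or returns after more than SESSION_TIMEOUT seconds of inactivity.
    sessions = 0
    last_seen = {}
    for ip, ts, _path in sorted(requests, key=lambda r: r[1]):
        if ip not in last_seen or ts - last_seen[ip] > SESSION_TIMEOUT:
            sessions += 1
        last_seen[ip] = ts
    return hits, page_views, unique_visitors, sessions


# Invented sample: one visit to a page carrying five graphics files.
one_visit = [("192.0.2.1", 0, "/index.html")] + [
    ("192.0.2.1", 0, f"/images/pic{i}.gif") for i in range(1, 6)
]
print(summarise(one_visit))  # 6 hits, 1 page view, 1 visitor, 1 session

# The same visitor returning two hours later starts a second session.
print(summarise(one_visit + [("192.0.2.1", 7200, "/index.html")]))
```

The first call reproduces the example in the Page Views definition: six hits but only one page view.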
Further Definitions - http://www.ukoln.ac.uk/nof/support/help/papers/performance/#interpreting
Web Trends - http://www.netiq.com/webtrends/
UKOLN is funded by MLA, the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the Higher and Further Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
This page is part of the NOF-digi technical support pages http://www.ukoln.ac.uk/nof/support/.
The Web site is no longer maintained and is hosted as an archive by UKOLN.
Page last updated on Monday, May 09, 2005