Technical Advisory Service
2001 - 2004 archive
Frequently Asked Questions
The questions and answers on this page have been asked by nof-digitise applicants. This page will be updated frequently.
1. Does anyone have any thoughts on the use of file formats such as Flash or SVG in projects? There is no mention of their use in the technical specifications so I wondered whether their suitability or otherwise had been considered.
The general advice is that where the job can be done effectively using non-proprietary solutions, and avoiding plug-ins, this should be done. If there is a compelling case for making use of proprietary formats or formats that require the user to have a plug-in then that case can be made in the business plan, provided this case does not contradict any of the MUST requirements of the nof technical guidelines document.
Flash is a proprietary solution, which is owned
by Macromedia. As with any proprietary solutions there are
dangers in adopting it as a solution: there is no guarantee that
readers will remain free in the long term, readers (and authoring
tools) may only be available on popular platforms, the future of
the format would be uncertain if the company went out of
business, was taken over, etc.
2. Will NOF allow information to be digitised using PDF as opposed to HTML?
PDF (Portable Document Format) is a proprietary file format owned by Adobe, see <http://www.adobe.com/products/acrobat/adobepdf.html>. The format preserves the fonts, formatting, colours and graphics of the source document. PDF files are compact and can be viewed and printed with the freely available Adobe Acrobat Reader.
As with any proprietary solution there are dangers in adopting it as a solution: there is no guarantee that readers will remain free in the long term, readers (and authoring tools) may only be available on popular platforms, the future of the format would be uncertain if the company went out of business, was taken over, etc.
Other limitations of PDF include difficulties in defining the structure of documents (as opposed to the appearance), providing hyperlinking, and providing universal access to viewers with old or specialist browsers.
PDF does, however, have the advantage of being easy to create and preserving the appearance of source documents.
The recommended open formats for providing documents on the Web is HTML/HTML/XHTML (to define the document structure) together with CSS/XSL (to define the appearance of the document).
If PDF is used for a NOF project, the project holder should ensure that a case is made for its use in the business case and migration strategy has been established which will enable a transition to open standards to be made.
A few years ago VRML (Virtual Reality Markup Language) was thought to be the emerging standard for virtual reality. However VRML has failed to gain widescale market acceptance. VRML is now evolving. Its successor X3D will make use of XML to provide the syntax for 3D worlds. The development of X3D is being coordinated by the Web3D Consortium - see http://www.web3d.org/
A range of browser plugins to render X3D worlds are available, see the Web3D Consortium web site for details.
The requirement that alternative format must be provided if a plug-in is required is intended primarily for accessibility purposes and to ensure that an open format is available if a project makes use of a proprietary format which requires a plugin. In the case of 3D visualisation it is recognised that a textual equivalent will probably not be appropriate and since X3D is an open standard which is currently accessible primarily through use of browser plugins, the use of these plugins is acceptable.
4. Can NOF provide advice on the use of Java in NOF-digitise project Web sites?
The Java language was developed by Sun. Although it has been submitted to several standards bodies, it has not been standardised and remains a solution. Java applets which run within the Web browser do not appear to have taken off, due to inconsistencies in the Java virtual machine used within Web browsers and resource and performance problems. In addition much of the functionality provided initially by Java applets can now be carried out using open W3C standards such as HTML, CSS and the DOM (often referred to as Dynamic HTML or DHTML).
Java can, however, provide a suitable resource at the server, as opposed to the client. Java Server Pages can provide server-side scripting facilities, and Java Beans can provide integration with other server-side services.
However, you would need to make a very good case for the NEED in your business case; a whizzy, jazzy interface does not count as a NEED.
We should like to use Java as part of the user interface but will Java continue to be supported in IE Version 6, etc.?
A capability to run Java is not included, by default, with the
latest version of Microsoft's Internet Explorer (IE6). However,
Microsoft's own set of Frequently Asked Questions about IE6
A: Yes, Internet Explorer 6 supports Java. Java applets run in Internet Explorer 6 just as they run in older versions of Internet Explorer. The Java VM is not installed as part of the typical installation, but is installed on demand when a user encounters a page that uses a Java Applet.
It should be noted, though, that the download is not small, and that this may deter users visiting your site using a typical telephone modem, as they will need to download this additional plug-in before interacting with your Java-enabled site.
We would would stress that NOF projects are encouraged to avoid non-standardised solutions such as Java and that - if Java is used - it is strongly recommended that this be implemented on the SERVER, rather than expecting users' web clients to handle any Java.
5. Would it be acceptable to store some original images as PSD (i.e. PhotoShop) formatted files?
The standards guidelines advise against using a proprietary format, which this is, unless you have a very good reason to do so. Photoshop is perfectly capable of saving files in the TIFF format so you would need to justify why there is any benefit in storing archival images in PSD. That's not to say, of course, that you can't take a copy from an archival TIFF and store it in PSD to work on. Try to create a 'master file' in a format that can be re-digitised from easily.
6. Is Realvideo an acceptable format for video?
It is suggested that all projects work from a format that can be re-digitised from easily e.g. for video DV or DV Cam. Media, particularly video will need to be redigitised for delivery as technology advances. An equally important issue is probably the copyright and making sure thatall footage is covered by "blood chits" which hands over all the rights to the projects.
Although the use of Realvideo is not particularly recommended, we accept that in some cases the use of proprietary or non-standard formats may be the most appropriate solution. However, where proprietary standards are used, the project must explore a migration strategy that will enable a transition to open standards to be made in the future.
With regard to Real, if you do use this you should check that the stringent conditions which encoding with Real imply are suitable for both the project and the programme.
7. Some of our material contains tables. Should we be treating it/them in the same way as for line-drawings i.e. provided on the web as GIFs?
Digitised materials must be accessible - for example people with visual impairments should be able to process information through use of a speaking browser. Tables are permitted if they are comprehensible when linearised.
The use of GIFs to display significant amounts of textual information is *not* acceptable as these cannot be interpreted through speaking browsers. Restricting tabulated information to an HTML format should not limit formatting possibilities given the use of cascading style sheets and that tables created in Excel, for example, can be saved as HTML.
8. Can you advise on delivering video on mobile devices?
For delivery of moving video to mobile devices, it is likely that UMTS will be available by 2002, with a bandwidth approaching 2Mbps. See http://www.umts-forum.org/
In terms of delivering different versions of sites to multiple platforms, the use of XSLT to transform XML out of a database in order to display (X)HTML of different forms might be worth exploring. For further infoamrtion see http://www.w3.org/TR/xslt
9. Do you have any advice on standards of digital audio?
Standards for storage and playout may differ. Commonly an archive/library would wish to store (preserve) the highest quality possible - meaning uncompressed - but would deliver using a standard and datarate appropriate to user requirements.
Electronic delivery could then involve compression.
Software which delivers at multiple datarates, according to an internet user's connection, is now available from Real and Quicktime, amongst others, but the 'master copy' should ordinarily be the 'best available', which would usually mean uncompressed, linear PCM with a sampling rate and quantisation appropriate to the bandwidth and dynamic range of the material. This form of audio is typically held in .WAV files (though there are over 100 registered forms of coded audio that are possible within WAV, including highly compressed).
Within European broadcasting, 16-bit quantisation and 48 kHz
sampling are the EBU (European Broadcasting Union) recommendation
for broadcast quality audio. The EBU has gone a step further
and added metadata to WAV, to add information critical to
broadcasting and broadcast archives, forming the "Broadcast Wave
Format" standard: BWF.
The actual transfer of analogue material to digital format, especially in bulk or for unique items, is not simple. For European Radio Archives, standardisation and guidance is being developed within EBU Panel 'Future Radio Archives', http://www.ebu.ch/pmc_fra.html
In their public service role, the BBC would be pleased to offer advice to libraries / archives requiring help - providing it is for non-commercial purposes.
10. We will be creating digital images of objects containing text, such as handbills and tax returns. Do we have to re-key the text in these images as HTML? We are concerned as there are significant cost implications associated with this, especially as we don't think OCR software may be suitable for the objects that we want to digitise.
The use that you intend to make of your digitised objects that contain text is the factor in deciding whether or not you re-key the text into a machine-readable format.
If you are just digitising one or two such objects as examples of what they look like (i.e. you are interested in the visual appearance of the objects rather than the content of the text) it may not be necessary to produce the text as HTML as well.
However, if the idea is that the original text can be searched etc. then it *will* need to be produced in a machine-readable format. In any case, you will need to create metadata to describe the image.
We do recognise that if text does need to be re-keyed, significant technical effort will be required, particularly if OCR software cannot be used.
So re-keying/using OCR software to convert to a machine-readable format would be the preferred solution. It would be for the project to make a case to NOF if this is not done, on grounds of cost, for example.
11. We are digitising audio and video for streaming over the net. There
are a number of differing proprietary architects available. i.e.
Realmedia, QuickTime, Windows media.
MPEG is a set of international standards for audio and video compression. MPEG-4 is the newest MPEG standard, and is designed for delivery of interactive multimedia across networks. As such, it is more than a single codec, and includes specifications for audio, video and interactivity. Windows Media encodes and decodes MPEG4. Quicktime and Realplayer are working on versions which will do the same.
MPEG produces high quality video which can be streamed over networks. Quicktime and Realmedia use the MPEG standards to improve the quality of their files for delivery on the web.
For detailed information on codecs it is recommend that you
look at the following site
In answer to your questions, can information can stored as an MPEG file? Which encoding software would allow this and ... how would it be played?
It is possible to store audio or video in an MPEG format, and to play an MPEG file. This would be NOF's preferred solution, as proper MPEG files are open, non-proprietary, and should be readable by most audio and video player programs and plug-ins. Many/most current web browsers have the capability to play MPEG-1 video without any extra plug-ins.
RealPlayer, Windows Media Player et al support a variety of audio and video formats, including MPEG, and a range of proprietary formats such as AVI.
See, amongst many possibilities,
More info on software is available from
12. Is XHTML the only future-proof way of presenting multi-page document? Can PDF be used?
As the W3C's XHTML webpage says (see http://www.w3.org/TR/2001/WD-xhtml1-20011004/) "This specification defines the Second Edition of XHTML 1.0, a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4". If you're familiar with HTML 4.0, then the benefit that XHTML brings is to apply the rigor of XML to a markup language with which many people and applications are already familiar.
Whilst there is no guarantee that anything is future-proof, the huge amount of information already available in (X)HTML, and the likelihood that future generations of web browsers will be able to read (X)HTML data, makes it highly likely that (X)HTML data will remain usable for many years to come -- and the added rigor of XHTML improves the situation greatly.
However, XHTML (like HTML) has no notion of "page", and it is left to the document author to decide how information should be broken down into usable chunks. For example, I could convert the text of a standard dictionary into XHTML, and then choose to present the data as a single (very, very large!) XHTML page with alphabetical subsections, or a series of 26 (very large) XHTML pages with head-word subsections, or many many thousands of XHTML pages each containing a separate head-word entry, or simply turn the text of each printed page into a separate XHTML page and organize the whole thing by naming pages after page numbers. How I choose to organize the information might depend on a number of factors, for example whether I was attempting to in some way model the existing print publication in electronic form (e.g. by structuring it on the basis of printed pages), model its organizational structure (e.g. by breaking the text into 26 subsections, or many entries), or make the electronic version easy to use (e.g. by presenting each entry as a separate XHTML page).
In other words, XHTML will allow you to model your information pretty much however you see fit: if capturing "pages" is important to you then that is possible -- but if usability is an overriding concern, you may decide to present the electronic information organized in some other way.
By comparison, PDF can provide you with an effective mechanism for capturing pages (or page images), and producing an electronic book which can be used and navigated in a very similar fashion to a conventional print publication. If visual fidelity to a non-electronic source is important to you, then this may be an important consideration. However, a similar effect could be achieved by creating XHTML pages containing transcriptions of each page from the printed source, and also links to scanned images of each page.
The concern about PDF stems from how much faith one is prepared to put in the future availability of software that can read PDF files, and whether or not the (proprietary) owners of PDF will continue to make future readers and/or versions of PDF backwardly compatible. That said, there is a vast amount of (mainly commercial) information already available in PDF, and several large public bodies are willing to consider PDF as a long-term preservation format (although not NOF).
Unless you already have a vast amount of PDF data (which you are unwilling to consider converting to XHTML), then we would strongly recommend that you consider using XHTML to prepare you data -- assuming you are able to resolve the issue of pagination.
13. Can we store our digital masters as JPEGS instead of TIFFs? We have compared the quality of both at enormous magnification and discovered no difference at all. The NOF standards prefer TIFF but say that one could use JPEGs if necessary. We have used them with our lottery funded digitisation project very successfully. It would make our work a lot a lot easier...
We would need more information on what sort of images are being captured before giving a definite answer, as the lossy compression used in creating JPEGs has more of an impact on some sorts of imagery/colour variation than on others.
However, whilst a compressed JPEG may be more useful for everyday use and delivery, the quality of an uncompressed TIFF master image is your best bet both for long-term viability of the image, and in allowing you maximum flexibility in the future as to what you do to/with the images.
We would still therefore suggest that TIFF is the format which projects should aim to use, and any deviation from this should be supported by a stronger argument than 'it would be easier'...
Have a look at the following resources...
Synchronised Multimedia Integration Language
Synchronized Multimedia On The Web
Scalable Vector Graphics (SVG)
Scalable Vector Graphics (SVG): Vector Graphics for the Web
Scalable Vector Graphics (SVG)
The reason that TIFF is recommended over JPEG is because JPEG is an inherently lossy compression technique. This means that whenever an image file is converted to JPEG format, some detail is lost. However, as you have noticed, the changes that occur are very subtle at high "quality" settings of JPEG compression. You say that you cannot "see" any difference: can I suggest that you try: Open your two specimen files, JPEG and TIFF, side by side Blow up to maximum zoom the same area on both images. Select a portion of the image with a good range of colours: an edge of the object, for example. Select the Eyedropper tool (keystroke "I" for shortcut) and make sure the "Color" floating tool bar is open. Right click on any part of either image and select "Point Sample" (or adjust this setting in Eyedropper options on the "Options" floating tool bar. Now left click on a pixel in the TIFF image. In the "Color" tool bar you should see the colour value of the pixel you have selected. Note this value. Left click on the same pixel in the JPEG image. Note the displayed colour value.
You should observe that there is a general slight difference in the colour value at any specific point in the image. Indeed, it is very difficult to "see" this difference with the eye, but I hope that this numerical demonstration will prove to you that the two images are not identical. The JPEG compression routine does not store the discreet value of each pixel in the image, it stores a mathematical function that is used to re-generate the colour values and this process will result in approximate values for many of the pixels in the image.
Note also that TIFF files can be stored with LZW compression enabled, reducing the size of the file dramatically. LZW compression does not result in any change to the values of any pixels in the image, so is suitable for archiving and preservation purposes.
RealAudio is currently recommended in the technical standards as a format that can be used to create and store sound. However it has recently come to light that creating and storing sound as RealAudio could create major problems (delivery is still OK). RealAudio is an encrypted closed format which no other software package can import. Real have gone as far as suing the manufacturer of a software package that did allow this. The Managing Agent and Advisory Services (MAAS), a new national service acquiring moving pictures and sound for delivery online to the higher and further education communities in the UK, are not recommending RealAudio. This is because they also appear to retain IPR in the encodings.
For more information see the MAAS Site.
This FAQ addresses two areas where many projects are failing to achieve compliance with NOF technical standards. Please take heed of the following comments to ensure that your project is fulfilling these requirements.
1) CHARACTER ENCODING:
We would like to remind all projects that it is important that the character encoding used in a text document delivered over the web is clearly identified. Best practice is to identify the character encoding in use BOTH in the HTTP header and within thesection of the document.
Where XML/XHTML is used, it is also good practice to identify the encoding in the "<?xml ..." declaration at the start of the document. However this may cause problems with some client web browsers and can be safely ommitted.
Whilst the Technical Standard and Guidelines document currently mandates the use of UTF-8 encoding (setion 2.1.2), this is under revision and it is permissible to use any appropriate encoding (ie iso-8859-1, etc), as long as this is explicitly stated as discussed above. The TS&G will be updated at the end of this year to reflect this change, and at that time the section B of the quarterly monitoring reports will also be modified. In the meantime, please indicate the actual encoding your project is using when filling out your quarterly progress report.
2) DOCTYPE DECLARATION:
We would also like to point out that it is necessary to include a DOCTYPE declaration at the top of all HTML or XHTML pages. This is necessary to allow your mark-up to be correctly validated, and perhaps more importantly can affect the way the user agent (browser) interprets the tags it finds within the page. Therefore, a DOCTYPE statement correctly identifying the version of (X)HTML that you are using on the page will improve the chances of your page rendering correctly across different browsers and clients. Please ensure that you include the DOCTYPE declaration on all your pages.
18. Why do I have to think about a preservation
strategy if I am satisfied the file
The short answer is that even the short history of information technology
is already littered with
It is important to recall the requirement that all projects should be
aware of the need for
It is essential to remember that the issue of preservation is not confined
to arguments about
Even within the digitisation process there is a need for a technical strategy
from the outset
It is also important to accept that even once a preservation strategy has been
developed, the shifting nature of technological development is such that any
strategy has to be revisited if it
nof-digitise Technical Advisory Service Programme Manual: Section 2: Digital
nof-digitise Technical Advisory Service The Digitisation Process
The Cedars Project
Creating A Viable Data Resource
Emulation as Preservation Strategy
Migration - a CAMiLEON discussion paper
TASI : Establishing a Digital Preservation Strategy
|UKOLN is funded by MLA, the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the Higher and Further Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.|
T A S : 2 0 0 1 - 2 0 0 4 : A R C H I V E
This page is part of the NOF-digi technical support pages http://www.ukoln.ac.uk/nof/support/.
The Web site is no longer maintained and is hosted as an archive by UKOLN.
Page last updated on Monday, May 09, 2005