UKOLN AHDS QA Focus Briefing Documents: Print All - Digitisation



This page is for printing out the briefing papers in the area of digitisation. Note that some of the internal links may not work.


Briefing 09

Image QA in the Digitisation Workflow


Introduction

Producing an archive of high-quality images with a server full of associated delivery images is not an easy task. The workflow consists of many interwoven stages, each building on the foundations laid before. If image quality is compromised at any stage of the workflow, it is lost and can never be recovered.

It is therefore important that image quality is given paramount consideration at all stages of a project from initial project planning through to exit strategy.

Once the workflow is underway, quality can only be lost; the workflow must therefore be designed to capture the required quality right from the start and then safeguard it.

Image QA

Image QA within a digitisation project's workflow can be considered a 4-stage process.

1 Strategic QA

Strategic QA is undertaken in the initial planning stages of the project, when the best methodology to create and support your images, now and into the future, will be established. This will include:


2 Process QA

Process QA is establishing quality control methods within the image production workflow that support the highest quality of capture and image processing, including:

3 Sign-off QA

Sign-off QA is implementing an audited system to assure that all images and their associated metadata are created to the established quality standard. A QA audit history is made to record all actions undertaken on the image files.

4 On-going QA

On-going QA is implementing a system to safeguard the value and reliability of the images into the future. However good the initial QA, it will be necessary to have a system that can report, check and fix any faults found within the images and associated metadata after the project has finished. This system should include:

QA in the Digitisation Workflow

Much of the final quality of a delivered image will be decided, long before, in the initial 'Strategic' and 'Process' QA stages where the digitisation methodology is planned and equipment sourced. However, once the process and infrastructure are in place it will be the operator who needs to manually evaluate each image within the 'Sign-off' QA stage. This evaluation will have a largely subjective nature and can only be as good as the operator doing it. The project team is the first and last line of defence against any drop in quality. All operators must be encouraged to take pride in their work and be aware of their responsibility for its quality.

It is, however, impossible for any operator to work at 100% accuracy for 100% of the time, and faults will always be present within a productive workflow. What is more important is that the system is able to accurately find faults before the work moves away from the operator. This will enable the operator to work at full speed without having to worry that they have made a mistake that might not be noticed.

The image digitisation workflow diagram in this document shows one possible answer to this problem.

Figure: Image digitisation workflow diagram

Acknowledgements:

This document was written by TASI, the Technical Advisory Service For Images.


Briefing 18

QA Procedures For The Design Of CAD Data Models


Background

The creation of CAD (Computer Aided Design) models is an often complex and confusing procedure. To reduce long-term manageability and interoperability problems, the designer should establish procedures that monitor the design process and guide systematic checks.

Establish CAD Layout Standards

Interoperability problems are often caused by poorly understood or non-existent operating procedures for CAD. It is wise to establish and document your own CAD procedures, or adopt one of the national standards developed by the BSI (British Standards Institution) or NIBS (National Institute of Building Sciences). These may be used to train new members in the house-style of a project, provide essential information when sharing CAD data among different users, or provide background material when depositing the designs with a preservation repository. Particular areas to standardize include:

Procedures on constructing your own CAD standard can be found in the Construct IT guidelines (see references).

Be Consistent With Layers And Naming Conventions

When creating CAD data models, a consistent approach to layer creation and naming conventions is useful. This avoids confusion and increases the likelihood that the designer will be able to manipulate and search the data model at a later date.

The designer has two options to ensure interoperability:

Ensure Tolerances Are Consistent

When exporting designs between different CAD applications it is common for model relationships to disintegrate, causing entities to appear disconnected or to disappear from the design altogether. A common cause is the use of different tolerance levels - a method of placing limits on gaps between geometric entities. The method of calculating tolerance often varies between applications: some use absolute tolerance levels (e.g. 0.005mm), others work to a tolerance relative to the model size (e.g. 10^-4 of its size), while others have different tolerances according to the units used. When moving a design between different applications it is useful to ensure the tolerance level can be set to the same value and to identify potential problem areas that may be corrupted when the data model is reopened.

Check For Illegal Geometry Definitions

Interoperability problems are also caused by differences in how systems identify invalid geometry definitions, such as three-sided degenerate NURBS surfaces. Some systems allow the creation of such entities, others reject them, while others, in an effort to prevent them from being created, generate twisted four-sided surfaces instead.

Further Information

Briefing 20

Documenting Digitisation Workflow


Background

Digitisation is a production process. Large numbers of analogue items, such as documents, images, audio and video recordings, are captured and transformed into the digital masters that a project will subsequently work with. Understanding the many variables and tasks in this process - for example the method of capturing digital images in a collection (scanning or digital photography) and the conversion processes performed (resizing, decreasing bit depth, converting file formats, etc.) - is vital if the results are to remain consistent and reliable.

By documenting the workflow of digitisation, a life history can be built up for each digitised item. This information is an important way of recording decisions, tracking problems, helping to maintain consistency and giving users confidence in the quality of your work.

What to Record

Workflow documentation should enable us to tell what the current status of an item is, and how it has reached that point. To do this the documentation needs to include important details about each stage in the digitisation process and its outcome.

  1. What action was performed at a specific stage? Identify the action performed. For example, resizing an image.
  2. Why was the action performed? Establish the reason that a change was made. For example, a photograph was resized to meet pre-agreed image standards.
  3. When was the action performed? Indicate the specific date the action was performed. This will enable project development to be tracked through the system.
  4. How was the action performed? Ascertain the method used to perform the action. A description may include the application in use, the machine ID, or the operating system.
  5. Who performed the action? Identify the individual responsible for the action. This enables actions to be tracked and similar problems identified in related data.

By recording the answers to these five questions at each stage of the digitisation process, the progress of each item can be tracked, providing a detailed breakdown of its history. This is particularly useful for tracking errors and locating similar problems in other items.

The actual digitisation of an item is clearly the key point in the workflow and therefore formal capture metadata (metadata about the actual digitisation of the item) is particularly important.
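As a simple illustration (not a prescribed schema - the element names, item identifier and values below are hypothetical), the answers to the five questions for a single workflow stage could be recorded in an XML fragment along the following lines:

<workflow_stage item="photo_0042">
  <what>Image resized to 1024 x 768 pixels</what>
  <why>To meet the pre-agreed delivery image standard</why>
  <when>2004-03-15</when>
  <how>Adobe Photoshop 7.0 on workstation DIGI-03</how>
  <who>J. Smith</who>
</workflow_stage>

One such record per stage, appended to the item's history, builds up the life history described above.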

Where to Record the Information

Where possible, select an existing schema with a binding to XML:

Quality Assurance

To check your XML document for errors, QA techniques should be applied:
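One common technique is validation: if each record declares a document type or schema, a validating parser can confirm that the required elements are present and correctly ordered. A minimal, purely illustrative sketch, reusing the hypothetical workflow record above with an internal DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE workflow_stage [
  <!ELEMENT workflow_stage (what, why, when, how, who)>
  <!ATTLIST workflow_stage item CDATA #REQUIRED>
  <!ELEMENT what (#PCDATA)>
  <!ELEMENT why (#PCDATA)>
  <!ELEMENT when (#PCDATA)>
  <!ELEMENT how (#PCDATA)>
  <!ELEMENT who (#PCDATA)>
]>
<workflow_stage item="photo_0042">
  <what>Image resized to 1024 x 768 pixels</what>
  <why>To meet the pre-agreed delivery image standard</why>
  <when>2004-03-15</when>
  <how>Adobe Photoshop 7.0 on workstation DIGI-03</how>
  <who>J. Smith</who>
</workflow_stage>

A validating parser will reject any record missing one of the five elements, catching incomplete documentation before it enters the archive.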

Further Information


Briefing 21

QA for GIS Interoperability


Background

Quality assurance is essential to ensure GIS (Geographic Information System) data is accurate and can be manipulated easily. To ensure data is interoperable, the designer should audit the GIS records and check them for incompatibilities and errors.

Ensure Content Is Available In An Appropriate GIS Standard

Interoperability between GIS standards is encouraged, enabling complex data types to be compared in unexpected ways. However, the varying standards can limit the potential uses of the data, and designers are often limited by the formats available in different tools. When possible, it is advisable to use OpenGIS - an open, multi-subject standard developed by an international standards consortium.

Resolve Differences In The Data Structures

To integrate data from multiple databases, the data must be stored in a compatible field structure. Complementary fields in the source and target databases must be of a compatible type (Integer, Floating Point, Date, a Character field of an appropriate length etc.) to avoid the loss of data during the integration process. Checks should also be made that specific fields that are incompatible with similar products (e.g. dBase memo fields) are exported correctly. Specialist advice should be taken to ensure the memo information is not lost.

Ensure Data Meet The Required Standards

Databases are often created in an ad hoc manner without consideration of later requirements. To improve interoperability the designer should ensure data complies with relevant standards. Examples include the BS7666 standard for British postal addresses and the RCHME Thesauri of Architectural Types, Monument Types, and Building Materials.

Compensate For Different Measurement Systems

The merging of two different data sources is likely to present specific problems. When combining two GIS tables, the designer should consider the possibility that they have been constructed using different projection systems (a method of representing the Earth's three-dimensional form on a two-dimensional plane and locating landmarks by a set of co-ordinates). Projection co-ordinate systems vary across nations and through time: the US has five primary co-ordinate systems in use that differ significantly from each other. The British National Grid removes this confusion by using a single co-ordinate system, but can cause problems when merging contemporary maps with pre-1940 maps that were based upon the Cassini projection. This may produce incompatibilities and unexpected results when plotted, such as boundaries and landmarks moving to different locations, which will need to be rectified before any real benefits can be gained. The designer should understand the projection system used for each layer in order to compensate for inaccuracies.

Ensure Precise Measurements Are Accurate

When recreating real-world objects measured by two different people, the designer should note the degree of accuracy used. One person may measure to the nearest millimetre, while the other measures to the centimetre. To check this, the designer should answer the following questions:

  1. How many figures are shown after the decimal point (e.g. 2.12 cm)?
  2. Is this figure consistent with the second designer's measurement methods?
  3. Has the value been rounded up or down, or has a third figure been removed?

These subtle differences may influence the resulting model, particularly when designing smaller objects.

Further Information

Briefing 22

Choosing A Suitable Digital Rights Solution


Background

Digital Rights Management (DRM) refers to any method for a software developer to monitor, control, and protect digital content. It was developed primarily as an advanced anti-piracy method to prevent illegal or unauthorised distribution of content. Common examples of DRM include watermarks, licensing, and user registration. It is in use by Microsoft and other businesses to prevent unauthorised copying and use of their software (obviously, the different protection methods do not always work!).

For institutions, DRM can have limited application. Academia actively encourages free dissemination of work, so stringent restrictive measures are often unnecessary. However, it can be useful in limiting plagiarism: an institution is able to distribute lecture notes or software without allowing the user to reuse text or images within their own work. Usage of software packages or sites can also be tracked, enabling specific content to be displayed to different users. To achieve these goals different methodologies are available.

Why do I need Digital Rights Management?

As stated above, Digital Rights Management is not appropriate for all organisations. It can introduce additional complexity into the development process, limit use and cause unforeseen problems at a later date. The following questions will assess your needs:

  1. Do you trust your users to use your work without plagiarising or stealing it?

  2. If the answer to question 1 is yes, do you wish to track unauthorised distribution or impose rules to prevent it?

  3. Will you be financially affected if your work is distributed without permission?

  4. Will digital rights restrictions interfere with the project goals and legitimate usage?

  5. In terms of cost and time management, can you afford to implement DRM restrictions?

  6. If the answer to question 5 is yes, can you afford a strong and costly level of protection (restrictive digital rights) or weak protection (supportive) that costs significantly less?

What types of DRM Methodologies Exist?

Digital rights methodologies can be divided into two types: supportive and restrictive. The first relies upon the user's honesty to register or acquire a licence for specific functionality. In contrast, the restrictive method assumes the user is untrustworthy, placing barriers (e.g. encryption and other preventive methods) to thwart casual users who attempt to infringe copyright.

1) Supportive digital rights

The simplest and most cost-effective DRM strategy is supportive digital rights. This requires the user to register before they are allowed access to data, blocking all non-authorised users. It assumes that individuals will be less likely to distribute content if they can be identified as the source of the leak. Web sites are the most common use of this protection method. For example, Athens, the NYTimes and other portals provide registration forms or licence agreements that the user must complete before access is allowed. The disadvantage of this protection method is that the individual can easily copy or distribute data once they have it. Supportive digital rights are suited to organisations that want to place restrictions upon who can access specific content, but do not wish to restrict how that content is used by legitimate users.

2) Restrictive digital rights

Restrictive digital rights are more costly, but place more stringent controls over the user. They operate by checking whether the user is authorised to perform a specific action and, if not, preventing them from doing it. Unlike supportive rights management, they ensure that content cannot be reused at a later date, even if it has been saved to hard disk. This is achieved by incorporating watermarks and other identification methods into the content.

Restrictive digital rights can be divided into two sub-categories:

Restrictive digital rights implementations are costly and time-consuming, making them potentially unobtainable for the majority of service providers. For a data archive it is easier to prevent unauthorised access to resources than it is to limit use once users actually possess the information.

Ensuring Interoperability

Digital rights management relies upon recording information and storing it in a standard format that others can use to identify copyrighted work. Current digital rights systems therefore establish a standard metadata schema to identify ownership.

Two options are available to achieve this goal: create a bespoke solution or use an established rights schema. An established rights schema provides a detailed list of identification criteria that can be used to catalogue a collection and establish copyright holders at different stages. Two possible choices for multiple media types are:

Summary

Digital rights management is an important issue that allows an institution to establish its intellectual property rights. However, it can be costly for small organisations that simply wish to protect their image collection. The choice of supportive or restrictive digital rights is therefore likely to be influenced by the value of the data in relation to the implementation cost.

Further Information


Briefing 23

Recording Digital Sound


Background

The digitisation of audio can be a complex process. This document contains quality assurance techniques for producing effective audio content, taking into consideration the impact of sample rate, bit-rate and file format.

Sample Rates

Sample rate defines the number of samples that are recorded per second. It is measured in Hertz (cycles per second) or kilohertz (thousands of cycles per second). The following table describes four common benchmarks for audio quality. These offer gradually improving quality at the expense of file size.

Table 1: Description of the various sample frequencies available
Samples per second Description
8kHz Telephone quality.
11kHz At 8 bits, mono, produces passable voice quality at a reasonable file size.
22kHz Half of the CD sampling rate. At 8 bits, mono, good for a mix of speech and music.
44.1kHz Standard audio CD sampling rate. A standard for 16-bit linear signed mono and stereo file formats.

The audio quality will improve as the number of samples per second increases. A higher sample rate enables a more accurate reconstruction of a complex sound wave to be created from the digital audio file. To record high quality audio a sample rate of 44.1kHz should be used.
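As a rough worked example of how sample rate translates into data volume (a simple calculation, assuming uncompressed 16-bit stereo audio):

  44,100 samples/s x 16 bits x 2 channels = 1,411,200 bits/s (approximately 1,411 kbps)
  1,411,200 bits/s x 60 s / 8 = 10,584,000 bytes (approximately 10.584 MB per minute)

This corresponds to the CD-quality figures given in Table 2 in the next section.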

Bit-rate

Bit-rate indicates the amount of audio data being transferred at a given time. The bit-rate can be recorded in two ways - variable or constant. A variable bit-rate creates smaller files by removing inaudible sound. It is therefore suited to Internet distribution in which bandwidth is a consideration. A constant bit-rate, in comparison, records audio data at a set rate irrespective of the content. This produces a replica of an analogue recording, even reproducing potentially unnecessary sounds. As a result, file size is significantly larger than those encoded with variable bit-rates.

Table 2 indicates how a constant bit-rate affects the quality and file size of an audio file.

Table 2 Indication of audio quality expected with different bit-rates
Bit rate (kbps) Quality MB/min
1411 CD quality 10.584
192 Good CD quality 1.440
128 Near CD quality 0.960
112 Near CD quality 0.840
64 FM quality 0.480
32 AM quality 0.240
16 Short-wave quality 0.120

Digital Audio Formats

The majority of audio formats use lossy compression to reduce file size by removing superfluous audio data. Master audio files should ideally be stored in a lossless format to preserve all audio data.

Table 3 Common Digital Audio Formats
Format Compression Streaming support Bit-rate Popularity
MPEG Audio Layer III (MP3) Lossy Yes Variable Common on all platforms
Mp3PRO (MP3) Lossy Yes Variable Limited support
Ogg Vorbis (OGG) Lossy Yes Variable Limited support
RealAudio (RA) Lossy Yes Variable Popular for streaming
Microsoft wave (WAV) Lossless Yes Constant Primarily for Windows
Windows Media (WMA) Lossy Yes Variable Primarily for Windows

Conversion between digital audio formats can be complex. If you are producing audio content for Internet distribution, a lossless-to-lossy (e.g. WAV to MP3) conversion will significantly reduce bandwidth usage. Only lossless-to-lossy conversion is advised. The conversion process of lossy-to-lossy will further degrade audio quality by removing additional data, producing unexpected results.

What Is The Best Solution?

Whether digitising analogue recordings or converting digital sound into another format, sample rate, bit rate and format compression will affect the resulting output. Quality assurance processes should compare the technical and subjective quality of the digital audio against the requirements of its intended purpose.

A simple suite of subjective criteria should be developed to check the quality of the digital audio. Specific checks may include the following questions:

Objective technical criteria should also be measured to ensure each digital audio file is of consistent or appropriate quality:

Further Information

Briefing 24

Handling International Text


Background

Digital text is one of the oldest description methods, but remains divided by differing file formats, encoding methods and schemas. When choosing a digital text format it is necessary to establish the project needs. Is plain text suitable for the task, or are text markup and formatting required? How will the information be displayed and where? This document describes these issues and provides some guidelines for their use.

What is the Best Tool for the Job?

Digital text has existed in one form or another since the 1960s. Many computer users take for granted that they can quickly write a letter without restriction or technical considerations. A commercial project, however, requires consideration of long-term needs and goals. To avoid complications at a later date, the developer must ensure the tools in use are the most appropriate for the task and, if not, what can be used in their place. To achieve this three questions should be answered:

  1. How will textual information be viewable for the user?
  2. What problems may I encounter if textual information is stored incorrectly?
  3. How will textual information be organized?

File Formats

It is often assumed that everyone can read text. However, this is not always the case. Digital text imposes restrictions upon the content that can have a significant impact upon the project.

In particular, there are two main issues:

The choice of format will be dependent upon the following factors:

Character Encoding

For universal information access, plain text remains useful. It has the advantage of being simple to interpret and small in file size. However, there are some differences in the methods used to encode text characters. The most common variations are ASCII (American Standard Code for Information Interchange) and Unicode.

Problems

Several problems may be encountered when storing textual information. For text files it is a simple process to convert the file to Unicode. However, for more complex data, such as databases, the conversion process will become more difficult. Problems may include:

Structural Mark-up

Although ASCII and Unicode are useful for storing information, they are only able to describe each character, not how characters should be displayed or organised. Structural mark-up languages enable the designer to dictate how information will appear and to establish a structure for its layout. For example, the user can define tags to store a book's author and publication date.
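As a minimal illustration (the element names here are invented for the example, not taken from any particular schema), such a structure might look like:

<book>
  <author>Smith, Jane</author>
  <publication_date>1998-05-01</publication_date>
</book>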

The use of structural mark-up can provide many organizational benefits:

The most common markup languages are SGML and XML. Based upon these languages, several schemas have been developed to organize and define data relationships. This allows certain elements to have specific attributes that define their method of use (see the Digital Rights document for more information). To ensure interoperability, XML is advised due to its support for contemporary Internet standards (such as Unicode).

Further Information


Briefing 25

Choosing A Suitable Digital Video Format


Background

Digital video can have a dramatic impact upon the user. It can reflect information that is difficult to describe in words alone, and can be used within an interactive learning process. This document contains guidelines to best practice when manipulating video. When considering the recording of digital video, the digitiser should be aware of the influence of file format, bit-depth, bit-rate and frame size upon the quality of the resulting video.

Composition of a Digital Video File

Digital video consists of a series of images played in rapid succession to create the illusion of movement, commonly accompanied by an audio track. Unlike graphics and sound files, which are relatively small, video data can be hundreds of megabytes, or even gigabytes, in size.

The visual and audio information are individually stored within a digital 'wrapper', an umbrella structure consisting of the video and audio data, as well as the information needed to play back and resynchronise that data.

What is the Best Solution?

Digital video remains a complex area that combines the problems of audio and graphic data. When choosing to encode video the designer must consider several issues:

  1. Are there any existing procedures to guide the encoding process?
  2. What type of delivery method will be used to distribute the video?
  3. What video quality is acceptable to the user?
  4. What type of problems are likely to be encountered?

Distribution Methods

The distribution method will have a significant influence upon the file format, encoding type and compression used in the project.

Removable media - Video distributed on CD-ROM or DVD is suited to progressive encoding methods that do not conduct extensive error checking. Although file size is not as critical as it is for Internet streaming, it continues to have some influence.

The compression type is dependent upon the needs of the user and the type of removable media:

Name | Streaming | Progressive | Media | Compression
Advanced Streaming Format (ASF) | Y | - | - | Temporal
Audio Video Interleave (AVI) | - | Y | - | Temporal
MPEG-1 | - | Y | VideoCD | Temporal
MPEG-2 | - | Y | DVD | Temporal
QuickTime (QT) | Y | Y | - | Temporal
QuickTime Pro | Y | Y | - | Temporal
RealMedia (RM) | Y | Y | - | Temporal
Windows Media Video (WMV) | Y | Y | - | Temporal
DivX | - | Y | Amateur CD distribution | Temporal
MJPEG | - | Y | - | Spatial

Table 1: A comparison list of the different file formats, highlighting their intended purpose and compression method.

Video Quality

The provision of video data for an Internet-based audience places specific restrictions upon the content. Quality of the video output is dependent upon three factors:

Screen Size Pixels per frame Bit depth (bits) Frames per second Bandwidth required (Mbit/s)
640 x 480 307,200 24 30 221.184
320 x 240 76,800 16 25 30.72
320 x 240 76,800 8 15 9.216
160 x 120 19,200 8 10 1.536
160 x 120 19,200 8 5 0.768

Table 2: Indication of the influence that screen size, bit-depth and frames per second have upon required bandwidth

When creating video, the designer must balance the video quality with the facilities available to the end user. As an example, an 8-bit screen of 160 x 120 pixels, and 10-15 frames per second is used for the majority of content found on the Internet.
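The bandwidth figures in Table 2 follow directly from these factors; taking the first row as a worked example (uncompressed, before any of the compression methods described above are applied):

  640 x 480 pixels x 24 bits x 30 frames/s = 221,184,000 bits/s (221.184 megabits per second)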

Problems

Video presents numerous problems for the designer caused by the complexity of formats and structure. Problems may include:

Definitions

Temporal Compression - Reduces the amount of data stored over a sequence of frames. Rather than describing every pixel in each frame, temporal compression stores a key frame, followed by descriptive information on changes.

Spatial Compression - Condenses each frame independently by mapping similar pixels within a frame. For example, two shades of red will be merged. This results in a reduction in image quality, but enables the file to be edited in its original form.

Progressive Encoding - Refers to any format where the user is required to download the entire video before they are allowed to watch it.

Internet Streaming - Enables the viewer to watch sections of video without downloading the entire file, allowing users to evaluate video content after just a few seconds. Quality is significantly lower than progressive formats due to the compression used.

Further Information

Briefing 26

Intellectual Property Rights


Introduction

Internet IPR (Intellectual Property Rights) is inherently complex: it crosses geographical boundaries, creating situations that are illegal in one country yet not in another, or that contradict existing laws on intellectual property. Copyright is a subset of IPR which applies to all artistic works. It is automatically assigned to the creator of original material, allowing them to control all public usage (copying, adaptation, performance and broadcasting).

Ensuring that your organization complies with Intellectual Property rights requires a detailed understanding of two processes:

  1. Managing copyright on your own work.
  2. Establishing ownership of third-party copyright.

Managing Copyright on Own Work

Unless indicated otherwise, copyright is assigned to the author of an original work. When producing work it is essential to establish who will own the resulting product: the individual or the institution. Objects produced at work or university may belong to the institution, depending upon the contract signed by the author. For example, the copyright for this document belongs to the AHDS, not the author. When approaching the subject, the author should consider several issues:

When producing work as an individual that is intended for later publication, the author should establish ownership rights to indicate how work can be used after initial publication:

Copyright Clearance

Copyright is an automatically assigned right. It is therefore likely that the majority of works in a digital collection will be covered by copyright, unless explicitly stated. The copyright clearance process requires the digitiser to check the copyright status of:

Copyright clearance should be established at the beginning of a project. If clearance is denied after the work has been included in the collection, it will require additional effort to remove it and may result in legal action from the author.

In the event that an author, or authors, cannot be found, the project is required to demonstrate that it has taken steps to contact them. Digital preservation projects are particularly difficult in this respect, as many years may separate the researcher and the copyright owner. In many cases, most recently the 1986 Domesday project, it has proven difficult to trace the authorship of 1,000+ pieces of work to individuals. In that project, the designers created a method of establishing permission and registering objections by providing contact details that an author could use to identify their work.

Indicating IPR through Metadata

If permission has been granted to reproduce copyright work, the institution is required by law to indicate intellectual property status. Metadata is commonly used for this purpose, storing and distributing IP data for online content. Several metadata bodies provide standardized schemas for copyright information. For example, IP information for a book could be stored in the following format.

<book id="bk112">
  <author>Galos, Mike</author>
  <title>Visual Studio 7: A Comprehensive Guide</title>
  <publish_date>2001-04-16</publish_date>
  <publisher>Addison Press</publisher>
  <copyright>Galos, M. 2001</copyright>
</book>

Access inhibitors can also be set to identify copyright limitations and the methods necessary to overcome them. For example, limiting e-book use to IP addresses within a university environment.
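As an alternative to a bespoke format, the same information can be expressed with an established schema such as Dublin Core. The sketch below is illustrative only: the wrapping <record> element is hypothetical, and the rights statement shows how an access inhibitor might be described rather than a formal rights language.

<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Visual Studio 7: A Comprehensive Guide</dc:title>
  <dc:creator>Galos, Mike</dc:creator>
  <dc:publisher>Addison Press</dc:publisher>
  <dc:date>2001-04-16</dc:date>
  <dc:rights>Copyright Galos, M. 2001. Access limited to university IP addresses.</dc:rights>
</record>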

Further Information


Briefing 27

Implementing Quality Assurance For Digitisation


Background

Digitisation often involves working with hundreds or thousands of images, documents, audio clips or other types of source material. Ensuring these objects are digitised consistently, and to a standard that makes them suitable for their intended purpose, can be complex. Rather than being treated as an afterthought, quality assurance should be considered an integral part of the digitisation process, and used to monitor progress against quality benchmarks.

Quality Assurance Within Your Project

The majority of formal quality assurance standards, such as ISO9001, are intended for large organisations with complex structures. A smaller project will benefit from establishing its own quality assurance procedures, using these standards as a guide. The key is to understand how work is performed and identify key points at which quality checks should be made. A simple quality assurance system can then be implemented that will enable you to monitor the quality of your work, spot problems and ensure the final digitised object is suitable for its intended use.

ISO 9001 identifies three steps in the introduction of a quality assurance system:

  1. Brainstorm: Identify specific processes that should be monitored for quality and develop ways of measuring the quality of these processes. You may want to think about:
    • Project goals: who will use the digitised objects and what function will they serve.
    • Delivery strategy: how will the digitised objects be delivered to the user? (Web site, Intranet, multimedia presentation, CD-ROM).
    • Digitisation: how will data be analysed or created. To ensure consistency throughout the project, all techniques should be standardized.
  2. Education: Ensure that everyone is familiar with the use of the system.
  3. Improve: Monitor your quality assurance system and look for problems that require correction or other ways it may be improved.

Key Requirements For A Quality Assurance System

First and foremost, any system for assuring quality in the digitisation process should be straightforward and not impede the actual digitisation work. Effective quality assurance can be achieved by performing four processes during the digitisation lifecycle:

  1. The key to a successful QA process is to establish a clear and concise work timeline and, using a step-by-step process, document how this will be achieved. This will provide a baseline against which actual work can be checked, promoting consistency and making it easier to spot when digitisation is not going according to plan.
  2. Compare the digital copy with the physical original to identify changes and ensure accuracy. This may include, but is not limited to, colour comparisons, accuracy of text that has been scanned through OCR software, and reproduction of significant characteristics that give meaning to the digitised data (e.g. italicised text, colours).
  3. Perform regular audit checks to ensure consistency throughout the resource. Qualitative checks can be performed upon the original and modified digital work to ensure that any changes were intentional and processing errors have not been introduced. Subtle differences may appear in a project that takes place over a significant time period or is divided between different people. Technical checks may include spell checkers and the use of a controlled vocabulary to allow only certain specifically designed descriptions to be used. These checks will highlight potential problems at an early stage, ensuring that staff are aware of inconsistencies and can take steps to remove them. In extreme cases this may require the re-digitisation of the source data.
  4. Finally, measures should be taken to establish some form of audit trail that tracks progress on each piece of work. Each stage of work should be 'signed off' by the person responsible, and any unusual circumstances or decisions made should be recorded.

The ISO 9001 system is particularly useful in identifying clear guidelines for quality management.

Summary

Digitisation projects should implement a simple quality assurance system. Implementing internal quality assurance checks within the workflow allows mistakes to be spotted and corrected early on, and also provides points at which work can be reviewed and improvements to the digitisation process implemented.

Further Information


Briefing 29

Choosing A Vector Graphics Format For The Internet


Background

The market for vector graphics has grown considerably, in part, as a result of improved processing and rendering capabilities of modern hardware. Vector-based images consist of multiple objects (lines, ellipses, polygons, and other shapes) constructed through a sequence of commands or mathematical statements to plot lines and shapes in a two-dimensional or three-dimensional space. For Internet usage, this enables graphics to be resized to ever increasing screen resolutions without concern that an image will become 'jaggy' or unrecognisable.

File Formats

Several vector formats exist for use on the Internet. These construct information in the same way yet provide different functionality. The table below provides a breakdown of the main formats.

Name | Developer | Availability | Viewers | Uses
Scalable Vector Graphics (SVG) | W3C | Open standard | Internet browser | Internet-based graphics
Shockwave/Flash | Macromedia | Proprietary | Flash plugin for browser | Video media and multimedia presentation
Vector Markup Language (VML) | Microsoft (submitted to W3C as a Note) | Proprietary | MS Office, Internet Explorer, etc. | XML-based graphics exported by Microsoft products

For Internet delivery of static images, the W3C recommends SVG as a standard open format for vector diagrams. VML is also common, being the XML language exported by Microsoft products. For text-based vector files, such as SVG and VML, it is recommended that content be saved in Unicode.

If the vector graphics are to be integrated into a multimedia presentation or animation, Shockwave and Flash offer significant benefits, enabling vector animation to be combined with audio.

Creating Vector Graphics

A major feature of vector graphics is the ability to construct detailed objects that can be resized without quality loss. XML (Extensible Markup Language) syntax, the basis of the SVG and VML languages, is understandable by non-technical users who wish to understand the object being constructed. The example below demonstrates the ability to create shapes using a few commands: the circle described in Figure 1 is created by the accompanying code.

<svg xmlns="http://www.w3.org/2000/svg" width="8in" height="8in">
  <desc>This is a red circle with a black outline</desc>
  <g>
    <circle style="fill: red; stroke: black" cx="200" cy="200" r="100"/>
    <text x="2in" y="2in">Hello World</text>
  </g>
</svg>

Figure 1: SVG graphics and associated code

XML Conventions

Although XML enables the creation of a diversity of data types, it is extremely strict regarding syntax. To remain consistent throughout multiple documents and avoid future problems, several conventions are recommended:

The use of XML enables a high level of interoperability between formats. When converting for a target audience, the designer has two options:

  1. Vector-to-Raster conversion - Raster conversion should be used for illustrative purposes only. The removal of all coordinate data eliminates the ability to edit files at a later date.
  2. Vector-to-Vector conversion - Vector-to-vector conversion enables data to be converted into different languages. The use of XML enables the user to manually convert between two different formats (e.g. SVG to VML), as illustrated in the sketch below.
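As a rough illustration of such a vector-to-vector conversion (treat this as a sketch rather than a definitive mapping - VML attribute handling varies between implementations), the red circle from Figure 1 might be expressed in VML as:

<div xmlns:v="urn:schemas-microsoft-com:vml">
  <v:oval style="width:200px;height:200px" fillcolor="red" strokecolor="black"/>
</div>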

At the start of development it may help to ask your team the following questions:

  1. What type of information will the graphics convey? (Still images, animation and sound, etc.)
  2. What type of browser/operating system will be used to access the content? (Older browsers and non Mac/PC browsers have limited or no support for XML-based languages.)

Further Information


Briefing 37

Top 10 Quality Assurance Tips


The Top 10 Tips

1 Document Your Policies

You should ensure that you document policies for your project - remember that it can be difficult to implement quality if there isn't a shared understanding across your project of what you are seeking to achieve. For example, see the QA Focus policies on Web standards and link checking [1] [2].

2 Ensure Your Technical Infrastructure Is Capable Of Implementing Your Policies

You should ensure that your technical infrastructure is capable of implementing your policies. For example, if you wish to make use of XHTML on your Web site you are unlikely to be able to achieve this if you are using Microsoft Word as your authoring tool.

3 Ensure That You Have The Resources Necessary To Implement Your Policies

You should ensure that you have the resources needed to implement your policies. This can include technical expertise, investment in software and hardware, investment in training and staff development, etc.

4 Implement Systematic Checking Procedures To Ensure Your Policies Are Being Implemented

Without systematic checking procedures there is a danger that your policies are not implemented in practice. For example, see the QA Focus checking procedures for Web standards and linking [3] [4].

5 Keep Audit Trails

You should seek to provide audit trails which record the results of your checking procedures. This can help to spot trends which may indicate failures in your procedures (for example, a sudden growth in the number of non-compliant HTML resources may be due to deployment of a new authoring tool, or a lack of adequate training for new members of the project team).

6 Learn From Others

Rather than seeking to develop quality assurance policies and procedures from scratch you should seek to learn from others. You may find that the QA Focus case studies [5] provide useful advice which you can learn from.

7 Share Your Experiences

If you are in the position of having deployed effective quality assurance procedures it can be helpful for the wider community if you share your approaches. For example, consider writing a QA Focus case study [6].

8 Seek 'Fitness For Purpose' - Not Perfection

You should seek to implement 'fitness for purpose' which is based on the levels of funding available and the expertise and resources you have available. Note that perfection is not necessarily a useful goal to aim for - indeed, there is a danger that 'seeking the best may drive out the good'.

9 Remember That QA Is For You To Implement

Although the QA Focus Web site provides a wide range of resources which can help you to ensure that your project deliverables are interoperable and widely accessible, you should remember that you will need to implement quality assurance within your own project.

10 Seek To Deploy QA Procedures More Extensively

Rather than implementing quality assurance only within your project, it can be beneficial if quality assurance is implemented at a higher level, such as within your department or organisation. If you have an interest in more widespread deployment of quality assurance, you should read about the ISO 9000 QA standards [7].

References

  1. Policy on Web Standards, QA Focus, UKOLN,
    <http://www.ukoln.ac.uk/qa-focus/qa/policies/web/>
  2. Policy on Linking, QA Focus, UKOLN,
    <http://www.ukoln.ac.uk/qa-focus/qa/policies/links/>
  3. Procedures for Web Standards, QA Focus, UKOLN,
    <http://www.ukoln.ac.uk/qa-focus/qa/procedures/web/>
  4. Procedures for Linking, QA Focus, UKOLN,
    <http://www.ukoln.ac.uk/qa-focus/qa/procedures/links/>
  5. Case Studies, QA Focus, UKOLN,
    <http://www.ukoln.ac.uk/qa-focus/documents/case-studies/>
  6. Contributing To Case Studies, QA Focus, UKOLN,
    <http://www.ukoln.ac.uk/qa-focus/documents/case-studies/#contributing>
  7. Selection and Use of the ISO 9000:2000 family of standards, ISO,
    <http://www.iso.org/iso/en/iso9000-14000/understand/selection_use/selection_use.html>

Briefing 62

Digitising Data For Preservation


Background

Digitisation is a production process. Large numbers of analogue items, such as documents, images, audio and video recordings, are captured and transformed into the digital masters that a project will subsequently work with. Understanding the many variables and tasks in this process - for example the method of capturing digital images in a collection (scanning or digital photography) and the conversion processes performed (resizing, decreasing bit depth, converting file formats, etc.) - is vital if the results are to remain consistent and reliable. By documenting the workflow of digitisation, a life history can be built up for each digitised item. This information is an important way of recording decisions, tracking problems, helping to maintain consistency and giving users confidence in the quality of your work.

What to Record

Workflow documentation should enable us to tell what the current status of an item is, and how it has reached that point. To do this the documentation needs to include important details about each stage in the digitisation process, and its outcome.

By recording the answers to these five questions (what, why, when, how and who) at each stage of the digitisation process, the progress of each item can be tracked, providing a detailed breakdown of its history. This is particularly useful for tracking errors and locating similar problems in other items. The actual digitisation of an item is clearly the key point in the workflow, and therefore formal capture metadata (metadata about the actual digitisation of the item) is particularly important.

Where to Record the Information

Where possible, select an existing schema with a binding to XML:

Quality Assurance

To check your XML document for errors, QA techniques should be applied:

Further Information


Briefing 65

Audio For Low-Bandwidth Environments


Background

Audio quality is surprisingly difficult to predict in a digital environment. Quality and file size can depend upon a range of factors, including vocal type, encoding method and file format. This document provides guidelines on the most effective method of handling audio.

Factors To Consider

When creating content for the Internet it is important to consider the hardware the target audience will be using. Although the number of users with a broadband connection is growing, the majority of Internet users utilise a dial-up connection to access the Internet, limiting them to a theoretical 56kbps (kilobits per second). To cater for these users, it is useful to offer smaller files that can be downloaded faster.

The file size and quality of digital audio is dependent upon two factors:

  1. File format
  2. Type of audio

By understanding how these factors contribute to the actual file size, it is possible to create digital audio that requires less bandwidth, but provides sufficient quality to be understood.

File Format

File format denotes the structure and capabilities of digital audio. When choosing an audio format for Internet distribution, a lossy format that encodes using a variable bit-rate is recommended. Streaming support is also useful for delivering audio data over a sustained period without the need for an initial download. These formats use mathematical calculations to remove superfluous data and compress it into a smaller file size. Several popular formats exist, many of which are household names. MP3 (MPEG Audio Layer III) is popular for Internet radio and non-commercial use. Larger organisations, such as the BBC, use RealAudio (RA) or Windows Media Audio (WMA), partly because of their digital rights support. Table 1 shows a few of the options that are available.

Format Compression Streaming Bit-rate
MP3 Lossy Yes Variable
Mp3PRO Lossy Yes Variable
Ogg Vorbis Lossy Yes Variable
RealAudio Lossy Yes Variable
Windows Media Audio Lossy Yes Variable

Table 1: File Formats Suitable For Low-Bandwidth Delivery

Once recorded audio is saved in a lossy format, it is wise to listen to the audio data to ensure it is audible and that essential information has been retained.

Finally, it is recommended that a variable bit-rate is used. For speech, this will usually vary between 8 and 32kbps as needed, adjusting the rate accordingly if incidental music occurs during a presentation.
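To put these figures in context, a rough worked example (assuming an ideal, uninterrupted connection):

  A one-minute speech clip at 32 kbps = 32 x 60 = 1,920 kilobits (about 240 kB)
  Download time over a 56 kbps dial-up line = 1,920 / 56 = roughly 34 seconds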

Choosing An Appropriate Encoding Method

The audio quality required, in terms of bit-rate, to record audio data is influenced significantly by the type of audio that you wish to record: music or voice.

Assessing Quality Of Audio Data

The creation of audio data for low-bandwidth environments does not necessitate a significant loss in quality. The audio should remain audible in its compressed state. Specific checks may include the following questions:

Further Information


Briefing 66

Producing And Improving The Quality Of Digitised Images


Introduction

To produce high-quality digital images you should follow certain rules to ensure that the image quality is sufficient for the purpose. This document presents guidance on digitising and improving image quality when producing a project Web site.

Choose Suitable Source Material

Quality scans start with quality originals - high-contrast photos and crisp B&W line art will produce the best results. Muddy photos and light-coloured line art can be compensated for, but the results will never be as good as with high-quality originals. The use of bad photos, damaged drawings, or tear sheets - pages that have been torn from books, brochures, and magazines - will have a detrimental effect upon the resultant digital copy. If multiple copies of a single image exist, it is advisable to choose the one with the highest quality.

Scan at a Suitable Resolution

It is often difficult to improve scan quality at a later stage. It is therefore wise to scan the source according to consistent, pre-defined specifications. Criteria should be based upon the type of material being scanned and the intended use. Table 1 indicates the minimum quality that projects should choose:

Use | Type | Dots Per Inch (dpi)
Professional | Text | 200
Professional | Graphics | 600
Non-professional | Text | 150
Non-professional | Graphics | 300

Table 1: Guidelines To Scanning Source Documents

Since most scans require subsequent processing (e.g. rotating an image to align it correctly) that will degrade image quality, it is advisable to work at a higher resolution and resize the scans later.
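As a rough indication of the storage implications of scanning at higher resolutions (assuming an A4 original of about 8.3 x 11.7 inches captured in 24-bit colour, uncompressed):

  At 300 dpi: (8.3 x 300) x (11.7 x 300) = roughly 8.7 million pixels x 3 bytes = about 26 MB
  At 600 dpi the pixel count quadruples, giving roughly 105 MB per image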

Once the image has been scanned and saved in an appropriate file format, measures should be taken to improve the image quality.

Straighten Images

For best results, an image should lie with its sides parallel to the edges of the scanner glass. Although it is possible to straighten images that have been digitised askew, doing so may introduce unnecessary distortion into the digital image.

Sharpen the Image

To reduce the amount of subtle blur (or 'fuzziness') and improve visual quality, processing tools may be used to sharpen, smooth, improve the contrast level or perform gamma correction. Most professional image editing software contains filters that perform this function automatically.

Correct Obvious Faults

Scanned images are often affected by many problems. Software tools can be used to remove the most common faults:

Be careful not to apply the same effect twice. This can create unusual artefacts that distract the observer when viewing the picture.

Further Information


Briefing 67

Implementing and Improving Structural Markup


Background

Digital text has existed in one form or another since the 1960s. Many computer users take for granted that they can quickly write a letter without restriction or technical considerations. This document provides advice for improving the quality of structural mark-up, emphasising the importance of good documentation, use of recognised standards and providing mappings to these standards.

Why Should I Use Structural Mark-Up?

Although ASCII and Unicode are useful for storing information, they are only able to describe each character, not how characters should be displayed or organised. Structural mark-up languages enable the designer to dictate how information will appear and to establish a structure for its layout. For example, the user can define tags to store a book's author and publication date.

The use of structural mark-up can provide many organizational benefits:

The most common markup languages are SGML and XML. Based upon these languages, several schemas have been developed to organize and define data relationships. This allows certain elements to have specific attributes that define their method of use (see the Digital Rights document for more information). To ensure interoperability, XML is advised due to its support for contemporary Internet standards (such as Unicode).

Improving The Quality Of Structural Mark-Up

For organisations that already utilise structural mark-up the benefits will already be apparent. However, some consideration should be given to improving the quality of descriptive data. The key to improving data quality is twofold: utilise recognised standards whenever possible, and establish detailed documentation on all aspects of the schema.

Documentation: Documentation is an important, if often ignored, aspect of software development. Good documentation should establish the purpose of the structural data, provide examples, and identify the source of the data. Good documentation will allow others to understand the XML without ambiguity.

Use recognised standards
Although there are many circumstances where recognised schemas are insufficient for the required task, the designer should investigate relevant standards and attempt to merge their own bespoke solution with those standards. In the long term this will have several benefits:

  1. The project can take advantage of existing knowledge in the field, covering areas where the team has limited or no experience.
  2. Access to content is improved by supporting proven standards, such as SVG.
  3. The time required to map the data to alternative schemas used by other organisations is reduced significantly.

TEI, Dublin Core and others provide cross-subject metadata elements that can be combined with subject-specific languages.

Provide mappings to recognised standards
Through the creation of mappings to different schemas the developer will standardise and enhance their approach to schema creation, removing potential ambiguities and other problems that may arise. From an organisational standpoint, the mappings will also improve working relations between cooperating organisations and broaden the options for using information in new ways.
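
The sketch below illustrates one such mapping in Python, renaming hypothetical bespoke field names to Dublin Core terms; the bespoke names are assumptions, not part of any published schema:

  # Hypothetical bespoke field names mapped to Dublin Core terms.
  DC_MAPPING = {
      "photographer": "dc:creator",
      "captureDate": "dc:date",
      "caption": "dc:description",
      "place": "dc:coverage",
  }

  def to_dublin_core(record):
      # Rename bespoke keys to Dublin Core equivalents where a mapping exists.
      return {DC_MAPPING.get(key, key): value for key, value in record.items()}

  print(to_dublin_core({"photographer": "J. Smith", "captureDate": "1944"}))
  # {'dc:creator': 'J. Smith', 'dc:date': '1944'}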

Follow implementation conventions
In addition to implementing recognised standards, it is important that the developer follows existing conventions when constructing elements. Depending on the circumstances, this may involve the use of an existing data dictionary or an examination of XML naming rules. Controlled languages (for example, RDF, SMIL, MathML and SVG) use these conventions to encode specific localised knowledge.

Further Information


Briefing 68

Techniques To Assist The Location And Retrieval Of Local Images


Summary

Use of a consistent naming scheme and directory structure, as well as a controlled vocabulary or thesaurus, improves the likelihood that digitised content captured by many people over an extended period will be organized in a consistent manner that avoids ambiguity and can be quickly located.

This QA paper describes techniques to aid the storage and successful location of digital images.

Storing local images

Effective categorization of images stored on a local drive can be just as important as storing them in an image management system. Digitisation projects that involve the scanning and manipulation of a large number of images will benefit from a consistent approach to file naming and directory structure.

An effective naming convention should identify the categories that will aid the user when finding a specific file. To achieve this, the digitisers should ask themselves:

This can be better described with an example. A digitisation project is capturing photographs taken in wartime Britain. The team has identified location, year and photographer as the search criteria for locating images. To organize this information in a consistent manner the project team should establish a directory structure, common vocabulary and shorthand terms for describing specific locations. Figure 1 outlines a common description framework:

Figure 1: A sample naming convention

Potential Problems

To avoid problems that may occur when the image collection expands or is transferred to a different system, the naming convention should also take into account the possibility that:

Naming conventions will allow the project to avoid the majority of these problems. For example, a placeholder may be chosen if one of the identifiers is unknown (e.g. 'ukn' for unknown location, 9999 for year). Special care should be taken to ensure this placeholder is not easily mistaken for a known location or date. Additional criteria, such as other photo attributes or a numbering system, may also be used to distinguish images taken by the same person, in the same year, at the same location.
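
The following Python sketch illustrates one possible implementation of such a convention. The exact pattern shown in Figure 1 is not reproduced here, so the location_year_photographer_sequence pattern and the placeholder values are assumptions for illustration:

  def build_filename(location=None, year=None, photographer=None, sequence=1):
      # Unknown values fall back to the placeholders discussed above.
      loc = (location or "ukn").lower()
      yr = year if year is not None else 9999
      person = (photographer or "ukn").lower()
      return f"{loc}_{yr}_{person}_{sequence:04d}.tif"

  print(build_filename("london", 1941, "smith", 12))  # london_1941_smith_0012.tif
  print(build_filename(None, None, "smith"))          # ukn_9999_smith_0001.tif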

Identification of Digital Derivatives

Digital derivatives (i.e. images that have been altered in some way and saved under a different name) introduce further complications in distinguishing the original from the altered version. The approach will vary according to the type of changes made. At a simple level, you may choose a different file extension or store files in two separate directories (original and modified). Alternatively you may append additional criteria to the filename (e.g. _sm for smaller images or thumbnails, _orig and _modif for original and modified versions).
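
A short Python sketch, using the illustrative suffixes mentioned above, shows how derivative names could be generated consistently from a master filename:

  from pathlib import Path

  def derivative_name(master, suffix):
      # Append a suffix such as _orig, _modif or _sm before the extension.
      p = Path(master)
      return p.with_name(f"{p.stem}{suffix}{p.suffix}")

  print(derivative_name("london_1941_smith_0012.tif", "_orig"))  # london_1941_smith_0012_orig.tif
  print(derivative_name("london_1941_smith_0012.tif", "_sm"))    # london_1941_smith_0012_sm.tif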

Further Information


Briefing 71

QA Techniques For The Storage Of Image Metadata


Background

The archival of digital images requires consideration of the most effective method of storing technical and life-cycle information. Metadata is a common method used to describe digital resources; however, the different approaches available may confuse many users.

This paper describes QA techniques for choosing a suitable method of metadata storage that takes into account the need for interoperability and retrieval.

Choosing a Suitable Metadata Association Model

Metadata may be associated with an image in three ways:

Internal Model:
Metadata is stored within the image file itself, either through an existing metadata mapping or attached to the end of an image file in an ad hoc manner. Therefore, it is simple to transfer metadata alongside image data without special requirements or considerations. However, support for a metadata structure differs between file formats and assignment of the same metadata record to multiple images causes inefficient duplication in comparison to a single metadata record associated with a group of images.
External Model:
A unique identifier is used to associate external metadata with an image file, e.g. an image may be stored on a local machine while the metadata is stored on a server. This is better suited to a repository and is more efficient when storing duplicate information on a large number of objects. However, broken links may occur if the metadata record is not modified when an image is moved, or vice versa. Intellectual Property data and other information may be lost as a result.
Hybrid Model:
Uses both internally and externally associated metadata. Some metadata (file headers/tags) is stored directly in the image file while additional workflow metadata is stored in an external database. The deliberate design of the external record offers a common application profile across file formats and provides a method of incorporating format-specific metadata into the image file itself. However, it shares the disadvantages of the internal and external models in terms of duplication and broken links.

When considering the storage of image metadata, the designer should consider three questions:

  1. What type of metadata do you wish to store?
  2. Is the file format capable of storing metadata?
  3. What environment is the metadata intended to be stored and used within?

The answer to these questions should guide the choice of the metadata storage model. Some file formats are not designed to store metadata and will require supplementation through the external model; other formats may not store data in sufficient detail for your requirements (e.g. lifecycle data). Alternatively, you may require IP (Intellectual Property) data to be stored internally, which will require a file format that supports these elements.
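
As an illustration of the external model (or the external half of a hybrid model), the following Python sketch stores a metadata record in a sidecar JSON file keyed by a persistent identifier; the field names and identifier scheme are assumptions:

  import json

  # One external record (keyed by an identifier) describing a group of images.
  record = {
      "identifier": "coll-0042",
      "rights": "(c) Example Project, all rights reserved",
      "creator": "J. Smith",
      "images": ["london_1941_smith_0012.tif", "london_1941_smith_0013.tif"],
  }

  with open("coll-0042.json", "w", encoding="utf-8") as f:
      json.dump(record, f, indent=2)

  # If an image is renamed or moved, this record must be updated as well,
  # otherwise the link between image and metadata is broken.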

Ensuring Interoperability

Metadata is intended for the storage and retrieval of essential information regarding the image. In many circumstances, it is not possible to store internal metadata in a format that may be read by different applications. This may be for a number of reasons:

Before choosing a specific image format, you should ensure the repository software is able to extract metadata and that editing software does not corrupt the data if changes are made at a later date. To increase the likelihood of this, you should take one of the following approaches:

Although these measures will not guarantee interoperability, they will increase the likelihood that it is achieved.

Structuring Your Image Collection

To organise your image collection into a defined structure, it is advisable to develop a controlled vocabulary. If providing an online resource, it is useful to identify your potential users, the academic discipline from which they originate, and the language they will use to locate images. Many repositories have a well-defined user community (e.g. archaeology, physics, sociology) that shares a common language and similar goals. In a multi-discipline collection it is much more difficult to predict the terms a user will use to locate images. The US Library of Congress [3], the New Zealand Time Frames [4] and the International Press Telecommunications Council (IPTC) [5] provide online examples of how a controlled vocabulary hierarchy may be used to catalogue images.

References


Briefing 74

Improving The Quality Of Digitised Images


Summary

A digitised image requires careful preparation before it is suitable for distribution. This document describes a workflow for improving the quality of scanned images by correcting faults and avoiding common errors.

Preparing your master image

The sequence in which modifications are made will have a significant effect on the quality of the final image. Although conformance to a strict sequence is not always necessary, inconsistencies may be introduced if the order varies dramatically between images. The Technical Advisory Service for Images (TASI) recommends the following order:

  1. Does the image require rotation or cropping?
    In many circumstances, the digitiser will not require the entire image. Cropping an image to a specific size, shape or orientation will reduce the time required for the computer to manipulate the image and focus attention on the areas considered important.
  2. Are shades and colours difficult to distinguish?
    Scanners and digital cameras often group colours into a narrow density range, which makes it difficult to differentiate shades of the same colour. Use the Histogram function in Photoshop (or other software) and adjust the levels to make best use of the range of available tones.
  3. Is the colour balance accurate in comparison to the original?
    Some colours may change when digitised, e.g. bright orange may change to pink. Adjust the colour balance by modifying the Red, Green & Blue settings. Decreasing one colour increases its opposite.
  4. Are there faults or artefacts on the image?
    Visual checks should be performed on each image, or a selection of images, to identify faults, such as dust specks or scratches on the image.

Once you are satisfied with the results, the master image should be saved in a lossless image format - RGB Baseline TIFF Rev 6 or PNG are acceptable for this purpose.
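
The following Pillow-based Python sketch illustrates the order described above - straighten and crop first, then adjust the tonal range, then save the master in a lossless format. The rotation angle, crop box and filenames are assumptions:

  from PIL import Image, ImageOps

  with Image.open("raw_scan.tif") as img:
      img = img.convert("RGB")
      img = img.rotate(-1.5, expand=True, fillcolor="white")  # straighten a slight skew
      img = img.crop((100, 100, 2500, 3400))                  # keep only the area of interest
      img = ImageOps.autocontrast(img)                        # spread tones across the full range
      img.save("master_0001.tif")                             # lossless master copy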

Improving image quality

Subsequent improvements by resizing or sharpening the image should be performed on a derivative.

  1. Store work-in-progress images in a lossless format
    Digitisers often get into the habit of making modifications to a derivative image saved in a 'lossy' format, i.e. a format that discards detail to reduce file size. This is considered bad practice: it will reduce quality and cause compression 'artefacts' to appear over successive edits. When repeatedly altering an image it is advisable to keep the image in a lossless format (e.g. TIFF, PNG) until it is ready for dissemination. Once all changes have been made it can be output in a lossy format.
  2. Filter the image
    Digitised images often appear 'noisy' or contain dust and scratches. Professional graphics packages (Photoshop, PaintShop Pro, etc.) provide filters that can be useful in removing these effects. Common filters include 'Despeckle', which subtly blurs an image to reduce the amount of 'noise', and 'Median', which blends the brightness of pixels and discards pixels that are radically different from adjacent pixels (see the sketch following this list).
  3. Remove distracting effects
    If you are digitising printed works, moiré (pronounced 'more-ray') effects may be a problem. Magazine or newspaper illustrations that print an image as thousands of small coloured dots produce a noticeable repeating pattern when scanned. Blur effects, such as the Gaussian blur, are an effective method of reducing noticeable moiré effects, although they also reduce image quality. Resizing the image is another effective strategy: it forces the image-processing tool to re-interpolate colours, which will soften the image slightly. Although these techniques degrade the image to an extent, the results are often better than a visible moiré pattern.
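
The Python sketch below, using the Pillow library, illustrates the filters described above applied to a derivative (never the master); the filter settings and filenames are assumptions:

  from PIL import Image, ImageFilter

  with Image.open("master_0001.tif") as img:
      img = img.convert("RGB")
      cleaned = img.filter(ImageFilter.MedianFilter(size=3))        # suppress specks and noise
      softened = cleaned.filter(ImageFilter.GaussianBlur(radius=1)) # reduce a moiré pattern
      # Resizing forces re-interpolation, which further softens any moiré.
      smaller = softened.resize((softened.width // 2, softened.height // 2), Image.LANCZOS)
      smaller.save("derivative_web.png")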

Further Information


Briefing 75

Digitisation Of Still Images Using A Flat-Bed Scanner


Preparing For A Large-Scale Digitisation Project

The key to the development of a successful digitisation project is to separate it into a series of stages. All projects planning to digitise documents should establish a set of guidelines to help ensure that the scanned images are complete, consistent and correct. This process should consider the proposed input and output of the project, and then find a method of moving from the first to the second.

This document provides preparatory guidance to consider when approaching the digitisation of many still images using a flatbed scanner.

Choose Appropriate Scanning Software

Before the digitisation process may begin, the digitiser requires suitable tools to scan & manipulate the image. It is possible to scan a graphic using any image processing software that supports TWAIN (an interface to connect to a scanner, digital camera, or other imaging device from within a software application), however the software package should be chosen carefully to ensure it is appropriate for the task. Possible criteria for measuring the suitability of image processing software include:

Time may be saved by using a common application, such as Adobe Photoshop, Paintshop Pro, or GIMP. For most purposes, these offer functionality that is rarely provided by the editing software included with the scanner.

Check The Condition Of The Object To Be Scanned

Image distortion and dark shading at page edges are common problems encountered during the digitisation process, particularly when handling spine-bound books. To avoid these and similar issues, the digitiser should ensure that:

  1. The document is uniformly flat against the document table.
  2. The document is not accidentally moved during scanning.
  3. The scanner is on a flat, stable surface.
  4. The edges of the scanner are covered by paper to block external light, which enters when the object does not lie completely flat against the scanner.

Scanning large objects that prevent the scanner lid from being closed (e.g. a thick book) often causes discolouration or blurred graphics. Removing the spine will allow each page to be scanned individually; however, this is not always an option (e.g. when handling valuable books). In these circumstances you should consider a planetary camera as an alternative scanning method.

Identification Of A Suitable Policy For Digitisation

It is often costly and time-consuming to rescan the image or improve the level of detail in an image at a later stage. Therefore, the digitiser should ensure that a consistent approach to digitisation is taken in the initial stages. This will include the choice of a suitable resolution, file format and filename scheme.

Establish a consistent quality threshold for scanned images

It is difficult to improve low-quality scans at a later date. It is therefore important to digitise images at a slightly higher resolution (measured in pixels per inch) and colour depth (24-bit or higher for colour, or 8-bit or higher for grey scale) than required and rescale the image at a later date.
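
A small worked example (figures purely illustrative) shows how pixel dimensions follow from the physical size of the original and the chosen resolution:

  def pixel_dimensions(width_in, height_in, ppi):
      # Pixel dimensions = physical size (inches) x resolution (pixels per inch).
      return round(width_in * ppi), round(height_in * ppi)

  print(pixel_dimensions(6, 4, 600))  # a 6 x 4 inch print at 600 ppi -> (3600, 2400)
  print(pixel_dimensions(6, 4, 300))  # the same print at 300 ppi     -> (1800, 1200)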

Choose an appropriate image format

Before scanning the image, the digitiser should consider the file format in which it will be saved. RGB Baseline TIFF Rev 6 is the accepted format for master copies intended for archival and preservation (although PNG is a possible alternative). To preserve quality, it is advisable to avoid compression where possible. If compression must be used (e.g. for storing data on CD-ROM), the compression format should be noted (Packbits, LZW, Huffman encoding, FAX-CCITT 3 or 4). This will avoid incompatibilities with certain image processing applications.

Data intended for dissemination should be stored in one of the more common image formats to ensure compatibility with older or limited browsers. JPEG (Joint Photographic Experts Group) is suitable for photographs, realistic scenes, or other images with subtle changes in tone; however, its use of 'lossy' compression means that sharp lines or lettering are likely to become blurred. When modifying an image, the digitiser should return to the master TIFF image, make the appropriate changes and resave the result as a JPEG.
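
The following Python sketch illustrates this round trip with the Pillow library: the master is kept as a TIFF (with the compression noted) and the JPEG delivery copy is always derived from it. Filenames, the use of LZW and the quality setting are assumptions:

  from PIL import Image

  with Image.open("master_0001.tif") as master:
      rgb = master.convert("RGB")
      rgb.save("master_0001_lzw.tif", compression="tiff_lzw")  # archived master; compression noted as LZW
      rgb.save("delivery_0001.jpg", quality=85)                # lossy delivery copy derived from the master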

Choose an appropriate filename scheme

Digitisation projects will benefit from a consistent approach to file naming and directory structure that allows images to be organized in a manner that avoids confusion and can be quickly located. An effective naming convention should identify the categories that will aid the user when finding a specific file. For example, the author, year it was created, thematic similarities, or other notable factors. The digitiser should also consider the possibility that multiple documents will have the same filename or may lack specific information and consider methods of resolving these problems. Guidance on this issue can be found in related QA Focus documents.

Further Information


Briefing 76

Choosing A Suitable Digital Watermark


Summary

Watermarking is an effective technology that solves many problems within a digitisation project. By embedding Intellectual Property data (e.g. the creator, licence model, creation date or other copyright information) within the digital object, the digitiser can demonstrate they are the creator and disseminate this information with every copy, even when the digital object has been uploaded to a third party site. It can also be used to determine if a work has been tampered with or copied.

This paper describes methods for establishing if a project requires watermarking techniques and criteria for choosing the most suitable type.

Purpose Of A Watermark

Before implementing watermarking within your workflow, you should consider its proposed purpose. Are you creating watermarks to indicate your copyright, using it as a method of authentication to establish if the content has been modified, or doing so because everyone else has a watermarking policy? The creation of a watermark requires significant thought and modification to the project workflow that may be unnecessary if you do not have a specific reason for implementing it.

For most projects, digital watermarks are an effective method of identifying the copyright holder. Identification of copyright is encouraged, particularly when the work makes a significant contribution to the field. However, the capabilities of watermarks should not be overstated: they are useful for identifying copyright, but incapable of preventing the use of copyrighted works. A watermark may be ignored or, given sufficient time and effort, removed entirely from the image. If the intent is to restrict content reuse, a watermark may not be the most effective strategy.

Required Attributes Of A Watermark

To assist the choice of a watermark, the project team should identify the required attributes of a watermark by answering two questions:

  1. To whom do I wish to identify my copyright?
  2. What characteristics do I wish the watermark to possess?

The answer to the first question is influenced by the skills and requirements of your target audience. If the copyright information is intended for both non-technical and technical users, a visible watermark is the most appropriate. However, if the copyright information is intended for technical users only, or the target audience is critical of visible watermarks (e.g. artists may criticise a watermark for impairing the original image), an invisible watermark may be the best option.

To answer the second question, the project team should consider the purpose of the watermark. If the intent is to use it as an authentication method (i.e. to establish whether any attempt to modify the content has been made), fragility will be a valued attribute: a fragile watermark is deliberately not robust, so even a small change to the content will destroy the embedded information. In contrast, if the aim is to assert the owner's copyright, a more robust watermark may be preferable. This will ensure that copyright information is not lost if an image is altered (through cropping, skewing, warping, rotation, or smoothing).

Choosing A Resilient Watermark

If resilience is a required attribute of a digital watermark, the project team has two options: an invisible or a visible watermark. Each has different characteristics that make it suitable for specific purposes.

Invisible Watermarks
Invisible watermarks operate by embedding copyright information within the image itself. As a rule, watermarks that are less visible are weaker and easier to remove. When choosing a variant it is important to consider the interaction between watermark invisibility and resilience. Some examples are shown in Table 1:

Name                   Description                                                        Resilience
Bit-wise               Makes minor alterations to the spatial relation of an image       Weak
Noise insertion        Embeds the watermark within image noise                            Weak
Masking and filtering  Similar to paper watermarks on a bank note; provides subtle but    Strong
                       recognisable evidence of a watermark
Transform domain       Uses dithering, luminance or lossy techniques (similar to JPEG     Strong
                       compression) on all or part of an image

Table 1: Indication of resilience for invisible watermarks

'Bit-wise' & 'noise insertion' may be desirable if the purpose is to determine whether the medium has been altered. In contrast, 'transform domain' and 'masking' techniques are highly integrated into the image and therefore more robust to deliberate or accidental removal (caused by compression, cropping, and image processing techniques) in which significant bits are changed. However, these are often noticeable to the naked eye.

Visible Watermarks
A visible watermark is more resilient and can be used to identify copyright immediately, without significant effort by the user. However, visible watermarks are, by design, more intrusive to the media. When creating a visible watermark, the project team should consider its placement. Projects funded with public money should be particularly conscious that the copyright notice does not interfere with the purpose of the project. A balance should be reached between making the watermark difficult to remove and preserving the usefulness of the image to the user.
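
The following Python sketch, using the Pillow library, adds a simple semi-transparent text watermark; the wording, placement and opacity are assumptions and should be balanced against the usefulness of the image:

  from PIL import Image, ImageDraw

  def add_visible_watermark(image_path, out_path, text="(c) Example Project"):
      img = Image.open(image_path).convert("RGBA")
      overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))
      draw = ImageDraw.Draw(overlay)
      # Semi-transparent white text in a corner, using Pillow's default font.
      draw.text((10, img.height - 30), text, fill=(255, 255, 255, 128))
      Image.alpha_composite(img, overlay).convert("RGB").save(out_path)

  # add_visible_watermark("master_0001.tif", "watermarked_0001.jpg")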

Each type of watermark suits particular situations. If handling a small image collection, it may be feasible (in terms of time and effort) to use both as a redundant protection measure - in the event that one is removed, the second is likely to remain.

Information Stored within the Watermark

If the project is using a watermark to establish its copyright, some thought should be given to the static information you wish to provide. For example:

Some content management systems are also able to generate dynamic watermarks and embed them within the image. This may record the file information (file format, image dimensions, etc.) and details about the download transaction (transaction identifier, download date, etc.). This may be useful for tracking usage, but may annoy the user if the data is visible.

Implementing Watermarks in the Project Workflow

To avoid unnecessary corruption of a watermark by the digitiser/creator themselves, the creation of the watermark should be delayed until the final steps of the digitisation workflow. Watermarks can easily be damaged when the digitiser modifies the image in any way (e.g. through cropping, skewing, adjustment of the RGB settings, or the use of lossy compression). If an image is processed to the degree that the watermark can no longer be recognized, reconstruction may only be possible through the use of an original, unwatermarked image.

Further Information