UKOLN AHDS Digitisation of Wills and Testaments by the Scottish Archive Network (SCAN)



Background

The Scottish Archive Network (SCAN) is a project funded by the Heritage Lottery Fund. Its main aim is to open up access to Scottish archives using new technology. There are three strands to the project:

  1. Creating collection level descriptions of records held by 52 archives throughout Scotland [1].
  2. Providing all the resources that a researcher will need when accessing archive resources over the Internet [2].
  3. Digitising the Wills and Testaments registered in Scotland from the 1500s to 1901 [3].

The digitisation of the Wills and Testaments is the focus of this case study.

Problem Being Addressed

The digitisation of the testaments is an ambitious undertaking. The main issues to be considered are:

The Approach Taken

Document Preparation

As digital objects, images of manuscript pages lack the obvious information given by a physical page bound in a volume. It is therefore important, for completeness and for sequence, that the pages themselves are accurately paginated. The page number is then visible on the image and is also incorporated into the naming convention used to identify the file. As a result, quality is improved by reducing the number of pages missed in the digitisation process and by ensuring that entire volumes are captured in the correct sequence.
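To illustrate how a page number embedded in the file name supports completeness checks, here is a minimal sketch. The naming pattern (volume identifier, underscore, four-digit page number, .tif extension), the function names and the volume identifier VOL001 are all assumptions made for illustration; the case study does not state the convention SCAN actually used.

```python
import re

# Hypothetical naming convention: <volume>_<4-digit page>.tif
NAME_PATTERN = re.compile(r"^(?P<volume>[A-Za-z0-9-]+)_(?P<page>\d{4})\.tif$")


def image_name(volume: str, page: int) -> str:
    """Build a file name that carries the page number (assumed pattern)."""
    return f"{volume}_{page:04d}.tif"


def missing_pages(filenames, volume, last_page):
    """Return the page numbers from 1 to last_page with no image file."""
    seen = set()
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match and match.group("volume") == volume:
            seen.add(int(match.group("page")))
    return [page for page in range(1, last_page + 1) if page not in seen]


if __name__ == "__main__":
    captured = [image_name("VOL001", p) for p in (1, 2, 4, 5)]
    print(missing_pages(captured, "VOL001", 5))
```

Because the page number is part of the name, a gap in the sequence can be found from a directory listing alone, without opening any image.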

Image Capture

The image capture program (dCam) automated the file naming process, thereby reducing operator error, and automatically captured metadata for each image. This included date, time, operator id, file name, camera id and so on, which helped in identifying whether later problems related to operator training or to a specific workstation. The program also included simple options for retakes.
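The following sketch shows one way per-image capture metadata could be used to tell operator-related problems from workstation-related ones. It is not dCam's data model: the CaptureRecord fields mirror the items listed above, but the class, the function name and the idea of tallying rejected images are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CaptureRecord:
    """Metadata stored alongside one captured image (hypothetical layout)."""
    file_name: str
    captured_at: datetime
    operator_id: str
    camera_id: str


def rejection_counts(records, rejected_files):
    """Tally rejected images per operator and per camera."""
    rejected = set(rejected_files)
    by_operator = Counter()
    by_camera = Counter()
    for record in records:
        if record.file_name in rejected:
            by_operator[record.operator_id] += 1
            by_camera[record.camera_id] += 1
    return by_operator, by_camera
```

If rejections cluster on one camera id across several operators, the workstation is the more likely cause; if they cluster on one operator across several cameras, training is the more likely cause.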

Post Image Capture

We have instituted a secondary quality assurance routine. This involves an operator (not the one who captured the images) examining a selection of the images for any errors missed by the image capture operator. Initially, 100% of the images were checked, but a 30% check was soon found to be satisfactory. The quality control is carried out within 24 hours of a volume being digitised, which means that the volume is still available in the camera room should any retakes be necessary.

The QA operators have a list of key criteria against which to assess each image: completeness, colour, consistency, clarity and correctness. When an operator finds a defective image, they reject it and select the reason from a standardised list. Although the images are chosen at random, whenever an error is found the QA program presents the next sequential image, as errors are more likely to be clustered together. The QA program produces a report, which is then used to select any retakes. The reports are also analysed for any recurring problems that may be corrected at the time of capture.

Further QA criteria: the quality of the cameras had been specified in terms of capacity (i.e. number of pixels), and we found that it is also possible to specify the quality of the CCD in terms of an acceptable level of defective pixels. This, however, does have a bearing on cost.
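The sampling behaviour of the QA routine (a random selection, followed by the next sequential image whenever a defect is found) can be sketched as below. This is an illustration under stated assumptions rather than the QA program itself: the function name, the is_defective callback standing in for the operator's judgement, and the choice to keep following a run until a good image is reached are all hypothetical.

```python
import random


def qa_sample(images, is_defective, fraction=0.3, rng=None):
    """Check a random fraction of images, following runs of defects.

    images       -- image identifiers in capture order
    is_defective -- callback standing in for the QA operator's decision
    Returns (rejected images in capture order, number of images checked).
    """
    if not images:
        return [], 0
    rng = rng or random.Random()
    sample_size = min(len(images), max(1, round(len(images) * fraction)))
    sampled = sorted(rng.sample(range(len(images)), sample_size))

    checked = set()
    rejected = []
    for index in sampled:
        # After a defect, present the next sequential image as well,
        # because errors tend to be clustered together.
        while index < len(images) and index not in checked:
            checked.add(index)
            if not is_defective(images[index]):
                break
            rejected.append(images[index])
            index += 1
    return rejected, len(checked)
```

The rejected list corresponds to the report used to select retakes. The default fraction of 0.3 reflects the 30% check described above; setting it to 1.0 reproduces the initial 100% check.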

Problems Experienced

Preparation

This was a time-consuming process, and slower than capture itself. It was important to build up sufficient prepared material before digitisation got underway.

Capture

We chose to capture colour images. The technique used was to take three separate images through red, green and blue filters and then to combine them into a single colour image. This worked well and produced very high quality colour images. However, it was very difficult to spot where there had been slight movement between the three shots. At a high level of magnification, such movement showed up as mis-registration between the three colour planes. The QA process sometimes caught this, but trapping the problem at that later stage was far more costly. We discovered that where there had been slight movement, the number of distinct colours in an image was almost double the average. We used this information to provide a report to the QA operators highlighting where potential colour shift had taken place. In addition, the use of book cradles helped to reduce this problem and enabled a focused image to be produced consistently.
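The distinct-colour check can be sketched as follows. This is a minimal illustration, not the project's code: images are represented as plain lists of (R, G, B) tuples, the function names are hypothetical, the 1.8 ratio is an assumed stand-in for "almost double", and the median is used as the baseline in place of the average so that the flagged images themselves do not inflate it.

```python
from statistics import median


def distinct_colours(pixels):
    """Count the distinct (R, G, B) values in one image."""
    return len(set(pixels))


def flag_possible_colour_shift(images, ratio=1.8):
    """Return names of images whose colour count is far above the baseline.

    images -- mapping of image name to a list of (R, G, B) tuples
    ratio  -- assumed threshold relative to the median colour count
    """
    counts = {name: distinct_colours(pixels) for name, pixels in images.items()}
    if not counts:
        return []
    baseline = median(counts.values())
    if baseline == 0:
        return []
    return sorted(name for name, count in counts.items()
                  if count >= ratio * baseline)
```

Mis-registration introduces fringes of intermediate colours along every edge of the writing, which is why the colour count rises sharply. The flagged names would form the report passed to the QA operators; a flag indicates a page worth inspecting, not a confirmed defect.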

Things We Would Do Differently

The project was completed successfully and within budget. Because the capture and quality control workflow worked well, the digital capture program was able to capture an additional 1 million pages. It is clear that the process is well suited to high throughput capture of bound manuscript material. Loose-leaf material took far more conservation effort and a much longer time to capture.

References

  1. Online Catalogues Microsite, Scottish Archive Network,
    <http://www.scan.org.uk/aboutus/indexonline.htm>
  2. Scottish Archive Network,
    <http://www.scan.org.uk/>
  3. Scottish Documents, Scottish Archive Network,
    <http://www.scottishdocuments.com/content/>

Contact details

Rob Mildren
Room 2/1
Thomas Thomson House
99 Bankhead Crossway N
Edinburgh
EH11 4DX

for SCAN Business:
Tel: 0131-242-5802
Fax: 0131-242-5801
Email: rob.mildren@scan.org.uk
URL: http://www.scan.org.uk/

for NAS Business:
Tel: 0131-270-3310
Fax: 0131-270-3317
Email: rob.mildren@nas.gov.uk
URL: http://www.nas.gov.uk/

QA Focus Comments

This case study describes a project funded by the Heritage Lottery Fund. Although the project has not been funded by the JISC, the approaches described in the case study may be of interest to JISC projects.