UKOLN AHDS Transcribing Documents



Digitising Text by Transcription

Transcription is a simple but effective way of digitising small to medium volumes of text. It is particularly appropriate when the documents to be digitised have a complex layout (columns, variable margins, overlaid images etc.) or other features that make automatic digitisation using OCR (Optical Character Recognition) software difficult. Transcription remains the best way to digitise handwritten documents.

Representing the Original Document

All projects planning to transcribe documents should establish a set of transcription guidelines to help ensure that the transcriptions are complete, consistent and correct.

Key issues that transcription guidelines need to cover are:

It is generally good practice not to correct factual errors or mistakes of grammar or spelling in the original.

Avoiding Errors

Double-entry, where two people separately transcribe the same document and the results are then compared, is the best way of catching transcription errors. Two people are unlikely to make the same errors, so this technique should reveal most of them. It is, however, often impractical because of the time and expense involved. Running a grammar and spell checker over the transcribed document is a simpler way of finding many errors (but assumes the original document was spelt and written according to modern usage).
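The comparison step of double-entry can be automated. The following is a minimal sketch in Python, assuming each transcriber has produced plain text with the same line breaks as the original; the function name compare_transcriptions and the sample text are illustrative only. It uses the standard difflib module to list the lines on which the two versions disagree, so that a third person can check those lines against the original.

```python
import difflib


def compare_transcriptions(first, second):
    """List the places where two transcriptions of one document disagree.

    Returns a list of (line_number, lines_from_first, lines_from_second),
    where line_number is the 1-based position in the first transcription.
    """
    first_lines = first.splitlines()
    second_lines = second.splitlines()
    matcher = difflib.SequenceMatcher(a=first_lines, b=second_lines, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # both transcribers agree on these lines
        disagreements.append((i1 + 1, first_lines[i1:i2], second_lines[j1:j2]))
    return disagreements


if __name__ == "__main__":
    # Hypothetical sample entries, invented for illustration.
    version_a = "Baptised 3 May 1841\nJohn Smith, labourer\nof this parish"
    version_b = "Baptised 3 May 1841\nJohn Smyth, labourer\nof this parish"
    for line_number, a_lines, b_lines in compare_transcriptions(version_a, version_b):
        print(line_number, a_lines, b_lines)
```

A disagreement only shows that at least one transcriber has made an error; it does not show which reading is correct, so each flagged line still needs to be checked against the original document.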

Transcribing Structured Documents

Structured documents, such as census returns or similar tabular material, may be better transcribed into a spreadsheet package than into a text editor. When transcribing tables of numbers, a simple but effective check on accuracy is to use a spreadsheet to calculate row and column totals that can be compared with the original table. Transcriber guidelines for this type of document will need to consider issues such as:

It is good practice to record values such as weights, distances, money and ages as they are found, but also to include a standardised representation to permit calculations (e.g. 'baby, 6m' should be transcribed verbatim, but an additional entry of 0.5, the age in years, could also be entered).
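Both checks described in this section can also be carried out outside a spreadsheet. The sketch below, in Python, is illustrative only: the function names, the unit abbreviations (y, m, w, d) and the figures in the usage note are assumptions, not part of any standard. The first function derives a standardised age in years from a verbatim entry such as 'baby, 6m' while leaving the verbatim text untouched; the second compares the row and column totals of the transcribed figures with the totals printed in the original table.

```python
import re

# Assumed abbreviations: how many of each unit make up one year.
UNITS_PER_YEAR = {"y": 1, "m": 12, "w": 52, "d": 365}


def age_in_years(verbatim):
    """Return the age in years found in a verbatim entry, or None if none is found.

    Only recognises a number followed by a single-letter unit, e.g. '6m' or '34y'.
    """
    match = re.search(r"(\d+)\s*([ymwd])\b", verbatim.lower())
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) / UNITS_PER_YEAR[unit]


def check_totals(rows, row_totals, column_totals):
    """Compare sums of the transcribed figures with the totals in the original.

    Returns a list of (kind, position, computed_sum, original_total) tuples,
    one for each row or column whose sum does not match. Positions are 1-based.
    """
    problems = []
    for position, (row, expected) in enumerate(zip(rows, row_totals), start=1):
        if sum(row) != expected:
            problems.append(("row", position, sum(row), expected))
    columns = zip(*rows)  # regroup the same figures column by column
    for position, (column, expected) in enumerate(zip(columns, column_totals), start=1):
        if sum(column) != expected:
            problems.append(("column", position, sum(column), expected))
    return problems
```

For example, age_in_years('baby, 6m') gives 0.5, which would be stored in a separate column alongside the verbatim text. A mismatch reported by check_totals points to the row and column in which a figure has probably been mistyped, although the totals in the original document may themselves be wrong and, following the guidance above, should not be corrected.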

Further Information

Many genealogical groups transcribe documents, and provide detailed instructions. Examples include: