CAT-ASSESS

Choosing the Sample for the Catalogue Audit Tool

The population

The set of catalogue records to be evaluated is the population. You may wish to evaluate your whole catalogue, or only parts of it (sub-populations).

Examples of sub-populations are: non-fiction, material in languages other than English, grey literature, records created by outsourced cataloguing or retrospective conversion, or records created before or after a certain date. You can sample more than one sub-population and pool the results to produce overall statistics provided there is no overlap of material.
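This page does not say how the pooling should be calculated. One common approach, assumed here, is to weight each sub-population's error rate by the size of that sub-population. The sketch below uses hypothetical figures and a hypothetical function name.

```python
def pooled_error_rate(strata):
    """Combine error rates from non-overlapping sub-populations.

    strata: list of (population_size, records_sampled, records_with_errors).
    Weighting by population size is an assumption, not a rule from this page.
    """
    total_population = sum(size for size, _, _ in strata)
    weighted_errors = sum(size * errors / sampled
                          for size, sampled, errors in strata)
    return weighted_errors / total_population

# Hypothetical audit results for two sub-populations with no overlap.
strata = [
    (60_000, 400, 20),  # e.g. non-fiction: 20 of 400 sampled records had errors
    (40_000, 400, 40),  # e.g. grey literature: 40 of 400 sampled records had errors
]
print(f"Pooled error rate: {pooled_error_rate(strata):.1%}")
```

If every sub-population is sampled in proportion to its size, this gives the same answer as adding up all the errors and dividing by the total number of records sampled.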

The sample size

The number of records to audit depends on the acceptable margin of error in the result. Notice that it does not depend on the size of the collection.

Acceptable margin of error	Required sample size
4.9%	400 records
3.5%	800 records
2.5%	1500 records

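In practice, mistakes are often rare in library catalogues. For a given sample size, the margin of error is largest when about half the records contain errors and shrinks as errors become rarer. This means that if resources restrict the size of the sample, the estimate obtained will probably still be close to the true frequency of errors.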
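The figures in the table are consistent with the standard formula for the margin of error of a proportion, taking a 95% confidence level and a worst-case error rate of 50%. Both of those values are assumptions, as this page does not state them. A minimal sketch, with a hypothetical function name:

```python
import math

Z_95 = 1.96  # z-score for 95% confidence (assumed; the page states no level)

def margin_of_error(sample_size, error_rate=0.5, z=Z_95):
    """Half-width of the confidence interval for an estimated proportion."""
    return z * math.sqrt(error_rate * (1 - error_rate) / sample_size)

# Worst case (half of all records contain errors) for the table's sample sizes.
for n in (400, 800, 1500):
    print(f"{n} records: +/- {margin_of_error(n):.1%}")

# A smaller sample when errors are rare (5% is a hypothetical error rate).
print(f"200 records at 5% errors: +/- {margin_of_error(200, error_rate=0.05):.1%}")
```

The size of the collection does not appear anywhere in the formula, and a lower error rate gives a narrower margin for the same number of records.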

Constructing the sample

A random sample of catalogue records must be obtained, either from the computerised library system (method A) or by systematic sampling of the shelves or a shelf-list (method B).

Ideally, the library's software can produce a random sample from the catalogue. One method is to generate random numbers with another program and use these to select records by control number. Check with your systems or technical librarian to find out what the system can produce. If your system cannot generate this type of sample, you will need to look at other options.
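As an illustration of generating random numbers with another program, the sketch below draws control numbers at random without repeats. The run of control numbers, the sample size and the function name are all hypothetical. How the chosen numbers are then used to retrieve records depends on your library system.

```python
import random

def random_control_numbers(control_numbers, sample_size, seed=None):
    """Return sample_size distinct control numbers chosen at random."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return sorted(rng.sample(list(control_numbers), sample_size))

# Hypothetical catalogue with control numbers 1 to 50,000.
existing_numbers = range(1, 50_001)
sample = random_control_numbers(existing_numbers, 400, seed=2005)
print(sample[:5])
```

Passing in only the control numbers that still exist in the catalogue avoids selecting numbers for withdrawn material.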

A systematic sample

One practical alternative is to take a systematic sample by choosing records from the population at a fixed interval. The interval is chosen by dividing the size of the population (estimated if necessary) by the sample size. Avoid starting with the first record in the population. Systematic samples can be chosen in a number of ways.
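The interval calculation can be sketched as follows. The population and sample sizes are hypothetical, and choosing the starting point at random within the first interval is an assumption: this page says only that the first record should be avoided.

```python
import random

def systematic_positions(population_size, sample_size, seed=None):
    """Return 1-based positions of the records to audit, at a fixed interval."""
    interval = population_size // sample_size
    if interval < 2:
        raise ValueError("sample is too large for a systematic selection")
    # Start somewhere in the first interval, but never at record 1.
    start = random.Random(seed).randint(2, interval)
    return [start + i * interval for i in range(sample_size)]

# Hypothetical population of 50,000 records and a sample of 400:
# the interval is 50,000 // 400 = 125.
positions = systematic_positions(50_000, 400, seed=1)
print(positions[:3], "...", positions[-1])
```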

A simple option is to take a systematic sample from the shelves, for instance the 5th item from the left on the 3rd shelf from the top of each bay. This method restricts the population to items not on loan. This is not a problem in a non-lending collection or when few items are on loan (for example during closed periods or in academic libraries during the summer vacation). However, if large numbers of items are on loan at the time of the audit, bias will be introduced into the sample. Additionally, if particular shelves of a bay are always used for specific materials (such as outsize books or report literature), bias will again be introduced, either by excluding these materials when their shelf is not chosen or by excluding all other materials when it is.

Sequential control numbers can be used to generate systematic samples, provided that there are no gaps in the run of numbers (gaps arise where material has been withdrawn and control numbers have not been re-used). If the number picked no longer corresponds to an item, another number should be drawn independently and at random. Selection can be performed either by computer or by hand with the aid of a printout. If using a printout, select the record at a fixed position down each page. Some bias will be introduced if pages contain different numbers of records, especially if the last page is not full.
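Selection by computer might look like the sketch below, which steps through the run of control numbers at a fixed interval and draws a replacement at random whenever a number no longer corresponds to an item. The function name, the is_live check and the set of withdrawn numbers are hypothetical. The sketch also assumes that many more live numbers exist than the sample requires.

```python
import random

def systematic_control_numbers(first, last, sample_size, is_live, seed=None):
    """Pick control numbers at a fixed interval, redrawing at random for gaps."""
    interval = (last - first + 1) // sample_size
    if interval < 2:
        raise ValueError("sample is too large for a systematic selection")
    rng = random.Random(seed)
    start = first + rng.randint(1, interval - 1)  # never the first number
    chosen = []
    for i in range(sample_size):
        number = start + i * interval
        # Redraw independently at random if the number has been withdrawn
        # or has already been chosen by an earlier redraw.
        while number in chosen or not is_live(number):
            number = rng.randint(first, last)
        chosen.append(number)
    return chosen

# Hypothetical run of 50,000 control numbers with every 7th one withdrawn.
withdrawn = set(range(7, 50_001, 7))
picked = systematic_control_numbers(1, 50_000, 400,
                                    lambda n: n not in withdrawn, seed=1)
print(len(picked), "control numbers selected")
```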

What should be excluded from the sample?

The sample should not include records for items on order, as there will be no physical item to check. Do not try to achieve this by eliminating records to which no copies are attached: these 'ghost' records should be counted as errors unless they are order records. The sample should also exclude records for items that are being bound or repaired, or that are marked as missing.
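If the sample is drawn by program, the exclusions can be applied by filtering on item status rather than on the number of copies, so that 'ghost' records stay in the sample. The record layout and status values below are hypothetical and will differ between library systems.

```python
# Hypothetical status values for records that should be left out of the sample.
EXCLUDED_STATUSES = {"on order", "binding", "repair", "missing"}

def eligible_for_sample(record):
    """Keep a record unless its status is one of the excluded ones.

    The number of attached copies is deliberately ignored, so records with
    no copies ('ghost' records) remain eligible and can be counted as errors.
    """
    return record["status"] not in EXCLUDED_STATUSES

records = [
    {"control_number": 101, "status": "available", "copies": 2},
    {"control_number": 102, "status": "on order", "copies": 0},
    {"control_number": 103, "status": "available", "copies": 0},  # ghost record
    {"control_number": 104, "status": "missing", "copies": 1},
]
eligible = [r["control_number"] for r in records if eligible_for_sample(r)]
print(eligible)
```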

Content by: Ann Chapman of UKOLN.
Page last revised on: 03-Jun-2005