UKOLN AHDS Improving Interoperability Between Multiple Databases



About This Document

A relational database is a set of structured data, organised according to a data model. When exporting data from one application to another, it is a simple process to export the data as an ASCII text file that will describe every field within a table. However, many problems can be encountered that will increase the amount of effort and time required to import the data. This paper describes specific quality-based techniques that should be used in the development process to minimise the difficulty encountered at a later date.

Documenting The Database Structure

The key to continued access of a digital resource is documentation. This avoids the problems that arise when an administrator leaves the project and essential knowledge is lost. Before exporting data you should make a note of the table relationships and primary keys. This will allow the data to be recombined using the same structure in an alternative package. You should also identify specific requirements of each field. For example, the field size, import mask, validation rules, default value, indexing, etc.

Use Appropriate Descriptors

Two problems relating to the database organisation can be avoided by the use of appropriate descriptors. The first is to understand the importance of table and field names when identifying information. A row of numbers has little meaning until we identify the context, i.e. payroll numbers, lottery numbers, etc. This will make it easier to interpret and recombine the data at a later date.

The second issue to consider when choosing fields names is the possibility that this data will become corrupt at a later date or will be misinterpreted by the application. This is caused when specific reserved characters used for distinguishing between field (commas, semi-colons, tabs, quotations, etc.), or system-illegal characters (ampersands, asterisks, hash, or other mathematical symbols) are used.

It is important to avoid such issues by restricting yourself to the English alphabet or numerical values, and avoiding other symbols.

Ensure Consistency

When handling data from multiple databases it is good practice to standardise the responses so that they can be understood and manipulated more easily. This may involve a simple process of replacing all reference to one value with another (e.g. changing 1,2,3 to Mr, Mrs, Miss). In other circumstances you may need to write a query to split the postcode from the main address field.

You should also ensure that date and time are referenced correctly. These can be easily confused when exporting database fields in a text file. For example, a date that reads '12/04/2003' can be interpreted as April 12th or December 4th, 2003. To avoid ambiguity always enter and store dates with a four-digit century and times of day using the 24 hour clock. The ISO format (yyyy-mm-dd) is useful for absolute clarity, particularly when mixing databases at a later date.

Proprietary Extensions

Care should be taken when using proprietary extensions, as their use will tie your database to a particular software package. Unlike SQL commands, these application-specific elements cannot be exported to other applications without extensive work to convert or recreate the resource. Examples of proprietary extensions include the user interface and application-specific commands.

Further Information