UKOLN AHDS Implementing and Improving Structural Markup



Background

Digital text has existed in one form or another since the 1960s. Many computer users take for granted that they can quickly write a letter without restriction or technical considerations. This document provides advice for improving the quality of structural mark-up, emphasising the importance of good documentation, use of recognised standards and providing mappings to these standards.

Why Should I Use Structural Mark-Up?

Although ASCII and Unicode are useful for storing information, they can only describe each character, not how it should be displayed or organized. Structural mark-up languages enable the designer to dictate how information will appear and to establish a structure for its layout. For example, the user can define tags to store a book's author information and publication date.
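As a minimal sketch of the book example, the following Python snippet parses a small XML record using the standard library. The element names (book, title, author, published) and the sample values are hypothetical and chosen for illustration only; they do not come from any particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names and values are illustrative,
# not taken from a recognised schema.
RECORD = """
<book>
  <title>An Example Title</title>
  <author>A. N. Author</author>
  <published>2004-05-01</published>
</book>
"""

def read_book(xml_text):
    """Return the child elements of a <book> record as a dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}

book = read_book(RECORD)
print(book["author"], book["published"])
```

Because each value sits inside a named element, a program can retrieve the author or the publication date directly instead of guessing at their position in a block of plain text.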

The use of structural mark-up can provide many organizational benefits:

The most common mark-up languages are SGML and XML. Based upon these languages, several schemas have been developed to organize and define data relationships. This allows certain elements to have specific attributes that define their method of use (see the Digital Rights document for more information). To ensure interoperability, XML is advised due to its support for contemporary Internet standards (such as Unicode).

Improving The Quality Of Structural Mark-Up

For organisations that already utilise structural mark-up, the benefits are already apparent. However, some consideration should be given to improving the quality of descriptive data. The key to improving data quality is twofold: use recognised standards whenever possible, and establish detailed documentation on all aspects of the schema.

Documentation

Documentation is an important, if often ignored, aspect of software development. Good documentation should establish the purpose of the structural data, provide examples, and identify the source of the data. It will allow others to understand the XML without ambiguity.

Use recognised standards

Although there are many circumstances where recognised schemas are insufficient for the required task, the designer should investigate relevant standards and attempt to merge their own bespoke solution with them. In the long term this will have several benefits:

  1. The project can take advantage of existing knowledge in the field, allowing it to cover areas where it has limited or no experience.
  2. Access to content will improve through support for proven standards, such as SVG.
  3. The time required to map the project's data to alternative schemas used by other organisations will be reduced significantly.

TEI, Dublin Core and others provide cross-subject metadata elements that can be combined with subject-specific languages.
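One common way to combine cross-subject and subject-specific elements in a single XML record is to place each vocabulary in its own namespace. The sketch below builds such a record in Python. The Dublin Core namespace URI is the published one for the element set; the art namespace, the artefact and material elements, and the sample values are hypothetical and stand in for whatever subject-specific language a project uses.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set
ART_NS = "http://example.org/ns/artefact"   # hypothetical subject-specific namespace

# Ask ElementTree to use readable prefixes when serialising.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("art", ART_NS)

def build_record(title, creator, material):
    """Build a record mixing Dublin Core and subject-specific elements."""
    record = ET.Element(f"{{{ART_NS}}}artefact")
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    ET.SubElement(record, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(record, f"{{{ART_NS}}}material").text = material
    return record

record = build_record("Bronze brooch", "Unknown", "bronze")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Software that understands only Dublin Core can still read the title and creator, while specialist tools can make use of the subject-specific elements as well.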

Provide mappings to recognised standards

Through the creation of different mappings the developer will standardise and enhance their approach to schema creation, removing potential ambiguities and other problems that may arise. From an organisational standpoint, the mappings will also improve relations between cooperating organisations and widen the options for using information in new ways.
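A mapping can be as simple as a documented table from local field names to the elements of a recognised standard. The following Python sketch assumes a hypothetical local schema (book_title, author, published, shelf_mark) and maps it to Dublin Core element names; the local names and sample values are invented for illustration.

```python
# Hypothetical mapping from local field names to Dublin Core element names.
LOCAL_TO_DC = {
    "book_title": "title",
    "author": "creator",
    "published": "date",
}

def to_dublin_core(local_record):
    """Translate a local record; return (mapped, unmapped) dictionaries."""
    mapped, unmapped = {}, {}
    for field, value in local_record.items():
        target = LOCAL_TO_DC.get(field)
        if target is None:
            # Keep unmapped fields visible instead of silently dropping them.
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped

mapped, unmapped = to_dublin_core(
    {"book_title": "An Example Title", "author": "A. N. Author", "shelf_mark": "X.12"}
)
print(mapped)
print(unmapped)
```

Reporting the unmapped fields separately is a design choice in this sketch: it highlights where the local schema has no equivalent in the standard, which is exactly the kind of ambiguity a mapping exercise is meant to expose.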

Follow implementation conventions

In addition to implementing recognised standards, it is important that the developer follows existing rules when constructing elements. Depending on the circumstances, this will involve the use of an existing data dictionary or an examination of XML naming rules. Controlled languages (for example, RDF, SMIL, MathML and SVG) use these conventions to implement specific localised knowledge.
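Naming rules can be checked automatically before a schema is published. The sketch below is a deliberately simplified, ASCII-only approximation of the XML naming rules: a name must begin with a letter or underscore, may continue with letters, digits, hyphens, underscores and full stops, and should not begin with the reserved letters "xml". The full XML specification also permits many non-ASCII characters, and colons are excluded here because they are used for namespace prefixes. The function name and the sample names are hypothetical.

```python
import re

# Simplified, ASCII-only approximation of the XML Name rule (an assumption
# of this sketch; the real rule also allows many non-ASCII characters).
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_acceptable_element_name(name):
    """Return True if the name passes the simplified naming check."""
    if not _NAME_RE.fullmatch(name):
        return False
    # Names beginning with "xml" (in any case) are reserved by the XML specification.
    return not name.lower().startswith("xml")

for candidate in ["publicationDate", "2ndAuthor", "book title", "xmlData", "_note"]:
    print(candidate, is_acceptable_element_name(candidate))
```

A check of this kind could be run over every element name in a data dictionary, so that naming problems are caught before other organisations begin to rely on the schema.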

Further Information