UKOLN AHDS Deploying XHTML On The QA Focus Web Site



Background

This case study describes the deployment of XHTML 1.0 on the QA Focus Web site and the proposed approach to changing the MIME type.

Note that this case study will be updated once the changes described in the document have been made.

The QA Focus Web Site

The QA Focus Web site [1] is based primarily on the XHTML 1.0 standard. The decision to use XHTML was taken for several reasons:

The Web site is based on simple use of the PHP scripting language. Key resources are stored in their own directory. An intro.php file is used to include various parameters (title and author details, etc.), navigational elements of the page and the main content of the page, which is managed as a separate XHTML 1.0 fragment.

HTML-kit [2] is used as the main authoring tool.

The ,validate, ,rvalidate and ,cssvalidate tools [3] [4] are used to validate resources when they are created or updated.

Problem Being Addressed

Following comments on W3C's QA list [5], it was recognised that use of the text/html MIME type, which was used when both HTML and XHTML resources were served, did not represent best practice. Although the XHTML resources could be displayed by Web browsers, the MIME type used meant that XHTML resources would be processed as HTML. Use of the application/xhtml+xml MIME type would mean that browsers which can process XML would process the resource more quickly, as the XML renderer would only have to process a well-structured XML tree, rather than parse HTML and seek to process HTML errors (as HTML browsers are expected to do).

It was also noted that use of the text/html MIME type required compliance with an additional set of guidelines documented in Appendix C of the XHTML 1.0 specification [6] and the XHTML Media Types document [7].

In addition to providing potential benefits to end users, use of the application/xhtml+xml MIME type will help with the growth of a better-structured Web environment for the benefit of everyone.

Potential Problems With Proposed Solution

Although deployment of the application/xhtml+xml MIME type for use with XHTML resources on the QA Focus Web site would reflect best practices for XML resources, this change does have some potential pitfalls. Before making any changes it is important to be aware of potential problem areas.

The Need For Compliance

The XML standard insists that XML resources must comply with the standard. Conforming XML applications should not attempt to process non-compliant resources. This means that if an XHTML resource is defined as XML using the application/xhtml+xml MIME type, Web browsers would not be expected to display the page if the resource contained XHTML errors.

Although it is perfectly reasonable that a program will not process a resource if the resource does not comply with the expected standard, this behaviour is not normally expected on the Web. The HTML standard expects Web browsers to attempt to render resources even though they do not comply with the standard. This led to a failure to appreciate the importance of compliance with standards which has resulted in many Web resources being non-compliant. Unfortunately this makes it very difficult for Web resources to be repurposed or to be processed as efficiently as they should be.

Ensuring Compliance

The move to a compliant XHTML environment clearly has many advantages. However there are several potential deployment difficulties:

Ideally a workflow system which can guarantee that the resources are compliant would be used. This could be based on use of a Content Management System (CMS) or processing of resources by software such as Tidy [8] prior to publishing on the main Web site. However due to lack of resources, we are not in a position currently to move to this type of publishing environment.

We therefore intend to ensure that documents are XHTML compliant when they are published. The information providers for the Web site will be made aware that ensuring compliance is now mandatory rather than highly desirable.

Managing MIME Types

MIME types are often associated with resources by mapping a file extension to a MIME type in the server configuration file. For example, files with a .html extension are normally given a text/html MIME type. It would be very simple to give files with a .xhtml extension an application/xhtml+xml MIME type. However, in our environment most files have the extension .php; these PHP scripts are processed in order to create the XHTML resource. Fortunately it is possible for the PHP script to define the MIME type to be used. This is the approach we intend to deploy.

However in order to allow us to migrate back to use of the text/html MIME type if we experience problems, we will ensure that the MIME type is defined in a single location. This has the advantage that if we wish to deploy an alternative XML MIME type in the future it can be done relatively easily.
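One way of keeping the MIME type in a single location is sketched below. This is only an illustration under stated assumptions: the include file name (mime.php), the constant QA_XHTML_MIME_TYPE and the function qa_content_type_header() are hypothetical names and are not part of the existing QA Focus templates.

```php
<?php
// mime.php -- hypothetical include file; all names here are illustrative.

// The one place where the MIME type for XHTML pages is defined.
// Changing this value to 'text/html' would revert the whole site.
const QA_XHTML_MIME_TYPE = 'application/xhtml+xml';

// Build the Content-Type header line for a page.
function qa_content_type_header(string $mimeType = QA_XHTML_MIME_TYPE): string
{
    return 'Content-Type: ' . $mimeType;
}

// In a page template this would be called before any output is sent:
// header(qa_content_type_header());
```

Because PHP can only send HTTP headers before any page content has been output, a template using this approach would need to include the file and call header() at the very top of each script.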

Limitations In Browser Support

Unfortunately some browsers do not understand the application/xhtml+xml MIME type - including Internet Explorer [9]. In order to support such browsers it is necessary to use content negotiation to serve the XHTML 1.0 resource as text/html to Internet Explorer with application/xhtml+xml being sent to other browsers.

Deployment

As described above we intend to implement XHTML and the application/xhtml+xml MIME type in the following way:

We will probably use the following PHP code:

<?php
if ( stristr($_SERVER["HTTP_ACCEPT"],"application/xhtml+xml") ) {
  header("Content-type: application/xhtml+xml");
}
else {
  header("Content-type: text/html");
}
?>

which was documented in "The Road to XHTML 2.0: MIME Types" article [10], which also provides very useful background information on this topic.
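The snippet above looks for the MIME type anywhere in the Accept header, so it would also match a browser that explicitly refuses the type with q=0, and it assumes the header is always present. A slightly more defensive variant might look like the following sketch. The function names qa_accept_quality() and qa_choose_mime_type() are hypothetical, and wildcard ranges such as */* are deliberately ignored so that browsers which do not list application/xhtml+xml explicitly continue to receive text/html.

```php
<?php
// Hypothetical helper: returns the q-value (0.0 to 1.0) that an Accept
// header assigns to an exactly named media type, or 0.0 if the type is
// not listed. Wildcard ranges such as */* are deliberately ignored.
function qa_accept_quality(string $acceptHeader, string $mimeType): float
{
    foreach (explode(',', $acceptHeader) as $range) {
        $parts = array_map('trim', explode(';', $range));
        if (strcasecmp($parts[0], $mimeType) !== 0) {
            continue;
        }
        // The quality defaults to 1.0 when no q parameter is given.
        $quality = 1.0;
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $quality = (float) $m[1];
            }
        }
        return $quality;
    }
    return 0.0;
}

// Choose the MIME type to serve for a given Accept header value
// (null when the browser sent no Accept header at all).
function qa_choose_mime_type(?string $acceptHeader): string
{
    if ($acceptHeader !== null
        && qa_accept_quality($acceptHeader, 'application/xhtml+xml') > 0.0) {
        return 'application/xhtml+xml';
    }
    return 'text/html';
}

// Usage in a template, before any output is sent:
// header('Content-Type: ' . qa_choose_mime_type($_SERVER['HTTP_ACCEPT'] ?? null));
// header('Vary: Accept'); // tells caches that the response depends on Accept
```

The Vary header is included in the usage comment because the same URL now returns different MIME types to different browsers, which intermediate caches need to know about.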

Prior to making the proposed changes we will seek advice on our approach by inviting comments on this document.

We will then validate the Web site to ensure that all XHTML resources are compliant.
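As a complement to the validation tools [3] [4], a quick local check for XML well-formedness could be scripted in PHP, since a well-formedness error is what would stop an XML-aware browser from displaying a page. The following is a minimal sketch only: the function name qa_wellformedness_errors() is hypothetical, it assumes PHP's DOM extension is available, and it checks well-formedness rather than validity against the XHTML 1.0 DTD.

```php
<?php
// Hypothetical helper: returns a list of XML parse error messages for an
// XHTML source string. An empty list means the source is well-formed XML.
// This does not check validity against the XHTML DTD.
function qa_wellformedness_errors(string $xhtml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $document = new DOMDocument();
    $document->loadXML($xhtml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Usage (the file name is illustrative):
// foreach (qa_wellformedness_errors(file_get_contents('page.xhtml')) as $message) {
//     echo $message, "\n";
// }
```

For pages generated by PHP scripts, the check would need to be run on the generated output rather than on the .php source file.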

Once the changes are implemented we will check the Web site using a number of browsers which are available locally. This will include Internet Explorer, Netscape, Mozilla, Opera, Avant and Lynx on a Microsoft platform. We will invite others who have additional browsers or browsers on other platforms to confirm that the Web site is still functional.

Things We Would Do Differently

If we were to start again, we would ensure that the PHP template allowed HTTP headers to be sent. Currently the template does not allow this.

We would also ensure that there was a technical review meeting of members of the QA Focus team which would discuss the advantages and disadvantages of XHTML and produce a document giving our choice of HTML formats and the reasons for the choice.

We would explore the possibilities of running Tidy on the server as part of the publishing process.

References

  1. QA Focus,
    <http://www.ukoln.ac.uk/qa-focus/>
  2. HTML-kit,
    <http://www.chami.com/html-kit/>
  3. A Proposal For Consistent URIs For Checking Compliance With Web Standards, Brian Kelly, Internet/WWW 2003 Conference
    <http://www.ukoln.ac.uk/qa-focus/documents/papers/iadis-2003/poster/>
  4. Procedures For Web Standards, QA Focus
    <http://www.ukoln.ac.uk/qa-focus/qa/procedures/web/>
  5. www-qa@w3.org Mail Archives, W3C
    <http://lists.w3.org/Archives/Public/www-qa/>
  6. XHTML 1.0, W3C
    <http://www.w3.org/TR/xhtml1/#guidelines>
  7. XHTML Media Types, W3C
    <http://www.w3.org/TR/xhtml-media-types/#text-html>
  8. HTML Tidy Library Project,
    <http://tidy.sourceforge.net/>
  9. WaSP Asks the W3C, WaSP
    <http://www.webstandards.org/learn/askw3c/sep2003.html>
  10. The Road to XHTML 2.0: MIME Types, XML.com
    <http://www.xml.com/pub/a/2003/03/19/dive-into-xml.html>

Contact Details

Brian Kelly
UKOLN
University of Bath
BATH
UK
BA2 7AY

Email: B.Kelly AT ukoln.ac.uk
