Collection Level Description

A review of existing practice

...an eLib supporting study



3. Collection Description

3.11 Web Collections

This section describes a number of mechanisms for grouping collections of Web-based resources.

3.11.1 Simple linking

An early attempt to provide a mechanism for grouping Web resources was the HTML <LINK> element. For example:

<LINK REL="next" HREF="slide-06.html">
<LINK REL="previous" HREF="slide-04.html">

In this case the <LINK> elements give the addresses of the next and previous files in a sequence. Although the same information is readily available to the end user through ordinary hypertext links (<A HREF=...>), an application has no reliable way to tell which of those links point to the next and previous pages. The <LINK> element supplies this information in a machine-understandable form.

The NCSA Mosaic for Windows browser makes use of the <LINK> element by providing buttons that move to the next and previous files. Similarly, a document that has been split into several HTML pages could be described by a series of 'next' and 'previous' <LINK> elements. An application could use this information to print the whole collection in a single operation, rather than displaying and printing each page individually.
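To illustrate how an application might process <LINK> elements automatically, the following Python sketch extracts REL/HREF pairs with the standard html.parser module and follows the 'next' chain to assemble the pages of a split document in order, as a printing tool might. The page names and the in-memory page store are hypothetical; a real tool would fetch each page over HTTP and resolve relative URLs.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collects the REL and HREF attributes of every <LINK> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, so <LINK REL=...> arrives as "link"/"rel".
        if tag == "link":
            attributes = dict(attrs)
            rel = attributes.get("rel")
            href = attributes.get("href")
            if rel and href:
                self.links[rel.lower()] = href


def link_relations(html_text):
    """Return a {relation: href} mapping for the <LINK> elements in html_text."""
    parser = LinkRelParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


def collect_sequence(pages, start):
    """Follow 'next' links from start and return the page names in reading order."""
    order = []
    current = start
    # Stop at the end of the chain, at an unknown page, or if the links loop back.
    while current in pages and current not in order:
        order.append(current)
        current = link_relations(pages[current]).get("next")
    return order


# Hypothetical split document, held in memory instead of fetched over HTTP.
pages = {
    "slide-04.html": '<HEAD><LINK REL="next" HREF="slide-05.html"></HEAD>',
    "slide-05.html": '<HEAD><LINK REL="previous" HREF="slide-04.html">'
                     '<LINK REL="next" HREF="slide-06.html"></HEAD>',
    "slide-06.html": '<HEAD><LINK REL="previous" HREF="slide-05.html"></HEAD>',
}

print(collect_sequence(pages, "slide-04.html"))
# ['slide-04.html', 'slide-05.html', 'slide-06.html']
```

A printing tool could then fetch and concatenate the pages in the returned order before sending them to the printer once.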

Although the <LINK> element provides a straightforward way to describe simple relationships between Web resources, other mechanisms are needed to describe more complex relationships.

3.11.2 Microsoft Web Collections

In March 1997 Microsoft submitted a proposal on Web Collections using XML to the World Wide Web Consortium. The proposal used the Extensible Markup Language (XML) to provide a hierarchical structure for describing relationships between Web resources. It defined a collection as a grouping of metadata, a property as a description of metadata in the form of a name-value pair, and a profile as a contract between the creator and reader of a Web Collection that specifies the properties the reader can expect.
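The three concepts in the proposal can be sketched as plain data structures. The Python below does not reproduce the proposal's XML syntax: Collection, Profile and the property names are hypothetical, chosen only to show a profile acting as a contract that a reader can check a collection against.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A grouping of metadata: properties describing some Web resources, plus sub-collections."""
    name: str
    properties: dict = field(default_factory=dict)  # name-value pairs
    members: list = field(default_factory=list)     # nested collections give the hierarchy


@dataclass(frozen=True)
class Profile:
    """A contract listing the property names a reader can expect a collection to carry."""
    name: str
    required: frozenset

    def missing(self, collection):
        return sorted(self.required - set(collection.properties))

    def is_satisfied_by(self, collection):
        return not self.missing(collection)


# Hypothetical profile and collections for a simple site map.
site_map = Profile("site-map", frozenset({"title", "url"}))
home = Collection("home", {"title": "Example site", "url": "http://www.example.org/"})
draft = Collection("draft", {"title": "Untitled page"})
home.members.append(draft)

print(site_map.is_satisfied_by(home))  # True
print(site_map.missing(draft))         # ['url']
```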

Anticipated applications of Web Collections included site maps, e-mail threading, scheduling, content labelling and distributed authoring. However, following feedback, Microsoft is no longer pursuing the proposal.

3.11.3 Channel Definition Format (CDF)

The Channel Definition Format (CDF) provides a mechanism for grouping Web resources for various purposes. A Channel is defined as a set of documents or a grouping of content that can be 'pushed', 'pulled', or operated on as a unit. In today's applications, operations on a channel primarily involve automatic scheduled download for off-line use ('smart pull') or multicast delivery for later use. However, CDF may also provide the underlying mechanism for searching, indexing, profiling, filtering and personalising content independently of the publishing mechanism.

The Microsoft 'Active Channel Technology Overview' says:

CDF (channel definition format) is an XML vocabulary, or XML-based data format, that can be used to organize a set of related Web documents into a logical hierarchy. CDF enables developers to describe the structure and logically present various structured views of their HTML-based sites. Individual Web pages can be described by a CDF file to specify a hierarchy of associated Web pages. Like HTML files, CDF files are structured text made up of various elements, each enclosed within opening and closing tags. CDF files provide an index of the resources available in the channel - a hierarchy of the channel's Web pages - and can include a recommended schedule for when the channel should be updated on the user's computer. Using CDF files, channel developers and end users can schedule content updates, deliver personalized, password-protected content, log page hits, set up Active Channel screen savers, and categorize content. A typical CDF file contains a top-level CHANNEL element to define the channel itself, along with ITEM elements to specify the actual contents of the channel. Subsequent occurrences of the CHANNEL element define subchannels and allow publishers to create a hierarchy for the channel. The TITLE and ABSTRACT elements can be used to describe the contents of each item or channel element. Publishers may also want to use the LOGO element to associate an image with each item in the channel, as well as with the channel itself.

[MSCDF]
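The hierarchy described in the overview can be walked with any XML parser. The following Python sketch uses the standard xml.etree.ElementTree module to print the CHANNEL/ITEM outline of a small CDF-style file. The file's URLs and text are invented, and it is deliberately simplified (for example, it carries no schedule), so it is an illustration rather than a validated CDF document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified CDF-style document using the elements named in the overview.
CDF_TEXT = """
<CHANNEL HREF="http://www.example.org/index.html">
  <TITLE>Example Channel</TITLE>
  <ABSTRACT>News from an example site</ABSTRACT>
  <LOGO HREF="http://www.example.org/logo.gif"/>
  <ITEM HREF="http://www.example.org/news.html">
    <TITLE>Latest news</TITLE>
  </ITEM>
  <CHANNEL HREF="http://www.example.org/reports/index.html">
    <TITLE>Reports</TITLE>
    <ITEM HREF="http://www.example.org/reports/1998.html">
      <TITLE>Annual report</TITLE>
    </ITEM>
  </CHANNEL>
</CHANNEL>
"""


def outline(element, depth=0):
    """Return (depth, kind, title, href) tuples for every CHANNEL and ITEM, in document order."""
    rows = []
    if element.tag in ("CHANNEL", "ITEM"):
        title = element.findtext("TITLE", default="(untitled)")
        rows.append((depth, element.tag, title, element.get("HREF")))
        depth += 1  # children of a channel or item sit one level deeper
    for child in element:
        rows.extend(outline(child, depth))
    return rows


root = ET.fromstring(CDF_TEXT.strip())
for depth, kind, title, href in outline(root):
    print("  " * depth + f"{kind}: {title} <{href}>")
```

Nested CHANNEL elements appear as subchannels one level down, which is how the overview describes publishers building a hierarchy.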

The current status of CDF is unknown.

3.11.4 Meta Content Framework (MCF)

The Meta Content Framework [MCF] provides a system for representing a wide range of information about content. The content targeted includes Web pages, gopher and FTP files, desktop files, e-mail and structured (i.e., relational and object-oriented) databases. MCF is not intended to be an extension of markup languages, such as HTML, for holding embedded metadata. Instead, it provides a format for holding metadata externally to the content described. Metadata embedded in content could, however, be extracted automatically by robots, which would then use MCF to represent the results of their activities.
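MCF's own syntax is not reproduced here. As a hypothetical sketch of the general idea, the Python below shows a robot harvesting embedded <META> elements from a page and recording them as (resource, property, value) statements in a store held separately from the content; records about non-HTML resources, such as an FTP file, sit in the same store. The store layout, URLs and property names are assumptions.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects NAME/CONTENT pairs from embedded <META> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") and attributes.get("content") is not None:
                self.pairs.append((attributes["name"].lower(), attributes["content"]))


def harvest(store, url, html_text):
    """Record a page's embedded metadata as external (resource, property, value) statements."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    for prop, value in parser.pairs:
        store.append((url, prop, value))
    return store


store = []  # held outside the content it describes
harvest(store, "http://www.example.org/report.html",
        '<HEAD><META NAME="Author" CONTENT="A. N. Other">'
        '<META NAME="Keywords" CONTENT="metadata"></HEAD>')
# Content with no embedded metadata can still be described in the same store.
store.append(("ftp://ftp.example.org/pub/data.txt", "format", "text/plain"))

for statement in store:
    print(statement)
```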

MCF can be used to describe collections of Web resources. It was originally developed by Apple Computer, Inc., in association with a 3D Web-site visualisation tool known as 'Hot Sauce'.

The current status of MCF is unknown.

3.11.5 Site maps

Web-sites often provide a 'site map', an overview of the main content on a site, as a simple way for end-users to reach resources that might otherwise be difficult to find. Site maps are typically written in HTML and, because there is no widespread agreement about how they should be structured, are not suitable for automatic processing by software such as automated agents or Web robots.

Recently there have been several attempts to provide structured site maps using the Resource Description Framework [RDF], including:

3.11.6 WebDAV

The Internet Engineering Task Force (IETF) WebDAV Working Group [WEBDAV] are developing an architecture that will support distributed authoring and versioning across the Web in a way that is independent of particular client and server implementations. The functional requirements identified by the group operate on a 'resource', where a resource may be a 'network data object or service that can be identified by a URI' or a 'collection' of such objects. They define a collection as:

A collection is a resource that contains other resources, either directly or by reference.

...

A collection is a resource that is a container for other resources, including other collections. A resource may belong to a collection either directly or by reference. If a resource belongs to a collection directly, name space operations like copy, move, and delete applied to the collection also apply to the resource. If a resource belongs to a collection by reference, name space operations applied to the collection affect only the reference, not the resource itself.

[RFC2291]
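The difference between direct and by-reference membership can be modelled in a few lines. In this Python sketch (the class and resource names are hypothetical, and only the delete operation is modelled), deleting a collection deletes its direct members but only drops its references, leaving the referenced resources intact.

```python
class Resource:
    """Any network data object identified by a URI (here, just a name)."""

    def __init__(self, name):
        self.name = name
        self.deleted = False

    def delete(self):
        self.deleted = True


class DavCollection(Resource):
    """A resource that contains other resources, either directly or by reference."""

    def __init__(self, name):
        super().__init__(name)
        self.direct = []      # members whose existence is tied to this collection
        self.references = []  # pointers to resources that live elsewhere

    def delete(self):
        # Name space operations on the collection apply to direct members (recursively)...
        for member in self.direct:
            member.delete()
        self.direct.clear()
        # ...but affect only the references, never the referenced resources.
        self.references.clear()
        super().delete()


shared_logo = Resource("logo.gif")
reports = DavCollection("reports")
annual = Resource("1998.html")
reports.direct.append(annual)
reports.references.append(shared_logo)

reports.delete()
print(annual.deleted, shared_logo.deleted)  # True False
```

Copy and move could be modelled the same way: applied to direct members, but only duplicating or relocating the reference for by-reference members.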

The group also notes that:

There are many instances where there is not a strong correlation between a URL hierarchy level and the notion of a collection. One example is a server in which the URL hierarchy level maps to a computational process which performs some resolution on the name. In this case, the contents of the URL hierarchy level can vary depending on the input to the computation, and the number of resources accessible via the computation can be very large. It does not make sense to implement a directory feature for such a name space. However, the utility of listing the contents of those URL hierarchy levels which do correspond to collections, such as the large number of HTTP servers which map their name space to a filesystem, argue for the inclusion of this capability, despite not being meaningful in all cases. If listing the contents of a URL hierarchy level does not make sense for a particular URL, then a '405 Method Not Allowed' status code could be issued.

The ability to create collections to hold related resources supports management of a name space by packaging its members into small, related clusters. The utility of this capability is demonstrated by the broad implementation of directories in recent operating systems. The ability to create a collection also supports the creation of 'Save As...' dialog boxes with 'New Level/Folder/Directory' capability, common in many applications.

[RFC2291]
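The listing and creation requirements can be sketched as a toy in-memory name space. The Python below is hypothetical: listing a filesystem-like collection returns its members, while listing a computed URL level returns '405 Method Not Allowed', as the quoted text suggests. Collection creation is loosely modelled on the MKCOL method of the later WebDAV specification (RFC 2518), returning 409 Conflict when the parent collection does not exist and 405 when the target already exists.

```python
from http import HTTPStatus


class NameSpace:
    """Hypothetical server name space with stored collections and computed URL levels."""

    def __init__(self):
        self.collections = {"/": set()}         # path -> names of members
        self.computed_prefixes = ("/search/",)  # levels resolved by a computation

    def list_members(self, path):
        # A computed level has no meaningful, finite member list to return.
        if path.startswith(self.computed_prefixes):
            return HTTPStatus.METHOD_NOT_ALLOWED, None
        if path not in self.collections:
            return HTTPStatus.NOT_FOUND, None
        return HTTPStatus.OK, sorted(self.collections[path])

    def make_collection(self, parent, name):
        # Comparable to a 'New Folder' button in a 'Save As...' dialog.
        if parent not in self.collections:
            return HTTPStatus.CONFLICT            # parent collection must exist first
        path = parent.rstrip("/") + "/" + name + "/"
        if path in self.collections:
            return HTTPStatus.METHOD_NOT_ALLOWED  # target already exists
        self.collections[path] = set()
        self.collections[parent].add(name + "/")
        return HTTPStatus.CREATED


ns = NameSpace()
print(int(ns.make_collection("/", "reports")))          # 201
print(int(ns.make_collection("/reports/", "1998")))     # 201
print(ns.list_members("/reports/")[1])                  # ['1998/']
print(int(ns.list_members("/search/?q=collections")[0]))  # 405
print(int(ns.make_collection("/missing/", "x")))        # 409
```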

Andy Powell, UKOLN