Beyond the Beginning: The Global Digital Library




University of California, United States of America
Executive Director Designate, CNI


This paper highlights some relevant issues as the transition occurs from talking about metadata to implementing it. Particular attention is paid to the possible rôle of the Dublin Core and the requirement for new Web indexing services.


There are already a large number of players in the metadata arena. There is, however, an inherent difficulty in talking about implementing metadata: it is pointless to discuss it in the abstract. It must be considered in the context of services if some objective is to be served. In workshops and discussions, this lesson has been learned the hard way. A key question is: what metadata is needed to support progress in resource discovery? In this paper, this issue is discussed in some specific contexts.


Some well-established frameworks exist, for example for resource discovery on the internet. This means that it is difficult to introduce changes except through "toy" experimentation. But little is learned through this kind of activity. The need for cataloguing exists only where there is a large number of objects to catalogue: if not, it would be better to look at the objects themselves. At the large end of the scale it is a question of not "drowning" the users; at the smaller end, the problem is to give the user something useful. Strategies are needed to deal with this problem.

There is a current tendency to treat metadata as an elaborate synonym for descriptive cataloguing, for descriptive practices for electronic resources, or for the descriptive practices of specific disciplines such as the geospatial or biomedical areas.

In the process of developing metadata, there is however an implicit search for ways to go beyond that which is provided by current descriptive practices. Important areas for this extended rôle include rights management and the integrity and provenance of resources. This involves describing where resources come from and how they have been modified, and it includes aspects such as digital signatures, all within a complex infrastructure.

Major questions currently being asked include those about the metadata needed to support filtering, often defined as parental controls, for example approved sex and violence levels and rating systems which are usable in the context of collaborative filtering applications. The processes involved are broader than resource discovery and retrieval.

THE DUBLIN CORE: PROGRESS AND RELATED ACTIVITY

Meanwhile the Dublin Core process is continuing with various drafts available. The most recent workshop (March 1997) was held in Canberra, Australia. The Dublin Core is consensual because it provides a very simple language for document-like objects. However, many communities want to extend this into other kinds of descriptive metadata. Recent activity has been concerned with working toward this goal by extending the minimum in the Dublin Core. The "Dublin Core 2" meeting placed the Core in a broader context, that of different kinds of metadata in the service of different types of functions. A framework is planned to follow in the next year and work is proceeding on this. There may be a "Dublin 5" meeting in Europe in the autumn or winter of 1997, although plans have not yet been finalised.

The United States National Science Foundation recently sponsored a meeting on terms and conditions languages, a report of which is available on the NASDA Web pages. The area is a difficult one because of the large number of proprietary interests. If it operates within a framework of trust systems, who, if anyone, is allowed to exploit outside the trust system?

There is a substantial ongoing discussion under the W3C/Dublin Core about extensions to PICS (Platform for Internet Content Selection) that can carry Dublin Core information in addition to a ratings system. PICS support is being made available in Microsoft Internet Explorer and possibly in Netscape, raising hope that this route may provide the support needed in browsers.


In the resource discovery area, there is also growing awareness among the internet user community of the limitations of Web indexing services such as AltaVista, which were greeted enthusiastically when they first appeared (especially as a contrast to the complete absence of indexing which existed previously). There is growing recognition of the need to augment these systems with additional metadata rather than rely on computational text indexing, stop-list handling and a few other features.

These services do not provide comprehensive indexing of all sites they cover; some provide only very superficial indexing. There is a need to move to a different architecture for gathering material, for example harvesting systems. However, a "chicken and egg" situation exists. These systems may be used in corporate Intranets where efficiency is a recognised issue, but they are far less available on the wider internet.

Perhaps the community represented at this conference, concerned as it is with higher quality networked information and retrieval, should develop its own Web indexing system, thereby providing an incentive for information providers to carry some metadata in their objects.
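To make the idea of objects "carrying" metadata concrete, the following is a minimal sketch of how an indexing service might pick up Dublin Core elements embedded in a Web page. It assumes, purely for illustration, that providers place each element in an HTML meta tag whose name starts with "DC." (for example "DC.title"). That naming convention, the class and function names, and the sample page are all assumptions made for this sketch and are not taken from the presentation.

```python
from html.parser import HTMLParser


class DublinCoreHarvester(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs from one HTML page.

    The "DC." prefix convention is an assumption made for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.record = {}  # element name -> list of values (elements may repeat)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip()
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()  # "DC.title" -> "title"
            self.record.setdefault(element, []).append(content)


def harvest(html_text):
    """Return the embedded Dublin Core elements found in html_text."""
    parser = DublinCoreHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.record


# Hypothetical page used only to exercise the sketch.
SAMPLE_PAGE = """
<html><head>
<title>Example</title>
<meta name="DC.title" content="A Hypothetical Report">
<meta name="DC.creator" content="Example, A.">
<meta name="DC.creator" content="Example, B.">
<meta name="keywords" content="not dublin core">
</head><body><p>Body text.</p></body></html>
"""

if __name__ == "__main__":
    print(harvest(SAMPLE_PAGE))
```

An indexer built along these lines could store the harvested record alongside its full-text index, so that a search on creator or title uses the provider's own description rather than guessed text.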

Some computed metadata services are now appearing for objects, for example indexing the colour and texture of objects and covering some fairly large sets. These will probably appear on the "commodity Web" during the next six months. It is possible to discover images containing a lot of yellow on a green background and to identify people, flowers and horses. The question is how to make this accessible to a broad user community.
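As a rough illustration of what "computed" colour metadata might look like, the sketch below assigns each pixel of an image to the nearest of a few reference colours and records the share of the image each colour occupies. A query such as "mostly green with some yellow" then becomes a comparison against that profile. The reference palette, the function names and the ten-pixel sample image are hypothetical choices for this example. The services mentioned in the presentation may work quite differently, and texture and object recognition are not covered here.

```python
# Reference colours are illustrative assumptions, not a standard palette.
REFERENCE_COLOURS = {
    "yellow": (255, 255, 0),
    "green": (0, 128, 0),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}


def nearest_colour(pixel):
    """Return the name of the reference colour closest to an (R, G, B) pixel."""
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(pixel, REFERENCE_COLOURS[name]))
    return min(REFERENCE_COLOURS, key=squared_distance)


def colour_profile(pixels):
    """Compute the fraction of pixels assigned to each reference colour."""
    if not pixels:
        raise ValueError("an image needs at least one pixel")
    counts = dict.fromkeys(REFERENCE_COLOURS, 0)
    for pixel in pixels:
        counts[nearest_colour(pixel)] += 1
    return {name: count / len(pixels) for name, count in counts.items()}


def matches(profile, minimum_fractions):
    """True if the profile has at least the requested share of each colour."""
    return all(
        profile.get(name, 0.0) >= wanted
        for name, wanted in minimum_fractions.items()
    )


# Hypothetical 10-pixel "image": three yellowish pixels, seven greenish ones.
SAMPLE_PIXELS = [(250, 240, 10)] * 3 + [(10, 120, 20)] * 7

if __name__ == "__main__":
    profile = colour_profile(SAMPLE_PIXELS)
    print(profile)
    print(matches(profile, {"yellow": 0.2, "green": 0.5}))
```

The profile is small enough to store as metadata alongside each image, which is one way such an index could be searched without re-reading the images themselves.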

Finally, in the context of Web indexing, there are currently two Webs in existence: visible pages that can be indexed, and an entire "subterranean Web" of databases and similar resources, which can only be described if metadata is integrated with static page indexing. For example, LEXIS has 15 static pages describing how to sign up, help facilities and other relatively superficial aspects, while the databases themselves remain out of reach of page indexing. The need to get deeper access to networked information resources is one of the main driving forces for metadata to describe them.


Some difficult problems remain to be explored. There has not yet been sufficient progress on developing metadata about aggregated information, for example how to describe a database. A further gap is the integration of evaluative information: an intellectual framework beyond simple parental control binary applications is required. Relationships among digital objects remain a very open question. There is also a need for work on uniform resource names.

[64] This account was drafted for this report by The Marc Fresko Consultancy. It is based on notes taken during the presentation.
