Discovering Online Resources. Unifying Resource Discovery Metadata for the Humanities:
An Application Based Upon the Dublin Core

Paul Miller, Archaeology Data Service (collections@ads.ahds.ac.uk)

Contents

  1. Introduction
  2. General issues
  3. Element issues
  4. The Dublin Core element set
  5. Dublin Core elements required in all AHDS records
  6. Fitting it all together

1 Introduction

This chapter aims to integrate the findings from each of the resource discovery workshops summarised in Chapter 2. Although written specifically to meet the needs of the Arts and Humanities Data Service, the findings expressed herein are more widely applicable, both as one of the earliest examples of a complete implementation of the Dublin Core (Weibel, this volume), and as a model for an integrated solution to diverse problems from a broad subject base. As with many things, this guidance is but one stage along an evolutionary path, rather than a definitive edict outlining 'the Truth'. As such, the findings outlined below are subject to change in the light of experience gained, both within the AHDS and elsewhere. The World Wide Web site of the Arts and Humanities Data Service (AHDS 1997) will always provide pointers to the most recent information on AHDS implementation of resource discovery procedures, as well as links to catalogues which actually implement these procedures.

1.1 An AHDS catalogue

The specification laid out below is that of a core element set, suitable for implementation across the humanities and beyond. Within the AHDS, each of the five constituent Service Providers has clear resource discovery requirements of a less generic nature, and mechanisms for including these are also addressed below. It should be reiterated at this point that requirements other than those of resource discovery are beyond the scope of Dublin Core as implemented by AHDS or anyone else, and other mechanisms exist for handling such information. The proposed Warwick Framework (Lagoze et al. 1996) offers an effective means by which resource discovery information (managed within the Dublin Core) might be related to other information forms and to the electronic resource itself.

Before discussing the detail of each Dublin Core element's implementation by the AHDS, it is worth highlighting a few general issues that underpin the whole.

2 General issues

2.1 Cross-domain applicability versus domain-specific resolution

A fundamental conflict underlies the current deliberations of diverse scholarly communities; namely the contrasting needs for an element set capable of adequately describing 'my' subject (whatever that may be) and an element set capable of providing inter-disciplinary interoperability.

Every addition of a discipline/interpretation/subject-specific element, SCHEME or TYPE to the Core serves to make it more effective within the discipline making the change, and consequently less effective everywhere else. A narrow line must be steered between the excesses of either approach, as we neither want a non-interoperable catalogue nor one too generalised to be of use to anyone.

Whilst it rapidly became clear that the implementation of 'pure' Dublin Core (Miller and Gill 1997), without any element qualifiers whatsoever, would result in a core unsuitable for use across the humanities, there is the ever-present danger that creation of an implementation too reliant upon detailed qualification and sub-qualification will prove equally unsuitable, for exactly the opposite reasons.

2.2 Combining the previously uncombined

Many of the resources potentially encompassed by a humanities-spanning catalogue were conceived in isolation, with few envisaged as ever directly interfacing to other resources from the same discipline and fewer still intended for use across the disciplinary divide. Although surmountable, this poses the problem of ensuring that similar terms are similarly understood between domains and that acceptable assumptions in one discipline are equally acceptable elsewhere or else avoided altogether. It is important, for example, that the metadata for an archaeological excavation archive includes somewhere the important (but perhaps 'obvious' to an archaeologist) facts that it is archaeology and an excavation!

2.3 A language for all

To be utilised effectively by non-specialists (whether in a discipline or metadata/cataloguing context), the terminology of Dublin Core must be clear and unambiguous. The diversity of interpretation over many of the elements during the workshop programme suggests this not to be the case at present. AHDS should make generating an unambiguous version of the current definitions a priority, and a first attempt at this is made below. These definitions are being fed back into the wider Dublin Core process and may result in a further iteration of the core definitions. Ultimately there is no reason why each implementation should not have its own definitions of the Dublin Core element set, providing that some central authority takes responsibility for ensuring that the essence of Dublin Core remains unaltered in each case. The addition of numerous SCHEMEs and TYPEs exacerbates the problem of interpretability, and the Core will consequently become less intuitive. Steps should be taken to explain the options clearly and plainly, and to avoid frightening potential users with the apparent morass of jargon-rich options.

It is suggested that any implementation (such as that proposed here) should recommend a small number of preferred SCHEMEs and TYPEs for implementation-wide adoption, with individual user communities obviously also able to add to the list for their own requirements. Where all other issues are equal, a recommended SCHEME or TYPE should always be chosen in preference to any other, in order to maximise interoperability.

2.4 Scope of the Dublin Core

A frequent misapprehension about the Dublin Core is that it is in some way intended as a replacement for existing methods of cataloguing such as the library world's MARC (Library of Congress 1997b) or the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata (FGDC 1997). This is not the case. Dublin Core, as has been stressed before, is intended only to facilitate the process of resource discovery. Delivery of more detailed metadata information - or of the data themselves - is outwith the scope of the Core, although it may be integrated with it by means of the Warwick Framework model (Lagoze et al. 1996) and existing detailed metadata structures.

A related issue is that of the degree to which an aggregated resource - or collection of resources - should be broken down for description within the catalogue. Taking the holdings of a national museum, for example, should these be described by means of a single entry, or should the museum's holdings be subdivided into major collections, each of which warrants a record of its own? Assuming the latter, further issues arise relating to the manner in which the relationship between museum records and collections records is handled, and these are yet to be fully addressed more broadly than at the level of individual organisations or communities.

3 Element issues

3.1 SCHEMEs and TYPEs

Given the different needs of each potential user community, it is impossible to define a single, prescriptive, list of SCHEMEs and TYPEs for use across all communities, or even across the humanities. In order to ensure interoperability, however, the following steps are recommended for implementation.

3.1.1

SCHEMEs used by individual implementers (such as the five AHDS Service Providers) should be registered with one central repository for each implementation (the AHDS as a whole, for example, is considered as one implementation). This list will hold the name of the SCHEME (Anglo-American Cataloguing Rules, for example), its agreed abbreviation (AACR2), and a reference to where a copy of the SCHEME might be found (a bibliographic citation in this case). For the system to work, agreed abbreviations must be unique, implementation-wide.
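
The registry described above might be sketched as follows. This is a minimal illustration in Python, not an AHDS specification: the class names SchemeEntry and SchemeRegistry, and the choice to reject duplicate abbreviations with an error, are assumptions made for the example. Only the three recorded fields and the uniqueness requirement come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemeEntry:
    name: str          # full name of the SCHEME
    abbreviation: str  # agreed abbreviation, unique implementation-wide
    reference: str     # where a copy of the SCHEME might be found


class SchemeRegistry:
    """Hypothetical central registry for one implementation (e.g. the AHDS)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse a second SCHEME that tries to reuse an agreed abbreviation.
        if entry.abbreviation in self._entries:
            raise ValueError(f"abbreviation already registered: {entry.abbreviation}")
        self._entries[entry.abbreviation] = entry

    def lookup(self, abbreviation):
        return self._entries[abbreviation]


registry = SchemeRegistry()
registry.register(SchemeEntry(
    name="Anglo-American Cataloguing Rules",
    abbreviation="AACR2",
    reference="Gorman and Winkler 1988",
))
```

A cataloguer's tool could then call registry.lookup("AACR2") to resolve an abbreviation found in a record to its full name and citation.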

3.1.2

Recommended SCHEMEs should be offered by a responsible agency within each implementation. Where all other considerations are equal, individual implementers should use the recommended SCHEME in preference to any other. Where they have reason to use another SCHEME they may, of course, do so, so long as the other guidelines stated herein are observed. Individual implementers may submit additional SCHEMEs for this recommended core list, and their submissions will be reviewed by a 'registry panel' - comprising representatives of each relevant service, plus a small number of external validators - to check for duplication, unnecessary extension of core scope, etc. before being approved and added.

3.1.3

A responsible agency within each implementation should define a list of permissible top-level TYPEs (e.g. DC.creator.personalName, DC.date.lastModified, etc.). Individual implementers may submit additional TYPEs for this top-level list, and their submissions will be reviewed by the registry panel to check for duplication, unnecessary extension of core scope, etc. before being approved and added.

3.1.4

Individual implementers may extend the top-level list of TYPEs as they see fit (DC.creator.personalName.surname) for their own specific requirements, provided that the contents of the element continue to make sense at an implementation-wide level were the additional sub-type to be removed or not fully understood. Within the Arts and Humanities Data Service's implementation, for example, if the Visual Arts Data Service added the sub-type surname for their own use, the sub-type would only be permitted if the Dublin Core element's contents ('Miller', for example) continued to make sense across the AHDS, including in those cases where other Service Providers only used DC.creator.personalName, without the extra sub-type qualification.
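
The requirement that an element should still make sense once a locally added sub-type is removed can be illustrated with a short Python sketch. The function name generalise and the contents of TOP_LEVEL_TYPES are assumptions for the purposes of the example; a real list would be maintained by the responsible agency.

```python
# Illustrative top-level TYPE list; the real list would come from the registry.
TOP_LEVEL_TYPES = {
    "creator": {"personalName", "corporateName"},
    "date": {"lastModified"},
}


def generalise(element_name):
    """Reduce a qualified element name to the form understood implementation-wide.

    'DC.creator.personalName.surname' becomes 'DC.creator.personalName';
    a TYPE that is not on the top-level list falls back to the bare element.
    """
    parts = element_name.split(".")
    if len(parts) < 2 or parts[0] != "DC":
        raise ValueError(f"not a Dublin Core element name: {element_name}")
    element = parts[1]
    if len(parts) >= 3 and parts[2] in TOP_LEVEL_TYPES.get(element, set()):
        return ".".join(parts[:3])
    return ".".join(parts[:2])
```

Under this sketch, a Service Provider that does not recognise the surname sub-type would simply treat the value 'Miller' as a DC.creator.personalName.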

4 The Dublin Core element set

Here, each Dublin Core element is listed in turn, along with any pertinent comments drawn from the AHDS/UKOLN resource discovery workshop reports. Where it is felt that the existing definition is particularly confusing or unsuitable, an AHDS interpretation of the meaning of Dublin Core elements is offered. The existing definitions reproduced here are drawn directly from Stuart Weibel's message to the Dublin Core implementation mailing list, meta2, in December of 1996.

4.1 DC.title

Current Dublin Core definition:

The name given to the resource by the CREATOR or PUBLISHER.

Proposed AHDS definition:

The name given to this resource by its CREATOR. This name need not necessarily be unique.

This element would appear to be uncontentious across the humanities, simply referring to the given name of the resource, whether this be an archaeological excavation archive, a digital representation of a novel, or whatever.

Where possible, a name authority should be used and identified via a SCHEME code in order to avoid potential multiple occurrences of the same resource. AHDS advocates service-wide use of naming guidance such as that enshrined within the Anglo-American Cataloguing Rules, AACR2 (Gorman and Winkler 1988).

AHDS-wide SCHEMEs

None at present.

AHDS-wide TYPEs

TYPEs useful to an AHDS-wide interpretation of this element provisionally include:

main - The title most commonly associated with the resource. Where no TYPE is specified, main is assumed. A main title is required for every resource.

subtitle - Ancillary title information for the resource. A main title is required before subtitle(s) may be used.

alternate - Where a resource is known by more than one name, or where it has a formal name and a vernacular name, the alternate or vernacular name may be recorded here. A main title recording the full and formal name is required before an alternate title may be given.

series - Where the resource is part of a series (CBA Research Reports, Star Trek, or whatever), the series name may be given here.
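
The rules attached to these TYPEs (main is the default, and a main title is required for every resource) could be checked along the following lines. This is a sketch only: the function name check_titles, the (type, value) pair representation and the wording of the messages are assumptions, and the check for unrecognised TYPEs covers only the AHDS-wide list, not any additions made by individual Service Providers.

```python
TITLE_TYPES = {"main", "subtitle", "alternate", "series"}


def check_titles(titles):
    """Return a list of problems found in (type, value) title pairs.

    A type of None stands for an unqualified DC.title, which is read as main.
    """
    problems = []
    seen = set()
    for title_type, _value in titles:
        title_type = title_type or "main"
        if title_type not in TITLE_TYPES:
            problems.append(f"unrecognised TYPE: {title_type}")
        seen.add(title_type)
    # subtitle and alternate both presuppose a main title, so one check suffices.
    if "main" not in seen:
        problems.append("a main title is required")
    return problems
```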

4.2 DC.creator

Current Dublin Core definition:

The person(s) or organisation(s) primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.

Proposed AHDS definition:

The person(s) or organisation(s) responsible for creation of the original resource, its source, surrogates, or metadata pertaining to the above, whose involvement is considered as worthy of inclusion for the purpose of discovering said resource.

The concept of primary intellectual responsibility is difficult to define. The element should be merged with at least DC.contributor, and possibly with the other elements dealing with named individuals. A controlled list of responsibility statements is essential, with generic terminology such as 'project leader' used wherever possible in preference to specific terms such as 'excavation director'. Name authority files such as the Union List of Artists' Names (Getty Information Institute 1997b) should be used wherever possible, and identified with the appropriate SCHEME label.

In order to allow metadata elements relating to one individual or organisation to be grouped, a numeric tag may be assigned to each, and applied as a label to all relevant elements. Thus DC.creator.personalName.1 and DC.creator.postcode.1 refer to the same individual, providing his/her name and postcode respectively. As with most other elements of the Dublin Core as employed by AHDS, these numbers are optional, and need only be used where the cataloguer feels they add clarity to the record. No assumption should be made as to level of involvement based upon these numbers. DC.creator.personalName.1, for example, is not necessarily more important to the resource than DC.creator.personalName.2.
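
The grouping that these numeric tags make possible might be sketched in Python as below. The function name group_by_tag, the representation of a record as (name, value) pairs, and the sample postcode and e-mail values are all hypothetical; untagged elements are simply collected under None, and no ordering or importance is inferred from the numbers.

```python
from collections import defaultdict


def group_by_tag(elements):
    """Group (name, value) pairs by their optional trailing numeric tag."""
    groups = defaultdict(dict)
    for name, value in elements:
        head, _, last = name.rpartition(".")
        if last.isdigit():
            # Tagged element: file it under its number, minus the tag itself.
            groups[int(last)][head] = value
        else:
            # Untagged element: the tags are optional, so keep it separately.
            groups[None][name] = value
    return dict(groups)


record = [
    ("DC.creator.personalName.1", "Miller"),
    ("DC.creator.postcode.1", "AB1 2CD"),            # placeholder postcode
    ("DC.creator.corporateName.2", "Getty Information Institute"),
    ("DC.creator.email", "someone@example.org"),     # untagged, placeholder
]
grouped = group_by_tag(record)
```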

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

ULAN - The Union List of Artists' Names (Getty Information Institute 1997b)

AHDS-wide TYPEs

TYPEs useful to an AHDS-wide interpretation of this element provisionally include:

personalName - The name of an individual associated with creation of the resource. Where no TYPE is specified, personalName is assumed. A personalName or corporateName is required for every resource.

corporateName - The name of an organisation or corporation associated with creation of the resource. corporateName is useful in situations where no one individual or group of individuals within an organisation is identified with the resource. The Getty Information Institute, for example, might be cited as the creator of the Art and Architecture Thesaurus using this TYPE.

A personalName or corporateName is required for every resource.

affiliation - The organisation with which an individual (identified by means of personalName) is associated for the purposes of dealing with this resource. A personalName is required before affiliation is used, and use of affiliation (defining the institutional affiliation of a named individual) should not be confused with use of corporateName (assigning corporate authorship or responsibility to a resource).

role - The role played by the individual (named with personalName) or institution (named with corporateName) with respect to this resource. Each named unit's actual function may, optionally, be appended to the generic role label, thus:
DC.creator.role = projectLeader.excavationDirector, DC.creator.role = majorContributor.composer, etc. The roles must be selected from this controlled list, to which additions may be proposed:

contact - Details for a contact within the depositing organisation capable of taking notional responsibility for the deposition of the resource. Where no role is specified, contact is assumed. A contact name is required for every resource.

metadata - The individual(s) responsible for providing metadata on this resource for the Service Provider's catalogue. A metadata name is required for every resource.

projectLeader - The individual(s) primarily responsible for preparation of the resource in its present form.

majorContributor - Any individual(s) considered to have made a major contribution to the preparation of the resource in its present form including, perhaps, the creator of any 'original' from which the resource is derived.

otherContributor - Any individual(s) who, whilst not responsible for the resource in a major fashion, are still considered as worthy of inclusion for the purpose of resource discovery.

email - The electronic mail address of the person or organisation in question.

postal - A standard postal address for the individual or organisation in question. Where possible, this address should be BS 7666 compliant (British Standards Institute 1994).

town - The postal town of the individual or organisation in question.

country - The country in which the individual or organisation in question is located. For the purposes of this service, the United Kingdom should be subdivided into England, Scotland, Wales, etc.

postcode - The postal or Zip code of the individual or organisation in question.

phone - The full international telephone number of the individual or organisation in question. This number should take the form +country-code local-area-code telephone-number; e.g. +44 1904 433954 rather than 01904 433954 or merely 433954.

fax - The full international facsimile number of the individual or organisation in question. This number should take the form +country-code local-area-code facsimile-number; e.g. +44 1904 433939 rather than 01904 433939 or merely 433939.

url - A World Wide Web URL for the home page of the individual or organisation in question.
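
The form recommended for the phone and fax TYPEs lends itself to a mechanical check. The following Python sketch assumes that the three parts are separated by single spaces and contain digits only, and that a country code has one to three digits; the function name and the pattern are illustrative rather than part of any AHDS specification.

```python
import re

# Assumed pattern: '+', then country code, area code and number,
# each made of digits and separated by single spaces.
INTERNATIONAL_NUMBER = re.compile(r"\+\d{1,3} \d+ \d+")


def is_international_number(value):
    """Return True if value has the form '+country-code area-code number'."""
    return INTERNATIONAL_NUMBER.fullmatch(value) is not None
```

The examples given in the text behave as expected: the full international form is accepted, while the national and local forms are rejected.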

4.3 DC.subject

Current Dublin Core definition:

The topic of the resource, or keywords or phrases that describe the subject or content of the resource. The intent of the specification of this element is to promote the use of controlled vocabularies and keywords. This element might well include scheme-qualified classification data (for example, Library of Congress Classification Numbers or Dewey Decimal numbers) or scheme-qualified controlled vocabularies (such as Medical Subject Headings or Art and Architecture Thesaurus descriptors) as well.

Proposed AHDS definition:

Standard Dublin Core definition is suitable.

The DC.subject element has the potential to be greatly overloaded in day-to-day use of any system, especially in provision of a service such as that envisaged by the AHDS where this element may be perceived as the only suitable repository for discipline-specific terminology of sufficient detail for the subject specialist.

Evident confusion between the role of DC.subject and DC.coverage - and even DC.format and DC.source - is likely to cause problems, and the usage of each of these therefore requires tighter guidance and illustration across the AHDS. It is assumed, even more so than with the majority of other elements, that entries in DC.subject will be qualified by the use of SCHEME tags, identifying controlled terminologies from which values have been drawn.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

AACR2 - Anglo-American Cataloguing Rules (Gorman and Winkler 1988)

AAT - Art and Architecture Thesaurus (Getty Information Institute 1997a)

DDC - Dewey Decimal Classification (OCLC 1996a, 1996b)

HASSET - Humanities and Social Science Electronic Thesaurus, used by the UK's Data Archive at the University of Essex (Data Archive 1997b)

LCSH - Library of Congress Subject Headings (Library of Congress 1997c)

TGN - Thesaurus of Geographic Names (Harpring 1997)

ULAN - Union List of Artists' Names (Getty Information Institute 1997b)


AHDS-wide TYPEs

None at present.

4.4 DC.description

Current Dublin Core definition:

A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. Future metadata collections might well include computational content description (spectral analysis of a visual resource, for example) that may not be embeddable in current network systems. In such a case this field might contain a link to such a description rather than the description itself.

Proposed AHDS definition:

Standard Dublin Core definition is suitable.

As defined more widely within the Dublin Core community, this element holds a free-text description of the resource in question.

The main issue related to this element is that of length; just how detailed should descriptions be before it is more logical to relegate information to the next more detailed level of metadata for a resource?

Importantly, DC.description should hold sufficient information for the user to reach an informed decision as to whether or not proceeding further with the described resource is likely to prove worthwhile, whilst remaining concise enough for the user to absorb this and other, similar description(s) in a reasonable period of time.

Cataloguers may wish to repeat the DC.description element, with one short description for initial display and a long description for optional detail.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

URL - Where the descriptive text is stored elsewhere and referred to by means of a URL, the address should be included here in a computer-parseable form.

AHDS-wide TYPEs

TYPEs useful to an AHDS-wide interpretation of this element provisionally include:

long - A full description of the resource, normally provided by means of a URL link to an external file.

short - A brief description of the resource, normally suitable for display in response to a user query.

4.5 DC.publisher

Current Dublin Core definition:

The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. The intent of specifying this field is to identify the entity that provides access to the resource.

Proposed AHDS definition:

The entity(s) responsible for facilitating availability of the resource, such as a publisher, a distributor, or a corporate entity. The intent of specifying this element is to identify those entities providing access to the resource or those responsible for facilitating such access to a degree considered relevant for the purposes of resource discovery.

With data accessioned by Service Providers of the Arts and Humanities Data Service, the Service Provider concerned, or the AHDS as a whole, may be considered in some fashion to be 'responsible for making the resource available in its present form'.

Who is the publisher of an electronic resource? An AHDS Service Provider may well be disseminating it although, for the Archaeology Data Service and others, this 'dissemination' may merely take the form of a catalogue entry and a link to someone else's server. Notions of publication might usefully be associated with the management of rights, intellectual property, and 'sweat of the brow', rather than the mere act of providing access; a role which is becoming increasingly easy with the growth of World Wide Web-based resources and access tools.

In order to clarify the role of publication, a controlled list of TYPEs might be proposed for use.

AHDS-wide SCHEMEs

None at present.

AHDS-wide TYPEs

TYPEs useful to an AHDS-wide interpretation of this element provisionally include:

personalName - The name of an individual associated with facilitating access to the resource. Where no TYPE is specified, personalName is assumed.

corporateName - The name of an organisation or corporation associated with facilitating access to the resource. corporateName is useful in situations where no one individual or group of individuals within an organisation is identified with the resource. The Getty Information Institute, for example, might be cited as the publisher of the Art and Architecture Thesaurus using this TYPE.

affiliation - The organisation with which an individual (identified by means of personalName) is associated for the purposes of dealing with this resource. A personalName is required before affiliation is used, and use of affiliation (defining the institutional affiliation of a named individual) should not be confused with use of corporateName (assigning corporate responsibility to a resource).

role - The role played by the individual (named with personalName) or institution (named with corporateName) with respect to this resource. The roles must be selected from this controlled list, to which additions may be proposed:

disseminator - The individual(s) or organisation(s) responsible for disseminating the resource to users.

email - The electronic mail address of the person or organisation in question.

postal - A standard postal address for the individual or organisation in question. Where possible, this address should be BS 7666 compliant (British Standards Institute 1994).

town - The postal town of the individual or organisation in question.

country - The country in which the individual or organisation in question is located. For the purposes of this service, the United Kingdom should be subdivided into England, Scotland, Wales, etc.

postcode - The postal or Zip code of the individual or organisation in question.

phone - The full international telephone number of the individual or organisation in question. This number should take the form +country-code local-area-code telephone-number; e.g. +44 1904 433954 rather than 01904 433954 or merely 433954.

fax - The full international facsimile number of the individual or organisation in question. This number should take the form +country-code local-area-code facsimile-number; e.g. +44 1904 433939 rather than 01904 433939 or merely 433939.

url - A World Wide Web URL for the home page of the individual or organisation in question.

4.6 DC.contributor

Current Dublin Core definition:

Person(s) or organisation(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, illustrators and convenors).

The function of this element should be merged with DC.creator.

4.7 DC.date

Current Dublin Core definition:

The date the resource was made available in its present form. The recommended best practice is an 8 digit number of the form YYYYMMDD as defined by ANSI X3.30-1985. In this scheme, the date element for the day this is written would be 19961203, or December 3, 1996. Many other schemes are possible, but if used, they should be identified in an unambiguous manner.

Proposed AHDS definition:

Dates associated with the creation and dissemination of the resource. These dates should not be confused with those related to the content of a resource (AD 43, in a database of artefacts from the Roman conquest of Britain) which are dealt with under COVERAGE or its subject (1812, in relation to Tchaikovsky's eponymous overture) which are dealt with under SUBJECT.

The existing definition of date is insufficient for requirements and needs to be extended. Useful dates include creation of the original work, publication of the version of that work later digitised, release date of the electronic version, dates associated with the maintenance of the catalogue record, and dates related to update cycles where holdings represent snapshots of 'live' databases. The recommendation of a national rather than an international SCHEME as default is also, perhaps, unsuitable.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

ISO8601 - Data elements and interchange formats - Information interchange - Representation of dates and times, ISO 8601:1988(E), International Organisation for Standardisation, June, 1988. Where the option is available, this is the preferred SCHEME for citation of dates across AHDS, and takes the form YYYY or YYYY-MM-DD. It is also capable of handling time if required.
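
The eight-digit form recommended in the current Dublin Core definition and the ISO 8601 form preferred here carry the same information, so existing values can be converted mechanically. The sketch below uses Python's standard datetime module; the function name to_iso8601 is an assumption, and only the two forms mentioned in the text (YYYYMMDD and a bare YYYY) are handled.

```python
from datetime import datetime


def to_iso8601(value):
    """Convert 'YYYYMMDD' to 'YYYY-MM-DD'; pass a bare 'YYYY' through unchanged."""
    if not value.isdigit() or len(value) not in (4, 8):
        raise ValueError(f"expected YYYY or YYYYMMDD, got: {value}")
    if len(value) == 4:
        return value
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(value, "%Y%m%d").date().isoformat()
```

Applied to the example in the current definition, to_iso8601("19961203") gives "1996-12-03".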

AHDS-wide TYPEs

TYPEs useful to an AHDS-wide interpretation of this element provisionally include:

accessioned - The date this resource was formally entered into the catalogue of an AHDS Service Provider. An accessioned date is required for every resource.

projectStart - The start date of the project resulting in creation of the resource. This is, perhaps, most useful for resources such as an excavation archive, where the project from which the archive results may have run for several years. Where only one project-related date is to be recorded, it should be recorded here rather than in projectEnd.

projectEnd - The completion date of the project resulting in creation of the resource.

lastUpdate - Where the catalogue entry describes a 'snap-shot' from a live database, lastUpdate records the date on which the most recent snap-shot was made available.

nextUpdate - Where the catalogue entry describes a 'snap-shot' from a live database, nextUpdate records the date on which the next snap-shot is expected to be made available.

publicationOriginal - The date on which the resource - in any form - was originally published, where this information is considered relevant to the process of resource discovery.

publicationEdition - The date on which the edition of the resource from which any electronic surrogate arose was published, where this differs from publicationOriginal. For example, although Domesday Book was compiled in 1086, a digital surrogate might be created from an imprint of 1993. Where the resource being described is non-digital, and where the presence of multiple editions prevents publicationOriginal being sufficient, this TYPE is used to record the date that the resource was made available in the form being described.

publicationElectronic - The date on which the resource was made available in its present digital form. Unless otherwise specified, this is assumed to be the default date.

metadataLastModified - The date on which the catalogue record (or 'metadata') was last changed.

4.8 DC.type

Current Dublin Core definition:

The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types. A preliminary set of such types can be found at the following URL: http://www.roads.lut.ac.uk/Metadata/DC-ObjectTypes.html

Proposed AHDS definition:

The general form of a resource, such as text, image, etc. Where greater precision is required, the form may be refined hierarchically, for example, text.thesis.

Each of the AHDS workshops found the list of proposed DC.type values (Knight and Hamilton 1997) to be unsatisfactory for their requirements. It was also noted that the proposed list appeared in some cases to conflate different notions of 'type' in a confusing manner.

The new proposal for DC.type taking shape at URL: http://sunsite.Berkeley.EDU/Metadata/types.html appears to better capture the issues raised by AHDS Service Providers, allowing as it does the definition of a relatively small number of broad resource types, as well as greater definition where appropriate. Unless otherwise specified by means of a SCHEME, types are assumed to be drawn from the list at Berkeley.
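
One consequence of refining types hierarchically is that a search for a broad type can also retrieve its refinements. A minimal Python sketch of such a comparison follows; the function name matches_type is an assumption, and the dot-separated form is taken from the text.thesis example in the proposed definition.

```python
def matches_type(resource_type, wanted):
    """True if resource_type equals wanted or refines it hierarchically."""
    # Appending the dot prevents 'textile' from matching a search for 'text'.
    return resource_type == wanted or resource_type.startswith(wanted + ".")
```

A query for text would therefore find a record typed text.thesis, whereas a query for text.thesis would not find one typed simply text.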

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

AACR2 - General Material Descriptors, as defined by the Anglo-American Cataloguing Rules (Gorman and Winkler 1988).

AHDS-wide TYPEs

None at present.

4.9 DC.format

Current Dublin Core definition:

The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principle, formats can include physical media such as books, serials, or other non-electronic media.

Proposed AHDS definition:

Standard Dublin Core definition is suitable.

As well as storing basic IMT-type information (text/ascii, application/dbf, etc.), this element has the potential to be used, and perhaps over-used, to record details such as file size, page length, film playing times, etc.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

IMT - Internet Media Types, such as text/html, application/postscript, etc.

SI - Système International d'Unités, the internationally agreed set of units for measurement; metres for length, seconds for time, etc.

AHDS-wide TYPEs

TYPEs useful to an AHDS-wide interpretation of this element provisionally include:

fileType - The file format of a digital expression of the resource. fileType must use the IMT SCHEME, is required for every record of a digital resource, and is the default where no TYPE is specified.

fileSize - The size of a computer file, in Kilobytes.

length - The length of a book-like resource (in pages) if printed. Also the shelf space occupied by an archive (in metres), and the running time of a film or piece of music (in seconds).

medium - The medium on which a resource is held, e.g. Compact Disc, vellum, etc.
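The rules given above for fileType (it is the default TYPE, and it must use the IMT SCHEME) can be sketched as a small normalisation step. This is an illustration under stated assumptions rather than an AHDS implementation: the `FormatEntry` structure and `normalise_format` function are hypothetical, and treating an unstated SCHEME on a fileType entry as IMT is an assumption not made explicit in the text.

```python
from typing import NamedTuple, Optional


class FormatEntry(NamedTuple):
    """One occurrence of DC.format (hypothetical in-memory representation)."""
    value: str
    dc_type: Optional[str] = None  # fileType, fileSize, length or medium
    scheme: Optional[str] = None   # IMT or SI


def normalise_format(entry):
    """Apply the fileType default and check that fileType uses IMT."""
    dc_type = entry.dc_type or "fileType"  # default where no TYPE is given
    scheme = entry.scheme
    if dc_type == "fileType":
        if scheme is None:
            scheme = "IMT"  # assumption: an unstated SCHEME is read as IMT
        elif scheme != "IMT":
            raise ValueError("fileType must use the IMT SCHEME")
    return FormatEntry(entry.value, dc_type, scheme)


print(normalise_format(FormatEntry("text/html")))
print(normalise_format(FormatEntry("5400", "length", "SI")))
```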

4.10 DC.identifier

Current Dublin Core definition:

String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element.

Proposed AHDS definition:

Standard Dublin Core definition is suitable.

Use of the DC.identifier element is uncontentious, and the element is required for all records in the collection. Each record will require an internal identifier, and may also carry any number of identifiers that uniquely define it within its host resource or environment.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

ADS - Archaeology Data Service internal identification number, uniquely identifying any resource within the ADS catalogue. An ADS identifier is required for every resource in the ADS catalogue, along with at least one other identifier.

HDS - History Data Service/Data Archive internal identification number, uniquely identifying any resource within the HDS/ Data Archive catalogue. An HDS identifier is required for every resource in the HDS catalogue.

ISBN - International Standard Book Number.

ISSN - International Standard Serials Number.

OTA - Oxford Text Archive internal identification number, uniquely identifying any resource within the OTA catalogue. An OTA identifier is required for every resource in the OTA catalogue, along with at least one other identifier.

PADS - Performing Arts Data Service internal identification number, uniquely identifying any resource within the PADS catalogue. A PADS identifier is required for every resource in the PADS catalogue, along with at least one other identifier.

URL - Uniform Resource Locator.

VADS - Visual Arts Data Service internal identification number, uniquely identifying any resource within the VADS catalogue. A VADS identifier is required for every resource in the VADS catalogue, along with at least one other identifier.

AHDS-wide TYPEs

None at present.

4.11 DC.source

Current Dublin Core definition:

The work, either print or electronic, from which this resource is derived, if applicable. For example, an HTML encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.

Proposed AHDS definition:

Standard Dublin Core definition is suitable.

This element, like many others, has the potential to be over-used for storing source information of little or no value to the process of resource discovery. Here, as elsewhere, cataloguers should consider whether or not the information they are recording will actually help users to find a resource and evaluate its fitness for purpose.

Digital resources derived from (rather than transcribed from) other resources - whether paper or digital - will be described using DC.relation rather than here.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

ISBN - International Standard Book Number.

ISSN - International Standard Serials Number.

SP - An identification of the AHDS Service Provider making access to the resource possible. An SP is required for every resource, and must be selected from this list:

ADS - Archaeology Data Service.

HDS - History Data Service.

OTA - Oxford Text Archive.

PADS - Performing Arts Data Service.

VADS - Visual Arts Data Service.

URL - Uniform Resource Locator.

AHDS-wide TYPEs

None at present.

4.12 DC.language

Current Dublin Core definition:

Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with the Z39.53 three character codes for written languages.

See: <URL: http://www.sil.org/sgml/nisoLang3-1994.html>.

Proposed AHDS definition:

Language(s) of the intellectual content of the resource. Where possible, the content of this field should be drawn from the International Standard for defining language, ISO 639.

As with DC.date, it is preferred that this element be specified using an international rather than national standard.

DC.language refers to human-readable language, and is not intended to cover computer languages such as C++, Java, or Pascal.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

ISO639 - ISO 639, Code for the representation of names of languages.

AHDS-wide TYPEs

None at present.

4.13 DC.relation

Current Dublin Core definition:

Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. For example, images in a document, chapters in a book, or items in a collection. A formal specification of RELATION is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

Proposed AHDS definition:

Standard Dublin Core definition is suitable.

This element is used to define links to obviously related resources where such a relationship is considered as relevant to the process of resource discovery. It is also a core aspect of the method by which hierarchical relationships may be established between individually described elements of a larger collection such as a museum's holdings or a corpus of texts.

Until such time as models capable of handling the more complex aspects of the Warwick Framework concept (Lagoze et al. 1996, Lagoze 1997) become available, this element also offers a means by which different 'packages' of metadata may be crudely associated.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

ADS - Archaeology Data Service internal identification number, uniquely identifying any resource within the ADS catalogue. A RELATION using the ADS identifier to relate to the top-level catalogue record for the ADS catalogue itself is required for every resource in the ADS catalogue.

HDS - History Data Service internal identification number, uniquely identifying any resource within the HDS catalogue. A RELATION using the HDS identifier to relate to the top-level catalogue record for the HDS catalogue itself is required for every resource in the HDS catalogue.

ISBN - International Standard Book Number.

ISSN - International Standard Serials Number.

OTA - Oxford Text Archive internal identification number, uniquely identifying any resource within the OTA catalogue. A RELATION using the OTA identifier to relate to the top-level catalogue record for the OTA catalogue itself is required for every resource in the OTA catalogue.

PADS - Performing Arts Data Service internal identification number, uniquely identifying any resource within the PADS catalogue. A RELATION using the PADS identifier to relate to the top-level catalogue record for the PADS catalogue itself is required for every resource in the PADS catalogue.

URL - Uniform Resource Locator.

VADS - Visual Arts Data Service internal identification number, uniquely identifying any resource within the VADS catalogue. A RELATION using the VADS identifier to relate to the top-level catalogue record for the VADS catalogue itself is required for every resource in the VADS catalogue.

AHDS-wide TYPEs

TYPEs useful to an AHDS-wide interpretation of this element provisionally include:

isParentOf - The resource being described is hierarchically above the resource to which this RELATION points. Although it is possible to record this information, routinely doing so is not to be encouraged, as there is the potential for recording vast quantities of information unnecessarily; the top-level record for a museum, for example, could potentially include isParentOf information for every collection, sub-collection and artefact in the museum!

isChildOf - The resource being described is hierarchically below the resource to which this RELATION points. This is undoubtedly the easiest RELATION to record, and perhaps the most useful. Users should be careful to ensure that they require isChildOf as opposed to isMemberOf, described below.

isMemberOf - The resource being described is hierarchically below the resource to which this RELATION points, and may be considered to be a part of the higher-level resource. Where both are considered to be appropriate, isMemberOf should be used in preference to isChildOf, as membership implies a parent-child hierarchical relationship whilst the parent-child relationship does not necessarily imply membership.

isSiblingOf - The resource being described is hierarchically adjacent to the resource to which this RELATION points, and is a part of the same collection.

otherMetadata - The content of the DC.relation element when qualified with TYPE otherMetadata provides a computer-parseable identifier for the location of further metadata pertaining to the resource. This might, for example, be the URL for a complete Federal Geographic Data Committee or MARC record on the resource, and provides a simple solution to the implementation of some of the concepts of the Warwick Framework (Lagoze et al. 1996) until more elegant models are routinely possible.
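Because routinely recording isParentOf is discouraged, a catalogue might instead derive downward links from the isChildOf and isMemberOf relations already stored on lower-level records. The following Python sketch illustrates one way of doing so; the record identifiers, the `RELATIONS` structure and both function names are hypothetical, and a real catalogue would read these links from its own store.

```python
from collections import defaultdict

# Hypothetical records: identifier -> list of (RELATION TYPE, target identifier).
RELATIONS = {
    "museum": [],
    "collection-1": [("isChildOf", "museum")],
    "artefact-1": [("isMemberOf", "collection-1")],
    "artefact-2": [("isMemberOf", "collection-1"),
                   ("isSiblingOf", "artefact-1")],
}

# Both TYPEs point from a lower-level record to a higher-level one.
UPWARD = {"isChildOf", "isMemberOf"}


def children_index(relations):
    """Derive parent -> children links without storing isParentOf."""
    index = defaultdict(list)
    for ident, links in relations.items():
        for rel_type, target in links:
            if rel_type in UPWARD:
                index[target].append(ident)
    return dict(index)


def ancestors(ident, relations):
    """Follow upward relations from a record to its top-level record."""
    chain, seen, current = [], {ident}, ident
    while True:
        parents = [t for r, t in relations.get(current, []) if r in UPWARD]
        # Stop at the top of the hierarchy, or if the links form a loop.
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)


print(children_index(RELATIONS))
print(ancestors("artefact-2", RELATIONS))
```

The sketch follows only the first upward link from each record; a record with several parents would need a fuller traversal.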

4.14 DC.coverage

Current Dublin Core definition:

The spatial locations and temporal durations characteristic of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

Proposed AHDS definition:

The spatial and temporal extent(s) pertaining to the resource. In both cases, COVERAGE relates to the content of the resource, rather than to its collection or management. Likely COVERAGEs include the spatial location [whether a grid reference, place name (e.g. Skara Brae), or more ephemeral locator] and temporal period [whether a date, date range, or period label (e.g. Neolithic)] of the Skara Brae village and exclude the country-of-origin of the film, The English Patient.

Notions of spatial and temporal location are fundamental to much of the arts and humanities and, as such, this element has a core role to play for several AHDS Service Providers. To others, such as the Performing Arts Data Service, spatio-temporal concepts are more localised to such details as the country of origin and running time of a film, and these are better handled outwith this element.

The implementation of DC.coverage outlined here has evolved directly from the work of the wider Dublin Core community's Coverage Working Party, with which the AHDS was involved. As that working party has not yet been wound up, the definition may evolve further as experience is gained.

It is vitally important to retain the possibility of defining both spatial and temporal coverage in a variety of fashions, including one or more co-ordinate systems, place names, numeric dates, and temporal periods.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

AAT - Art and Architecture Thesaurus (Getty Information Institute 1997a), which includes period and cultural labels.

DD - Latitude and Longitude expressed in decimal degrees, in the form DD.XXXX, where XXXX represents decimal portions of a degree. The number is preceded by a minus sign (-) for locations south of the equator or west of Greenwich.

DMS - Latitude and Longitude expressed in degrees, minutes and seconds, in the form DD-MM-SSX, where X is a direction north or south of the Equator and east or west of Greenwich.

ISO8601 - International Organisation for Standardisation's standard 8601, detailing the expression of dates and times.

OS10K - Ordnance Survey of Great Britain 1:10,000 scale map sheet code. For those resources lacking more detailed locational information, the Ordnance Survey's unique map codes will place any feature within a 5 x 5 kilometre box.

NGR - Ordnance Survey of Great Britain National Grid Reference. A full 12-figure (or 13 in northern Scotland) reference based upon the Ordnance Survey grid. References should be fully numeric, with the two-letter 100km square codes converted to leading numbers on the reference, and take the form xxxxxx[.xx] [y]yyyyyy[.yy], where [.xx] and [.yy] allow for the optional expression of a centimetre reference, and [y] is used for locations in the far north of Scotland more than 999999 metres north of the grid origin. Where full metre precision is not available for some reason, zeroes should be added to the end of both easting and northing to create a reference of the correct length. Units of measure for x, y and z are metres.

TGN - Thesaurus of Geographic Names (Harpring 1997).

UTMXX - Universal Transverse Mercator, with XX specifying the appropriate UTM zone. Units of measure are metres.
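Two of the SCHEMEs above imply mechanical conversions: turning a lettered National Grid Reference into the fully numeric, zero-padded form required by NGR, and turning a DMS value into the decimal degrees used by DD. The Python sketch below illustrates both. The function names are hypothetical, the letter arithmetic is the commonly used conversion for the two-letter 100 km squares rather than anything specified in this document, and the sample references are illustrative values only.

```python
import re


def grid_letters_to_100km(letters):
    """Return the (east, north) index of a two-letter 100 km grid square."""
    letters = letters.upper()
    if "I" in letters:
        raise ValueError("the letter I is not used in grid squares")
    first, second = (ord(c) - ord("A") for c in letters)
    # I is skipped in the lettering, so later letters shift down by one.
    if first > 7:
        first -= 1
    if second > 7:
        second -= 1
    east = ((first - 2) % 5) * 5 + (second % 5)
    north = (19 - (first // 5) * 5) - (second // 5)
    return east, north


def numeric_ngr(ref):
    """Convert e.g. 'HY 231 187' to a fully numeric metre reference."""
    match = re.fullmatch(r"([A-Za-z]{2})(\d{0,10})", ref.replace(" ", ""))
    if not match or len(match.group(2)) % 2:
        raise ValueError(f"not a grid reference: {ref!r}")
    letters, digits = match.groups()
    half = len(digits) // 2
    east100, north100 = grid_letters_to_100km(letters)
    # Pad each half with trailing zeroes to reach metre resolution.
    easting = east100 * 100_000 + int(digits[:half].ljust(5, "0"))
    northing = north100 * 100_000 + int(digits[half:].ljust(5, "0"))
    return f"{easting:06d} {northing:06d}"


def dms_to_dd(value):
    """Convert 'DD-MM-SSX' to decimal degrees, negative for south or west."""
    match = re.fullmatch(r"(\d{1,3})-(\d{1,2})-(\d{1,2})([NSEW])", value)
    if not match:
        raise ValueError(f"not a DMS value: {value!r}")
    degrees, minutes, seconds = (int(g) for g in match.groups()[:3])
    decimal = degrees + minutes / 60 + seconds / 3600
    if match.group(4) in "SW":
        decimal = -decimal
    return round(decimal, 4)  # four decimal places, as in DD.XXXX


print(numeric_ngr("HY 231 187"))
print(dms_to_dd("59-02-55N"), dms_to_dd("003-20-30W"))
```

A reference in the far north of Scotland yields a seven-digit northing, which corresponds to the optional [y] in the NGR pattern.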

AHDS-wide TYPEs

TYPEs useful to an AHDS-wide interpretation of this element provisionally include:

x - The x co-ordinate, or 'easting', element of a grid reference.

min - The minimum value of x.

max - The maximum value of x.

y - The y co-ordinate, or 'northing', element of a grid reference.

min - The minimum value of y.

max - The maximum value of y.

z - The z co-ordinate, or elevation.

min - The minimum value of z.

max - The maximum value of z.

t - A temporal descriptor.

min - The minimum value of t.

max - The maximum value of t.

point - A single point, expressed as a pair of x and y co-ordinates of form xxxxxx yyyyyy.

line - A linear feature, expressed as a string of x and y co-ordinates of form xxxxxx yyyyyy xxxxxx yyyyyy ...

polygon - A polygon, expressed as a string of x and y co-ordinates where the first x,y pair is identical to the last.

exclude - Used in conjunction with include to define 'holes' or gaps in a distribution or feature.

include - Used in conjunction with exclude to define areas where a particular feature or distribution is to be found.

placeName - The name of a place, preferably drawn from a controlled SCHEME such as those outlined above.

This subtype may be further refined, for example, to allow DC.coverage.placeName.authority = 'name of a Local Authority' or even DC.coverage.placeName.authority.1974.region = 'name of a post-1974 Scottish regional council'.

As can be seen, a great deal of precision can be offered where necessary, but the detail can be removed where less capable search engines are used or where it is less relevant. For example, the highly precise DC.coverage.placeName.authority.1974.region = Strathclyde can be used and understood by search systems designed to handle such detail, allowing both engine and user to know that 'Strathclyde' is a post-1974 Scottish regional council. Where such detail is either not required or not understood, 'authority.1974.region' may be removed and both user and engine will still know that 'Strathclyde' is a place.

periodName - The name of a temporal period, preferably drawn from a controlled SCHEME such as those outlined above.

locationalPrecision - A measure of the precision with which x and y co-ordinates have been recorded, based upon the same unit of measure as used for the co-ordinates themselves. Where one co-ordinate is less precise than the other, the coarsest is recorded.

Rather than recording exact values (e.g. 23.5), ranges might more usefully be given of the form 1-10 (metres) = locationalPrecision 10 [i.e. precision is better than 10], 10-100 (metres) = locationalPrecision 100, 100-1000 (metres) = locationalPrecision 1000, etc.
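Several of the TYPEs above lend themselves to simple checks: a polygon must close on its first co-ordinate pair, minimum and maximum values can be derived from a co-ordinate string, and a measured precision can be rounded up to one of the suggested locationalPrecision classes. The Python sketch below illustrates these under stated assumptions. All function names are hypothetical, the dotted keys such as `x.min` are an assumed way of writing the min and max sub-TYPEs, and mapping a value of exactly 10 to class 10 is a choice made here because the ranges in the text share their boundaries.

```python
def parse_pairs(text):
    """Split 'x y x y ...' into a list of (x, y) tuples."""
    numbers = [float(n) for n in text.split()]
    if len(numbers) % 2:
        raise ValueError("co-ordinates must come in x y pairs")
    return list(zip(numbers[0::2], numbers[1::2]))


def is_closed_polygon(pairs):
    """A polygon repeats its first x,y pair as its last pair."""
    return len(pairs) >= 4 and pairs[0] == pairs[-1]


def extent(pairs):
    """Derive minimum and maximum x and y values from a feature."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return {"x.min": min(xs), "x.max": max(xs),
            "y.min": min(ys), "y.max": max(ys)}


def precision_class(metres):
    """Round a precision up to the next class: 10, 100, 1000, ..."""
    bucket = 10  # assumption: the finest class recorded is 10
    while metres > bucket:
        bucket *= 10
    return bucket


square = parse_pairs("323100 1018700 323200 1018700 "
                     "323200 1018800 323100 1018700")
print(is_closed_polygon(square))
print(extent(square))
print(precision_class(23.5))
```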

4.15 DC.rights

Current Dublin Core definition:

The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made by users if such a field is empty or not present.

Proposed AHDS definition:

This element is intended to provide information about rights management issues (such as statements of copyright) pertaining to the resource. Detailed statements of rights are expressed by means of a link (a URL or other suitable URI as appropriate) to the relevant details, but scope is also available for the description of simple classes of rights within the element itself. Although it should always be present, no assumptions should be made by users if such a field is empty or not present.

Under AHDS usage, this element would always contain a link to a general statement of AHDS (or a specific Service Provider's) rights with respect to the resource, as well as (optional) links to similar statements from the data creator and/or depositor, where relevant.

As well as these links to legal statements of control, a simple coding scheme is also being developed whereby users can quickly discover both whether they can access data at all and what they will be allowed to do with it once downloaded. Such a coding scheme need not cover the intricacies of individual data use conditions, but rather is intended as a further aspect of initial resource discovery in that it helps the user to assess a resource's fitness for purpose; the data may appear perfect for their needs, but if they are not allowed access to the resource, there is no point in them proceeding further.

AHDS-wide SCHEMEs

SCHEMEs useful to an AHDS-wide interpretation of this element provisionally include:

URL - A Uniform Resource Locator (URL) for rights management details pertaining to the resource.

AHDS - A basic expression of usage restrictions for the resource. An AHDS code is required for all resources, along with a URL pointing to a generic statement of AHDS rights and, if relevant, a URL for more detailed resource-specific rights definitions. The exact terms and definitions await completion of the AHDS-wide Rights Management Framework, and those offered below will be modified accordingly.

free - The resource may be freely used after user registration, providing that adequate citation is included with any use. See the World Wide Web page providing a statement of rights for this resource to discover any special wordings for the citation, or other conditions.

notProfit - The resource may only be utilised for legitimate not-for-profit purposes. See the World Wide Web page providing a statement of rights for this resource to discover the details of allowable usage and citation.

education - The resource may only be used by members of educational establishments for legitimate not-for-profit purposes. See the World Wide Web page providing a statement of rights for this resource to discover the details of allowable usage and citation.

UK_HE - The resource may only be used by members of the UK's Higher Education community. See the World Wide Web page providing a statement of rights for this resource to discover the details of allowable usage and citation.

restricted - Use of the resource is restricted in some other fashion. See the World Wide Web page providing a statement of rights for this resource to discover the details of allowable usage and citation.
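The purpose of these codes is to let a user judge quickly whether a resource is worth pursuing. A first-pass filter along those lines might look like the Python sketch below. It is illustrative only: the `User` attributes and the `may_proceed` function are hypothetical, the codes themselves are provisional pending the AHDS-wide Rights Management Framework, and the linked rights statement remains the authority on what is actually permitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical profile of someone searching the catalogue."""
    registered: bool = False
    not_for_profit: bool = False
    educational: bool = False
    uk_he: bool = False


def may_proceed(code: str, user: User) -> Optional[bool]:
    """First-pass reading of an AHDS rights code.

    Returns True or False where the code alone settles the question, and
    None where the rights statement must be consulted.
    """
    if code == "free":
        return user.registered
    if code == "notProfit":
        return user.not_for_profit
    if code == "education":
        return user.educational and user.not_for_profit
    if code == "UK_HE":
        return user.uk_he
    if code == "restricted":
        return None  # the code gives no detail; read the rights statement
    raise ValueError(f"unknown rights code: {code!r}")


student = User(registered=True, not_for_profit=True, educational=True)
print(may_proceed("education", student))
print(may_proceed("UK_HE", student))
print(may_proceed("restricted", student))
```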

AHDS-wide TYPEs

None at present.

5 Dublin Core elements required in all AHDS records

The following elements are required to be present in some form in all records prepared in line with this implementation of the Dublin Core. Such a requirement deviates from the Dublin Core's precept that all elements remain optional (Weibel, this volume), but is felt to be necessary if efforts in individual subject, media, and curatorial groups are to form a usefully coherent interoperable whole.

DC.title
An entry for DC.title.main is required for all resources. Where a sub-TYPE (main) is not specified, it is assumed to be the default.

DC.creator
Several elements of information are required to be completed within the DC.creator element. These include either DC.creator.personalName or DC.creator.corporateName, and entries for the roles of both contact and metadata creator (DC.creator.role = contact and DC.creator.role = metadata).

DC.date
The date on which a resource is accessioned into the catalogue (DC.date.accessioned) is required for all resources.

DC.format
For digital resources, DC.format.fileType is required.

DC.identifier
An identifier for the resource is required to be drawn from an external identification SCHEME, as well as an internal identification number in DC.identifier.ads/hds/ota/pads/vads.

DC.source
All resources must include information on the catalogue provider making access to them possible. In the case of AHDS, this information should use the SP SCHEME.

DC.relation
All resources must include a link to a metadata record describing the Service Provider of whose collection they are a part. This information should be recorded using the internal numbering SCHEME of the relevant Service Provider such as, for the AHDS, ADS, HDS, etc.

DC.rights
All resources must include links to rights management information, both for the collection as a whole and for the resource itself.
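The requirements listed above could be checked mechanically before a record enters a catalogue. The Python sketch below shows one possible shape for such a check. The `Entry` structure, the function name and the modelling of a creator role as a separate entry are all assumptions made for illustration, not part of the AHDS specification; the rights check is also simplified to require at least one URL.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One qualified element occurrence (hypothetical representation)."""
    element: str                   # e.g. "title", "creator", "date"
    value: str
    dc_type: Optional[str] = None  # TYPE qualifier, e.g. "accessioned"
    scheme: Optional[str] = None   # SCHEME qualifier, e.g. "ADS", "URL"


INTERNAL_SCHEMES = {"ADS", "HDS", "OTA", "PADS", "VADS"}
NAME_TYPES = ("personalName", "corporateName")


def missing_requirements(entries, digital=True):
    """Return labels for the required items absent from a record."""

    def has(element, test):
        return any(e.element == element and test(e) for e in entries)

    checks = [
        # main is the default TYPE for DC.title
        ("DC.title.main",
         has("title", lambda e: e.dc_type in (None, "main"))),
        ("DC.creator name",
         has("creator", lambda e: e.dc_type in NAME_TYPES)),
        ("DC.creator.role = contact",
         has("creator", lambda e: e.dc_type == "role" and e.value == "contact")),
        ("DC.creator.role = metadata",
         has("creator", lambda e: e.dc_type == "role" and e.value == "metadata")),
        ("DC.date.accessioned",
         has("date", lambda e: e.dc_type == "accessioned")),
        ("DC.identifier (internal)",
         has("identifier", lambda e: e.scheme in INTERNAL_SCHEMES)),
        ("DC.identifier (external)",
         has("identifier",
             lambda e: e.scheme is not None and e.scheme not in INTERNAL_SCHEMES)),
        ("DC.source (SP)",
         has("source", lambda e: e.scheme == "SP")),
        ("DC.relation (Service Provider)",
         has("relation", lambda e: e.scheme in INTERNAL_SCHEMES)),
        # simplification: only the presence of one rights URL is tested
        ("DC.rights (URL)",
         has("rights", lambda e: e.scheme == "URL")),
    ]
    if digital:
        # fileType is the default TYPE for DC.format
        checks.append(("DC.format.fileType",
                       has("format", lambda e: e.dc_type in (None, "fileType"))))
    return [label for label, present in checks if not present]


print(missing_requirements([Entry("title", "An example resource")]))
```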

6 Fitting it all together

6.1 Element repeatability

As with the wider implementation of Dublin Core internationally, all elements of the Core are repeatable, and may be utilised as often as necessary in order to record the relevant information (Weibel, this volume).

Where a string of keywords, for example, is to be recorded, the keywords may be stored in a single occurrence of the element so long as they are drawn from the same SCHEME. A keyword string comprising terms from multiple SCHEMEs requires the element to be repeated for each SCHEME. For example, five keywords for the DC.subject element all drawn from the Art and Architecture Thesaurus may be recorded in a single entry for DC.subject. Were one of the terms drawn from the Library of Congress Subject Headings, however, it would require a separate DC.subject entry.
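The grouping rule described above (one occurrence of the element per SCHEME) can be sketched in a few lines of Python. The function name and the sample terms are hypothetical; the terms have not been checked against either thesaurus and serve only to show the grouping.

```python
from collections import defaultdict


def subject_entries(keywords):
    """Group (term, scheme) pairs into one DC.subject entry per SCHEME."""
    grouped = defaultdict(list)
    for term, scheme in keywords:
        grouped[scheme].append(term)
    # One (scheme, terms) pair per required occurrence of the element.
    return list(grouped.items())


# Illustrative terms only, not verified against AAT or LCSH.
keywords = [
    ("villages", "AAT"),
    ("houses", "AAT"),
    ("hearths", "AAT"),
    ("Neolithic period", "LCSH"),
]

for scheme, terms in subject_entries(keywords):
    print(scheme, terms)
```

Here the three terms sharing a SCHEME collapse into a single entry, while the fourth term forces a second occurrence of DC.subject.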

6.2 The Warwick Framework

The Warwick Framework (Lagoze et al. 1996) represents a powerful conceptual model for handling both different levels of metadata and the relationship between resource and metadata. The AHDS will continue to observe - and hopefully participate actively in - work in this area until the concepts reach fruition. Until such time, a crude implementation of part of the concept is facilitated by this document's recommendations for the element, DC.relation.

6.3 Monitoring and evaluation

As stated at the beginning of this chapter, the AHDS implementation of Dublin Core espoused here is a step in an evolutionary process towards an effective resource discovery system. As such, the guidelines laid out above will be implemented across the AHDS and monitored by a group drawn from within and without the AHDS in order to evaluate their effectiveness. The model will therefore change with time and experience, with the latest implementation documentation always available on the AHDS World Wide Web site. Comments on this implementation are welcome, both now and in the future, and should be addressed to info@ahds.ac.uk.

A structure such as that recommended, above, is thought to be capable of storing the core elements of metadata required for resource discovery of diverse distributed resources such as those collected by agencies like the AHDS. The structure in itself is effectively implementation-neutral, and equally suitable for insertion into web-based HTML resources (using the <META> tag) or storage apart from the resource in some database configuration. In the next chapter, Greenstein and Murray explore the particular problems faced in implementation across the distributed Service Providers of the Arts and Humanities Data Service, and offer the solution currently under development as a potential model for wider adoption.


Send comments or questions to info@ahds.ac.uk
Last modified: Monday, 17-Nov-97 16:52:01 GMT by D. Greenstein
URL: http://www.ahds.ac.uk/public/arlist.html


This page was originally part of the Arts and Humanities Data Service (AHDS) Website: http://ahds.ac.uk/public/metadata/disc_05.html
Rescued (courtesy of the Internet Archive) and migrated to the UKOLN Website: 08-Apr-2011; Last updated: 06-May-2011.
The content is identical, but changes have been made to the HTML in an attempt to make it validate, and some links have been updated or deactivated.
