A review of metadata: a survey of current resource description formats
Work Package 3 of Telematics for Research project DESIRE (RE 1004)
The Inter-university Consortium for Political and Social Research (ICPSR) established a committee in May 1995 to develop a structured standard to describe social science data sets. The committee was a response to a perceived need amongst the social science archive community for an international codebook standard (a codebook generally contains information on the structure, contents, and layout of a datafile or data set).
Documentation for the proposed SGML DTD (Document Type Definition) and the content of the codebook standard can be found at <URL:http://www.lib.umich.edu/codebook.html>.
The standard is still being formulated. The committee will meet in October 1996 to agree on a final draft, with the intention that implementations will begin before the end of the year.
The ICPSR is an international organisation with membership from 325 colleges and universities in North America and several hundred institutional members in Australia, Denmark, France, Germany, Great Britain, Hungary, Israel, the Netherlands, Norway, South Africa and Sweden. The codebook committee was established to be representative of all the archives and includes a representative from CESSDA (Council of European Social Science Data Archives), as well as representatives from Canada, Denmark, Norway and Germany. The elements for the codebook were chosen by reviewing a series of guidelines and standards in use by the social science survey, research, archive, and technical communities. The lists below include some of the materials that were examined:
Guidelines that prescribe what the codebook itself should contain (content standards):
Standards that define how to describe the study:
Standards that establish rules for producing records for cataloguing:
Descriptions of codebook elements produced as a by-product of computerised interviewing software:
Standards that establish rules for tagging the contents of the codebook text:
The standard is still in the development phase, but the indications are that the initiative has wide support amongst the social science data archives. The ICPSR also hope that data producers and granting agencies will adopt and support the standard.
There are 5 main sections in the proposed structure:
Each of the 5 main sections contains further sub-sections and elements.
The basic bibliographic elements of the data set are described in section 2 - Study description under the sub-section Citation:
The description of subject is dealt with in section 2 - Study description under the sub-section Study scope:
The format of the data set is dealt with in section 3 - Data files description:
These are provided for in section 2 - Study description under the sub-section Citation:
All administrative information is provided in section 1 - Codebook header. Sub-sections here include:
The source of the data set is provided in section 2 - Study description under the sub-section Citation. Elements include:
This information is provided in section 2 - Study description under the sub-section Data access:
An SGML DTD has been proposed. Codebooks encoded into SGML could also be used for the production of data definition statements for use by statistical analysis software such as SAS or SPSS. There is also a proposal to produce a TEI compliant base tag set.
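The idea of deriving data definition statements from an encoded codebook can be illustrated with a minimal sketch. It rests on several assumptions: the element and attribute names (`codebook`, `variable`, `name`, `start`, `end`, `label`) are invented for illustration and are not taken from the proposed DTD; a well-formed XML-style fragment stands in for SGML so that a standard parser can read it; and the generated SPSS-style `DATA LIST` and `VARIABLE LABELS` commands show only the general shape such output might take, not the committee's intended tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are invented for
# illustration and are NOT taken from the proposed codebook DTD.
CODEBOOK = """
<codebook>
  <variable name="ID" start="1" end="4" label="Respondent identifier"/>
  <variable name="AGE" start="5" end="6" label="Age in years"/>
  <variable name="SEX" start="7" end="7" label="Sex of respondent"/>
</codebook>
"""


def data_definition(markup: str, data_file: str) -> str:
    """Build SPSS-style DATA LIST and VARIABLE LABELS commands from the markup."""
    root = ET.fromstring(markup)
    columns = []
    labels = []
    for var in root.iter("variable"):
        name = var.get("name")
        start, end = int(var.get("start")), int(var.get("end"))
        # A single-column field is written as "7", a wider one as "5-6".
        span = str(start) if start == end else f"{start}-{end}"
        columns.append(f"{name} {span}")
        # Double any apostrophe so the label stays a valid quoted string.
        label = var.get("label").replace("'", "''")
        labels.append(f"{name} '{label}'")
    lines = [
        f"DATA LIST FILE='{data_file}' FIXED",
        "  / " + " ".join(columns) + ".",
        "VARIABLE LABELS",
        "  " + "\n  /".join(labels) + ".",
    ]
    return "\n".join(lines)


print(data_definition(CODEBOOK, "survey.dat"))
```

Because the column positions and labels live in the codebook markup, the same description could in principle drive equivalent statements for SAS or any other package by changing only the output templates.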
Details of language can be found in section 2 - Study description:
There are fields for citing bibliographic information about and/or links to related materials and studies.
Full - provides a very rich and comprehensive description of data sets.
There are no specified protocols associated with this format as yet, but the committee are looking at the possibility of using Z39.50.
This is a proposed standard. The developers have applied the DTD to some sample codebooks, but these are not in use as yet.