4th DELOS Summer School on Preservation in Digital Libraries, Grand Hotel Continental, Tirrénia, Pisa, Italy, 8-13 June 2008

Thursday, 12 June, 2008: Current and emerging scientific data curation practices

Session led by: Michael Day, UKOLN, University of Bath



Exercise

Pick two or three of the following brief scenarios to help explore the requirements of a research team in 2028 trying to make use of a curated dataset. All of the scenarios follow broadly the same pattern: a research team finds a potentially useful data resource, and they need to understand certain aspects of it before they can start integrating it into their research workflow. No specific knowledge of the research domains is needed; just think about what the most appropriate "significant properties" might be from the point of view of that particular research team.

Given more time, you could do this better using tools like PLATO, but for the purpose of this exercise you need only evaluate the relative importance of the following high-level categories (mostly taken from Andrew Wilson's Significant Properties Report, v. 2 (AHDS, 2007), p. 8):

For each of these categories, you should first discuss what might be essential for the 2028 research project. After your discussion, you should rank the categories by giving each a score out of ten. So if, for example, you think that Appearance (look and feel) is really quite important for Scenario 1, you could give it nine or ten.

Scenario 1: A research project in 2028 is trying to explore how the first-generation Internet was used by European political parties in the 1990s to promote citizen participation in policy formation. The investigators know that a large amount of Web material from this period is held by an organisation called the Internet Archive, and they have begun to use data mining tools to explore the extent of its holdings. What will they need to know about the collection in order to do their work properly?

Scenario 2: Art curators in 2028 are trying to put together an exhibition of digital art in a public gallery. They have found that a university art department retains a collection of digital art resources (chiefly multimedia) produced by its undergraduate students between 2000 and 2005, some of whom have gone on to become extremely important figures in the art establishment. When evaluating the collection for use in the exhibition, what would they consider to be the most important object characteristics?

Scenario 3: Healthcare researchers in 2028 are trying to trace the historical incidence of certain lung abnormalities and have access to a massive database of medical images (X-rays) that they intend to analyse with the most up-to-date content-based image retrieval techniques. The database is made up of imaging output from more than one hospital, and the researchers are worried that certain parameters essential to their research (e.g., the age and sex of the patient, imaging dates, etc.) may be missing. What else do they need to know about the database before they can start running their search algorithms?

Scenario 4: A research project in 2028 is trying to find links between climate records and biological species diversity in south-west England. The principal investigator has found a promising dataset of geographically relevant biodiversity information in a local history museum. What more does she need to know about this dataset before her team can try to integrate it (and others like it) with historical climate models?

[The presentation slides from this session are linked from: http://www.ukoln.ac.uk/preservation/presentations/2008/delos-summer-school/]
