
From DigiRepWiki

Linking data and papers

This page is a forum for discussing the best way forward in linking data and papers, and what JISC might usefully fund in this area in 2007-08 (limited funds available!).


Relevant projects / people (can be expanded)

  • Neil Jacobs - JISC
  • Graham Pryor - StORe
  • John MacColl - StORe
  • Sam Pepler - CLADDIER
  • Simon Coles - R4L
  • Alan Tonge - SPECTRa
  • someone from DCC

Notes from a telcon with StORe and CLADDIER, 20070523

Following up StORe

  • StORe middleware should be trialled in another domain beyond social science (the UKDA was used as the source repository for the initial pilot demonstrator).
  • It should also be tested with a publisher (output) repository, reflecting the interest researchers expressed in the StORe survey in links from published papers to source data. A learned society publisher could prove more amenable, and easier to approach, than a large, purely commercial publishing house.
    • Suggested second domain: chemistry/crystallography, where headway has already been made in linking publications and data and where the key presence is a learned society publisher.
  • Since the middleware functions within federations of repositories, it would also be beneficial to trial it where established federations already exist.
    • Suggestions: IRI Scotland; eCrystals federation (eBank phase 3); OJIMS (but without compromising that project)
  • Some join-up with data curation principles is required, taking the opportunity to provide researchers with a basic guide to the essentials of data curation, complementary to the introduction of a bi-directional link. This would explain how and why researchers should ensure their research data are curated. The obvious synergies with DCC R&R (especially SCARP) suggest some kind of joint activity or liaison.


Following up CLADDIER

  • OJIMS project, exploring the feasibility of a data journal with RMS
  • Having datasets listed alongside publications in the NORA gateway
  • Ping architecture - still in development - see Brian Matthews

Other points

  • The OAI-ORE work may define standards for link syntax
  • There seem to be a number of ways of linking datasets and papers, and these are not mutually exclusive:
    • StORe middleware, as a relatively independent Web 2.0 layer introduced when researchers deposit datasets
    • The OJIMS approach of publishing datasets, which renders them citeable so that existing citation tools, DOIs, etc. can be used
    • Enhancing researchers' authoring environments to ensure their papers contain well-structured reference lists, including references to datasets
    • Perhaps using the CLADDIER ping architecture
  • We don't yet know which approach(es) will be acceptable within researchers' workflows. We need to experiment, but what form should the experiments take?

Discussion

please add your thoughts...

From NJ- It occurs to me that there are implications for Federated Access Management (Shibboleth) in what you might call the "StORe/CLADDIER programme of work" outlined above and to be discussed on 28/9. Datasets are often not freely and openly available in the way that (some) research papers are. Therefore, in some scenarios, you'd need an access control step in going from a paper to its dataset. Does anyone have any thoughts on that?

...........

At the UKDA we restrict data downloads to registered users. Registration involves authentication, at present via Athens but later this year via Shibboleth. The additional registration information is collected for two main reasons: a) to inform funders and b) to inform depositors. Some depositors only deposit data at the UKDA on the understanding that the precise uses of their data are recorded.

The StORe demonstrator is best viewed as a front end to an IR used as a collaborative working space for a researcher's project. Protection in the underlying Fedora system is based on private and public status for collections (the StORe working space), folders, items and files. The metadata containing the links between data and publications is associated with elements at all four of these levels. Hence, when data are moved to a source repository the links to publications are maintained, and when publications are moved to an output repository the links to the data are also maintained.
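Why links survive a move can be illustrated with a small model. This is a minimal Python sketch, not StORe's Fedora implementation: the names `Node`, `move_to` and `is_public` are hypothetical, and the rule that an element is public only if all its ancestors are also public is an assumption about how the four levels interact.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class Node:
    """A collection, folder, item or file in a StORe-style working space (hypothetical model)."""
    name: str
    level: str                                        # "collection", "folder", "item" or "file"
    status: Status = Status.PRIVATE
    links: List[str] = field(default_factory=list)    # identifiers of linked papers or datasets
    children: List["Node"] = field(default_factory=list)
    repository: str = "working-space"


def move_to(node: Node, repository: str) -> Node:
    """Move a node and all its descendants to another repository.

    Link metadata is stored on each node itself, so it travels with the node unchanged.
    """
    node.repository = repository
    for child in node.children:
        move_to(child, repository)
    return node


def is_public(path: List[Node]) -> bool:
    """Assumed rule: an element is public only if it and every ancestor on its path are public."""
    return all(n.status is Status.PUBLIC for n in path)
```

Attaching links to every level, rather than keeping them in a separate index, is what lets a dataset carry its publication links into a source repository (and vice versa) without any extra bookkeeping.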

The StORe extension will take the system closer to a prototype, adding authentication through Athens and Shibboleth (rather than allocated login names and passwords), contextual help and increased functionality. It will also investigate ways in which the middleware can be separated from the Fedora IR system so that it can work with other systems such as DSpace, EPrints, etc.

In-house, the UKDA is developing a self-deposit archive based on the StORe software. It links via OAI to ESRC Society Today and can therefore automatically link the outputs recorded there to the data submitted to the UKDA as part of the "End of Award" process. This also means that data not accepted by the UKDA will not be lost, and that the deposit process for data that are accepted is speeded up and made less arduous.
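The OAI-based linking described above might work roughly as follows. This is a sketch under stated assumptions: it uses the standard OAI-PMH `ListRecords` verb with Dublin Core (`oai_dc`) metadata, but the convention of carrying an award number in `dc:relation` with an `award:` prefix is hypothetical, as are the function names; the actual matching between the UKDA archive and ESRC Society Today may differ.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Standard OAI-PMH 2.0 and Dublin Core element namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository's base URL."""
    query = urllib.parse.urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def outputs_by_award(oai_xml: bytes) -> Dict[str, List[str]]:
    """Group harvested output identifiers by award number.

    Hypothetical convention: each record carries its award number in a
    dc:relation element of the form 'award:<number>'.
    """
    root = ET.fromstring(oai_xml)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(".//dc:identifier", namespaces=NS)
        for rel in record.iterfind(".//dc:relation", NS):
            if rel.text and rel.text.startswith("award:"):
                grouped[rel.text[len("award:"):]].append(identifier)
    return dict(grouped)


def link_outputs_to_data(outputs: Dict[str, List[str]], deposits: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each deposited dataset (keyed by award) to the outputs recorded under the same award."""
    return {dataset: outputs.get(award, []) for award, dataset in deposits.items()}
```

Joining on a shared award identifier means neither side has to record the other's identifiers at deposit time; the link is derived when the output records are harvested.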

Ken Miller 14/09/07

...........



Also from NJ- I contacted Paula Callan at QUT in Australia about a paper she recently gave on the "researcher / librarian nexus". Unfortunately her presentation is not written up, but her reply is perhaps useful for this discussion:

"My feeling is that, while our institutional repository could never become the main storage location for all our research data, it should be included in the range of storage options on offer. It is able to accommodate data files, provided they are completed and not huge in size. As the metadata will often be quite complex, it may have to be gathered 'by consultation' rather than by simple self-deposit.

As our research data will be stored in many locations (local and external), what we really need is a system for tracking where it is. The institutional repository is the obvious candidate for this task, as it could also store all the relevant metadata (provenance, descriptive, rights, preservation etc.). It could link the data (wherever it is) to related publications stored in the eprint repository. International, national and disciplinary data discovery services could harvest the metadata from the IR. This is a personal vision only at this stage  :-)

Here at QUT, we are about to commence a project that will progress the development of a University Data Management Policy. While the policy will probably spell out the principles underlying best practice in data management, the supporting materials will need to include tools such as templates for data management plans, information about storage options, preservation, open access, copyright, IP and privacy, and exemplar clauses about data that could be included in collaborative research agreements etc. etc.

We will be commencing the process by surveying and interviewing many of our researchers about their current practices in managing their research data. We will also be asking what sort of assistance, information and training they would be interested in.

The OAK law Report on eResearch is full of useful information and will certainly be a resource we will draw upon. http://www.oaklaw.qut.edu.au/node/33 "


Notes from meeting, HEFCE Offices, 20070928. Brian Matthews, John MacColl, Neil Jacobs

StORe basically manages the deposit process for both data and papers, establishing links between them as they are combined in 'collections'. Some use of OAI-ORE is likely in expressing these collections, once the ORE specifications are available. StORe is therefore concerned with the changes in researcher behaviour associated with deposit. StORe takes a science-process point of view.

CLADDIER is complementary, inasmuch as its TrackBack tool enables repositories, acting as a P2P federation, to update each other in real time on which papers / datasets cite other papers / datasets, based on information either entered manually in the record of the citing object or (eventually) parsed automatically from the reference lists in research papers. CLADDIER takes a repository point of view.
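A TrackBack ping is a small HTTP POST, so the mechanism can be sketched briefly. This Python sketch assumes CLADDIER pings follow the general TrackBack 1.2 format (form-encoded `url`, `title`, `excerpt` and `blog_name` fields, answered by an XML `<response>` whose `<error>` is `0` on success); any CLADDIER-specific extensions are not shown, and the function names and example URLs are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def build_trackback_ping(citing_url: str, title: str, excerpt: str, source_name: str) -> bytes:
    """Form-encode a ping announcing that the object at citing_url cites the target object."""
    fields = {
        "url": citing_url,          # required by TrackBack; the other fields are optional
        "title": title,
        "excerpt": excerpt,
        "blog_name": source_name,   # here: the name of the citing repository
    }
    return urllib.parse.urlencode(fields).encode("utf-8")


def parse_trackback_response(body: bytes) -> Tuple[bool, Optional[str]]:
    """Return (success, error message) from a TrackBack XML response."""
    root = ET.fromstring(body)
    error = (root.findtext("error", default="1") or "1").strip()
    return error == "0", root.findtext("message")


def send_ping(trackback_url: str, payload: bytes, timeout: float = 10.0) -> Tuple[bool, Optional[str]]:
    """POST the ping to the cited object's TrackBack endpoint and parse the reply."""
    req = urllib.request.Request(
        trackback_url,
        data=payload,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_trackback_response(resp.read())
```

Because the citing repository pushes the notification, the cited repository can record the incoming link without harvesting or parsing anything itself; this is also why the protocol needs protection against spam.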

These two tools, together with others (Citebase perhaps), might make up a toolset that enables repositories to cross-reference objects.

Taking things forward:

  • Both the StORe and CLADDIER teams are involved in the eCrystals federation project recently funded by JISC.

JISC has £100k to spend on taking forward work in this area; we discussed how to use it and agreed the following (2-3):

  • Implementing the CLADDIER TrackBack tool in a social science federation, including the UKDA and LSE's IR, and possibly ESRC Society Today. This work may involve the following work packages:
  1. further work to strengthen the TrackBack protocol for this purpose (e.g. against spam, retraction)
  2. implementing the TrackBack functionality as one or more EPrints 3.0 plugins
  3. requirements and scenarios for TrackBack within social science
    1. implementing the EPrints TrackBack at the LSE EPrints IR as a reference implementation
    2. implementing the TrackBack functionality at the UKDA
    3. discussing the approach with ESRC Society Today
  4. merging the StORe and CLADDIER approaches within the STFC process, to support data ingest (possibly of secondary data)
  5. reporting on lessons learned.

Note: LSE, the UKDA and ESRC have not yet been contacted about this work. It is also noted that this is essentially preparatory work, and that testing CLADDIER TrackBack in a social science domain will need further support beyond this period, which JISC cannot commit to providing. Brian Matthews will contact Ken Miller to arrange a meeting.

Should there be a link with SWORD on deposit APIs and Atom?

  • Implementing the StORe middleware in an STFC or associated environment. Brian Matthews will investigate the options for this and report back.
  • A report reviewing the experiences of StORe, CLADDIER, the work described above and perhaps other relevant activity (OAI-ORE?), and then describing a possible repository toolset for cross-reference (including versions of TrackBack and the StORe middleware), noting what might be useful (for particular research communities), viable, sustainable and achievable. Neil Jacobs will ask JISC to authorise a further £15 for this report, which will, however, be rolled into a single project proposal with (2-3) above.

Neil Jacobs, with modifications by Brian Matthews