OAI4 informal report

CERN workshop on Innovations in Scholarly Communication (OAI4)

20-22 October 2005

The following is my informal trip report from OAI4:

NB: Abstracts and slides for all presentations, and webcasts for many, are available from the 'agenda' section of the web site (http://oai4.web.cern.ch/OAI4/).

The event was divided into three themes: Technical (Thursday), Library and publishing (Friday), and Supporting the research process (Saturday AM).

Thursday 20th October: Tutorials and technical session

Thursday opened with four parallel tutorials. I attended 'Advocacy and IPR', convened by Morag Greig from the University of Glasgow. This tutorial covered many of the advocacy/IPR issues discussed at the DRP meeting in Glasgow (10th October), such as mandating deposit, author reactions, success factors and copyright. The tutorial involved two group sessions where we discussed specific issues. These sessions were interesting as an opportunity to speak to people from different countries and to hear about international repository work.

The remainder of the day was taken up with technical presentations. Briefly, these were the topics covered:

Herbert Van de Sompel, LANL: extending OAI-PMH for content as well as metadata, including simple and complex digital objects, using MPEG-21 DIDL.

Michael Nelson, Old Dominion University: mod-oai, an Apache software module that exposes content accessible from Apache Web servers via OAI-PMH, for web harvesting.

Simeon Warner, Cornell: OAI-rights - expressing rights information for metadata in OAI-PMH.

Stu Weibel, OCLC: a layered analysis of the characteristics of identifier systems, covering the functional, technological, policy, business and social layers. He looked at the positives and negatives of HTTP and non-HTTP URIs.

Herbert Van de Sompel: interoperability across repositories using the aDore digital library architecture; federation of heterogeneous repositories using existing tools and standards, including OpenURL technology.

Eric Morgan, Ockham Library Network: demonstrated the use of lightweight OSS tools with OAI-PMH to deliver a range of collections and services. These included alerting services, enhanced search results, a digital library service registry and a harvest-to-query service. Some of the tools mentioned were SRW, SRU and Z39.50 for searching; swish-e, plucene and zebra for indexing; and the aspell spellchecker and Princeton's WordNet synonym tool. This presentation included a live demo, a welcome respite from PowerPoint.

Stu Weibel, on behalf of Jeff Young: WikiD, a wiki that supports structured data, combining OSS open-standard protocols, an OSS database, a set of "bootstrap collections" and XSL stylesheets to render protocol responses into HTML. The OCLC Dewey browser uses WikiD.

Johan Bollen, LANL: a system to determine impact and prestige rankings on the basis of aggregated usage event data as OpenURL ContextObjects. This usage data is harvested using OAI-PMH and, when compared to the Institute for Scientific Information's Impact Factor, the resulting impact rankings correlate significantly. The scope of this usage data, though, offers a basis for a more comprehensive assessment of scholarly impact.
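The summary above does not say which statistic was used to compare the two rankings. As a rough illustration only, one common way to compare two rankings is Spearman's rank correlation. The sketch below is not the LANL system: the journal names and all numbers are invented, and it assumes no tied scores.

```python
def ranks(scores):
    """Map each name to its rank (1 = highest score). Assumes no tied scores."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(order, start=1)}


def spearman(scores_a, scores_b):
    """Spearman's rho for two score dicts with the same keys (no ties, n >= 2)."""
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    n = len(rank_a)
    d_squared = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Invented figures for four hypothetical journals, for illustration only.
usage_counts = {"J1": 900, "J2": 450, "J3": 300, "J4": 120}
impact_factor = {"J1": 3.1, "J2": 4.0, "J3": 1.2, "J4": 0.8}

rho = spearman(usage_counts, impact_factor)
print(round(rho, 2))
```

A value near 1 means the usage-based ranking and the citation-based ranking largely agree; a value near 0 means they are unrelated.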

Tim Brody, Southampton: using OpenURL for citations, demonstrating current work to use the <dc:relation> element for OpenURL citations. Tim presented three scenarios based on this to show how an OpenURL resolver can be used to link indirectly and directly to the cited resource.
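To make the idea of an OpenURL citation in <dc:relation> a little more concrete, here is a minimal sketch of how such a link might be built. It is not taken from Tim's slides: the resolver address and the citation details are hypothetical, and only the key names (url_ver, rft_val_fmt and the rft.* journal keys) come from the OpenURL 1.0 key/encoded-value format.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Hypothetical resolver address; a real one would be institution-specific.
RESOLVER = "http://resolver.example.org/openurl"


def citation_openurl(atitle, jtitle, date, volume, spage, resolver=RESOLVER):
    """Build an OpenURL 1.0 (KEV) link describing a cited journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.atitle", atitle),
        ("rft.jtitle", jtitle),
        ("rft.date", date),
        ("rft.volume", volume),
        ("rft.spage", spage),
    ]
    return resolver + "?" + urlencode(params)


def as_dc_relation(url):
    """Wrap the link in a dc:relation element, escaping '&' for XML."""
    return "<dc:relation>" + escape(url) + "</dc:relation>"


# Invented citation details, for illustration only.
url = citation_openurl("An example article", "Example Journal", "2004", "12", "34")
element = as_dc_relation(url)
print(element)
```

A harvester that finds such an element could pass the link to its own resolver (the indirect route) or pull out the rft.* values and look for the cited item itself (the direct route).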

Friday 21st October: Library and publishing session

Alma Swan from Key Perspectives Ltd looked at publishers and open access, covering the gold, the gold-ish, the greens and the going grey. She also widened her scope to repositories and data. Andrea Powell from CAB International made a case for the continuing place of secondary databases, but did admit there were logistical concerns about OA sources, including versions, permanence, tracking changes and copyright issues. Wilma Mossink talked about the JISC-SURF copyright toolkit.

For the DRP programme and repositories generally, there were sessions from Frances Shipsey (VERSIONS) and Bill Hubbard (OpenDOAR), both very good. Bill's was more general as the project is in its very early stages, but Frances talked in more depth about the VERSIONS project and her talk tied in very nicely with the focus of the day. It was also interesting to note how wide an impact the Sherpa Romeo work has had. Jessie Hey presented on the University of Southampton's repository development and Leo Waaijers talked about the DARE project and the Cream of Science initiative in the Netherlands. Susanne Dobratz and Frank Scholze talked about the DINI certification scheme, which hopes to tackle the fragmented repository landscape in Germany.

The last presentation of the day was from Marlon Domingus from the Open Access Leiden project who looked at the idea of a "collabatory" - community approaches to science and scholarship.

Saturday 22nd October: Supporting the research process

Three presentations this morning looked at different branches of science and research data. Peter Murray-Rust looked at how publication ignores core research data. For chemistry, 80% of data is never published and few publishers support the publication of semantically rich data. Some tools were demonstrated for finding and exploiting chemistry information within text, including the Oscar Java tool to parse chemical information in text and a project with Nature to mark up chemical information and to link to molecular details. He also mentioned SPECTRa and their proposed use of tools for the ingestion of chemical data into repositories. Liz Lyon talked about eResearch and eBank. Hans Pfeiffenberger talked about earth system science and its data-centric output, e.g. observational data. This data needs to be managed, but there is currently no reward for doing this. The ISO 19115 metadata standard for georeferenced data has around 1000 attributes, so is very complex. He mentioned the Pangaea digital library, which uses OAI-PMH. He also advocated the eduPerson object class specification and discussed the range of objects needing identifiers, including publications, people, data and experiments/projects.

The final presentation was by Marko Rodriguez from LANL and centred on using OAI-PMH to develop an automatic peer review mechanism, where a user submits a paper and an algorithm determines the best person to peer review that paper. A schema for encoding this information was proposed (<pr:review></pr:review>). An interesting concept.

The keynote was by Robert Terry of the Wellcome Trust. A few points from this:

  • Part of the mission of the funder is to ensure that access to the research it funds is provided. This information must also be preserved.
  • Problem in current publication model: researchers give away their IPR to publishers for free; publishers *sell* that back to the library; researchers get access to it for free. Money is leaving the research environment and no money goes to the peer reviewers.
  • "Under one roof" - keeping material together, i.e. subject-based.
  • "It's all about the data" - need to be able to get from the data to the literature and from the literature to the data [PubMed Central already does this].
  • Wellcome Trust provide funding to cover cost of OA publishing; they require electronic copies of all Trust-funded research papers in a peer-reviewed journal or in PubMed Central.
  • Is OA the death of publishing? Free access can have positive impacts, e.g. televised football and increased gates.
  • All research is funded at some level; therefore the dissemination of that research should be included in the way that research is paid for.

Overall, OAI4 was a good opportunity to gain a better insight into the scholarly publishing process, the OA movement and the role of repositories. The technical presentations showed a real enthusiasm for OAI-PMH, with much work going on to extend its use, particularly at Los Alamos. There was some reference to the lack of humanities publishing and source data in the programme - this came through in questions to the speakers, and also in the breakout group I attended on the Friday, entitled 'Our authors are central. Populating repositories and building on partnerships between libraries and researchers. Economists Online as a case in point'. The lower cost of humanities journals and the reduced need for 'current' information were cited as reasons for this, although much of the science-centred discussion could apply to humanities disciplines.

Repositories were very much central to the entire workshop, being mentioned in every presentation.

--JulieAllinson 16:43, 6 December 2005 (GMT)