Deposit API

As part of its work, the JISC Digital Repository Programme is exploring the interaction between repositories and other systems.

Background

At the recent JISC/CETIS conference [1], the Repositories Themed Workshop [2] included discussion of the need for a common remote Deposit API [3]. A summary of the discussion of the Deposit API follows:

By and large, developers are not creating repository systems and software from scratch; rather, they are considering how repositories interface with other applications within institutions and the wider information landscape. A single repository, or multiple repositories, might interact with other framework components, such as VLEs, authoring tools, packaging tools, name authority services, classification services and research systems.

There is a wide range of existing specifications and bindings relevant to a deposit service. These include the following, though there are probably others:

  • PANS
  • Etc.

It would be beneficial to reach consensus between developers as to which default standard to use for a remote Deposit API.

Resources may be deposited in a repository by both human and software agents, e.g. packaging tools that push content into repositories. The type of resource being deposited will also influence the choice of API. If the resources are complex packaged objects then the API will need to support multiple packaging standards.

In his recent briefing paper A 'service oriented' view of the JISC Information Environment [4], Andy Powell considers the set of 'service components' shown in the JISC IE architecture diagram, extrapolates a list of the 'abstract services' that they are expected to offer, and then lists a set of candidate 'service bindings' (which correspond with the protocols and standards listed in the JISC IE technical standards document).

The following is the extract from that briefing paper relevant to the 'Repository' as a service component:

Service component (service consumer)

Repository

Stores, manages and makes available content and metadata

Deposit interface

Delete interface

Search interface

Harvest interface

Obtain interface

Abstract services

This section briefly describes each of the abstract services listed above. Each abstract service is described in terms of its overall function, the intelligence it requires (i.e. what business entities does it need to know about) and its inputs and outputs:

Deposit interface

Provides an interface through which content and metadata (possibly in the form of a ‘complex object’, i.e. a package) can be deposited and initiates ingest process for local storage.

Intelligence: Data format, packaging standard.

Data in: Deposit request (content, metadata, etc.)

Data out: Deposit status (success, failure, pending, etc.) and content identifier

Note that the subsequent ingest process may include both automated and manual procedures, including format checking, editorial control, quality assurance mechanisms, etc.
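The inputs and outputs listed above for the Deposit interface can be illustrated with a minimal Python sketch. It is not taken from the briefing paper or from any repository product: the class names (DepositRequest, DepositResponse, SketchRepository), the field names and the "deposit-N" identifier scheme are all assumptions made for illustration. The sketch returns "pending" for accepted deposits on the assumption that ingest happens later and may involve manual steps.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional


class DepositStatus(Enum):
    """Deposit status values named in the abstract service description."""
    SUCCESS = "success"
    FAILURE = "failure"
    PENDING = "pending"


@dataclass
class DepositRequest:
    """Data in: content, metadata and what the repository must understand."""
    content: bytes
    metadata: Dict[str, str]
    data_format: str                 # e.g. a MIME type (assumed convention)
    packaging: Optional[str] = None  # packaging standard, if a complex object


@dataclass
class DepositResponse:
    """Data out: deposit status and, where assigned, a content identifier."""
    status: DepositStatus
    identifier: Optional[str] = None


class SketchRepository:
    """Hypothetical repository that checks a request and queues it for ingest."""

    def __init__(self, formats: Iterable[str], packagings: Iterable[str]) -> None:
        # "Intelligence": the data formats and packaging standards known here.
        self.formats = set(formats)
        self.packagings = set(packagings)
        self.ingest_queue: List[DepositRequest] = []

    def deposit(self, request: DepositRequest) -> DepositResponse:
        # Reject anything in an unknown data format.
        if request.data_format not in self.formats:
            return DepositResponse(DepositStatus.FAILURE)
        # Reject packages that use an unknown packaging standard.
        if request.packaging is not None and request.packaging not in self.packagings:
            return DepositResponse(DepositStatus.FAILURE)
        # Accept: queue for the later ingest process and hand back an identifier.
        self.ingest_queue.append(request)
        identifier = f"deposit-{len(self.ingest_queue)}"
        return DepositResponse(DepositStatus.PENDING, identifier)
```

A client, whether a person using a form or a packaging tool, would build a DepositRequest, call deposit(), and keep the returned identifier to check on the deposit later.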

Survey of Repository Software Developers

A number of repository developers were contacted in January 2006 to get feedback on their plans for a remote Deposit API within their products. The replies are summarised below. Developers were asked whether they would be willing to collaborate on defining a common interface to Deposit. All answers were positive, and the intention is to organise a meeting of developers in February 2006 to take this forward.

EPrints Les Carr

A Web Services interface to EPrints will be released in early 2006 that provides all the facilities of "remote deposit" (create a new eprint, change all the metadata, upload a document, submit the new eprint to the editor). If there is convergence on a standard for a deposit interface following the planned discussion, then this could be adopted either (a) as core functionality or (b) as a plugin library.

DSpace Jim Downing

Although DSpace does not currently offer a remote deposit API (there is a simple local deposit API), work is being done in this area by the CWSpace project at MIT/CARET (Cambridge). The best entry point for the information they have published is http://wiki.dspace.org/LightweightNetworkInterface. There is interest in investigating how far the WebDAV standard could be used to standardize repository APIs. Within DSpace there is interest in pursuing the development of repository interoperability standards within JISC, especially since repository interoperation is likely to be an important aspect of the JISC-funded SPECTRa project, which uses DSpace.

Intrallect Martin Morrey

Intrallect does not currently offer a remote Deposit API, although workflow functionality has recently been exposed through our Java API. Development of a remote Deposit API is imminent in our current release schedule and there is interest in working with other developers towards a common specification. A recommended approach is to review the existing specifications and then adopt one, or create a profile of one, that all implementers would be happy to implement.

A comprehensive set of use cases for automated deposit would be very helpful in making this decision.

Intrallect customers who are keen to use a deposit API include:

  • JORUM
  • LORE at the University of Edinburgh
  • The Spoken Word Service at Glasgow Caledonian
  • LOREnet in the Netherlands

ARNO Thomas Place

Tilburg University is working on ZiNG Update (SRW Update). Our code is integrated in the YAZ library of Index Data and has been available since version 2.1.10 of YAZ. We use YAZ as a protocol layer on top of Oracle. This gives us a machine interface for depositing (ZiNG Update) and searching (SRU/SRW) XML documents with Oracle. ARNO is also an Oracle application. Besides using OAI-PMH for collecting metadata from ARNO, it is also possible to search the metadata via SRU/SRW (and Z39.50). This last feature is only used by Tilburg University and is not yet part of the ARNO distribution. The next step is to use ZiNG/SRW Update for depositing metadata and files packaged in XML. We have included this idea in a proposal for a European project.

Fedora Sandy Payette

Fedora is interested in exploring solutions for a common deposit API. The Pathways project (Los Alamos National Laboratory, Cornell University, HP Labs) is working on common dissemination interfaces/protocols; this work is looking at the access side of things. The JISC effort is complementary since we need to understand interoperability among heterogeneous repositories at both the front end (deposit) and back end (access/dissemination).

The Fedora Management web service has SOAP-based operations to ingest digital objects in different XML wrapper formats (METS, FOXML, and in the future MPEG21-DIDL). This same web service has other SOAP-based operations to add datastream content to an object that is already in the Fedora repository. In addition, Fedora also has a separate "Directory Ingest" service that runs as a web application. This service will accept a zip file that contains a hierarchical directory of files along with a METS manifest file. The service will open the zip file and call the Fedora Management web service to ingest each file as a digital object, preserving the hierarchical directory relationships.

Since we already have a remote Deposit API, our goal is to focus on more efficient bulk loading of large archives. We have plans to enhance the Fedora Management API to be able to directly accept a compressed archive.

Fedora would be willing to work with other developers towards a common specification for a Deposit API.
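The "Directory Ingest" behaviour described in this reply (open a zip file, find the manifest, ingest each file while preserving the directory hierarchy) can be sketched with the Python standard library. This is not Fedora code and does not call the Fedora Management web service: the function name plan_ingest and the manifest file name METS.xml are assumptions, and the sketch only works out which files would be ingested and under which parent directory.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import List, Tuple

# Assumed manifest file name; the real service may expect something else.
MANIFEST_NAME = "METS.xml"


def plan_ingest(archive_bytes: bytes) -> List[Tuple[str, str]]:
    """Return sorted (file path, parent directory) pairs for files to ingest.

    The parent directory is kept so that a later ingest step could record
    the hierarchical relationships between the resulting objects.
    """
    plan: List[Tuple[str, str]] = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        if MANIFEST_NAME not in archive.namelist():
            raise ValueError("archive does not contain a manifest file")
        for info in archive.infolist():
            # Skip directory entries and the manifest itself.
            if info.is_dir() or info.filename == MANIFEST_NAME:
                continue
            path = PurePosixPath(info.filename)
            plan.append((str(path), str(path.parent)))
    return sorted(plan)
```

In a fuller implementation each pair in the plan would become one call to the repository's ingest operation, with the parent directory used to link the new object to its container.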

Greenstone Ian Witten

The Greenstone software is not repository management software so much as a general digital library system. However, it has facilities for document deposit, and some are using it as a repository system. Greenstone would be very interested in learning the outcome of your proposal for a standard Deposit API, but given our remote location in New Zealand it is probably impractical for us to be involved with meetings. We are in general very keen on adapting Greenstone to fit new and existing standards and would probably implement any standard Deposit API that was developed.

HarvestRoad John Townsend

HarvestRoad has a Java external API which can be used with Hive. HarvestRoad is also working with the OKI people at MIT to develop OKI OSID interfaces to Hive. Being a high-level service definition, OSID seems ideal for the purposes described in the Repository Reference Models report from the JISC CETIS Conference. “It would shield implementers from needing to know all the different interfaces used by different repositories (some would use their private SOAP-delivered APIs, some Java-delivered JSR-170, etc.) but the OSID sits atop those layers.”

aDORe - LANL Herbert Van de Sompel

Thanks for getting in touch! I was not aware of this effort, and it is indeed very good if we can align. I may get back to you at some point to ask you to present a few ideas about the API you envision. I hope API in the Web-service sense is meant, not in the programming language sense, because in our meeting the focus is clearly on protocol-based repository interfaces (RESTful and/or SOAP-based Web services).

OCLC Digital Archive Leah Houser

A Web Services interface for the OCLC Digital Archive will be in use for internal clients in Spring 2006. Content transfer for this service will be WebDAV based. This interface will be the 3rd ingest method for the Digital Archive, the others being traditional batch ingest based on METS and embedded packaging of web-harvested information.

Scenarios and Use Cases

We would like to build up a collection of scenarios and use cases for the Deposit API.

In order to establish the functional requirements for repositories, use cases and scenarios provide useful information on how people and systems interact with repositories. For example a single repository, or multiple repositories, might interact with VLEs, authoring tools, packaging tools, and research systems.

This DigiRep wiki is set up to support the gathering of scenarios and use cases from JISC projects. We would also welcome links to scenarios and use cases located elsewhere. Anyone wishing to submit scenarios and use cases for Deposit, or to provide links to scenarios and use cases documented elsewhere, should submit via the Scenarios and Use Cases page.

Meetings

Deposit API meeting London

JISC/UKOLN Repository Deposit meeting, 27th February 2006

Report from this meeting

Report from this meeting (as a Word document): JISC UKOLN Repository Deposit meeting (84KB, MIME type: application/msword)

Deposit API meeting Warwick

JISC Deposit API Working Group, Warwick Conference Centre, Coventry, 11-12th July 2006

Agenda
Minutes

US Interoperability Workshop

Augmenting interoperability across scholarly repositories, New York, April 20-21 2006 [5]

Informal report, 5 May 2006, Rachel Heery

A meeting sponsored and supported by Microsoft, the Andrew W. Mellon Foundation, the Coalition for Networked Information, the Digital Library Federation, and JISC.

More than 30 attendees gathered at the Mellon Foundation in New York, from a variety of backgrounds: repository software developers, content providers, digital library technical experts and representatives of funding organisations.

The meeting focused on facilitating richer cross-repository services and scholarly communication workflows across repositories.

There were a number of presentations on key aspects of functionality to enable inter-repository working. Prior to the meeting, a ‘strawman’ proposal for enabling interoperability between repositories was put together by a group including Carl Lagoze, Andy Powell, Jeremy Frumkin and myself, led by Herbert Van de Sompel.

Herbert introduced the strawman proposal by suggesting that there need to be abstract definitions of repository interfaces that can be instantiated on the basis of various technologies as time goes by.

He proposed:

  • An appropriate data model to be supported across repositories
  • Three core repository interfaces supported across repositories: Obtain, Harvest, Put
  • A Surrogate format (a representation of the digital object compliant with the data model) supported across the repository interfaces
  • Some shared infrastructure: a service registry, semantic ontologies, format registries, etc.

Carl Lagoze’s presentation on the interoperable data model proposed that full asset transfer between repositories is not always necessary, and can be undesirable because of IPR, file size, etc. The proposed data model requires expression of the lineage, identity and semantics of the digital object. A surrogate was defined as a serialized representation of a digital object according to the data model, which could be:

  • Accessed via obtain and harvest
  • Deposited via put
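The three core interfaces and the surrogate can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the class and field names (Surrogate, Repository, DictRepository, semantic_type, content_refs) do not come from the strawman proposal, and JSON is used as the serialization only for brevity. The surrogate carries identity, semantics and lineage and points to content by reference, in line with the view that full asset transfer is not always necessary.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Surrogate:
    """Serializable stand-in for a digital object (hypothetical fields)."""
    identifier: str                                         # identity
    semantic_type: str                                      # semantics
    lineage: List[str] = field(default_factory=list)        # earlier identifiers
    content_refs: List[str] = field(default_factory=list)   # links, not bytes

    def serialize(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def deserialize(cls, text: str) -> "Surrogate":
        return cls(**json.loads(text))


class Repository(ABC):
    """Abstract definitions of the three core interfaces."""

    @abstractmethod
    def obtain(self, identifier: str) -> str:
        """Return the serialized surrogate of one digital object."""

    @abstractmethod
    def harvest(self) -> Iterator[str]:
        """Yield the serialized surrogates of all digital objects."""

    @abstractmethod
    def put(self, serialized: str) -> str:
        """Accept a serialized surrogate and return its identifier."""


class DictRepository(Repository):
    """Toy in-memory instantiation of the abstract interfaces."""

    def __init__(self) -> None:
        self._store: Dict[str, Surrogate] = {}

    def obtain(self, identifier: str) -> str:
        return self._store[identifier].serialize()

    def harvest(self) -> Iterator[str]:
        for surrogate in self._store.values():
            yield surrogate.serialize()

    def put(self, serialized: str) -> str:
        surrogate = Surrogate.deserialize(serialized)
        self._store[surrogate.identifier] = surrogate
        return surrogate.identifier
```

Because both sides exchange only serialized surrogates, copying between two such repositories reduces to calling put on the target for each item yielded by harvest on the source. Other concrete classes could implement the same abstract interfaces over different technologies.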

Herbert and Carl demonstrated a prototype that serialized the model using RDF/XML. They proposed that serializations in other formats would also be possible, e.g. MPEG DIDL, METS.

Andy Powell went on to consider Harvest functionality, Herbert considered Obtain functionality, which enables richer services to be negotiated, and I talked about ‘put’ functionality. There was quite a bit of discussion around these presentations, and the wrap-up presentation was on the role of a service registry in supporting inter-repository services.

Overall, the ‘put’ function was seen as the most problematic candidate for a ‘core repository interoperability service’. The initial discussion was quite antagonistic to the term ‘put’ and eventually, after ‘submit for ingest’ had been considered, either ‘deposit’ or ‘queue for deposit’ was preferred.

There was some discussion around the lack of homogeneity in the various deposit scenarios that I presented (deposit to repository from desktop application, deposit to repository from experimental equipment, deposit from one repository to another). Also it was agreed that repository ingest will vary greatly across different data types, and depending on different workflows.

There was discussion on how deposit would connect with repository policy and authentication.

  • The relation between ‘put’ and ingest (where is the boundary?)
  • What is the workflow as regards expression of the data object as a ‘surrogate’? Is that expression the responsibility of the depositing application or of the repository?

There was a move to exclude ‘deposit’ from further work that might come out of the workshop but this was resisted by myself and others as we considered it vital to work towards making deposit easier in order to populate repositories.

Further discussion went on to consider whether ‘search’ and ‘publish/subscribe’ should be included as core functionality.

The meeting ended with a consideration of next steps. Tony Hey, Microsoft, indicated he was keen to take forward some experimentation of interoperability in various eScience areas, possibly across international boundaries: chemistry, environmental data and space/astronomy data were mentioned. Herbert proposed initial work on specifications to capture consensus on a preferred approach.

Notes of the meeting are to go to the sponsors before being circulated.

Presentations: [6]

Herbert Van de Sompel: Context, Motivating Examples, and Introduction to Other Presentations

Carl Lagoze: Interoperable Data Model

Herbert Van de Sompel: Journal Overlay Demonstration (QuickTime Movie)

Andy Powell: Harvest Functionality

Herbert Van de Sompel: Obtain Functionality

Rachel Heery: Put Functionality

Jeremy Frumkin: Infrastructure and Applications


Also of interest is a Panel submission to JCDL:

[7]

ORE - http://www.openarchives.org/ore/

Open Repositories 2007

Rachel Heery, Julie Allinson, Jim Downing, Christopher Gutteridge and Martin Morrey, Repository Deposit Service Description. Presented by Julie Allinson at Open Repositories 2007, 23-26 January 2007, San Antonio, Texas. (MIME type: application/pdf)