Deposit API Strawman

From DigiRepWiki


Deposit API Strawman (v1)

This page is part of Deposit API.

This page outlines some thinking on a remote deposit API, following on from discussions a few months ago and in the light of comments made by Rachel Heery at the US meeting. Read it as a set of thoughts intended to provoke further discussion. Remember that, ultimately, your software will have to implement something like this (or not!).

The current situation, as I understand it, is that some vendors have a remote deposit API and some do not. There is something of a consensus that a common way of doing remote deposit would be a good thing.

PJN 07/07/2006

Deposit Components

In London, it was agreed that there were three main areas of functionality that would be brought together to form the Deposit API:
- "PUT": a component to move the 'thing' from one machine to another.
- "RECEIPT": a component to inform the 'putter' that the 'thing' has been received.
- "EXPLAIN": a component that informs potential 'putters' of the policies that will be applied to the 'thing' once it is 'put'.

Also in London, it was proposed that there should be a "level based model" whereby lower levels present more basic functionality.

The US meeting introduced the notion of the "Surrogate Object", a representation of a specific asset in a repository. There may be reasons why the asset itself cannot be transferred from one repository to another; in such cases it is desirable to move a surrogate of the asset instead (i.e. the metadata, control data and identifiers pointing back to the original). The US meeting also noted some hostility towards the word "put", with a preference for "deposit". Additionally, the notion of a service registry was explored.

Bringing all of this together gives an outline of the functionality that the various components are likely to require.

Data Types

Asset: This is the actual digital asset (for example, a digital photograph, a Word document, or a Learning Object).
Metadata: This is the metadata that relates to the asset.
Surrogate: This is a surrogate for the asset (as described above).
Receipt: This is a data object which confirms that the data being deposited has been received.
Policy: This is a data object which represents a policy for a repository.

Assets / Surrogates / Metadata might be packaged in some format. The Metadata might be packaged together with the Asset / Surrogate (as in IMS Content Packaging), or it might be encoded independently.

Policy objects could consist of a machine readable part, a human readable part, and a representation of the 'value' of the policy. A policy in this sense is either a statement about a 'permission', or a description of a process that will happen to deposited data.

Receipts are generated by the service and then returned to the user. Receipts will need to include data such as the date and the id assigned to the deposit. Additionally, the receipt could also include a statement of the policies that have been applied to the deposit.

... Are other data types needed? What are the properties of the data types described?


Services

Deposit: This is a service that is offered by a repository, allowing remote users (machines or people) to upload data.

Explain: This is a service that is offered by a repository, allowing remote users (machines or people) to inspect the repository for policy and/or other data.

Deposit Service Behaviours

Level 0:
DepositAsset: upload of an asset, plus associated metadata, into the repository.
DepositSurrogate: upload of a surrogate, plus associated metadata, into the repository.

Level 1:
DepositIntoCollection: upload of a 'data item' into a collection in the repository.

...others?

All deposit behaviours should return a receipt.
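As a rough illustration of the Deposit behaviours above, the following Python sketch models a toy in-memory repository. The class and method names (InMemoryDepositService, deposit_asset, deposit_surrogate, deposit_into_collection) and the id formats are hypothetical and are not taken from any existing product; the point is only that every behaviour, at every level, hands back a receipt.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Receipt:
    # Level 0 receipt fields, as listed under "Type Descriptions".
    id: str
    date: datetime
    deposit_id: str


class InMemoryDepositService:
    """Hypothetical in-memory repository exposing the Deposit behaviours."""

    def __init__(self) -> None:
        self._deposit_ids = itertools.count(1)
        self._receipt_ids = itertools.count(1)
        self.items: Dict[str, dict] = {}
        self.collections: Dict[str, List[str]] = {}

    def _store(self, kind: str, payload, metadata: dict) -> Receipt:
        deposit_id = f"deposit-{next(self._deposit_ids)}"
        self.items[deposit_id] = {"kind": kind, "payload": payload, "metadata": metadata}
        # Every deposit behaviour ends by returning a receipt.
        return Receipt(
            id=f"receipt-{next(self._receipt_ids)}",
            date=datetime.now(timezone.utc),
            deposit_id=deposit_id,
        )

    # Level 0: the asset is treated as a binary lump.
    def deposit_asset(self, asset: bytes, metadata: dict) -> Receipt:
        return self._store("asset", asset, metadata)

    # Level 0: the surrogate is passed as a plain record here.
    def deposit_surrogate(self, surrogate: dict, metadata: dict) -> Receipt:
        return self._store("surrogate", surrogate, metadata)

    # Level 1: store the item, then record it as a member of the collection.
    def deposit_into_collection(self, collection_id: str, item: bytes, metadata: dict) -> Receipt:
        receipt = self._store("asset", item, metadata)
        self.collections.setdefault(collection_id, []).append(receipt.deposit_id)
        return receipt
```

A remote protocol would replace the in-memory dictionaries, but the shape (data in, receipt out) is what the behaviours above require.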

Explain Service Behaviours

Level 0:
GetRepositoryDescription: return a human readable, high-level description of the repository (e.g. the welcome message from an FTP server).
GetHRPolicyList: return to the user a Human Readable list of all policy objects that exist in the repository.

Level 1:
GetPolicyLevel: return the level of policy complexity that this repository supports.
GetMRPolicyList: return to the user a Machine Readable list of all policy objects that exist in the repository.

...others?
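A similarly hypothetical sketch of the Explain behaviours follows. The method names mirror the behaviour names above; JSON is assumed purely as an example of a machine readable encoding, since no format has been agreed.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Policy:
    id: int
    applies_to: str   # "repository" or "deposit"
    description: str  # human readable text


class ExplainService:
    """Hypothetical Explain service backed by a fixed list of policies."""

    def __init__(self, description: str, policies: List[Policy], policy_level: int) -> None:
        self._description = description
        self._policies = policies
        self._policy_level = policy_level

    # Level 0
    def get_repository_description(self) -> str:
        return self._description

    def get_hr_policy_list(self) -> List[str]:
        # Human readable: just the description strings.
        return [p.description for p in self._policies]

    # Level 1
    def get_policy_level(self) -> int:
        return self._policy_level

    def get_mr_policy_list(self) -> str:
        # Machine readable: JSON is an illustrative assumption only.
        return json.dumps([asdict(p) for p in self._policies])
```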

Type Descriptions

Assets:
//For now being treated as a binary lump.

Metadata:
//Dublin Core, IMS MD etc

Surrogate:
id - Identifier in the current system
Provider - URL of the provider repository
ProviderId - id within the provider repository
ProviderIdVersion - version within the provider repository

//See Lagoze et al...
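One possible encoding of the Surrogate type is sketched below. The field names are snake_case versions of those listed above; the origin() helper and the URL layout it produces are assumptions made only to illustrate that a surrogate carries enough information to point back to the original.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Surrogate:
    id: str                                    # identifier in the current system
    provider: str                              # URL of the provider repository
    provider_id: str                           # id within the provider repository
    provider_id_version: Optional[str] = None  # version within the provider repository

    def origin(self) -> str:
        """Build a (hypothetical) pointer back to the original asset."""
        suffix = f"?version={self.provider_id_version}" if self.provider_id_version else ""
        return f"{self.provider.rstrip('/')}/{self.provider_id}{suffix}"
```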

Policy:
id - Identifier
Applies to: {"repository", "deposit"} //others?
Description - Human readable description of the policy

e.g.
id: 1
Applies to: "repository"
Description: "This repository accepts IMS Content Packages v1.1.4"

id: 2
Applies to: "deposit"
Description: "Any extension based metadata is discarded"
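The level 0 policy objects could be represented as simple records, as in the sketch below. It assumes that "Applies to" is restricted to the two values listed so far and rejects anything else; the policies_for() helper is hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Only the scopes listed so far; the page notes that others may be needed.
APPLIES_TO = {"repository", "deposit"}


@dataclass(frozen=True)
class Policy:
    id: int
    applies_to: str
    description: str

    def __post_init__(self) -> None:
        if self.applies_to not in APPLIES_TO:
            raise ValueError(f"unknown scope: {self.applies_to!r}")


# The two example policies from above.
POLICIES = [
    Policy(1, "repository", "This repository accepts IMS Content Packages v1.1.4"),
    Policy(2, "deposit", "Any extension based metadata is discarded"),
]


def policies_for(scope: str, policies: List[Policy]) -> List[Policy]:
    """Return the policies that apply to the given scope."""
    return [p for p in policies if p.applies_to == scope]
```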


At levels 1..n:
The policy descriptions would need to be machine readable properties. There is a fairly big job to be done in defining all of the different properties, and in identifying the types used to express how each property is supported.

Different levels could then introduce different types of property, in increasing complexity.

For example:
Level 1 policy (true/false statements)
"Force Overwrite" - True||False
"Anonymous Deposit" - True||False

Level 2 policy (enumerated lists)
"Accepted Formats" - IMS CPv1.x||METS||...

Level 3 policy (collection level policies)
"insert into collection" - {collection_id; level 1 or level 2 policy}

Level 4 policy (very complex returns)
"Metadata is altered" - {complex object describing the changes}
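The level scheme above can be sketched as a set of property types, one per level. The class names, and the rule that a repository reporting policy level N (via GetPolicyLevel) can express any property of level N or lower, are assumptions rather than anything agreed.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Union


@dataclass(frozen=True)
class BooleanProperty:      # Level 1: true/false statements
    name: str
    value: bool


@dataclass(frozen=True)
class EnumeratedProperty:   # Level 2: enumerated lists
    name: str
    values: FrozenSet[str]


@dataclass(frozen=True)
class CollectionProperty:   # Level 3: a level 1 or 2 policy scoped to a collection
    name: str
    collection_id: str
    policy: Union[BooleanProperty, EnumeratedProperty]


@dataclass(frozen=True)
class ComplexProperty:      # Level 4: arbitrary structured description
    name: str
    detail: Dict[str, Any]


PolicyProperty = Union[BooleanProperty, EnumeratedProperty, CollectionProperty, ComplexProperty]

LEVELS = {BooleanProperty: 1, EnumeratedProperty: 2, CollectionProperty: 3, ComplexProperty: 4}


def level_of(prop: PolicyProperty) -> int:
    return LEVELS[type(prop)]


def expressible(prop: PolicyProperty, repository_level: int) -> bool:
    """Assumed rule: a level N repository can express properties of level <= N."""
    return level_of(prop) <= repository_level
```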

Receipt:
Level 0-
id: machine generated receipt id
date: timestamp of receipt
deposit id: identifier of the deposit within this system

Level 1- (as above +)
policy ids: list of ids of all policies that were applied to the deposit.
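A receipt that grows with the level could look like the following sketch. The to_dict() serialisation and its key names are illustrative assumptions; only the fields themselves come from the lists above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Receipt:
    id: str                # machine generated receipt id
    date: datetime         # timestamp of receipt
    deposit_id: str        # identifier of the deposit within this system
    policy_ids: List[int] = field(default_factory=list)  # Level 1 only

    def to_dict(self, level: int = 0) -> dict:
        data = {"id": self.id, "date": self.date.isoformat(), "deposit_id": self.deposit_id}
        # Level 1 receipts also list the policies applied to the deposit.
        if level >= 1:
            data["policy_ids"] = list(self.policy_ids)
        return data
```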

Some Issues

Authorisation and Authentication
- Policies declaring existing behaviour with respect to both of these.
- Is there a separate service for handling Authentication?
- How do current protocols (SRW update etc) handle these?
- If a policy relates to a collection, and the user is not authorised to see the collection, are they authorised to see the policy?

Versioning
- If a deposit overwrites an existing item, is the current item preserved?
- Supporting different versions of specifications
- Are there issues with Surrogates being copied around?

Other commands
- Delete?

Collections
- What are the issues around collections?
- Can we deposit a collection?

Maintenance
- If there are central lists of properties, who maintains them?

State of the Art
- Is all of this (or something like it) feasible?
- How well could the current protocols/services/products handle these ideas?

Others...????