SWORD APP evaluation

From DigiRepWiki

The Standards

Evaluation of the Atom Publishing Protocol (APP - AtomPub) and Atom Syndication Format (ATOM) against SWORD parameters for repository deposit

TO EVALUATE: the extensions mechanism; whether APP will be constraining because of its application-specific nature, and/or too in-depth for our purposes, requiring unnecessary implementation; mediated deposit possibilities; extensibility; namespaces; service description; Atom tools; whether new headers need to be defined for Atom; HTTP headers; how Atom handles metadata.

| SWORD Parameter | APP / ATOM | Possible extensions | Notes / questions |
| --- | --- | --- | --- |
| ExplainRequest | GET to Service Document | - | Conditional GET to return a particular set of information in the service document (e.g. related to authentication)? |
| --onBehalfOf TargetUser | no | | Is it possible to include this in a GET? Where does authentication fit in here? |
| ExplainResponse | Service Document | - | |
| -Wrapper | <app:service> | - | |
| --ServerLevel | no | <sword:level> (0 or 1) | |
| --Version | no | - | Not required if using APP |
| -Repository | <app:workspace> | - | workspace = repository |
| --ID (M) | <atom:title> | <dc:identifier> or <baseURL> (http://www.openarchives.org/OAI/2.0/provenance.xsd) | <atom:title> is mandatory in Atom (human-readable name for the workspace); <app:workspace> can be extended; <dc:identifier> or <baseURL> from the OAI-PMH provenance schema could be used to identify the repository if necessary. |
| --Policy | no | <dcterms:accrualPolicy> | <app:workspace> could be extended with <dcterms:accrualPolicy>, holding text and/or a URI for a policy statement. |
| --VerboseSupported | no | <sword:verboseSupport> (true or false) | |
| --NoOpSupported | no | <sword:noOpSupport> (true or false) | |
| --ChecksumTypeSupported | no | <sword:checksumType> (true or false) | Recommend MD5? |
| --MediationAllowed (true or false) | no | <sword:mediation> (true or false) | |
| -Collections | <app:collection> | | |
| --ID | <app:collection href="atomURI"> | | The Collection URI is captured as the href attribute of the <app:collection> element and is mandatory; is this enough? An additional <dc:identifier> could be created as an extension if necessary. |
| --Name | <atom:title> | | <atom:title> is mandatory in Atom |
| --Description | no | <dcterms:abstract> | <app:collection> could be extended with a <dcterms:abstract> element |
| --Default | no | <sword:defaultCollection> (true or false) | Presence of one collection could indicate default? Is an extension necessary? |
| --DescribeFormat | <app:accept> | <sword:format> | Specifies a comma-separated list of media-ranges; is this enough, or do we need a vocabulary of formats and an element extension to <app:collection>? Do we need to distinguish between different XML documents (DIDL, MODS, IMS etc.)? |
| ---FormatID | media type | <dc:format> (with vocab) | As MIME media type only |
| ---FormatDescription | no | <sword:formatDescription> | Possible extension to allow more detailed description/identification of accepted formats (see note above); could extend this to support namespace and schema (see OAI-PMH) |
| --TreatmentDescription | no | <sword:treatment> | |
| --CollectionPolicy | no | <dcterms:accrualPolicy> | <app:collection> could be extended with <dcterms:accrualPolicy>, holding text and/or a URI for a policy statement. |
| Deposit | POST to URI of Collection | | |
| --TargetCollection | Collection URI | | |
| --Format | Content-Type in POST; <atom:content type=""> | <dc:format> (using vocab) | MIME media type in either case; <atom:content> can also contain the content (e.g. XML); if extending with <sword:format> elements, some kind of description of what a zip file contains might be useful? |
| --TransactionID | <atom:id> | | There is some confusion between Atom and APP regarding <atom:id>. Atom defines it as a 'permanent, universally unique identifier for an entry or feed', whereas APP states that 'The Entry created and returned by the Collection might not match the Entry POSTed by the client. A server MAY change the values of various elements in the Entry, such as the atom:id, atom:updated and atom:author values'. This requires some clarification in the SWORD profile. |
| --Verbose | no | <sword:verbose> (true or false) | |
| --NoOp | no | <sword:noOp> (true or false) | |
| --Checksum | no | <sword:checksum> | Or use the Content-MD5 HTTP header value? |
| --ChecksumType | no | <sword:checksumType> | Recommend MD5? |
| --TargetOwner | <atom:author> | Possible FOAF extensions for a username (for both author and contributor) | atom:contributor could be used for the depositor, with atom:author for the target 'owner' (will this always be the 'author'? Can we assume/profile our use of author/contributor in this way?); or we might extend this with dcterms:mediator? |
| Receipt | HTTP response: 201 Created; Location: Member Entry URI | | |
| -Wrapper | no | | Not used; the response would be an HTTP response |
| --ServiceLevel (0 or 1) | no | | Is this necessary here? |
| --Version | no | | Not necessary if using APP |
| -Receipt | Atom Entry | | |
| --TransactionID | <atom:id> | | See the notes above about the confusion regarding <atom:id> |
| --IdentifierURI (M) | Location: (MemberURI) in response | | In APP, the URI of the Media Link Entry is mandatory in the response (as Location:) |
| ---ObjectURL | <link rel="edit-media" href="">; <atom:content type="" src=""> | | URI for the media resource; these do not have to be the same. |
| ---DisplayURL | <link rel="edit" href=""> (MemberURI) | | URI for the Media Link Entry |
| --DepositStatus (M) | HTTP status codes returned in the response | | |
| ---Accepted | 201 Created | | 202 Accepted could be used for cases where there will be a delay in processing |
| ---Rejected | 415 Unsupported Media Type | | |
| ---Error | HTTP 4xx or 5xx codes (see below) | | |
| --ErrorCode | 4xx or 5xx codes | | Do we need to specify SWORD-specific error codes returned as XML (see OAI-PMH), or are those returned in HTTP responses sufficient? |
| ---ErrorContent | 404 Not Found | | For cases where the server cannot access the material to be deposited |
| ---ErrorParse | no | | Is this necessary? |
| ---ErrorChecksumMismatch | no | | Could this be included in the Atom entry? Would there be an Atom entry if the deposit had failed? If the Content-MD5 HTTP header was used, how would a mismatch be identified? |
| ---ErrorUnknownChecksumType | no | | See above |
| ---ErrorBadRequest | 400 Bad Request | | |
| ---ErrorTargetUserUnknown | 401 Unauthorized or 407 Proxy Authentication Required | | |
| ---ErrorMediationNotAllowed | 403 Forbidden | | |
| --ErrorDescription (M) | yes | | |
| --TreatmentDescription (M) | no | <sword:treatment> | |
| --FormatHandled | content-type (in <atom:entry> and response) | <dc:format> (from vocab) | |
| --VerboseDescription | no | <dc:description> | |
| --NoOp (true or false) | no | <sword:noOp> (true or false) | |
| --Checksum | no | <sword:checksum> | |
| --ChecksumType | no | <sword:checksumType> | Do we need to support multiple checksum types, or is MD5 enough? |
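
The DepositStatus and ErrorCode rows above amount to a lookup from HTTP status code to SWORD parameter. A minimal Python sketch of that lookup is given here. The function name and the returned labels are illustrative assumptions, and treating 202 as accepted follows the note in the table rather than any agreed profile.

```python
# Illustrative lookup from HTTP status codes to the SWORD parameters listed
# in the table above. All names here are hypothetical; this is not part of
# any agreed SWORD profile.
ERROR_CODES = {
    400: "ErrorBadRequest",
    401: "ErrorTargetUserUnknown",
    403: "ErrorMediationNotAllowed",
    404: "ErrorContent",
    407: "ErrorTargetUserUnknown",
}


def deposit_status(status_code):
    """Return (DepositStatus, ErrorCode or None) for an HTTP status code."""
    if status_code in (201, 202):
        # 201 Created; 202 Accepted covers delayed processing.
        return ("Accepted", None)
    if status_code == 415:
        return ("Rejected", None)
    if 400 <= status_code <= 599:
        # Codes without a row in the table map to a generic error.
        return ("Error", ERROR_CODES.get(status_code))
    raise ValueError(f"status {status_code} is not covered by the table")
```

ErrorParse, ErrorChecksumMismatch and ErrorUnknownChecksumType have no HTTP equivalent in the table, so a lookup like this cannot report them; that is the gap the ErrorCode note asks about.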

SWORD use of APP

  • 5. Protocol Operations
  • 5.1 Retrieving a Service Document USED
  • 5.2 Listing Collection Members NOT USED
  • 5.3 Creating a Resource USED
  • 5.4 Editing a Resource NOT USED
  • 5.4.1 Retrieving a Resource NOT USED
  • 5.4.2 Updating a Resource NOT USED
  • 5.4.3 Deleting a Resource NOT USED
  • 5.5 Use of HTTP Response codes USED
  • 6. Atom Publishing Protocol Documents
  • 6.1 Document Types
  • 6.2 Document Extensibility USED
  • 7. Category Documents NOT USED
  • 8. Service Documents USED
  • 8.1 Workspaces USED
  • 8.3 Element Definitions
  • 8.3.1 The "app:service" Element USED
  • 8.3.2 The "app:workspace" Element USED
  • 8.3.3 The "app:collection" Element USED
  • 8.3.4 The "app:accept" Element USED
  • 8.3.5 The "app:categories" Element NOT USED
  • 9. Creating and Editing Resources USED
  • 9.1 Member URIs USED
  • 9.2 Creating resources with POST USED
  • 9.3 Updating Resources with PUT NOT USED
  • 9.4 Deleting Resources with DELETE NOT USED
  • 9.5 Caching and entity tags NOT USED?
  • 9.6 Media Resources and Media Link Entries USED
  • 9.7 The Slug: Header NOT USED
  • 10. Listing Collections NOT USED
  • 10.1 Collection partial lists
  • 10.2 The "app:edited" Element
  • 11. Atom Format Link Relation Extensions
  • 11.1 The "edit" Link Relation USED
  • 11.2 The "edit-media" Link Relation USED
  • 12. The Atom Format Type Parameter USED
  • 13. Atom Publishing Controls NOT USED
  • 13.1 The "app:control" Element NOT USED
  • 13.1.1 The "app:draft" Element NOT USED

APP and ATOM support for additional parameters

  • Use of <atom:generator> within <atom:source> to identify the source repository/service making the deposit, i.e. to provide provenance information; this could be extended with OAI-PMH provenance elements (see http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm)
  • <app:control>, a structured extension for publishing control, with <app:draft> (a request by the client to control the visibility of a Member Resource), could be used to ask for deposits to be non-public (e.g. for embargoed material)
  • Listing collections offers a facility for listing members of repository collections using <atom:feed> documents. This is out of scope for the SWORD project but might be worthy of further investigation, alongside OAI-PMH sets and sitemaps.org
  • Atom's support for additional <link rel=""> values offers potential for identifying related objects
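
As an illustration of the first two points, the following Python sketch builds an Atom entry that carries <atom:generator> inside <atom:source> and an <app:draft> request inside <app:control>. The namespace URIs are those of RFC 4287 and RFC 5023; the function name, the client name and the client URI are hypothetical, and the entry is a fragment rather than a complete deposit.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # Atom namespace (RFC 4287)
APP = "http://www.w3.org/2007/app"    # APP namespace (RFC 5023)


def build_entry(title, generator_name, generator_uri, draft):
    """Build an entry fragment with provenance and a draft request.

    atom:id, atom:updated and atom:author, which Atom requires, are
    omitted here for brevity.
    """
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title

    # Provenance: the service making the deposit.
    source = ET.SubElement(entry, f"{{{ATOM}}}source")
    generator = ET.SubElement(
        source, f"{{{ATOM}}}generator", {"uri": generator_uri}
    )
    generator.text = generator_name

    # Publishing control: app:draft takes the value "yes" or "no".
    control = ET.SubElement(entry, f"{{{APP}}}control")
    ET.SubElement(control, f"{{{APP}}}draft").text = "yes" if draft else "no"
    return entry


# Hypothetical client details, for illustration only.
entry = build_entry(
    "Embargoed thesis",
    "Example Deposit Client",
    "http://client.example.org/",
    draft=True,
)
print(ET.tostring(entry, encoding="unicode"))
```

Note that <app:draft> is only a request: whether a repository would honour it as an embargo is a matter for the profile, not something APP guarantees.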

Issues, in- and out-of-scope

  • Versioning, adding new 'expressions' to an existing deposit, duplication
  • Identifiers, different servers assigning multiple identifiers; tracking provenance with a client ID, maintaining that ID
  • Formats, identifying the different types of packaging standard used
  • Mediation
  • Listing Collections, mandatory in ATOM
  • Authentication, must support HTTP and HTTPS

Metadata, files and packages

Four scenarios for <content>

  • POST media-file (single file), with metadata embedded in <content> element as structured xml, e.g. epdcx, oai_dc
  • POST media-file (single file), with metadata embedded within the object (e.g. PDF)
  • POST media-file (package or zip), which contains the metadata and objects, src attribute of <content> identifies
  • POST xml package, which contains structured xml for both metadata and object

There is a challenge here in knowing what we are getting.
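
To make the challenge concrete, here is a minimal Python sketch of a server guessing the scenario from the Content-Type header alone. The function name and the three labels are assumptions for illustration; nothing here is defined by APP or by SWORD.

```python
def classify_deposit(content_type):
    """Guess the deposit scenario from a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/zip":
        # A package, but the header does not say which packaging
        # standard was used inside the zip.
        return "package"
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        # An XML package, but the header does not say which schema.
        return "xml"
    # A single file; metadata may be embedded in it or sent separately.
    return "single-file"
```

The media type separates a zip from an XML document from a PDF, but it cannot say which packaging standard a zip follows or whether a single file carries its own metadata, which is why a format vocabulary or extension element is discussed above.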

Reflections and recommendation

APP supports deposit of files (media) and is agnostic about content-types. It's easily extensible and 'foreign markup' shouldn't break processing. It also supports collections and encourages repositories to expose information about their collections in a standard way.

Start implementing based on the SWORD profile of APP, initial focus on level:0 (mandatory elements), moving to level:1 and extending the SWORD APP profile as necessary.

Proposed SWORD profile of APP / ATOM

Need to identify what elements are used and how, and what explicitly aren't. Recommendations might include metadata format (e.g. epdcx and/or simple DC) and recommended format types. We might also want to specify server/client requirements and create a (small) SWORD schema for extension elements.

Examples

To add.

Explain URLs

GET service document:
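
Pending real examples, the following Python sketch shows how a client might read a service document once it has been retrieved. The sample document, the repository and collection URLs, and the "sword" namespace URI are all hypothetical placeholders; the APP and Atom namespaces are those of RFC 5023 and RFC 4287.

```python
import xml.etree.ElementTree as ET

# The "sword" URI is a hypothetical placeholder, not a published namespace.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "dcterms": "http://purl.org/dc/terms/",
    "sword": "http://example.org/sword#",
}

# Hypothetical service document using some of the extensions proposed above.
SERVICE_DOCUMENT = """\
<app:service xmlns:app="http://www.w3.org/2007/app"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:sword="http://example.org/sword#">
  <sword:level>0</sword:level>
  <app:workspace>
    <atom:title>Example Repository</atom:title>
    <app:collection href="http://repository.example.org/deposit/articles">
      <atom:title>Articles</atom:title>
      <app:accept>application/zip</app:accept>
      <dcterms:abstract>Peer-reviewed journal articles.</dcterms:abstract>
    </app:collection>
  </app:workspace>
</app:service>
"""


def describe_service(xml_text):
    """Return the SWORD level and a list of collection descriptions."""
    root = ET.fromstring(xml_text)
    level = root.findtext("sword:level", namespaces=NS)
    collections = []
    for workspace in root.findall("app:workspace", NS):
        repository = workspace.findtext("atom:title", namespaces=NS)
        for collection in workspace.findall("app:collection", NS):
            # Split on commas in case one element lists several media-ranges.
            accepts = [
                media_range.strip()
                for accept in collection.findall("app:accept", NS)
                for media_range in (accept.text or "").split(",")
                if media_range.strip()
            ]
            collections.append({
                "repository": repository,
                "href": collection.get("href"),
                "title": collection.findtext("atom:title", namespaces=NS),
                "accept": accepts,
            })
    return level, collections


level, collections = describe_service(SERVICE_DOCUMENT)
```

Elements the client does not look for, such as <dcterms:abstract> here, are simply skipped, which is the 'foreign markup' behaviour that makes the extension approach workable.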

Deposit URLs

POST binary to:
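
Pending real examples, a deposit request could be assembled along the following lines in Python. The collection URL and payload are hypothetical, the function name is illustrative, and the use of the Content-MD5 header (base64 of the MD5 digest, per RFC 1864) is only one of the checksum options raised in the table above. The request is built but not sent.

```python
import base64
import hashlib
import urllib.request


def build_deposit_request(collection_uri, payload, content_type):
    """Build (but do not send) a POST of a media resource to a collection."""
    # Content-MD5 carries the base64-encoded 128-bit MD5 digest of the body.
    digest = base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")
    return urllib.request.Request(
        collection_uri,
        data=payload,
        method="POST",
        headers={"Content-Type": content_type, "Content-MD5": digest},
    )


# Hypothetical collection URI and payload, for illustration only.
payload = b"example package bytes"
request = build_deposit_request(
    "http://repository.example.org/deposit/articles",
    payload,
    "application/zip",
)
```

Sending the request with urllib.request.urlopen(request) should, on success, yield 201 Created with a Location header holding the Member Entry URI, as described in the Receipt rows of the table.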

Tools

See http://bitworking.org/projects/apptestclient/

Java client/server library for APP:

https://rome.dev.java.net/apidocs/subprojects/propono/0.4/overview-summary.html