SWORD APP evaluation



The Standards

Evaluation of the Atom Publishing Protocol (APP - AtomPub) and Atom Syndication Format (ATOM) against SWORD parameters for repository deposit

TO EVALUATE: extensions mechanism; whether it will be constraining due to its application-specific nature and/or too in-depth for our purposes, requiring unnecessary implementation; mediated deposit possibilities, extensibility, namespaces, service description, atom tools, need to define new headers for atom, http headers; how does atom handle metadata?


SWORD Parameter


Possible extensions

Notes / questions


     GET to Service Document


?conditional GET to return a particular set of information in the service document? (e.g. related to authentication)

--onBehalfOf TargetUser


is it possible to include this in a GET? where does authentication fit in here?


Service Document











not required if using APP




workspace = repository

--ID (M)


<dc:identifier> or <baseURL> (http://www.openarchives.org/OAI/2.0/provenance.xsd)

<atom:title> is mandatory in atom (human-readable name for the workspace); <app:workspace> can be extended, <dc:identifier> or <baseURL> from the oai-pmh provenance schema could be used to identify the repository if necessary.




<app:workspace> could be extended with <dcterms:accrualPolicy> with text and/or URI for a policy statement.










Recommend MD5?

--MediationAllowed true||false






<app:collection href=”atomURI”>

Collection URI is captured as an attribute of the <app:collection> element and is mandatory; is this enough? An additional <dc:identifier> could be created as an extension if necessary.



<atom:title> is mandatory in atom




<app:collection> could be extended with a <dcterms:abstract> element




Presence of one collection could indicate default? Is an extension necessary?




Specifies a comma-separated list of media-ranges; is this enough, do we need a vocabulary of formats and an element extension to <app:collection>? Do we need to distinguish between different xml documents (didl, mods, ims etc.)


media type

<dc:format> (with vocab)

as mime media type only




possible extension to allow more detailed description/identification of accepted formats, see note above; could extend this to support namespace and schema (see oai-pmh)







<app:collection> could be extended with <dcterms:accrualPolicy> with text and/or URI for a policy statement.


     POST to URI of Collection


Collection URI


Content-Type in POST; <atom:content type=””>

<dc:format> (using vocab)

mime media type in either case; <atom:content> can also contain the content (e.g. xml); if extending with <sword:format> elements, some kind of description of what a zip file contains might be useful?



There is some confusion between Atom and APP regarding the <atom:id>. Atom defines it as a 'permanent universally unique identifier for an entry or feed', whereas APP states that 'The Entry created and returned by the Collection might not match the Entry POSTed by the client. A server MAY change the values of various elements in the Entry, such as the atom:id, atom:updated and atom:author values' – this requires some clarification in the SWORD profile.










Or use content-MD5 http header value?




Recommend MD5?



Possible foaf extensions for a username (for both author and contributor)

atom:contributor could be used for depositor; with atom:author for the target 'owner' (will this always be the 'author'? Can we assume/profile our use of author/contributor in this way); or might extend this with dcterms:mediator?


HTTP Response:

201 Created

Location: Member Entry URI



not used; response would be a HTTP response

--ServiceLevel 0||1


is this necessary here?



not necessary if using APP


Atom Entry



See notes above about confusion wrt <atom:id>

--IdentifierURI (M)

Location: (MemberURI) in response

In app, the URI of the Media Link Entry is mandatory in the response (as Location:)


<link rel=”edit-media” href=””>

<atom:content type=”” src=””>

URI for the media resource; these do not have to be the same.


<link rel=”edit” href=””> (MemberURI)

URI for the Media Link Entry

--DepositStatus (M)

http status codes returned in the response


201 Created

202 Accepted could be used for cases where there will be a delay in processing


415 Unsupported Media Type


http 4xx or 5xx codes (see below)


4xx or 5xx codes

Do we need to specify sword-specific error codes returned as xml (see oai-pmh), or are those returned in http responses sufficient?


404 Not Found

For cases where the server cannot access the material to be deposited



Is this necessary?



Could this be included in the atom entry? Would there be an atom entry if the deposit had failed? If the Content-MD5 http header was used, how would a mismatch be identified?



see above


400 Bad Request


401 Unauthorized or 407 Proxy Authentication Required


403 Forbidden

--ErrorDescription (M)


--TreatmentDescription (M)




content-type (in <atom:entry> and response)

<dc:format> (from vocab)




--NoOp true||false









Do we need to support multiple checksum types, or is MD5 enough?
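
The deposit rows in the table above (POST to the collection URI, a Content-MD5 checksum, and outcomes reported through HTTP status codes) can be sketched roughly as follows. This is a minimal illustration, not the SWORD profile itself: the function names, the outcome labels and the example collection URI are all hypothetical, and the request is only prepared, never sent. The checksum is carried as the base64-encoded MD5 digest, which is how the Content-MD5 header is defined.

```python
import base64
import hashlib
import urllib.request


def build_deposit_request(collection_uri, payload, media_type):
    """Prepare (but do not send) a POST of a media resource to a collection."""
    # Content-MD5 carries the base64 encoding of the raw 128-bit MD5 digest.
    digest = base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")
    return urllib.request.Request(
        collection_uri,
        data=payload,
        method="POST",
        headers={"Content-Type": media_type, "Content-MD5": digest},
    )


def interpret_response(status, headers):
    """Map an HTTP status code onto a deposit outcome (labels are illustrative)."""
    if status == 201:
        # 201 Created: the Location header holds the Member Entry URI.
        return {"outcome": "created", "member_uri": headers.get("Location")}
    if status == 202:
        # 202 Accepted: the server will process the deposit later.
        return {"outcome": "accepted-for-processing",
                "member_uri": headers.get("Location")}
    if 400 <= status < 600:
        # e.g. 400, 401, 403, 404, 415 or any 5xx code.
        return {"outcome": "error", "status": status}
    return {"outcome": "unexpected", "status": status}


# Hypothetical collection URI, for illustration only.
request = build_deposit_request(
    "http://repository.example.org/deposit/theses", b"hello", "application/zip"
)
```

Whether a bare status code is enough, or whether an XML error document is also needed, is the open question raised in the table; the sketch only covers the status-code route.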

SWORD use of APP

  • 5. Protocol Operations
  • 5.1 Retrieving a Service Document USED
  • 5.2 Listing Collection Members NOT USED
  • 5.3 Creating a Resource USED
  • 5.4 Editing a Resource NOT USED
  • 5.4.1 Retrieving a Resource NOT USED
  • 5.4.2 Updating a Resource NOT USED
  • 5.4.3 Deleting a Resource NOT USED
  • 5.5 Use of HTTP Response codes USED
  • 6. Atom Publishing Protocol Documents
  • 6.1 Document Types
  • 6.2 Document Extensibility USED
  • 7. Category Documents NOT USED
  • 8. Service Documents USED
  • 8.1 Workspaces USED
  • 8.3 Element Definitions
  • 8.3.1 The "app:service" Element USED
  • 8.3.2 The "app:workspace" Element USED
  • 8.3.3 The "app:collection" Element USED
  • 8.3.4 The "app:accept" Element USED
  • 8.3.5 The "app:categories" Element NOT USED
  • 9. Creating and Editing Resources USED
  • 9.1 Member URIs USED
  • 9.2 Creating resources with POST USED
  • 9.3 Updating Resources with PUT NOT USED
  • 9.4 Deleting Resources with DELETE NOT USED
  • 9.5 Caching and entity tags NOT USED?
  • 9.6 Media Resources and Media Link Entries USED
  • 9.7 The Slug: Header NOT USED
  • 10. Listing Collections NOT USED
  • 10.1 Collection partial lists
  • 10.2 The "app:edited" Element
  • 11. Atom Format Link Relation Extensions
  • 11.1 The "edit" Link Relation USED
  • 11.2 The "edit-media" Link Relation USED
  • 12. The Atom Format Type Parameter USED
  • 13. Atom Publishing Controls NOT USED
  • 13.1 The "app:control" Element NOT USED
  • 13.1.1 The "app:draft" Element NOT USED
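
As an illustration of the service document sections marked USED above (8.3.1 to 8.3.4), the following sketch parses a small service document and lists its collections. The sample document, the repository URIs and the function name are made up for the example. The app namespace URI is the one from the published RFC 5023, which is an assumption here, since the draft evaluated on this page may have used a different one. The <dcterms:abstract> and <dcterms:accrualPolicy> children are the possible extensions discussed in the table, not part of APP.

```python
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",      # assumed: RFC 5023 namespace
    "atom": "http://www.w3.org/2005/Atom",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical service document: one workspace (repository), one collection.
SERVICE_DOC = """
<app:service xmlns:app="http://www.w3.org/2007/app"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:dcterms="http://purl.org/dc/terms/">
  <app:workspace>
    <atom:title>Example Repository</atom:title>
    <app:collection href="http://repository.example.org/deposit/theses">
      <atom:title>Theses</atom:title>
      <app:accept>application/zip, application/pdf</app:accept>
      <dcterms:abstract>Doctoral theses</dcterms:abstract>
      <dcterms:accrualPolicy>http://repository.example.org/policy</dcterms:accrualPolicy>
    </app:collection>
  </app:workspace>
</app:service>
"""


def list_collections(service_xml):
    """Return one dict per app:collection found in the service document."""
    root = ET.fromstring(service_xml)
    collections = []
    for workspace in root.findall("app:workspace", NS):
        repository = workspace.findtext("atom:title", default="", namespaces=NS)
        for coll in workspace.findall("app:collection", NS):
            # Accept either one comma-separated list or repeated app:accept elements.
            accepts = []
            for accept in coll.findall("app:accept", NS):
                accepts += [m.strip() for m in (accept.text or "").split(",") if m.strip()]
            collections.append({
                "repository": repository,
                "uri": coll.get("href"),
                "title": coll.findtext("atom:title", default="", namespaces=NS),
                "accepts": accepts,
                # Extension element; None when the server does not supply it.
                "policy": coll.findtext("dcterms:accrualPolicy", namespaces=NS),
            })
    return collections


collections = list_collections(SERVICE_DOC)
```

A client that does not know the dcterms extensions can skip them and still read the href, title and accepted media types, which is the 'foreign markup' behaviour the evaluation relies on.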

APP and ATOM support for additional parameters

  • use of <atom:generator> within <atom:source> to identify the source repository/service making the deposit; i.e. to provide provenance information, could be extended with oai-pmh provenance elements (see http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm)
  • <app:control>, a structured extension for publishing control, with <app:draft> (a request by the client to control the visibility of a Member Resource), could be used to ask for deposits to be non-public (e.g. for embargoed material)
  • Listing collections offers a facility for listing members of repository collections using <atom:feed> documents. This is out of scope for the SWORD project but might be worthy of further investigation, alongside oai-pmh sets and sitemaps.org
  • Atom's support for additional <link rel=””> values offers potential for identifying related objects
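
A rough sketch of how the additional parameters listed above might appear together in one entry: <atom:generator> inside <atom:source> for provenance, <app:draft> inside <app:control> to request non-public handling, and <link rel="related"> for related objects. The function name, its arguments and the example URIs are hypothetical, the app namespace is assumed to be the RFC 5023 one, and the entry deliberately omits other elements a complete Atom entry needs (such as <atom:id> and <atom:updated>).

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
APP = "http://www.w3.org/2007/app"  # assumed: RFC 5023 namespace
ET.register_namespace("atom", ATOM)
ET.register_namespace("app", APP)


def build_entry(title, generator_uri, generator_name, draft, related_uris=()):
    """Build a partial Atom entry carrying provenance, draft and related links."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title

    # Provenance: which repository or service is making the deposit.
    source = ET.SubElement(entry, f"{{{ATOM}}}source")
    generator = ET.SubElement(source, f"{{{ATOM}}}generator", {"uri": generator_uri})
    generator.text = generator_name

    # Publishing control: "yes" asks the server to keep the deposit non-public.
    control = ET.SubElement(entry, f"{{{APP}}}control")
    ET.SubElement(control, f"{{{APP}}}draft").text = "yes" if draft else "no"

    # Related objects, one link element each.
    for uri in related_uris:
        ET.SubElement(entry, f"{{{ATOM}}}link", {"rel": "related", "href": uri})

    return ET.tostring(entry, encoding="unicode")


# Hypothetical values, for illustration only.
entry_xml = build_entry(
    "Embargoed thesis",
    "http://source.example.org/",
    "Example Source Repository",
    draft=True,
    related_uris=["http://source.example.org/items/42"],
)
```

Note that <app:draft> is only a request: the server may ignore it, so a profile that relies on it for embargoes would need to say what a server must do.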

Issues, in- and out-of-scope

  • Versioning, adding new 'expressions' to an existing deposit, duplication
  • Identifiers, different servers assigning multiple identifiers; tracking provenance with a client ID, maintaining that ID
  • Formats, identifying the different types of packaging standard used
  • Mediation
  • Listing Collections, mandatory in ATOM
  • Authentication, must support http and https

Metadata, files and packages

Four scenarios for <content>

  • POST media-file (single file), with metadata embedded in <content> element as structured xml, e.g. epdcx, oai_dc
  • POST media-file (single file), with metadata embedded within the object (e.g. PDF)
  • POST media-file (package or zip), which contains the metadata and objects, src attribute of <content> identifies
  • POST xml package, which contains structured xml for both metadata and object

There is a challenge here in knowing what we are getting.
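
One way a server might make a first guess at what it has received is to look at the declared media type and the payload itself. The sketch below is only an illustration of that idea: the function name and the three labels are hypothetical, and it cannot tell a single file with embedded metadata from one whose metadata is supplied separately, nor identify which packaging standard a zip or XML document follows.

```python
import io
import zipfile

XML_TYPES = {"application/xml", "text/xml"}


def classify_deposit(content_type, payload):
    """Guess the deposit scenario from the Content-Type header and the bytes."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()

    # A zip is treated as a package whatever media type was declared.
    if media_type == "application/zip" or zipfile.is_zipfile(io.BytesIO(payload)):
        return "package"
    if media_type in XML_TYPES or media_type.endswith("+xml"):
        return "xml"
    return "single-file"


# Build a tiny zip in memory to exercise the package branch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("metadata.xml", "<metadata/>")
zip_bytes = buffer.getvalue()

kinds = [
    classify_deposit("application/octet-stream", zip_bytes),
    classify_deposit("application/atom+xml; type=entry", b"<entry/>"),
    classify_deposit("application/pdf", b"%PDF-1.4"),
]
```

Distinguishing packaging formats beyond this would need the kind of format vocabulary or extension element discussed in the table.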

Reflections and recommendation

APP supports deposit of files (media) and is agnostic about content-types. It's easily extensible and 'foreign markup' shouldn't break processing. It also supports collections and encourages repositories to expose information about their collections in a standard way.

Start implementing based on the SWORD profile of APP, initial focus on level:0 (mandatory elements), moving to level:1 and extending the SWORD APP profile as necessary.

Proposed SWORD profile of APP / ATOM

Need to identify what elements are used and how, and what explicitly aren't. Recommendations might include metadata format (e.g. epdcx and/or simple DC) and recommended format types. We might also want to specify server/client requirements and create a (small) SWORD schema for extension elements.


To add.

Explain URLs

GET service document:

Deposit URLs

POST binary to:


See http://bitworking.org/projects/apptestclient/

Java client/server library for APP: