SWORD APP evaluation

From DigiRepWiki

1 The Standards
2 Evaluation of the Atom Publishing Protocol (APP - AtomPub) and Atom Syndication Format (ATOM) against SWORD parameters for repository deposit
3 SWORD use of APP
4 APP and ATOM support for additional parameters
5 Issues, in- and out-of-scope
6 Metadata, files and packages
7 Reflections and recommendation
8 Proposed SWORD profile of APP / ATOM
9 Examples
- 9.1 Explain URLs
- 9.2 Deposit URLs
10 Tools

The Standards

http://atompub.org/rfc4287.html (ATOM)
http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-14.html (APP - AtomPub)

Evaluation of the Atom Publishing Protocol (APP - AtomPub) and Atom Syndication Format (ATOM) against SWORD parameters for repository deposit

TO EVALUATE: extensions mechanism; whether it will be constraining due to its application-specific nature and/or too in-depth for our purposes, requiring unnecessary implementation; mediated deposit possibilities, extensibility, namespaces, service description, atom tools, need to define new headers for atom, http headers; how does atom handle metadata?

SWORD Parameter	APP / ATOM	Possible extensions	Notes / questions

ExplainRequest	GET to Service Document	-	?conditional GET to return a particular set of information in the service document? (e.g. related to authentication)
--onBehalfOf TargetUser	no		is it possible to include this in a GET? where does authentication fit in here?
ExplainResponse	Service Document	-
-Wrapper	<app:service>	-
--ServerLevel	no	<sword:level>0\|1
--Version	no	-	not required if using APP
-Repository	<workspace>	-	workspace = repository
--ID (M)	<atom:title>	<dc:identifier> or <baseURL> (http://www.openarchives.org/OAI/2.0/provenance.xsd)	<atom:title> is mandatory in atom (human-readable name for the workspace); <app:workspace> can be extended, <dc:identifier> or <baseURL> from the oai-pmh provenance schema could be used to identify the repository if necessary.
--Policy	no	<dcterms:accrualPolicy>	<app:workspace> could be extended with <dcterms:accrualPolicy> with text and/or URI for a policy statement.
--VerboseSupported	no	<sword:verboseSupport>true\|false
--NoOpSupported	no	<sword:noOpSupport>true\|false
--ChecksumTypeSupported	no	<sword:checksumType>true\|false	Recommend MD5?
--MediationAllowed true\|\|false	no	<sword:mediation>true\|false
-Collections	<app:collection>
--ID	<app:collection href="atomURI">		Collection URI is captured as an attribute of the <app:collection> element and is mandatory; is this enough? An additional <dc:identifier> could be created as an extension if necessary.
--Name	<atom:title>		<atom:title> is mandatory in atom
--Description	no	<dcterms:abstract>	<app:collection> could be extended with a <dcterms:abstract> element
--Default	no	<sword:defaultCollection>true\|false	Presence of one collection could indicate default? Is an extension necessary?
--DescribeFormat	<accept>	<sword:format>	Specifies a comma-separated list of media-ranges; is this enough, do we need a vocabulary of formats and an element extension to <app:collection>? Do we need to distinguish between different xml documents (didl, mods, ims etc.)
---FormatID	media type	<dc:format> (with vocab)	as mime media type only
---FormatDescription	No	<sword:formatDescription>	possible extension to allow more detailed description/identification of accepted formats, see note above; could extend this to support namespace and schema (see oai-pmh)
--TreatmentDescription	no	<sword:treatment>
--CollectionPolicy	no	<dcterms:accrualPolicy>	<app:collection> could be extended with <dcterms:accrualPolicy> with text and/or URI for a policy statement.
Deposit	POST to URI of Collection
--TargetCollection	Collection URI
--Format	Content-Type in POST; <atom:content type="">	<dc:format> (using vocab)	mime media type in either case; <atom:content> can also contain the content (e.g. xml); if extending with <sword:format> elements, some kind of description of what a zip file contains might be useful?
--TransactionID	<atom:id>		There is some confusion between <atom> and <app> regarding the <atom:id>. Atom defines it as a 'permanent universally unique identifier for an entry or feed'; whereas APP states that 'The Entry created and returned by the Collection might not match the Entry POSTed by the client. A server MAY change the values of various elements in the Entry, such as the atom:id, atom:updated and atom:author values' – this requires some clarification in the SWORD profile.
--Verbose	no	<sword:verbose>true\|false
--NoOp	no	<sword:noOp>true\|false
--Checksum	no	<sword:checksum>	Or use content-MD5 http header value?
--ChecksumType	no	<sword:checksumType>	Recommend MD5?
--TargetOwner	<atom:author>		Possible foaf extensions for a username (for both author and contributor). atom:contributor could be used for the depositor, with atom:author for the target 'owner' (will this always be the 'author'? Can we assume/profile our use of author/contributor in this way?); or we might extend this with dcterms:mediator?
Receipt	HTTP Response: 201 Created Location: Member Entry URI
-Wrapper	No		not used; response would be a HTTP response
--ServiceLevel 0\|\|1	no		is this necessary here?
--Version	no		not necessary if using APP
-Receipt	Atom Entry
--TransactionID	<atom:id>		See notes above about confusion wrt <atom:id>
--IdentifierURI (M)	Location: (MemberURI) in response		In app, the URI of the Media Link Entry is mandatory in the response (as Location:)
---ObjectURL	<link rel="edit-media" href=""> <atom:content type="" src="">		URI for the media resource; these do not have to be the same.
---DisplayURL	<link rel="edit" href=""> (MemberURI)		URI for the Media Link Entry
--DepositStatus (M)	http status codes returned in the response
---Accepted	201 Created		202 Accepted could be used for cases where there will be a delay in processing
---Rejected	415 Unsupported Media Type
---Error	http 4xx or 5xx codes (see below)
--ErrorCode	4xx or 5xx codes		Do we need to specify sword-specific error codes returned as xml (see oai-pmh), or are those returned in http responses sufficient?
---ErrorContent	404 Not Found		For cases where the server cannot access the material to be deposited
---ErrorParse	no		Is this necessary?
---ErrorChecksumMismatch	no		Could this be included in the atom entry? Would there be an atom entry if the deposit had failed? If the content-MD5 http header was used, how would a mismatch be identified?
---ErrorUnknownChecksumType	no		see above
---ErrorBadRequest	400 Bad Request
---ErrorTargetUserUnknown	401 Unauthorized or 407 Proxy Authentication Required
---ErrorMediationNotAllowed	403 Forbidden
--ErrorDescription (M)	yes
--TreatmentDescription (M)	no	<sword:treatment>
--FormatHandled	content-type (in <atom:entry> and response)	<dc:format> (from vocab)
--VerboseDescription	No	<dc:description>
--NoOp true\|\|false	no	<sword:noOp>true\|false
--Checksum	no	<sword:checksum>
--ChecksumType	no	<sword:checksumType>	Do we need to support multiple checksum types, or is MD5 enough?
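
The Explain mapping in the table can be illustrated with a small client-side sketch that reads a Service Document and pulls out the repository name, collection URIs, accepted formats and a few of the proposed extensions. This is a sketch under assumptions, not a definitive implementation: the APP namespace is the one from the published RFC 5023 (the draft linked above may use a different one), the `sword` namespace URI is a placeholder, and the sample document, repository names and `parse_service_document` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs. ATOM and DC Terms are the published ones; APP is taken
# from RFC 5023 and may differ in the draft. The SWORD URI is a placeholder
# assumption, not a registered namespace.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "dcterms": "http://purl.org/dc/terms/",
    "sword": "http://example.org/sword#",
}

# Hypothetical Service Document using the extensions proposed in the table.
SAMPLE_SERVICE_DOCUMENT = """\
<app:service xmlns:app="http://www.w3.org/2007/app"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:sword="http://example.org/sword#">
  <sword:level>0</sword:level>
  <app:workspace>
    <atom:title>Example Repository</atom:title>
    <sword:mediation>true</sword:mediation>
    <app:collection href="http://repository.example.org/deposit/theses">
      <atom:title>Theses</atom:title>
      <app:accept>application/zip, application/pdf</app:accept>
      <dcterms:abstract>Doctoral theses</dcterms:abstract>
    </app:collection>
  </app:workspace>
</app:service>
"""


def parse_service_document(xml_text):
    """Extract the SWORD-relevant parts of a Service Document as a dict."""
    root = ET.fromstring(xml_text)
    result = {
        "level": root.findtext("sword:level", namespaces=NS),
        "repositories": [],
    }
    for workspace in root.findall("app:workspace", NS):
        mediation = workspace.findtext(
            "sword:mediation", default="false", namespaces=NS
        )
        repository = {
            "name": workspace.findtext("atom:title", namespaces=NS),
            "mediation": mediation == "true",
            "collections": [],
        }
        for collection in workspace.findall("app:collection", NS):
            accept = collection.findtext("app:accept", default="", namespaces=NS)
            repository["collections"].append({
                "uri": collection.get("href"),
                "name": collection.findtext("atom:title", namespaces=NS),
                # app:accept holds a comma-separated list of media ranges
                "accept": [m.strip() for m in accept.split(",") if m.strip()],
                "description": collection.findtext(
                    "dcterms:abstract", namespaces=NS
                ),
            })
        result["repositories"].append(repository)
    return result


if __name__ == "__main__":
    print(parse_service_document(SAMPLE_SERVICE_DOCUMENT))
```

Elements the parser does not ask for are simply ignored, which is how a client that knows nothing about the SWORD extensions would still read the same document.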

SWORD use of APP

5. Protocol Operations
5.1 Retrieving a Service Document USED
5.2 Listing Collection Members NOT USED
5.3 Creating a Resource USED
5.4 Editing a Resource NOT USED
5.4.1 Retrieving a Resource NOT USED
5.4.2 Updating a Resource NOT USED
5.4.3 Deleting a Resource NOT USED
5.5 Use of HTTP Response codes USED
6. Atom Publishing Protocol Documents
6.1 Document Types
6.2 Document Extensibility USED
7. Category Documents NOT USED
8. Service Documents USED
8.1 Workspaces USED
8.3 Element Definitions
8.3.1 The "app:service" Element USED
8.3.2 The "app:workspace" Element USED
8.3.3 The "app:collection" Element USED
8.3.4 The "app:accept" Element USED
8.3.5 The "app:categories" Element NOT USED
9. Creating and Editing Resources USED
9.1 Member URIs USED
9.2 Creating resources with POST USED
9.3 Updating Resources with PUT NOT USED
9.4 Deleting Resources with DELETE NOT USED
9.5 Caching and entity tags NOT USED?
9.6 Media Resources and Media Link Entries USED
9.7 The Slug: Header NOT USED
10. Listing Collections NOT USED
10.1 Collection partial lists
10.2 The "app:edited" Element
11. Atom Format Link Relation Extensions
11.1 The "edit" Link Relation USED
11.2 The "edit-media" Link Relation USED
12. The Atom Format Type Parameter USED
13. Atom Publishing Controls NOT USED
13.1 The "app:control" Element NOT USED
13.1.1 The "app:draft" Element NOT USED

APP and ATOM support for additional parameters

Use of <atom:generator> within <atom:source> to identify the source repository/service making the deposit, i.e. to provide provenance information; this could be extended with oai-pmh provenance elements (see http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm).
<app:control>, a structured extension for publishing control, with <app:draft> (a request by the client to control the visibility of a Member Resource), could be used to ask for deposits to be non-public (e.g. for embargoed material).
Listing collections offers a facility for listing members of repository collections using <atom:feed> documents. This is out of scope for the SWORD project but might be worthy of further investigation, alongside oai-pmh sets and sitemaps.org.
Atom's support for additional <link rel=""> values offers potential for identifying related objects.
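
As an illustration of the first two points, the following sketch builds an Atom entry fragment that carries provenance in <atom:generator> inside <atom:source> and asks for non-public treatment through <app:draft>. It rests on assumptions: the APP namespace is the one from RFC 5023 (the draft may differ), `build_entry` and its arguments are hypothetical, and the entry omits elements a complete Atom entry needs (such as atom:id and atom:updated) to keep the example short.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # RFC 4287 namespace
APP = "http://www.w3.org/2007/app"    # RFC 5023 namespace; the draft may differ

# Use readable prefixes when serialising.
ET.register_namespace("atom", ATOM)
ET.register_namespace("app", APP)


def build_entry(title, generator_name, generator_uri, draft=True):
    """Return a serialised Atom entry fragment (hypothetical helper)."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title

    # Provenance: which repository or service is making the deposit.
    source = ET.SubElement(entry, f"{{{ATOM}}}source")
    generator = ET.SubElement(
        source, f"{{{ATOM}}}generator", {"uri": generator_uri}
    )
    generator.text = generator_name

    # Publishing control: "yes" asks the server to keep the member non-public.
    control = ET.SubElement(entry, f"{{{APP}}}control")
    ET.SubElement(control, f"{{{APP}}}draft").text = "yes" if draft else "no"

    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(build_entry(
        "Embargoed thesis",
        "Example Source Repository",
        "http://source.example.org/",
    ))
```

Whether a server honours the draft request is up to the server, so a SWORD profile would still need to say how embargo requests are confirmed in the receipt.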

Issues, in- and out-of-scope

Versioning, adding new 'expressions' to an existing deposit, duplication
Identifiers, different servers assigning multiple identifiers; tracking provenance with a client ID, maintaining that ID
Formats, identifying the different types of packaging standard used
Mediation
Listing Collections, mandatory in ATOM
Authentication, must support HTTP and HTTPS

Metadata, files and packages

Three scenarios for <content>

POST media-file (single file), with metadata embedded in <content> element as structured xml, e.g. epdcx, oai_dc

POST media-file (single file), with metadata embedded within the object (e.g. PDF)

POST media-file (package or zip), which contains the metadata and objects, src attribute of <content> identifies

POST xml package, which contains structured xml for both metadata and object

There is a challenge here in knowing what we are getting
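
In each of the scenarios above the deposit itself is a POST of bytes to a collection URI, and the Content-Type header is the main signal the server has about what it is receiving. The sketch below builds such a request without sending it, and also sets a Content-MD5 header (the base64-encoded MD5 digest of the body) as one possible way to carry a checksum. The collection URI, the payload and the `build_deposit_request` helper are hypothetical, and using Content-MD5 is an assumption rather than a decision of the SWORD profile.

```python
import base64
import hashlib
import urllib.request


def build_deposit_request(collection_uri, payload, content_type):
    """Build (but do not send) a POST deposit request for a collection."""
    digest = hashlib.md5(payload).digest()
    headers = {
        # Tells the server whether this is a single file, a zip package,
        # an xml package, and so on.
        "Content-Type": content_type,
        # Content-MD5 carries the base64 form of the raw 16-byte digest.
        "Content-MD5": base64.b64encode(digest).decode("ascii"),
    }
    return urllib.request.Request(
        collection_uri, data=payload, headers=headers, method="POST"
    )


if __name__ == "__main__":
    request = build_deposit_request(
        "http://repository.example.org/deposit/theses",  # hypothetical URI
        b"hello",
        "application/zip",
    )
    print(request.get_method(), request.full_url)
    for name, value in request.header_items():
        print(f"{name}: {value}")
```

A media type such as application/zip only says that a package was sent, not which packaging standard it follows, so the content type alone does not fully resolve the challenge noted above.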

Reflections and recommendation

APP supports deposit of files (media) and is agnostic about content-types. It's easily extensible and 'foreign markup' shouldn't break processing. It also supports collections and encourages repositories to expose information about their collections in a standard way.

Start implementing based on the SWORD profile of APP, with an initial focus on level:0 (mandatory elements), then move to level:1 and extend the SWORD APP profile as necessary.

Proposed SWORD profile of APP / ATOM

Need to identify what elements are used and how, and what explicitly aren't. Recommendations might include metadata format (e.g. epdcx and/or simple DC) and recommended format types. We might also want to specify server/client requirements and create a (small) SWORD schema for extension elements.
