SWORD discussion paper

From DigiRepWiki



SWORD 2 discussion paper

This paper summarises suggestions for future work on SWORD, for discussion at the SWORD 2 kickoff meeting, 1st July. Much of this information has been gathered from SWORD developers; attribution is given wherever possible.

Refinements to the existing SWORD protocol and implementations

Improved self description

It seems that any carefully programmed SWORD interface will impose some limits on upload file size to protect the repository system, comply with local file size constraints and/or implement local file size policies. For arXiv, where worldwide accessibility is considered important, we set fairly low submission size limits. It might be useful to be able to express these in the service document so that a client could act accordingly rather than getting an error message for an oversize attempt. Perhaps:

<sword:maxUploadSize>10000000</sword:maxUploadSize>

The implementation at arXiv assumes several files will be uploaded via multiple SWORD deposits before finally being submitted to the repository. If such a model were to be widely adopted then it may also be worthwhile to specify a total submission size if different. (from arXiv case study)
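As a rough illustration of how a client might use such a value, the following Python sketch reads the limit from a service document and checks a file size before attempting a deposit. It assumes the proposed sword:maxUploadSize element is expressed in bytes and that the namespace URIs shown are the ones in use; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs. sword:maxUploadSize is the element proposed in
# the text above; treating its value as a byte count is an assumption.
NS = {
    "app": "http://www.w3.org/2007/app",
    "sword": "http://purl.org/net/sword/",
}

SERVICE_DOCUMENT = """\
<app:service xmlns:app="http://www.w3.org/2007/app"
             xmlns:sword="http://purl.org/net/sword/">
  <sword:maxUploadSize>10000000</sword:maxUploadSize>
</app:service>
"""


def max_upload_size(service_xml):
    """Return the advertised limit, or None if the server advertises none."""
    root = ET.fromstring(service_xml)
    element = root.find(".//sword:maxUploadSize", NS)
    if element is None or not (element.text or "").strip():
        return None
    return int(element.text.strip())


def may_deposit(file_size, service_xml):
    """True if the file fits the advertised limit, or if no limit is given."""
    limit = max_upload_size(service_xml)
    return limit is None or file_size <= limit
```

A client could call may_deposit before sending anything and report the limit to the user, rather than waiting for the server to reject an oversize upload.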

Improved method for error reporting

In section 3.6 I have outlined the mechanism that we have adopted for extending error reporting. I think requiring the use of an Atom entry return instead of a weakly specified recommendation to include a human-readable message is a distinct improvement and would be even more powerful if adopted within the SWORD specification. It actually simplifies client code to use Atom, because the processing of responses can use standard parsing libraries.

The use of an arxiv:errorcode is clearly too implementation specific. If this method were adopted in the SWORD specification then it would make sense to have something like sword:errorcode and perhaps the best way for a restricted and extensible vocabulary would be to use URIs for the error codes, and have the URI as an attribute, as is customary, instead of relaying it in element content. There could then be a set of standard SWORD error codes (including the ones already defined), for example:

<sword:errorcode href="http://purl.org/net/sword/error/ChecksumMismatch" />

and particular implementations would be free to use errors not in the SWORD namespace (and thus easily recognized as non-standard), e.g.:

<sword:errorcode href="http://arxiv.org/schemas/sword/error/NoPrimaryCategory" />

If this were adopted then the HTTP header extension X-Error-Code could sensibly be dropped. (from arXiv case study)
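To show how little client code this would need, here is a minimal Python sketch that parses an Atom entry error response with a standard XML library and decides whether the error code is a standard SWORD one. The sword:errorcode element is the proposal from the text; the namespace URI, the title and summary contents, and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SWORD_NS = "http://purl.org/net/sword/"  # assumed namespace URI
STANDARD_PREFIX = "http://purl.org/net/sword/error/"

# Illustrative response only; the title and summary text are invented.
ERROR_RESPONSE = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sword="http://purl.org/net/sword/">
  <title>Deposit rejected</title>
  <summary>The submission has no primary category.</summary>
  <sword:errorcode href="http://arxiv.org/schemas/sword/error/NoPrimaryCategory" />
</entry>
"""


def parse_error(entry_xml):
    """Extract the error URI and human-readable message from an Atom entry."""
    root = ET.fromstring(entry_xml)
    code = root.find(f"{{{SWORD_NS}}}errorcode")
    href = code.get("href") if code is not None else None
    return {
        "code": href,
        # Codes outside the SWORD error namespace are implementation specific.
        "standard": href is not None and href.startswith(STANDARD_PREFIX),
        "message": root.findtext(f"{{{ATOM_NS}}}summary"),
    }
```

Because the code is a URI in an attribute, a client can branch on the standard codes it knows and fall back to showing the summary text for anything else.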

Other

Issues with location of the installation?

Issues with error codes?

Developmental directions for SWORD

Workflow

Improved integration with more complex workflows

I imagine that there are many repositories that have workflows at least as complex as arXiv's (see #Workflow). Some way for the submitting client to indicate a callback mechanism better than email may well be broadly appropriate. Specification of such a mechanism would first require an overview of the methods that would work with a range of repository systems. (from arXiv case study)

While the scope of SWORD is understandably limited to defining a deposit API, the reality is that developers adopting it will want and need to use it within their business workflows. This would include deposits of temporary working documents and behind-the-scenes automated document archiving flows. It would seem useful to conduct one or two small-scale workflow projects where SWORD is used throughout various phases of the workflow, or for more advanced deposits (for example, are structured rather than textual responses to a deposit likely to be a future requirement?). Alternatively, external projects could be identified and then asked to report back on their experience in the form of an implementation report to inform later phases of SWORD. While external projects may not complete within the SWORD 2 project, the identification of projects and a commitment by SWORD 2 to collate and summarise the outcomes in the form of recommendations for future work may be a satisfactory outcome. (Scott Yeadon)

Enabling workflow ... for example, it might be able to expose a "status" interface for every item in an archive (Richard Jones)

The Deposit

Taking SWORD "beyond packages"

The development of the OAI-ORE (Open Archives Initiative - Object Re-use and Exchange) complex object description standard raises some interesting questions for deposit. For example, ORE could be used to allow incremental construction of a complex object on the server side of a deposit protocol. This scenario has a number of benefits and potential use cases; as trivial examples:

  • Large complex objects could be built in multiple sessions
  • Per-file validation could feed back immediately, rather than waiting until the package is complete

Most of the SWORD constraints and extensions would be useful in this scenario. It would be interesting to see it investigated further. (SPECTRa case study)

Testing and integration with OAI-ORE more generally

Depositing non-binary objects (Paul Hart) ... such as URIs to externally referenced content. This might be enabled by allowing the deposit of ATOM documents.

How can we deal with package descriptions better? At the moment, the free text description, which is effectively a mime-type, is already causing problems now that I'm also trying to ingest other package formats. (Richard Jones)

Fine-grained deposit ... like being able to add a single file to an object, or even manipulating metadata (although we don't want to just be DAV) (Richard Jones)

Accept: more capacity to describe what sorts of things the deposit targets can accept. So, a DSpace specific example would be that some deposit targets can only accept collections of objects rather than individual objects, whereas some other targets will /not/ take collections. I'm not sure how you'd actually go about expressing this. (Richard Jones)

Recommending formats

I'd like for SWORD to step up to actually recommending that implementors support a specific deposit format (and still using compliance levels to indicate to what degree the deposit format is supported). I'm not picky about the format itself, although in keeping with the spirit of SWORD/APP, etc., nothing too complicated. (Eddie Shin, Fedora)

Create a process definition for registering package formats

Currently SWORD-supported packages are not well-defined, which limits the potential for interoperability via the API. As an example, in the API documentation METS is an allowable package format. However, METS as a format description is not very helpful, as this description does not allow workflow systems or repositories to make pre-processing decisions without examining the payload of the request. It could be expected, for example, that a workflow or repository application would make different routing/processing decisions if it knew that the object being deposited was a journal marked up to a particular METS profile. Given that the header information could currently support this, it would seem prudent to narrow down an initial list of defined supported formats (for example, rather than supporting a generic "METS" package format, define package format values for all METS profiles registered on the Library of Congress Web Site). This will help ensure developers first examine the list of supported package formats before resorting to their local extensions when developing SWORD clients and servers.

In addition to this, a process for registering package formats should be defined. This would allow new package formats to be added to the "SWORD supported" list. To what level this would need to be mediated or whether only open formats registered elsewhere would be allowed would need to be examined.

The challenge with this process is that it needs to be able to live on after completion of the SWORD 2 project. (Scott Yeadon)
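The routing argument above can be sketched in a few lines of Python. This is only an illustration: the format identifiers, the queue names and the route_deposit function are all hypothetical, and no actual registry of package format values is implied.

```python
# Hypothetical registry mapping a declared package format to a processing
# queue. Real values would come from whatever registration process is agreed.
REGISTERED_FORMATS = {
    "http://example.org/sword/packaging/mets-journal-profile": "journal-ingest",
    "http://example.org/sword/packaging/mets-thesis-profile": "thesis-ingest",
}


def route_deposit(package_format):
    """Choose a queue from the declared format alone, without reading the payload."""
    try:
        return REGISTERED_FORMATS[package_format]
    except KeyError:
        # A bare "METS" value ends up here: it says too little to route on.
        raise ValueError(f"unregistered package format: {package_format!r}")
```

With profile-level identifiers the server can route on the request header value; with a generic value such as "METS" it has to open the package before it can decide anything.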

Extending to full APP implementation

  • Update (Claire Knowles/Robin Taylor, Antony Corfield)
  • Delete (Claire Knowles/Robin Taylor, Antony Corfield)
  • Categories (Antony Corfield)
  • Accepting ATOM documents

I'm not sure that Retrieve is the business of a deposit tool, and I'm not 100% sure that it should be able to do Delete either. So that's a kind of "things I don't think SWORD 2 should worry about". Update, on the other hand, could be useful. (Richard Jones)

The Service Document

I'd like to address some of the scalability issues with service documents by introducing hierarchies of service documents (i.e. so that you can request of a deposit target any other deposit targets underneath it). (Richard Jones)
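One way a client might walk such a hierarchy is sketched below in Python. It assumes each collection can carry a child element pointing at a further service document; the sword:service element name, the namespace URIs, the example.org URLs and the in-memory DOCUMENTS store (standing in for HTTP requests) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",
    "sword": "http://purl.org/net/sword/",  # assumed namespace URI
}

ROOT_URL = "http://example.org/servicedocument"
CHILD_URL = "http://example.org/servicedocument/community-1"

# Hypothetical service documents, keyed by URL, standing in for HTTP fetches.
DOCUMENTS = {
    ROOT_URL: """\
<app:service xmlns:app="http://www.w3.org/2007/app"
             xmlns:sword="http://purl.org/net/sword/">
  <app:workspace>
    <app:collection href="http://example.org/deposit/community-1">
      <sword:service>http://example.org/servicedocument/community-1</sword:service>
    </app:collection>
    <app:collection href="http://example.org/deposit/collection-9" />
  </app:workspace>
</app:service>
""",
    CHILD_URL: """\
<app:service xmlns:app="http://www.w3.org/2007/app">
  <app:workspace>
    <app:collection href="http://example.org/deposit/collection-1" />
    <app:collection href="http://example.org/deposit/collection-2" />
  </app:workspace>
</app:service>
""",
}


def collect_targets(url, fetch, seen=None):
    """Depth-first walk returning every deposit target reachable from url."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against service documents that point at each other
        return []
    seen.add(url)
    root = ET.fromstring(fetch(url))
    targets = []
    for collection in root.iterfind(".//app:collection", NS):
        targets.append(collection.get("href"))
        child = collection.findtext("sword:service", namespaces=NS)
        if child:
            targets.extend(collect_targets(child.strip(), fetch, seen))
    return targets
```

A client would normally fetch only the top-level document and descend on demand, for example when a user expands a community, so that no single service document has to list every deposit target in the repository.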

The ATOM response

The process of ingesting an item into an archive can produce new complex objects /inside/ the archive. I'd like a way for the SWORD response to be able to describe the object that has just been created. (Richard Jones)

Other

Creating collections

Investigating OpenAuth and OpenID as a replacement for mediated deposit.

Other suggestions

  • Ongoing maintenance of SWORD libraries
  • The example classes provided with the SWORD Java common library are of more use in production environments than the authors anticipated. If possible, effort should be expended in making the tools more robust and putting a sustainability mechanism in place.
  • Additional code libraries (.NET / PHP etc.)
  • Support model for installations and demos
  • arXiv integration