JISC Information Environment

Content Submission

JISC IE UKOLN

1. Introduction

This document specifies the key standards and protocols that support the submission of digital content to repositories within the JISC Information Environment (JISC IE). The JISC IE technical architecture specifies a set of standards and protocols that support the development and delivery of an integrated set of networked services that allow the end-user to discover, access, use and publish digital and physical resources as part of their learning and research activities.

This document is intended to be read by:

For the purposes of this document, 'repository' refers to any service, whether offered within an institution or nationally, that provides a managed store of digital content (e.g. eprint or other document archives, image databases, moving image databases, datasets, learning object repositories, metadata databases, etc.) with the aim of disclosing that content to a wide audience. The intention is to give some guidance to service providers about which standards and protocols should be used to support the submission of content to their digital repositories and to end-users about the ways in which they should expect to have to package their content in order to deposit it with those repositories. Other parts of the JISC IE technical architecture specify the standards and protocols for exposing content and metadata from within the repositories.

This document only considers the technical standards required to support the deposit of digital content with repositories. It does not consider work-flow issues - for example, it does not consider whether or how repository service providers should provide editorial control over the content being submitted to their repository. Nor does it consider policy issues such as whether content should be deposited in multiple repositories, the impact of depositing content on the ability to 'publish' that content in other forms, or issues associated with preservation and management of deposited content.

Finally, this document does not consider the deposit of physical resources. However, the deposit of (digital) metadata about physical resources is covered by the standards and protocols discussed here.

2. Scenarios

This section provides some hypothetical scenarios of how content submission might take place within the UK higher and further education community.

2.1 Researcher depositing publication with institutional e-print archive

A researcher prepares a paper describing her recent work and submits it for publication to one of the key academic journals in her subject area. Following discussions with her local subject librarian and having checked the publisher's policy on the deposit of pre-print versions of publications, the researcher decides to deposit a copy of the pre-print (unrefereed) version of her paper in her institution's e-print archive.

The researcher visits the home page of her institutional e-print archive and logs in using her normal campus username and password. She clicks on the button to start depositing a new publication and fills in a series of Web forms, providing two copies of the pre-print, the original MS-Word document and a PDF version derived from it, and a fairly detailed set of metadata about the publication. As part of this process, the server-side scripts handling the submission call three external Web services in order to automatically assign several Dewey subject terms to the pre-print, to validate the supplied author names against a name-authority list and to generate OpenURLs from the list of references at the end of the pre-print. These Web services are offered by OCLC, the British Library and the University of Southampton (the first being made available to all members of the institution through a site licence).

Some months later, following the peer-review process and publication in the academic journal, she returns to the e-print archive in order to deposit the PDF post-print version of her paper (again, having first checked the publisher's rules about deposit of post-prints). A series of Web forms ask for metadata about the post-print and allow for uploading the PDF version (as before). The deposited post-print is flagged as being 'campus-only' for the next 12 months (in accordance with the publisher's rules about deposit of post-prints) and is linked to the previously deposited pre-print. The researcher also provides the URL of the publisher's online version of the paper (which happens, in this case, to be available on a site-licence basis to many other UK institutions).

All interactions with the institutional e-print archive use the researcher's Web browser and are based on the use of HTTP GET and POST requests (Web forms). The interactions between the server-side scripts that support the deposit process and the Web services are based on SOAP.

2.2 Learning technologist depositing learning object with national learning object repository

A learning technologist has been working closely with a lecturer in his department to develop a short video and associated Web pages covering part of a course module. Within their institution, the video and Web pages are delivered using the Blackboard virtual learning environment. The learning technologist and lecturer agree to make their resource freely available to other UK higher and further education institutions via a national learning object repository.

Using an IMS content packaging tool, the learning technologist packages up the video and Web pages along with some IEEE LOM metadata describing the resource. The chosen learning object repository supports three submission methods: Web forms (much like the scenario above), FTP and email. Having previously registered to provide content to the repository, using his personal Athens account, the learning technologist decides to submit the IMS package using email. He uses his normal email client to send the package as a MIME attachment to the repository's submission email address. A few minutes later he receives a response indicating that one of the IEEE LOM metadata fields mandated by the repository is missing and that he will need to re-submit a revised version.

The learning technologist uses the IMS content packaging tool to add the required metadata field to the IMS package, this time remembering to click on the 'validate' button in the tool, which uses an external validation Web service, and re-submits it using email. A few minutes later he receives an email confirmation request (to confirm that the email submission wasn't forged). Clicking on the 'confirm' link in the email results in a Web page indicating that the learning object has been successfully deposited with the repository and will be visible in search/browse results the next day.
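The email submission route in this scenario can be sketched with Python's standard `email` library. This is a minimal illustration only: the addresses, subject line, file name and placeholder package bytes are all hypothetical, and a real repository would publish its own submission address and conventions.

```python
from email.message import EmailMessage


def build_submission_email(sender, repository_address, package_bytes, package_name):
    """Wrap a content package (e.g. an IMS package ZIP) as a MIME attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = repository_address
    msg["Subject"] = f"Learning object submission: {package_name}"
    msg.set_content("Please find the attached content package for deposit.")
    # Attaching bytes turns the message into multipart/mixed with one attachment.
    msg.add_attachment(
        package_bytes,
        maintype="application",
        subtype="zip",
        filename=package_name,
    )
    return msg


# Hypothetical values, for illustration only.
message = build_submission_email(
    "technologist@example.ac.uk",
    "deposit@repository.example.org",
    b"placeholder package bytes",
    "module-video.zip",
)
attachments = list(message.iter_attachments())
```

Sending the message would be a matter of passing it to `smtplib.SMTP(...).send_message(message)`; that step is omitted here because it depends on the local mail server.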

2.3 Content owner depositing digitised resources with a university library and a national 'arts' repository

On her retirement, a well-known photographer deposits a small collection of locally significant photographs, taken in and around Basingstoke during the 1950s, with the University of Reading.

The University library receives a small grant to digitise the collection and, with the permission of the owner, proceeds to create a high-quality digital version of the collection, using the DCMI Libraries Application Profile as the basis of the metadata schema. The collection is managed and made available through the University Web site, but a copy is also deposited with a nationally-funded 'arts' repository. The deposit process requires that each digitised image is bundled together with its metadata in the form of a METS package, before being dragged and dropped into the repository via its WebDAV interface. The national repository requires that both DC and LOM metadata are deposited, so the tool used to create the METS package calls a 'DC to LOM' conversion Web service offered freely on the Dublin Core Web site to automatically generate the LOM version.
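To make the idea of a METS package more concrete, the following sketch builds a heavily simplified METS document that wraps simple DC metadata and points at one image file. It is an illustration under stated assumptions, not a description of any particular tool: the file name, field values and ID values are hypothetical, only the DC metadata is included (not the LOM version), and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(image_filename, dc_fields):
    """Return a minimal METS document (as a string) for one image and its DC metadata."""
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata section wrapping the DC elements.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "dmd1"})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    for name, value in dc_fields.items():
        ET.SubElement(xml_data, f"{{{DC_NS}}}{name}").text = value

    # File section pointing at the digitised image.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    file_el = ET.SubElement(group, f"{{{METS_NS}}}file", {"ID": "file1"})
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": image_filename},
    )

    # Structural map tying the file to its metadata.
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS_NS}}}div", {"DMDID": "dmd1"})
    ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": "file1"})

    return ET.tostring(mets, encoding="unicode")


# Hypothetical file name and metadata values.
mets_xml = build_mets(
    "basingstoke-001.tif",
    {"title": "Market Place, Basingstoke", "date": "1955"},
)
```

A WebDAV deposit of the resulting package is, at the protocol level, an HTTP PUT to a URL within the repository's WebDAV space; that step is not shown because the target URL and credentials are specific to the repository.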

2.4 Researcher depositing experimental data from laboratory equipment to institutional data archive

A researcher working in the area of structural genomics uses X-Ray experimentation to develop 3-D structure data about a particular nucleic acid. A PC-based desktop tool is used to manage the experimental data locally and to interface to the X-Ray equipment.

The researcher decides to deposit the resulting structure data to his institutional repository. He selects the deposit 'button' on the desktop tool and enters the relevant metadata about the structure. The tool is configured to use two external, nationally-hosted SOAP-based Web services: the first converts the data to 'mmCIF' format; the second validates the format consistency of the supplied coordinates and generates a validation report about the structure. The report is presented back to the researcher on the desktop. Having confirmed acceptance of the validation report, the tool uses the SOAP-based deposit interface to the institutional repository to deposit a copy of the validated structure and its metadata.

3. Terminology

The following parties are involved in content submission:

content submitter
The content submitter has one or more items of content to make available, i.e. deposit.
repository
A repository is a managed place where content can be deposited and made available.

Content submission is not always a manual process. Examples of automated content submitters (i.e. agents) include Web-cams and laboratory devices.

In some cases the content submitter is also the content owner or creator; in other cases the owner or creator may hand over responsibility for submission to a third-party content submitter (possibly in the form of a submission service). For example, a researcher may ask a librarian to deposit a copy of a research paper in an e-print archive.

An institutional repository is a repository that is managed within a UK higher or further education institution and where the content submitter is typically expected to be a member of that institution. Examples include an institutional e-Prints archive, an institutional learning object repository or some other kind of content management system.

A non-institutional repository is a repository that is managed by an organisation outside of the UK higher and further education community and/or where submission is not typically restricted to members of an institution. Examples include national services such as JORUM, the NLN and the AHDS, repositories maintained by scholarly publishers and scientific consortia, and subject-based e-Prints archives.

It is also possible for a content submitter to manage their own personal repository.

In many cases content is submitted directly from the content submitter to a repository. However, in some cases, there may be a direct exchange of content between two repositories.

Note that (somewhat confusingly) in the terminology of the JISC Information Environment technical architecture, a 'repository' is a 'content provider'.

4. Metadata

At the point of content submission, metadata is likely to be required to be submitted along with the content item(s). This document makes no firm recommendations about what metadata schemas should be used for this purpose. However, it is anticipated that a metadata record conforming to some profile of simple Dublin Core (DC) should be the minimum requirement.
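As an illustration of what checking a record against 'some profile of simple Dublin Core' might involve, the sketch below tests a metadata record against the fifteen simple DC elements plus a set of mandatory elements. The mandatory set shown here is purely an assumption made for the example, since this document does not recommend a particular profile; the function name and sample values are also hypothetical.

```python
# The fifteen elements of simple (unqualified) Dublin Core.
SIMPLE_DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical profile: which elements a repository mandates is an assumption.
REQUIRED_ELEMENTS = {"title", "creator", "identifier"}


def check_record(record):
    """Return a list of problems; an empty list means the record passes.

    `record` maps element names to lists of values, since every simple DC
    element is optional and repeatable.
    """
    problems = []
    for name in sorted(set(record) - SIMPLE_DC_ELEMENTS):
        problems.append(f"unknown element: {name}")
    for name in sorted(REQUIRED_ELEMENTS):
        if not record.get(name):
            problems.append(f"missing required element: {name}")
    return problems


# Hypothetical record lacking an identifier.
record = {
    "title": ["Photographs of Basingstoke"],
    "creator": ["A. Photographer"],
    "date": ["1955"],
}
problems = check_record(record)
```

A repository could return such a list of problems to the content submitter at the point of deposit, in the manner of the rejected email submission in scenario 2.2.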

In the case of the deposit of a learning object to a learning object repository, submitted metadata should conform to the IEEE LOM metadata specification.

As is always the case with metadata, agreements about what metadata schema to use are only part of the solution. Cataloguing guidelines also need to be considered covering, for example, the use of subject classification schemes, audience-level and resource-type terminology. This document makes no recommendations about appropriate cataloguing guidelines.

5. Content (data) formats

Prior to submission there need to be agreements in place about the content formats that can be submitted to any particular repository. For example, a repository might mandate that all images are to be submitted using the PNG format.

This document makes no recommendation about what content formats are appropriate for particular application environments. The JISC Standards and Guidelines to Build a National Resource document lists many of the appropriate standards (but note that this document is currently under revision).

6. Content packaging

Additionally, there need to be agreements about how multiple content items are packaged together and about how metadata about those items is packaged.

There are three metadata packaging options:

embedded
The metadata about a content item is embedded into the item. Not all content item formats support embedded metadata. Examples of those that do include XML, HTML, PDF and PNG. MS-Office document formats provide limited support for embedded metadata.
packaged
The metadata about a content item is packaged with the item in some way, for example by grouping the content and metadata files in a ZIP archive, or encoding them in a single XML document.
coupled
The metadata and content item are submitted separately (typically as part of the same submission process), but the content item is identified in such a way that the metadata can refer to it explicitly, for example by assigning a DOI to the content item and using that to identify the item in the metadata.
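The 'packaged' option can be sketched as follows, grouping one content file and its metadata file in a ZIP archive held in memory. The file names (including the choice of `metadata.xml`) and the sample content are assumptions made for illustration; packaging specifications define their own conventions for naming and manifest files.

```python
import io
import zipfile


def package_item(content_name, content_bytes, metadata_xml):
    """Return the bytes of a ZIP archive holding a content item and its metadata."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(content_name, content_bytes)
        # 'metadata.xml' is a hypothetical file name chosen for this sketch.
        archive.writestr("metadata.xml", metadata_xml)
    return buffer.getvalue()


# Hypothetical content and metadata.
package = package_item(
    "preprint.pdf",
    b"%PDF-1.4 placeholder",
    "<metadata><title>Example pre-print</title></metadata>",
)

# Reading the package back, as a repository might on receipt.
with zipfile.ZipFile(io.BytesIO(package)) as received:
    names = received.namelist()
    metadata = received.read("metadata.xml").decode("utf-8")
```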

Note that many resources will comprise multiple content items (e.g. text and images). In some formats, these will be embedded (e.g. MS-Word), in others they won't (e.g. HTML).

Where multiple content items and their associated metadata are packaged together, packaging should conform to one of the following:

7. Transport

Content (and packages of content and/or metadata) needs to be passed across the network from the content submitter to the repository (or between repositories). This should be achieved using one of the following mechanisms:

Note that submission may be initiated manually or automatically and may take the form of a push (content submitter to repository) or a pull (repository from content submitter). Push may be implemented in the form of a 'regular pull' (a la RSS).
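The 'regular pull' pattern can be sketched as a repository that periodically reads a feed published by the content submitter and fetches only the items it has not yet harvested. The feed content, URLs and function name below are hypothetical, and the sketch works on a feed held in a string rather than making any network requests.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed published by a content submitter.
FEED = """<rss version="2.0">
  <channel>
    <title>Items offered for deposit</title>
    <link>http://submitter.example.org/</link>
    <description>Hypothetical feed of items available for harvesting</description>
    <item><title>Paper A</title><link>http://submitter.example.org/a.pdf</link></item>
    <item><title>Paper B</title><link>http://submitter.example.org/b.pdf</link></item>
  </channel>
</rss>"""


def items_to_pull(feed_xml, already_harvested):
    """Return the links in the feed that the repository has not yet harvested."""
    root = ET.fromstring(feed_xml)
    links = [item.findtext("link") for item in root.iter("item")]
    return [link for link in links if link and link not in already_harvested]


# The repository has already pulled Paper A on an earlier visit.
harvested = {"http://submitter.example.org/a.pdf"}
to_fetch = items_to_pull(FEED, harvested)
```

From the submitter's point of view, adding an item to the feed has the effect of a push, even though every transfer is initiated by the repository.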

8. Authentication/Authorisation

There will need to be appropriate authentication at the interfaces to a repository to ensure that only authorised content submitters are able to deposit and update submitted items.

For institutional repositories it is anticipated that authentication will be based on the same mechanisms used for other institutional services, e.g. access to email. For national repositories, authentication mechanisms should use Athens (or appropriate agreed future authentication standards).

9. Licensing, rights and certification

There needs to be some mechanism in place to allow content creators/owners to license repositories to make their content available. Currently this is either done manually before content submission takes place (e.g. AHDS) or implicitly/not at all (e.g. University Web site). In many cases, it will also be necessary to attach an explicit end-user licence to the content as it is served from the repository. Solutions in this area should be based on one of the following 'rights markup' languages:

and/or make use of the Creative Commons licences (which allow a content owner to easily attach a 'some rights reserved' licence to their resources).

These approaches allow content owners, submitters and repositories to encode rights statements about the resources being submitted, in such a way that other software applications can parse and take action based on them.

The JISC-funded RoMEO project investigated the rights issues associated with institutional e-print archives. The Australian COLIS project is using an application profile based on IEEE LOM and ODRL to include statements about the rights associated with learning objects in learning object repositories, in such a way that they can be acted upon at the point that a learning object is delivered through a VLE.

10. Provenance

There may be a requirement to attempt to certify which person or organisation created a resource available through a repository. The use of digital signatures for this purpose needs to be investigated.