The role of 'self-archiving' and institutional repositories in scholarly communication

First published in: Piglet [staff newsletter of the University of Bath Library and Learning Centre], Vol. 8, No. 12, December 2003.


In my last piece, I outlined some recent events in the development of open-access journals and speculated on the effects that these might have on scholarly communication. In that piece, I mentioned that while open-access publishing had many supporters, some thought that the massive amount of publicity it had recently received had drawn attention away from the 'real and immediate' gains for open access that could be achieved by the 'self-archiving' of papers already published in so-called 'toll-access' journals. This article introduces 'self-archiving' and the related concept of institutional repositories.

The idea of 'self-archiving' has been around for a few years now. While initially it was sometimes seen as a possible alternative to publishing in peer-reviewed journals, its main proponents now argue that it should be seen primarily as a supplementary activity, focused on increasing the research impact of authors. 'Self-archiving' is based on the relatively simple idea that authors, in addition to publishing in peer-reviewed journals, should make papers publicly accessible by depositing (or archiving) them in e-print repositories. The deposit of papers (and their metadata) in repositories - rather than just their uploading to a personal or departmental Web page - is an important part of this model because it facilitates the development of third-party services that can give enhanced access to the content of multiple repositories. This process has been supported by the development of standards like the Open Archives Initiative Protocol for Metadata Harvesting, a protocol that can be used by repositories for sharing metadata with such services.
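As a rough illustration of how a third-party service might use the protocol, the sketch below builds an OAI-PMH 'ListRecords' request and extracts a few Dublin Core fields from a response. The verb, the 'oai_dc' metadata prefix and the XML namespaces come from the protocol itself; the repository address (repository.example.org), the function names and the sample record are invented for illustration, and a real harvester would also need to fetch the response over HTTP and follow resumption tokens.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL (base_url is a hypothetical repository)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# An invented response, trimmed to one record, standing in for what a
# repository would return; no network access is needed to run this sketch.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:123</identifier>
        <datestamp>2003-11-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented e-print title</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:creator>Author, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return a list of dicts with identifier, title and creators per record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai", from_date="2003-01-01"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

Because every compliant repository answers the same requests in the same format, a service of this kind can harvest metadata from many repositories with one piece of code, which is the practical advantage of deposit in a repository over upload to a personal Web page.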

The most commonly cited exemplar of an e-print repository is the physics, mathematics and computer science repository operated by Cornell University. This was originally set up at the Los Alamos National Laboratory in the early 1990s to facilitate the prepublication communication requirements of high-energy physicists and is now one of the largest e-print repositories, with over 250,000 papers. While many of the most successful e-print repositories remain subject-based, there is now a growing interest in the development of institution-based repositories, defined by Raym Crow as digital collections that capture and preserve the intellectual output of research institutions and universities. By definition, the focus of institutional repositories potentially goes well beyond e-prints - e.g. it has been proposed that they also might contain scientific data, learning resources, even administrative records - but, for now, e-prints are likely to remain an important component of early implementations. Software for supporting the development of repositories is now becoming available. Examples include the DSpace system jointly developed by MIT Libraries and Hewlett-Packard, and the Fedora digital repository management system developed by the University of Virginia Library and Cornell University. In addition, there are now a variety of initiatives designed to investigate, support and promote the creation of institutional e-print repositories. These include the JISC-funded FAIR (Focus on Access to Institutional Resources) Programme and the Dutch DARE (Digital Academic Repositories) initiative.

Despite these important developments, however, the number of repositories and the quantity of e-prints deposited in them remain relatively small. For example, in the summer of 2003, a JISC-funded feasibility and requirements study on the preservation of e-prints attempted to count the number of visible e-prints in the '' domain, finding just over 5,000 e-prints, of which more than half were located in two repositories based at the University of Southampton. While this count may not have taken account of all activities, it is a signal that the take-up of 'self-archiving' by UK academics has (to date) been disappointing. There are many potential reasons for this, e.g. it could be related to a lack of easy-to-use 'self-archiving' tools, but it may also reflect a wider misunderstanding of the cultures of scholarly publication.

One issue, for example, is that of the copyright and licence agreements that authors sign in order to publish. In the past, the proponents of 'self-archiving' encouraged authors to retain the right to deposit their papers in e-print repositories. Now that many publishers have produced policies on this issue, some effort has been expended on their categorisation. A good example of this is the colour-coded list of publisher policies produced by Loughborough University's Project RoMEO, the maintenance of which will soon be taken over by the SHERPA project.

An issue often raised by librarians concerns the responsibility for the preservation of the content of e-print repositories. For some proponents of 'self-archiving' this is a non-issue. They insist on maintaining a distinction between 'self-archiving' or 'deposit' and 'publishing.' This means that they can argue that e-print archives are an essentially supplemental activity and that there is no real need for the long-term preservation of e-prints - although they almost always simultaneously highlight the 'longevity' of the service (about 12 years). They argue instead that those who raise the preservation issue should turn their attention to the preservation of the 'canonical' versions, stored in publishers' databases. In the longer term, however, these distinctions may become untenable. It is interesting, therefore, that the National Library of the Netherlands has proposed to take responsibility for the long-term preservation of e-prints in the context of the DARE initiative.

So what issues does 'self-archiving' raise for academic libraries? Libraries have often been in the forefront of the open-access movement, believing that it may help offset the negative effects of journal cancellations on current users and - in the longer term - may help to solve the serials pricing crisis. More generally, however, the 'self-archiving' of e-prints represents both a threat and an opportunity for libraries. The opportunity is the need for library leadership of (or at least involvement in) the development of institutional repositories within universities. However, their successful development is going to need additional infrastructure, staff expertise and finance. Even if open-access principles do in the longer term help drive down the costs of journal subscriptions or licences, the likely co-existence of two publishing models will almost certainly mean additional costs in the shorter term. The opposite risk is that repository development occurs outside the library context, e.g. in university departments or through discipline-based services, leaving the library with a diminishing role in the scholarly communication process. Libraries might, for example, become the custodians of a rapidly dwindling collection of physical and licensed journals, which may be difficult to integrate with the new discovery tools developed and used by academics.

Even now, when the success of 'self-archiving' is far from evident, it is likely that the existence of e-prints is already having some effects on library use. For example, while relatively few e-prints are being deposited in e-print repositories, many authors seem happy to place draft or published versions of papers on personal or departmental home pages. Because these copies can be found through the intelligent use of Internet search services like Google, there may be less use of licensed journals or inter-library loan services.

UKOLN is involved in several projects with relevance to the 'self-archiving' of e-prints and institutional repositories. Most of these concern the use of the OAI protocol. For example, it led the Open Archives Forum project, a focus for European activity on open archives. UKOLN is also the leader of a project called ePrints UK, funded by the JISC as part of the FAIR Programme. The project's primary objective is to develop services (based on the OAI protocol) through which the UK higher and further education communities can access the collective output of e-prints from UK repositories. Anyone who is interested in these projects, or who has comments on them, is invited to look at the UKOLN Metadata Web pages or to contact me directly.

Michael Day, UKOLN, University of Bath