Integrating infrastructure cluster session 2006-03-27 report


Integrating Infrastructure cluster session

27th March 2006

Informal report from the integrating infrastructure cluster session at the Second JISC Digital Repositories Programme Meeting, 27th-28th March 2006, Warwick. Participants in the session are encouraged to add to or comment on this document, either by editing directly or adding comments to the associated talk page.

Neil Jacobs, JISC funding

Neil talked about forthcoming JISC funding for the repositories area, with particular reference to the Integrating infrastructure context. Areas for funding included common standards agreement, technical interoperability, cultural / organisational interoperability, tools and innovations, establishing a ‘Network of experts’ to support institutions, creating an interim repository and developing a national search infrastructure for eprint repositories. A Roadmap to facilitate discussion of where repositories are going has been created by Andy Powell and Rachel Heery and will be used to inform future JISC repositories funding and development. This will be available soon.

Alma Swan, Linking Repositories scoping study

Alma Swan gave a presentation on this study (available here), currently being carried out by Key Perspectives Ltd and the University of Hull, with significant input from SHERPA and the University of Southampton. The final report is due at the end of April 2006 and its purpose is to scope technical and organisational models for establishing a national repository services infrastructure.

The study is looking at user requirements, the roles and responsibilities of repositories and services, technical architecture and business models for offering a viable and sustainable infrastructure. In the remainder of the presentation, Alma talked about findings to date in each of these areas, stressing that information gathering is ongoing and that contributions from attendees at the session were welcome.

In considering users, the study has highlighted that the unsophisticated end user is not the focus for national aggregated repository services, as these users are increasingly ‘googlised’ and satisfied by search engine results. Sophisticated primary users, secondary (intermediary) users, such as librarians and information manipulators, and meta-users, which could include research councils, RAE panels or open access citation metrics analysts, are those for whom aggregated services are provided. Quoting Anderson and Heery’s ‘Repositories Review’, Alma stressed that the delivery of repository services “should be to communities of practice if take-up is to be achieved”.

The technical model recommended by the report will be a harvesting one, based on an earlier model, with some refinements to build in flexibility. In identifying the technical models, the study has looked primarily at open access repositories, but has also considered the heterogeneous repository landscape and a range of non-OA repositories. In identifying services, there must be integration with the e-Framework.

Issues are wide ranging, from the technical (identifiers, packaging), to the organisational (FoI, branding, the role of national libraries), to those that span both (metadata, rights). Dealing with complex and varied content is a key area for study and repository services must consider whether providing access to a range of content is both desirable and viable.

It may be the case that the landscape needs time to mature before we can truly identify what services we wish to offer and build upon repositories. Identifying requirements, gaps and challenges and establishing the business model will drive this process. Issues of cost, scalability and an assessment of associated risks will also need to be considered in any future service development.

In conclusion, Alma stressed that communication is critical to the development of good, viable, sustainable services.

Questions for Alma touched on a number of interesting areas. It was suggested that scenarios and use cases might help to scope the requirements for the repository landscape, yet it is also important to predict the ‘to be’, in addition to identifying the ‘as is’ – Alma confirmed that their study has an element of ‘stargazing’. There was some concern expressed over the fragmentation of the D2D landscape, particularly if commercial discovery services that already offer a gateway to some repository content aren’t considered. Focussing on ‘simplicity’ for the end-user was seen as being in danger of dumbing-down services and led to the suggestion that ‘well-designed’ services are better than ‘simple’ services.

Summary of discussion

The ensuing discussion ranged across a number of interrelating topics:

(1) What is a repository?

  • There is ongoing typology / ecology / roadmap work in this area to help understand the landscape.
  • Notions of formal, informal, distributed, institutional, subject-based, national etc.
    • Personal web space can be a repository, where metadata exists and an institution has committed to maintaining the host server.
    • There are benefits in an informal approach where the author retains ownership of “my repository”.
      • P2P has potential in this area and, if used with Shibboleth and creative commons, it might be all that is needed at the informal level.
    • There is potential for informal repositories and formal repositories to co-exist, and for informal outputs to ‘slide’ into formal repositories.
    • A repository is defined by the standards used, which are in turn defined by function and context
      • e.g. preservation function
  • A commitment to structured metadata is common across all repositories.

A clear understanding of repository types could be embedded into a …

(2) National / international repositories body

  • Do we need ‘coalition(s)’ of repositories?
    • What would be the scope?
    • A national authority for registering collections, services (integrating with the e-Framework), metadata schemas, roles, responsibilities etc.
    • A national architecture for repository development.
    • Clarifying confusion in the current landscape: the perception of competing services (e.g. institutional, subject-based and national repositories) and the potential for this range of repositories to cause duplicated effort by depositors. There might be an interesting study into multiple deposits of collaborative work.
    • To enable data flow between repositories that doesn’t currently exist.
    • To provide a collective voice on cultural / legal issues.
    • To offer resource, in addition to advice, e.g. hands-on technical support.
    • To provide clarity in the identification of business requirements and critical points for repositories, with reference to the typology/ecology.
  • How would it be implemented?
  • Where would certification fit in, if at all?
  • What already exists?
    • Registries of repositories and service registries are part of this national architecture
    • Forthcoming network of experts

These coalitions grow out of …

(3) Communities

  • How do we identify communities?
    • Communities of Practice, subject communities, functional communities
    • Identified through the typology / ecology and by DRP projects (e.g. CD-LOR).
  • Services could be defined by community needs, in an iterative way
    • These might be depositor needs
    • or user needs
    • or both
  • A repository might belong to multiple communities
    • it might want to interoperate with a range of services
    • and to do that must use a range of standards
    • it is defined by its functions, standards and context
  • How are these community requirements and community standards and services articulated to repositories?
    • through clear agreements
    • and careful repository design
    • through ‘coalitions’

Community needs inform …

(4) Identifying services

  • What other services need layering onto repositories?
    • E.g. RAE submission, personal C.V., group / departmental / faculty bibliographies, overlay journals, preservation services, access to multiple file formats, accessibility, linking data and publication
  • What services would be at an institutional level and what would be layered above that, at a collaborative/aggregated level?
    • e.g. preservation services, such as Sherpa DP or PRESERV
    • Economies of scale – a service might not be economic for an institution, might be better provided as an aggregated service.
  • How do we ensure services are ‘well-designed’ and responsive to requirements?
  • How do we promote these and avoid barriers?
  • Repository administrators and academics might not yet know what services they need; ‘high-level’ enthusiasm must guide this, coupled with a vision of what will be needed in the future.
  • Gathering user requirements (use cases, scenarios etc) might help in identifying services.
  • This is an iterative process.

All of the above helps with …

(5) Effecting culture change and changing the notion of publishing

  • At a cultural, rather than technical level
  • Through improving workflows, seeing how papers evolve from idea, through collaboration and versioning, to dissemination.
  • By removing barriers.
  • By changing the culture of ‘competition’ to one of ‘sharing’.
  • By creating incentives for depositors; added-value.

This discussion was summarised on the accompanying presentation (Integrating infrastructure.ppt), presented to the plenary session by Neil Jacobs.