Programme meeting 10-10-2005 summary
Joint Programme Meeting : Digital Preservation & Asset Management and Digital Repositories
Summary of the meeting : draft
NB: A full agenda, with links to some presentations from the day, is available from the JISC web site
Welcome & opening plenary
After an introduction from Rachel Bruce, Cliff Lynch opened the meeting with a thought-provoking plenary speech. His words brought focus to many questions and issues relevant to the global scope of repositories and digital preservation. These issues were further debated in the ensuing discussions and workshops.
Repository deployment is 25% and rising in UK HE Institutions. In the Netherlands it is 100%, but US deployment is hard to scope.
Storage should be distributed, robust and with replication. Data loss and the risk thereof should be a thing of the past. Centralisation is the enemy of preservation; in the same way that central decision-making is bad for culture, scholarship and science. Repositories can act as a vehicle for preservation, but how do repositories deliver distributed autonomous management of content? And how do collaborating repositories manage their curation responsibilities? How is curation reflected in metadata? Where replication is carried out, format migration by one repository can lead to forked curation and must be documented in the metadata.
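The point about forked curation can be illustrated with a minimal sketch. It assumes a hypothetical record structure (`ReplicaRecord`, `CurationEvent`) and hypothetical repository names (`repo-a`, `repo-b`); none of these come from the meeting or from any particular repository platform. Each replica carries a checksum and an append-only list of curation events, so a format migration carried out by one repository is both detectable (the checksums diverge) and documented (the event is in the metadata).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CurationEvent:
    """One documented curation action (hypothetical structure)."""
    repository: str
    action: str   # e.g. "ingest" or "format-migration"
    detail: str


@dataclass
class ReplicaRecord:
    """A copy of an object held by one repository, with its provenance."""
    repository: str
    content: bytes
    file_format: str
    events: list = field(default_factory=list)

    @property
    def checksum(self) -> str:
        # Fixity value computed from the bytes currently held.
        return hashlib.sha256(self.content).hexdigest()

    def migrate(self, new_content: bytes, new_format: str) -> None:
        """Replace the content and record the migration in the metadata."""
        self.events.append(CurationEvent(
            self.repository, "format-migration",
            f"{self.file_format} -> {new_format}"))
        self.content = new_content
        self.file_format = new_format


def replicate(source: ReplicaRecord, target_repository: str) -> ReplicaRecord:
    """Copy an object to another repository, carrying its history along."""
    copy = ReplicaRecord(target_repository, source.content,
                         source.file_format, list(source.events))
    copy.events.append(CurationEvent(
        target_repository, "ingest", f"replicated from {source.repository}"))
    return copy


def has_forked(a: ReplicaRecord, b: ReplicaRecord) -> bool:
    """Two replicas have forked when their bytes no longer match."""
    return a.checksum != b.checksum


# Illustrative data only: one repository replicates, then migrates its copy.
original = ReplicaRecord("repo-a", b"report text", "application/msword")
replica = replicate(original, "repo-b")
replica.migrate(b"report text as PDF", "application/pdf")

print(has_forked(original, replica))
print([event.action for event in replica.events])
```

A checksum comparison only shows that the copies differ; it is the event list that explains why, which is the role the plenary assigned to metadata.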
Deployment is one thing, but adoption is quite another. The process of submission should be simplified, integrated and should demand minimal metadata from contributors. To add value, submission could trigger other things, e.g. copyright and licences, submissions to other repositories, submission to journal publishers and so on.
Authored texts and datasets
Few national data archives exist in the US, particularly for the humanities. Therefore, much research data will be submitted to institutional repositories. In the UK, there is a mixture of deposit to national centres and local repositories. Cross-referencing and linkage between data and texts is a significant issue and one that the JISC DRP seeks to address.
Tools and metadata creation
Much metadata could and should be automatically generated through machine processes. Descriptive and detailed subject and/or pedagogical metadata cannot be automated.
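As a rough illustration of the split between machine-generated and human-supplied metadata, the sketch below derives technical fields directly from a file and leaves only the title and subject terms to the depositor. The function names (`technical_metadata`, `build_record`) and the field names are assumptions made for this example, not part of any schema discussed at the meeting.

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def technical_metadata(path: str) -> dict:
    """Derive technical metadata from the file itself, with no human input."""
    with open(path, "rb") as handle:
        data = handle.read()
    stat = os.stat(path)
    # The format is guessed from the file extension only, not the content.
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "sha256": hashlib.sha256(data).hexdigest(),
        "format": mime_type or "application/octet-stream",
        "modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc).isoformat(),
    }


def build_record(path: str, title: str, subjects: list) -> dict:
    """Combine machine-generated fields with the few a depositor supplies."""
    record = technical_metadata(path)
    record["title"] = title         # human-supplied
    record["subjects"] = subjects   # human-supplied
    return record


# Illustrative deposit of a small temporary file.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "paper.txt")
    with open(sample, "wb") as handle:
        handle.write(b"hello")
    record = build_record(sample, "An example paper", ["digital preservation"])

print(record["format"], record["size_bytes"])
```

Only two of the seven fields require a person; the subject terms are exactly the kind of descriptive metadata the notes identify as resistant to automation.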
Repository functionality needs to reflect the policies and practices of the faculty and/or institution. Establishing the ownership of data by the institution or the individual is important, but becomes murkier when funding bodies and collaborations are involved. Can the repository make inter-institution collaboration easier? What happens when faculty members move institution? The faculty bibliography is reflected in the institutional repository, as is a personal C.V. Does a new institution populate its own repository with a new staff member's past work, including rights clearance?
Legal issues can also be problematic, but the primary question should be one of scholarly needs - "figure out what is needed, then figure out how to make that legal".
Digital libraries and institutional repositories
How does digital library and repository content differ? Does content move between them? Should there be policies for this?
Telling stories with repositories
How can repositories be used to tell stories - about contributions made by institutions, about disciplines, about individuals? Could learned societies have a role in this? What about synthesis tools, such as Wikipedia?
Themed break-out sessions
Workflows (Chris Rusbridge & Helen Hockx-Yu)
Workflows concern data, tasks and people and are a structured definition of procedures. They combine human and automated tasks.
Repository workflows can be a subset of other workflows and it is important to understand the interactions between them. There need to be case studies or best practice guides in terms of how repository workflows fit into other, sometimes ill-defined, workflows.
Workflows can be different for different audiences and need to be properly documented.
Need common ways of describing workflows, especially ways to specify human workflows. The human element of workflows is the most challenging.
Not everything needs to be preserved and not all repositories can take on the task of preservation. But repositories should have a retention policy and retention assessment built into their workflows. OAIS should be more widely used for analysis of workflows.
We might be able to learn from some records and document management practices: making tacit knowledge of provenance explicit, and the need for an audit function within workflows. For repositories, deposit should be streamlined and embedded into personal and institutional workflows. The flow of information between repositories and other systems is also important.
Roles and interactions between repositories (Paul Ayris & Neil Jacobs)
Embedding repositories and services within institutions / subject communities
What possible relationships are there between different repositories?
- No culture of identifying separate repository services. How are services tied to repositories (intra-/local or extra-/beyond)?
- Institutional policies / cultures often geared to prevent sharing (e.g. p2p architectures)
- Local adoption can be best supported by conditions opposite to those needed for national services.
- e.g., reassurance that this is their space, is safe, won't be critiqued.
- Contrast with preservation / sharing priorities of regional approach.
- How to turn local resources into reusable objects? (teaching is individual, research is collaborative? culture and incentives)
- Departments looking for data management solution, so long as only they can see data? Could this be a tactic to encourage deposit, eventually working toward OA / sharing?
- NERC data centre culture / policy to move from private to shared access (but many projects don't deposit)
- Disciplinary differences - some disciplines really need to share data (genetics) in order to be effective.
- Using repository for RAE submission (IRRA); institution doesn't want this submission to be OA. Authentication required.
- Does RAE skew OA agenda? Need to find a way to make these work together.
- Relation b/w repository and research management system. Role of repository is to gather intellectual output.
- Partial closed access during project / initiative, OA later.
- Objects deposited once, metadata record enhanced over time depending on what the object is used for.
What other entities (e.g., registries, terminology services) need to be in place for repositories to interact with each other, and for them to underpin user-oriented services?
- Metadata: minimum level for a depositing individual, cleaned up and augmented later
- Authority services to help depositors, tied into institutional systems.
- Example of 'low level plumbing', institutions are probably doing this.
- Name authority services? And name resolution. (e.g. OCLC runs PICA name authority service into DSpace, but this mainly covers book authors)
How is versioning of information packages handled across repositories?
- Need to be accurate - different versions of research paper do not differ fundamentally in subject.
- What are disciplinary differences? Can be valid reasons in particular disciplines for going 'back' to previous versions, but also need to identify latest.
- Versions of metadata record(s).
- Sharing teaching / research - when the teaching material is based on research. The value of teaching materials is not in originality.
- Where is the value? In teaching, often not in the content, more in information about the content (eg learning design). Need to look at this from an IPR point of view.
- But is learning design 'metadata' or 'data'? We could invert this and say learning design is data and static content is metadata.
- Versions project (requirements gathering).
- Economics - tradition of getting results out early (working papers), but then need to identify the final version.
- European aspect - adequacy of RoMEO listing wrt international IPR law…
- Teaching: Draft status of collaboratively developed materials.
- Managing expectations - will a collection end up being shareable (still valuable if 'no')
- This is new - learning object model like an industrial production process - importance of version control. Not necessarily appropriate in educational institutions. Surely institutions should be putting money into generating quality metadata, to ensure content can be shared / reused effectively.
- Data: various versioning issues, sub-setting etc.
- 'Public' version and 'private' version
IPR: Repositories can act as reifying force, making things explicit that were hidden.
- Ad hoc, messy arrangements that institutions / scholars work with / depend on.
- Unwritten understandings, not supported by policies - unsustainable in longer term. Do we need a national model policy that institutions might consider adopting?
- Lack of clarity can impede deposit (but is common)
- Southampton linked up with Enterprise dept in institution. Their concern related to publishing. Approach was to ask academics to 'think twice' before giving away their IPR.
- Teaching resources - both third-party IPR and own…
- Also, issue of academic control over the objects they release, and possible tension between that and the 'academic record' / how objects are used / cited.
- (UCL) Institution has a duty of care to ensure that students (staff?) are aware of the IPR value of their work, and how it might be managed
- Public funding, why not OA?
- Partial commercial funding
- Arts and humanities theses lead to monographs - important to future career development, so don't want to share
- RCs obliged to exploit the data they (fund) produce
- IPR - Cliff's pragmatic point - find out what's necessary and then find a way to make it legal.
- "Low-level plumbing" needs to include legal, social, cultural issues.
Licence agreements with supplier organisations not always supportive of sharing / academic uses of data
- e.g. JISC model licence.
Repositories are not the only way academics and teachers share, e.g. use of unstructured web sites
- Metadata repository can point to object elsewhere (eg on personal? website)
Services, standards and specifications (Andy Powell & Lorna Campbell)
- What are the core repository services?
- What are the standards and specifications for delivery of these?
- Is there really any difference between the types of repositories at 'plumbing level'?
- How does this work fit into the wider e-framework?
This session was used to give people a chance to put faces to names and to share information about what their projects were hoping to do. The intention was that this should focus on services, standards and other technical issues but inevitably there were other more general issues talked about and not all the projects represented were technical in nature.
Projects represented in the group were PERX, GRADE, SherpaPlus, Rights and Rewards, CD-LOR and the support team from the Digital Repositories Programme, along with eSpider and DAAT from the Digital Preservation and Asset Management Programme. Individuals included Bill Oliver (e-framework), Phil Barker (CETIS SIGs) and Sayeed Chowdhury (Johns Hopkins University, A Technology Analysis of Repositories and Services project).
In terms of standards and technologies, most of the obvious ones were mentioned: OAI-PMH, SRW/SRU, Z39.50, content packaging standards (e.g. MPEG-21 DIDL), metadata standards and the international standards for metadata and web services within the geography community.
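For readers unfamiliar with OAI-PMH, the following sketch shows the shape of a harvesting request and how record identifiers can be read from a response. The verb `ListIdentifiers`, the `metadataPrefix` and `set` arguments, the `oai_dc` prefix and the XML namespace are defined by the OAI-PMH 2.0 protocol; the base URL, the helper function names and the sample response are invented for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url: str, metadata_prefix: str = "oai_dc",
                         set_spec: Optional[str] = None) -> str:
    """Build a ListIdentifiers request URL for a repository's OAI endpoint."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_identifiers(response_xml: str) -> list:
    """Return the record identifiers found in a ListIdentifiers response."""
    root = ET.fromstring(response_xml)
    return [element.text
            for element in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hand-written sample response, trimmed to the elements used here.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:repository.example.org:123</identifier>
      <datestamp>2005-10-10</datestamp>
    </header>
    <header>
      <identifier>oai:repository.example.org:124</identifier>
      <datestamp>2005-10-11</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

# Hypothetical endpoint; a real harvester would fetch this URL over HTTP.
url = list_identifiers_url("https://repository.example.org/oai")
identifiers = extract_identifiers(SAMPLE_RESPONSE)
print(url)
print(identifiers)
```

A production harvester would also need to follow resumption tokens and handle protocol errors, both of which are omitted here.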
For DP, the PRONOM file format registry was mentioned, as was web archiving.
Other issues raised included terminology issues, the lack of, and need for, subject/discipline information, IPR and digital preservation for metadata.
It was interesting to note that no-one explicitly mentioned authentication/access management or persistent identifiers as being the focus of their attention, although it was agreed these are common services important across all repositories.
Overall, there was general recognition that basic technologies are in place but that issues remain with how to actually deploy them in sensible/useful ways. Many of the technology-related issues around repositories are at the policy/social level rather than in the need to develop new technical standards.
Management and policy (Joe Wilson & Rachel Bruce)
There is a need to sell the benefits of repositories and to manage the change to repositories within budget. Submission can be mandatory but it needs a light-touch policy to achieve buy-in.
Different motivators for different communities, e.g.
- Teaching & Learning: improving practice
- Research: increased dissemination and profile, personal reputation, contribution to discipline group, input into the global scholarly community and preserving digital research and content as an essential part of scholarship
- Vice-Chancellors/stakeholders: quality of teaching and research, attracting students, effective and efficient working practices, and institutional reputation
- Student benefits
Other issues included the need for champions, the presentation of concrete messages to stakeholders and the use of pilots and demonstrators to engage people through example.
Collaborative work and funded research raise IPR issues. Policies are needed for when staff change institutions. Few people understand their institution's IPR policies - need top-down support and guidelines. Are we being too risk averse?
IPR advocacy toolkit - JISC-SURF project.
Workshops and other sessions
Mahendra provided a brief introduction to the Digital Repositories Support team and outlined how their roles would support the work of the projects. Work areas for the support team include: providing training and guidance in scenario writing and use case development; collecting, collating and preparing use cases; providing guidance and collating workflow documentation; offering advice and support to projects; disseminating and synthesising programme outputs to a wider audience within JISC, the e-framework and the wider global community. Dissemination and communication will include a wiki environment where projects can access and submit material.
Sayeed Chowdhury (Johns Hopkins University) and Howard Noble (ASK project, Oxford University)
Sayeed gave a brief introduction to his project, A Technology Analysis of Repositories and Services. The project is gathering scenarios and writing use cases with the intention of developing a set of functional requirements against which existing repository software can be tested. Howard spoke about ASK's involvement with this project and noted that there is also a need to look beyond how tasks are currently done, to how emerging technologies might be exploited, e.g. Flickr.
Digital Preservation, Records Management and Repositories (Helen Hayes)
What can we learn from records management? Use of repositories for management information, e.g. project bids and funding applications (departmental/faculty). FOI and full economic costing.
RDN Resource Discovery and Metadata (Caroline Williams, Andy Powell, Linda Kerr and Mark Williams)
Caroline Williams gave a brief overview of the Resource Discovery Network, whose mission is "to advance education by promoting the best of the Web, through evaluation and collaboration". Andy Powell spoke about the technology and standards behind the RDN, including Dublin Core. The RDN has expertise in subject discipline, metadata creation and classification. Parallel brainstorming sessions discussed metadata and resource discovery:
- Sustainability and cost
- Do we need metadata? Yes, it's important for discovery but it's also overrated; need more of the right kind, but less that is unnecessary. Automated or manual metadata? Need a mixture of both; automatic wherever possible. Where the full text is indexed, manual metadata should add to what can be automatically discovered, rather than duplicate it.
- Different types of metadata: rights metadata; preservation metadata (how the object can be sustained over time); technical metadata (automatically generated); pedagogical metadata.
- Terminology, cataloguing and skills
- What is metadata for? Who is it for? Needs to be contextualised, e.g. different requirements for learning objects, data sets, papers etc. Who creates it?
- Guidelines required to implement metadata standards.
- Discovery should be simple to use but with advanced technology behind.
- Need to structure repositories / collections for users, such as subject browsing
- Repositories should learn from the RDN
- Demand for many ways to browse - specialist browsing needed
Summary & closing remarks
As highlighted by the themed break-out sessions, there are many questions around the topic of digital repositories and digital preservation and, at present, few answers. Discussion in the various sessions and workshops overlapped significantly, bringing to the fore many key common issues. These include, but are not limited to:
- Interoperability and interaction between repositories and related services within the e-framework and internationally. The need for access across different repositories, with seamless authentication, rights management and the use of persistent identifiers.
- Commonality between services for different repository types - common open standards and specifications are available. Unique requirements do exist and need further exploration and scoping.
- Many issues are at the policy, social and cultural level, rather than at the technical 'plumbing level'. These are wide in scope, covering: advocacy, uptake and buy-in; institutional policy and departmental practice; cost and benefits; sustainability; embedded submission and workflows; IPR and digital rights management.
- Metadata creation, terminology control, subject classification and the level of input required from individuals and/or information staff.
- Digital curation and preservation as core functions.
The Digital Repositories programme and the Digital Preservation and Asset Management programme, with their 30-plus projects, have a significant role in finding answers to the global questions about the future of data curation, scholarly communication, open access, shared advocacy and innovation.
Thanks to Andy Powell, Helen Hockx-Yu, Linda Kerr, Neil Jacobs, Helen Hayes, Caroline Williams and Rachel Heery for notes. Julie Allinson, DRP Support, 14-10-05; revised 17-10-05; revised 28-10-05.
--JulieAllinson 08:19, 1 November 2005 (GMT)