Eprints Application Profile Meeting 2006-06-05 Minutes
This page is part of the Eprints Application Profile Wiki.
Scope and terminology
There was some discussion over the definition and scope of ‘eprints’. Suggested definition was ‘scholarly publication in electronic form’ although this might be too narrow. Scope certainly does include ‘grey’ literature, e.g. posters, presentations etc. The scope will be defined by the type list in the application profile. This needs to be flexible and extensible, to cope with potential future expansions to incorporate other types.
ACTION: check RAE scope and definition, e.g. exhibitions (EprintsAP)
ACTION: decide what is in and out of scope tied into the type list (EprintsAP)
ACTION: agree on whether ‘Eprints’ is the right term (EprintsAP)
Issues with using simple DC
Main issues as follows:
- Modelling the relationship between the work, expression and manifestation (in FRBR terms) and how different repositories interpret and deal with this. Current usages of dc:relation are ambiguous as it is impossible to know what the relation field is being used for.
- Can't reliably move to the full-text from the metadata
- Subject terms – can’t tell author-defined keywords from controlled terminology.
- Names of authors, publishers and contributors - lack of standardisation, no affiliation or role info; difficult to usefully disambiguate; difficult to browse
- Use of dates is ambiguous, in terms of both format and type; it is impossible to know whether a date applies to a work, expression or manifestation
- Identifiers - it is unclear what a given URI is expected to resolve to (identifier vs. locator issue), and difficult to identify what the URI is pointing to
Main ‘use case’ is to use qualified Dublin Core to provide richer metadata for services, specifically the UK search service, to 'do more with'. Using Qualified DC is what JISC have specified, although it is important to be aware of what others are doing in the same area, e.g. OAIster and MODS, DARE and DIDL, BL and e-theses. Other application profiles are certainly in scope for this work.
ACTION: watching brief on other activities in the area, e.g. OAIster, DARE, Arrow, DC-citations, COPAC, BL (EprintsAP)
The Producer-Consumer roles for the search service and repositories might be two-way if enhanced metadata is to be harvested back into repositories. This would require change management tracking. It was agreed that this is out of scope for the application profile work.
There is insufficient time to usefully gather scenarios and write use cases, although some might be created as work progresses to support identified usage scenarios. Importance of documenting requirements was accepted by the group but existing practical experience and expertise will contribute much to our understanding of requirements. Versions project will make available the results of their survey into academic versioning.
ACTION: Versions survey results to be made available confidentially to WG. (EprintsAP)
It was pointed out that bibliographic citations and references are not mentioned in the functional requirements, nor are they part of the model or application profile. It was agreed that this was an oversight.
ACTION: Add Bibliographic Citation and References to the model, application profile and functional requirements. (EprintsAP)
The model is based on FRBR. FRBR was developed primarily with monographs in mind, and its application to ePrints introduces some new challenges. The use of FRBR as the basis of a DC application profile also contrasts with many previous uses of DC. DC is traditionally used in a very flat way – this modelling attempts to extend its use to groups of related items. Specifying the data model for eprints is a key aspect of the work and exists outside of DC. Alternative metadata formats could be used, e.g. MODS. There is a question over whether FRBR and MODS/MARC can work together.
The drivers for using FRBR are: to facilitate grouping expressions of the same work, to clarify relationships between those expressions and to avoid having multiple records for what is apparently the same work without any means of identifying whether they are indeed the same versions.
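The grouping driver can be illustrated with a minimal Python sketch. The class names follow the FRBR levels mentioned in these minutes (work, expression, manifestation, copy); the field names, identifiers and sample records are hypothetical and are not taken from the application profile.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Copy:
    # A retrievable item, e.g. a file held in one repository (hypothetical field).
    locator: str


@dataclass
class Manifestation:
    # A particular format of an expression, e.g. PDF or HTML.
    media_format: str
    copies: List[Copy] = field(default_factory=list)


@dataclass
class Expression:
    # A version of a work, e.g. a revision or a translation.
    work_id: str
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


def group_by_work(expressions):
    """Group expressions under the identifier of the work they belong to."""
    grouped = defaultdict(list)
    for expression in expressions:
        grouped[expression.work_id].append(expression)
    return dict(grouped)


# Hypothetical records: two versions of one work, plus an unrelated work.
records = [
    Expression("work-1", "author's draft"),
    Expression("work-1", "published version"),
    Expression("work-2", "conference poster"),
]
groups = group_by_work(records)
print({work_id: [e.label for e in items] for work_id, items in groups.items()})
```

Because each expression carries the identifier of its work, a service could present the two versions of "work-1" as one group rather than as two unrelated records.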
Overall FRBR Model Approach
Agreed that this data model was a useful approach as it tackles issues (such as different versions, name authority) that access to the full-text might not. Modelling is increasingly used. Model is also extensible.
ACTION: Invite FRBR experts (Tom Delsey?) to feedback group. (EprintsAP)
Some issues and concerns, as follows:
Definition and Complexity
Academics will have their own definition of what is a new work. The model is also much more complex than the previous simple DC profile, which is itself an issue. We therefore need clarity over how the application profile defines what is a new work and what is an expression.
ACTION: Define what constitutes a work and an expression, using examples.
There is an issue in that the version/expression level is trying to do too many things, e.g. different revisions, translations, ‘abridgements’ etc. Agreed that horizontal relationships for translations and revisions are critical, but ‘abridgements’ (e.g. presentation based on a paper) are not so important. Depending on how a ‘work’ is defined, there may need to be relationships between expressions of different works.
- hasRevision or/and IsRevisionOf
- hasTranslation or/and IsTranslationOf
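One way to read the "or/and" question is that only one direction needs to be stored, with the inverse derived on demand. The following Python sketch illustrates that option only; the relation names are those listed above, while the expression identifiers and the `with_inverses` helper are hypothetical.

```python
# Stored direction -> derived inverse (relation names as written in the minutes).
INVERSE_OF = {
    "hasRevision": "IsRevisionOf",
    "hasTranslation": "IsTranslationOf",
}


def with_inverses(statements):
    """Return the stored statements plus the inverse of each one."""
    expanded = set(statements)
    for source, relation, target in statements:
        expanded.add((target, INVERSE_OF[relation], source))
    return expanded


# Hypothetical expressions: a revision, and a translation of that revision.
stored = {
    ("expression-1", "hasRevision", "expression-2"),
    ("expression-2", "hasTranslation", "expression-3"),
}
all_statements = with_inverses(stored)
```

Storing one direction keeps records smaller, but it assumes every consumer can compute the inverse; publishing both directions avoids that assumption at the cost of keeping them in step.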
Agreed that there are various issues at different levels. Access rights at Copy level; Copyright and IPR at Work and Expression level. Issue of authors signing away rights of particular Version/Expressions. What is in and out of scope for rights? Licenses are out of scope, but could be identified at Copy level. Might need use cases for rights issues here.
Agent – identifier
Agreed that agents need a unique identifier.
Agent – affiliation
Agreed that this is a very important issue. Two entities here: Agent and Agent instance. The Agent instance provides the affiliation for the Work. If the Agent details change, e.g. if the Agent moves institutions, the instance must not change. The eprint isCreatedBy the Creator who is an instance of the Agent. Should the affiliation be a property of the eprint, rather than the Agent? How do multiple authors affect this? Do we need to know which author is affiliated with which institution? It was agreed that identifying people is difficult and complex.
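The Agent / Agent instance distinction can be sketched in Python as follows. This is an illustration under assumptions, not an agreed design: the class and field names, the `create_instance` helper and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    # The person or organisation; details may change over time.
    agent_id: str
    name: str
    current_affiliation: str


@dataclass(frozen=True)
class AgentInstance:
    # Snapshot of the agent as recorded for one eprint; frozen so it cannot change.
    agent_id: str
    affiliation: str


def create_instance(agent):
    """Record the agent's affiliation as it stands when the eprint is described."""
    return AgentInstance(agent.agent_id, agent.current_affiliation)


author = Agent("agent-1", "A. Author", "University A")
creator = create_instance(author)

# The agent later moves institution; the recorded instance is unaffected.
author.current_affiliation = "University B"
print(creator.affiliation)  # University A
```

With multiple authors, each would have their own instance, which is one way of recording which author was affiliated with which institution.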
Agent – other issues
- Separating corporate and personal names
- Attributes for corporate agents
Should there be back/upward and forward/downward references or is this open to individual implementation? Are there use cases for upward relationships? e.g. find a Copy without access rights and go upwards to find an alternative. What is returned in a remote search, e.g. SRU – a full description set or a single ‘item’ – what would be contained in a single item? OAI is not an assumption for this profile.
There is a need to standardise Simple DC use in repositories. Details about how this application profile dumbs down to Simple DC are required.
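What dumbing down to Simple DC might involve can be sketched in Python. The nested input structure, its key names and the example values are hypothetical; the actual mapping is one of the details still to be specified.

```python
def dumb_down(work):
    """Flatten a nested description into repeatable Simple DC elements.

    The work / expression / manifestation / copy levels are collapsed, so the
    result no longer records which value belongs to which level.
    """
    simple = {
        "title": [work["title"]],
        "creator": list(work["creators"]),
        "date": [],
        "format": [],
        "identifier": [],
    }
    for expression in work["expressions"]:
        simple["date"].append(expression["date_available"])
        for manifestation in expression["manifestations"]:
            simple["format"].append(manifestation["format"])
            for copy in manifestation["copies"]:
                simple["identifier"].append(copy["locator"])
    return simple


# Hypothetical description with one expression, one manifestation and one copy.
example_work = {
    "title": "An example eprint",
    "creators": ["Author, A."],
    "expressions": [
        {
            "date_available": "2006-06-05",
            "manifestations": [
                {
                    "format": "application/pdf",
                    "copies": [{"locator": "http://repository.example.org/1/paper.pdf"}],
                }
            ],
        }
    ],
}
simple_dc = dumb_down(example_work)
```

The flattening is lossy by design: with several expressions, the dates and identifiers in the output can no longer be tied to a particular version, which is the ambiguity noted under the issues with simple DC.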
Legacy meanings associated with 'Version' could be a problem. Is it useful to create new terms, rather than use FRBR?
ACTION: Consider using FRBR terms throughout or replacing Version with Expression. (EprintsAP)
Metadata quality and completeness
Remains an issue.
Agreed that the work should have one title. Alternative titles would be represented at the Version/expression level.
Title needs a language attribute.
Agreed that it’s important to identify terms taken from specific schemes and distinguish these from free-text keywords. This functionality exists.
Possible requirement for annotation, e.g. connotea tags
Awareness of future potential for auto-classification, e.g. current work in Netherlands
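The distinction between terms taken from a specific scheme and free-text keywords can be illustrated with a short Python sketch. The `Subject` class and its fields are hypothetical, and LCSH is used only as an example of a controlled scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    value: str
    # Name of the controlled scheme; None marks a free-text keyword.
    scheme: Optional[str] = None


subjects = [
    Subject("Metadata", scheme="LCSH"),
    Subject("eprints"),
    Subject("application profiles"),
]

controlled_terms = [s.value for s in subjects if s.scheme is not None]
free_text_keywords = [s.value for s in subjects if s.scheme is None]
```

A consuming service could then offer browsing on the controlled terms alone while still indexing the free-text keywords for search.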
Are different types of description needed? Most important is that there is a description or abstract of sorts.
ACTION: Explore need for other descriptions. (EprintsAP)
Agreed that the work needs a persistent and unique identifier. Is it enough to say "This is the url for the eprint" or should there be further guidelines about what the identifier resolves to, e.g. "It is common to configure this uri such that it points to the jump-off page". This is not necessarily the identifier of the metadata record, nor is it the same as an oai identifier.
Multiple authorship and multiple deposits are an issue. Ideally one work, one identifier, although in practice this might not be possible.
Agreed that the creator might not be an author. This element is the 'isCreatedBy' relationship; the value of Creator is a pointer to the Agent.
This element is the 'isPublishedBy' relationship; the value of Publisher is a pointer to the Agent.
Editor / IsEditedBy [NEW]
Funder / IsFundedBy [NEW]
Supervisor / IsSupervisedBy? [NEW]
ACTION: Explore need for this element. (EprintsAP)
Agreed. Internal link to description of version/expression.
The current definition was contentious. Agreed that DateAvailable might be better, defined as the date on which the described version of the eprint was made publicly available. DateIssued or DatePublished – a more formal publication date might also be needed for formally published versions.
ACTION: identify what dates need to be captured and at what level and what date elements are available in qDC. (EprintsAP)
The NISO/ALPSP terms are contentious and currently only in draft. It was agreed that usage of this list would be premature. It was agreed to move PeerReviewed and NonPeerReviewed to this element (from Type). It was agreed that more community work is needed on status.
ACTION: recommend to JISC that work is done in this area. (EprintsAP)
Difficult to make recommendations about how this is used. Agreed that it has potential use in identifying the latest revision/version of a version/expression. Might be a numeric value or datestamp.
Version Description [NEW]
Free-text description of the current version. Agreed to add this.
Title of the version/expression, one only, acts as an alternative to the main work title.
Citation, agreed to adhere to dc-bibliographic citation for this.
References / Cites [NEW]
Uses dcterms:references. Cannot be mandatory at present. Difficulties in how to format this. Further understanding needed.
Further work needed.
Types are problematic. Current list is out of date. A type list is needed and requires community input.
ACTION: look at lists that exist already, including openurl genre lists. (EprintsAP)
Not needed for format.
Remaining elements weren’t considered.
Workplan and closing actions
ACTION: Use email@example.com for discussion. (EprintsAP)
ACTION: Invite additional feedback group members on to this list. (EprintsAP)
ACTION: Summarise discussion and agreements on to wiki regularly. (EprintsAP)
ACTION: Provide help instructions and logins for wiki, to those who want them. (EprintsAP)
ACTION: Discuss individual elements and properties via email. (EprintsAP)
ACTION: Divide application profile wiki page into separate pages for each element; discussions to be recorded (i.e. summarised from email discussions - see above) on the accompanying 'comments' page. (EprintsAP)
ACTION: Additional work on mandatory, recommended and optional elements. (EprintsAP)
Agreed that it has been assumed that this work is aimed at the Intute search service.
ACTION: Clarify this with Intute and JISC. (EprintsAP)
Agreed that the timescale is very tight and may be unrealistic, although it was agreed that it is possible to propose a richer set of metadata by July.
ACTION: Feed this back to JISC. (EprintsAP)
Agreed that any application profile would need to be tested; repository software developers might need funding to do this.
ACTION: Feed this back to JISC. (EprintsAP)