Eprints.org Data Model

From DigiRepWiki

This document is an analysis of the GNU EPrints data model and how it maps to the Eprints Application Profile.

It takes no account of local customisations and considers only the out-of-the-box data model.

Initial observations

There are two sides to implementing the eprints application profile:

(1) Access: exposing eprints for harvesting/searching (dependent on the Dublin Core XML guidelines)

Exposing oai_eprints should be relatively simple to build into GNU EPrints, using the guidelines for Expressing Dublin Core metadata using XML (once these are agreed). If nothing else were done, this would equate to exposing a single ScholarlyWork record comprising a single Expression, Manifestation and Copy.

(2) Deposit: the internal data model and deposit workflow

The most fundamental issue is the extent to which GNU EPrints does, and can, embrace the modelling aspects of the profile, i.e. how far it will go in bundling together different Expressions (or Versions), Manifestations (Formats) and Copies of a Scholarly Work.

The current system seems to have the following characteristics:

(1) assumes one expression per work (although it is possible to provide a link to the publisher's metadata and/or copy, which may or may not be a different expression)

(2) allows multiple manifestations to be uploaded (although there is no means of distinguishing whether two uploaded formats are two parts of the same eprint or two different manifestations of that eprint, e.g. a .pdf and a .doc vs a .doc and some additional image files)

(3) provides no links between different ScholarlyWorks or between different Expressions, i.e. no equivalent of the hasAdaptation, hasTranslation and hasVersion elements of the application profile

(4) does not allow multiple values, notably for titles and descriptions; there also appear to be no language attributes

(5) The following elements are not present:

  • grant number (ScholarlyWork)
  • copyright holder (Expression)
  • licence (Copy)
  • is part of (Copy)
  • workplace homepage (Agent)
  • homepage (Agent)
  • funder (ScholarlyWork)
  • supervisor (ScholarlyWork)
  • version number or string (Expression)

The following may be system-generatable:

  • date modified (Manifestation)
  • language (Expression)

The following loosely map to existing data fields, or are system-generatable (see notes below):

  • description (Expression)
  • affiliated institution (is affiliated with > Agent) (ScholarlyWork)
  • is Manifested as (Manifestation)
  • is available as (identifier) (Copy)
  • access rights (Copy)

The remaining elements are all represented (see notes below).

Eprints Application Profile

A reminder of the attributes and relationships.

Scholarly Work

Attributes

  • title
  • subject
  • abstract
  • identifier (URI)
  • grant number

Relationships

  • supervisor (is supervised by > Agent)
  • creator (is created by > Agent)
  • funder (is funded by > Agent)
  • affiliated institution (is affiliated with > Agent)
  • has adaptation
  • is expressed as

Expression

Attributes

  • title
  • description
  • date available
  • status
  • version number or string
  • genre / type
  • copyright holder
  • bibliographic citation
  • references
  • language
  • identifier (URI)

Relationships

  • has version
  • has translation
  • editor (is edited by > Agent)

Manifestation

Attributes

  • format
  • date modified


Relationships

  • publisher (is published by > Agent)
  • is Manifested as (Manifestation)

Copy

Attributes

  • date available
  • access rights
  • licence
  • identifier/locator (URI)

Relationships

  • is part of
  • is available as (identifier)

Agent

  • name
  • family name
  • given name
  • type of agent (person or organisation)
  • workplace homepage
  • mailbox
  • homepage
  • identifier (URI)

Mapping eprint_fields.pl to eprints application profile

Mappings to Eprints application profile elements are provided below, with some notes.

$c->{fields}->{eprint} = [

#	{ name=>"authors", type=>"compound", multiple=>1, fields=>
#		{ name=>"authors_name", type=>"name" },
#		{ name=>"authors_email", type=>"email" },
#	}},


#	{ name => "creators", type => "name", multiple => 1, input_boxes => 4,
#		hasid => 1, input_id_cols=>20, 
#		family_first=>1, hide_honourific=>1, hide_lineage=>1 }, 


	{ name => "creators", type => "name", multiple => 1, input_boxes => 4,
		family_first=>1, hide_honourific=>1, hide_lineage=>1, allow_null=>1 }, 
  • Agent family name
  • Agent given name

(I'm not sure how names are constructed; the demo site separates the family and given names, but I can't see that described in the documentation)

	{ name => "creators_id", type => "text", multiple=>1, allow_null=>1, input_cols=>20 },
  • Agent mailbox
	{ name => "creators_list", type=>"compound",  multiple=>1,
		fields=>{id=>"creators_id", main=>"creators"} },

	{ name => "title", type => "longtext", multilang=>0, input_rows => 3 },
  • ScholarlyWork title or
  • Expression title

(see note about duplicates)

	{ name => "ispublished", type => "set", 
			options => [ "pub","inpress","submitted" , "unpub" ] },
  • not used ('submitted journal article' is one of our 'Types' but we don't at the moment model the publication status)
	{ name => "subjects", type=>"subject", top=>"subjects", multiple => 1, 
		browse_link => "subjects",
		render_input=>"EPrints::Extras::subject_browser_input" },
  • ScholarlyWork subject (using VES)
	{ name => "full_text_status", type=>"set",
			options => [ "public", "restricted", "none" ] },
  • Copy access rights

(Check against the eprints application profile access rights VES: are these the same definitions, or are these internal/external access restrictions? The profile's VES has open access [=public?] and restricted access [=restricted?]; the value 'none' maps to a null Copy identifier.)

	{ name => "monograph_type", type=>"set",
			options => [ 
				"technical_report", 
				"project_report",
				"documentation",
				"manual",
				"working_paper",
				"discussion_paper",
				"other" ] },

  • Monograph does not map to a value in the Eprints Type Vocabulary Encoding Scheme, beyond the generic ScholarlyText; the options map as follows
    • technical report = Report
    • project report = Report
    • documentation = ScholarlyText (generic Type)
    • manual = ScholarlyText (generic Type)
    • working paper = Working paper
    • discussion paper = Working paper
	{ name => "pres_type", type=>"set",
			options => [ 
				"paper", 
				"lecture", 
				"speech", 
				"poster", 
				"other" ] },
  • Workshop/Conference Item maps to Conference Item in the Eprints Type Vocabulary Encoding Scheme; refinements map as follows
    • paper = Conference paper
    • lecture = Conference item
    • speech = Conference item
    • poster = Conference poster
    • other = Conference item
	{ name => "keywords", type => "longtext", input_rows => 2 },
  • ScholarlyWork subject (unqualified)
	{ name => "note", type => "longtext", input_rows => 3 },
  • not used
	{ name => "suggestions", type => "longtext" },
  • not used; internal element
	{ name => "abstract", input_rows => 10, type => "longtext" },
  • ScholarlyWork abstract
	{ name => "date_sub", type=>"date", min_resolution=>"year" },
  • not used
	{ name => "date_issue", type=>"date", min_resolution=>"year" },
  • Date available
	{ name => "date_effective", type=>"date", min_resolution=>"year" },
  • not used
	{ name => "series", type => "text" },
  • part of Expression bibliographic citation
	{ name => "publication", type => "text" },
  • part of Expression bibliographic citation
	{ name => "volume", type => "text", maxlength => 6 },
  • part of Expression bibliographic citation
	{ name => "number", type => "text", maxlength => 6 },
  • part of Expression bibliographic citation
	{ name => "publisher", type => "text" },
  • part of Expression bibliographic citation
	{ name => "place_of_pub", type => "text" },
  • part of Expression bibliographic citation
	{ name => "pagerange", type => "pagerange" },
  • part of Expression bibliographic citation
	{ name => "pages", type => "int", maxlength => 6, sql_index => 0 },
  • part of Expression bibliographic citation
	{ name => "event_title", type => "text" },
  • part of Expression bibliographic citation
	{ name => "event_location", type => "text" },
  • part of Expression bibliographic citation
	{ name => "event_dates", type => "text" },
  • part of Expression bibliographic citation
	{ name => "event_type", type => "set", options=>[ "conference","workshop","other" ] },
  • not used
	{ name => "id_number", type => "text" },
  • Expression identifier??
	{ name => "patent_applicant", type => "text" },
  • not used
	{ name => "institution", type => "text" },
  • This loosely maps to ScholarlyWork affiliated institution, although the eprints application profile usage is wider than in GNU EPrints, where it is used only for certain classes of item.
	{ name => "department", type => "text" },
  • not used
	{ name => "thesis_type", type => "set", options=>[ "masters", "phd", "other"] },
  • not used
	{ name => "refereed", type => "boolean", input_style=>"radio" },
  • Expression Status
	{ name => "isbn", type => "text" },
  • part of Expression bibliographic citation
	{ name => "issn", type => "text" },
  • part of Expression bibliographic citation
	{ name => "fileinfo", type => "longtext",
		render_value=>"render_fileinfo" },
  • There is a possible mapping to Expression description, although this would require a change in usage and documentation
	{ name => "book_title", type => "text" },
  • part of Expression bibliographic citation
	{ name => "editors", type => "name", multiple => 1, 
		input_boxes => 4, input_id_cols=>20, 
		family_first=>1, hide_honourific=>1, hide_lineage=>1, allow_null=>1 }, 
  • Agent - Family name
  • Agent - Given name
	{ name => "editors_id", type => "text", multiple=>1, allow_null=>1, input_cols=>20 },
  • Agent - mailbox
	{ name => "editors_list", type=>"compound",  multiple=>1,
		fields=>{id=>"editors_id", main=>"editors"} },

	{ name => "official_url", type => "url" },
  • Expression identifier???
# nb. Can't call this field "references" because that's a MySQL keyword.
	{ name => "referencetext", type => "longtext", input_rows => 3 },
  • Expression references
];
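
The option-level mappings noted above could be held as simple lookup tables. The sketch below is one possible way to do that in Perl; the sub names profile_type and copy_access_rights are hypothetical, the access rights mapping is tentative (it still needs checking against the profile's VES, as noted against full_text_status), and the fallback for the monograph option "other" to the generic ScholarlyText is an assumption.

```perl
use strict;
use warnings;

# Mappings taken from the notes above (monograph_type and pres_type).
my %MONOGRAPH_TYPE = (
    technical_report => 'Report',
    project_report   => 'Report',
    documentation    => 'ScholarlyText',
    manual           => 'ScholarlyText',
    working_paper    => 'Working paper',
    discussion_paper => 'Working paper',
);
my %PRES_TYPE = (
    paper   => 'Conference paper',
    lecture => 'Conference item',
    speech  => 'Conference item',
    poster  => 'Conference poster',
    other   => 'Conference item',
);

# Tentative mapping: still to be checked against the access rights VES.
my %ACCESS_RIGHTS = (
    public     => 'open access',
    restricted => 'restricted access',
);

# Hypothetical helper: return the profile Type for a flat eprint hash.
sub profile_type {
    my ($eprint) = @_;
    if ( defined $eprint->{monograph_type} ) {
        # Assumption: unmapped options (e.g. "other") fall back to the generic Type.
        return $MONOGRAPH_TYPE{ $eprint->{monograph_type} } // 'ScholarlyText';
    }
    if ( defined $eprint->{pres_type} ) {
        return $PRES_TYPE{ $eprint->{pres_type} } // 'Conference item';
    }
    return;
}

# Hypothetical helper: return Copy access rights, or undef when
# full_text_status is "none" (no Copy to describe).
sub copy_access_rights {
    my ($eprint) = @_;
    my $status = $eprint->{full_text_status} // 'none';
    return $ACCESS_RIGHTS{$status};
}

print profile_type( { pres_type => 'poster' } ), "\n";
print copy_access_rights( { full_text_status => 'public' } ), "\n";
```

Keeping the tables separate from the helper subs means that a change to either vocabulary only needs a change to the data.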

Ways forward

NO CHANGE

This would create a single ScholarlyWork - Expression - Manifestation - Copy description, as per example 2.
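
A minimal sketch of what this flattening might look like, assuming a flat eprint hash keyed by the field names from eprint_fields.pl. The "url" and "format" keys, the example values, the citation layout and the sub names citation and as_single_chain are all assumptions made for illustration, not part of GNU EPrints or the profile.

```perl
use strict;
use warnings;

# Hypothetical flat eprint record. Most keys follow eprint_fields.pl;
# "url" and "format" are invented stand-ins for the uploaded document.
my $eprint = {
    title       => 'An example article',
    abstract    => 'A short abstract.',
    date_issue  => '2006',
    publication => 'Journal of Examples',
    volume      => '12',
    number      => '3',
    pagerange   => '45-67',
    url         => 'http://example.org/123/paper.pdf',
    format      => 'application/pdf',
};

# Join whichever citation parts are present. The layout is an assumption,
# not a format defined by the application profile.
sub citation {
    my ($e) = @_;
    my @parts = grep { defined($_) && length($_) }
        map { $e->{$_} } qw(publication volume number pagerange);
    return join ', ', @parts;
}

# Build exactly one Expression, one Manifestation and one Copy.
sub as_single_chain {
    my ($e) = @_;
    my %work = (
        title       => $e->{title},
        abstract    => $e->{abstract},
        expressions => [
            {
                date_available         => $e->{date_issue},
                bibliographic_citation => citation($e),
                manifestations         => [
                    {
                        format => $e->{format},
                        copies => [ { locator => $e->{url} } ],
                    },
                ],
            },
        ],
    );
    return \%work;
}

my $chain = as_single_chain($eprint);
print $chain->{expressions}[0]{bibliographic_citation}, "\n";
```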

BASIC IMPLEMENTATION

  • Add 'closed' as an option to full_text_status to enable better mapping to eprints application profile
  • Clarify the existence of Expressions, Manifestations or Copies. Current practice allows multiple files to be added, but it does not seem possible to know whether the additional files are (a) multiple parts of the same manifestation, (b) different manifestations, (c) new expressions or (d) identical Copies.
  • Similarly, I'm unclear as to whether official_url and id_number are being used to identify different expressions or identical Copies. This might also be clarified.

FULL IMPLEMENTATION

Implement the above, plus

  • Incorporate all missing data elements
  • Enable the addition of different versions (Expressions) with additional data to be supplied about each Expression, Manifestation and Copy.
  • Enable linking between Expressions (hasTranslation, hasVersion)
  • Enable linking between ScholarlyWorks (hasAdaptation)
  • Enable addition of translated expressions.