UKOLN RDFHarvester Demonstrator

UKOLN RDFHarvester: a PRIDE tool

Contextual overview (this page)
A sample run
Configure/re-configure the Harvester

Background: The RDFHarvester is one small part of the PRIDE Directory Toolkit. It is an automated agent (a robot) designed to run in the background, for instance as a 'cron job' on Unix. These HTML pages and the scripts they link to are not required to run the agent; they exist only to make the workings of the RDFHarvester visible. The scripts are available if they are of interest.

Contextual overview: a look at the context for an xml:rdf harvest

The RDFHarvester reads its configuration information from the PRIDE directory. From the 'config' directory entry it learns what to do: where in the PRIDE directory to find the list of WWW locations holding data to be harvested, and where in the PRIDE directory tree to write the resulting entries. In other words, it makes a basic configuration read followed by a read of things to do. Once it 'knows what to do' (i.e. where on the web to find directory content information in RDF format), it systematically visits each WWW location, gathers the specified data, and converts it to LDIF format for the PRIDE directory.
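The two-stage read described above (config entry first, then the sources entry it points to) can be sketched in Python as follows. This is only an illustration, not the RDFHarvester's actual code: the directory is modelled as an in-memory dictionary, and the attribute names `sourcesDN`, `targetDN` and `sourceURL` are invented here because the real schema is not given on this page. The `fetch` and `convert` callables are placeholders supplied by the caller.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for the PRIDE directory: DN -> attributes.
# The attribute names below are assumptions made for this sketch only.
Directory = Dict[str, Dict[str, List[str]]]

CONFIG_DN = "cn=config, cn=RDFHarvester, dc=pride-inf, dc=org"

EXAMPLE_DIRECTORY: Directory = {
    CONFIG_DN: {
        "sourcesDN": ["cn=sources, cn=RDFHarvester, dc=pride-inf, dc=org"],
        "targetDN": ["cn=HarvestedData, cn=RDFHarvester, dc=pride-inf, dc=org"],
    },
    "cn=sources, cn=RDFHarvester, dc=pride-inf, dc=org": {
        "sourceURL": ["http://www.ukoln.ac.uk/metadata/pride/xrdf_file"],
    },
}


def plan_harvest(directory: Directory,
                 config_dn: str = CONFIG_DN) -> Tuple[List[str], str]:
    """Read the config entry, then the sources entry it points to."""
    config = directory[config_dn]
    sources_dn = config["sourcesDN"][0]   # where the list of URLs lives
    target_dn = config["targetDN"][0]     # where harvested entries go
    urls = directory[sources_dn]["sourceURL"]
    return urls, target_dn


def run_harvest(directory: Directory,
                fetch: Callable[[str], str],
                convert: Callable[[str, str], object]) -> list:
    """Visit each source URL, fetch its RDF, and convert it for the target."""
    urls, target_dn = plan_harvest(directory)
    return [convert(fetch(url), target_dn) for url in urls]
```

Passing `fetch` and `convert` in as arguments keeps the sketch free of network access, so the control flow can be tried out with stand-in functions before any real retrieval or LDIF conversion is plugged in.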

A Sample Run: using the above configuration values

You are not required to do anything except press the button.
To keep the above configuration settings in mind, the critical values are repeated below.

Step 1 (read the config entry)
- the dn of the config entry is "cn=config, cn=RDFHarvester, dc=pride-inf, dc=org"
- the dn of the source entry is "cn=sources, cn=RDFHarvester, dc=pride-inf, dc=org"
- the dn of the target entry is "cn=HarvestedData, cn=RDFHarvester, dc=pride-inf, dc=org"
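The three DNs above differ only in their leftmost component, so all three entries sit directly under the same parent in the directory tree. A small Python sketch makes this visible. It assumes a deliberately naive parser that splits on commas; real DNs may contain escaped commas, which this does not handle. The function names are made up for this example.

```python
from typing import List, Tuple

CONFIG_DN = "cn=config, cn=RDFHarvester, dc=pride-inf, dc=org"
SOURCES_DN = "cn=sources, cn=RDFHarvester, dc=pride-inf, dc=org"
TARGET_DN = "cn=HarvestedData, cn=RDFHarvester, dc=pride-inf, dc=org"


def parse_dn(dn: str) -> List[Tuple[str, str]]:
    """Split a DN into (attribute, value) pairs, leftmost component first.

    Naive: assumes no escaped commas or multi-valued components.
    """
    pairs = []
    for part in dn.split(","):
        attr, _, value = part.strip().partition("=")
        pairs.append((attr, value))
    return pairs


def parent_dn(dn: str) -> str:
    """Return the DN with its leftmost component removed."""
    return ", ".join(f"{attr}={value}" for attr, value in parse_dn(dn)[1:])


for dn in (CONFIG_DN, SOURCES_DN, TARGET_DN):
    print(parse_dn(dn)[0][1], "->", parent_dn(dn))
```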

Step 2 (read the sources entry)
- the WWW location of the source file is http://www.ukoln.ac.uk/metadata/pride/xrdf_file

Step 3 (the robot goes to one or more WWW sites to gather content descriptions)
- each file (containing one or more descriptions) is parsed and represented as LDIF
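As a rough illustration of Step 3, the Python sketch below parses a small RDF/XML document and writes each `rdf:Description` as one LDIF record. It is not the RDFHarvester's own conversion. The sample document, the use of Dublin Core elements, the choice of the title as the entry's `cn`, and the direct mapping of element names to attribute names are all assumptions made for this example. A real directory would also require `objectClass` values and DN escaping, both omitted here.

```python
import base64
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Invented sample input; not the content of the real source file.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/a">
    <dc:title>Example A</dc:title>
    <dc:creator>A. Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def ldif_line(attr: str, value: str) -> str:
    """Format one attribute line, base64-encoding values LDIF cannot hold raw."""
    unsafe = (
        not value.isascii()
        or any(c in value for c in "\0\r\n")
        or value[:1] in (" ", ":", "<")
        or value.endswith(" ")
    )
    if unsafe:
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        return f"{attr}:: {encoded}"
    return f"{attr}: {value}"


def rdf_to_ldif(rdf_text: str, target_dn: str) -> str:
    """Turn every rdf:Description into an LDIF record under target_dn."""
    root = ET.fromstring(rdf_text)
    records = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Strip the namespace from each child tag to get a bare attribute name.
        props = [(child.tag.split("}")[-1], (child.text or "").strip())
                 for child in desc]
        # Name the entry after its title, falling back to the rdf:about URI.
        name = next((v for k, v in props if k == "title"),
                    desc.get(f"{{{RDF_NS}}}about", "unnamed"))
        lines = [ldif_line("dn", f"cn={name}, {target_dn}")]
        lines += [ldif_line(k, v) for k, v in props]
        records.append("\n".join(lines))
    return "\n\n".join(records) + "\n"


print(rdf_to_ldif(SAMPLE_RDF,
                  "cn=HarvestedData, cn=RDFHarvester, dc=pride-inf, dc=org"))
```

Records are separated by a blank line, as LDIF requires, so a file containing several descriptions yields several entries in one output.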

Step 4 (the robot binds to the directory and creates the relevant entries)
- the robot has administrator rights (i.e. it can write to the directory)
  in the area that will hold the freshly harvested entries

view a subtree to delete specific items

Configure the RDFHarvester: change to new configuration

When the values in the RDFHarvester config and sources entries are changed, the robot performs as the new values dictate. If errors are made in the config/sources entries, the original values can easily be reloaded (feel free to experiment).

view a subtree