Metadata

UKOLN Software Tools


Here are some UKOLN software tools for handling metadata in various formats.

 HTML-sum.pl
URL http://www.ukoln.ac.uk/metadata/software-tools/HTML-sum.pl   
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0, LWP (HTML::Parser)
Description Summarise HTML page to produce SOIF record
Keywords HTML, SOIF, summarisation, ROADS, DESIRE
Language perl
Usage HTML-sum.pl [-u URL] file
Comments This script is primarily intended as a replacement for the HTML summariser supplied with the Harvest suite of tools. It can also be used on its own to summarise local HTML files or, in combination with a tool such as lynx, to summarise remote pages. The '-u URL' argument causes HTML-sum.pl to generate a full SOIF record, including an opening '@FILE { URL' and a closing '}'. Here is a simple shell script that uses lynx, HTML-sum.pl and soif2metadc to produce a Dublin Core description of a remote resource, embedded in HTML META tags:
 #!/bin/sh
 # $1 is the URL of the remote resource to describe.
 lynx -source "$1" > /tmp/$$
 HTML-sum.pl -u "$1" /tmp/$$ | soif2metadc
 rm /tmp/$$
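
For readers unfamiliar with SOIF, the following Python sketch shows the general shape of the record described above: an opening '@FILE { URL', one line per attribute with the byte length of the value in braces, and a closing '}'. It is not the HTML-sum.pl source; the attribute names (title, url-references) and the helper names are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and the href targets of <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def soif_attribute(name, value):
    # A SOIF attribute line is NAME{byte-length}:<TAB>VALUE.
    size = len(value.encode("utf-8"))
    return f"{name}{{{size}}}:\t{value}"


def html_to_soif(html, url):
    """Summarise an HTML string as a single SOIF record (illustrative only)."""
    parser = TitleAndLinks()
    parser.feed(html)
    lines = [f"@FILE {{ {url}"]
    # Attribute names below are assumptions, not taken from HTML-sum.pl.
    lines.append(soif_attribute("title", parser.title.strip()))
    if parser.links:
        lines.append(soif_attribute("url-references", "\n".join(parser.links)))
    lines.append("}")
    return "\n".join(lines)
```

Calling html_to_soif(page_text, url) returns the record as a string; the real script extracts more than a title and a list of links.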
 

 soif2metadc
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/soif2metadc   
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0, LWP (HTML::Entities), soif.pl (from Harvest)
Description Convert SOIF record to Dublin Core embedded in HTML META tags
Keywords SOIF, Dublin Core, HTML, META tags
Language perl
Usage soif2metadc
Comments Reads SOIF record from STDIN and writes HTML to STDOUT. See HTML-sum.pl for a simple example of use.

Can also function as an Apache server-side include (SSI) script to embed DC META tags into resources on the fly. (This script may also work as an SSI for other Web servers, but this has not been tested.) The script looks for a SOIF record describing the current page in a file with the same name as the HTML file but with a .soif suffix appended. For example, the SOIF record for intro.html is in intro.html.soif.

The Apache syntax for calling the script is:

     <!--#exec cmd="/opt/bin/soif2metadc" -->
 
So, the <head> section of intro.html may look like this:
     <html>
     <head>
     <title>A sample page</title>
     <!--#exec cmd="/opt/bin/soif2metadc" -->
     </head>
     <body>
     ...
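
The conversion soif2metadc performs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script itself: the mapping from SOIF attribute names to Dublin Core element names (SOIF_TO_DC) is hypothetical, and only the '.soif' file naming rule comes from the description above.

```python
import re
from html import escape

# Hypothetical mapping from SOIF attribute names to DC META names.
SOIF_TO_DC = {
    "title": "DC.title",
    "author": "DC.creator",
    "description": "DC.description",
}

# One SOIF attribute header: NAME{byte-length}:<TAB>
ATTRIBUTE = re.compile(rb"([\w-]+)\{(\d+)\}:\t")


def soif_path_for(html_path):
    # The SOIF record for intro.html lives in intro.html.soif.
    return html_path + ".soif"


def parse_soif(record):
    """Return (url, attributes) for a single SOIF record given as a string."""
    data = record.encode("utf-8")
    header, _, body = data.partition(b"\n")
    url = header.split(b"{", 1)[1].strip().decode("utf-8")
    attributes = {}
    pos = 0
    while True:
        match = ATTRIBUTE.match(body, pos)
        if match is None:
            break  # reached the closing '}' (or malformed input)
        size = int(match.group(2))
        start = match.end()
        name = match.group(1).decode("ascii").lower()
        attributes[name] = body[start:start + size].decode("utf-8")
        pos = start + size + 1  # skip the newline that ends the value
    return url, attributes


def meta_tags(attributes):
    """Render the mapped attributes as HTML META tags, one per line."""
    tags = []
    for soif_name, dc_name in SOIF_TO_DC.items():
        if soif_name in attributes:
            content = escape(attributes[soif_name], quote=True)
            tags.append(f'<meta name="{dc_name}" content="{content}">')
    return "\n".join(tags)
```

The declared byte length is used to slice each value, so values that contain newlines or tabs are read correctly.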
 

 wfsend
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/wfsend   
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0, MIME-tools (MIME::Entity), LWP (MIME::(QuotedPrint, Base64)), MailTools (1.06 or higher)
Description Send a MIME encoded Warwick Framework container
Keywords Warwick Framework, MIME
Language perl
Usage wfsend [-d] -t to -s subject [[-c|-a|-r] file ... ] ...
Comments This script can be used to send some fairly simple Warwick Framework containers by wrapping them up in MIME e-mail messages. The -c, -r and -a arguments each indicate the start of a container: -c indicates that the packages in the container are not directly related to each other (though they may each describe the same resource in some way); -r indicates that the container holds the resource (object) and its metadata; and -a indicates that the packages are alternatives (i.e. any one of the packages can be used by the recipient).

This script looks for /etc/mime.types, /usr/local/lib/mime.types and ~/.mime.types in order to assign MIME types to files based on their extensions.
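
As an illustration of extension-based typing, here is a small Python sketch that reads text in the usual mime.types layout (a MIME type followed by its extensions, with '#' comments) and looks up a file name. The function names and the fallback type are assumptions; how wfsend combines the three files is not modelled.

```python
def parse_mime_types(text):
    """Build an extension -> MIME type table from mime.types-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        for extension in extensions:
            table[extension.lower()] = mime_type
    return table


def guess_type(filename, table, default="application/octet-stream"):
    """Return the MIME type for filename, or default if the extension is unknown."""
    _, dot, extension = filename.rpartition(".")
    if not dot:
        return default  # no extension at all
    return table.get(extension.lower(), default)
```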

The -d option turns on debugging - the MIME message is written to STDOUT instead of being sent by e-mail.

Here are some examples of use:

  1. To send 3 sets of metadata that describe the same resource in a single container: wfsend -t lisap@ukoln.ac.uk -s "test 1" -c file.sgml file.pics file.ukmarc
  2. To send a PostScript file and its associated UKMARC record and USMARC record (where either form of metadata can be used at the remote end): wfsend -t lisap@ukoln.ac.uk -s "test 2" -r file.ps -a file.ukmarc file.usmarc
  3. To send 2 sets of metadata (DC in SGML and a PICS label) about 2 separate resources: wfsend -t lisap@ukoln.ac.uk -s "test 3" -c -c file1.sgml file1.pics -c file2.sgml file2.pics
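
To make the nesting in example 2 concrete, the Python sketch below builds a comparable message with the standard email package. It is one plausible reading, not the wfsend implementation: the mapping of -c, -r and -a to the multipart subtypes mixed, related and alternative is an assumption, as are the helper names and the placeholder file contents.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

# Assumed mapping from wfsend container flags to MIME multipart subtypes.
SUBTYPE = {"-c": "mixed", "-r": "related", "-a": "alternative"}


def container(flag, parts):
    """Wrap packages (or nested containers) in a multipart container."""
    box = MIMEMultipart(SUBTYPE[flag])
    for part in parts:
        box.attach(part)
    return box


def package(filename, data, subtype="octet-stream"):
    """Wrap raw bytes as one package; a real tool would pick the type per file."""
    part = MIMEApplication(data, _subtype=subtype)
    part.add_header("Content-Disposition", "attachment", filename=filename)
    return part


def example_two():
    """-r file.ps -a file.ukmarc file.usmarc, with placeholder contents."""
    message = container("-r", [
        package("file.ps", b"%!PS placeholder", subtype="postscript"),
        container("-a", [
            package("file.ukmarc", b"placeholder UKMARC record"),
            package("file.usmarc", b"placeholder USMARC record"),
        ]),
    ])
    message["To"] = "lisap@ukoln.ac.uk"
    message["Subject"] = "test 2"
    return message
```

Printing example_two().as_string() shows the message much as the -d option would show it, without sending anything.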

 gendc
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/gendc   
Author Andy Powell
Publisher UKOLN
Requirements Perl
Description Simple script to create embedded Dublin Core interactively
Keywords Dublin Core, HTML
Language perl
Usage gendc
Comments Simple script that asks questions about a resource in order to generate Dublin Core description (in embedded HTML META tag format). Can be configured to know about arbitrary qualifiers. Answers to questions are remembered and are given as defaults next time the command is used.
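
The "remembered answers" behaviour can be sketched in Python as below. This is an illustration only: the element list, the JSON state file and the function names are assumptions, and nothing here reflects how gendc actually stores its defaults.

```python
import json
from pathlib import Path

# Illustrative subset of Dublin Core elements; not gendc's configuration.
ELEMENTS = ["title", "creator", "subject", "description"]


def ask(elements, defaults, prompt=input):
    """Ask one question per element, offering the previous answer as default."""
    answers = {}
    for element in elements:
        default = defaults.get(element, "")
        reply = prompt(f"DC.{element} [{default}]: ").strip()
        answers[element] = reply or default  # empty reply keeps the default
    return answers


def run(state_file, prompt=input):
    """Load earlier answers, ask the questions, and save the new answers."""
    path = Path(state_file)
    defaults = json.loads(path.read_text()) if path.exists() else {}
    answers = ask(ELEMENTS, defaults, prompt)
    path.write_text(json.dumps(answers))
    return answers
```

Passing the prompt function as a parameter keeps the question loop testable without a terminal.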

 ROADSHarvester
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/roads/roadsharvester-v1a1.tar.Z
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0, ROADS, Harvest v1.4pl2, Perl MD5 package
Description This package provides a combine-harvester for the ROADS software, adding the following functionality. 1) Automatic generation of metadata to 'pump-prime' ROADS records as part of the process of manually creating resource descriptions. Records created in this way can currently be based on either the DOCUMENT or the SERVICE template type. 2) Web robot based bulk-harvesting of records into a ROADS database based on the URLs listed in another ROADS database. Typically, all the records created in this way will be based on the DOCUMENT template type.
Keywords ROADS, harvesting, robot, Harvest, metadata
Language perl
Usage See the README file.
Comments

 roads2gils.pl
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/roads2gils.pl   
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0
Description Converts ROADS records to an SGML-like GILS record suitable for loading into Zebra.
Keywords ROADS, IAFA, GILS, Zebra
Language perl
Usage roads2gils.pl
Comments

 DC-dot
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/dcdot/dcdot1.0.tar.Z
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0, Libwww-perl, soif.pl, Jon Knight's MARC module
Description A Perl CGI script for generating HTML Dublin Core META tags.
Keywords Dublin Core, DC, editor, Warwick Framework, USMARC, SOIF, TEI, GILS, XML, ROADS, IAFA, DESIRE
Language perl
Usage
Comments See the installation instructions, <URL:http://www.ukoln.ac.uk/metadata/software-tools/tools/dcdot/INSTALL>.

 Java DC-dot
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/dcdot/java/
Author Andy Powell
Publisher UKOLN
Requirements Java 1.1
Description A tool for creating Dublin Core metadata
Keywords Dublin Core, DC, editor, SOIF, XML
Language Java
Usage java ukoln.metadata.DCdot
Comments <URL:http://www.ukoln.ac.uk/metadata/software-tools/tools/dcdot/java/README>

 soif2nwi
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/soif2nwi/soif2nwi   
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0, soif.pl (from Harvest), Perl MD5 package
Description A perl script for converting SOIF records into NWI records suitable for loading into Zebra. This script is intended to run as a Harvest 'post-summarising' script by adding something like:
    Post-Summarizing: lib/nwirules
 
into the Harvest gatherer.cf config file. You'll need two other files and a following wind to get this stuff to work. The files both live in the gatherer's 'lib' directory and are called nwirules and nwipostproc.pl.
Keywords Harvest, SOIF, NWI, GILS, Zebra, DESIRE
Language perl
Usage See above.
Comments

 Patch to add MCF support to ROADS v1 addsl.pl
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/addsl.pl.patch   
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0
Description This is a patch to addsl.pl (v1b2pl9), adding a -M switch which causes it to generate an MCF file as well as HTML pages. NOTE: this generates an *old* format MCF file! Needs updating to generate newer versions of MCF (??) but might be of interest. The MCF is sent to a file called alphalist.mcf in your HTML directory (~/htdocs/) by default.
Keywords MCF, ROADS
Language perl
Usage
Comments To get the HotSauce plugin to work, your Web server must serve .mcf files with the MIME type 'image/vasa', e.g. add
     AddType .mcf image/vasa 8bit 1.0
  
to your CERN httpd config file.

 roads2metadc.pl
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/roads2metadc.pl   
Author Tracy Gardner, Andy Powell
Publisher UKOLN
Requirements Perl 5.0, ROADS version 2
Description Output a ROADS DUBLINCORE record as HTML META tags or as RDF. Primarily intended to be used as an SSI script.
Keywords ROADS, Dublin Core, DC, HTML, META, RDF
Language perl
Usage For Apache, embed something like <!--#exec cmd="/opt/bin/roads2metadc 885679460-360" --> somewhere into the HEAD section of the page. 885679460-360 is the handle of the DUBLINCORE record that describes the current page.
Comments

 ls2cdf
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/ls2cdf   
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0
Description Produce a Channel Definition Format (CDF) file based on a simple list of file names.
Keywords CDF, Channel Definition Format
Language perl
Usage ls2cdf [-t title] [-a abstract] [-u URL] [-p period]

The ls2cdf command should be combined with the find command to build a CDF file for a set of Web resources. As an example of use, the following commands will build a 'channel' listing the UKOLN metadata Web pages that have been modified in the last 5 days:

     find /opt/web/content/www.ukoln.ac.uk/metadata -mtime -5 \
     -name \*.html -print | ls2cdf -t "UKOLN Metadata News" \
     -a "UKOLN Metadata Web pages modified in the last 5 days" \
     -u http://www.ukoln.ac.uk/metadata/
 
Comments The source to ls2cdf will need some local modification to change the set of regular expressions near the top of the file that map filenames to URLs.
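
The filename-to-URL mapping and the resulting channel file can be sketched in Python as follows. The rule in URL_RULES is a hypothetical example based on the path used in the find command above, the CDF output is reduced to CHANNEL, TITLE, ABSTRACT and ITEM elements, and the -p period option is not modelled.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical (pattern, replacement) rules; edit these for the local site.
URL_RULES = [
    (re.compile(r"^/opt/web/content/(www\.ukoln\.ac\.uk/.*)$"), r"http://\1"),
]


def file_to_url(path):
    """Map a local file name to its URL, or None if no rule matches."""
    for pattern, replacement in URL_RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None


def build_cdf(paths, title, abstract, channel_url):
    """Return a simplified CDF document listing one ITEM per mapped file."""
    lines = [
        '<?xml version="1.0"?>',
        f"<CHANNEL HREF={quoteattr(channel_url)}>",
        f"  <TITLE>{escape(title)}</TITLE>",
        f"  <ABSTRACT>{escape(abstract)}</ABSTRACT>",
    ]
    for path in paths:
        url = file_to_url(path.strip())
        if url:  # files with no matching rule are skipped
            lines.append(f"  <ITEM HREF={quoteattr(url)}/>")
    lines.append("</CHANNEL>")
    return "\n".join(lines)
```

Reading the paths from standard input, for example build_cdf(sys.stdin, ...), would mirror the way the find output is piped into ls2cdf.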

 hdlres
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/hdlres.c   
Author Andy Powell
Publisher UKOLN
Requirements CNRI Handle Client Library
Description A very simple Handle resolution client.
Keywords Handle, DOI, Digital Object Identifier, URN, N2L
Language C
Usage hdlres
Comments Intended for use with the N2L CGI-based URN resolver.

 N2L
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/N2L   
Author Andy Powell
Publisher UKOLN
Requirements Perl, hdlres
Description CGI-based DOI/Handle/IETF/ISBN URN resolver. Not quite RFC-2169 compliant. Resolves DOI namespace using hdlres.c. Resolves IETF namespace using code from draft-ietf-urn-ietf-04.txt. Resolves ISBN namespace using www.amazon.co.uk.
Keywords URN, N2L, Name to Location, resolver, Handle, DOI, IETF, ISBN
Language perl
Usage
Comments Probably needs installing as nph-N2L
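
The per-namespace dispatch described above can be sketched in Python as below. This is not the N2L source: the function names are assumptions, and the resolver functions are supplied by the caller, so nothing here contacts a Handle server or any other service.

```python
def split_urn(urn):
    """Split 'urn:NID:NSS' into (namespace identifier, namespace-specific string)."""
    scheme, nid, nss = urn.split(":", 2)  # raises ValueError if a part is missing
    if scheme.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    return nid.lower(), nss  # the 'urn' prefix and NID are case-insensitive


def resolve(urn, resolvers):
    """Look up the resolver for the URN's namespace and return its result."""
    nid, nss = split_urn(urn)
    try:
        resolver = resolvers[nid]
    except KeyError:
        raise LookupError(f"no resolver for namespace {nid!r}") from None
    return resolver(nss)
```

A caller would register one function per namespace, for example {"doi": ..., "ietf": ..., "isbn": ...}, each returning a URL for the given string.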

 bibcheck.cgi
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/bibcheck.cgi   
Author Ian Peacock
Publisher UKOLN
Requirements Perl
Description A CGI-based tool for computing a BIBLINK.Checksum, a message digest (checksum) for Web pages.
Keywords MD5, message digest, checksum, BIBLINK
Language perl
Usage
Comments
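
As a rough illustration of a message digest for a Web page, the Python sketch below computes an MD5 hex digest over the raw bytes of a page. MD5 is assumed from the keywords above; any normalisation that the BIBLINK.Checksum definition applies before hashing is not reproduced here.

```python
import hashlib


def page_digest(content: bytes) -> str:
    """Return the MD5 hex digest of the page bytes exactly as given."""
    return hashlib.md5(content).hexdigest()
```

Any change to the bytes, including whitespace, produces a different digest.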

 DC-Datamodel
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/dcdm/dcdm.tar.Z
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0
Description A set of Perl modules offering an object oriented implementation of the DC-datamodel as defined by "Guidance on expressing the Dublin Core within the Resource Description Framework (RDF)".
Keywords DC, datamodel, Perl, RDF, classes, object-oriented
Language perl
Usage
Comments

 DC-assist
URL http://www.ukoln.ac.uk/metadata/software-tools/tools/dcassist/dcassist.tar.Z
Author Andy Powell
Publisher UKOLN
Requirements Perl 5.0
Description A Perl CGI script that generates a JavaScript metadata help utility.
Keywords DC, Perl
Language perl
Usage
Comments


Maintained by: Andy Powell
