dcbot.pl - Robot to harvest Dublin Core metadata
dcbot.pl [-d] [-f format] [-g] [-l ldat-path] [-p] [-u URL]
[-w WHOIS-command]
dcbot.pl is a Perl program that generates Dublin Core elements in various formats from the metadata embedded in a nominated HTML page.
It was derived from the original UKOLN DC-dot program, which is intended to run as a CGI script under a Web server. With the Web functionality removed, the resulting code can easily be used in a batch scripting environment, e.g. to harvest large volumes of Dublin Core metadata for later indexing.
Note that the Web functionality of DC-dot also includes a Dublin Core editor component, which for obvious reasons is not present in this program!
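As an illustration of batch use, the following is a minimal sketch of a Python wrapper that runs dcbot.pl once per URL. It uses only the -f and -u options from the synopsis. The names build_command and harvest are hypothetical, and the sketch assumes that dcbot.pl is on the PATH and exits with status zero on success, neither of which is stated in this documentation.

```python
import subprocess
from typing import Dict, Iterable, List

DCBOT = "dcbot.pl"  # assumption: the script is executable and on the PATH


def build_command(url: str, fmt: str = "SOIF") -> List[str]:
    """Return the argument list for one dcbot.pl run (-f format, -u URL)."""
    return [DCBOT, "-f", fmt, "-u", url]


def harvest(urls: Iterable[str], fmt: str = "SOIF") -> Dict[str, str]:
    """Run dcbot.pl once per URL and collect its standard output by URL."""
    results = {}
    for url in urls:
        proc = subprocess.run(
            build_command(url, fmt), capture_output=True, text=True
        )
        # Assumption: a zero exit status means the page was processed.
        if proc.returncode == 0:
            results[url] = proc.stdout
    return results


if __name__ == "__main__":
    for url, record in harvest(["http://www.lboro.ac.uk/"]).items():
        print(record)
```

Keeping the argument list separate from the call that runs it makes the command easy to inspect or log before a large harvest is started.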
$ dcbot.pl -f SOIF -u http://www.lboro.ac.uk/
@FILE { http://www.lboro.ac.uk/
Description{75}: Loughborough University offers degree
programmes and world class research.
Last-Modification-Time{10}: 1032908400
Time-to-Live{7}: 2419200
Refresh-Rate{6}: 604800
Gatherer-Name{6}: DC-dot
Type{23}: text/html || 8415 bytes
File-Size{4}: 8415
MD5{32}: aef5f97ecfbd0a2018d105761e09cc52
Keywords{106}: University; England; postgraduate; undergraduate;
degree programme; course; research; teaching; prospectus
Title{23}: Loughborough University
Type{4}: Text
}
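For later indexing it may be useful to read such a record back into a program. The following is a rough sketch of a reader for a single SOIF record, not the Harvest parser itself. It assumes the usual SOIF layout, in which each attribute is written as Name{size}: followed by a tab or space and exactly size bytes of value. The function name parse_soif and the sample record are illustrative only; note that output that has been re-wrapped for display, as above, may no longer match its byte counts.

```python
import re

# "@TYPE { URL" header line, then "Name{size}:<tab or space>" before each value.
HEADER = re.compile(rb"@(\S+)\s*\{\s*(\S+)[ \t]*\r?\n")
ATTRIBUTE = re.compile(rb"([^\s{}]+)\{(\d+)\}:[ \t]")


def parse_soif(data: bytes):
    """Parse one SOIF record into (template type, URL, list of (name, value))."""
    header = HEADER.match(data)
    if header is None:
        raise ValueError("not a SOIF record")
    template = header.group(1).decode()
    url = header.group(2).decode()
    attributes = []  # a list, because a name such as Type may repeat
    pos = header.end()
    while True:
        match = ATTRIBUTE.match(data, pos)
        if match is None:
            break  # reached the closing brace or malformed input
        size = int(match.group(2))
        start = match.end()
        value = data[start:start + size]  # the size is a byte count
        attributes.append((match.group(1).decode(),
                           value.decode("utf-8", "replace")))
        pos = start + size
        # Skip the line ending that follows each value.
        while data[pos:pos + 1] in (b"\r", b"\n"):
            pos += 1
    return template, url, attributes


SAMPLE = (
    b"@FILE { http://www.lboro.ac.uk/\n"
    b"Title{23}:\tLoughborough University\n"
    b"Type{4}:\tText\n"
    b"}\n"
)

if __name__ == "__main__":
    print(parse_soif(SAMPLE))
```

Reading values by byte count rather than by line is what allows a value to contain embedded newlines, as the Description and Keywords attributes do.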
The dcbot.pl program has an external dependency on the Harvest system's Summary Object Interchange Format (SOIF) parser code.
It is assumed that Harvest has been installed in /usr/local/harvest. If this is not the case, either set the environment variable HARVEST_HOME to the Harvest top level directory or edit the dcbot.pl code to reflect its location.
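The lookup described above can be sketched as follows. This is an illustration in Python of the fallback rule only, not the code used inside dcbot.pl; the function name harvest_home is hypothetical.

```python
import os

DEFAULT_HARVEST_HOME = "/usr/local/harvest"  # default location named above


def harvest_home(environ=None) -> str:
    """Return HARVEST_HOME if it is set and non-empty, else the default."""
    env = os.environ if environ is None else environ
    return env.get("HARVEST_HOME") or DEFAULT_HARVEST_HOME


if __name__ == "__main__":
    print(harvest_home())
```

A wrapper script could call this before starting dcbot.pl to report which Harvest installation will be used.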
The original DC-dot program is Copyright (C) 1997 UKOLN, University of Bath, UK, and was written by UKOLN's Andy Powell. This derivative was created by Martin Hamilton in 2002.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
This work was partially supported by grants from the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the joint JISC/NSF Digital Libraries Programme (IMesh Toolkit project).
UKOLN is funded by Resource: The Council for Museums, Archives & Libraries (the organisation succeeding the Library and Information Commission), the Joint Information Systems Committee (JISC) of the Higher and Further Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.