Tools and Software - paperBase
paperBase
Formal metadata extraction is the process of extracting facts about a document, such as its author, title, number of pages and place of publication. It is often considered potentially significant for repository enhancement, particularly as an advance in ease of use. paperBase is a formal metadata extraction system that uses Bayesian statistics and a hidden Markov model (HMM) approach to extract these facts from the full text of documents.
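As a rough illustration of how an HMM can label the opening tokens of a document, the following minimal sketch runs Viterbi decoding over a toy model. It is not paperBase's code: the state names, the token features, the helper names (`token_feature`, `viterbi`, `label_tokens`) and every probability are hypothetical, chosen by hand for the example, and the Bayesian component is not modelled at all.

```python
import math

# Hypothetical hidden states: which part of the document a token belongs to.
STATES = ("title", "author", "body")

# Hypothetical, hand-picked probabilities (not learned from any data).
START = {"title": 0.8, "author": 0.1, "body": 0.1}
TRANS = {
    "title":  {"title": 0.70, "author": 0.25, "body": 0.05},
    "author": {"title": 0.05, "author": 0.70, "body": 0.25},
    "body":   {"title": 0.05, "author": 0.05, "body": 0.90},
}
EMIT = {
    "title":  {"capitalised": 0.50, "lowercase": 0.40, "initial": 0.05, "punct": 0.05},
    "author": {"capitalised": 0.50, "lowercase": 0.05, "initial": 0.35, "punct": 0.10},
    "body":   {"capitalised": 0.10, "lowercase": 0.70, "initial": 0.05, "punct": 0.15},
}


def token_feature(token):
    """Map a token to one of four coarse surface features."""
    if len(token) == 2 and token[0].isupper() and token[1] == ".":
        return "initial"          # e.g. "E."
    if not any(ch.isalnum() for ch in token):
        return "punct"            # e.g. "," or ";"
    if token[0].isupper():
        return "capitalised"
    return "lowercase"


def viterbi(features):
    """Return the most probable state sequence for a list of features."""
    if not features:
        return []
    # Each column maps state -> (log probability of best path, previous state).
    columns = [{
        s: (math.log(START[s]) + math.log(EMIT[s][features[0]]), None)
        for s in STATES
    }]
    for feat in features[1:]:
        column = {}
        for s in STATES:
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(EMIT[s][feat]), prev)
        columns.append(column)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


def label_tokens(tokens):
    """Pair each token with its most probable state."""
    features = [token_feature(t) for t in tokens]
    return list(zip(tokens, viterbi(features)))


if __name__ == "__main__":
    for token, label in label_tokens("Formal metadata extraction E. Tonkin".split()):
        print(f"{token}\t{label}")
```

With these toy parameters, the tokens "Formal metadata extraction" are labelled as title and "E. Tonkin" as author. A real system would estimate the probabilities from labelled documents and would use richer features than capitalisation alone.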
Update history
paperBase was originally written in collaboration with the University of Bristol as part of an investigation into citation analysis and formal metadata extraction. Its authors are Henk Muller, Andrew Moss and Emma Tonkin.
Download location
TBC