NOF-digitise
Technical Advisory Service
2001 - 2004 archive

Frequently Asked Questions

The questions and answers on this page have been asked by nof-digitise applicants. This page will be updated frequently.

Legal issues

  1. Where can I get a draft form to use for copyright clearance?
  2. The technical standards document talks about certain institutions having access to additional resources by signing a licence committing them to non-commercial use (section 3.1.5). Who would be parties to these licences?
    • Would these licences involve an exchange of money and if so between whom?
    • Would the copyright owner be entitled to a fee for reproduction in the same way as with non-digital reproduction?
  3. Where can I find out more about Copyright?
  4. Could you explain NOF's attitude to copyright ownership of bespoke software, custom databases and schemas developed for NOF-digi projects?
  5. Are there any constraints on projects which opt to sell goods or services from their website?

Hardware and software

  1. I am trying to find information on whether there is any OCR software that can cope reliably with 17th-19th century printed material, including material in columns. I would also like pointers to information on how existing OCR software would cope with 19th-century newspapers.
  2. What is the situation regarding servers?  Supplying video, for example, to many institutions simultaneously places huge demands upon UK Internet infrastructure as well as servers.  Should we be looking to host our servers at a high capacity Super Janet node or will other provisions be made?
  3. Can you clarify what you mean by a Content Management System?
    • Do I need one for my project?
    • Is a database sufficient?
  4. Can you provide further information on selecting scanners and digital cameras?
  5. Which is better SQL Server or an Access Database?
  6. Would NOF recommend the web site to be hosted on its own dedicated server or on a shared server? What are the things I should bear in mind to take such a decision?
  7. Should the web server be protected by a firewall? (Or is it enough to have a firewall installed on our office network server where we will store all our digital mastercopies?)
  8. Should we require back ups from the web hosting organisation? (We will have all back ups of mastercopies on our own network server in the office.)
  9. What type of connection and recommended speed should the web server have to the local ethernet (eg 100Mbits/sec)?
  10. What kinds of software are being used across the NOF-digitise programme?
  11. What are the hardware and software issues involved in creating digital material for community languages?

Legal Issues


1. Where can I get a draft form to use for copyright clearance?

IPR and copyright is a very complex area and unfortunately there is no "one-size-fits-all" solution to these issues. Every resource or collection of resources may have its own IPR problems that will need to be solved before a digitisation project can go ahead. However, as it is an issue of such importance when working in a networked environment, a number of excellent resources have been produced to guide you through the process of clearing resources for use. 

VADS (The Visual Arts Data Service) and TASI (The Technical Advisory Service for Images) have produced a guide to creating digital resources:
  http://vads.ahds.ac.uk/guides/creating_guide/contents.html

and the section on copyright and rights management can be found at:
 http://vads.ahds.ac.uk/guides/creating_guide/sect21.html

The document does give a good introduction to the issues and gives advice on the drafting of licences. However (very sensibly) it includes the following disclaimer:

"These sections do not constitute legal advice. They contain the interpretations of the copyright law by the authors. No responsibility will be taken for the interpretation of these sections by a third party. For specific advice on individual copyright issues, a visit to a specialist copyright lawyer is always recommended".
http://vads.ahds.ac.uk/guides/creating_guide/disclaimer.html

The EARL policy paper by Sandy Norman on copyright in the networked environment is also worth a look:
http://www.earl.org.uk/policy/issuepapers/copyright.html

More information is linked from the VADS/TASI document's bibliography:
http://vads.ahds.ac.uk/guides/creating_guide/bibliography.html

and Catherine Seville's Cedars bibliography:
http://hcdt.oucs.ox.ac.uk/cedars/index.cfm


2. The technical standards document talks about certain institutions having access to additional resources by signing a licence committing them to non-commercial use (section 3.1.5). Who would be parties to these licences? Would these licences involve an exchange of money and if so between whom? Would the copyright owner be entitled to a fee for reproduction in the same way as with non-digital reproduction?

The principle is that the end user will be provided with access free at the point of use but that two issues have to be covered:

  • sustainability of service/provision
  • protection for Intellectual Property Rights.

There are a variety of models, of course, but a proven one involves using contracts, i.e. issuing a licence.

Let us define three parties. The Contributor is the body which owns the IPR in the resource. The Service Provider is the body which stores and makes available the resource. The User Institution is the body which accesses the Service Provider under licence.

The Contributor and the Service Provider could be the same body. However, if that is not the case, a licence is required setting out the conditions under which the Service Provider may make the material available.

There may also be a payment from the Service Provider to the Contributor in respect of the IPR to allow for non-profit, non-proliferation educational use from then on.

The User Institution will be licensed to use the resources by the Service Provider and may pay an annual fee to allow that access. The User Institution agrees in the licence to certain conditions, normally non-profit, non-proliferation use.

The User Institution then allows its user group (students, or those visiting a library or museum) to access the resources.

It is expected that, so long as the use is purely for non-profit, non-proliferation educational purposes, the Service Provider would not make a further IPR payment; the licence fee it charges is there merely to sustain the service.


3. Where can I find out more about Copyright?

Try the following URLs:
http://www.tasi.ac.uk/advice/managing/copyright_faq.html
http://vads.ahds.ac.uk/advice/copyright/
http://www.leeds.ac.uk/cedars/guideto/ipr/


4. Could you explain NOF's attitude to copyright ownership of bespoke software, custom databases and schemas developed for NOF-digi projects?

An explanation of NOF's IPR conditions, and of issues around Open Source systems, follows.

1. The NOF IPR conditions specify (page 4 under Definitions) that 'Material means any documentation or material (including without limitation software and databases) to be provided to the Fund etc..' This is further explained in the guidance letter (13 August 02 under 'Definitions' and under '2.2 Licences' on page 4) ...'in the case of material that is delivered with software or databases specially written for this project (including any adaptation of commercially available software or databases) The Fund would expect that any commercial exploitation would recognise the use of public funds in the generation of the material. Significant commercial exploitation might involve grant repayment.'

2. The IPR conditions give the Fund the right to use the materials developed for the programme but do not provide for any 'transfer' of IPR. This is an important difference. The Fund will not 'own' the IPR to materials created (including software) but through the conditions does have rights over the commercial exploitation of the material, as explained above.

3. If any grant holder is unclear on this point, or has any query regarding the terms and conditions of NOF grants, please contact NOF directly, either through your case manager or by email at digitisation@nof.org.uk.

4. If you are a supplier contracted to a grant-holder please address your queries to the grant-holder who will raise them with NOF where necessary.

5. Please note that neither the IPR conditions nor the Technical Standards conditions require a commitment to open source software, but we welcome that debate on the nof-jiscmail list as it raises awareness of an important issue.

5. Are there any constraints on projects which opt to sell goods or services from their website?

Summary

This information is not tendered as advice but does indicate which projects should most likely take note of regulations as they might apply to their situation. It also provides details on sources of information of possible use to projects.

This FAQ is potentially of importance to any project which opts to:

  • advertise goods or services online (i.e. via the Internet, interactive television or mobile telephone)
  • sell goods or services to businesses or consumers online

Requirements

If either of the above applies, then there are requirements which must be met in respect of three possible areas:

1 Information requirements, i.e. the information that must be provided to end-users. These requirements include providing your end users with:

  • the full contact details of your business including your geographic address as well as email address to enable direct and rapid communication with you
  • details of any relevant trade organisations to which you belong
  • details of any authorisation scheme relevant to your online business
  • your VAT number, if your online activities are subject to VAT
  • clear indications of prices, if relevant, including any delivery or tax charges.

The above requirements will probably apply to you if you sell or advertise goods or services online (i.e. via the Internet, interactive television or mobile telephone).

2 Commercial communications, i.e. essential identifications and explanations that must be provided to end-users, for example if a project markets via email. These requirements include providing your end users with:

  • clear identification of any electronic communications designed to promote (directly or indirectly) your goods, services or image (e.g. an e-mail advertising your goods or services)
  • clear identification of the person on whose behalf they are sent
  • clear identification of any promotional offers you advertise e.g. any discounts, premium gifts, competitions, games
  • clear explanation of any qualifying conditions regarding such offers
  • clear indication of any unsolicited commercial communications you send.

Note therefore that any form of electronic communication designed to promote your goods, services or image, such as an e-mail advertising your goods or services, must:

  • be clearly identifiable as a commercial communication
  • identify the person and/or organisation on whose behalf it is sent.

The above requirements will probably apply to you if you promote goods or services through any form of electronic communication (e.g. an e-mail advertising your goods or services).

3 Electronic contracting, i.e. information and explanations about the process of creating a contract electronically with an end-user. These requirements include providing your end users with:

  • a description of the different technical steps to be taken to conclude a contract online
  • an indication of whether the contract will be filed by your business and whether it can be accessed
  • clear identification of the technical means to enable end users to correct any inputting errors they make
  • an indication of the languages offered in which to conclude the contract.

The above requirements will probably apply to you if you enable end users to place orders online.

The requirements contained in the three categories above represent the basic situation. There may be other requirements in addition which can be ascertained from the sources of information given below.
Some exceptions do apply.

In conclusion the DTI guidance states:
"Action you may need to take:
If the Regulations apply to you, you may need to make textual or structural changes to the medium you use to advertise or sell your goods or services online, e.g. your website, in order to comply with the new requirements."

Sources of Information

The Electronic Commerce Directive (00/31/EC) & The Electronic Commerce (EC Directive) Regulations 2002 (SI 2002 No. 2013)
HTML source on DTI website.

Guidance on Electronic Commerce Regulations
HTML source on DTI website

Beginners Guide to the E-Commerce Regulations 2002
Word document on DTI website

Frequently Asked Questions on The Electronic Commerce (EC Directive) Regulations 2002
HTML source on DTI website

Contacts

1]
Paul Redwin
Bay 202
Department of Trade and Industry
151 Buckingham Palace Road
London SW1W 9SS
Telephone: +44 (0)20 7215 1853
Fax: +44 (0)20 7215 4161
Email: ecom@dti.gsi.gov.uk

2]
Trading Standards Offices

You will find the address and telephone number of your local Trading Standards Department for England, Scotland or Wales in the telephone book under "Local Authority" or on the Internet by visiting http://www.tradingstandards.gov.uk/ and entering your postcode.

The address for Northern Ireland is:
Trading Standards Service
Department of Enterprise, Trade and Investment
176 Newtownbreda Road
Belfast BT8 6QS
Tel: (028) 9025 3900
Fax: (028) 9025 3953
Email: tss@detini.gov.uk

3]
Office of Fair Trading
You can contact the Office of Fair Trading through its website, http://www.oft.gov.uk/ or at:
Office of Fair Trading
Fleetbank House
2-6 Salisbury Square
London EC4Y 8JX
Tel: (020) 7211 8000
Fax: (020) 7211 8800
Email: enquiries@oft.gov.uk

4]
Her Majesty's Stationery Office

To obtain copies of relevant Acts of Parliament and Statutory Instruments, you should contact Her Majesty's Stationery Office (HMSO) at their website address:
http://www.hmso.gov.uk
or phone HMSO's Regulations Unit on 020 7276 5216.


Hardware and Software


1. I am trying to find information on whether there is any OCR software that can cope reliably with 17th-19th century printed material, including material in columns. I would also like pointers to information on how existing OCR software would cope with 19th-century newspapers.

Although we do not have very much experience of individual products, most OCR software would still have problems recognising these types of text. Even apart from the likelihood of non-standard typefaces and awkward columns, most OCR software might have problems with background noise (e.g. print bleed-through or foxing) and non-standard characters. It's probably worth testing OCR software before rejecting it, as the main alternatives would be re-keying the whole text (horribly expensive) or just digital imaging.

The AHDS/OTA's guide to Creating and documenting electronic texts is worth a look:

http://ota.ahds.ac.uk/documents/creating/

Some other recent opinions:

From: Alan Morrison, Michael Popham and Karen Wikander, Creating and documenting electronic texts: a guide to good practice. Oxford: Oxbow Books, 2000 (forthcoming):

"... the first thing you must consider if you decide to use OCR for the text source is the condition of the document to be scanned. If the characters in the text are not fully formed or there are instances of broken type or damaged plates, the software will have a difficult time reading the material. The implications of this are that late 19th and 20th-century texts have a much better chance of being read well by the scanning software. As you move further away from the present, with the differences in printing, the OCR becomes much less dependable. The changes in paper, moving from a bleached white to a yellowed, sometimes foxed, background creates noise that the software must sift through. Then the font differences wreak havoc on the recognition capabilities. The gothic and exotic type found in the hand-press period contrasts markedly with the computer-set texts of the late 20th century. It is critical that you anticipate type problems when dealing with texts that have such forms as long esses, sloping descenders, and ligatures. Taking sample scans with the source materials will help pinpoint some of these digitizing issues early on in the project."

http://ota.ahds.ac.uk/documents/creating/chap3.html

From: Steven Killings, 'Optical Character Recognition.' Connect, Spring 1999:

"In the final analysis, when you hear 98 percent accuracy rates quoted for OCR software packages, consider that these were most likely accomplished using laser printed business documents, where the degree of variation among characters is significantly small and where the orientations of characters is fixed and regular. An OCR operation on an average nineteenth century imprint will almost certainly be completed with less exactness."

http://www.nyu.edu/acf/connect/spring99/HumOCRSp99.html
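
To put accuracy figures like the one quoted above in perspective, the following rough sketch converts a character accuracy rate into an expected number of errors per page. The page length of 2,000 characters and the lower accuracy figure of 90 percent are illustrative assumptions, not measurements from any product.

```python
def expected_errors(chars_per_page: int, accuracy: float) -> int:
    """Return the expected number of misrecognised characters on one page."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(chars_per_page * (1.0 - accuracy))


# Assumed page length; real pages vary widely, and newspaper pages are far longer.
CHARS_PER_PAGE = 2000

for accuracy in (0.98, 0.90):
    errors = expected_errors(CHARS_PER_PAGE, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors} errors per page")
```

Even the best-case figure leaves dozens of corrections per page under these assumptions, which is why sample scans of the actual source material are worth taking before committing to OCR.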

Older PC Magazine article on OCR Software (20 January 1998):

http://www.zdnet.com/pcmag/features/ocr/_intro.htm


2. What is the situation regarding servers?  Supplying video, for example, to many institutions simultaneously places huge demands upon UK Internet infrastructure as well as servers.  Should we be looking to host our servers at a high capacity Super Janet node or will other provisions be made?

At this stage the NOF are not going to provide central servers on high speed networks, and the like. It will be down to the project to make arrangements to have their content connected to the Internet at speeds sufficient to deliver it to users in a useful fashion. Thus, a project delivering high bandwidth video will probably need a more robust (and faster) connection to the Net than one delivering small static images. The extra costs of this connection will need to be laid out - and justified - in the business plan.

Connection via one of the bigger SuperJANET nodes is one possibility that projects with HE partners might pursue, provided their use falls within JANET's Acceptable Use Guidelines http://www.ja.net/documents/use.html


3. Can you clarify what you mean by a Content Management System?

The term Content Management System (CMS) is usually used to describe a database which organises and provides access to digital assets, from text and images to digital graphics, animation, sound and video. This type of product is relatively new and a few CMS products are available as off-the-shelf packages. They range from very basic databases to sophisticated tailor-made applications and can be used to carry out a wide range of tasks, such as holding digital content, holding information about digital content, publishing online and publishing on-the-fly.

For more information see http://www.ukoln.ac.uk/nof/support/help/papers/cms/

Do I need one for my project? Is a database sufficient?

The CMS provides mechanisms to support asset management, internal and external linking, validation, access control and other functionality. Typically, a CMS is built on an underlying database technology.

Content Management Systems range from very basic databases, to sophisticated tailor-made applications. They facilitate easier tracking of different parts of a Web site, enabling, for example, staff to easily see where changes have been made recently and - perhaps - where they might need to make changes (a 'News' page that hasn't been edited for 6 months?). They also ease the handling of routine updating/modifying of pages, where you want to change a logo or text on every page, for example.

A CMS can also simplify internal workflow processes and can ensure that you are working with a single master copy of each digital asset.

However there are other approaches which may be useable, such as making use of server-side scripting to manage resources.

Solutions may include:

Use of a dedicated CMS system.  Note this may be expensive, and there may be costs in learning the system, using it, etc. In addition you should ensure that an 'off-the-shelf' CMS product supports the metadata standards one might expect to use.

Use of an open source CMS system.  This avoids licence costs, but there are still resource issues.

Use of a database.  May manage the resources but will it address issues such as workflow?

Use of server-side scripting approaches, such as PHP (Unix) and ASP (NT). These may allow bespoke applications to be developed, and may sit on top of databases.

To summarise, the issue to be aware of is the difficulty of maintaining resources in formats such as HTML. Using flat files and a CMS and/or database is a way of addressing this management issue. Whilst it is not an explicit requirement that projects manage their resources with a CMS and/or a database, if such tools are not used, the project must show how it intends to facilitate good management of its digital assets.
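
As a small illustration of the maintenance problem, the sketch below generates pages from one shared template, so that a logo change is made in a single place rather than on every page. It uses only the Python standard library; the page names, logo filename and template are hypothetical, and a real CMS or server-side scripting setup would do considerably more.

```python
from string import Template

# One shared layout; every page is produced from it.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    '<body><img src="$logo" alt="Project logo"><h1>$title</h1>$body</body></html>'
)

# Hypothetical page content, kept separately from the layout.
PAGES = {
    "index.html": {"title": "Home", "body": "<p>Welcome.</p>"},
    "news.html": {"title": "News", "body": "<p>Latest updates.</p>"},
}


def render_site(logo: str) -> dict:
    """Render every page with the given logo and return {filename: html}."""
    return {
        name: PAGE_TEMPLATE.substitute(logo=logo, **fields)
        for name, fields in PAGES.items()
    }


site = render_site("logo-2003.png")
```

Calling render_site again with a different logo filename regenerates every page consistently, which is the kind of routine update that becomes error-prone when each HTML file is edited by hand.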


4. Can you provide further information on selecting scanners and digital cameras?

Advice on selecting scanning equipment can be found in the Digitisation Process information paper available at:
http://www.ukoln.ac.uk/nof/support/help/papers/digitisation.htm

Suitable resolutions for digital master files for various media types are discussed in the HEDS Matrix [3], and the JIDI Feasibility Study [7] contains a useful table of baseline standards of minimum values of resolutions according to original material type.

A detailed discussion of resolution, binary and bit depth can be found on TASI's Web pages [8] and a good basic guide to colour capture can also be found on the EPIcentre Web pages [9].

References:
[3] The HEDS matrix of potential cost factors
http://heds.herts.ac.uk/resources/matrix.html
[7] A feasibility study for the JISC Image Digitisation Initiative (JIDI)
http://heds.herts.ac.uk/resources/papers/jidi_fs.html
[8] TASI: The digital image
http://www.tasi.ac.uk/advice/creating/image.html
[9] The art and science of digital imaging
http://www.epi-centre.com/basics/basics2.html

Both the HEDS and TASI sites also contain useful information on equipment, particularly at:
http://www.tasi.ac.uk/advice/creating/camera.html


5. Which is better SQL Server or an Access Database?

MS Access was designed as a database system for small scale office use. It was not designed for use as a database server, although it can be used in this mode for simple use.

For further information we suggest you look at the Microsoft Web site.

Some comments on Access vs SQLServer are available at:
http://webmasterworld.com/forum23/516.htm

You may also wish to look at:

http://www.jiscmail.ac.uk/cgi-bin/wa.exe?A2=ind0002&L=web-support&P=R9178

http://hotwired.lycos.com/webmonkey/backend/databases/tutorials/tutorial1.html

Although Access may be capable of handling the sorts of query volume you suggest, at least in the short term, you do need to consider scalability (SQLServer scales better), Web site integration (SQLServer *probably* integrates better), enterprise access to the data (SQLServer will better enable intranet access to the data, etc).

Data structures are unlikely to be affected by a move to SQLServer.


6. Would NOF recommend the web site to be hosted on its own dedicated server or on a shared server? What are the things I should bear in mind to take such a decision?

The issues to be considered in making this decision relate to performance, security and potential conflicts between software applications:

Performance - with a shared server, projects will need to ensure that the performance of their service is not impacted by the other things the server is doing.  Peak access times (of both the project's service and the other services on the shared machine) need to be considered.  Projects need to think about the performance of the server itself, as well as available network bandwidth to the machine.

Security - as a general rule, the more services offered by a machine, the harder it is to make that machine secure.  Projects should ensure that any machine they use is operated in a secure manner.

Conflicts - on a shared server there are more likely to be software conflicts, e.g. to run package X, package Y needs to be installed, but this conflicts with package Z that is already installed for some other service.  There are also issues associated with hosting more than one Web server on a single machine.  Typically these are resolved by hosting multiple 'named virtual hosts' (though under some operating systems it is also possible to assign multiple 'virtual' IP addresses to a single network interface, or to install multiple network interfaces).  Where 'named virtual hosts' are used it should be noted that the browser must support HTTP 1.1. However, this is not a significant problem.  The Apache manual says:

 "The main disadvantage [of name-based virtual hosts] is that the client  must support this part of the protocol. Almost all browsers do, but there  are still tiny numbers of very old browsers in use which do not."

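Name-based virtual hosting depends on HTTP 1.1 because the browser sends the site name in a 'Host' header, and the server uses that name to choose which site to serve. The sketch below imitates that selection step only; the host names and directory paths are hypothetical, and real servers such as Apache do this through configuration rather than application code.

```python
# Hypothetical mapping from site name to the directory holding that site's files.
VIRTUAL_HOSTS = {
    "www.project-one.example": "/var/www/project-one",
    "www.project-two.example": "/var/www/project-two",
}
DEFAULT_ROOT = "/var/www/default"


def document_root(request_headers: dict) -> str:
    """Pick the document root from the HTTP 1.1 Host header.

    Header names are case-insensitive, and the value may carry a port
    number (for example "www.project-one.example:8080").
    """
    headers = {name.lower(): value for name, value in request_headers.items()}
    host = headers.get("host", "").split(":")[0].strip().lower()
    # A client that sends no Host header falls through to the default site.
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
```

A very old browser that omits the header would always receive the default site in this sketch, which is the limitation the Apache manual describes.
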

7. Should the web server be protected by a firewall? (Or is it enough to have a firewall installed on our office network server where we will store all our digital mastercopies?)

As the technical standards state (section 3.1.4):

"Machines should be placed behind a firewall if possible, with access to the Internet only on those ports that are required for the project being delivered."

This applies to all machines used to deliver the project.  Projects are strongly encouraged to protect all machines used to deliver material (Web servers and back-end master storage servers) with a firewall.


8. Should we require back ups from the web hosting organisation? (We will have all back ups of mastercopies on our own network server in the office.)

Projects will need to be able to recover their Web service in the event of server failure, disk failure, or malicious hacking.  Backups therefore need to be taken of all files that need to be restored to recover a service.  I would anticipate that, in most cases, this means that projects will need to take backups of more than just master copies.
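
A minimal sketch of backing up more than the master copies follows. It bundles several files and directories into one compressed archive using Python's standard tarfile module; the names in the usage comment (site pages, configuration, a database export) are hypothetical examples of what a project might need to restore, and scheduling and off-site storage are left out.

```python
import tarfile
from pathlib import Path


def make_backup(sources: list, archive_path: str) -> list:
    """Write the given files and directories into a gzip-compressed tar archive.

    Returns the sources that were missing, so they can be investigated
    rather than silently skipped.
    """
    missing = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for source in sources:
            path = Path(source)
            if path.exists():
                # Directories are added recursively by default.
                archive.add(str(path), arcname=path.name)
            else:
                missing.append(source)
    return missing


# Hypothetical usage: pages, server configuration and a database export.
# missing = make_backup(["htdocs", "conf", "db-export.sql"], "site-backup.tar.gz")
```

Whatever tool is used, it is worth restoring from a backup occasionally to confirm that the service can in fact be rebuilt from it.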


9. What type of connection and recommended speed should the web server have to the local ethernet (eg 100Mbits/sec)?

It is not possible to give a single answer to this kind of question. The issue is ensuring sufficient bandwidth, given anticipated levels of traffic. Traffic levels will depend on the number of users and the kind of material being accessed. A project that anticipates 10 concurrent users accessing text-based material will have significantly lower bandwidth requirements than a project anticipating 100 concurrent users accessing streaming video.

Access performance should be 'reasonable' for all resources served by the project, but it is difficult to provide guidance currently on what 'reasonable' means. Available bandwidth at the server end of less than 56K for any individual end-user is likely to have an impact on their perception of server performance. Image or video projects will probably want to aim much higher than this. Available bandwidth is total bandwidth divided by the total number of users (but remember to allow for bandwidth being used for other things, e.g. some public library/council networks will have bandwidth reserved for CCTV, administrative computing and so on).
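
The rule of thumb above can be worked through as a short calculation. The link speed, reserved share and user counts below are illustrative assumptions only, and the 56K figure is taken here to mean 56 kbit/s, the speed of a dial-up modem.

```python
def per_user_kbits(link_kbits: float, reserved_kbits: float, users: int) -> float:
    """Bandwidth available to each concurrent user, in kbit/s."""
    if users <= 0:
        raise ValueError("users must be a positive number")
    return (link_kbits - reserved_kbits) / users


LINK_KBITS = 2000       # assumed 2 Mbit/s connection
RESERVED_KBITS = 500    # assumed share set aside for other traffic, e.g. CCTV
THRESHOLD_KBITS = 56    # assumed reading of the "56K" figure

for users in (10, 100):
    share = per_user_kbits(LINK_KBITS, RESERVED_KBITS, users)
    verdict = "above" if share >= THRESHOLD_KBITS else "below"
    print(f"{users} users: {share:.0f} kbit/s each ({verdict} the threshold)")
```

Under these assumptions the same connection is comfortable for 10 concurrent users but falls well short for 100, before the heavier demands of image or video delivery are considered.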


10. What kinds of software are being used across the NOF-digitise programme?

As part of the Technical Evaluation process run by BECTa, the software that each project is using is recorded.
These reports are available here:
Software survey with consolidated software versions and comments - MS DOC FORMAT (83KB)
Software survey with consolidated software versions and comments - HTML FORMAT (19KB)


11. What are the hardware and software issues involved in creating digital material for community languages?

The NOF technical standards and guidelines require that all text is encoded in a way that makes it compatible with Unicode UTF-8. This allows for the simultaneous use of languages that deploy different (e.g. Roman and non-Roman) character sets, including many of the community languages being used by NOF-digi projects. Project managers need to be aware of what hardware / software is required in order to use Unicode. Basic information on Unicode is available from
http://www.unicode.org/unicode/standard/WhatIsUnicode.html
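
As a brief illustration of why a single encoding helps, the sketch below stores text in Roman and non-Roman scripts in one UTF-8 byte sequence and reads it back unchanged. The sample words are arbitrary examples chosen for this sketch.

```python
# Sample greetings in three scripts (arbitrary illustrative text).
samples = {
    "English": "Welcome",
    "Urdu": "خوش آمدید",
    "Bengali": "স্বাগতম",
}

combined = " / ".join(samples.values())

# Encoding turns the text into bytes; UTF-8 can represent all three scripts.
encoded = combined.encode("utf-8")

# Decoding with the same encoding recovers the original text exactly.
decoded = encoded.decode("utf-8")
assert decoded == combined

# Roman letters take one byte each in UTF-8; the other scripts take more,
# so character counts and byte counts differ for non-Roman text.
for language, text in samples.items():
    print(language, len(text), "characters,", len(text.encode("utf-8")), "bytes")
```

Displaying the text correctly is a separate matter from storing it: the fonts and browser settings discussed in this answer are still needed.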

Windows 2000 and XP currently both support Unicode, whereas earlier versions do not. However, *applications* running on the earlier Windows operating systems can still support Unicode.

The upcoming Mac OS 10.2 is alleged to have better Unicode support than previous versions. For information on Unicode and previous Mac operating systems see:
(for OS9) http://www.alanwood.net/unicode/fonts_mac.html
(for OS10) http://www.alanwood.net/unicode/fonts_macosx.html

Some web browsers are better than others at reading community language scripts, but any browser which claims to support HTML4 should be able to support Unicode. Overall, Mozilla, the open-source browser, is the preferred choice, followed by Netscape Navigator, then Internet Explorer. See http://www.alanwood.net/unicode/browsers.html for more details on how browsers need to be configured to read Unicode.

It is necessary to obtain a Unicode font in order to display the different character sets. For a list of all the different Unicode fonts, Alan Wood's site is again a good source of information (see http://www.alanwood.net/unicode/fonts.html). Often a Unicode font comes embedded within particular applications. Many PCs have Microsoft's Arial Unicode font installed along with their copy of Microsoft Office 2000. Those without Office 2000 used to be able to download this font for free, but the font was removed from the Microsoft website in August 2002, leaving no suitable free Unicode font in existence. For a further discussion on this see http://slashdot.org/comments.pl?sid=38224&cid=4092943.

Many developers employing Unicode, however, prefer to use one of two software packages which act as multi-character set text editors. These come with their own rudimentary Unicode fonts. Unipad, available for free, can be downloaded from http://www.unipad.org/, with versions for Windows 95 and above. A trial version of Uniedit, which should run on Windows 3.1 and above, is available from http://www.humancomp.org/uniintro.htm. Both programs cater for built-in keyboards and a wide variety of character sets - although some of these sets require further downloading from the relevant websites.



 
UKOLN is funded by MLA, the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the Higher and Further Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.

This page is part of the NOF-digi technical support pages http://www.ukoln.ac.uk/nof/support/.
The Web site is no longer maintained and is hosted as an archive by UKOLN.
Page last updated on Monday, May 09, 2005