Administrators may wish to restrict access by automated robot software to web resources for a variety of reasons:
- To prevent resources from being indexed
- To minimise load on the web server
- To minimise network load
The Robot Exclusion Protocol is a voluntary convention: a set of rules which well-behaved robot software should obey.
A robots.txt file located in the root of the web server can contain information on:
- Areas which robots should not access
- Particular robots which are not allowed access
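A minimal robots.txt file illustrating both kinds of rule might look as follows (the robot name and directory paths are illustrative assumptions, not recommendations):

```
# Exclude one named robot from the whole site
# ("BadBot" is a hypothetical robot name)
User-agent: BadBot
Disallow: /

# All other robots: keep out of the script and image areas
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
```

Note that rules are grouped by `User-agent` line, and a robot obeys the first group whose `User-agent` matches it.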
Some issues to be aware of:
- Prohibiting robots will mean that web resources will not be found on search engines such as AltaVista
- Restricting access to all but the main search engine robots may mean that valuable new services cannot index the resources
- Even a minimal robots.txt file can have performance benefits: if the file is absent, every visiting robot's request for it fails and is recorded in the server's error logs
- It may be desirable to restrict access to certain areas, such as cgi-bin and images directories.
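One way to check how a given set of robots.txt rules would be interpreted is Python's standard-library `urllib.robotparser` module. The sketch below parses the rules from a string rather than fetching them over the network, so it is self-contained; the robot name and paths are illustrative assumptions:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules restricting the cgi-bin and
# images directories for all robots.
rules = """
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved robot would consult these rules before fetching.
print(parser.can_fetch("ExampleBot", "/cgi-bin/search"))  # False
print(parser.can_fetch("ExampleBot", "/welcome.html"))    # True
```

In practice a robot would call `parser.set_url("http://example.org/robots.txt")` followed by `parser.read()` to fetch the live file instead of parsing a string.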
WebWatch Hosts A robots.txt Checker Service