If you have installed this hack and you like it, but you don't want some sections of your UBB to be crawled, then you can use the following:
Robots.txt
When a robot visits a web site, say https://ubbdev.com/, it first checks for https://ubbdev.com/robots.txt. If it finds this file, it analyzes its contents to see whether it is allowed to retrieve documents from the site. You can customize the robots.txt file to apply only to specific robots, and to disallow access to specific directories or files.
Here is a sample robots.txt file that prevents all robots from visiting the entire site:
User-agent: * # applies to all robots
Disallow: / # disallow indexing of all pages
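If you only want to keep robots out of certain sections of your UBB rather than the whole site, a sketch like this would do it (the directory names below are only examples; replace them with the actual paths of your own UBB installation):
User-agent: *              # applies to all robots
Disallow: /cgi-bin/ubb/    # example: keep robots out of the UBB scripts
Disallow: /ubb/private/    # example: keep robots out of a private forum area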
The robot will simply look for a "/robots.txt" URI on your server, where a site is defined as an HTTP server running on a particular host and port number. Here are some sample locations for robots.txt:
Site URL: https://ubbdev.com/
URL for robots.txt: https://ubbdev.com/robots.txt
Site URL: https://ubbdev.com:80/
URL for robots.txt: https://ubbdev.com:80/robots.txt
There can only be a single "/robots.txt" on a site. Specifically, you should not put "robots.txt" files in user directories, because a robot will never look at them. If you want your users to be able to create their own "robots.txt", you will need to merge them all into a single "/robots.txt". If you don't want to do this, your users might want to use the Robots META tag instead.
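For example, a user who cannot edit "/robots.txt" could place the following tag in the <head> section of a page to keep it out of search engines (this is standard HTML and works on any page, not just UBB):
<meta name="robots" content="noindex,nofollow">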
Some tips: URLs are case-sensitive, and the "/robots.txt" string must be all lower-case. Blank lines are not permitted within a single record in the "robots.txt" file.
There must be exactly one "User-agent" field per record. The robot should be liberal in interpreting this field. A case-insensitive substring match of the name without version information is recommended.
If the value is "*", the record describes the default access policy for any robot that has not matched any of the other records. It is not allowed to have multiple such records in the "/robots.txt" file.
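For example, the following file contains two records: one that applies only to a robot whose name contains "WebCrawler", and a default record for every other robot (the robot name and the paths are just for illustration):
User-agent: WebCrawler
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/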
The "Disallow" field specifies a partial URI that is not to be visited. This can be a full path, or a partial path; any URI that starts with this value will not be retrieved. For example,
Disallow: /help disallows both /help.html and /help/index.html, whereas
Disallow: /help/ would disallow /help/index.html but allow /help.html.
An empty value for "Disallow" indicates that all URIs can be retrieved. At least one "Disallow" field must be present in each record.
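For example, the following record lets a robot named "Googlebot" retrieve everything on the site, while other records in the same file could still restrict other robots (the robot name is only an illustration):
User-agent: Googlebot
Disallow: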
A good example of a robots.txt file is:
#
# robots.txt for http://www.djwebpages.com/
#
# $Id: robots.txt,v 1.22 2003/09/11 20:23:04 ted Exp $
#
# For use by search.w3.org
User-agent: W3Crobot/1
Disallow: /Out-Of-Date
# AltaVista Search
User-agent: AltaVista Intranet V2.0 W3C Webreq
Disallow: /Out-Of-Date
# exclude some access-controlled areas
User-agent: *
Disallow: /Images
Disallow: /Privat
Disallow: /cgi-bin/linkexchange/
Disallow: /Web
Disallow: /History
Disallow: /Out-Of-Date
Disallow: /2003/09/mid
Disallow: /People/all/
To create your own fast and secure robots.txt you can use RobotPack:
- 100% Freeware, No Spyware, No Sponsorware and No Adware
- Create robot-exclusion files by selecting documents and directories.
- Log into FTP servers and upload ROBOTS.TXT from RobotPack.
- Manage projects for multiple web sites.
- RobotPack comes with the Open Robots.txt Directory (ORD).
- Ability to edit the robot database and add additional user-agents.
Download RobotPack
Note: This has not been tested at UBB yet, but I am almost 100% sure that it will work.