UBB.Dev
Posted By: RandyM Spider hack warning - 01/23/2004 8:06 AM
Maybe warning is a bit strong, let's say a bit of advice to people who have added the spider hack to their boards.

In the text file with instructions for implementing the spider bait hack you are given a warning about the possible load that it can place on your server during the time your boards are being crawled. This is a serious issue for large boards on shared servers. A second warning covers the impact on your boards if you don't have the accelerator enabled.

Please listen to these warnings. I've helped more than one site troubleshoot problems that ended up being caused by running this hack on a shared server without enabling the accelerator. UBB already has a reputation for being a resource hog and a lot of hosting companies no longer allow it as part of their less expensive plans.

We all know that UBB Classic is a great product. The spider bait hack is a must have for anyone that wants their forum content to be searchable. Just remember that it's easier to set your site up properly than it is to have to troubleshoot problems later or find a new host because they booted you for using more than your share of the server resources.

I've also seen cases where the server crashes with a ton of perl processes.
Posted By: usr bin geek Re: Spider hack warning - 01/23/2004 6:09 PM
.classic isn't as much of a resource hog as it used to be with earlier 6.x versions, but I do agree: you must have the accelerator enabled, and you also need to consider the potential problems when enabling the spider hack on an over-shared shared server.

Good post. I'm featuring this topic.
Posted By: Charles Capps Re: Spider hack warning - 01/23/2004 7:38 PM
Any script has the potential to be a resource hog when malicious spiders come crawling along.

Google, Inktomi, and others intentionally limit the crawling rate in order to prevent hitting the server too hard, even for static pages. Not all spiders are as nice. A single malicious spider can easily make a hundred requests a minute, which can promptly bring any server to its knees.
Posted By: Gizmo Re: Spider hack warning - 01/24/2004 3:10 AM
Agreed, I've seen some serious impact on some boards running the spider hack when spiders come along. A lot of hosts don't like it when that happens nowadays, due to spiderable URLs in the beta...

Perhaps look at a new host before you enable spidering... HostNuke hosts both of my boards and Al's, and we both have spidering enabled... Works like a dream...
Posted By: REAMERE Re: Spider hack warning - 01/27/2004 10:11 AM
What do you think of LunarPages in this regard?

I've heard (and seen a little) that their servers do bow to spiders at times.
Posted By: Gizmo Re: Spider hack warning - 01/28/2004 1:32 AM
I never really liked LunarPages... The only host I like anymore is HostNuke; their systems support my board when 3 other providers turned me away...
Posted By: REAMERE Re: Spider hack warning - 01/28/2004 11:20 PM
Hmmm, good to know, thanks.
I use LunarPages (obviously) for all of my clients and I really have no complaints. Of course, I have been on worse providers. Oh...the stories I could tell!

But DrkKnight and I just moved our boards over to Lunar and are doing well. I do about 3000 unique visits per month, but he does well over 20,000 (crazy traffic). He's using Ubb.threads now and I'm using classic; again, no complaints.

But we do not have the spider hack installed. I was wondering if anyone had any experience with UBB and the Spider Hack on Lunar Pages as a host.

Thanks for the feedback
Posted By: Gizmo Re: Spider hack warning - 01/29/2004 1:49 AM
We were using a spider mod when on there, see above, they didn't like it lol...
Posted By: REAMERE Re: Spider hack warning - 01/29/2004 3:53 AM
Ahhh, I didn't realize you were using them.
I suppose that either the server crashed or they told you that they were dropping the account?
Either way, thanks, good to know, because we were both thinking of using the hack.
Posted By: Ian Spence Re: Spider hack warning - 01/29/2004 3:57 AM
If you wait a week or two, the next version will have it, so no need to alter your boards.

As for DrkKnight, I believe that threads 6.5 will also have it.
Posted By: Gizmo Re: Spider hack warning - 01/29/2004 4:28 AM
Correct, as Al says, no need to use a hack when it's standard in a few weeks in 6.7.

And I was running my board there and they told me that I was receiving too much traffic and creating too much of a load on their systems.

So I simply moved to HostNuke and never looked back...
Posted By: Ian Spence Re: Spider hack warning - 01/29/2004 5:54 AM
OK, I'm now taking bets on how many times Gizzy is gonna mention Hostnuke in this thread.
Posted By: Gizmo Re: Spider hack warning - 01/29/2004 6:02 AM
you counting my sig in this count? lol...
Posted By: REAMERE Re: Spider hack warning - 01/29/2004 7:57 PM
Heheh tipsy
Now you say it's gonna be standard, but will it be the same hack? Meaning, will as many pages be spidered, or will it be kind of like a Spider Hack Light?
My obvious concern is whether upgrading to 6.7 Classic and 6.5 threads, respectively, will cause major problems with our current host.

You know how much effort it is moving boards around. (you know, to places like HOSTNUKE wink )

Throw me your opinions and advice please!
Posted By: Ian Spence Re: Spider hack warning - 01/30/2004 12:16 AM
The one in the base code is written by Charles Capps, who wrote the mod. And now that it's built in, it's much better, with .html extensions for spiders that require them.
Posted By: REAMERE Re: Spider hack warning - 01/30/2004 12:54 AM
Nice. Thanks guys for all of the quick answers.
So basically we should:
1.) Make sure the PHP Accelerator is on and functional
2.) Monitor the Apache Servers during peak hours (among other times) to make sure they aren't getting overloaded.
3.) Consume several Gin & Tonics

Am I missing anything??
Posted By: Gizmo Re: Spider hack warning - 01/30/2004 1:11 AM
Don't forget the vodka; other than that, it's lookin good.
Posted By: Ian Spence Re: Spider hack warning - 02/02/2004 4:18 AM
I found a website with an .htaccess that blocks most malicious spiders.

Code
# source: http://www.webmasterworld.com/forum13/687-9-15.htm
# some of these are commented out, they are possibly legitimate download agents
# if you don't want anyone downloading your sites, uncomment them
# spaces inside a pattern are escaped with a backslash so that
# Apache does not read the rest of the name as the flags argument

RewriteEngine On
RewriteCond %{HTTP_REFERER} q=Guestbook [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
# the address was redacted in this copy of the list; match on the prefix only
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto: [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GornKer [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Irvine [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} dloader(NaverRobot) [OR]
#RewriteCond %{HTTP_USER_AGENT} ^puf [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SearchExpress [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg

RewriteRule ^.* - [F,L]
Posted By: REAMERE Re: Spider hack warning - 02/03/2004 2:31 AM
Nice find. I'll be adding that information to mine.
Excellent site also, lots of good info.

Question: Is there any way to block spiders from within any of the .cgi or .pl files?

Thanks in advance
Posted By: Ian Spence Re: Spider hack warning - 02/03/2004 4:50 AM
just disallow access to the cgi-bin and the noncgi/templates folder
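
One way to read that advice is as a robots.txt entry. The following is only a rough sketch: the paths /cgi-bin/ and /noncgi/templates/ are assumptions taken from the folder names mentioned here, so adjust them to wherever your board is actually installed.

Code
# robots.txt, served from the root of the site
# Paths below are assumed, not taken from any particular install.
User-agent: *
Disallow: /cgi-bin/
Disallow: /noncgi/templates/

Keep in mind that only well-behaved spiders read and honor robots.txt. Malicious ones ignore it, which is where a user-agent block list in .htaccess like the one earlier in this thread comes in.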
Posted By: REAMERE Re: Spider hack warning - 02/03/2004 7:05 PM
Thanks again Al. Can't vote for ya again, but you have my thanks!
© UBB.Developers