UBB.Dev
Posted By: isaac [7.x] Robots (Crawlers) List for UBB.threads - 06/25/2014 11:45 AM
Robots (Crawlers) List for UBB.threads - UPDATED 2022-07-10
formerly named "Search Engine Spiders List for UBB.threads"

About:
The advantage to using this list is that Robots (Crawlers and Search Engine Spiders) get put into the correct "Robots (Crawlers)" group when viewing your forum's Who's Online page.

This translation list is not used anywhere else in UBB.threads except on the Who's Online page at /ubbthreads.php/online. Having a long list of Robots will not slow down your forums.

Details:
Robots List: 2022-07-10
Total Agent Strings: 912
Sources:
https://udger.com/resources/ua-list/crawlers (2022-07-10) and
https://github.com/monperrus/crawler-user-agents (2022-05-03)
I've simply converted, combined, and cleaned the data for use with UBB.threads.
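
For anyone who wants to merge their own custom entries with a newer list, here is a minimal Python sketch of one way to combine and de-duplicate lists. It is not the process used to build the attached files. It assumes every input is already in the "Name=AgentString" format, and the function name combine_lists and the rule of keeping the first entry seen for each agent string (compared case-insensitively) are illustrative choices only.

```python
def combine_lists(*lists):
    """Merge several 'Name=AgentString' lists, keeping the first entry per agent string.

    Assumption: two entries are duplicates when their agent strings match
    case-insensitively. This is an illustrative rule, not UBB.threads behavior.
    """
    seen = set()
    combined = []
    for text in lists:
        for line in text.splitlines():
            line = line.strip()
            if "=" not in line:
                continue  # skips blank lines and anything that is not Name=AgentString
            name, agent = (part.strip() for part in line.split("=", 1))
            key = agent.lower()
            if not name or not agent or key in seen:
                continue
            seen.add(key)
            combined.append(f"{name}={agent}")
    return combined


if __name__ == "__main__":
    old_list = "bingbot=bingbot\nGooglebot=Googlebot"
    new_list = "Bing=BingBot\nBaiduspider=Baiduspider\n"
    print("\n".join(combine_lists(old_list, new_list)))
```

Lists passed first take priority, so put the list whose names you prefer at the front.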

How to install:
1) Go to Control Panel > Display Options > Who's Online Settings.
2) Copy/paste the text from the newest list into the "Robots (Crawlers)" box at the bottom of the page.
3) Click the "Update General Display Options" button.
done.

Notes:
Always use the newest list. Older lists contain old robots and inactive crawlers.
Installs prior to 7.6.2 did not include a list of robot agents.
robots_20141014-UBBT762.txt Fresh Installs of UBB.threads 7.6.2 to 7.7.3 are pre-populated with this list.
robots_20200114-UBBT774.txt Fresh Installs of UBB.threads 7.7.4 are pre-populated with this list.
robots_20200924-UBBT775.txt Fresh Installs of UBB.threads 7.7.5 are pre-populated with this list.
robots_20220710-UBBT800.txt Fresh Installs of UBB.threads 8.0.0+ are pre-populated with this list.
robots_20220710-CATEGORIES.txt is the same robots list as the stock list, plus it includes a category name for many of the robots, such as Search, Marketing, Monitoring, Link Checker, Tool, etc.

Having problems using this list on an older version of UBB.threads? Remove any blank lines from the top/bottom of your copied list.
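
If you would rather clean the copied text before pasting it, a small sketch like the following strips blank lines from the top and bottom only. The function name trim_blank_edges is made up for this example, and nothing here is part of UBB.threads itself.

```python
def trim_blank_edges(text):
    """Remove blank (or whitespace-only) lines from the start and end of the text.

    Lines in the middle of the list are left exactly as they are.
    """
    lines = text.splitlines()
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)


if __name__ == "__main__":
    copied = "\n\nbingbot=bingbot\nGooglebot=Googlebot\n\n"
    print(trim_blank_edges(copied))
```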


Description: Fresh Installs of UBB.threads 7.6.2 to 7.7.3 are pre-populated with this
Attached File
robots_20141014-UBBT762.txt  (0 downloads)

Description: Fresh Installs of UBB.threads 7.7.4 are pre-populated with this
Attached File
robots_20200114-UBBT774.txt  (0 downloads)

Description: Fresh Installs of UBB.threads 7.7.5 are pre-populated with this
Attached File
robots_20200924-UBBT775.txt  (3 downloads)

Description: Fresh Installs of UBB.threads 8.0.0+ are pre-populated with this
Attached File
robots_20220710-UBBT800.txt  (3 downloads)
Posted By: isaac Re: [7.x] Robots (Crawlers) List for UBB.threads - 06/25/2014 12:12 PM
Although the source information for you to create your own updates and conversions is in the OP, I plan on updating this post every couple of months, making your job as an UBBT forum admin much easier.

---
EDIT: user-agent-string.info no longer provides a list of user agent strings without a subscription. This means that until another resource for this data is found, this list will remain as it currently is.
Posted By: Bill B Re: [7.x] Robots (Crawlers) List for UBB.threads - 06/25/2014 12:47 PM
This is absolutely awesome!!! Many, many thanks from all of us.
This is superb. I only ever had about 6 lines in there. That's so comprehensive as to be unbelievable.

Thank you.
This list has been updated 2014-10-14.
Thanks again.. This really makes the bot display more accurate. I just saw some bots that were from a company that I trusted... ouch.
In my "anonymous" list, I see a number of IPs like this: 157.55.39.xxx. Hovering over the "i" icon it shows:
Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

The list above has bingbot entries, and I added one from an older list, so I have this in my search engine agents list:
Code
bingbot=bingbot
bingbot=bingbot/2.0
bingbot=bingbot SitemapProbe
BingPreview=BingPreview
In spite of that, bingbot continues to stay in the anonymous group. Is there a bug, or something I can do to fix it?
bingbot=bingbot
bingbot=bingbot/2.0
bingbot=bingbot SitemapProbe

Are all the same thing.

USAGE: SearchEngineName=AgentBotString

The "AgentBotString" on the right side of the equals sign is searched for anywhere within the visitor's whole "Agent" string, and a match returns the "SearchEngineName" on the left side.

So basically, if you only list "bingbot=bingbot", you will cover all the other variations of "AgentBotString" for Microsoft's Bing spider/bot.
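
To illustrate the matching rule described in this post, here is a rough Python sketch. It is not the UBB.threads code: the function names parse_robot_list and classify are invented for the example, and the case-insensitive comparison is an assumption, since the thread does not say how UBB.threads handles letter case.

```python
def parse_robot_list(text):
    """Turn 'SearchEngineName=AgentBotString' lines into (name, agent_string) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # ignore blank or malformed lines
        name, agent = (part.strip() for part in line.split("=", 1))
        if name and agent:
            rules.append((name, agent))
    return rules


def classify(user_agent, rules):
    """Return the name of the first rule whose agent string appears in user_agent.

    Assumption: matching is a case-insensitive substring search.
    Returns None when no rule matches (the visitor would stay a guest).
    """
    ua = user_agent.lower()
    for name, agent in rules:
        if agent.lower() in ua:
            return name
    return None


if __name__ == "__main__":
    rules = parse_robot_list("bingbot=bingbot\nBingPreview=BingPreview")
    agent = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    print(classify(agent, rules))
```

Because the search is a substring match, the single "bingbot=bingbot" rule already matches the "bingbot/2.0" agent shown earlier in the thread, which is why the extra entries add nothing.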
Ok, thanks. "Bing" shows up on Who's online on this forum, so why do they show as anonymous users on mine? (I'm at v 7.5.8, will move to 7.5.9 soon.)
If an unregistered user arrives at your site through a search using Bing, they will be classified as anonymous. If your site is being crawled by bingbot (not a live person), it should be classified and shown within the spider section.

Sounds like you have visitors finding your site through Bing. This is good!
Originally Posted by id242
Sounds like you have visitors finding your site through Bing. This is good!
I am pretty sure that is not the case.
My Who's Online shows these anonymous guests:
157.55.39.120
157.55.39.9
157.55.39.232
157.55.39.218
157.55.39.231
157.55.39.224
Each one shows bingbot when I hover over the "i" icon, and the Referrer: part is blank.

FWIW, 157.55.39 is owned by Microsoft, just run the IP's through Domain Tools
Originally Posted by Gizmo
FWIW, 157.55.39 is owned by Microsoft, just run the IP's through Domain Tools
Yes, Microsoft bingbot. Which brings me back to the original question. Why are six of those showing up as anonymous guests in my Who's Online list?

I would like to move them down into the Search Spiders section, but that is not happening. Any recommendations?
Posted By: Gizmo Re: [7.x] Robots (Crawlers) List for UBB.threads - 06/14/2015 11:24 PM
Well, if the referrer listing is blank, there isn't much you can do; as the WoL system just parses the spider data based on what is being supplied by bots IN the referrer variable.
Originally Posted by Gizmo
Well, if the referrer listing is blank, there isn't much you can do; as the WoL system just parses the spider data based on what is being supplied by bots IN the referrer variable.
The referrer listing is blank on all the Search Spider entries, and is blank on all the bingbot "guest" entries. So I don't understand how the referrer being blank causes it to show up in the guest area.

I do appreciate your taking the time to post in this dialog. I am mostly a "grasshopper" here.
Posted By: Gizmo Re: [7.x] Robots (Crawlers) List for UBB.threads - 06/15/2015 11:40 AM
I talked with Isaac last night and evidently when the user is viewing a "cached result" from Bing there is no referrer variable passed as it's not "BingBot", but a user that's requesting data through a Bing server.
You're giving Bing waaay too much credit. Right now, I have 11 guests, 5 search spiders, and one user. Normal for this small forum with very little activity at night.

Of the 11 guests, 5 are bingbot. And 3 of them are walking this silly thread that has 99 pages. Long continuing threads confound the search spiders. Every time there is a new post, they spend a long time walking through every page in the sequence of pages.
Well, not really giving them too much credit; they force SSL for all queries now (source), so it could be either that incoming users from Bing are arriving over SSL (which is the default) or that they're coming in from the cache.
Posted By: isaac Re: [7.x] Robots (Crawlers) List for UBB.threads - 11/02/2015 11:57 PM
Some further reading regarding http/https referer data:
https://yoast.com/web-https/
At the moment, I have FOUR of the anonymous Bingbots, and EIGHTEEN from 72.21.217.XXX (Amazon).

Of course, that pales in comparison to the THIRTY FOUR properly identified Baidu spiders.
SteveS, are you confident your Amazon stuff is not related to caching of content within your site by AWS Cloud Computing / Route 53? https://aws.amazon.com/

Additional reading at:
https://en.wikipedia.org/wiki/Amazon_Route_53

The IP 72.21.217.n has been used as a proxy for a User Agent of MSIE-6, which is in itself highly deprecated. Headers can also be consistent with either a battened-down proxy or a bot.

Additional reading at:
"amazonaws.com plays host to wide variety of bad bots"
https://www.webmasterworld.com/search_engine_spiders/3828718.htm
I am not confident in anything, but I do a lot with Amazon, so I figured they were "botting" for that.
Posted By: isaac Re: [7.x] Robots (Crawlers) List for UBB.threads - 11/16/2019 10:58 PM
Changelog 2019-11-16
The Robots (Crawlers) List for UBB.threads has been updated in OP
Changelog 2020-01-14
The Robots (Crawlers) List for UBB.threads has been updated in OP
Changelog 2020-09-24
The Robots (Crawlers) List for UBB.threads has been updated in OP
When I try to add the list, I paste them, click save, and I get a 403 forbidden page.

But when I do something else on the page, click save, I don't.

Not sure why. My forum is https://ChristianDiscussion.org I have the latest version of UBB.
Originally Posted by Aaron101
When I try to add the list, I paste them, click save, and I get a 403 forbidden page.

But when I do something else on the page, click save, I don't.

It's your host. They are censoring the content that you send. They're basically blocking you from typing words/phrases they do not agree with... for "security" lol

refer back to your other post when you had the same problem
https://www.ubbcentral.com/forums/ubbthreads.php/topics/265214/403-forbidden
Do I even need to add this list if I have the latest version of UBB?
Originally Posted by Aaron101
Do I even need to add this list if I have the latest version of UBB?

7.7.5 changelog says:
Quote
Updated Robots (Crawlers) list from 20200114 to build 20200924

Source:
https://www.ubbcentral.com/changelog.php#775


edit -
If it was an initial install of UBB.threads that you did, it's included, as the changelog says.
If this was just an upgrade you did, the robots.txt will be what came with your original version PLUS any customizations that you had done to it, i.e., the content in this post is for you.
Okay, I am good then. I just purchased UBB this week. :)
Posted By: isaac Re: [7.x] Robots (Crawlers) List for UBB.threads - 07/09/2022 12:42 PM
This list has been updated 2022-07-09.
© UBB.Developers