#53891 06/27/2002 1:30 PM
Joined: Jun 2000
Posts: 93
Member
I don't know about you guys, but it would be ab fab if some of our great discussions were listed in the search engines and directories. Here's my idea:

Search engines don't crawl dynamically created pages by default. So one of you brilliant hacking-types can write an addition or modification to the existing board so that every time a new topic is created, a separate .html file is created that contains each post's copy with a link to the actual topic.

The topic will get indexed, and when the person clicks on the link from the search engine, they get a link to the .cgi topic and the information is now well-exploited.

The specifics are that the file would only contain the title of the topic and the replies. Each reply would be one line long, with replies delineated by a line break.
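
A rough sketch of what such a page generator could look like, written in Python only for illustration (the board itself is Perl). The function name `render_topic_page` and the example.com topic URL are made up for this example and are not part of UBB; how the title and replies get read out of the board's files is left out entirely.

```python
import html


def render_topic_page(title, replies, topic_url):
    """Build a static HTML page: topic title, one line per reply, link to the live topic.

    Hypothetical helper; the arguments are assumed to have been read from the
    board's data files already.
    """
    lines = [
        "<html><head><title>%s</title></head><body>" % html.escape(title),
        "<h1>%s</h1>" % html.escape(title),
    ]
    for reply in replies:
        # Collapse all internal whitespace so each reply sits on exactly one line.
        one_line = " ".join(reply.split())
        lines.append("<p>%s</p>" % html.escape(one_line))
    # Link back to the dynamic (.cgi) topic so visitors land on the real thread.
    lines.append('<a href="%s">Read the full topic</a>' % html.escape(topic_url, quote=True))
    lines.append("</body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    page = render_topic_page(
        "Getting indexed",
        ["First reply,\nspread over two lines.", "Second reply."],
        "http://example.com/cgi-bin/topic.cgi?f=1&t=42",  # hypothetical URL
    )
    print(page)
```

The returned string would be written to one .html file per topic whenever a topic is created or replied to.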

I know this might sound silly and like a waste of bandwidth, but I am more than willing to give up a couple hundred megs for a feature like this that will keep my site well known.


ni·hil·ism (n-lzm, n-)
n.

Philosophy.
An extreme form of skepticism that denies all existence.
A doctrine holding that all values are baseless and that nothing can be known or communicated.
#53892 06/27/2002 1:43 PM
Joined: Mar 2000
Posts: 21,079
Likes: 3
I type Like navaho
You mean like a "latest threads" modification?


- Allen wavey
- What Drives You?
#53893 06/27/2002 2:19 PM
Joined: Jun 2000
Posts: 93
Member
In theory that might be what I'm talking about, but I want it to write a static HTML document, one for each topic, that links to the actual topic. The search engines will spider an HTML document, but not a .cgi page.


#53894 06/27/2002 2:49 PM
Joined: Mar 2000
Posts: 21,079
Likes: 3
I type Like navaho
Actually, most/all search engines are moving to spidering dynamic pages: PHP, ASP and CGI.

You can adapt a latest threads mod to write the output in HTML...


#53895 06/27/2002 3:12 PM
Joined: Mar 2000
Posts: 21,079
Likes: 3
I type Like navaho
What might work better is to get the daily active topics to write an active.html page.... it gives you some of the topic and a link to it and the forums. Would make an excellent search engine submission page... content changes all the time, so you're less likely to get slammed for spamming the search engines as well.


#53896 06/28/2002 6:43 AM
Joined: Oct 2001
Posts: 114
Content Kidman
What you need is a crawler page or set of crawler pages.

One page that lists the forums just as plain links, i.e. forums.htm

The links on this page will link to other pages called f1.htm, f2.htm, etc. One for each forum.

Each f??.htm page will then have a list of links on them to the actual topics on the site.

You can then submit the one forums.htm page to the search engines and let it rip.
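
To make the layout concrete, here is a minimal sketch of generating that set of pages, in Python purely for illustration. The function `build_crawler_pages`, the shape of its input, and the example.com URLs are all assumptions made for this example; reading the real forum and topic lists out of the UBB files is not shown.

```python
import html


def build_crawler_pages(forums):
    """Return {filename: html_text} for forums.htm plus one fN.htm per forum.

    forums maps a forum number to (forum_name, [(topic_title, topic_url), ...]).
    This input shape is hypothetical, chosen only to keep the sketch short.
    """
    def page(title, links):
        # Plain links only, one per line, so a spider has nothing else to follow.
        items = "\n".join(
            '<a href="%s">%s</a><br>' % (html.escape(url, quote=True), html.escape(text))
            for url, text in links
        )
        return "<html><head><title>%s</title></head><body>\n%s\n</body></html>" % (
            html.escape(title), items)

    pages = {}
    index_links = []
    for number, (name, topics) in sorted(forums.items()):
        filename = "f%d.htm" % number
        index_links.append((filename, name))
        # Each forum page links straight to the live topics.
        pages[filename] = page(name, [(url, title) for title, url in topics])
    pages["forums.htm"] = page("Forums", index_links)
    return pages


if __name__ == "__main__":
    demo = {
        1: ("General", [("Hello", "http://example.com/cgi-bin/topic.cgi?f=1&t=1")]),
        2: ("Hacks", []),
    }
    for filename, text in sorted(build_crawler_pages(demo).items()):
        print(filename)
        print(text)
```

Only forums.htm would need to be submitted; the spider reaches every fN.htm page, and from there every topic, by following plain links.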

Most search engines are still only experimenting with dynamically generated pages, or they spider just a few of them. They also have a tendency to get lost inside a forum with its many, many links.

It's easier to set up a crawler page and use the robots meta tag to tell the crawler to follow the links on the crawler pages but not to index them (noindex, follow). Then, on the forums, add a robots tag to the forum header that says (index, nofollow). This should mean your content is indexed, the crawler pages are not, and the spider doesn't hammer your site following all the reply and edit links on all the forum pages.
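
For anyone unsure what those tags look like, here is a small sketch that prints the two variants. The helper name `robots_meta` is made up for this example; the `noindex`, `follow`, `index` and `nofollow` values are the standard robots meta tag directives. Whether a given search engine honours them is up to that engine.

```python
def robots_meta(index, follow):
    """Return a robots meta tag for the <head> of a page (hypothetical helper)."""
    content = "%s, %s" % (
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    )
    return '<meta name="robots" content="%s">' % content


# Crawler pages: follow the links, but keep the page itself out of the index.
CRAWLER_PAGE_TAG = robots_meta(index=False, follow=True)
# Forum pages: index the content, but don't chase the reply and edit links.
FORUM_PAGE_TAG = robots_meta(index=True, follow=False)

if __name__ == "__main__":
    print(CRAWLER_PAGE_TAG)  # <meta name="robots" content="noindex, follow">
    print(FORUM_PAGE_TAG)    # <meta name="robots" content="index, nofollow">
```

The first tag goes in the head of forums.htm and each f??.htm page; the second goes in the forum header template.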

This is a theory at the moment but one I'll be trying shortly. I want all my pages indexed internally and externally and this seems the easiest way around it. However, I can't write Perl so I'm going to use PHP to generate the pages from the UBB files.

If I get this working I'll share it here.

#53897 06/28/2002 4:15 PM
Joined: Oct 2001
Posts: 114
Content Kidman
OK, I've turned the theory into practice.

Here's the PHP code...

http://www.liversidge.net/paul/crawler.zip

It's two PHP files, one called forums.php that generates a set of links for the forums you want to be crawled. The other is called topics.php and this will build a set of links for all your forum topics and is referenced from forums.php.

The code only reads the forum files, so there's no risk of damaging anything, but I'm not sure if the actual process works. I don't have a local search engine I can try it on, and it would take anything from a week to a few months for Google et al. to prove they are happy crawling these pages.

In theory this should be the answer to getting your forums crawled by the search engines. If not, I'll keep going until I've found a way as I want to make sure my own are crawled.

I'm not sure if you guys class this as a hack, as it's PHP and it's more of an add-on. I'm sure this could be written in Perl, but I don't know the language. Somebody can work the algorithm out from the PHP; it's pretty easy and the code is self-documenting.

I've tested it on 6.05 and 6.3.0. As the delimiters have changed in a couple of the UBB vars files this is where it's most likely to fail. If you have any problems let me know, [email protected].

#53898 09/16/2002 8:13 PM
Joined: May 2001
Posts: 283
Member
Can anyone create a Perl version of this? It sounds great.

Paulus, can you explain a little further how to code the meta tags you were talking about?

Thanks.

#53899 09/20/2002 11:34 AM
Joined: May 2001
Posts: 283
Member
Is this too old to revisit, or what?

#53900 09/30/2002 12:30 AM
Joined: May 2001
Posts: 283
Member
quote:
It's easier to setup a crawler page and use the robots meta tag, to tell the crawler to follow the links on the crawler pages but not to index them.
Can you explain this meta tag a little more?

The UBB.Developers Network (UBB.Dev/Threads.Dev) is ©2000-2025 VNC Web Services

 