#53891 - 06/27/2002 1:30 PM
Member - Joined: Jun 2000, Posts: 93
I don't know about you guys, but I can tell you it would be ab fab if we could have some of our great discussions listed in the search engines and directories. Here's my idea:
Search engines don't crawl dynamically created pages by default. So one of you brilliant hacking types could write an addition or modification to the existing board so that every time a new topic is created, a separate .html file is generated containing a copy of each post, with a link to the actual topic.
The topic gets indexed, and when someone clicks the link from the search engine, they land on a page that links them to the .cgi topic, so the content actually gets put to use.
Specifically, the file would contain only the title of the topic and the replies, with each reply kept to one line and delineated by a line break.
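As a rough sketch of what such a mod might write out, here is a hypothetical PHP example (the topic data, URL, and output file name are invented for illustration; a real mod would pull them from the board whenever a topic is created or replied to):

<?php
// Sketch only: hypothetical topic data. A real mod would pull this from the
// board every time a topic is created or replied to.
$topic = array(
    'id'      => 12345,
    'title'   => 'Getting the forums into the search engines',
    'url'     => 'http://www.example.com/cgi-bin/ultimatebb.cgi?ubb=get_topic;f=1;t=012345',
    'replies' => array(
        'Search engines don\'t crawl dynamically created pages by default...',
        'You mean like a "latest threads" modification?',
    ),
);

// Build the static page: the topic title, one line per reply,
// and a link back to the live .cgi topic.
$html  = '<html><head><title>' . htmlspecialchars($topic['title']) . "</title></head><body>\n";
$html .= '<h1>' . htmlspecialchars($topic['title']) . "</h1>\n";
foreach ($topic['replies'] as $reply) {
    $html .= htmlspecialchars($reply) . "<br>\n";
}
$html .= '<a href="' . $topic['url'] . '">Read the full topic</a>' . "\n";
$html .= "</body></html>\n";

// Write it out as a static file, e.g. topic12345.html.
$fh = fopen('topic' . $topic['id'] . '.html', 'w');
fwrite($fh, $html);
fclose($fh);
?>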
I know this might sound silly and like a waste of bandwidth, but I am more than willing to give up a couple hundred megs for a feature like this that will keep my site well known.
ni·hil·ism (nī′ə-lĭz′əm, nē′-) n.
Philosophy. An extreme form of skepticism that denies all existence. A doctrine holding that all values are baseless and that nothing can be known or communicated.
#53892 - 06/27/2002 1:43 PM
I type Like navaho - Joined: Mar 2000, Posts: 21,079, Likes: 3
You mean like a "latest threads" modification?
#53893 - 06/27/2002 2:19 PM
Member - Joined: Jun 2000, Posts: 93
In theory that might be what I'm talking about, but I want it to write to a static HTML document - one for each topic, linking to the actual topic. The search engines will spider an HTML document, but not a .cgi one.
ni·hil·ism (nī′ə-lĭz′əm, nē′-) n.
Philosophy. An extreme form of skepticism that denies all existence. A doctrine holding that all values are baseless and that nothing can be known or communicated.
#53894 - 06/27/2002 2:49 PM
I type Like navaho - Joined: Mar 2000, Posts: 21,079, Likes: 3
Actually, most (if not all) search engines are moving toward spidering dynamic pages... PHP, ASP, and CGI.
You can adapt a latest-threads mod to write its output as HTML...
#53895 - 06/27/2002 3:12 PM
I type Like navaho - Joined: Mar 2000, Posts: 21,079, Likes: 3
What might work better is to have the daily active topics write an active.html page... it gives you some of each topic and a link to it and to the forums. It would make an excellent search engine submission page... the content changes all the time, so you're less likely to get slammed for spamming the search engines as well.
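A minimal sketch of that idea, with invented topic data and URLs (a real mod would pull the day's active topics from the board and run on a daily schedule):

<?php
// Sketch only: the topic list below is hard-coded. A real mod would read the
// day's active topics from the board itself.
$active_topics = array(
    array(
        'title'   => 'Getting the forums into the search engines',
        'excerpt' => 'Search engines don\'t crawl dynamically created pages by default, so...',
        'url'     => 'http://www.example.com/cgi-bin/ultimatebb.cgi?ubb=get_topic;f=1;t=012345',
    ),
);

$html = "<html><head><title>Today's active topics</title></head><body>\n";
foreach ($active_topics as $topic) {
    $html .= '<a href="' . $topic['url'] . '">' . htmlspecialchars($topic['title']) . "</a><br>\n";
    $html .= htmlspecialchars($topic['excerpt']) . "<br><br>\n";
}
// A link back to the forum index gives the spider a way into the rest of the site.
$html .= '<a href="http://www.example.com/cgi-bin/ultimatebb.cgi">Visit the forums</a>' . "\n";
$html .= "</body></html>\n";

$fh = fopen('active.html', 'w');
fwrite($fh, $html);
fclose($fh);
?>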
#53896 - 06/28/2002 6:43 AM
Content Kidman - Joined: Oct 2001, Posts: 114
What you need is a crawler page, or a set of crawler pages.
One page that lists the forums as plain links: forums.htm.
The links on this page point to other pages called f1.htm, f2.htm, etc., one for each forum.
Each f??.htm page then has a list of links to the actual topics on the site.
You can then submit the single forums.htm page to the search engines and let it rip.
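As a rough sketch of that structure in PHP (the forum and topic data here are hard-coded examples; the real scripts would read them from the UBB files):

<?php
// Sketch only: hard-coded example data standing in for the real forum and
// topic lists, which would be read from the board's own files.
$forums = array(
    1 => array(
        'name'   => 'General Discussion',
        'topics' => array(
            'http://www.example.com/cgi-bin/ultimatebb.cgi?ubb=get_topic;f=1;t=000001' => 'Welcome to the board',
            'http://www.example.com/cgi-bin/ultimatebb.cgi?ubb=get_topic;f=1;t=000002' => 'Getting the forums into the search engines',
        ),
    ),
);

// forums.htm: one plain link per forum, pointing at its f<number>.htm page.
$index = "<html><body>\n";
foreach ($forums as $num => $forum) {
    $index .= '<a href="f' . $num . '.htm">' . htmlspecialchars($forum['name']) . "</a><br>\n";

    // f<number>.htm: one plain link per topic in that forum.
    $page = "<html><body>\n";
    foreach ($forum['topics'] as $url => $title) {
        $page .= '<a href="' . $url . '">' . htmlspecialchars($title) . "</a><br>\n";
    }
    $page .= "</body></html>\n";

    $fh = fopen('f' . $num . '.htm', 'w');
    fwrite($fh, $page);
    fclose($fh);
}
$index .= "</body></html>\n";

$fh = fopen('forums.htm', 'w');
fwrite($fh, $index);
fclose($fh);
?>

Submitting just forums.htm is then enough: the spider can reach every topic through two hops of plain links.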
Most search engines are only experimenting with dynamically generated pages, or they spider just a few of them. They also have a tendency to get lost inside a forum, with its many, many links.
It's easier to set up a crawler page and use the robots meta tag to tell the crawler to follow the links on the crawler pages but not to index the pages themselves. Then, on the forums, add a robots tag to the forum header that says "index, nofollow". This should mean your content is indexed, the crawler pages are not, and the spider doesn't hammer your site following all the reply and edit links on every forum page.
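For example, the two robots meta tag variants might look like this (the helper function is just for illustration; what matters is which tag ends up in which page's head):

<?php
// Crawler pages: follow their links, but don't index the pages themselves.
// Forum pages:   index the content, but don't follow reply/edit/profile links.
function robots_meta($is_crawler_page)
{
    return $is_crawler_page
        ? '<meta name="robots" content="noindex,follow">'
        : '<meta name="robots" content="index,nofollow">';
}

// e.g. inside the <head> of a generated crawler page:
echo "<head>\n  " . robots_meta(true) . "\n</head>\n";
?>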
This is just a theory at the moment, but one I'll be trying shortly. I want all my pages indexed, internally and externally, and this seems the easiest way to do it. However, I can't write Perl, so I'm going to use PHP to generate the pages from the UBB files.
If I get this working I'll share it here.
#53897 - 06/28/2002 4:15 PM
Content Kidman - Joined: Oct 2001, Posts: 114
OK, I've turned the theory into practice. Here's the PHP code: http://www.liversidge.net/paul/crawler.zip

It's two PHP files. One, called forums.php, generates a set of links for the forums you want crawled. The other, called topics.php, builds a set of links for all your forum topics and is referenced from forums.php.

The code only reads the forum files, so there's no risk of damaging anything, but I'm not sure if the actual process works. I don't have a local search engine I can try it on, and it would take anything from a week to a few months for Google et al. to prove it was happy crawling these pages. In theory this should be the answer to getting your forums crawled by the search engines. If not, I'll keep going until I've found a way, as I want to make sure my own are crawled.

I'm not sure if you guys class this as a hack, as it's PHP and it's more of an add-on. I'm sure this could be written in Perl, but I don't know the language. Somebody can work the algorithm out from the PHP; it's pretty easy and the code is self-documenting.

I've tested it on 6.05 and 6.3.0. As the delimiters have changed in a couple of the UBB vars files, this is where it's most likely to fail. If you have any problems, let me know: [email protected].
#53898 - 09/16/2002 8:13 PM
Member - Joined: May 2001, Posts: 283
Can anyone create a Perl version of this? It sounds great.
Paulus, can you explain a little further how to code the meta tags you were talking about?
Thanks.
#53899 - 09/20/2002 11:34 AM
Member - Joined: May 2001, Posts: 283
Is this too old to revisit, or what?
#53900 - 09/30/2002 12:30 AM
Member - Joined: May 2001, Posts: 283
Quote: "It's easier to set up a crawler page and use the robots meta tag to tell the crawler to follow the links on the crawler pages but not to index them."
Can you explain this meta tag a little more?