UBB.Dev
Posted By: mario2 how come we still have no HTML filter? - 08/02/2003 9:08 AM
As far as I know, even some ancient boards had HTML filters years ago.

They allow some basic html to be posted, all else gets converted into plain text.

Obviously, we cannot allow javascript to open infinite windows and other funny things. Even funny background pictures and colors can be a pain and probably should not be allowed. Nor should closing html tags, unbalanced h1 tags, etc. So the way html is implemented, without filters, it is totally unusable. It MUST be turned off!!

I did a search on the board, and it is the unanimous opinion that html MUST be turned off for safety reasons.

Why would our visitors, most of whom know some rudimentary html, have to learn markup? We are not so important to them that they will learn a new language. This is like requiring them to post in Russian! Our users are up in arms since I cut html posting!

I don't think it is necessary to reinvent the wheel. I believe html filters can be copied from other boards, from chats, etc.

For starters, img, h1,h2, a href, and tables will do. Real fancy html is not needed.
Posted By: Astaran Re: how come we still have no HTML filter? - 08/02/2003 11:16 AM
In my opinion, markup tags are a lot easier to remember for people who don't know HTML. People who already know html will learn the markup tags within seconds.

Allowing the img tag is already a security risk, by the way.

If you need to share more complex documents, why not add them as attachments?
Posted By: mario2 how is the img tag a security risk? - 08/03/2003 12:12 AM
Does anyone have in-depth knowledge of html filters and their properties?

I guess they are not trivial, but the problem probably has been solved long ago. If img is a risk, then things, of course, are problematic ....
Posted By: Dave_L_dup1 Re: how is the img tag a security risk? - 08/03/2003 12:38 AM
how is the img tag a security risk?

If the img tag contains JavaScript or other tricky code, the browser will attempt to silently execute it. UBB.threads has filtering in place to prevent this, but there have been numerous exploits involving img tags in both UBB.threads and UBB.classic, so it's possible that not all the holes have been plugged.
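
To make the risk concrete, here is a minimal sketch of why a simple string filter is not enough. The payloads are generic illustrations of the kind of img tag described above, not actual UBB exploits, and naive_img_filter() is a hypothetical function written only for this example.

```php
<?php
// Hypothetical naive filter: removes the literal, lowercase string "javascript:".
function naive_img_filter(string $html): string {
    return str_replace("javascript:", "", $html);
}

// Illustrative payloads (assumed examples, not taken from a real exploit report).
$payloads = [
    '<img src="javascript:alert(1)">',       // script URL in src
    '<img src="JaVaScRiPt:alert(1)">',       // same URL, mixed case
    '<img src="x.gif" onerror="alert(1)">',  // script in an event-handler attribute
];

foreach ($payloads as $p) {
    echo naive_img_filter($p), "\n";
}
```

Only the first payload is changed. str_replace() is case-sensitive, so the mixed-case version passes through untouched, and the third never contains the string at all. That is the argument for allowing a short list of known-good things instead of trying to list every bad one.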
Posted By: mario2 Re: how is the img tag a security risk? - 08/03/2003 10:41 AM
Can you be more precise? Maybe give an example? Is that not easy to filter out? Doesn't this need a JavaScript command that is easy to filter?

As far as I understand, a filter should be restrictive. Convert every < bracket into &lt; (ampersand lt), and likewise every > bracket into &gt; (ampersand gt).

Only some specific commands would be allowed. If < is followed by H1 as in <H1>, then it is ok and will be retained. If necessary, even filter the commands through regular expressions, so only certain parameters would be allowed.
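
The escape-first idea in the two paragraphs above could be sketched roughly like this. It is only an illustration under assumptions: escape_then_allow() and its $allowed list are hypothetical names, and only bare tags without any attributes are restored.

```php
<?php
// Sketch of an escape-first filter: everything becomes plain text, then a
// short whitelist of attribute-free tags is turned back into markup.
// escape_then_allow() and $allowed are hypothetical names for this example.
function escape_then_allow(string $text, array $allowed = ["b", "i", "u", "h1", "h2"]): string {
    // Step 1: convert <, >, & and quotes into entities.
    $safe = htmlspecialchars($text, ENT_QUOTES, "UTF-8");

    // Step 2: restore only bare opening and closing tags such as &lt;h1&gt; or &lt;/h1&gt;.
    $names = implode("|", array_map("preg_quote", $allowed));
    return preg_replace_callback(
        "~&lt;(/?)($names)&gt;~i",
        function (array $m): string {
            return "<" . $m[1] . strtolower($m[2]) . ">";
        },
        $safe
    );
}

echo escape_then_allow('<H1>Hi</H1><script>alert(1)</script>');
// prints: <h1>Hi</h1>&lt;script&gt;alert(1)&lt;/script&gt;
```

A tag that carries any attribute stays escaped and shows up as text. The sketch does nothing about unbalanced tags, so a missing closing h1 would still need separate handling.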

Of course, there are some tricky points. For example, we have to make sure that html does not creep in during post editing!
Posted By: Gardener Re: how is the img tag a security risk? - 08/04/2003 3:32 AM
I started writing a function to strip anything but safe html some time ago, but I never finished it. The code is based on a comment to strip_tags() in the manual on php.net so I can't take credit for all of it.

First it removes all content within some tags so that you won't see things like javascript code. Then it removes all tags that aren't specifically allowed. After that it removes the attributes on all allowed tags (which can be set in the code).

There are some problems with this. First, it doesn't nuke everything in improperly nested disallowed tags, but the only result is that the content will show up as text, so it isn't too bad. Also, it isn't possible to do links since even the href attribute is removed. There is also no way to do tables unless they are added to the allowed tags, but they would be very limited without attributes.

I've had some testing done on this without anyone being able to break it, but it will still need some more good eyes to have a go at it I believe.

It would be nice to have some crude way of using links and images as well, but I'm not sure how that could be pulled off. Maybe the img tag could get checks similar to those done for the IMG markup in the do_markup function.

Code
// Nuke tags and their contents.
function nuke_contents($str) {
    $disallowed = array("script", "head", "title", "style", "applet", "object");
    foreach ( $disallowed as $tag ) {
        $str = preg_replace("'<\s*?{$tag}[^>]*?>.*?<\s*?/\s*?{$tag}[^>]*?>'si", "", $str);
    }
    return $str;
}
// Strip unwanted tags.
function safehtml ($str) {
    // Nuke some tags and anything in between
    $str = nuke_contents($str);

    // List of tags that will not be stripped but whose attributes will be.
    $allowed = "br|b|i|p|u|a|pre|center|hr|blockquote|em|strong|big|small";
    $allowed .= "|h1|h2|h3|h4|h5|h6|q|sub|sup|tt|cite|code|address|abbr";
    // Start removing unwanted tags and attributes to wanted tags.
    $str = preg_replace("/<((?!\/?($allowed)\b)[^>]*>)/xis", "", $str);
    // The \b after the tag name stops short names from matching longer ones
    // (without it, <blockquote> would be rewritten to <b> and <pre> to <p>).
    $str = preg_replace("/<($allowed)\b[^>]*?>/xis", "<\\1>", $str);
    $str = str_replace("<br>", "<br />", $str); // xhtml compliancy
    $str = str_replace("<hr>", "<hr />", $str); // xhtml compliancy

    return $str;
}
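
One crude way to get links and images back, as wondered about above, might be to rebuild those two tags from scratch and keep nothing but a checked URL. This is a sketch under assumptions: safe_url() and rebuild_links_and_images() are hypothetical helpers, the http/https-only rule is a choice made for this example, and it does not reproduce whatever do_markup actually checks.

```php
<?php
// Returns the URL if it is an absolute http(s) URL without whitespace,
// quotes or angle brackets; otherwise null. Hypothetical helper.
function safe_url(string $url): ?string {
    $url = trim(html_entity_decode($url, ENT_QUOTES, "UTF-8"));
    if (!preg_match('~^https?://[^\s"\'<>]+$~i', $url)) {
        return null;
    }
    return $url;
}

// Rebuilds every <a ...> and <img ...> tag with only its checked URL.
// Tags without a usable href/src are dropped entirely (default deny).
function rebuild_links_and_images(string $str): string {
    return preg_replace_callback(
        '~<(a|img)\b([^>]*)>~i',
        function (array $m): string {
            $tag  = strtolower($m[1]);
            $attr = $tag === "a" ? "href" : "src";
            // Pull out the one attribute we care about; all others are discarded.
            if (!preg_match('~\b' . $attr . '\s*=\s*(["\'])(.*?)\1~is', $m[2], $a)) {
                return "";
            }
            $url = safe_url($a[2]);
            if ($url === null) {
                return "";
            }
            $url = htmlspecialchars($url, ENT_QUOTES, "UTF-8");
            return $tag === "a" ? "<a href=\"$url\">" : "<img src=\"$url\" alt=\"\" />";
        },
        $str
    );
}

echo rebuild_links_and_images('<a href="http://example.com/" onclick="x()">ok</a>');
// prints: <a href="http://example.com/">ok</a>
```

The "a" and "img" tags would have to be handled by something like this instead of going through the attribute-stripping step in safehtml(). How best to wire the two together is left open here. A dropped opening tag leaves its closing </a> behind, which is untidy but harmless.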
© UBB.Developers