Some time ago I started writing a function to strip everything but safe HTML, but I never finished it. The code is based on a comment on strip_tags() in the manual on php.net, so I can't take credit for all of it.
First it removes all content within certain tags, so that you won't see things like JavaScript code. Then it removes all tags that aren't specifically allowed. After that it removes the attributes from all allowed tags (the list of allowed tags can be set in the code).
There are some problems with this. First, it doesn't nuke everything in improperly nested disallowed tags, but the only result is that the leftover content shows up as text, so it isn't too bad. Also, links aren't possible, since even the href attribute is removed. There is also no way to do tables unless they are added to the allowed tags, and even then they would be very limited without attributes.
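To make the first problem concrete, here is a small sketch that runs a condensed copy of the content-nuking step on nested input. The name `demo_nuke` and the shortened tag list are made up for the illustration; the pattern is the one used in the code further down.

```php
<?php
// Condensed copy of the content-nuking step, for illustration only.
// demo_nuke is a hypothetical name; only two tags are listed to keep it short.
function demo_nuke($str) {
    foreach (array("script", "style") as $tag) {
        $str = preg_replace("'<\s*?{$tag}[^>]*?>.*?<\s*?/\s*?{$tag}[^>]*?>'si", "", $str);
    }
    return $str;
}

// Properly closed: the tag and everything inside it disappear.
$clean = demo_nuke('Hi <script>alert(1)</script>there');

// Nested: the lazy match stops at the first closing tag, so "c" and a
// stray </script> are left behind. The later tag-stripping step removes
// the stray tag, and "c" ends up as visible text.
$leftover = demo_nuke('x<script>a<script>b</script>c</script>y');

echo $clean, "\n";
echo $leftover, "\n";
```

The leftover is plain text rather than markup, which matches the "isn't too bad" assessment above, but it does mean fragments of script source can leak into the page.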
I've had some testing done on this without anyone being able to break it, but I believe it still needs a few more good pairs of eyes to have a go at it.
It would be nice to have some crude way of using links and images as well, but I'm not sure how that could be pulled off. Maybe the img tag could get checks similar to those done for the IMG markup in the do_markup function.
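One possible direction for links, as a rough sketch only: rewrite each opening a tag so that nothing but a vetted href survives. Everything here is an assumption, not part of the original function: the helper name `keep_safe_links`, the scheme whitelist (http, https, mailto), and the use of a closure, which needs PHP 5.3 or later. It also has not had the kind of testing the main function has had.

```php
<?php
// Hypothetical helper, not part of the original function: rewrite every
// opening <a ...> tag so that only a vetted href attribute survives.
function keep_safe_links($str) {
    return preg_replace_callback(
        '/<a\b([^>]*)>/i',
        function ($m) {
            // Look for a quoted href value among the attributes.
            if (preg_match('/\bhref\s*=\s*(["\'])(.*?)\1/is', $m[1], $h)) {
                // Decode entities first so an encoded scheme cannot slip by.
                $url = trim(html_entity_decode($h[2], ENT_QUOTES));
                // Assumed whitelist: only http(s) and mailto URLs pass.
                if (preg_match('#^(https?://|mailto:)#i', $url)) {
                    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">';
                }
            }
            // Anything else (javascript:, unquoted or missing href)
            // loses all of its attributes.
            return '<a>';
        },
        $str
    );
}

echo keep_safe_links('<a href="http://example.com/" onclick="x()">ok</a>'), "\n";
echo keep_safe_links('<a href="javascript:alert(1)">bad</a>'), "\n";
```

To be of any use, a helper like this would have to take over the attribute handling for the a tag, since the attribute-stripping step would otherwise remove the href again. Images could probably be handled along the same lines with the src attribute.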
    // Nuke tags and their contents.
    function nuke_contents($str) {
        $disallowed = array("script", "head", "title", "style", "applet", "object");
        foreach ($disallowed as $tag) {
            // \b keeps "head" from also matching tags such as <header>.
            $str = preg_replace("'<\s*?{$tag}\b[^>]*?>.*?<\s*?/\s*?{$tag}\b[^>]*?>'si", "", $str);
        }
        return $str;
    }

    // Strip unwanted tags.
    function safehtml($str) {
        // Nuke some tags and anything in between.
        $str = nuke_contents($str);

        // List of tags that will not be stripped but whose attributes will be.
        $allowed = "br|b|i|p|u|a|pre|center|hr|blockquote|em|strong|big|small";
        $allowed .= "|h1|h2|h3|h4|h5|h6|q|sub|sup|tt|cite|code|address|abbr";

        // Remove every tag that is not in the allowed list.
        $str = preg_replace("/<((?!\/?($allowed)\b)[^>]*>)/xis", "", $str);
        // Strip the attributes from allowed tags. The \b makes the whole tag
        // name match, so <blockquote> is not rewritten to <b>, nor <pre> to <p>.
        $str = preg_replace("/<($allowed)\b[^>]*?>/xis", "<\\1>", $str);
        $str = str_replace("<br>", "<br />", $str); // XHTML compliance
        $str = str_replace("<hr>", "<hr />", $str); // XHTML compliance

        return $str;
    }
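One detail worth checking in the attribute-stripping pattern is how the alternation of tag names interacts with the short names listed first. The sketch below uses a shortened tag list and made-up variable names (`$loose`, `$strict`) to compare the pattern with and without a word boundary after the tag group.

```php
<?php
$allowed = "br|b|blockquote|p|pre";
$input   = '<blockquote class="x"><pre id="y">text</pre></blockquote>';

// Without a boundary, "b" and "p" match as prefixes of longer tag names,
// and the lazy [^>]*? swallows the rest of the name with the attributes.
$loose = preg_replace("/<($allowed)[^>]*?>/is", "<\\1>", $input);
echo $loose, "\n";   // <b><p>text</pre></blockquote>

// With \b after the group, the whole tag name has to match.
$strict = preg_replace("/<($allowed)\b[^>]*?>/is", "<\\1>", $input);
echo $strict, "\n";  // <blockquote><pre>text</pre></blockquote>
```

Without the boundary the output is still harmless from a security point of view, but opening and closing tags no longer pair up, so the markup comes out broken.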