Joined: May 1999
Posts: 1,715
Addict
I started writing a function to strip everything but safe HTML some time ago, but I never finished it. The code is based on a comment on strip_tags() in the manual on php.net, so I can't take credit for all of it.

First it removes all content within certain tags, so that you won't see things like JavaScript code. Then it removes all tags that aren't specifically allowed. After that it removes the attributes from all allowed tags (the list of allowed tags can be set in the code).

There are some problems with this. First, it doesn't nuke everything inside improperly nested disallowed tags, but the only result is that the leftover content shows up as text, so it isn't too bad. Also, links aren't possible, since even the href attribute is removed. There is also no way to do tables unless they are added to the allowed tags, and they would be very limited without attributes.

I've had some testing done on this without anyone being able to break it, but I believe it still needs some more good eyes to have a go at it.

It would be nice to have some crude way of using links and images as well, but I'm not sure how that could be pulled off. Maybe checks similar to those done for the IMG markup in the do_markup function could be applied to the img tag.
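One possible direction for links, offered only as a rough sketch and not as tested or finished code: rewrite each opening a tag so that it keeps nothing except an href that passes a strict check. The function name safe_links() and the rule "absolute http or https URLs only" are assumptions made for this illustration; they do not come from the original function or from do_markup.

```php
<?php
// Hypothetical helper (not part of the original post): rebuild every opening
// <a ...> tag so that it carries at most one vetted href attribute.
function safe_links($str) {
    return preg_replace_callback(
        '/<a\b([^>]*)>/i',
        function ($m) {
            // Only a quoted href is considered; unquoted values are dropped.
            if (preg_match('/(?<![\w-])href\s*=\s*(["\'])(.*?)\1/is', $m[1], $href)) {
                // Decode entities first so "&#106;avascript:" cannot slip through.
                $url = trim(html_entity_decode($href[2], ENT_QUOTES));
                // Assumption: accept absolute http(s) URLs only, with no
                // whitespace, quotes or angle brackets in them.
                if (preg_match('#^https?://[^\s"\'<>]+$#i', $url)) {
                    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">';
                }
            }
            // Anything else becomes a bare tag without attributes.
            return '<a>';
        },
        $str
    );
}

echo safe_links('<a href="https://example.com/?a=1&amp;b=2" onclick="evil()">ok</a>'), "\n";
echo safe_links('<a href="javascript:alert(1)">bad</a>'), "\n";
```

A step like this would have to take the place of the attribute stripping for the a tag, since the generic pass removes href along with everything else. It has not had the kind of testing the rest of the function got, so it needs the same scrutiny before anyone relies on it.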

Code
// Nuke tags and their contents.
function nuke_contents($str) {
    $disallowed = array("script", "head", "title", "style", "applet", "object");
    foreach ( $disallowed as $tag ) {
        $str = preg_replace("'<\s*?{$tag}[^>]*?>.*?<\s*?/\s*?{$tag}[^>]*?>'si", "", $str);
    }
    return $str;
}
// Strip unwanted tags.
function safehtml ($str) {
    // Nuke some tags and anything in between
    $str = nuke_contents($str);

    // List of tags that will not be stripped but whose attributes will be.
    $allowed = "br|b|i|p|u|a|pre|center|hr|blockquote|em|strong|big|small";
    $allowed .= "|h1|h2|h3|h4|h5|h6|q|sub|sup|tt|cite|code|address|abbr";
    // Start removing unwanted tags and attributes to wanted tags.
    $str = preg_replace("/<((?!\/?($allowed)\b)[^>]*>)/xis", "", $str);
    // The \b keeps a short name such as "b" or "p" from matching the start of
    // a longer tag such as <blockquote> or <pre>.
    $str = preg_replace("/<($allowed)\b[^>]*?>/xis", "<\\1>", $str);
    $str = str_replace("<br>", "<br />", $str); // xhtml compliance
    $str = str_replace("<hr>", "<hr />", $str); // xhtml compliance

    return $str;
}
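To make the nesting limitation concrete, here is a small sketch that reuses nuke_contents() as posted and feeds it a deliberately mis-nested string. The sample input is made up for illustration.

```php
<?php
// Copy of nuke_contents() from the post, unchanged.
function nuke_contents($str) {
    $disallowed = array("script", "head", "title", "style", "applet", "object");
    foreach ( $disallowed as $tag ) {
        $str = preg_replace("'<\s*?{$tag}[^>]*?>.*?<\s*?/\s*?{$tag}[^>]*?>'si", "", $str);
    }
    return $str;
}

// Hypothetical input: <style> opens inside <script> but closes after it.
$input = "<script>a<style>b</script>c</style>d";

// The script pass removes "<script>a<style>b</script>"; the style pass then
// finds no opening <style> left, so "c" survives next to a stray closing tag.
echo nuke_contents($input), "\n"; // prints: c</style>d
```

The tag-stripping step in safehtml() would then remove the stray closing tag, so the reader ends up seeing the leftover text "cd" rather than anything executable.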

Entire Thread
Subject Posted By Posted
how come we still have no HTML filter? mario2 08/02/2003 9:08 AM
Re: how come we still have no HTML filter? Astaran 08/02/2003 11:16 AM
how is the img tag a security risk? mario2 08/03/2003 12:12 AM
Re: how is the img tag a security risk? Dave_L_dup1 08/03/2003 12:38 AM
Re: how is the img tag a security risk? mario2 08/03/2003 10:41 AM
Re: how is the img tag a security risk? Gardener 08/04/2003 3:32 AM
