CVE-2010-3870: A couple of unicode issues on PHP and Firefox
The utf8_decode function in PHP before 5.3.4 does not properly handle non-shortest form UTF-8 encoding and ill-formed subsequences in UTF-8 data, which makes it easier for remote attackers to bypass cross-site scripting (XSS) and SQL injection protection mechanisms via a crafted string.
Well, here I am developing ACS, finding that this project resembles to some degree the creation of a browser… but anyway, it’s close to a working beta (yay!).
In any case, a couple of bugs came to my attention; some of them are public, some of them are not.
First of all, I want to describe the PHP vulnerability I made public in my presentation with David Lindsay at Blackhat USA 2009. Apparently only Chris Weber, Giorgio Maone (creator of NoScript), Mario Heiderich (creator of PHP-IDS) and the Acunetix security team have realized how dangerous it is.
It has been reported, well, more than enough times to the PHP team (I made another attempt today, hoping this will get fixed some time soon… if at all). This issue affects all PHP versions Mario Heiderich and I could test, and endangers practically all PHP programs that use the utf8_decode() function for decoding (as recommended by OWASP guidelines).
The disclosure timeline follows:
* Reported by [email protected]: May 11 2009
* Discovered by [email protected]: June 19 2009
* Discovered by Giorgio Maone / Eduardo Vela: July 14 2009
* Reported and Fixed on PHPIDS: July 14 2009
* Microsoft notified of a XSS Filter bypass: July 14 2009
* Fixed XSS Filter bypass on NoScript 1.9.6: July 20 2009
* Vulnerability disclosed on BlackHat USA 2009: July 29 2009
* Added signature to Acunetix WVS: August 14 2009
* Re-reported by [email protected]: September 27 2009
* Vendor claims it was fixed on 5.2.11: September 29 2009
* Re-re-reported by [email protected] after checking 5.2.11: October 16 2009
* Published sirdarckcat.blogspot.com: October 16 2009
You can check the bug here:
http://bugs.php.net/bug.php?id=49687
In reality there are several vulns in just a couple of lines, so I’ll describe them here:
1.- Overlong UTF-8:
As REQUIRED by UNICODE 3.1, and noted in the Unicode Technical Report #36, a UTF-8 decoder is forbidden from interpreting a character’s non-shortest form.
http://www.unicode.org/reports/tr36/#UTF-8_Exploit
VULN: PHP makes no checks whatsoever on this matter.
Why is this a vulnerability?
Because it allows an attacker to encode “dangerous” chars, such as ', ", <, ;, &, \0 in different ways. A filter (such as addslashes, htmlentities, escapeshellarg, etc.) will NOT be able to detect and escape such byte sequences, so an application that relies on those filters for security checks won’t be protected at all:
' = %27 = %c0%a7 = %e0%80%a7 = %f0%80%80%a7
" = %22 = %c0%a2 = %e0%80%a2 = %f0%80%80%a2
< = %3c = %c0%bc = %e0%80%bc = %f0%80%80%bc
; = %3b = %c0%bb = %e0%80%bb = %f0%80%80%bb
& = %26 = %c0%a6 = %e0%80%a6 = %f0%80%80%a6
\0 = %00 = %c0%80 = %e0%80%80 = %f0%80%80%80
Use hackvertor to generate them.
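If you want to see how the rows of that table are built, here is a minimal sketch in C. The names overlong_encode and overlong_percent are made up for this illustration (they are not part of PHP or hackvertor), and the sketch only handles ASCII input: the lead byte announces the length, the middle bytes carry zero bits, and the 7 ASCII bits land in the last two bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes an n-byte (n = 2, 3 or 4) overlong UTF-8 form
 * of the ASCII character ch into out. Returns n, or 0 for unsupported input. */
size_t overlong_encode(unsigned char ch, size_t n, unsigned char out[4])
{
    size_t i;

    if (ch > 0x7f || n < 2 || n > 4)
        return 0;

    /* Lead byte announcing an n-byte sequence, payload bits all zero. */
    out[0] = (unsigned char)(n == 2 ? 0xc0 : n == 3 ? 0xe0 : 0xf0);
    /* Continuation bytes start out carrying only zero bits. */
    for (i = 1; i < n; i++)
        out[i] = 0x80;
    /* The 7 ASCII bits end up in the last two bytes of the sequence. */
    out[n - 2] |= (unsigned char)(ch >> 6);
    out[n - 1] |= (unsigned char)(ch & 0x3f);
    return n;
}

/* Hypothetical helper: formats the sequence as %xx%xx... into buf.
 * Returns 0 on success, -1 if the input is unsupported or buf is too small. */
int overlong_percent(unsigned char ch, size_t n, char *buf, size_t buflen)
{
    unsigned char seq[4];
    size_t i, len = overlong_encode(ch, n, seq);

    if (len == 0 || buflen < 3 * len + 1)
        return -1;
    for (i = 0; i < len; i++)
        snprintf(buf + 3 * i, 4, "%%%02x", seq[i]);
    return 0;
}
```

Calling overlong_percent('"', 3, buf, sizeof buf) with a 16-byte buffer should leave the 3-byte column of the " row in buf.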
This enables attacks on systems that use addslashes, for example (but almost all encoding functions would be vulnerable):
<?php
// add slashes!
foreach ($_GET as $k => $v) $_GET[$k] = addslashes("$v");
// … some code …
// $name is encoded in utf8
$name = utf8_decode($_GET['name']);
mysql_query("SELECT * FROM table WHERE name='$name';");
?>
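The check that is missing in a decoder like this one can be sketched in a few lines of C. This is only an illustration under my own assumptions, not PHP's code: decode_lax and is_overlong are hypothetical names, decode_lax assembles the payload bits without validating anything, and is_overlong applies the shortest-form rule (an n-byte sequence must encode a value that does not fit in n-1 bytes).

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes announced by a lead byte. 0xF8..0xFF are treated as
 * 4-byte leads, mirroring a decoder that only tests c >= 0xf0. */
static size_t seq_len(unsigned char lead)
{
    if (lead >= 0xf0) return 4;
    if (lead >= 0xe0) return 3;
    if (lead >= 0xc0) return 2;
    return 1;
}

/* Lax decode: assembles payload bits with no validation at all.
 * Returns the sequence length, or 0 if the input is too short. */
size_t decode_lax(const unsigned char *s, size_t avail, uint32_t *cp)
{
    size_t i, n;

    if (avail == 0)
        return 0;
    n = seq_len(s[0]);
    if (avail < n)
        return 0;
    if (n == 1) {
        *cp = s[0];
        return 1;
    }
    /* Payload mask of the lead byte: 0x1f, 0x0f or 0x07 for n = 2, 3, 4. */
    *cp = s[0] & (0xffu >> (n + 1));
    for (i = 1; i < n; i++)
        *cp = (*cp << 6) | (s[i] & 0x3fu);
    return n;
}

/* Shortest-form rule: non-zero when cp would also fit in a shorter form. */
int is_overlong(uint32_t cp, size_t n)
{
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };

    return n >= 2 && n <= 4 && cp < min_cp[n];
}
```

With this, %c0%a7 decodes to 0x27 (the single quote) and is flagged as overlong, which is the point where a conforming decoder would reject the input instead of handing a fresh ' to the SQL query after addslashes has already run.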
2.- Ill-formed sequences:
As REQUIRED by UNICODE 3.0, and noted in the Unicode Technical Report #36, if a leading byte is followed by an invalid successor byte, then the decoder should NOT consume that successor byte.
http://www.unicode.org/reports/tr36/#Ill-Formed_Subsequences
VULN: PHP will consume invalid bytes.
Why is this a vulnerability?
It will allow an attacker to “eat” control chars. For example:
<?php
// htmlentities
foreach ($_GET as $k => $v) $_GET[$k] = htmlentities("$v", ENT_QUOTES);
// … some code …
$name = $_GET['name'];
$url = $_GET['url'];
// … some code …
$profileImage = "<img alt=\"Photo of $name\" src=\"http://$url\" />";
// … some code …
echo utf8_decode($profileImage);
?>
A request such as:
?name=%90&url=%20onerror=alert(1)%20
Will execute the code "alert(1)" when the page loads.
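To make the “eating” concrete, here is a small C sketch of a UTF-8 to Latin-1 conversion with a switch between the two behaviours. It is a simplified stand-in written for this post, not PHP's implementation: to_latin1 is a hypothetical name, only 1- and 2-byte forms are handled, and overlong rejection is left out on purpose so that only the ill-formed case is visible. The lead byte 0xC3 in the usage note is my own choice for illustration.

```c
#include <stddef.h>

/* Non-zero when b has the 10xxxxxx shape of a continuation byte. */
static int is_cont(unsigned char b)
{
    return (b & 0xc0) == 0x80;
}

/* Hypothetical converter: UTF-8 to Latin-1 for 1- and 2-byte forms only;
 * everything else becomes '?'. With strict == 0 the byte after a 2-byte
 * lead is consumed unconditionally; with strict != 0 it is consumed only
 * if it really is a continuation byte. out must hold len + 1 bytes.
 * Returns the number of bytes written, not counting the terminator. */
size_t to_latin1(const unsigned char *in, size_t len, int strict, char *out)
{
    size_t i = 0, o = 0;

    while (i < len) {
        unsigned char b = in[i];

        if (b < 0x80) {
            /* Plain ASCII is copied through. */
            out[o++] = (char)b;
            i += 1;
        } else if (b >= 0xc0 && b < 0xe0 && i + 1 < len
                   && (!strict || is_cont(in[i + 1]))) {
            unsigned cp = ((b & 0x1fu) << 6) | (in[i + 1] & 0x3fu);

            out[o++] = cp < 0x100 ? (char)cp : '?';
            i += 2; /* lead byte and the byte after it are both consumed */
        } else {
            /* Ill-formed or unsupported: replace this one byte only. */
            out[o++] = '?';
            i += 1;
        }
    }
    out[o] = '\0';
    return o;
}
```

Feeding it the four bytes a, 0xC3, ", b gives three bytes in lax mode (the quote is gone, merged into one garbage character) and a?"b in strict mode, where the quote survives and the attribute stays closed.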
Note that htmlpurifier does a utf8_decode function call at the end of the decoding, BUT they are safe because of a pre-encoding made by htmlpurifier… other code that does the same won’t be so lucky.
Bogdan Calin from Acunetix WVS described a couple of other potential attack scenarios:
Where an attacker could fool the filter by doing a request like:
vuln.php?input=%F6%3Cimg+onmouseover=prompt(/xss/)//%F6%3E
And:
Where an attacker could fool the filter by doing a request like:
index.php?username=test%FC%27%27+or+1=1+--+&password=a
3.- Integer overflow:
An unsigned short is 16 bits (2 bytes) wide, so it is incapable of storing Unicode characters of up to 21 bits, which UTF-8 represents with 4 bytes (1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx). PHP attempts to store a 21-bit value in a 16-bit variable, and then makes no checks on the result.
The affected code follows:
// php/ext/xml/xml.c#558
PHPAPI char *xml_utf8_decode( // …
{
int pos = len;
char *newbuf = emallo // …
unsigned short c; // sizeof(unsigned short)==16 bits
char (*decoder)(unsig // …
xml_encoding *enc = x // …
// …
// #580
c = (unsigned char)(*s);
if (c >= 0xf0) { /* four bytes encoded, 21 bits */
if(pos-4 >= 0) {
c = ((s[0]&7)<<18) | ((s[1]&63)<<12) | ((s[2]&63)<<6) | (s[3]&63);
} else {
c = '?';
}
s += 4;
pos -= 4;
// …
The relevant parts of the code are, of course, the declaration of c as an unsigned short, the comment specifying that the char is 21 bits, and this:
c = ((s[0]&7)<<18) | …
(s[0]&7)<<18 means it will move 3 bits 18 bits to the left. As we noted before… c’s size is only 16 bits, so all 3 of those bits are lost.
(xxxx xxxx & 0000 0111) << 18
Also, this part:
… ((s[1]&63)<<12) | …
(s[1]&63)<<12 means it will move 6 bits 12 bits to the left, so they land in bits 12 to 17. So, the top 2 bits are going to be lost.
(xxxx xxxx & 0011 1111) << 12
This allows us to make something even more interesting.
Code like this:
%FF%F0%40%FC, which is invalid Unicode, overlong, and anything else you want (definitely NOT valid), will be cast to a “less than” symbol (<).
http://eaea.sirdarckcat.net/xss.php?unicode&html_xss=%FF%F0%40%FC
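The arithmetic behind that payload can be checked by hand or with a tiny C sketch. The function names are hypothetical, and uint16_t stands in for unsigned short on the assumption that short is 16 bits wide: assemble_wide evaluates the same expression as the decoder in a 32-bit type, and assemble_narrow stores it in 16 bits the way the declaration of c does.

```c
#include <stdint.h>

/* Full-width result of the 4-byte assembly expression. */
uint32_t assemble_wide(const unsigned char s[4])
{
    return ((uint32_t)(s[0] & 7) << 18) | ((uint32_t)(s[1] & 63) << 12)
         | ((uint32_t)(s[2] & 63) << 6) | (uint32_t)(s[3] & 63);
}

/* Same value stored in a 16-bit variable: bits 16 to 20 are discarded. */
uint16_t assemble_narrow(const unsigned char s[4])
{
    return (uint16_t)assemble_wide(s);
}
```

For the bytes FF F0 40 FC the terms are 7<<18 = 0x1C0000, 0x30<<12 = 0x030000, 0<<6 = 0 and 0x3C, so the wide value is 0x1F003C. Keeping only the low 16 bits leaves 0x003C, which is <.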
This, besides the already mentioned problems and the possibility of bypassing quite a lot of WAFs and filters, demonstrates the problem of a bad Unicode implementation in PHP.
I hope the PHP development team acknowledges all these issues, which have been reported before, were explained some months ago at Blackhat USA (the developers were told to check the slides more than once), and are now explained yet another time.
This was fixed on 5.2.11 :) on my birthday!! Sept 17
Anyway… that’s not all. To finish this post I want to publish an overlong UTF-8 exception in Firefox (actually, Mozilla’s).
The Firefox one
Firefox is supposed to honor the non-shortest form restriction (point #1 in the PHP vulnerabilities, and section 3.1 of the Unicode Technical Report #36), but apparently there’s a flaw in it. This is especially problematic because an overlong sequence that is not rejected may allow several types of filter bypasses.
Anyway, the severity of this vulnerability is not as high as the PHP ones, but is worth mentioning. The following non-shortest form for the char U+1000:
0xF0 0x81 0x80 0x80
is allowed, as well as the correct shortest form:
0xE1 0x80 0x80
Note that this problem is only present in the 4-byte representation.
You can track this bug at:
https://bugzilla.mozilla.org/show_bug.cgi?id=522634
Anyway, that’s all! Thanks for your time :)
Greetings!!