Headline
CVE-2023-36807: ROB: Fix infinite loop due to Invalid object by pubpub-zz · Pull Request #1331 · py-pdf/pypdf
pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. In version 2.10.5, an attacker can craft a PDF that sends the parser into an infinite loop. The loop blocks the current process and can use 100% of a single CPU core. It does not affect memory usage. This happens, for example, when the user extracts metadata from such a malformed PDF. Versions prior to 2.10.5 throw an error, but do not hang forever. The issue was fixed with https://github.com/py-pdf/pypdf/pull/1331, which is included in release 2.10.6. Users are advised to upgrade. Users unable to upgrade should modify PyPDF2/generic/_data_structures.py::read_object so that it raises an error on an invalid object instead of looping. See GHSA-hm9v-vj3r-r55m for details.
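
To illustrate why an error-throwing case matters, here is a minimal toy sketch. It is not pypdf's code: `read_number`, `read_object_lenient`, `read_object_strict`, `parse_all`, and `ParseError` are hypothetical names invented for this example. It shows one plausible way a catch-all branch can stall: if every unrecognized byte is handed to a number reader that consumes nothing, the stream position never advances.

```python
import io

DIGITS = b"0123456789+-."


class ParseError(Exception):
    """Hypothetical error type for this sketch (not a pypdf class)."""


def read_number(stream):
    """Consume consecutive numeric bytes; may consume nothing at all."""
    out = b""
    while True:
        ch = stream.read(1)
        if not ch:
            return out  # end of stream
        if ch not in DIGITS:
            stream.seek(-1, 1)  # un-read the non-numeric byte
            return out
        out += ch


def read_object_lenient(stream):
    """Catch-all dispatch: any byte is assumed to start a number."""
    return read_number(stream)


def read_object_strict(stream):
    """Dispatch that rejects bytes which cannot start a number."""
    tok = stream.read(1)
    stream.seek(-len(tok), 1)  # peek only, restore position
    if tok and tok in DIGITS:
        return read_number(stream)
    raise ParseError(f"invalid object start {tok!r} at offset {stream.tell()}")


def parse_all(stream, read_object, max_steps=1000):
    """Read objects until end of stream; max_steps stands in for 'forever'."""
    size = len(stream.getvalue())
    objects = []
    for _ in range(max_steps):
        if stream.tell() >= size:
            return objects
        objects.append(read_object(stream))
    raise RuntimeError("no progress: this loop would never terminate")
```

With the input `b"12@"`, the lenient reader returns `b"12"` and then keeps returning empty results at the `@` byte without moving forward, while the strict reader raises `ParseError` at that byte. The step cap exists only so the sketch can demonstrate the stall safely.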
Conversation
Fixes py-pdf#1329
* Prevent loop within dictionaries whose objects do not respect the standard
* Fix cmap warnings due to “numbered” characters (#2d instead of -)
* Apply unnumbering to NameObject
* Add _get_indirect_object for debug/dev purposes
* Add some missing seeks (no issue reported yet)
return NameObject.read_from_stream(stream, pdf)
elif idx == 1:
elif tok == b"<":
MartinThoma added a commit that referenced this pull request
Sep 9, 2022
Robustness (ROB):
- Fix infinite loop due to Invalid object (#1331)
- Fix image extraction issue with superfluous whitespaces (#1327)
Full Changelog: 2.10.5…2.10.6
MartinThoma changed the title from "ROB : fix infinite loop due to Invalid object" to "ROB: Fix infinite loop due to Invalid object"
Jun 30, 2023
Related news
### Impact

An attacker can craft a PDF that sends the parser into an infinite loop. The loop blocks the current process and can use 100% of a single CPU core. It does not affect memory usage. This happens, for example, when the user extracts metadata from such a malformed PDF.

### Patches

The issue was fixed with https://github.com/py-pdf/pypdf/pull/1331

### Workarounds

If you cannot update your version of `PyPDF2` (preferably to `pypdf>3.1.0` as PyPDF2 is deprecated), you should modify `PyPDF2/generic/_data_structures.py::read_object`. Replace:

```python
else:
    # number object OR indirect reference
    peek = stream.read(20)
    stream.seek(-len(peek), 1)  # reset to start
    if IndirectPattern.match(peek) is not None:
        return IndirectObject.read_from_stream(stream, pdf)
    else:
        return NumberObject.read_from_stream(stream)
```

by

```python
elif tok in b"0123456789+-.":  # numb...
```
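
Whether the workaround is needed depends on the installed release. The following is a minimal sketch, assuming the affected release is exactly 2.10.5 and the fix first shipped in 2.10.6, as the CVE description on this page states. The helper names `parse_release`, `needs_upgrade`, and `installed_pypdf2_version` are hypothetical; only `importlib.metadata.version` is a standard-library call.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

AFFECTED = (2, 10, 5)  # release named in the CVE description
FIXED = (2, 10, 6)     # first release that includes the fix


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading MAJOR.MINOR.PATCH numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def needs_upgrade(version: str) -> bool:
    """True if the version is in the range assumed to hang on malformed PDFs."""
    return AFFECTED <= parse_release(version) < FIXED


def installed_pypdf2_version() -> Optional[str]:
    """Return the installed PyPDF2 version, or None if it is not installed."""
    try:
        return metadata.version("PyPDF2")
    except metadata.PackageNotFoundError:
        return None
```

A caller could combine the two, for example by warning at startup when `installed_pypdf2_version()` returns a string for which `needs_upgrade` is true.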