Headline
GHSA-4vvm-4w3v-6mr8: pypdf and PyPDF2 possible Infinite Loop when a comment isn't followed by a character
Impact
An attacker who uses this vulnerability can craft a PDF which leads to an infinite loop if __parse_content_stream
is executed. This infinite loop blocks the current process and can utilize a single core of the CPU by 100%. It does not affect memory usage. That is, for example, the case if the user extracted text from such a PDF.
Example Code and a PDF that causes the issue:
from pypdf import PdfReader
# https://objects.githubusercontent.com/github-production-repository-file-5c1aeb/3119517/11367871?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230627%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230627T201018Z&X-Amz-Expires=300&X-Amz-Signature=d71c8fd9181c4875f0c04d563b6d32f1d4da6e7b2e6be2f14479ce4ecdc9c8b2&X-Amz-SignedHeaders=host&actor_id=1658117&key_id=0&repo_id=3119517&response-content-disposition=attachment%3Bfilename%3DMiFO_LFO_FEIS_NOA_Published.3.pdf&response-content-type=application%2Fpdf
reader = PdfReader("MiFO_LFO_FEIS_NOA_Published.3.pdf")
page = reader.pages[0]
page.extract_text()
The issue was introduced with https://github.com/py-pdf/pypdf/pull/969
Patches
The issue was fixed with https://github.com/py-pdf/pypdf/pull/1828
Workarounds
If you cannot update your version of pypdf, you should modify pypdf/generic/_data_structures.py
:
OLD: while peek not in (b"\r", b"\n"):
NEW: while peek not in (b"\r", b"\n", b""):
Impact
An attacker who uses this vulnerability can craft a PDF which leads to an infinite loop if __parse_content_stream is executed. This infinite loop blocks the current process and can utilize a single core of the CPU by 100%. It does not affect memory usage. That is, for example, the case if the user extracted text from such a PDF.
Example Code and a PDF that causes the issue:
from pypdf import PdfReader
# https://objects.githubusercontent.com/github-production-repository-file-5c1aeb/3119517/11367871?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230627%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230627T201018Z&X-Amz-Expires=300&X-Amz-Signature=d71c8fd9181c4875f0c04d563b6d32f1d4da6e7b2e6be2f14479ce4ecdc9c8b2&X-Amz-SignedHeaders=host&actor_id=1658117&key_id=0&repo_id=3119517&response-content-disposition=attachment%3Bfilename%3DMiFO_LFO_FEIS_NOA_Published.3.pdf&response-content-type=application%2Fpdf reader = PdfReader(“MiFO_LFO_FEIS_NOA_Published.3.pdf”) page = reader.pages[0] page.extract_text()
The issue was introduced with py-pdf/pypdf#969
Patches
The issue was fixed with py-pdf/pypdf#1828
Workarounds
If you cannot update your version of pypdf, you should modify pypdf/generic/_data_structures.py:
OLD: while peek not in (b"\r", b"\n"):
NEW: while peek not in (b"\r", b"\n", b""):
References
- GHSA-4vvm-4w3v-6mr8
- https://nvd.nist.gov/vuln/detail/CVE-2023-36464
- py-pdf/pypdf#1828
- py-pdf/pypdf#969
- py-pdf/pypdf@b0e5c68
- https://github.com/py-pdf/pypdf/releases/tag/3.9.0
Related news
pypdf is an open source, pure-python PDF library. In affected versions an attacker may craft a PDF which leads to an infinite loop if `__parse_content_stream` is executed. That is, for example, the case if the user extracted text from such a PDF. This issue was introduced in pull request #969 and resolved in pull request #1828. Users are advised to upgrade. Users unable to upgrade may modify the line `while peek not in (b"\r", b"\n")` in `pypdf/generic/_data_structures.py` to `while peek not in (b"\r", b"\n", b"")`.