Security
Headlines
HeadlinesLatestCVEs

Headline

GHSA-4vvm-4w3v-6mr8: pypdf and PyPDF2 possible Infinite Loop when a comment isn't followed by a character

Impact

An attacker who uses this vulnerability can craft a PDF which leads to an infinite loop if __parse_content_stream is executed. This infinite loop blocks the current process and can utilize a single core of the CPU by 100%. It does not affect memory usage. That is, for example, the case if the user extracted text from such a PDF.

Example Code and a PDF that causes the issue:

from pypdf import PdfReader

# https://objects.githubusercontent.com/github-production-repository-file-5c1aeb/3119517/11367871?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230627%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230627T201018Z&X-Amz-Expires=300&X-Amz-Signature=d71c8fd9181c4875f0c04d563b6d32f1d4da6e7b2e6be2f14479ce4ecdc9c8b2&X-Amz-SignedHeaders=host&actor_id=1658117&key_id=0&repo_id=3119517&response-content-disposition=attachment%3Bfilename%3DMiFO_LFO_FEIS_NOA_Published.3.pdf&response-content-type=application%2Fpdf
reader = PdfReader("MiFO_LFO_FEIS_NOA_Published.3.pdf")
page = reader.pages[0]
page.extract_text()

The issue was introduced with https://github.com/py-pdf/pypdf/pull/969

Patches

The issue was fixed with https://github.com/py-pdf/pypdf/pull/1828

Workarounds

If you cannot update your version of pypdf, you should modify pypdf/generic/_data_structures.py:

OLD: while peek not in (b"\r", b"\n"):
NEW: while peek not in (b"\r", b"\n", b""):
ghsa
#vulnerability#mac#git#pdf

Impact

An attacker who uses this vulnerability can craft a PDF which leads to an infinite loop if __parse_content_stream is executed. This infinite loop blocks the current process and can utilize a single core of the CPU by 100%. It does not affect memory usage. That is, for example, the case if the user extracted text from such a PDF.

Example Code and a PDF that causes the issue:

from pypdf import PdfReader

# https://objects.githubusercontent.com/github-production-repository-file-5c1aeb/3119517/11367871?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230627%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230627T201018Z&X-Amz-Expires=300&X-Amz-Signature=d71c8fd9181c4875f0c04d563b6d32f1d4da6e7b2e6be2f14479ce4ecdc9c8b2&X-Amz-SignedHeaders=host&actor_id=1658117&key_id=0&repo_id=3119517&response-content-disposition=attachment%3Bfilename%3DMiFO_LFO_FEIS_NOA_Published.3.pdf&response-content-type=application%2Fpdf reader = PdfReader(“MiFO_LFO_FEIS_NOA_Published.3.pdf”) page = reader.pages[0] page.extract_text()

The issue was introduced with py-pdf/pypdf#969

Patches

The issue was fixed with py-pdf/pypdf#1828

Workarounds

If you cannot update your version of pypdf, you should modify pypdf/generic/_data_structures.py:

OLD: while peek not in (b"\r", b"\n"):
NEW: while peek not in (b"\r", b"\n", b""):

References

  • GHSA-4vvm-4w3v-6mr8
  • https://nvd.nist.gov/vuln/detail/CVE-2023-36464
  • py-pdf/pypdf#1828
  • py-pdf/pypdf#969
  • py-pdf/pypdf@b0e5c68
  • https://github.com/py-pdf/pypdf/releases/tag/3.9.0

Related news

CVE-2023-36464: improved ExtractText(3) by pubpub-zz · Pull Request #969 · py-pdf/pypdf

pypdf is an open source, pure-python PDF library. In affected versions an attacker may craft a PDF which leads to an infinite loop if `__parse_content_stream` is executed. That is, for example, the case if the user extracted text from such a PDF. This issue was introduced in pull request #969 and resolved in pull request #1828. Users are advised to upgrade. Users unable to upgrade may modify the line `while peek not in (b"\r", b"\n")` in `pypdf/generic/_data_structures.py` to `while peek not in (b"\r", b"\n", b"")`.