GHSA-hm9v-vj3r-r55m: PyPDF2 vulnerable to possible Infinite Loop when reading malformed objects

Impact

An attacker can exploit this vulnerability by crafting a PDF that sends the parser into an infinite loop. The loop blocks the current process and can drive a single CPU core to 100% utilization; memory usage is not affected. This can happen, for example, when the user extracts metadata from such a malformed PDF.
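Because the loop only blocks the process doing the parsing, one way to contain it is to handle untrusted input in a child process with a deadline. The following is a minimal sketch under stated assumptions, not part of the advisory: `run_with_timeout` is a hypothetical helper, and the child code is a stand-in busy loop rather than a real PDF parse.

```python
import subprocess
import sys


def run_with_timeout(code, timeout_s):
    """Run a Python snippet in a child process.

    Returns True if the child finished within timeout_s seconds,
    False if it had to be killed (hypothetical helper, for illustration).
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # when the deadline passes.
        subprocess.run([sys.executable, "-c", code], timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    # Stand-in for a parse that never returns: a pure CPU-bound loop.
    print(run_with_timeout("while True: pass", 1))
    # Stand-in for a parse that completes normally.
    print(run_with_timeout("pass", 30))
```

In a real service the child would run the metadata extraction, and the parent would treat a timeout as a rejected file. This bounds the damage but does not replace upgrading.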

Patches

The issue was fixed with https://github.com/py-pdf/pypdf/pull/1331, which is included in release 2.10.6.

Workarounds

If you cannot update your version of PyPDF2 (preferably to pypdf>3.1.0 as PyPDF2 is deprecated), you should modify PyPDF2/generic/_data_structures.py::read_object.

Replace:

    else:
        # number object OR indirect reference
        peek = stream.read(20)
        stream.seek(-len(peek), 1)  # reset to start
        if IndirectPattern.match(peek) is not None:
            return IndirectObject.read_from_stream(stream, pdf)
        else:
            return NumberObject.read_from_stream(stream)

with

    elif tok in b"0123456789+-.":
        # number object OR indirect reference
        peek = stream.read(20)
        stream.seek(-len(peek), 1)  # reset to start
        if IndirectPattern.match(peek) is not None:
            return IndirectObject.read_from_stream(stream, pdf)
        else:
            return NumberObject.read_from_stream(stream)
    else:
        raise PdfReadError(
            f"Invalid Elementary Object starting with {tok} @{stream.tell()}"
        )
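The effect of the change can be tried outside the library with a simplified sketch. Everything here is an assumption for illustration: `classify_numeric_or_fail` is a hypothetical function, `INDIRECT_PATTERN` is a simplified stand-in for the library's `IndirectPattern`, and `PdfReadError` is a local copy of the exception name. The explicit end-of-stream check is an addition of this sketch, needed because an empty `bytes` value is considered to be contained in any `bytes` value.

```python
import io
import re


class PdfReadError(Exception):
    """Local stand-in for the library's PdfReadError."""


# Simplified stand-in for the library's IndirectPattern (assumed form).
INDIRECT_PATTERN = re.compile(rb"[+-]?\d+\s+\d+\s+R[^a-zA-Z]")


def classify_numeric_or_fail(stream):
    """Return 'indirect' or 'number' for numeric-looking input, else raise."""
    tok = stream.read(1)
    if not tok:
        # Addition in this sketch: b"" in b"0123..." is True in Python.
        raise PdfReadError("Unexpected end of stream")
    stream.seek(-1, 1)  # put the peeked byte back
    if tok in b"0123456789+-.":
        # number object OR indirect reference
        peek = stream.read(20)
        stream.seek(-len(peek), 1)  # reset to start
        if INDIRECT_PATTERN.match(peek) is not None:
            return "indirect"
        return "number"
    # Previously a catch-all branch treated everything else as a number;
    # raising here rejects the unexpected byte instead.
    raise PdfReadError(
        f"Invalid Elementary Object starting with {tok!r} @{stream.tell()}"
    )


if __name__ == "__main__":
    print(classify_numeric_or_fail(io.BytesIO(b"12 0 R ")))
    print(classify_numeric_or_fail(io.BytesIO(b"3.14 ")))
    try:
        classify_numeric_or_fail(io.BytesIO(b"!bad"))
    except PdfReadError as exc:
        print("rejected:", exc)
```

The point of the sketch is the final branch: input that does not start a number or an indirect reference produces an error the caller can handle, rather than being passed to the number parser.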

References

  • GHSA-hm9v-vj3r-r55m
  • https://nvd.nist.gov/vuln/detail/CVE-2023-36807
  • pypdf issue #1329 (py-pdf/pypdf#1329)
  • pypdf PR #1331 (py-pdf/pypdf#1331)
  • py-pdf/pypdf@e6531a2

Related news

CVE-2023-36807: ROB: Fix infinite loop due to Invalid object by pubpub-zz · Pull Request #1331 · py-pdf/pypdf

pypdf is a pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. In version 2.10.5, an attacker can craft a PDF that leads to an infinite loop. This infinite loop blocks the current process and can drive a single CPU core to 100% utilization; it does not affect memory usage. This can happen, for example, when the user extracts metadata from such a malformed PDF. Versions prior to 2.10.5 throw an error but do not hang forever. The issue was fixed with https://github.com/py-pdf/pypdf/pull/1331, which is included in release 2.10.6. Users are advised to upgrade. Users unable to upgrade should modify `PyPDF2/generic/_data_structures.py::read_object` so that the catch-all case raises an error. See GHSA-hm9v-vj3r-r55m for details.