CVE-2023-36464: improved ExtractText(3) by pubpub-zz · Pull Request #969 · py-pdf/pypdf
pypdf is an open-source, pure-Python PDF library. In affected versions, an attacker may craft a PDF that leads to an infinite loop when `__parse_content_stream` is executed, which happens, for example, when a user extracts text from such a PDF. This issue was introduced in pull request #969 and resolved in pull request #1828. Users are advised to upgrade. Users unable to upgrade may change the line `while peek not in (b"\r", b"\n")` in `pypdf/generic/_data_structures.py` to `while peek not in (b"\r", b"\n", b"")`.
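The quoted condition shows the root cause: on an exhausted stream, `read(1)` returns `b""` instead of raising, so a loop that only waits for CR or LF never ends if the data stops first. The sketch below reproduces that pattern on an `io.BytesIO`. Both function names are hypothetical, the loop body `peek = stream.read(1)` is an assumption (the advisory quotes only the `while` condition), and the unpatched variant has an iteration cap purely so the demo terminates.

```python
import io


def skip_to_eol_patched(stream: io.BytesIO) -> None:
    """Hypothetical helper using the patched condition: consume bytes until
    a CR or LF is read, or the stream is exhausted."""
    peek = stream.read(1)  # assumed loop shape; only the condition is quoted
    # read(1) returns b"" at end of stream, so b"" must also stop the loop.
    while peek not in (b"\r", b"\n", b""):
        peek = stream.read(1)


def hits_cap_unpatched(stream: io.BytesIO, max_iterations: int = 1000) -> bool:
    """Hypothetical helper using the vulnerable condition. Returns True if
    max_iterations reads pass without seeing CR/LF; once the data runs out
    that always happens, because read(1) keeps returning b"" forever."""
    peek = stream.read(1)
    iterations = 0
    while peek not in (b"\r", b"\n"):
        iterations += 1
        if iterations >= max_iterations:
            return True  # no CR/LF within the cap; on exhausted data this would spin forever
        peek = stream.read(1)
    return False


truncated = b"% comment without newline"  # 25 bytes, no CR/LF before the end
print(hits_cap_unpatched(io.BytesIO(truncated)))  # True
s = io.BytesIO(truncated)
skip_to_eol_patched(s)
print(s.tell())  # 25: the patched loop stops at end of data
```

Adding `b""` to the tuple is the smallest change that makes end-of-data a terminating condition, which is why the advisory's one-line workaround is sufficient for this loop.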
@MartinThoma
My problem is with this section of code:
```python
import io

import pytest

# Assumed import path: read_previous_line lives in pypdf's private _utils module.
from pypdf._utils import read_previous_line


@pytest.mark.parametrize(
    # Each case: input bytes, start position, expected line, position afterwards
    ("dat", "pos", "expected", "expected_pos"),
    [
        (b"abc", 1, b"a", 0),
        (b"abc", 2, b"ab", 0),
        (b"abc", 3, b"abc", 0),
        (b"abc\n", 3, b"abc", 0),
        (b"abc\n", 4, b"", 3),
        (b"abc\n\r", 4, b"", 3),
        (b"abc\nd", 5, b"d", 3),
        # Skip over multiple CR/LF bytes
        (b"abc\n\r\ndef", 9, b"def", 3),
        # Include a block full of newlines...
        (
            b"abc" + b"\n" * (2 * io.DEFAULT_BUFFER_SIZE) + b"d",
            2 * io.DEFAULT_BUFFER_SIZE + 4,
            b"d",
            3,
        ),
        # Include a block full of non-newline characters
        (
            b"abc\n" + b"d" * (2 * io.DEFAULT_BUFFER_SIZE),
            2 * io.DEFAULT_BUFFER_SIZE + 4,
            b"d" * (2 * io.DEFAULT_BUFFER_SIZE),
            3,
        ),
        # Both
        (
            b"abcxyz"
            + b"\n" * (2 * io.DEFAULT_BUFFER_SIZE)
            + b"d" * (2 * io.DEFAULT_BUFFER_SIZE),
            4 * io.DEFAULT_BUFFER_SIZE + 6,
            b"d" * (2 * io.DEFAULT_BUFFER_SIZE),
            6,
        ),
    ],
)
def test_read_previous_line(dat, pos, expected, expected_pos):
    s = io.BytesIO(dat)
    s.seek(pos)
    assert read_previous_line(s) == expected
    assert s.tell() == expected_pos
```
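The parametrized table encodes a specific contract: return the bytes between the nearest CR/LF before the current position and that position, then leave the stream just after the last non-CR/LF byte that precedes that CR/LF run (or at 0 if there is none). Below is a minimal, unbuffered sketch of that contract, written only to make the table easier to check by hand. `read_previous_line_sketch` is a hypothetical name, it relies on `BytesIO.getvalue()`, and it is not pypdf's implementation. The cases sized at `2 * io.DEFAULT_BUFFER_SIZE` suggest the real function reads backwards in blocks, which this sketch does not.

```python
import io

CR_OR_LF = (b"\r", b"\n")


def read_previous_line_sketch(stream: io.BytesIO) -> bytes:
    """Hypothetical, unbuffered model of the behavior the test table expects."""
    data = stream.getvalue()  # BytesIO-only shortcut; a real reader would seek/read
    end = stream.tell()

    # Walk backwards over non-CR/LF bytes: these form the returned line.
    start = end
    while start > 0 and data[start - 1 : start] not in CR_OR_LF:
        start -= 1
    line = data[start:end]

    # Then skip backwards over the CR/LF run that precedes the line.
    pos = start
    while pos > 0 and data[pos - 1 : pos] in CR_OR_LF:
        pos -= 1

    # Leave the stream one byte after the last non-CR/LF byte (or at 0).
    stream.seek(pos)
    return line


s = io.BytesIO(b"abc\n\r\ndef")
s.seek(9)
print(read_previous_line_sketch(s), s.tell())  # b'def' 3
```

Note how `(b"abc\n", 4, b"", 3)` falls out of this model: the byte just before position 4 is already a newline, so the line is empty, and skipping the newline run lands the stream at 3.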
Related news
### Impact

An attacker who exploits this vulnerability can craft a PDF that leads to an infinite loop when `__parse_content_stream` is executed, for example when a user extracts text from such a PDF. The infinite loop blocks the current process and can keep a single CPU core at 100% utilization. It does not affect memory usage.

Example code and a PDF that cause the issue:

```python
from pypdf import PdfReader

# https://objects.githubusercontent.com/github-production-repository-file-5c1aeb/3119517/11367871?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230627%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230627T201018Z&X-Amz-Expires=300&X-Amz-Signature=d71c8fd9181c4875f0c04d563b6d32f1d4da6e7b2e6be2f14479ce4ecdc9c8b2&X-Amz-SignedHeaders=host&actor_id=1658117&key_id=0&repo_id=3119517&response-content-disposition=attachment%3Bfilename%3DMiFO_LFO_FEIS_NOA_Published.3.pdf&response-content-type=application%2Fpdf
reader = PdfReader("MiFO_LFO_FEIS_NO...
```