CVE-2023-36464: improved ExtractText(3) by pubpub-zz · Pull Request #969 · py-pdf/pypdf

pypdf is an open-source, pure-Python PDF library. In affected versions, an attacker can craft a PDF that causes an infinite loop when `__parse_content_stream` is executed, which happens, for example, when a user extracts text from such a PDF. The issue was introduced in pull request #969 and resolved in pull request #1828. Users are advised to upgrade. Users unable to upgrade can change the line `while peek not in (b"\r", b"\n")` in `pypdf/generic/_data_structures.py` to `while peek not in (b"\r", b"\n", b"")`.
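The effect of adding `b""` to the loop condition can be illustrated with a minimal sketch. It assumes the loop reads one byte at a time from a file-like stream, where `read(1)` returns `b""` at end of input. The function names `skip_to_line_end` and `unpatched_iterations` are hypothetical and are not part of pypdf; the second one carries an artificial iteration cap so the demonstration terminates.

```python
import io


def skip_to_line_end(stream):
    """Patched loop shape: stop at CR, LF, or end of stream (b"")."""
    consumed = b""
    peek = stream.read(1)
    while peek not in (b"\r", b"\n", b""):
        consumed += peek
        peek = stream.read(1)
    return consumed


def unpatched_iterations(stream, limit=1000):
    """Unpatched loop condition, with a safety cap added for this demo only.

    Returns how many times the loop body ran before it stopped.
    """
    count = 0
    peek = stream.read(1)
    while peek not in (b"\r", b"\n"):
        count += 1
        if count >= limit:
            # Without this cap the loop would never end once read(1)
            # starts returning b"" at end of stream.
            break
        peek = stream.read(1)
    return count


# Input that ends without a trailing newline.
data = b"% no newline at the end"
print(skip_to_line_end(io.BytesIO(data)))
print(unpatched_iterations(io.BytesIO(data)))
```

With input that ends without a CR or LF, the patched loop returns as soon as the stream is exhausted, while the unpatched condition only stops because of the artificial cap.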

2 years ago

CVE

#pdf

@MartinThoma
My problem is with this section of code:

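import io

import pytest

# Assumed import path; adjust it to wherever read_previous_line lives in your version.
from pypdf._utils import read_previous_line


@pytest.mark.parametrize(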
    ("dat", "pos", "expected", "expected_pos"),
    [
        (b"abc", 1, b"a", 0),
        (b"abc", 2, b"ab", 0),
        (b"abc", 3, b"abc", 0),
        (b"abc\n", 3, b"abc", 0),
        (b"abc\n", 4, b"", 3),
        (b"abc\n\r", 4, b"", 3),
        (b"abc\nd", 5, b"d", 3),
        # Skip over multiple CR/LF bytes
        (b"abc\n\r\ndef", 9, b"def", 3),
        # Include a block full of newlines...
        (
            b"abc" + b"\n" * (2 * io.DEFAULT_BUFFER_SIZE) + b"d",
            2 * io.DEFAULT_BUFFER_SIZE + 4,
            b"d",
            3,
        ),
        # Include a block full of non-newline characters
        (
            b"abc\n" + b"d" * (2 * io.DEFAULT_BUFFER_SIZE),
            2 * io.DEFAULT_BUFFER_SIZE + 4,
            b"d" * (2 * io.DEFAULT_BUFFER_SIZE),
            3,
        ),
        # Both
        (
            b"abcxyz"
            + b"\n" * (2 * io.DEFAULT_BUFFER_SIZE)
            + b"d" * (2 * io.DEFAULT_BUFFER_SIZE),
            4 * io.DEFAULT_BUFFER_SIZE + 6,
            b"d" * (2 * io.DEFAULT_BUFFER_SIZE),
            6,
        ),
    ],
)
def test_read_previous_line(dat, pos, expected, expected_pos):
    s = io.BytesIO(dat)
    s.seek(pos)
    assert read_previous_line(s) == expected
    assert s.tell() == expected_pos
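To make the contract behind these test cases easier to follow, here is a simplified sketch of a function that satisfies them. It is not the library's implementation: the name `read_previous_line_sketch` is hypothetical, and it reads everything before the current position into memory instead of scanning backwards in blocks.

```python
import io

NEWLINES = (b"\r", b"\n")


def read_previous_line_sketch(stream):
    """Return the line that ends at the current position.

    Afterwards the stream is positioned just before the CR/LF run that
    precedes the returned line (or at 0 if there is none).
    """
    end = stream.tell()
    stream.seek(0)
    data = stream.read(end)  # simplification: load the whole prefix

    # Walk back to the start of the line that ends at `end`.
    start = end
    while start > 0 and data[start - 1:start] not in NEWLINES:
        start -= 1
    line = data[start:end]

    # Skip over any CR/LF bytes in front of that line.
    new_pos = start
    while new_pos > 0 and data[new_pos - 1:new_pos] in NEWLINES:
        new_pos -= 1

    stream.seek(new_pos)
    return line


s = io.BytesIO(b"abc\n\r\ndef")
s.seek(9)
print(read_previous_line_sketch(s), s.tell())
```

The case `(b"abc\n", 4, b"", 3)` follows from the same rules: the byte just before position 4 is a newline, so the line is empty, and the stream is then moved back past that newline to position 3.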

### Impact

An attacker who exploits this vulnerability can craft a PDF that leads to an infinite loop if `__parse_content_stream` is executed. This happens, for example, when the user extracts text from such a PDF. The infinite loop blocks the current process and can keep a single CPU core at 100% utilization. It does not affect memory usage.

Example code and a PDF that causes the issue:

```python
from pypdf import PdfReader

# https://objects.githubusercontent.com/github-production-repository-file-5c1aeb/3119517/11367871?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230627%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230627T201018Z&X-Amz-Expires=300&X-Amz-Signature=d71c8fd9181c4875f0c04d563b6d32f1d4da6e7b2e6be2f14479ce4ecdc9c8b2&X-Amz-SignedHeaders=host&actor_id=1658117&key_id=0&repo_id=3119517&response-content-disposition=attachment%3Bfilename%3DMiFO_LFO_FEIS_NOA_Published.3.pdf&response-content-type=application%2Fpdf
reader = PdfReader("MiFO_LFO_FEIS_NO...
```

2 years ago

ghsa

#vulnerability #mac #git #pdf