Headline
GHSA-wvhx-q427-fgh3: Arbitrary HTML present after sanitization because of unicode normalization
Impact
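With keep_typographic_whitespace=False (the default), the sanitizer normalizes Unicode to the NFKC form as its final step. Some Unicode characters, such as U+FF1C FULLWIDTH LESS-THAN SIGN and U+FF1E FULLWIDTH GREATER-THAN SIGN, normalize to chevrons (< and >). Specially crafted input can therefore pass through sanitization as plain text and turn into arbitrary HTML afterwards.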
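The normalization behavior described above can be reproduced with the Python standard library alone. The following is a minimal sketch, not the sanitizer's own code; the payload string is a made-up example, and only `unicodedata.normalize` is used.

```python
import unicodedata

# Hypothetical payload: fullwidth chevrons (U+FF1C, U+FF1E) look like text
# to an HTML parser, so a sanitizer has no tags to strip here.
payload = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"

# NFKC maps the fullwidth forms to their ASCII compatibility equivalents.
normalized = unicodedata.normalize("NFKC", payload)

print(normalized)  # <script>alert(1)</script>
```

Because the normalization runs after the markup has already been checked, the resulting chevrons are never inspected.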
Patches
The problem has been fixed in 2.4.2.
Workarounds
Set keep_typographic_whitespace=True explicitly, or normalize the input to NFKC yourself before passing it to the sanitizer.
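The second workaround, normalizing before sanitizing, might look like the sketch below. The helper `sanitize_prenormalized` is a hypothetical name, and `html.escape` is only a stand-in for a real sanitizer so that the example runs without third-party packages; with html-sanitizer you would presumably pass your sanitizer instance's `sanitize` method instead.

```python
import html
import unicodedata
from typing import Callable


def sanitize_prenormalized(markup: str, sanitize: Callable[[str], str]) -> str:
    """Normalize to NFKC first so the sanitizer sees the final code points."""
    return sanitize(unicodedata.normalize("NFKC", markup))


# Stand-in sanitizer for illustration only (an assumption, not html-sanitizer).
stand_in = html.escape

payload = "\uff1cb\uff1ehi\uff1c/b\uff1e"

# Workaround order: normalize, then sanitize. The chevrons get escaped.
safe = sanitize_prenormalized(payload, stand_in)

# Vulnerable order: sanitize, then normalize. The chevrons appear too late.
unsafe = unicodedata.normalize("NFKC", stand_in(payload))

print(safe)    # &lt;b&gt;hi&lt;/b&gt;
print(unsafe)  # <b>hi</b>
```

The point of the design is ordering: once the text is already in NFKC form, a later NFKC pass cannot introduce new chevrons from the fullwidth characters.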
CVE ID
CVE-2024-34078
High severity • GitHub Reviewed • Published May 5, 2024 in matthiask/html-sanitizer • Updated May 6, 2024
Package
html-sanitizer (pip)
Affected versions
< 2.4.2
References
- GHSA-wvhx-q427-fgh3
- matthiask/html-sanitizer@48db42f
Published to the GitHub Advisory Database
May 6, 2024