CVE-2021-46339: Assertion 'lit_is_valid_cesu8_string (string_p, string_size)' failed at jerryscript/jerry-core/ecma/base/ecma-helpers-string.c(ecma_new_ecma_string_from_utf8):371. · Issue #4935 · jerryscript-project
The assertion 'lit_is_valid_cesu8_string (string_p, string_size)' fails in ecma_new_ecma_string_from_utf8 (jerry-core/ecma/base/ecma-helpers-string.c) in JerryScript 3.0.0.
JerryScript revision
Commit: a6ab5e9
Version: v3.0.0
Build platform
Ubuntu 18.04.5 LTS (Linux 4.19.128-microsoft-standard x86_64)
Ubuntu 18.04.5 LTS (Linux 5.4.0-44-generic x86_64)
Build steps
python ./tools/build.py --clean --debug --compile-flag=-fsanitize=address --compile-flag=-m32 --compile-flag=-g --strip=off --lto=off --logging=on --line-info=on --error-message=on --system-allocator=on --stack-limit=20
Test case
poc-as.txt
Execution steps & Output
$ ./jerryscript/build/bin/jerry poc.js
ICE: Assertion 'lit_is_valid_cesu8_string (string_p, string_size)' failed at jerryscript/jerry-core/ecma/base/ecma-helpers-string.c(ecma_new_ecma_string_from_utf8):371.
Error: ERR_FAILED_INTERNAL_ASSERTION
[1] abort jerry poc.js
Credits: Found by OWL337 team.
@rerobika I think it is not a bug, but a feature. “𞸋” is encoded in UTF-8 as 0xF09EB88B, which is invalid in CESU-8. But of course we could raise a user-friendly error message instead of an assertion.
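For reference, the byte values mentioned above can be checked with a short Python sketch. It is not JerryScript code: `encode_cesu8` is a hypothetical helper written for this illustration, assuming the usual CESU-8 rule that a supplementary character is split into a UTF-16 surrogate pair and each surrogate is then encoded as a 3-byte sequence.

```python
def encode_cesu8(text: str) -> bytes:
    """Illustrative CESU-8 encoder (hypothetical helper, not a JerryScript API)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP characters use the same bytes as UTF-8 (1 to 3 bytes).
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Supplementary characters become a surrogate pair,
            # and each surrogate is encoded separately in 3 bytes.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


ch = "\U0001EE0B"  # the character quoted in the comment above
print(ch.encode("utf-8").hex())   # f09eb88b      (4 bytes in UTF-8)
print(encode_cesu8(ch).hex())     # eda0bbedb88b  (6 bytes in CESU-8)
```

The 4-byte UTF-8 form and the 6-byte CESU-8 form describe the same code point, which is why a 4-byte sequence has to be converted before it can pass a CESU-8 validity check.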
The issue is not with the “𞸋” character; all non-BMP characters are converted to CESU-8 encoding during parsing.
The problem is that the first character is in the Basic Multilingual Plane and should be encoded using 3 bytes, but it is encoded using 4 bytes in the input. This breaks the conversion logic, which always expects the CESU-8 equivalent of a 4-byte sequence to be 6 bytes long.
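To make the overlong case concrete, here is a minimal Python sketch of how such a sequence could be recognised. The function names are hypothetical and this is not how JerryScript's parser is implemented; the example bytes (the euro sign U+20AC padded out to 4 bytes) are an assumption chosen for illustration, not the contents of the PoC.

```python
def decode_4byte(seq: bytes) -> int:
    """Unpack the code point from a 4-byte UTF-8-style sequence.

    Deliberately does NOT check that 4 bytes is the shortest possible form.
    """
    b0, b1, b2, b3 = seq
    if (b0 & 0xF8) != 0xF0 or any((b & 0xC0) != 0x80 for b in (b1, b2, b3)):
        raise ValueError("not a 4-byte UTF-8 bit pattern")
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


def is_overlong_4byte(seq: bytes) -> bool:
    """A 4-byte sequence is overlong if its code point fits in the BMP."""
    return decode_4byte(seq) < 0x10000


overlong = bytes.fromhex("f08282ac")  # U+20AC padded to 4 bytes (assumed example)
valid = bytes.fromhex("f09eb88b")     # U+1EE0B, a genuine non-BMP character

print(hex(decode_4byte(overlong)), is_overlong_4byte(overlong))  # 0x20ac True
print(hex(decode_4byte(valid)), is_overlong_4byte(valid))        # 0x1ee0b False
```

A converter that assumes every 4-byte sequence maps to a 6-byte surrogate pair has no valid output for the first input, since U+20AC has no surrogate representation. Rejecting overlong sequences before conversion is one way to avoid that; Python's own strict decoder, for example, raises UnicodeDecodeError on `overlong.decode("utf-8")`.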
+info: a simple /*𝔽*/ string fails with the same error if we build with tools/build.py --debug --function-to-string=on