Headline
CVE-2022-44311: Out of Bounds Read In static void elm_close(tree_node_t *nodo) · Issue #19 · jfisteus/html2xhtml
html2xhtml v1.3 was discovered to contain an Out-Of-Bounds read in the function static void elm_close(tree_node_t *nodo) at procesador.c. This vulnerability allows attackers to access sensitive files or cause a Denial of Service (DoS) via a crafted html file.
Hi there!
Great work on html2xhtml, I find myself using it quite often. While I was using the tool I created some fuzz tests to run in the background. A couple of test cases led to a segfault when using the '-t frameset’ option, which led me to further investigate the crash.
Valgrind
I started with Valgrind, which reported an invalid read of size 4 in each of the test cases:
==1040381== Memcheck, a memory error detector
==1040381== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1040381== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1040381== Command: ./src/html2xhtml -t frameset report/vuln/id:000000,sig:11,src:001386+001369,time:12081510,execs:2336913,op:splice,rep:16
==1040381==
==1040381== Invalid read of size 4
==1040381== at 0x40E911: elm_close (procesador.c:944)
==1040381== by 0x410617: err_html_struct (procesador.c:1889)
==1040381== by 0x40F20A: err_content_invalid (procesador.c:0)
==1040381== by 0x40F20A: elm_close (procesador.c:959)
==1040381== by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040381== by 0x40DF7A: main (html2xhtml.c:117)
==1040381== Address 0x6f20d4 is not stack'd, malloc'd or (recently) free'd
==1040381==
==1040381==
==1040381== Process terminating with default action of signal 11 (SIGSEGV)
==1040381== Access not within mapped region at address 0x6F20D4
==1040381== at 0x40E911: elm_close (procesador.c:944)
==1040381== by 0x410617: err_html_struct (procesador.c:1889)
==1040381== by 0x40F20A: err_content_invalid (procesador.c:0)
==1040381== by 0x40F20A: elm_close (procesador.c:959)
==1040381== by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040381== by 0x40DF7A: main (html2xhtml.c:117)
==1040381== If you believe this happened as a result of a stack
==1040381== overflow in your program's main thread (unlikely but
==1040381== possible), you can try to increase the size of the
==1040381== main thread stack using the --main-stacksize= flag.
==1040381== The main thread stack size used in this run was 8388608.
==1040381==
==1040381== HEAP SUMMARY:
==1040381== in use at exit: 88,190 bytes in 13 blocks
==1040381== total heap usage: 22 allocs, 9 frees, 2,218,413 bytes allocated
==1040381==
==1040381== LEAK SUMMARY:
==1040381== definitely lost: 0 bytes in 0 blocks
==1040381== indirectly lost: 0 bytes in 0 blocks
==1040381== possibly lost: 0 bytes in 0 blocks
==1040381== still reachable: 88,190 bytes in 13 blocks
==1040381== suppressed: 0 bytes in 0 blocks
==1040381== Rerun with --leak-check=full to see details of leaked memory
==1040381==
==1040381== For lists of detected and suppressed errors, rerun with: -s
==1040381== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==1040419== Memcheck, a memory error detector
==1040419== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1040419== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1040419== Command: ./src/html2xhtml -t frameset report/vuln/id:000001,sig:11,src:001386+001369,time:12316330,execs:2651995,op:splice,rep:16
==1040419==
==1040419== Invalid read of size 4
==1040419== at 0x40E911: elm_close (procesador.c:944)
==1040419== by 0x410617: err_html_struct (procesador.c:1889)
==1040419== by 0x40F20A: err_content_invalid (procesador.c:0)
==1040419== by 0x40F20A: elm_close (procesador.c:959)
==1040419== by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040419== by 0x40DF7A: main (html2xhtml.c:117)
==1040419== Address 0x6efc84 is not stack'd, malloc'd or (recently) free'd
==1040419==
==1040419==
==1040419== Process terminating with default action of signal 11 (SIGSEGV)
==1040419== Access not within mapped region at address 0x6EFC84
==1040419== at 0x40E911: elm_close (procesador.c:944)
==1040419== by 0x410617: err_html_struct (procesador.c:1889)
==1040419== by 0x40F20A: err_content_invalid (procesador.c:0)
==1040419== by 0x40F20A: elm_close (procesador.c:959)
==1040419== by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040419== by 0x40DF7A: main (html2xhtml.c:117)
==1040419== If you believe this happened as a result of a stack
==1040419== overflow in your program's main thread (unlikely but
==1040419== possible), you can try to increase the size of the
==1040419== main thread stack using the --main-stacksize= flag.
==1040419== The main thread stack size used in this run was 8388608.
==1040419==
==1040419== HEAP SUMMARY:
==1040419== in use at exit: 88,190 bytes in 13 blocks
==1040419== total heap usage: 22 allocs, 9 frees, 2,218,413 bytes allocated
==1040419==
==1040419== LEAK SUMMARY:
==1040419== definitely lost: 0 bytes in 0 blocks
==1040419== indirectly lost: 0 bytes in 0 blocks
==1040419== possibly lost: 0 bytes in 0 blocks
==1040419== still reachable: 88,190 bytes in 13 blocks
==1040419== suppressed: 0 bytes in 0 blocks
==1040419== Rerun with --leak-check=full to see details of leaked memory
==1040419==
==1040419== For lists of detected and suppressed errors, rerun with: -s
==1040419== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==1040433== Memcheck, a memory error detector
==1040433== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1040433== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1040433== Command: ./src/html2xhtml -t frameset report/vuln/id:000002,sig:11,src:001386+001369,time:43142960,execs:4309058,op:splice,rep:8
==1040433==
==1040433== Invalid read of size 4
==1040433== at 0x40E911: elm_close (procesador.c:944)
==1040433== by 0x410617: err_html_struct (procesador.c:1889)
==1040433== by 0x40F20A: err_content_invalid (procesador.c:0)
==1040433== by 0x40F20A: elm_close (procesador.c:959)
==1040433== by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040433== by 0x40DF7A: main (html2xhtml.c:117)
==1040433== Address 0x6efc84 is not stack'd, malloc'd or (recently) free'd
==1040433==
==1040433==
==1040433== Process terminating with default action of signal 11 (SIGSEGV)
==1040433== Access not within mapped region at address 0x6EFC84
==1040433== at 0x40E911: elm_close (procesador.c:944)
==1040433== by 0x410617: err_html_struct (procesador.c:1889)
==1040433== by 0x40F20A: err_content_invalid (procesador.c:0)
==1040433== by 0x40F20A: elm_close (procesador.c:959)
==1040433== by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040433== by 0x40DF7A: main (html2xhtml.c:117)
==1040433== If you believe this happened as a result of a stack
==1040433== overflow in your program's main thread (unlikely but
==1040433== possible), you can try to increase the size of the
==1040433== main thread stack using the --main-stacksize= flag.
==1040433== The main thread stack size used in this run was 8388608.
==1040433==
==1040433== HEAP SUMMARY:
==1040433== in use at exit: 92,286 bytes in 14 blocks
==1040433== total heap usage: 23 allocs, 9 frees, 2,222,509 bytes allocated
==1040433==
==1040433== LEAK SUMMARY:
==1040433== definitely lost: 0 bytes in 0 blocks
==1040433== indirectly lost: 0 bytes in 0 blocks
==1040433== possibly lost: 0 bytes in 0 blocks
==1040433== still reachable: 92,286 bytes in 14 blocks
==1040433== suppressed: 0 bytes in 0 blocks
==1040433== Rerun with --leak-check=full to see details of leaked memory
==1040433==
==1040433== For lists of detected and suppressed errors, rerun with: -s
==1040433== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==1040439== Memcheck, a memory error detector
==1040439== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1040439== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1040439== Command: ./src/html2xhtml -t frameset report/vuln/id:000003,sig:11,src:001386+001369,time:43143048,execs:4309129,op:splice,rep:8
==1040439==
==1040439== Invalid read of size 4
==1040439== at 0x40E911: elm_close (procesador.c:944)
==1040439== by 0x410617: err_html_struct (procesador.c:1889)
==1040439== by 0x40F20A: err_content_invalid (procesador.c:0)
==1040439== by 0x40F20A: elm_close (procesador.c:959)
==1040439== by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040439== by 0x40DF7A: main (html2xhtml.c:117)
==1040439== Address 0x6e7074 is not stack'd, malloc'd or (recently) free'd
==1040439==
==1040439==
==1040439== Process terminating with default action of signal 11 (SIGSEGV)
==1040439== Access not within mapped region at address 0x6E7074
==1040439== at 0x40E911: elm_close (procesador.c:944)
==1040439== by 0x410617: err_html_struct (procesador.c:1889)
==1040439== by 0x40F20A: err_content_invalid (procesador.c:0)
==1040439== by 0x40F20A: elm_close (procesador.c:959)
==1040439== by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040439== by 0x40DF7A: main (html2xhtml.c:117)
==1040439== If you believe this happened as a result of a stack
==1040439== overflow in your program's main thread (unlikely but
==1040439== possible), you can try to increase the size of the
==1040439== main thread stack using the --main-stacksize= flag.
==1040439== The main thread stack size used in this run was 8388608.
==1040439==
==1040439== HEAP SUMMARY:
==1040439== in use at exit: 92,286 bytes in 14 blocks
==1040439== total heap usage: 23 allocs, 9 frees, 2,222,509 bytes allocated
==1040439==
==1040439== LEAK SUMMARY:
==1040439== definitely lost: 0 bytes in 0 blocks
==1040439== indirectly lost: 0 bytes in 0 blocks
==1040439== possibly lost: 0 bytes in 0 blocks
==1040439== still reachable: 92,286 bytes in 14 blocks
==1040439== suppressed: 0 bytes in 0 blocks
==1040439== Rerun with --leak-check=full to see details of leaked memory
==1040439==
==1040439== For lists of detected and suppressed errors, rerun with: -s
==1040439== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
GDB Backtrace and Source Code
I attached gdb to html2xhtml in an attempt to find where the Out of Bounds Read was taking place:
Taking a look at the segfault in GDB led me to the following function:
static void elm_close(tree_node_t *nodo)
A user could provide a malformed document with an invalid 'ELM_PTR(nodo).contenttype[doctype]', resulting in the following comparison in assembly:
cmp dword ptr [rbp + rax*4 + 0xc], 4
This could be leveraged to read locations that they should not have access to. I have attached multiple crash files to help reproduce the issue.
Thanks again!
crashes.zip