CVE-2022-44311: Out of Bounds Read In static void elm_close(tree_node_t *nodo) · Issue #19 · jfisteus/html2xhtml

html2xhtml v1.3 was discovered to contain an out-of-bounds read in the function static void elm_close(tree_node_t *nodo) in procesador.c. This vulnerability allows attackers to access sensitive files or cause a Denial of Service (DoS) via a crafted HTML file.

Hi there!

Great work on html2xhtml; I find myself using it quite often. While using the tool, I set up some fuzz tests to run in the background. A few of the test cases caused a segfault when run with the '-t frameset' option, which led me to investigate the crash further.
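
The crash file names that appear in the Valgrind logs (for example id:000000,sig:11,src:001386+001369,...) look like the naming scheme used by AFL/AFL++, where the sig: field records the signal that killed the target; 11 is SIGSEGV on Linux. When sorting many such files, a small helper can pull that field out. This is a minimal sketch assuming that naming scheme; afl_crash_signal is a hypothetical helper, not part of html2xhtml or AFL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the value of the "sig:" field in an AFL-style
 * crash file name such as "id:000000,sig:11,src:001386+001369,op:splice,rep:16",
 * or -1 if the field is missing or malformed. Only the last path component
 * is inspected. */
int afl_crash_signal(const char *path)
{
    const char *name = strrchr(path, '/');
    name = name ? name + 1 : path;

    /* Walk the comma-separated fields looking for one that starts with "sig:". */
    const char *field = name;
    while (field && *field) {
        if (strncmp(field, "sig:", 4) == 0) {
            char *end;
            long sig = strtol(field + 4, &end, 10);
            /* Require at least one digit, followed by ',' or end of string. */
            if (end != field + 4 && (*end == ',' || *end == '\0'))
                return (int)sig;
            return -1;
        }
        field = strchr(field, ',');
        if (field)
            field++; /* step past the comma to the next field */
    }
    return -1;
}
```

For instance, the file passed in the first Valgrind run yields 11, matching the SIGSEGV that Valgrind reports.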

Valgrind

I started with Valgrind, which reported an invalid read of size 4 in each of the four crashing test cases:

==1040381== Memcheck, a memory error detector
==1040381== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1040381== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1040381== Command: ./src/html2xhtml -t frameset report/vuln/id:000000,sig:11,src:001386+001369,time:12081510,execs:2336913,op:splice,rep:16
==1040381== 
==1040381== Invalid read of size 4
==1040381==    at 0x40E911: elm_close (procesador.c:944)
==1040381==    by 0x410617: err_html_struct (procesador.c:1889)
==1040381==    by 0x40F20A: err_content_invalid (procesador.c:0)
==1040381==    by 0x40F20A: elm_close (procesador.c:959)
==1040381==    by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040381==    by 0x40DF7A: main (html2xhtml.c:117)
==1040381==  Address 0x6f20d4 is not stack'd, malloc'd or (recently) free'd
==1040381== 
==1040381== 
==1040381== Process terminating with default action of signal 11 (SIGSEGV)
==1040381==  Access not within mapped region at address 0x6F20D4
==1040381==    at 0x40E911: elm_close (procesador.c:944)
==1040381==    by 0x410617: err_html_struct (procesador.c:1889)
==1040381==    by 0x40F20A: err_content_invalid (procesador.c:0)
==1040381==    by 0x40F20A: elm_close (procesador.c:959)
==1040381==    by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040381==    by 0x40DF7A: main (html2xhtml.c:117)
==1040381==  If you believe this happened as a result of a stack
==1040381==  overflow in your program's main thread (unlikely but
==1040381==  possible), you can try to increase the size of the
==1040381==  main thread stack using the --main-stacksize= flag.
==1040381==  The main thread stack size used in this run was 8388608.
==1040381== 
==1040381== HEAP SUMMARY:
==1040381==     in use at exit: 88,190 bytes in 13 blocks
==1040381==   total heap usage: 22 allocs, 9 frees, 2,218,413 bytes allocated
==1040381== 
==1040381== LEAK SUMMARY:
==1040381==    definitely lost: 0 bytes in 0 blocks
==1040381==    indirectly lost: 0 bytes in 0 blocks
==1040381==      possibly lost: 0 bytes in 0 blocks
==1040381==    still reachable: 88,190 bytes in 13 blocks
==1040381==         suppressed: 0 bytes in 0 blocks
==1040381== Rerun with --leak-check=full to see details of leaked memory
==1040381== 
==1040381== For lists of detected and suppressed errors, rerun with: -s
==1040381== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==1040419== Memcheck, a memory error detector
==1040419== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1040419== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1040419== Command: ./src/html2xhtml -t frameset report/vuln/id:000001,sig:11,src:001386+001369,time:12316330,execs:2651995,op:splice,rep:16
==1040419== 
==1040419== Invalid read of size 4
==1040419==    at 0x40E911: elm_close (procesador.c:944)
==1040419==    by 0x410617: err_html_struct (procesador.c:1889)
==1040419==    by 0x40F20A: err_content_invalid (procesador.c:0)
==1040419==    by 0x40F20A: elm_close (procesador.c:959)
==1040419==    by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040419==    by 0x40DF7A: main (html2xhtml.c:117)
==1040419==  Address 0x6efc84 is not stack'd, malloc'd or (recently) free'd
==1040419== 
==1040419== 
==1040419== Process terminating with default action of signal 11 (SIGSEGV)
==1040419==  Access not within mapped region at address 0x6EFC84
==1040419==    at 0x40E911: elm_close (procesador.c:944)
==1040419==    by 0x410617: err_html_struct (procesador.c:1889)
==1040419==    by 0x40F20A: err_content_invalid (procesador.c:0)
==1040419==    by 0x40F20A: elm_close (procesador.c:959)
==1040419==    by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040419==    by 0x40DF7A: main (html2xhtml.c:117)
==1040419==  If you believe this happened as a result of a stack
==1040419==  overflow in your program's main thread (unlikely but
==1040419==  possible), you can try to increase the size of the
==1040419==  main thread stack using the --main-stacksize= flag.
==1040419==  The main thread stack size used in this run was 8388608.
==1040419== 
==1040419== HEAP SUMMARY:
==1040419==     in use at exit: 88,190 bytes in 13 blocks
==1040419==   total heap usage: 22 allocs, 9 frees, 2,218,413 bytes allocated
==1040419== 
==1040419== LEAK SUMMARY:
==1040419==    definitely lost: 0 bytes in 0 blocks
==1040419==    indirectly lost: 0 bytes in 0 blocks
==1040419==      possibly lost: 0 bytes in 0 blocks
==1040419==    still reachable: 88,190 bytes in 13 blocks
==1040419==         suppressed: 0 bytes in 0 blocks
==1040419== Rerun with --leak-check=full to see details of leaked memory
==1040419== 
==1040419== For lists of detected and suppressed errors, rerun with: -s
==1040419== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==1040433== Memcheck, a memory error detector
==1040433== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1040433== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1040433== Command: ./src/html2xhtml -t frameset report/vuln/id:000002,sig:11,src:001386+001369,time:43142960,execs:4309058,op:splice,rep:8
==1040433== 
==1040433== Invalid read of size 4
==1040433==    at 0x40E911: elm_close (procesador.c:944)
==1040433==    by 0x410617: err_html_struct (procesador.c:1889)
==1040433==    by 0x40F20A: err_content_invalid (procesador.c:0)
==1040433==    by 0x40F20A: elm_close (procesador.c:959)
==1040433==    by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040433==    by 0x40DF7A: main (html2xhtml.c:117)
==1040433==  Address 0x6efc84 is not stack'd, malloc'd or (recently) free'd
==1040433== 
==1040433== 
==1040433== Process terminating with default action of signal 11 (SIGSEGV)
==1040433==  Access not within mapped region at address 0x6EFC84
==1040433==    at 0x40E911: elm_close (procesador.c:944)
==1040433==    by 0x410617: err_html_struct (procesador.c:1889)
==1040433==    by 0x40F20A: err_content_invalid (procesador.c:0)
==1040433==    by 0x40F20A: elm_close (procesador.c:959)
==1040433==    by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040433==    by 0x40DF7A: main (html2xhtml.c:117)
==1040433==  If you believe this happened as a result of a stack
==1040433==  overflow in your program's main thread (unlikely but
==1040433==  possible), you can try to increase the size of the
==1040433==  main thread stack using the --main-stacksize= flag.
==1040433==  The main thread stack size used in this run was 8388608.
==1040433== 
==1040433== HEAP SUMMARY:
==1040433==     in use at exit: 92,286 bytes in 14 blocks
==1040433==   total heap usage: 23 allocs, 9 frees, 2,222,509 bytes allocated
==1040433== 
==1040433== LEAK SUMMARY:
==1040433==    definitely lost: 0 bytes in 0 blocks
==1040433==    indirectly lost: 0 bytes in 0 blocks
==1040433==      possibly lost: 0 bytes in 0 blocks
==1040433==    still reachable: 92,286 bytes in 14 blocks
==1040433==         suppressed: 0 bytes in 0 blocks
==1040433== Rerun with --leak-check=full to see details of leaked memory
==1040433== 
==1040433== For lists of detected and suppressed errors, rerun with: -s
==1040433== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==1040439== Memcheck, a memory error detector
==1040439== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1040439== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1040439== Command: ./src/html2xhtml -t frameset report/vuln/id:000003,sig:11,src:001386+001369,time:43143048,execs:4309129,op:splice,rep:8
==1040439== 
==1040439== Invalid read of size 4
==1040439==    at 0x40E911: elm_close (procesador.c:944)
==1040439==    by 0x410617: err_html_struct (procesador.c:1889)
==1040439==    by 0x40F20A: err_content_invalid (procesador.c:0)
==1040439==    by 0x40F20A: elm_close (procesador.c:959)
==1040439==    by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040439==    by 0x40DF7A: main (html2xhtml.c:117)
==1040439==  Address 0x6e7074 is not stack'd, malloc'd or (recently) free'd
==1040439== 
==1040439== 
==1040439== Process terminating with default action of signal 11 (SIGSEGV)
==1040439==  Access not within mapped region at address 0x6E7074
==1040439==    at 0x40E911: elm_close (procesador.c:944)
==1040439==    by 0x410617: err_html_struct (procesador.c:1889)
==1040439==    by 0x40F20A: err_content_invalid (procesador.c:0)
==1040439==    by 0x40F20A: elm_close (procesador.c:959)
==1040439==    by 0x40E7C4: saxEndDocument (procesador.c:233)
==1040439==    by 0x40DF7A: main (html2xhtml.c:117)
==1040439==  If you believe this happened as a result of a stack
==1040439==  overflow in your program's main thread (unlikely but
==1040439==  possible), you can try to increase the size of the
==1040439==  main thread stack using the --main-stacksize= flag.
==1040439==  The main thread stack size used in this run was 8388608.
==1040439== 
==1040439== HEAP SUMMARY:
==1040439==     in use at exit: 92,286 bytes in 14 blocks
==1040439==   total heap usage: 23 allocs, 9 frees, 2,222,509 bytes allocated
==1040439== 
==1040439== LEAK SUMMARY:
==1040439==    definitely lost: 0 bytes in 0 blocks
==1040439==    indirectly lost: 0 bytes in 0 blocks
==1040439==      possibly lost: 0 bytes in 0 blocks
==1040439==    still reachable: 92,286 bytes in 14 blocks
==1040439==         suppressed: 0 bytes in 0 blocks
==1040439== Rerun with --leak-check=full to see details of leaked memory
==1040439== 
==1040439== For lists of detected and suppressed errors, rerun with: -s
==1040439== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
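
All four runs end the same way: the process is killed by signal 11 (SIGSEGV) inside elm_close. To re-check crash files like these without Valgrind, for instance against a patched build, it is enough to run the binary and see whether a signal terminated it. The sketch below assumes a POSIX system; run_and_get_signal is a hypothetical helper built on fork(), execv() and waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given argument vector and wait
 * for it. Returns the number of the signal that terminated the child
 * (e.g. SIGSEGV), 0 if it exited normally, or -1 if fork()/waitpid() failed. */
int run_and_get_signal(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127); /* only reached if execv() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

For example, passing {"./src/html2xhtml", "-t", "frameset", path, NULL} for each extracted crash file should return SIGSEGV on an affected build and 0 (a normal exit, whatever the exit code) once the read is fixed.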

GDB Backtrace and Source Code

I then ran html2xhtml under GDB to find where the out-of-bounds read takes place. The backtrace for the segfault led me to the following function in procesador.c:

static void elm_close(tree_node_t *nodo)

A user could provide a malformed document that makes the lookup 'ELM_PTR(nodo).contenttype[doctype]' read from an invalid location. In the compiled binary, that lookup is the following comparison:

cmp    dword ptr [rbp + rax*4 + 0xc], 4
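
In Intel syntax, the memory operand [rbp + rax*4 + 0xc] is base + index*scale + displacement: a 4-byte (dword) read from an int-sized array that starts 0xc bytes into whatever rbp points at, indexed by rax. Presumably rbp holds the element's address and rax holds doctype. The sketch below shows how that matches an access like ELM_PTR(nodo).contenttype[doctype] under an assumed layout; elm_data_t and its fields are hypothetical, chosen only so that contenttype sits at offset 0xc. If either the base pointer or the index is out of range, the computed address falls outside the table, which is consistent with Valgrind's "Access not within mapped region".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element record (NOT html2xhtml's real struct): three 4-byte
 * fields followed by a per-doctype table, so contenttype begins at 0xc. */
typedef struct {
    int field0, field1, field2;
    int contenttype[3];
} elm_data_t;

/* Effective address of the operand in `cmp dword ptr [rbp + rax*4 + 0xc], 4`:
 * base register + index * 4-byte scale + 0xc displacement. */
uintptr_t cmp_operand_address(uintptr_t rbp, uintptr_t rax)
{
    return rbp + rax * 4 + 0xc;
}

/* The same address as the C compiler sees it: &elm->contenttype[doctype]. */
uintptr_t contenttype_address(const elm_data_t *elm, size_t doctype)
{
    return (uintptr_t)&elm->contenttype[doctype];
}
```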

An attacker could leverage this out-of-bounds read to read memory locations they should not have access to, and at a minimum it crashes the process. I have attached several crash files to help reproduce the issue.
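
A common way to close this kind of out-of-bounds read is to validate every index derived from the input before using it to address a table. The following is a minimal sketch of that pattern, not a patch for html2xhtml: elm_def_t, elm_table, ELM_COUNT, DOCTYPE_COUNT and content_type_of are hypothetical names, and the table contents are placeholders.

```c
/* Hypothetical sizes and table; html2xhtml's real names and data differ. */
enum { DOCTYPE_COUNT = 3, ELM_COUNT = 4 };

typedef struct {
    int contenttype[DOCTYPE_COUNT];
} elm_def_t;

static const elm_def_t elm_table[ELM_COUNT] = {
    {{1, 1, 1}}, {{2, 2, 2}}, {{3, 4, 4}}, {{0, 0, 4}},
};

/* Look up the content type of element `elm_id` for document type `doctype`.
 * Both indices ultimately come from parsing untrusted input, so check them
 * against the table bounds and return -1 instead of reading out of bounds. */
int content_type_of(long elm_id, long doctype)
{
    if (elm_id < 0 || elm_id >= ELM_COUNT)
        return -1;
    if (doctype < 0 || doctype >= DOCTYPE_COUNT)
        return -1;
    return elm_table[elm_id].contenttype[doctype];
}
```

Returning -1 lets a caller treat an unknown element as a structural error instead of crashing; an actual fix in html2xhtml may well take a different shape.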

Thanks again!

crashes.zip
