CVE-2019-19921: Volume mount race condition with shared mounts · Issue #2197 · opencontainers/runc
runc through 1.0.0-rc9 has Incorrect Access Control leading to Escalation of Privileges, related to libcontainer/rootfs_linux.go. To exploit this, an attacker must be able to spawn two containers with custom volume-mount configurations, and be able to run custom images. (This vulnerability does not affect Docker due to an implementation detail that happens to block the attack.)
Closed · leoluk opened this issue on Jan 1, 2020 · 11 comments · Fixed by #2207
Disclosed in #2190.
Here’s the original report to [email protected]:
Hi all,
An attacker who controls the container image for two containers that share a volume can race volume mounts during container initialization by adding a symlink to the rootfs that points to a directory on the volume. The second container won’t be able to see the actual mount, but it can race it by modifying the mount point on the volume.
This can be exploited for a full container breakout by racing readonly/mask mounts, allowing writes to dangerous paths like /proc/sys/kernel/core_pattern.
Example:
- The rootfs of container A has a symlink /proc -> /evil/level1
- Container A specifies a named volume mounted to /evil
- Container B, started before container A, shares this named volume and repeatedly swaps /evil/level1 and /evil/level1~
- Container A mounts procfs to /evil/level1~/level2, but when it remounts /proc/sys, it does so at /evil/level1/level2/sys.
This can reliably be reproduced using runc and podman on Fedora 30 (takes about 0-5s to win the race for me): https://gist.github.com/leoluk/82965ad9df58247202aa0e1878439092
SELinux would ordinarily prevent the exploit by disallowing container_t from writing usermodehelper_t, but it can be disabled by symlinking /proc/self/task/1/attr/exec to something benign like /proc/self/sched (bypassing the procfs check). AppArmor can be disabled similarly.
Docker specifies the mounts in a different order and mounts procfs after it mounts the volumes, mounting over the /proc symlink, which appears to prevent at least the /proc approach. I haven’t tested other runc usage scenarios; for instance, k8s+cri-o might be vulnerable as well.
Fabian of Cure53 (in CC) created a minimal PoC that uses runc directly: https://gist.github.com/LiveOverflow/c937820b688922eb127fb760ce06dab9
There are other container init steps after the volume mount that can be raced; the obvious ones are utils.CloseExecFrom and the AppArmor/SELinux attrs, but there might be others, especially in mountToRootfs (like tricking remount into mounting the rootfs as rshared if there’s another volume that specifies the flag, but I haven’t tried that).
This is similar to the vulnerability I reported that Adam Iwaniuk disclosed during their Dragon Sector CTF (#2128) and a similar crun one (containers/crun#111).
The fix for the mounts is probably what Aleksa outlined here, using /proc/self/fd to resolve the path: containers/crun#111 (comment)
My proposed (“stop the bleeding”) patch was something like the following:
commit 81a9af6677b1f87e70b87e9a655cb4f4d06a0503 (HEAD -> fix-double-volume-attack)
Author: Aleksa Sarai [email protected]
Date: Sat Dec 21 23:40:17 2019 +1100
rootfs: do not permit /proc mounts to non-directories
mount(2) will blindly follow symlinks, which is a problem because it
allows a malicious container to trick runc into mounting /proc to an
entirely different location (and thus within the attacker's control for
a rename-exchange attack).
This is just a hotfix, and the more complete fix would be to finish
libpathrs and port runc to it (to avoid these types of attacks entirely,
and defend against a variety of other /proc-related attacks).
Fixes: CVE-YYYY-XXXX
Signed-off-by: Aleksa Sarai <[email protected]>
diff --git a/libcontainer/rootfs_linux.go b/libcontainer/rootfs_linux.go
index 291021440a1a…6e896bc4fdaa 100644
--- a/libcontainer/rootfs_linux.go
+++ b/libcontainer/rootfs_linux.go
@@ -297,17 +297,49 @@ func mountToRootfs(m *configs.Mount, rootfs, mountLabel string, enableCgroupns b
		dest = filepath.Join(rootfs, dest)
	}
	// For "special" filesystems, we have to be quite careful about mounting --
	// we must make sure that the destination is what we expect. This is done
	// by opening the destination as an O_PATH descriptor, and using the
	// /proc/self/fd/... as the mount target. Unfortunately this is actually
	// possible to bypass with a little bit of thought, but the complete
	// solution for this will be to port runc to libpathrs.
	switch m.Device {
-	case "proc", "sysfs":
	case "proc", "sysfs", "mqueue":
		// NOTE: If the container controls any part of dest, this is unsafe.
		if err := os.MkdirAll(dest, 0755); err != nil {
			return err
		}
		destFd, err := unix.Open(dest, unix.O_PATH|unix.O_CLOEXEC, 0)
		if err != nil {
			return err
		}
		defer unix.Close(destFd)
		// Check that the path is exactly what we expect.
		// NOTE: If the path contains an attacker-controlled bind-mount, this
		// check won't do anything. In addition, if procfs is fraudulent,
		// it will also be useless. As above, the solution is to switch
		// to libpathrs.
		destFdPath := fmt.Sprintf("/proc/self/fd/%d", destFd)
		destUnsafePath, err := os.Readlink(destFdPath)
		if err != nil {
			return err
		}
		if destUnsafePath != dest {
			return fmt.Errorf("detected possible breakout: trying to mount '%s' on '%s' was actually targeted to '%s'", m.Device, dest, destUnsafePath)
		}
		// Okay, now we can use destFdPath.
		dest = destFdPath
		m.Destination = destFdPath
	}
	// Now actually do the mount.
	switch m.Device {
	case "proc", "sysfs":
		// Selinux kernels do not support labeling of /proc or /sys
		return mountPropagate(m, rootfs, "")
	case "mqueue":
-		if err := os.MkdirAll(dest, 0755); err != nil {
-			return err
-		}
		if err := mountPropagate(m, rootfs, mountLabel); err != nil {
			// older kernels do not support labeling of /dev/mqueue
			if err := mountPropagate(m, rootfs, ""); err != nil {
Unfortunately this is not sufficient if / is shared with another container, because then you can do the same trick (but this time on / directly). It also needs some more work to work around the fact that there are m.Destination-based checks elsewhere in rootfs_linux.go.
Your patch does stop the bleeding, though - most runc use cases do not share the rootfs. Mounting a volume on / breaks all kinds of things. Haven’t managed to do anything useful using either cri-o or podman.
Alright, I’ll prepare a PR. Thanks @leoluk – and sorry for the response time issues (as well as how the disclosure happened).
Any ETA on the workaround to unblock rc10?
I’ve been off the face of the earth for the past 2ish weeks. I will prepare a PR tomorrow.
#2207 contains a very simplified version of the above patch (the patch I posted above doesn’t work because rootfs_linux.go has a very fun relationship with pathnames that I don’t have time to debug right now).
Hi,
I’m part of the Debian Long Term Support (LTS) team, and I’m attempting to fix CVE-2019-19921 in our past releases that package "runc".
(apologies for digging up this old issue :))
I’m still able to reproduce the vulnerability (using the runc reproducer linked in the original topic), in the following situations:
- backporting the fix 2fc03cc to 1.0.0~rc6 (Debian 10 "buster"/"old-stable")
- more annoyingly, with 1.0.0~rc93, as shipped in Debian 11 "bullseye"/current; for reference the fix was pushed to rc10
AFAICS the fix does make the exploit less likely, but does not stop it entirely: within a few minutes I’m still able to overwrite my root system’s /proc/sys/kernel/core_pattern from container-2.
Is this expected (as in, it’s a mitigation but not a bullet-proof fix)?
Or is there a follow-up fix that I missed?
Thanks for your attention and best regards.
2fc03cc should completely prevent the exploit. It adds a check that refuses to mount procfs on /proc in the rootfs unless the target is a directory or absent, which makes it impossible to point it to an attacker-controlled bind mount. It’s not possible to race /proc itself in this setup (the rootfs is not attacker-accessible during early setup).
Either there’s a regression or something’s wrong with the Debian backport.
Thanks for your fast feedback!
Debian might have different dependency versions, because it mostly removes vendor/* and uses the packaged versions.
Thus I tried an Ubuntu Focal (20.04) VM, where ‘runc’ is built with the bundled vendor/*, to check whether that was the reason.
Interestingly:
- 1.0.0~rc10-0ubuntu1 correctly blocks the mount attempt early and 'runc run container-[12]' fails (“must be mounted on ordinary directory”)
- 1.0.0~rc95-0ubuntu1~20.04.2 is vulnerable to the PoC
- 1.1.0-0ubuntu1~20.04.2 is vulnerable to the PoC
So AFAICS, despite the presence of the fix in all versions, some other commit re-introduced the issue.
(and similarly the fix alone didn’t appear to fix ~rc6 in my previous message)
If you’ve got further insights I’d be grateful :)
Otherwise I can try and bisect to pinpoint when the fix lost its effectiveness (probably tomorrow).
After a bit of digging, ironically it looks like the fix for this vulnerability (CVE-2019-19921) was broken by the one for CVE-2021-30465: 0ca91f4
This sounds like a regression as you suspected.
Do you want me to open a new ticket for this?
And register a new CVE (if you confirm)?