CVE-2022-2590: [PATCH v1] mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW

A race condition was found in the way the Linux kernel’s memory subsystem handled the copy-on-write (COW) breakage of private read-only shared memory mappings. This flaw allows an unprivileged, local user to gain write access to read-only memory mappings, increasing their privileges on the system.

From: David Hildenbrand [email protected]
To: [email protected]
Cc: [email protected], David Hildenbrand [email protected], [email protected], Linus Torvalds [email protected], Andrew Morton [email protected], Greg Kroah-Hartman [email protected], Axel Rasmussen [email protected], Peter Xu [email protected], Hugh Dickins [email protected], Andrea Arcangeli [email protected], Matthew Wilcox [email protected], Vlastimil Babka [email protected], John Hubbard [email protected], Jason Gunthorpe [email protected]
Subject: [PATCH v1] mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW
Date: Mon, 8 Aug 2022 09:32:32 +0200
Message-ID: [email protected]

Ever since the Dirty COW (CVE-2016-5195) security issue happened, we know that FOLL_FORCE can be dangerous, especially if there are races that can be exploited by user space.

Right now, it would be sufficient to have some code that sets a PTE of a R/O-mapped shared page dirty, in order for it to erroneously become writable by FOLL_FORCE. The implications of setting a write-protected PTE dirty might not be immediately obvious to everyone.

And in fact ever since commit 9ae0f87d009c (“mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte”), we can use UFFDIO_CONTINUE to map a shmem page R/O while marking the pte dirty. This can be used by unprivileged user space to modify tmpfs/shmem file content even if the user does not have write permissions to the file – Dirty COW restricted to tmpfs/shmem (CVE-2022-2590).

To fix such security issues for good, the insight is that we really only need that fancy retry logic (FOLL_COW) for COW mappings that are not writable (!VM_WRITE). And in a COW mapping, we really only broke COW if we have an exclusive anonymous page mapped. If we have something else mapped, or the mapped anonymous page might be shared (!PageAnonExclusive), we have to trigger a write fault to break COW. If we don’t find an exclusive anonymous page when we retry, we have to trigger COW breaking once again because something intervened.
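The retry behaviour described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct sim_mapping`, `sim_break_cow()`, `sim_gup_force_write()` and the `max_retries` bound are hypothetical names invented for illustration, and the real GUP code has no such bound. The point is that every lookup re-checks for an exclusive anonymous page and triggers COW breaking again whenever something intervened.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one PTE in a !VM_WRITE COW mapping. */
struct sim_mapping {
	bool anon_exclusive;	/* an exclusive anonymous page is mapped */
	int interferences;	/* times a racing actor replaces the page again */
	int faults;		/* simulated write faults taken so far */
};

/* Simulated write fault: breaks COW, then lets a racer undo it if any are left. */
static void sim_break_cow(struct sim_mapping *m)
{
	m->faults++;
	m->anon_exclusive = true;
	if (m->interferences > 0) {
		m->interferences--;
		m->anon_exclusive = false;	/* something intervened */
	}
}

/*
 * Returns 0 once an exclusive anonymous page is found, -1 if the
 * (model-only) retry budget is exhausted.
 */
int sim_gup_force_write(struct sim_mapping *m, int max_retries)
{
	for (int i = 0; i <= max_retries; i++) {
		if (m->anon_exclusive)
			return 0;	/* COW was broken and is still intact */
		sim_break_cow(m);	/* trigger COW breaking once again */
	}
	return -1;
}
```

With two simulated interferences, the model takes three write faults before the lookup succeeds; a dirty bit plays no role in the decision.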

Let’s move away from this mandatory-retry + dirty handling and rely on our PageAnonExclusive() flag for making a similar decision, to use the same COW logic as in other kernel parts here as well. In case we stumble over a PTE in a COW mapping that does not map an exclusive anonymous page, COW was not properly broken and we have to trigger a fake write-fault to break COW.

Just like we do in can_change_pte_writable() added via commit 64fe24a3e05e (“mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection”) and commit 76aefad628aa ("mm/mprotect: fix soft-dirty check in can_change_pte_writable()"), take care of softdirty and uffd-wp manually.
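To make the difference between the old and the new check concrete, here is a userspace model of both predicates. It is a sketch, not kernel code: the `model_*` structs, the `M_FOLL_*` flag values and the function names are assumptions made up for this illustration, with each boolean standing in for the kernel helper named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Model-only flag values; they are not the kernel's FOLL_* constants. */
enum { M_FOLL_FORCE = 1u << 0, M_FOLL_COW = 1u << 1 };

struct model_pte {
	bool write;		/* stands in for pte_write() */
	bool dirty;		/* stands in for pte_dirty() */
	bool soft_dirty;	/* stands in for pte_soft_dirty() */
	bool uffd_wp;		/* stands in for userfaultfd_pte_wp() */
};

struct model_page {
	bool present;		/* false models a NULL page pointer */
	bool anon;		/* stands in for PageAnon() */
	bool anon_exclusive;	/* stands in for PageAnonExclusive() */
};

struct model_vma {
	bool vm_write;			/* VM_WRITE set in vm_flags */
	bool soft_dirty_enabled;	/* stands in for vma_soft_dirty_enabled() */
};

/* Old rule: a dirty PTE after a COW cycle was taken as proof of broken COW. */
bool old_can_follow_write(struct model_pte pte, unsigned int flags)
{
	return pte.write ||
	       ((flags & M_FOLL_FORCE) && (flags & M_FOLL_COW) && pte.dirty);
}

/* New rule: only an exclusive anonymous page proves that COW was broken. */
bool new_can_follow_write(struct model_pte pte, struct model_page page,
			  struct model_vma vma, unsigned int flags)
{
	if (pte.write)
		return true;
	if (!(flags & M_FOLL_FORCE))
		return false;
	if (vma.vm_write)
		return false;
	if (!page.present || !page.anon || !page.anon_exclusive)
		return false;
	if (vma.soft_dirty_enabled && !pte.soft_dirty)
		return false;
	return !pte.uffd_wp;
}
```

In this model, a shmem page mapped R/O with a dirty PTE passes the old check once the COW flag is set, but is rejected by the new one because the page is not anonymous. A uffd-wp-protected PTE is rejected even when the page is exclusive, so a write fault is still required.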

For example, a write() via /proc/self/mem to a uffd-wp-protected range has to fail instead of silently granting write access and bypassing the userspace fault handler. Note that FOLL_FORCE is not only used for debug access, but also triggered by applications without debug intentions, for example, when pinning pages via RDMA.
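For readers who have not seen FOLL_FORCE from user space, the following sketch shows the legitimate /proc/self/mem path on a plain private anonymous mapping without uffd-wp. The function name `force_write_demo()` and its return convention are assumptions for this example; it assumes a 64-bit Linux system where /proc/self/mem is writable, and it reports "unavailable" rather than failing when that is not the case (for example in a restricted sandbox).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes to a PROT_READ private anonymous page through /proc/self/mem.
 * Returns 0 if the write landed in the mapping, 1 if the environment
 * does not allow the experiment, -1 if the kernel reported success but
 * the data did not show up in the mapping.
 */
int force_write_demo(void)
{
	static const char msg[] = "forced";
	long pagesz = sysconf(_SC_PAGESIZE);
	char *map;
	ssize_t n;
	int fd, rc;

	if (pagesz <= 0)
		return 1;

	/* Read-only, private, anonymous: a COW mapping without VM_WRITE. */
	map = mmap(NULL, (size_t)pagesz, PROT_READ,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return 1;

	fd = open("/proc/self/mem", O_RDWR);
	if (fd < 0) {
		munmap(map, (size_t)pagesz);
		return 1;
	}

	/* The file offset is the virtual address to write to. */
	n = pwrite(fd, msg, sizeof(msg), (off_t)(uintptr_t)map);
	close(fd);

	if (n != (ssize_t)sizeof(msg))
		rc = 1;
	else
		rc = memcmp(map, msg, sizeof(msg)) == 0 ? 0 : -1;

	munmap(map, (size_t)pagesz);
	return rc;
}
```

On a typical system the call returns 0: the kernel breaks COW for the private page and the forced write becomes visible through the read-only mapping, without ever touching anything shared.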

This fixes CVE-2022-2590. Note that only x86_64 and aarch64 are affected, because only those support CONFIG_HAVE_ARCH_USERFAULTFD_MINOR.
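Whether a running kernel exposes shmem minor-fault handling can be probed from user space, as in the sketch below. The function name `uffd_minor_shmem_supported()` and its tri-state return value are assumptions for this example. It relies on the UFFDIO_API handshake reporting supported features, and it returns "unknown" when userfaultfd cannot be opened or when the installed headers predate UFFD_FEATURE_MINOR_SHMEM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns 1 if the kernel reports shmem minor-fault support, 0 if it
 * does not, and -1 if this could not be determined.
 */
int uffd_minor_shmem_supported(void)
{
#ifndef UFFD_FEATURE_MINOR_SHMEM
	return -1;	/* headers too old to know the feature bit */
#else
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	int flags = O_CLOEXEC | O_NONBLOCK;
	int fd, rc;

#ifdef UFFD_USER_MODE_ONLY
	/* Lets unprivileged callers open a userfaultfd on newer kernels. */
	flags |= UFFD_USER_MODE_ONLY;
#endif
	fd = (int)syscall(SYS_userfaultfd, flags);
	if (fd < 0)
		return -1;

	/* With features == 0 the kernel fills in what it supports. */
	rc = ioctl(fd, UFFDIO_API, &api);
	close(fd);
	if (rc < 0)
		return -1;

	return (api.features & UFFD_FEATURE_MINOR_SHMEM) ? 1 : 0;
#endif
}
```

A result of 1 only says that the feature is available; it says nothing about whether the kernel already carries the fix.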

Fortunately, FOLL_COW is no longer required to handle FOLL_FORCE. So let’s just get rid of it.

Note 1: We don’t check for the PTE being dirty because it doesn’t matter for making a “was COWed” decision anymore, and whoever modifies the page has to set the page dirty either way.

Note 2: Kernels before extended uffd-wp support and before PageAnonExclusive (< 5.19) can simply revert the problematic commit instead and be safe regarding UFFDIO_CONTINUE. A backport to v5.19 requires minor adjustments due to lack of vma_soft_dirty_enabled().
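Note 2 can be restated as a small decision helper. This is a sketch only: `enum fix_strategy` and `classify_kernel()` are hypothetical names, and the "not affected before 5.16" branch is an inference from the "5.16+" stable tag on this patch rather than something stated outright. It looks at the upstream version number alone and knows nothing about distribution backports.

```c
#include <assert.h>

enum fix_strategy {
	NOT_AFFECTED,	/* assumed: problematic commit not present before 5.16 */
	REVERT_COMMIT,	/* 5.16 to 5.18: revert the problematic commit */
	APPLY_PATCH	/* 5.19 and later: apply or backport this patch */
};

/* Classify an upstream kernel version according to Note 2. */
enum fix_strategy classify_kernel(int major, int minor)
{
	if (major < 5 || (major == 5 && minor < 16))
		return NOT_AFFECTED;
	if (major == 5 && minor < 19)
		return REVERT_COMMIT;
	return APPLY_PATCH;
}
```

For v5.19 itself, the patch needs the minor adjustments mentioned above because vma_soft_dirty_enabled() is not available there.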

Fixes: 9ae0f87d009c ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte")
Cc: [email protected] # 5.16+
Cc: Linus Torvalds [email protected]
Cc: Andrew Morton [email protected]
Cc: Greg Kroah-Hartman [email protected]
Cc: Axel Rasmussen [email protected]
Cc: Peter Xu [email protected]
Cc: Hugh Dickins [email protected]
Cc: Andrea Arcangeli [email protected]
Cc: Matthew Wilcox [email protected]
Cc: Vlastimil Babka [email protected]
Cc: John Hubbard [email protected]
Cc: Jason Gunthorpe [email protected]
Signed-off-by: David Hildenbrand [email protected]
---

Against upstream from yesterday instead of v5.19 because I wanted to reference the mprotect commit IDs and can_change_pte_writable(), and I wanted to directly use vma_soft_dirty_enabled().

I have a working reproducer that I’ll post to oss-security in one week. Of course, that reproducer no longer triggers with that commit and my ptrace testing indicated that FOLL_FORCE seems to continue working as expected.


---
 include/linux/mm.h |  1 -
 mm/gup.c           | 62 +++++++++++++++++++++++++++++-----------------
 mm/huge_memory.c   | 45 +++++++++++++++++----------------
 3 files changed, 63 insertions(+), 45 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 18e01474cf6b..2222ed598112 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2885,7 +2885,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
 #define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
 #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
-#define FOLL_COW	0x4000	/* internal GUP flag */
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
diff --git a/mm/gup.c b/mm/gup.c
index 732825157430..7a0b207f566f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -478,14 +478,34 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
 	return -EEXIST;
 }

-/*
- * FOLL_FORCE can write to even unwritable pte's, but only
- * after we've gone through a COW cycle and they are dirty.
- */
-static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
-{
-	return pte_write(pte) ||
-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+/* FOLL_FORCE can write to even unwritable PTEs in COW mappings. */
+static inline bool can_follow_write_pte(pte_t pte, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
+{
+	if (pte_write(pte))
+		return true;
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	/*
+	 * See check_vma_flags(): only COW mappings need that special
+	 * "force" handling when they lack VM_WRITE.
+	 */
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+	VM_BUG_ON(!is_cow_mapping(vma->vm_flags));
+
+	/*
+	 * See can_change_pte_writable(): we broke COW and could map the page
+	 * writable if we have an exclusive anonymous page and a write-fault
+	 * isn't require for other reasons.
+	 */
+	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+		return false;
+	if (vma_soft_dirty_enabled(vma) && !pte_soft_dirty(pte))
+		return false;
+	return !userfaultfd_pte_wp(vma, pte);
 }

 static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -528,12 +548,19 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 	if ((flags & FOLL_NUMA) && pte_protnone(pte))
 		goto no_page;
-	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
-		pte_unmap_unlock(ptep, ptl);
-		return NULL;
-	}

 	page = vm_normal_page(vma, address, pte);
+
+	/*
+	 * We only care about anon pages in can_follow_write_pte() and don't
+	 * have to worry about pte_devmap() because they are never anon.
+	 */
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pte(pte, page, vma, flags)) {
+		page = NULL;
+		goto out;
+	}
+
 	if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
 		/*
 		 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
@@ -986,17 +1013,6 @@ static int faultin_page(struct vm_area_struct *vma,
 		return -EBUSY;
 	}

-	/*
-	 * The VM_FAULT_WRITE bit tells us that do_wp_page has broken COW when
-	 * necessary, even if maybe_mkwrite decided not to set pte_write. We
-	 * can thus safely do subsequent page lookups as if they were reads.
-	 * But only do so when looping for pte_write is futile: in some cases
-	 * userspace may also be wanting to write to the gotten user page,
-	 * which a read fault here might prevent (a readonly page might get
-	 * reCOWed by userspace write).
-	 */
-	if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
-		*flags |= FOLL_COW;
 	return 0;
 }

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8a7c1b344abe..352b5220e95e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1040,12 +1040,6 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,

 	assert_spin_locked(pmd_lockptr(mm, pmd));

-	/*
-	 * When we COW a devmap PMD entry, we split it into PTEs, so we should
-	 * not be in this function with `flags & FOLL_COW` set.
-	 */
-	WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set");
-
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
 			 (FOLL_PIN | FOLL_GET)))
@@ -1395,14 +1389,23 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 	return VM_FAULT_FALLBACK;
 }

-/*
- * FOLL_FORCE can write to even unwritable pmd's, but only
- * after we've gone through a COW cycle and they are dirty.
- */
-static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+/* See can_follow_write_pte() on FOLL_FORCE details. */
+static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
 {
-	return pmd_write(pmd) ||
-	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+	if (pmd_write(pmd))
+		return true;
+	if (!(flags & FOLL_FORCE))
+		return false;
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+	VM_BUG_ON(!is_cow_mapping(vma->vm_flags));
+	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+		return false;
+	if (vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd))
+		return false;
+	return !userfaultfd_huge_pmd_wp(vma, pmd);
 }

 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
@@ -1411,12 +1414,16 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 				   unsigned int flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	struct page *page = NULL;
+	struct page *page;

 	assert_spin_locked(pmd_lockptr(mm, pmd));

-	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
-		goto out;
+	page = pmd_page(*pmd);
+	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pmd(*pmd, page, vma, flags))
+		return NULL;

 	/* Avoid dumping huge zero page */
 	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
@@ -1424,10 +1431,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,

 	/* Full NUMA hinting faults to serialise migration in fault paths */
 	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
-		goto out;
-
-	page = pmd_page(*pmd);
-	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+		return NULL;

 	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
 		return ERR_PTR(-EMLINK);
@@ -1444,7 +1448,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
 	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);

-out:
 	return page;
 }


base-commit: 1612c382ffbdf1f673caec76502b1c00e6d35363
--
2.35.3
