Headline
CVE-2023-49298: dnode_is_dirty: check dnode and its data for dirtiness by robn · Pull Request #15571 · openzfs/zfs
OpenZFS through 2.1.13 and 2.2.x through 2.2.1, in certain scenarios involving applications that try to rely on efficient copying of file data, can replace file contents with zero-valued bytes and thus potentially disable security mechanisms. NOTE: this issue is not always security related, but can be security related in realistic situations. A possible example is cp, from a recent GNU Core Utilities (coreutils) version, when attempting to preserve a rule set for denying unauthorized access. (One might use cp when configuring access control, such as with the /etc/hosts.deny file specified in the IBM Support reference.) NOTE: this issue occurs less often in version 2.2.1, and in versions before 2.1.4, because of the default configuration in those versions.
Motivation and Context
Closes #15526.
Description
Over its history this the dirty dnode test has been changed between checking for a dnodes being on os_dirty_dnodes (dn_dirty_link) and dn_dirty_record.
de198f2 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
2531ce3 Revert “Report holes when there are only metadata changes”
ec4f9b8 Report holes when there are only metadata changes
454365b Fix dirty check in dmu_offset_next()
66aca24 SEEK_HOLE should not block on txg_wait_synced()
Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9
It turns out both are actually required.
In the case of appending data to a newly created file, the dnode proper is dirtied (at least to change the blocksize) and dirty records are added. Thus, a single logical operation is represented by separate dirty indicators, and must not be separated.
The incorrect dirty check becomes a problem when the first block of a file is being appended to while another process is calling lseek to skip holes. It can happen that the dnode part is written out and undirtied first, while dirty records are still on the dnode. In this case, lseek(fd, 0, SEEK_DATA) would not know that the file is dirty, and would go to dnode_next_offset(). Since the object has no data blocks yet, it returns ESRCH, indicating no data found, which results in ENXIO being returned to lseek()'s caller.
Since coreutils 9.2, cp performs sparse copies by default, that is, it uses SEEK_DATA and SEEK_HOLE against the source file and attempts to replicate the holes in the target. When it hits the bug, its initial search for data fails, and it goes on to call fallocate() to create a hole over the entire destination file.
This has come up more recently as users upgrade their systems, getting OpenZFS 2.2 as well as a newer coreutils. However, this problem has been reproduced against 2.1, as well as on FreeBSD 13 and 14.
This change simply updates the dirty check to check both types of dirty. If there’s anything dirty at all, we immediately go to the “wait for sync” stage, It doesn’t really matter after that; both changes are on disk, so the dirty fields should be correct.
How Has This Been Tested?
@tonyhutter produced a repro script in #15526 which has been extremely useful. Its not perfect, but it can usually trip the issue in a couple of minutes. With the patch in place, no one has been able to trigger the issue. @rincebrain has been driving it reasonably hard (I think), no hits.
Full test suite run has passed.
I’ve done some general sanity and stress tests on customer workloads. They don’t exercise the bug, but they all did fine, so this maybe hasn’t broken anything.
I’d like to write a test for this case, since its bitten a few times in the past, but it requires lseek() to be called after the dnode proper being undirtied but before the dbufs are undirtied. I don’t have that kind of control from outside. Ideas welcome.
Types of changes
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Performance enhancement (non-breaking change which improves efficiency)
- Code cleanup (non-breaking change which makes code smaller or more readable)
- Breaking change (fix or feature that would cause existing functionality to change)
- Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- Documentation (a change to man pages or other documentation)
Checklist:
- My code follows the OpenZFS code style requirements.
- I have updated the documentation accordingly.
- I have read the contributing document.
- I have added tests to cover my changes.
- I have run the ZFS Test Suite with this change applied.
- All commit messages are properly formatted and contain Signed-off-by.