CVE-2023-31439: Releases · systemd/systemd
An issue was discovered in systemd 253. An attacker can modify the contents of past events in a sealed log file and then adjust the file so that an integrity check reports no error despite the modifications.
systemd v253
systemd System and Service Manager
CHANGES WITH 253:
Announcements of Future Feature Removals and Incompatible Changes:
* We intend to remove cgroup v1 support from a systemd release after the
end of 2023. If you run services that make explicit use of cgroup v1
features (i.e. the "legacy hierarchy" with separate hierarchies for
each controller), please implement compatibility with cgroup v2 (i.e.
the "unified hierarchy") sooner rather than later. Most of Linux
userspace has been ported over already.
* We intend to remove support for split-usr (/usr mounted separately
during boot) and unmerged-usr (parallel directories /bin and
/usr/bin, /lib and /usr/lib, etc.). This will happen in the second
half of 2023, in the first release that falls into that time window.
For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html
* We intend to change behaviour w.r.t. units of the per-user service
manager and sandboxing options, so that they work without having to
manually enable PrivateUsers= as well, which is not required for
system units. To make this work, we will implicitly enable user
namespaces (PrivateUsers=yes) when a sandboxing option is enabled in a
user unit. The drawback is that system users will no longer be visible
(and appear as 'nobody') to the user unit when a sandboxing option is
enabled. By definition a sandboxed user unit should run with reduced
privileges, so impact should be small. This will remove a great source
of confusion that has been reported by users over the years, due to
how these options require an extra setting to be manually enabled when
used in the per-user service manager, as opposed to the system
service manager. We plan to enable this change in the next release
later this year. For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-December/048682.html
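Whether a host already runs on the unified hierarchy can be read from the file system type mounted at /sys/fs/cgroup. The following Python sketch classifies the text of /proc/self/mountinfo. The function name cgroup_mode and its labels are hypothetical, and the "hybrid" case assumes, as a simplification, that such setups mount cgroup2 somewhere below /sys/fs/cgroup (systemd's hybrid mode uses /sys/fs/cgroup/unified).

```python
def cgroup_mode(mountinfo_text: str) -> str:
    """Classify the cgroup setup described by /proc/self/mountinfo text.

    Returns "unified" (cgroup v2 only), "hybrid" (v1 controllers plus a v2
    mount elsewhere), "legacy" (v1 only) or "unknown" (no cgroup mounts seen).
    """
    has_v1 = False
    has_v2_elsewhere = False
    for line in mountinfo_text.splitlines():
        # Per-mount fields come before the " - " separator, the fs type after it.
        if " - " not in line:
            continue
        pre, post = line.split(" - ", 1)
        pre_fields, post_fields = pre.split(), post.split()
        if len(pre_fields) < 5 or not post_fields:
            continue
        mount_point, fstype = pre_fields[4], post_fields[0]
        if fstype == "cgroup2":
            if mount_point == "/sys/fs/cgroup":
                return "unified"
            has_v2_elsewhere = True
        elif fstype == "cgroup":
            has_v1 = True
    if has_v1:
        return "hybrid" if has_v2_elsewhere else "legacy"
    return "unknown"
```

On a live system this would be called as cgroup_mode(pathlib.Path("/proc/self/mountinfo").read_text()); any result other than "unified" points at a host that still depends on cgroup v1 in some form.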
Deprecations and incompatible changes:
* systemctl will now warn when invoked without /proc/ mounted
(e.g. when invoked after chroot() into a directory tree without the
API mount points like /proc/ being set up.) Operation in such an
environment is not fully supported.
* The return value of 'systemctl is-active|is-enabled|is-failed' for
unknown units is changed: previously 1 or 3 were returned, but now 4
(EXIT_PROGRAM_OR_SERVICES_STATUS_UNKNOWN) is used as documented.
* 'udevadm hwdb' subcommand is deprecated and will emit a warning.
systemd-hwdb (added in 2014) should be used instead.
* 'bootctl --json' now outputs a single JSON array, instead of a stream
of newline-separated JSON objects.
* Udev rules in 60-evdev.rules have been changed to load hwdb
properties for all modalias patterns. Previously only the first
matching pattern was used. This could change what properties are
assigned if the user has more and less specific patterns that could
match the same device, but it is expected that the change will have
no effect for most users.
* systemd-networkd-wait-online exits successfully when all interfaces
are ready or unmanaged. Previously, if neither '--any' nor
'--interface=' options were used, at least one interface had to be in
configured state. This change allows the case where systemd-networkd
is enabled, but no interfaces are configured, to be handled
gracefully. It may occur in particular when a different network
manager is also enabled and used.
* Some compatibility helpers were dropped: EmergencyAction= in the user
manager, as well as measuring kernel command line into PCR 8 in
systemd-stub, along with the -Defi-tpm-pcr-compat compile-time
option.
* The '-Dupdate-helper-user-timeout=' build-time option has been
renamed to '-Dupdate-helper-user-timeout-sec=', and now takes an
integer as parameter instead of a string.
* The DDI image dissection logic (which backs RootImage= in service
unit files, the --image= switch in various tools such as
systemd-nspawn, as well as systemd-dissect) will now only mount file
systems of types btrfs, ext4, xfs, erofs, squashfs, vfat. This list
can be overridden via the $SYSTEMD_DISSECT_FILE_SYSTEMS environment
variable. These file systems are fairly well supported and maintained
in current kernels, while others are usually more niche, exotic or
legacy and thus typically do not receive the same level of security
support and fixes.
* The default per-link multicast DNS mode has changed to "yes"
(previously "no"). Since the default global multicast DNS mode was
already "yes" (although a build option can change it), multicast DNS
is now enabled on all links by default. Multicast DNS can be disabled
on all links by setting MulticastDNS= in resolved.conf, or on a single
interface with "resolvectl mdns INTERFACE no".
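To reason about which multicast DNS setting wins on a given link, it can help to treat the global (resolved.conf) and per-link modes as ordered values. The Python sketch below assumes, as a simplification, that a link ends up with the more restrictive of the two (no < resolve < yes), which is consistent with MulticastDNS= in resolved.conf being able to switch mDNS off on all links; check the resolved.conf documentation before relying on the exact rule. MDNSMode and effective_mdns are hypothetical names.

```python
from enum import IntEnum


class MDNSMode(IntEnum):
    # Ordered from least to most enabled, so min() picks the more restrictive.
    NO = 0
    RESOLVE = 1
    YES = 2


def effective_mdns(global_mode: MDNSMode, link_mode: MDNSMode) -> MDNSMode:
    """Assumed model: a link gets the more restrictive of the two settings."""
    return min(global_mode, link_mode)
```

Under this model the v253 change means that, with both defaults at YES, every link resolves to YES unless one side is lowered.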
New components:
* A new tool 'ukify' to build, measure, and sign Unified Kernel Images
(UKIs) has been added. This replaces functionality provided by
'dracut --uefi' and extends it with automatic calculation of PE file
offsets, insertion of signed PCR policies generated by
systemd-measure, support for initrd concatenation, signing of the
embedded Linux image and the combined image with sbsign, and
heuristics to autodetect the kernel uname and verify the splash
image.
Changes in systemd and units:
* A new service type Type=notify-reload is defined. When such a unit is
reloaded, a UNIX process signal (typically SIGHUP) is sent to the main
service process. The manager will then wait until it receives a
"RELOADING=1" followed by a "READY=1" notification from the unit as a
response (via sd_notify()). Otherwise, this type is the same as
Type=notify. A new setting ReloadSignal= may be used to change the
signal to send from the default of SIGHUP.
user@.service, systemd-networkd.service, systemd-udevd.service, and
systemd-logind have been updated to this type.
* Initrd environments which are not on a pure memory file system (e.g.
overlayfs combination as opposed to tmpfs) are now supported. With
this change, during the initrd → host transition ("switch root")
systemd will erase all files of the initrd only when the initrd is
backed by a memory file system such as tmpfs.
* A new per-unit MemoryZSwapMax= option has been added to configure
the memory.zswap.max cgroup property (the maximum amount of zswap
used).
* A new LogFilterPatterns= option has been added for units. It may be
used to specify accept/deny regular expressions for log messages
generated by the unit, that shall be enforced by systemd-journald.
Rejected messages are neither stored in the journal nor forwarded.
This option may be used to suppress noisy or uninteresting messages
from units.
* The manager has a new
org.freedesktop.systemd1.Manager.GetUnitByPIDFD() D-Bus method to
query process ownership via a PIDFD, which is more resilient against
PID recycling issues.
* Scope units now support OOMPolicy=. Login session scopes default to
OOMPolicy=continue, allowing login scopes to survive the OOM killer
terminating some processes in the scope.
* systemd-fstab-generator now supports the x-systemd.makefs option for
/sysroot/ (in the initrd).
* The maximum rate at which daemon reloads are executed can now be
limited with the new ReloadLimitIntervalSec=/ReloadLimitBurst=
options. (Or the equivalent on the kernel command line:
systemd.reload_limit_interval_sec=/systemd.reload_limit_burst=). In
addition, systemd now logs the originating unit and PID when a reload
request is received over D-Bus.
* When enabling a swap device, systemd will now reinitialize the device
when the page size of the swap space does not match the page size of
the running kernel. Note that this requires the 'swapon' utility to
provide the '--fixpgsz' option, as implemented by util-linux, and it
is not supported by busybox at the time of writing.
* systemd now executes generator programs in a mount namespace
"sandbox" with most of the file system read-only and write access
restricted to the output directories, and with a temporary /tmp/
mount provided. This provides a safeguard against programming errors
in the generators, but also fixes here-docs in shells, which
previously didn't work in early boot when /tmp/ wasn't available
yet. (This feature has no security implications, because the code is
still privileged and can trivially exit the sandbox.)
* The system manager will now parse a new "vmm.notify_socket"
system credential, which may be supplied to a VM via SMBIOS. If
found, the manager will send a "READY=1" notification on the
specified socket after boot is comple...
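The ReloadLimitIntervalSec=/ReloadLimitBurst= pair listed in this release follows the interval/burst pattern used elsewhere in systemd (for example StartLimitIntervalSec=/StartLimitBurst=). A minimal fixed-window model in Python, assuming that requests beyond the burst within one interval are rejected until the window expires; the class name FixedWindowLimit is hypothetical and this is not systemd's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixedWindowLimit:
    """Allow at most `burst` events per window of `interval_sec` seconds.

    The window opens at the first event and is reset once it has expired.
    """
    interval_sec: float
    burst: int
    window_start: Optional[float] = None
    count: int = 0

    def allow(self, now: float) -> bool:
        if self.window_start is None or now - self.window_start > self.interval_sec:
            self.window_start = now  # open a fresh window
            self.count = 0
        self.count += 1
        return self.count <= self.burst
```

The current time is passed in rather than read inside allow(), so the behaviour can be checked deterministically; a caller would pass time.monotonic().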
systemd v253-rc3
systemd System and Service Manager
CHANGES WITH 253 in spe:
Announcements of Future Feature Removals and Incompatible Changes:
* We intend to remove cgroup v1 support from a systemd release after the
end of 2023. If you run services that make explicit use of cgroup v1
features (i.e. the "legacy hierarchy" with separate hierarchies for
each controller), please implement compatibility with cgroup v2 (i.e.
the "unified hierarchy") sooner rather than later. Most of Linux
userspace has been ported over already.
* We intend to remove support for split-usr (/usr mounted separately
during boot) and unmerged-usr (parallel directories /bin and
/usr/bin, /lib and /usr/lib, etc.). This will happen in the second
half of 2023, in the first release that falls into that time window.
For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html
* We intend to change behaviour w.r.t. units of the per-user service
manager and sandboxing options, so that they work without having to
manually enable PrivateUsers= as well, which is not required for
system units. To make this work, we will implicitly enable user
namespaces (PrivateUsers=yes) when a sandboxing option is enabled in a
user unit. The drawback is that system users will no longer be visible
(and appear as 'nobody') to the user unit when a sandboxing option is
enabled. By definition a sandboxed user unit should run with reduced
privileges, so impact should be small. This will remove a great source
of confusion that has been reported by users over the years, due to
how these options require an extra setting to be manually enabled when
used in the per-user service manager, as opposed to the system
service manager. We plan to enable this change in the next release
later this year. For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-December/048682.html
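The unmerged-usr part of the announcement can be checked from userspace by testing whether /bin, /sbin and /lib resolve to the same directories as their /usr counterparts. The Python sketch below assumes that set of directories (distributions may also merge others such as /lib64), takes a root argument so it can inspect a chroot or image, and does not answer the separate split-usr question of whether /usr is its own mount. is_merged_usr is a hypothetical name.

```python
import os

# Assumed set of top-level directories that a merged-usr layout links into /usr.
MERGED_DIRS = ("bin", "sbin", "lib")


def is_merged_usr(root: str = "/") -> bool:
    """True if each directory in MERGED_DIRS resolves to its /usr counterpart."""
    for name in MERGED_DIRS:
        top = os.path.join(root, name)
        usr = os.path.join(root, "usr", name)
        if not os.path.exists(top):
            continue  # nothing at the top level to compare
        # samefile() compares device and inode, so a symlink to usr/<name> matches.
        if not os.path.exists(usr) or not os.path.samefile(top, usr):
            return False
    return True
```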
Deprecations and incompatible changes:
* systemctl will now warn when invoked without /proc/ mounted
(e.g. when invoked after chroot() into a directory tree without the
API mount points like /proc/ being set up.) Operation in such an
environment is not fully supported.
* The return value of 'systemctl is-active|is-enabled|is-failed' for
unknown units is changed: previously 1 or 3 were returned, but now 4
(EXIT_PROGRAM_OR_SERVICES_STATUS_UNKNOWN) is used as documented.
* 'udevadm hwdb' subcommand is deprecated and will emit a warning.
systemd-hwdb (added in 2014) should be used instead.
* 'bootctl --json' now outputs a single JSON array, instead of a stream
of newline-separated JSON objects.
* Udev rules in 60-evdev.rules have been changed to load hwdb
properties for all modalias patterns. Previously only the first
matching pattern was used. This could change what properties are
assigned if the user has more and less specific patterns that could
match the same device, but it is expected that the change will have
no effect for most users.
* systemd-networkd-wait-online exits successfully when all interfaces
are ready or unmanaged. Previously, if neither '--any' nor
'--interface=' options were used, at least one interface had to be in
configured state. This change allows the case where systemd-networkd
is enabled, but no interfaces are configured, to be handled
gracefully. It may occur in particular when a different network
manager is also enabled and used.
* Some compatibility helpers were dropped: EmergencyAction= in the user
manager, as well as measuring kernel command line into PCR 8 in
systemd-stub, along with the -Defi-tpm-pcr-compat compile-time
option.
* The '-Dupdate-helper-user-timeout=' build-time option has been
renamed to '-Dupdate-helper-user-timeout-sec=', and now takes an
integer as parameter instead of a string.
* The DDI image dissection logic (which backs RootImage= in service
unit files, the --image= switch in various tools such as
systemd-nspawn, as well as systemd-dissect) will now only mount file
systems of types btrfs, ext4, xfs, erofs, squashfs, vfat. This list
can be overridden via the $SYSTEMD_DISSECT_FILE_SYSTEMS environment
variable. These file systems are fairly well supported and maintained
in current kernels, while others are usually more niche, exotic or
legacy and thus typically do not receive the same level of security
support and fixes.
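Scripts that consume 'bootctl --json' output may need to cope with both the old and the new format for a while. A hedged Python sketch that decodes either a single JSON array or a whitespace-separated stream of objects into one list; it makes no assumption about the fields inside each object, and parse_bootctl_json is a hypothetical name.

```python
import json
from typing import Any, List


def parse_bootctl_json(text: str) -> List[Any]:
    """Return the entries from either output format as a list."""
    decoder = json.JSONDecoder()
    values: List[Any] = []
    pos = 0
    while True:
        # Skip whitespace, including newlines that separated old-style objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    # New format: one top-level array that holds all entries.
    if len(values) == 1 and isinstance(values[0], list):
        return values[0]
    return values
```

Using raw_decode() rather than splitting on newlines also handles pretty-printed objects that span several lines.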
New components:
* A new tool 'ukify' to build, measure, and sign Unified Kernel Images
(UKIs) has been added. This replaces functionality provided by
'dracut --uefi' and extends it with automatic calculation of PE file
offsets, insertion of signed PCR policies generated by
systemd-measure, support for initrd concatenation, signing of the
embedded Linux image and the combined image with sbsign, and
heuristics to autodetect the kernel uname and verify the splash
image.
Changes in systemd and units:
* A new service type Type=notify-reload is defined. When such a unit is
reloaded, a UNIX process signal (typically SIGHUP) is sent to the main
service process. The manager will then wait until it receives a
"RELOADING=1" followed by a "READY=1" notification from the unit as a
response (via sd_notify()). Otherwise, this type is the same as
Type=notify. A new setting ReloadSignal= may be used to change the
signal to send from the default of SIGHUP.
user@.service, systemd-networkd.service, systemd-udevd.service, and
systemd-logind have been updated to this type.
* Initrd environments which are not on a pure memory file system (e.g.
overlayfs combination as opposed to tmpfs) are now supported. With
this change, during the initrd → host transition ("switch root")
systemd will erase all files of the initrd only when the initrd is
backed by a memory file system such as tmpfs.
* A new per-unit MemoryZSwapMax= option has been added to configure
the memory.zswap.max cgroup property (the maximum amount of zswap
used).
* A new LogFilterPatterns= option has been added for units. It may be
used to specify accept/deny regular expressions for log messages
generated by the unit, that shall be enforced by systemd-journald.
Rejected messages are neither stored in the journal nor forwarded.
This option may be used to suppress noisy or uninteresting messages
from units.
* The manager has a new
org.freedesktop.systemd1.Manager.GetUnitByPIDFD() D-Bus method to
query process ownership via a PIDFD, which is more resilient against
PID recycling issues.
* Scope units now support OOMPolicy=. Login session scopes default to
OOMPolicy=continue, allowing login scopes to survive the OOM killer
terminating some processes in the scope.
* systemd-fstab-generator now supports the x-systemd.makefs option for
/sysroot/ (in the initrd).
* The maximum rate at which daemon reloads are executed can now be
limited with the new ReloadLimitIntervalSec=/ReloadLimitBurst=
options. (Or the equivalent on the kernel command line:
systemd.reload_limit_interval_sec=/systemd.reload_limit_burst=). In
addition, systemd now logs the originating unit and PID when a reload
request is received over D-Bus.
* When enabling a swap device, systemd will now reinitialize the device
when the page size of the swap space does not match the page size of
the running kernel. Note that this requires the 'swapon' utility to
provide the '--fixpgsz' option, as implemented by util-linux, and it
is not supported by busybox at the time of writing.
* systemd now executes generator programs in a mount namespace
"sandbox" with most of the file system read-only and write access
restricted to the output directories, and with a temporary /tmp/
mount provided. This provides a safeguard against programming errors
in the generators, but also fixes here-docs in shells, which
previously didn't work in early boot when /tmp/ wasn't available
yet. (This feature has no security implications, because the code is
still privileged and can trivially exit the sandbox.)
* The system manager will now parse a new "vmm.notify_socket"
system credential, which may be supplied to a VM via SMBIOS. If
found, the manager will send a "READY=1" notification on the
specified socket after boot is complete. This allows readiness
notification to be sent from a VM guest to the VM host over a VSOCK
socket.
* The sample PAM configuration file for user@.service now
includes a call to pam_namespace. This puts children of user@.service
in the expected namespace. (Many distributions replace their file
with something custom, so this change has limited effect.)
* A new e...
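The Type=notify-reload handshake described in this release can be sketched from the service side in Python. This is a minimal sketch under stated assumptions: it writes to $NOTIFY_SOCKET directly instead of using libsystemd's sd_notify(), reload_config() is a hypothetical placeholder, and it sends MONOTONIC_USEC= together with RELOADING=1, which to our understanding the systemd.service(5) documentation for this type asks for, although the notes above do not mention it.

```python
import os
import signal
import socket
import time

reload_requested = False  # set from the signal handler, consumed by the main loop


def sd_notify(message: str) -> bool:
    """Send one notification datagram to $NOTIFY_SOCKET.

    Returns False when the variable is unset (i.e. not running under systemd).
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # "@" denotes a Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True


def reload_config() -> None:
    """Hypothetical placeholder: re-read the service's configuration here."""


def handle_reload() -> None:
    # time.monotonic_ns() reads CLOCK_MONOTONIC on Linux; convert to microseconds.
    usec = time.monotonic_ns() // 1000
    sd_notify(f"RELOADING=1\nMONOTONIC_USEC={usec}")
    reload_config()
    sd_notify("READY=1")  # tells the manager the reload has finished


def on_reload_signal(signum, frame) -> None:
    # Keep the handler minimal; the actual reload runs in the main loop.
    global reload_requested
    reload_requested = True


def main() -> None:
    global reload_requested
    signal.signal(signal.SIGHUP, on_reload_signal)  # SIGHUP is the default ReloadSignal=
    sd_notify("READY=1")  # initial start-up readiness, as with Type=notify
    while True:
        time.sleep(1)  # stand-in for the service's real work
        if reload_requested:
            reload_requested = False
            handle_reload()
```

main() would be called from the service's entry point; if ReloadSignal= is set to something other than SIGHUP, the signal.signal() call has to use the same signal.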
systemd v253-rc2
systemd System and Service Manager
CHANGES WITH 253 in spe:
Deprecations and incompatible changes:
* systemctl will now warn when invoked without /proc/ mounted
(e.g. when invoked after chroot() into a directory tree without the
API mount points like /proc/ being set up.) Operation in such an
environment is not fully supported.
* The return value of 'systemctl is-active|is-enabled|is-failed' for
unknown units is changed: previously 1 or 3 were returned, but now 4
(EXIT_PROGRAM_OR_SERVICES_STATUS_UNKNOWN) is used as documented.
* 'udevadm hwdb' subcommand is deprecated and will emit a warning.
systemd-hwdb (added in 2014) should be used instead.
* 'bootctl --json' now outputs a single JSON array, instead of a stream
of newline-separated JSON objects.
* Udev rules in 60-evdev.rules have been changed to load hwdb
properties for all modalias patterns. Previously only the first
matching pattern was used. This could change what properties are
assigned if the user has more and less specific patterns that could
match the same device, but it is expected that the change will have
no effect for most users.
* systemd-networkd-wait-online exits successfully when all interfaces
are ready or unmanaged. Previously, if neither '--any' nor
'--interface=' options were used, at least one interface had to be in
configured state. This change allows the case, where systemd-networkd
is enabled but no interfaces are configured, to be handled
gracefully. It may occur in particular when a different network
manager is also enabled and used.
* Some compatibility helpers were dropped: EmergencyAction= in the user
manager, as well as measuring kernel command line into PCR 8 in
systemd-stub, along with the -Defi-tpm-pcr-compat compile-time
option.
* The '-Dupdate-helper-user-timeout=' build-time option has been
renamed to '-Dupdate-helper-user-timeout-sec=', and now takes an
integer as parameter instead of a string.
* The DDI image dissection logic (which backs RootImage= in service
unit files, the --image= switch in various tools such as
systemd-nspawn, as well as systemd-dissect) will now only mount file
systems of types btrfs, ext4, xfs, erofs, squashfs, vfat. This list
can be overridden via the $SYSTEMD_DISSECT_FILE_SYSTEMS environment
variable. These file systems are fairly well supported and maintained
in current kernels, while others are usually more niche, exotic or
legacy and thus typically do not receive the same level of security
support and fixes.
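The file system allowlist and its override can be modelled as follows. Parsing $SYSTEMD_DISSECT_FILE_SYSTEMS as a colon-separated list of type names is an assumption made for this sketch (check systemd's environment variable documentation for the exact syntax), the function names are hypothetical, and systemd's own C implementation is not reproduced here.

```python
import os
from typing import FrozenSet, Mapping, Optional

# Default list quoted from the release note above.
DEFAULT_DISSECT_FILE_SYSTEMS = ("btrfs", "ext4", "xfs", "erofs", "squashfs", "vfat")


def allowed_file_systems(env: Optional[Mapping[str, str]] = None) -> FrozenSet[str]:
    """Return the file system types a dissected image may be mounted with."""
    env = os.environ if env is None else env
    override = env.get("SYSTEMD_DISSECT_FILE_SYSTEMS")
    if override is None:
        return frozenset(DEFAULT_DISSECT_FILE_SYSTEMS)
    # Assumption: the override is a colon-separated list of type names.
    return frozenset(t for t in override.split(":") if t)


def may_mount(fstype: str, env: Optional[Mapping[str, str]] = None) -> bool:
    return fstype in allowed_file_systems(env)
```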
New components:
* A new tool 'ukify' to build, measure, and sign Unified Kernel Images
(UKIs) has been added. This replaces functionality provided by
'dracut --uefi' and extends it with automatic calculation of PE file
offsets, insertion of signed PCR policies generated by
systemd-measure, support for initrd concatenation, signing of the
embedded Linux image and the combined image with sbsign, and
heuristics to autodetect the kernel uname and verify the splash
image.
Changes in systemd and units:
* A new service type Type=notify-reload is defined. When such a unit is
reloaded, a UNIX process signal (typically SIGHUP) is sent to the main
service process. The manager will then wait until it receives a
"RELOADING=1" followed by a "READY=1" notification from the unit as a
response (via sd_notify()). Otherwise, this type is the same as
Type=notify. A new setting ReloadSignal= may be used to change the
signal to send from the default of SIGHUP.
user@.service, systemd-networkd.service, systemd-udevd.service, and
systemd-logind have been updated to this type.
* Initrd environments which are not on a pure memory file system (e.g.
overlayfs combination as opposed to tmpfs) are now supported. With
this change, during the initrd → host transition ("switch root")
systemd will no longer erase all files of the initrd unless it's
backed by a memory file system such as tmpfs.
* A new per-unit MemoryZSwapMax= option has been added to configure
the memory.zswap.max cgroup property (the maximum amount of zswap
used).
* A new LogFilterPatterns= option has been added for units. It may be
used to specify accept/deny regular expressions for log messages
generated by the unit, that shall be enforced by systemd-journald.
Rejected messages are neither stored in the journal nor forwarded.
This option may be used to suppress noisy or uninteresting messages
from units.
* The manager has a new
org.freedesktop.systemd1.Manager.GetUnitByPIDFD() D-Bus method to
query process ownership via a PIDFD, which is more resilient against
PID recycling issues.
* Scope units now support OOMPolicy=. Login session scopes default to
OOMPolicy=continue, allowing login scopes to survive the OOM killer
terminating some processes in the scope.
* systemd-fstab-generator now supports the x-systemd.makefs option for
/sysroot/ (in the initrd).
* The maximum rate at which daemon reloads are executed can now be
limited with the new ReloadLimitIntervalSec=/ReloadLimitBurst=
options. (Or the equivalent on the kernel command line:
systemd.reload_limit_interval_sec=/systemd.reload_limit_burst=). In
addition, systemd now logs the originating unit and PID when a reload
request is received over D-Bus.
* When enabling a swap device, systemd will now reinitialize the device
when the page size of the swap space does not match the page size of
the running kernel.
* systemd now executes generator programs in a mount namespace
"sandbox" with most of the file system read-only and write access
restricted to the output directories, and with a temporary /tmp/
mount provided. This provides a safeguard against programming errors
in the generators, but also fixes here-docs in shells, which
previously didn't work in early boot when /tmp/ wasn't available
yet. (This feature has no security implications, because the code is
still privileged and can trivially exit the sandbox.)
* The system manager will now parse a new "vmm.notify_socket"
system credential, which may be supplied to a VM via SMBIOS. If
found, it will send a "READY=1" notification on the specified socket
after boot is complete. This allows readiness notification to be sent
from a VM guest to the VM host over a VSOCK socket.
* The sample PAM configuration file for user@.service now
includes a call to pam_namespace. This puts children of user@.service
in the expected namespace. (Many distributions replace their file
with something custom, so this change has limited effect.)
* A new environment variable $SYSTEMD_DEFAULT_MOUNT_RATE_LIMIT_BURST
can be used to override the burst value of the mount units' rate
limit on parsing '/proc/self/mountinfo' (a rate limit introduced in
v249). It defaults to 5.
* Drop-ins for init.scope changing control group resource limits are
now applied, while they were previously ignored.
* New build-time configuration options '-Ddefault-timeout-sec=' and
'-Ddefault-user-timeout-sec=' have been added, to let distributions
choose the default timeout for starting/stopping/aborting system and
user units respectively.
* Service units gained a new setting OpenFile= which may be used to
open arbitrary files in the file system (or connect to arbitrary
AF_UNIX sockets in the file system), and pass the open file
descriptor to the invoked process via the usual file descriptor
passing protocol. This is useful to give unprivileged services access
to select files which have restrictive access modes that would
normally not allow this. It's also useful in case RootDirectory= or
RootImage= is used to allow access to files from the host environment
(which is after all not visible from the service if these two options
are used.)
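On the receiving side, a service sees OpenFile= descriptors through the usual fd-passing environment variables (LISTEN_PID, LISTEN_FDS and LISTEN_FDNAMES, with descriptors numbered from 3). A Python sketch of reading them, assuming the names come from the optional name field of each OpenFile= entry; passed_fds and the fdN fallback name are hypothetical, and C services would normally call sd_listen_fds_with_names() instead.

```python
import os
from typing import List, Mapping, Optional, Tuple

SD_LISTEN_FDS_START = 3  # first passed descriptor number in the fd-passing protocol


def passed_fds(env: Optional[Mapping[str, str]] = None,
               pid: Optional[int] = None) -> List[Tuple[str, int]]:
    """Return (name, fd) pairs for descriptors passed via $LISTEN_FDS.

    Returns an empty list if nothing was passed or the variables were meant
    for a different process (LISTEN_PID mismatch).
    """
    env = os.environ if env is None else env
    pid = os.getpid() if pid is None else pid
    if env.get("LISTEN_PID") != str(pid):
        return []
    count = int(env.get("LISTEN_FDS", "0"))
    names = env.get("LISTEN_FDNAMES", "").split(":")
    result = []
    for i in range(count):
        fd = SD_LISTEN_FDS_START + i
        # Hypothetical fallback name for unnamed descriptors.
        name = names[i] if i < len(names) and names[i] else f"fd{fd}"
        result.append((name, fd))
    return result
```

A list of pairs is returned rather than a dict so that two descriptors sharing a name are not silently merged.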
Changes in udev:
* The new net naming scheme "v253" has been introduced. In the new
scheme, ID_NET_NAME_PATH is also set for USB devices not connected via
a PCI bus. This extends the coverage of predictable interface names
in some embedded systems.
The "amba" bus path is now included in ID_NET_NAME_PATH, resulting in
a more informative path on some embedded systems.
* Partition block devices will now also get symlinks in
/dev/disk/by-diskseq/<seq>-part<n>, which may be used to reference
block device nodes via the kernel's "diskseq" value. Previously those
symlinks were only created for the main block device.
* A new operator '-=' is supported for SYMLINK variables. This allows
symlinks to be unconfigured even if an earlier rule added them.
* 'udevadm --trigger --settle' now also works for network devices
...
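The by-diskseq naming used by the new partition symlinks can be captured in two small helpers. This Python sketch assumes the whole-disk link is named by the bare sequence number and partitions follow the <seq>-part<n> form given above; the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

BY_DISKSEQ = "/dev/disk/by-diskseq"
_NAME_RE = re.compile(r"(\d+)(?:-part(\d+))?")


def diskseq_link(seq: int, partition: Optional[int] = None) -> str:
    """Path of the by-diskseq symlink for a whole disk or one of its partitions."""
    name = str(seq) if partition is None else f"{seq}-part{partition}"
    return f"{BY_DISKSEQ}/{name}"


def parse_diskseq_name(name: str) -> Optional[Tuple[int, Optional[int]]]:
    """Split a by-diskseq entry name into (diskseq, partition number or None)."""
    m = _NAME_RE.fullmatch(name)
    if not m:
        return None
    part = int(m.group(2)) if m.group(2) is not None else None
    return int(m.group(1)), part
```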
systemd v253-rc1
systemd System and Service Manager
CHANGES WITH 253 in spe:
Deprecations and incompatible changes:
* systemctl will now warn when invoked without /proc mounted (e.g. when
invoked after chroot into an image without the API mount points like
/proc being set up.) Operation in such an environment is not fully
supported.
* The return value of 'systemctl is-active|is-enabled|is-failed' for
unknown units is changed: previously 1 or 3 were returned, but now 4
(EXIT_PROGRAM_OR_SERVICES_STATUS_UNKNOWN) is used as documented.
* 'udevadm hwdb' subcommand is deprecated and will emit a warning.
systemd-hwdb (added in 2014) should be used instead.
* 'bootctl --json' now outputs well-formed JSON, instead of a stream
of newline-separated JSON objects.
* Udev rules in 60-evdev.rules have been changed to load hwdb properties
for all modalias patterns. Previously only the first matching pattern
was used. This could change what properties are assigned if the user
has more and less specific patterns that could match the same device,
but it is expected that the change will have no effect for most users.
* systemd-networkd-wait-online exits successfully when all interfaces
are ready or unmanaged. Previously, if neither '--any' nor
'--interface=' options were used, at least one interface had to be in
configured state. This change allows the case, where systemd-networkd
is enabled but no interfaces are configured, to be handled
gracefully. It may occur in particular when a different network
manager is also enabled and used.
* Some compatibility helpers were dropped: EmergencyAction= in the user
manager, as well as measuring the kernel command line into PCR 8, along
with the -Defi-tpm-pcr-compat compile-time option.
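Scripts that treat any non-zero status from 'systemctl is-active' as "not active" keep working, but ones that compared against 3 specifically will now see 4 for units that do not exist. A Python sketch that distinguishes the cases; the helper names are hypothetical, and only statuses 0 and 4 are given a specific meaning here.

```python
import subprocess

# Exit status that is-active/is-enabled/is-failed now use for unknown units.
EXIT_PROGRAM_OR_SERVICES_STATUS_UNKNOWN = 4


def describe_is_active(returncode: int) -> str:
    if returncode == 0:
        return "active"
    if returncode == EXIT_PROGRAM_OR_SERVICES_STATUS_UNKNOWN:
        return "unknown unit"
    return "not active"


def is_active(unit: str) -> str:
    # --quiet suppresses the state string; only the exit status is used.
    proc = subprocess.run(["systemctl", "is-active", "--quiet", unit], check=False)
    return describe_is_active(proc.returncode)
```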
New components:
* A new tool 'ukify' to build, measure, and sign Unified Kernel Images
(UKIs) has been added. This replaces functionality provided by
'dracut --uefi' and extends it with automatic calculation of offsets,
insertion of signed PCR policies generated by systemd-measure,
support for initrd concatenation, signing of the embedded Linux image
and the combined image with sbsign, and heuristics to autodetect the
kernel uname and verify the splash image.
Changes in systemd and units:
* A new service type Type=notify-reload is defined. When such a unit is
reloaded via a signal, the manager will wait until it receives a
"READY=1" notification from the unit. Otherwise, this type is the
same as Type=notify.
user@.service, systemd-networkd.service, systemd-udevd.service, and
systemd-logind have been updated to this type; their reloads are now
synchronous.
* Initrd environments which are not on a temporary file system (for
example an overlayfs combination) are now supported. systemd will only
skip removal of the files in the initrd if it doesn't detect a
temporary file system.
* A new MemoryZSwapMax= option has been added to configure the
memory.zswap.max cgroup property (the maximum amount of zswap used).
* New LogFilterPatterns= option can be used to specify regexp
accept/deny patterns for log entries generated by the unit. Based on
the option value, the manager sets the
user.journald_log_filter_patterns extended attribute on the unit
cgroup. systemd-journald checks for this attribute when receiving
messages, and will filter messages by matching the MESSAGE= part.
Rejected messages are neither stored in the journal nor forwarded.
This option can be used to filter noisy or uninteresting messages
from units.
* The manager has a new
org.freedesktop.systemd1.Manager.GetUnitByPIDFD() method to query
process ownership via a PIDFD, which is more resilient against PID
recycling issues.
* Scope units now support OOMPolicy=. Login session scopes default to
OOMPolicy=continue, allowing login scopes to survive the OOM killer
terminating some processes in the scope.
* systemd-fstab-generator now supports the x-systemd.makefs option for
/sysroot (in the initrd).
* The maximum rate at which daemon reloads are executed can now be
limited with the new ReloadLimitIntervalSec=/ReloadLimitBurst=
options. (Or the equivalent on the kernel command line:
systemd.reload_limit_interval_sec=/systemd.reload_limit_burst=).
In addition, systemd now logs the originating unit and PID when
a reload request is received over D-Bus.
* When enabling a swap device, instead of failing, systemd will now
reinitialize the device when the page size of the swap space does not
match the page size of the running kernel.
* systemd now executes generators in a mount namespace "sandbox" with
most of the file system read-only, but with write access to the
output directories, and with a temporary /tmp/ mount provided. This
provides a safeguard against programming errors in the generators,
but also fixes here-docs in shells, which previously didn't work in
early boot when /tmp/ wasn't available yet. (This feature has no
security implications, because the code is still privileged and can
trivially exit the sandbox.)
* The manager will load the vmm.notify_socket credential. If found,
it will send a "READY=1" notification on the specified socket after
boot is complete. This allows readiness notification to be sent
from a VM guest to the host over a VSOCK socket.
* The sample PAM configuration file for user@.service now
includes a call to pam_namespace. This puts children of user@.service
in the expected namespace. (Many distributions replace their file
with something custom, so this change has limited effect.)
* A new environment variable $SYSTEMD_DEFAULT_MOUNT_RATE_LIMIT_BURST can
be used to override the burst value of the mount units' rate limit on
parsing '/proc/self/mountinfo' (a rate limit introduced in v249). It
defaults to 5.
* Drop-ins for init.scope changing cgroup resource limits are
now applied, while they were previously ignored.
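The accept/deny decision that journald applies to each MESSAGE= can be modelled in a few lines. This Python sketch assumes the "~" prefix marks a deny pattern (as described in systemd.exec(5), to our understanding), that a deny match always drops the message, and that once any allow pattern exists a message must match at least one. Python's re.search only approximates the regular expression engine journald uses, and keep_message is a hypothetical name.

```python
import re
from typing import Iterable


def keep_message(message: str, patterns: Iterable[str]) -> bool:
    """Decide whether a log message survives a LogFilterPatterns=-style list.

    Patterns starting with "~" are deny patterns, all others allow patterns.
    """
    allow, deny = [], []
    for p in patterns:
        if p.startswith("~"):
            deny.append(re.compile(p[1:]))
        else:
            allow.append(re.compile(p))
    # Assumed precedence: any deny match drops the message.
    if any(d.search(message) for d in deny):
        return False
    # Without allow patterns everything not denied is kept;
    # otherwise at least one allow pattern has to match.
    return not allow or any(a.search(message) for a in allow)
```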
Changes in udev:
* The new net naming scheme "v253" has been introduced. In the new
scheme, ID_NET_NAME_PATH is also set for USB devices not connected via
a PCI bus. This extends the coverage of predictable interface names
in some embedded systems.
The "amba" bus path is now included in ID_NET_NAME_PATH, resulting in
a more informative path on some embedded systems.
* Block partitions will now also get symlinks in
/dev/disk/by-diskseq/<seq>-part<n>, which may be used to reference
block device nodes via the kernel's "diskseq" value. Previously those
symlinks were only created for the main block device.
* A new operator '-=' is supported for SYMLINK variables. This allows
symlinks to be unconfigured even if an earlier rule added them.
* 'udevadm trigger --settle' now also works for network devices
that are being renamed.
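The effect of the new '-=' operator can be modelled as operations on a list of link names. The following is a hypothetical Python model, not udev's implementation. In a rules file the removal would be written as e.g. SYMLINK-="mydisk", where "mydisk" and the other link names are invented examples.

```python
from typing import List

# Hypothetical model of udev's list operators on the SYMLINK key.
# The value is a whitespace-separated list of link names relative to /dev.
def apply_symlink_op(current: List[str], op: str, value: str) -> List[str]:
    names = value.split()
    if op == "=":   # replace the whole list
        return list(names)
    if op == "+=":  # append names not already present
        return current + [n for n in names if n not in current]
    if op == "-=":  # new in v253: drop names added by earlier rules
        return [n for n in current if n not in names]
    raise ValueError(f"unsupported operator: {op!r}")

if __name__ == "__main__":
    links: List[str] = []
    # An earlier rule adds two links...
    links = apply_symlink_op(links, "+=", "disk/by-label/data mydisk")
    # ...and a later rule removes one of them again.
    links = apply_symlink_op(links, "-=", "mydisk")
    print(links)  # ['disk/by-label/data']
```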
Changes in sd-boot, bootctl, and the Boot Loader Specification:
* systemd-boot now passes its random seed directly to the kernel's RNG
via the LINUX_EFI_RANDOM_SEED_TABLE_GUID configuration table, which
means the RNG gets seeded very early in boot before userspace has
started.
* systemd-boot will pass a random seed when secure boot is enabled if
it can additionally get a random seed from EFI itself, via EFI's RNG
protocol or a prior seed in LINUX_EFI_RANDOM_SEED_TABLE_GUID from a
preceding bootloader.
* systemd-boot-system-token.service was renamed to
systemd-boot-random-seed.service and extended to always save the
random seed to the ESP on every boot when a compatible boot loader is
used. This allows a refreshed random seed to be used in the boot
loader.
* systemd-boot handles various seed inputs using a domain- and
field-separated hashing scheme.
* systemd-boot's 'random-seed-mode' option has been removed. A system
token is now always required to be present for random seeds to be
used.
* systemd-boot now supports being loaded from a location other than the ESP, for example
for direct kernel boot under QEMU or when embedded into the firmware.
* systemd-boot now parses SMBIOS info to detect virtualization. This
information is used to skip some warnings which are not useful in a
VM and to conditionalize other aspects of behaviour.
* systemd-stub now processes random seeds in the same way as
systemd-boot, in case a unified kernel image is being used from a
different bootloader than systemd-boot.
* bootctl will now generate a system token on all EFI systems, even
virtualized ones, and is now also activated when the system token is
missing on systems booted via either sd-boot or sd-stub.
* bootctl now implements two new verbs: 'kernel-identify' prints the
type of a kernel image, and 'kern...
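Field separation in the seed hashing matters because naive concatenation is ambiguous: the inputs "ab" + "c" and "a" + "bc" produce the same hash. The following minimal Python sketch shows the general technique of domain- and field-separated hashing. The domain label, the 8-byte length prefixes and SHA-256 are illustrative assumptions, not systemd-boot's exact construction.

```python
import hashlib
import struct

# Generic sketch of domain- and field-separated hashing. The domain label,
# the 8-byte little-endian length prefixes and SHA-256 are illustrative
# choices, not systemd-boot's exact construction.
def domain_hash(domain: bytes, *fields: bytes) -> bytes:
    h = hashlib.sha256()
    for part in (domain, *fields):
        # Prefixing each part with its length makes the encoding unambiguous.
        h.update(struct.pack("<Q", len(part)))
        h.update(part)
    return h.digest()

if __name__ == "__main__":
    # Plain concatenation cannot tell these two inputs apart...
    print(hashlib.sha256(b"ab" + b"c").digest() == hashlib.sha256(b"a" + b"bc").digest())
    # ...whereas the field-separated hash can.
    print(domain_hash(b"example seed", b"ab", b"c") == domain_hash(b"example seed", b"a", b"bc"))
```

The domain label also ensures that a hash computed for one purpose can never be mistaken for a hash computed over the same fields for another purpose.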
systemd v252
systemd System and Service Manager
CHANGES WITH 252 🎃:
Announcements of Future Feature Removals:
* We intend to remove cgroup v1 support from a systemd release after the
end of 2023. If you run services that make explicit use of cgroup v1
features (i.e. the "legacy hierarchy" with separate hierarchies for
each controller), please implement compatibility with cgroup v2 (i.e.
the "unified hierarchy") sooner rather than later. Most of Linux
userspace has been ported over already.
* We intend to remove support for split-usr (/usr mounted separately
during boot) and unmerged-usr (parallel directories /bin and
/usr/bin, /lib and /usr/lib, etc). This will happen in the second
half of 2023, in the first release that falls into that time window.
For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html
Compatibility Breaks:
* ConditionKernelVersion= checks that use the '=' or '!=' operators
will now do simple string comparisons (instead of version comparisons
à la strverscmp()). Version comparisons are still done for the
ordering operators '<', '>', '<=', '>='. Moreover, if no operator is
specified, a shell-style glob match is now done. This creates a minor
incompatibility compared to older systemd versions when the '*', '?',
'[', ']' characters are used, as these will now match as shell globs
instead of literally. Given that kernel version strings typically do
not include these characters we expect little breakage through this
change.
* The service manager will now read the SELinux label used for SELinux
access checks from the unit file at the time it loads the file.
Previously, the label would be read at the moment of the access
check, which was problematic since at that time the unit file might
already have been updated or removed.
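The three matching modes of ConditionKernelVersion= can be sketched in Python as follows. This is an illustration under stated assumptions: vercmp() is a simplified, hypothetical stand-in for strverscmp() that will not agree with it on every input, and both function names are invented.

```python
import fnmatch
import re

_TOKEN = re.compile(r"\d+|\D+")

# Simplified, hypothetical stand-in for strverscmp(): digit runs compare
# numerically, everything else compares as plain strings.
def vercmp(a: str, b: str) -> int:
    ta, tb = _TOKEN.findall(a), _TOKEN.findall(b)
    for x, y in zip(ta, tb):
        if x == y:
            continue
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
            continue
        return 1 if x > y else -1
    return (len(ta) > len(tb)) - (len(ta) < len(tb))

def condition_kernel_version(expr: str, running: str) -> bool:
    expr = expr.strip()
    # Two-character operators must be tried before their one-character prefixes.
    for op in ("<=", ">=", "!=", "<", ">", "="):
        if expr.startswith(op):
            value = expr[len(op):].strip()
            break
    else:
        # No operator: shell-style glob match.
        return fnmatch.fnmatchcase(running, expr)
    if op == "=":
        return running == value   # literal string equality
    if op == "!=":
        return running != value
    c = vercmp(running, value)    # ordering operators use version comparison
    return {"<": c < 0, "<=": c <= 0, ">": c > 0, ">=": c >= 0}[op]
```

For example, "=6.1" no longer matches a running "6.1.0" kernel, while the glob "6.1*" does.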
New Features:
* systemd-measure is a new tool for calculating and signing expected
TPM2 PCR values for a given unified kernel image (UKI) booted via
sd-stub. The public key used for the signature and the signed
expected PCR information can be embedded inside the UKI. This
information can be extracted from the UKI by external tools and code
in the image itself and is made available to userspace in the booted
kernel.
systemd-cryptsetup, systemd-cryptenroll, and systemd-creds have been
updated to make use of this information if available in the booted
kernel: when locking an encrypted volume/credential to the TPM
systemd-cryptenroll/systemd-creds will use the public key to bind the
volume/credential to any kernel that carries PCR information signed
by the same key pair. When unlocking such volumes/credentials
systemd-cryptsetup/systemd-creds will use the signature embedded in
the booted UKI to gain access.
Binding TPM-based disk encryption to public keys/signatures of PCR
values — instead of literal PCR values — addresses the inherent
"brittleness" of traditional PCR-bound TPM disk encryption schemes:
disks remain accessible even if the UKI is updated, without any
TPM-specific preparation during the OS update — as long as each UKI
carries the necessary PCR signature information.
Net effect: if you boot a properly prepared kernel, TPM-bound disk
encryption now defaults to being locked to kernels which carry PCR
signatures from the same key pair. Example: if a hypothetical distro
FooOS prepares its UKIs like this, TPM-based disk encryption is now –
by default – bound to only FooOS kernels, and encrypted volumes bound
to the TPM cannot be unlocked on kernels from other sources. (But do
note this behaviour requires preparation/enabling in the UKI, and of
course users can always enroll non-TPM ways to unlock the volume.)
* systemd-pcrphase is a new tool that is invoked at six places during
system runtime, and measures additional words into TPM2 PCR 11, to
mark milestones of the boot process. This allows binding access to
specific TPM2-encrypted secrets to specific phases of the boot
process. (Example: LUKS2 disk encryption key only accessible in the
initrd, but not later.)
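A secret can be bound to a boot phase because a PCR can only be extended, not rewound, until the next reboot. The following minimal Python sketch shows the standard TPM2 extend operation on a SHA-256 bank. Several details are assumptions: the phase words listed, their byte encoding, and the all-zero starting value. A real PCR 11 also holds the UKI measurements made by systemd-stub.

```python
import hashlib
from typing import Iterable

# Phase words measured by systemd-pcrphase (assumed here; verify against
# the systemd-pcrphase documentation for your version).
PHASES = ["enter-initrd", "leave-initrd", "sysinit", "ready", "shutdown", "final"]

def pcr_extend(pcr: bytes, data: bytes) -> bytes:
    # TPM2 extend on a SHA-256 bank: new = SHA256(old || SHA256(data)).
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()

def pcr_after(words: Iterable[str]) -> bytes:
    # Simplification: start from all zeroes, ignoring earlier measurements.
    pcr = bytes(32)
    for word in words:
        pcr = pcr_extend(pcr, word.encode())
    return pcr

if __name__ == "__main__":
    in_initrd = pcr_after(PHASES[:1])
    after_initrd = pcr_after(PHASES[:2])
    # A secret sealed to in_initrd no longer matches once leave-initrd is measured.
    print(in_initrd != after_initrd)
```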
Changes in systemd itself, i.e. the manager and units
* The cpu controller is delegated to user manager units by default, and
CPUWeight= settings are applied to the top-level user slice units
(app.slice, background.slice, session.slice). This provides a degree
of resource isolation between different user services competing for
the CPU.
* Systemd can optionally do a full preset in the "first boot" condition
(instead of just enable-only). This behaviour is controlled by the
compile-time option -Dfirst-boot-full-preset. Right now it defaults
to 'false', but the plan is to switch it to 'true' for the subsequent
release.
* Drop-ins are now allowed for transient units too.
* Systemd will set the taint flag 'support-ended' if it detects that
the OS image is past its end-of-support date. This date is declared
in a new /etc/os-release field SUPPORT_END= described below.
* Two new settings ConditionCredential= and AssertCredential= can be
used to skip or fail units if a certain system credential is not
provided.
* ConditionMemory= accepts size suffixes (K, M, G, T, …).
* DefaultSmackProcessLabel= can be used in system.conf and user.conf to
specify the SMACK security label to use when not specified in a unit
file.
* DefaultDeviceTimeoutSec= can be used in system.conf and user.conf to
specify the default timeout when waiting for device units to
activate.
* C.UTF-8 is used as the default locale if nothing else has been
configured.
* [Condition|Assert]Firmware= have been extended to support certain
SMBIOS fields. For example
ConditionFirmware=smbios-field(board_name = "Custom Board")
conditionalizes the unit to run only when
/sys/class/dmi/id/board_name contains "Custom Board" (without the
quotes).
* ConditionFirstBoot= now correctly evaluates as true only during the
boot phase of the first boot. A unit executed later, after booting
has completed, will no longer evaluate this condition as true.
* Socket units will now create sockets in the SELinuxContext= of the
associated service unit, if any.
* Boot phase transitions (start initrd → exit initrd → boot complete →
shutdown) will be measured into TPM2 PCR 11, so that secrets can be
bound to a specific runtime phase. E.g.: a LUKS encryption key can be
unsealed only in the initrd.
* Service credentials (i.e. SetCredential=/LoadCredential=/…) will now
also be provided to ExecStartPre= processes.
* Various units are now correctly ordered against
initrd-switch-root.target where previously a conflict without
ordering was configured. A stop job for those units would be queued,
but without the ordering it could be executed only after
initrd-switch-root.service, leading to units not being restarted in
the host system as expected.
* In order to fully support the IPMI watchdog driver, which has not yet
been ported to the new common watchdog device interface,
/dev/watchdog0 will be tried first and systemd will silently fall back
to /dev/watchdog if it is not found.
* New watchdog-related D-Bus properties are now published by systemd:
WatchdogDevice, WatchdogLastPingTimestamp,
WatchdogLastPingTimestampMonotonic.
* At shutdown, API virtual file systems (proc, sys, etc.) will be
unmounted lazily.
* At shutdown, systemd will now log about processes blocking unmounting
of file systems.
* A new meson build option 'clock-valid-range-usec-max' was added to
allow disabling system time correction if RTC returns a timestamp far
in the future.
* Propagated restart jobs will no longer be discarded while a unit is
activating.
* PID 1 will now import system credentials from SMBIOS Type 11 fields
("OEM vendor strings"), in addition to qemu_fwcfg. This provides a
simple, fast and generic path for supplying credentials to a VM,
without involving external tools such as cloud-init/ignition.
* The CPUWeight= setting of unit files now accepts a new special value
"idle", which configures "idle" level scheduling for the unit.
* Service processes that are activated due to a .timer or .path unit
triggering will now receive information about this via environment
variables. Note that this information is lossy, as activation
might be coalesced and only one of the activating triggers will be
reported. This is hence more suited for debugging or tracing rather
than for behaviour decisions.
* The riscv_flush_icache(2) system call has been added to the list of
system calls allowed by default when ...
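An smbios-field() check such as ConditionFirmware=smbios-field(board_name = "Custom Board") can be sketched in Python. The sketch assumes the field is read from /sys/class/dmi/id/<field> and compared after stripping the trailing newline. It accepts only the quoted "=" and "!=" forms, and it treats a missing field as not matching; the real parser supports more. The helper names are hypothetical, and the field reader is injectable so the check can be tried without real DMI data.

```python
import re
from pathlib import Path
from typing import Callable, Optional

DMI_DIR = Path("/sys/class/dmi/id")

# Accepts only the quoted "=" / "!=" form; the real syntax supports more.
_SMBIOS_FIELD = re.compile(r'smbios-field\(\s*(\w+)\s*(!=|=)\s*"([^"]*)"\s*\)')

def read_dmi_field(name: str) -> Optional[str]:
    try:
        return (DMI_DIR / name).read_text().strip()
    except OSError:
        return None

def condition_firmware(expr: str,
                       read_field: Callable[[str], Optional[str]] = read_dmi_field) -> bool:
    m = _SMBIOS_FIELD.fullmatch(expr.strip())
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    field, op, expected = m.groups()
    actual = read_field(field)
    if actual is None:  # missing field: treated as not matching (assumption)
        return False
    return actual == expected if op == "=" else actual != expected
```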
systemd v252-rc3
systemd System and Service Manager
CHANGES WITH 252 in spe:
Announcements of Future Feature Removals:
* We intend to remove cgroup v1 support from a systemd release after the
end of 2023. If you run services that make explicit use of cgroup v1
features (i.e. the "legacy hierarchy" with separate hierarchies for
each controller), please implement compatibility with cgroup v2 (i.e.
the "unified hierarchy") sooner rather than later. Most of Linux
userspace has been ported over already.
* We intend to remove support for split-usr (/usr mounted separately
during boot) and unmerged-usr (parallel directories /bin and
/usr/bin, /lib and /usr/lib, etc). This will happen in the second
half of 2023, in the first release that falls into that time window.
For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html
Compatibility Breaks:
* ConditionKernelVersion= checks that use the '=' or '!=' operators
will now do simple string comparisons (instead of version comparisons
à la strverscmp()). Version comparisons are still done for the
ordering operators '<', '>', '<=', '>='. Moreover, if no operator is
specified, a shell-style glob match is now done. This creates a minor
incompatibility compared to older systemd versions when the '*', '?',
'[', ']' characters are used, as these will now match as shell globs
instead of literally. Given that kernel version strings typically do
not include these characters we expect little breakage through this
change.
* The service manager will now read the SELinux label used for SELinux
access checks from the unit file at the time it loads the file.
Previously, the label would be read at the moment of the access
check, which was problematic since at that time the unit file might
already have been updated or removed.
New Features:
* systemd-measure is a new tool for calculating and signing expected
TPM2 PCR values for a given unified kernel image (UKI) booted via
sd-stub. The public key used for the signature and the signed
expected PCR information can be embedded inside the UKI. This
information can be extracted from the UKI by external tools and code
in the image itself and is made available to userspace in the booted
kernel.
systemd-cryptsetup, systemd-cryptenroll, and systemd-creds have been
updated to make use of this information if available in the booted
kernel: when locking an encrypted volume/credential to the TPM
systemd-cryptenroll/systemd-creds will use the public key to bind the
volume/credential to any kernel that carries PCR information signed
by the same key pair. When unlocking such volumes/credentials
systemd-cryptsetup/systemd-creds will use the signature embedded in
the booted UKI to gain access.
Binding TPM-based disk encryption to public keys/signatures of PCR
values — instead of literal PCR values — addresses the inherent
"brittleness" of traditional PCR-bound TPM disk encryption schemes:
disks remain accessible even if the UKI is updated, without any
TPM-specific preparation during the OS update — as long as each UKI
carries the necessary PCR signature information.
Net effect: if you boot a properly prepared kernel, TPM-bound disk
encryption now defaults to being locked to kernels which carry PCR
signatures from the same key pair. Example: if a hypothetical distro
FooOS prepares its UKIs like this, TPM-based disk encryption is now –
by default – bound to only FooOS kernels, and encrypted volumes bound
to the TPM cannot be unlocked on kernels from other sources. (But do
note this behaviour requires preparation/enabling in the UKI, and of
course users can always enroll non-TPM ways to unlock the volume.)
* systemd-pcrphase is a new tool that is invoked at six places during
system runtime, and measures additional words into TPM2 PCR 11, to
mark milestones of the boot process. This allows binding access to
specific TPM2-encrypted secrets to specific phases of the boot
process. (Example: LUKS2 disk encryption key only accessible in the
initrd, but not later.)
Changes in systemd itself, i.e. the manager and units
* The cpu controller is delegated to user manager units by default, and
CPUWeight= settings are applied to the top-level user slice units
(app.slice, background.slice, session.slice). This provides a degree
of resource isolation between different user services competing for
the CPU.
* Systemd can optionally do a full preset in the "first boot" condition
(instead of just enable-only). This behaviour is controlled by the
compile-time option -Dfirst-boot-full-preset. Right now it defaults
to 'false', but the plan is to switch it to 'true' for the subsequent
release.
* Drop-ins are now allowed for transient units too.
* Systemd will set the taint flag 'support-ended' if it detects that
the OS image is past its end-of-support date. This date is declared
in a new /etc/os-release field SUPPORT_END= described below.
* Two new settings ConditionCredential= and AssertCredential= can be
used to skip or fail units if a certain system credential is not
provided.
* ConditionMemory= accepts size suffixes (K, M, G, T, …).
* DefaultSmackProcessLabel= can be used in system.conf and user.conf to
specify the SMACK security label to use when not specified in a unit
file.
* DefaultDeviceTimeoutSec= can be used in system.conf and user.conf to
specify the default timeout when waiting for device units to
activate.
* C.UTF-8 is used as the default locale if nothing else has been
configured.
* [Condition|Assert]Firmware= have been extended to support certain
SMBIOS fields. For example
ConditionFirmware=smbios-field(board_name = "Custom Board")
conditionalizes the unit to run only when
/sys/class/dmi/id/board_name contains "Custom Board" (without the
quotes).
* ConditionFirstBoot= now correctly evaluates as true only during the
boot phase of the first boot. A unit executed later, after booting
has completed, will no longer evaluate this condition as true.
* Socket units will now create sockets in the SELinuxContext= of the
associated service unit, if any.
* Boot phase transitions (start initrd → exit initrd → boot complete →
shutdown) will be measured into TPM2 PCR 11, so that secrets can be
bound to a specific runtime phase. E.g.: a LUKS encryption key can be
unsealed only in the initrd.
* Service credentials (i.e. SetCredential=/LoadCredential=/…) will now
also be provided to ExecStartPre= processes.
* Various units are now correctly ordered against
initrd-switch-root.target where previously a conflict without
ordering was configured. A stop job for those units would be queued,
but without the ordering it could be executed only after
initrd-switch-root.service, leading to units not being restarted in
the host system as expected.
* In order to fully support the IPMI watchdog driver, which has not yet
been ported to the new common watchdog device interface,
/dev/watchdog0 will be tried first and systemd will silently fall back
to /dev/watchdog if it is not found.
* New watchdog-related D-Bus properties are now published by systemd:
WatchdogDevice, WatchdogLastPingTimestamp,
WatchdogLastPingTimestampMonotonic.
* At shutdown, API virtual file systems (proc, sys, etc.) will be
unmounted lazily.
* At shutdown, systemd will now log about processes blocking unmounting
of file systems.
* A new meson build option 'clock-valid-range-usec-max' was added to
allow disabling system time correction if RTC returns a timestamp far
in the future.
* Propagated restart jobs will no longer be discarded while a unit is
activating.
* PID 1 will now import system credentials from SMBIOS Type 11 fields
("OEM vendor strings"), in addition to qemu_fwcfg. This provides a
simple, fast and generic path for supplying credentials to a VM,
without involving external tools such as cloud-init/ignition.
* The CPUWeight= setting of unit files now accepts a new special value
"idle", which configures "idle" level scheduling for the unit.
* Service processes that are activated due to a .timer or .path unit
triggering will now receive information about this via environment
variables. Note that this information is lossy, as activation
might be coalesced and only one of the activating triggers will be
reported. This is hence more suited for debugging or tracing rather
than for behaviour decisions.
* The riscv_flush_icache(2) system call has been added to the list of
system calls allowed by default when ...
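The size suffixes accepted by ConditionMemory= can be illustrated with a minimal Python sketch. The sketch assumes base-1024 multipliers and a ">=" comparison when no operator is given. It handles only integer values with the K, M, G and T suffixes, and the function names are hypothetical.

```python
import re

# Base-1024 multipliers for the K, M, G, T suffixes (assumed).
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(text: str) -> int:
    m = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)\s*", text)
    if m is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(m.group(1)) * _SUFFIX[m.group(2)]

def condition_memory(expr: str, physical_bytes: int) -> bool:
    expr = expr.strip()
    op = ">="  # assumed default when no operator is given
    for candidate in ("<=", ">=", "!=", "<", ">", "="):
        if expr.startswith(candidate):
            op, expr = candidate, expr[len(candidate):]
            break
    limit = parse_size(expr)
    return {
        "<": physical_bytes < limit, "<=": physical_bytes <= limit,
        ">": physical_bytes > limit, ">=": physical_bytes >= limit,
        "=": physical_bytes == limit, "!=": physical_bytes != limit,
    }[op]
```

For example, under these assumptions ConditionMemory=>=4G holds on a machine with 8 GiB of RAM.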
systemd v252-rc2
CHANGES WITH 252 in spe:
Announcements of Future Feature Removals:
* We intend to remove cgroup v1 support from a systemd release after the
end of 2023. If you run services that make explicit use of cgroup v1
features (i.e. the "legacy hierarchy" with separate hierarchies for
each controller), please implement compatibility with cgroup v2 (i.e.
the "unified hierarchy") sooner rather than later. Most of Linux
userspace has been ported over already.
* We intend to remove support for split-usr (/usr mounted separately
during boot) and unmerged-usr (parallel directories /bin and
/usr/bin, /lib and /usr/lib, etc). This will happen in the second
half of 2023, in the first release that falls into that time window.
For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html
Compatibility Breaks:
* ConditionKernelVersion= checks that use the '=' or '!=' operators
will now do simple string comparisons (instead of version comparisons
à la strverscmp()). Version comparisons are still done for the
ordering operators '<', '>', '<=', '>='. Moreover, if no operator is
specified, a shell-style glob match is now done. This creates a minor
incompatibility compared to older systemd versions when the '*', '?',
'[', ']' characters are used, as these will now match as shell globs
instead of literally. Given that kernel version strings typically do
not include these characters we expect little breakage through this
change.
* The service manager will now read the SELinux label used for SELinux
access checks from the unit file at the time it loads the file.
Previously, the label would be read at the moment of the access
check, which was problematic since at that time the unit file might
already have been updated or removed.
New Features:
* systemd-measure is a new tool for calculating and signing expected
TPM2 PCR values for a given unified kernel image (UKI) booted via
sd-stub. The public key used for the signature and the signed
expected PCR information can be embedded inside the UKI. This
information can be extracted from the UKI by external tools and code
in the image itself and is made available to userspace in the booted
kernel.
systemd-cryptsetup, systemd-cryptenroll, and systemd-creds have been
updated to make use of this information if available in the booted
kernel: when locking an encrypted volume/credential to the TPM
systemd-cryptenroll/systemd-creds will use the public key to bind the
volume/credential to any kernel that carries PCR information signed
by the same key pair. When unlocking such volumes/credentials
systemd-cryptsetup/systemd-creds will use the signature embedded in
the booted UKI to gain access.
Binding TPM-based disk encryption to public keys/signatures of PCR
values — instead of literal PCR values — addresses the inherent
"brittleness" of traditional PCR-bound TPM disk encryption schemes:
disks remain accessible even if the UKI is updated, without any
TPM-specific preparation during the OS update — as long as each UKI
carries the necessary PCR signature information.
Net effect: if you boot a properly prepared kernel, TPM-bound disk
encryption now defaults to being locked to kernels which carry PCR
signatures from the same key pair. Example: if a hypothetical distro
FooOS prepares its UKIs like this, TPM-based disk encryption is now –
by default – bound to only FooOS kernels, and encrypted volumes bound
to the TPM cannot be unlocked on kernels from other sources. (But do
note this behaviour requires preparation/enabling in the UKI, and of
course users can always enroll non-TPM ways to unlock the volume.)
* systemd-pcrphase is a new tool that is invoked at six places during
system runtime, and measures additional words into TPM2 PCR 11, to
mark milestones of the boot process. This allows binding access to
specific TPM2-encrypted secrets to specific phases of the boot
process. (Example: LUKS2 disk encryption key only accessible in the
initrd, but not later.)
Changes in systemd itself, i.e. the manager and units
* The cpu controller is delegated to user manager units by default, and
CPUWeight= settings are applied to the top-level user slice units
(app.slice, background.slice, session.slice). This provides a degree
of resource isolation between different user services competing for
the CPU.
* Systemd can optionally do a full preset in the "first boot" condition
(instead of just enable-only). This behaviour is controlled by the
compile-time option -Dfirst-boot-full-preset. Right now it defaults
to 'false', but the plan is to switch it to 'true' for the subsequent
release.
* Drop-ins are now allowed for transient units too.
* Systemd will set the taint flag 'support-ended' if it detects that
the OS image is past its end-of-support date. This date is declared
in a new /etc/os-release field SUPPORT_END= described below.
* Two new settings ConditionCredential= and AssertCredential= can be
used to skip or fail units if a certain system credential is not
provided.
* ConditionMemory= accepts size suffixes (K, M, G, T, …).
* DefaultSmackProcessLabel= can be used in system.conf and user.conf to
specify the SMACK security label to use when not specified in a unit
file.
* DefaultDeviceTimeoutSec= can be used in system.conf and user.conf to
specify the default timeout when waiting for device units to
activate.
* C.UTF-8 is used as the default locale if nothing else has been
configured.
* [Condition|Assert]Firmware= have been extended to support certain
SMBIOS fields. For example
ConditionFirmware=smbios-field(board_name = "Custom Board")
conditionalizes the unit to run only when
/sys/class/dmi/id/board_name contains "Custom Board" (without the
quotes).
* ConditionFirstBoot= now correctly evaluates as true only during the
boot phase of the first boot. A unit executed later, after booting
has completed, will no longer evaluate this condition as true.
* Socket units will now create sockets in the SELinuxContext= of the
associated service unit, if any.
* Boot phase transitions (start initrd → exit initrd → boot complete →
shutdown) will be measured into TPM2 PCR 11, so that secrets can be
bound to a specific runtime phase. E.g.: a LUKS encryption key can be
unsealed only in the initrd.
* Service credentials (i.e. SetCredential=/LoadCredential=/…) will now
also be provided to ExecStartPre= processes.
* Various units are now correctly ordered against
initrd-switch-root.target where previously a conflict without
ordering was configured. A stop job for those units would be queued,
but without the ordering it could be executed only after
initrd-switch-root.service, leading to units not being restarted in
the host system as expected.
* In order to fully support the IPMI watchdog driver, which has not yet
been ported to the new common watchdog device interface,
/dev/watchdog0 will be tried first and systemd will silently fall back
to /dev/watchdog if it is not found.
* New watchdog-related D-Bus properties are now published by systemd:
WatchdogDevice, WatchdogLastPingTimestamp,
WatchdogLastPingTimestampMonotonic.
* At shutdown, API virtual file systems (proc, sys, etc.) will be
unmounted lazily.
* At shutdown, systemd will now log about processes blocking unmounting
of file systems.
* A new meson build option 'clock-valid-range-usec-max' was added to
allow disabling system time correction if RTC returns a timestamp far
in the future.
* Propagated restart jobs will no longer be discarded while a unit is
activating.
* PID 1 will now import system credentials from SMBIOS Type 11 fields
("OEM vendor strings"), in addition to qemu_fwcfg. This provides a
simple, fast and generic path for supplying credentials to a VM,
without involving external tools such as cloud-init/ignition.
* The CPUWeight= setting of unit files now accepts a new special value
"idle", which configures "idle" level scheduling for the unit.
* Service processes that are activated due to a .timer or .path unit
triggering will now receive information about this via environment
variables. Note that this information is lossy, as activation
might be coalesced and only one of the activating triggers will be
reported. This is hence more suited for debugging or tracing rather
than for behaviour decisions.
* The riscv_flush_icache(2) system call has been added to the list of
system calls allowed by default when SystemCallFilter= is used.
...
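The watchdog device fallback order can be expressed as a short Python sketch. pick_watchdog_device() is a hypothetical helper, and the existence check is injectable for testing. systemd's actual device selection logic is not reproduced here.

```python
from pathlib import Path
from typing import Callable, Optional

# Order described above: the numbered device first, then the legacy node.
WATCHDOG_CANDIDATES = ("/dev/watchdog0", "/dev/watchdog")

def pick_watchdog_device(
        exists: Callable[[str], bool] = lambda p: Path(p).exists()) -> Optional[str]:
    for dev in WATCHDOG_CANDIDATES:
        if exists(dev):
            return dev
    return None  # no watchdog device available
```

Trying /dev/watchdog0 first serves drivers that use the common watchdog device interface. The fallback to /dev/watchdog keeps drivers such as the IPMI watchdog usable.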
systemd v252-rc1
CHANGES WITH 252 in spe:
Announcement of Future Feature Removal:
* We intend to remove cgroup v1 support from a systemd release after the
end of 2023. If you run services that make explicit use of cgroup v1
features (i.e. the "legacy hierarchy" with separate hierarchies for
each controller), please implement compatibility with cgroup v2 (i.e.
the "unified hierarchy") sooner rather than later. Most of Linux
userspace has been ported over already.
* We intend to remove support for split-usr (/usr mounted separately
during boot) and unmerged-usr (parallel directories /bin and
/usr/bin, /lib and /usr/lib, etc). This will happen in the second
half of 2023, in the first release that falls into that time window.
For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html
Compatibility Breaks:
* ConditionKernelVersion= checks that use the '=' or '!=' operators
will now do simple string comparisons (instead of version comparisons
à la strverscmp()). Version comparisons are still done for the
ordering operators '<', '>', '<=', '>='. Moreover, if no operator is
specified, a shell-style glob match is now done. This creates a minor
incompatibility compared to older systemd versions when the '*', '?',
'[', ']' characters are used, as these will now match as shell globs
instead of literally. Given that kernel version strings typically do
not include these characters we expect little breakage through this
change.
* The service manager will now read the SELinux label used for SELinux
access checks from the unit file at the time it loads the file.
Previously, the label would be read at the moment of the access
check, which was problematic since at that time the unit file might
already have been updated or removed.
New Features:
* systemd-measure is a new tool for precalculating and signing expected
TPM2 PCR values seen once a given unified kernel image (UKI) with
systemd-stub is booted. This is useful for implementing TPM2 policies
for LUKS encrypted volumes and encrypted system/service credentials,
that robustly bind to kernels carrying appropriate PCR signature
information. The signed expected PCR information may be embedded
inside UKI images for this purpose so that it is automatically
available in userspace, once the UKI is booted.
systemd-cryptsetup, systemd-cryptenroll and systemd-creds have been
updated to make use of this information if available in the booted
kernel.
Net effect: if you boot a properly prepared kernel, TPM-bound disk
encryption now defaults to being locked to kernels which carry PCR
signatures from the same signature key pair. Example: if a
hypothetical distro FooOS prepares its UKI kernels like this,
TPM-based disk encryption is now – by default – bound to only FooOS
kernels, and encrypted volumes bound to the TPM cannot be unlocked on
other kernels from other sources. (But do note this behaviour
requires preparation/enabling in the UKI, and of course users can
always enroll non-TPM ways to unlock the volume.)
Binding TPM-based disk encryption to public keys/signatures of PCR
values — instead of literal PCR values — addresses the inherent
"brittleness" of traditional PCR-bound TPM disk encryption schemes:
disks remain accessible even if the UKI image is updated, without any
preparation during the update — as long as each UKI carries the
necessary PCR signature information.
* systemd-pcrphase is a new tool that is invoked at 4 places during
system runtime, and measures additional words into TPM2 PCR 11, to
mark milestones of the boot process. This allows binding access to
specific TPM2-encrypted secrets to specific phases of the boot
process. (Think: LUKS2 disk encryption key only accessible in the
initrd, but not later.)
Changes in systemd itself, i.e. the manager, and units
* The cpu controller is delegated to user manager units by default, and
CPUWeight= settings are applied to the top-level user slice units
(app.slice, background.slice, session.slice). This provides a degree
of resource isolation between different user services competing for
the CPU.
* Systemd can optionally do a full preset in the "first boot" condition
(instead of just enable-only). This behaviour is controlled by the
compile-time option -Dfirst-boot-full-preset. Right now it defaults
to 'false', but the plan is to switch it to 'true' for the subsequent
release.
* Systemd will set the taint flag 'support-ended' if it detects that
the OS image is past its end-of-support date. This date is declared
in a new /etc/os-release field SUPPORT_END= described below.
* Two new settings ConditionCredential= and AssertCredential= can be
used to skip or fail units if a certain system credential is not
provided.
* ConditionMemory= accepts size suffixes (K, M, G, T, …).
* DefaultSmackProcessLabel= can be used in system.conf and user.conf to
specify the SMACK security label to use when not specified in a unit
file.
* DefaultDeviceTimeoutSec= can be used in system.conf and user.conf to
specify the default timeout when waiting for device units to
activate.
* C.UTF-8 is used as the default locale if nothing else has been
configured.
* [Condition|Assert]Firmware= have been extended to support certain
SMBIOS fields. For example
ConditionFirmware=smbios-field(board_name = "Custom Board")
conditionalizes the unit to run only when
/sys/class/dmi/id/board_name contains "Custom Board" (without the
quotes).
* ConditionFirstBoot= now correctly evaluates as true only during the
boot phase of the first boot. A unit executed later, after booting
has completed, will no longer evaluate this condition as true.
* Socket units will now create sockets in the SELinuxContext= of the
associated service unit, if any.
* Boot phase transitions (start initrd → exit initrd → boot complete →
shutdown) will be measured into TPM2 PCR 11, so that secrets can be
bound to a specific runtime phase. E.g.: a LUKS encryption key can be
unsealed only in the initrd.
* Service credentials (i.e. SetCredential=/LoadCredential=/…) will now
also be provided to ExecStartPre= processes.
* Various units are now correctly ordered against
initrd-switch-root.target where previously a conflict without
ordering was configured. A stop job for those units would be queued,
but without the ordering it could be executed only after
initrd-switch-root.service, leading to units not being restarted in
the host system as expected.
* In order to fully support the IPMI watchdog driver, which has not yet
been ported to the new common watchdog device interface,
/dev/watchdog0 will be tried first and systemd will silently fall back
to /dev/watchdog if it is not found.
* New watchdog-related D-Bus properties are now published by systemd:
WatchdogDevice, WatchdogLastPingTimestamp,
WatchdogLastPingTimestampMonotonic.
* At shutdown, API virtual file systems (proc, sys, etc.) will be
unmounted lazily.
* At shutdown, systemd will now log about processes blocking unmounting
of file systems.
* A new meson build option 'clock-valid-range-usec-max' was added to
allow disabling system time correction if RTC returns a timestamp far
in the future.
* Propagated restart jobs will no longer be discarded while a unit is
activating.
* PID 1 will now import system credentials from SMBIOS Type 11 fields
("OEM vendor strings"), in addition to qemu_fwcfg. This provides a
simple, fast and generic path for supplying credentials to a VM,
without involving external tools such as cloud-init/ignition.
* The CPUWeight= setting of unit files now accepts a new special value
"idle", which configures "idle" level scheduling for the unit.
* Service processes that are activated due to a .timer or .path unit
triggering will now receive information about this via environment
variables. Note that this information is lossy, as activation
might be coalesced and only one of the activating triggers will be
reported. This is hence more suited for debugging or tracing rather
than for behaviour decisions.
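CPUWeight= maps to the cgroup v2 cpu.weight attribute, which divides CPU time among sibling cgroups in proportion to their weights while they compete. The following is a minimal Python sketch of that arithmetic for the top-level user slices. The weight values are hypothetical and are not the defaults systemd ships. The "idle" scheduling value is not modelled.

```python
from typing import Dict, Set


def cpu_shares(weights: Dict[str, int], busy: Set[str]) -> Dict[str, float]:
    """Fraction of CPU time each busy sibling gets when all of them compete.

    cgroup v2 cpu.weight is proportional: idle siblings do not reserve any
    share, so only the weights of the busy siblings are summed.
    """
    total = sum(weights[name] for name in busy)
    return {name: weights[name] / total for name in weights if name in busy}


# Hypothetical weights, chosen only for illustration.
weights = {"app.slice": 100, "background.slice": 30, "session.slice": 100}

shares = cpu_shares(weights, busy={"app.slice", "background.slice"})
print({name: round(share, 2) for name, share in shares.items()})
# {'app.slice': 0.77, 'background.slice': 0.23}
```

A slice that has nothing to run keeps no share, so a lone busy slice still gets the whole CPU regardless of its weight.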
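To illustrate the ConditionMemory= size suffixes, here is a minimal Python sketch of parsing them. It assumes the suffixes are binary multiples (K = 1024 bytes, M = 1024² bytes, and so on) and treats a bare value as a lower bound on physical memory. Only integer values are handled, and comparison-operator prefixes are not modelled. The helper names are hypothetical.

```python
import re

# Assumption: suffixes are binary multiples (K = 1024 bytes, and so on).
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_SIZE_RE = re.compile(r"\s*(\d+)\s*([KMGT]?)\s*")


def parse_size(text: str) -> int:
    """Parse an integer size with an optional K/M/G/T suffix into bytes."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]


def memory_condition_met(minimum: str, physical_bytes: int) -> bool:
    """Treat a bare ConditionMemory= value as a lower bound on memory."""
    return physical_bytes >= parse_size(minimum)


print(parse_size("512M"))                       # 536870912
print(memory_condition_met("4G", 8 * 1024**3))  # True
```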
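The ConditionFirmware=smbios-field(...) check described above reads a DMI field from /sys/class/dmi/id and compares it with the given value. The following is a minimal Python sketch of that evaluation. It handles only the = and != operators and requires a quoted value, which is a simplification. Missing fields are treated as "no match". The field reader can be injected, so the sketch can run without real sysfs.

```python
import re
from pathlib import Path
from typing import Callable, Optional

DMI_DIR = Path("/sys/class/dmi/id")

# Matches e.g. smbios-field(board_name = "Custom Board"); only = and != are
# handled in this sketch.
_EXPR_RE = re.compile(r'smbios-field\(\s*(\w+)\s*(!?=)\s*"([^"]*)"\s*\)')


def read_dmi_field(field: str) -> Optional[str]:
    """Read a DMI field from sysfs, dropping the trailing newline."""
    try:
        return (DMI_DIR / field).read_text().strip()
    except OSError:
        return None


def smbios_field_matches(
    expr: str, read_field: Callable[[str], Optional[str]] = read_dmi_field
) -> bool:
    match = _EXPR_RE.fullmatch(expr.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    field, op, expected = match.groups()
    actual = read_field(field)
    if actual is None:
        return False  # a missing field never satisfies the condition here
    return (actual == expected) if op == "=" else (actual != expected)


# Simulated DMI data instead of the real /sys/class/dmi/id contents.
fake_dmi = {"board_name": "Custom Board"}.get
print(smbios_field_matches('smbios-field(board_name = "Custom Board")', fake_dmi))  # True
```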
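The watchdog device fallback described above amounts to "first existing candidate wins". Below is a minimal Python sketch of that lookup. It is not systemd's code, and the function name is hypothetical. The existence check is injectable so the two cases can be simulated.

```python
import os
from typing import Callable, Optional, Sequence

# Candidate devices in the order described above.
WATCHDOG_CANDIDATES = ("/dev/watchdog0", "/dev/watchdog")


def pick_watchdog_device(
    exists: Callable[[str], bool] = os.path.exists,
    candidates: Sequence[str] = WATCHDOG_CANDIDATES,
) -> Optional[str]:
    """Return the first existing watchdog device node, or None if none exist."""
    for path in candidates:
        if exists(path):
            return path
    return None


# Simulate a machine where only the legacy node exists (e.g. IPMI watchdog).
present = {"/dev/watchdog"}
print(pick_watchdog_device(present.__contains__))  # /dev/watchdog
```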
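For the SMBIOS Type 11 import described above, each credential is carried as one OEM vendor string. The following is a minimal Python sketch of extracting them. It assumes the prefixes "io.systemd.credential:" for plain-text values and "io.systemd.credential.binary:" for Base64-encoded values, each followed by NAME=VALUE; verify these against the credentials documentation for your systemd version. Other vendor strings are ignored. Credential names are not validated here.

```python
import base64
from typing import Dict, Iterable

# Assumed prefixes for credential-carrying OEM strings.
TEXT_PREFIX = "io.systemd.credential:"
BINARY_PREFIX = "io.systemd.credential.binary:"


def parse_oem_strings(strings: Iterable[str]) -> Dict[str, bytes]:
    """Extract NAME=VALUE credentials from SMBIOS Type 11 OEM strings."""
    creds: Dict[str, bytes] = {}
    for s in strings:
        if s.startswith(BINARY_PREFIX):
            name, sep, value = s[len(BINARY_PREFIX):].partition("=")
            data = base64.b64decode(value) if sep else None
        elif s.startswith(TEXT_PREFIX):
            name, sep, value = s[len(TEXT_PREFIX):].partition("=")
            data = value.encode() if sep else None
        else:
            continue  # unrelated vendor string
        if name and data is not None:
            creds[name] = data
    return creds


oem = [
    "io.systemd.credential:mycred=supersecret",
    "io.systemd.credential.binary:token=aGVsbG8=",
    "some other vendor string",
]
print(parse_oem_strings(oem))  # {'mycred': b'supersecret', 'token': b'hello'}
```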
Changes in sd-boot, bootctl, and the Boot Loader Specification:
* The Boot Loader Specification has been cleaned up and clarified.
Various corner cases in version string comparisons have been fixed
(e.g. comparisons for empty strings). Boot counting is now part of
the main specification.
* New PCR measurements are performed during boot: PCR 11 for the
kernel+initrd combo, PCR 13 for any sysext images. If a m...
systemd v251
systemd System and Service Manager
CHANGES WITH 251:
Backwards-incompatible changes:
* The minimum kernel version required has been bumped from 3.13 to 4.15,
and CLOCK_BOOTTIME is now assumed to always exist.
* C11 with GNU extensions (aka "gnu11") is now used to build our
components. Public API headers are still restricted to ISO C89.
* In v250, a systemd-networkd feature that automatically configures
routes to addresses specified in AllowedIPs= was added and enabled by
default. However, this causes network connectivity issues in many
existing setups. Hence, it has been disabled by default since
systemd-stable 250.3. The feature can still be used by explicitly
configuring the RouteTable= setting in .netdev files.
* Jobs started via StartUnitWithFlags() will no longer return 'skipped'
when a Condition*= check does not succeed, restoring the JobRemoved
signal to the behaviour it had before v250.
* The org.freedesktop.portable1 methods GetMetadataWithExtensions() and
GetImageMetadataWithExtensions() have been fixed to provide an extra
return parameter, containing the actual extension release metadata.
The current implementation was judged to be broken and unusable, and
thus the usual procedure of adding a new set of methods was skipped,
and backward compatibility broken instead on the assumption that
nobody can be affected given the current state of this interface.
* All kernels supported by systemd mix RDRAND (or similar) into the
entropy pool at early boot. This means that on those systems, even if
/dev/urandom is not yet initialized, it still returns bytes that
are at least as high quality as RDRAND. For that reason, we no longer
have reason to invoke RDRAND from systemd itself, which has
historically been a source of bugs. Furthermore, kernels ≥5.6 provide
the getrandom(GRND_INSECURE) interface for returning random bytes
before the entropy pool is initialized without warning into kmsg,
which is what we attempt to use if available. systemd's direct usage
of RDRAND has been removed. x86 systems ≥Broadwell that are running
an older kernel may experience kmsg warnings that were not seen with
250. For newer kernels, non-x86 systems, or older x86 systems, there
should be no visible changes.
* sd-boot will now measure the kernel command line into TPM PCR 12
rather than PCR 8. This improves usefulness of the measurements on
systems where sd-boot is chainloaded from Grub. Grub measures all
commands it executes into PCR 8, which makes it very hard to use
reasonably, hence we separate ourselves from that and use PCR 12
instead, which is what certain Ubuntu editions already do. To retain
compatibility with systems running older systemd versions, a new meson
option 'efi-tpm-pcr-compat' has been added (which defaults to false).
If enabled, the measurement is done twice: into the new-style PCR 12
*and* the old-style PCR 8. It's strongly advised to migrate all users
to PCR 12 for this purpose in the long run, as we intend to remove
this compatibility feature in two years' time.
* busctl capture now writes output in the newer pcapng format instead
of pcap.
* A udev rule that imported hwdb matches for USB devices with
lowercase hexadecimal vendor/product ID digits was added in systemd
250. This has been reverted, since uppercase hexadecimal digits are
supposed to be used, and we already had a rule with the
appropriate match.
Users might need to adjust their local hwdb entries.
* arch_prctl(2) has been moved to the @default set in the syscall filters
(as exposed via the SystemCallFilter= setting in service unit files).
It is apparently used by the linker now.
* The tmpfiles entries that create the /run/systemd/netif directory and
its subdirectories were moved from tmpfiles.d/systemd.conf to
tmpfiles.d/systemd-network.conf.
Users might need to adjust their files that override tmpfiles.d/systemd.conf
to account for this change.
* The requirement for Portable Services images to contain a well-formed
os-release file (i.e.: contain at least an ID field) is now enforced.
This applies to base images and extensions, and also to systemd-sysext.
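The os-release requirement above ("contain at least an ID field") can be checked mechanically. Below is a minimal Python sketch of such a check. It assumes the usual os-release syntax of KEY=value lines with optional shell-style quoting and "#" comments. The helper names are hypothetical, and the same check would apply to an extension's release file.

```python
import shlex
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release style KEY=value lines, unquoting shell-style values."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignored in this sketch
        fields[key.strip()] = " ".join(shlex.split(value))
    return fields


def is_well_formed(text: str) -> bool:
    """Minimal check matching the rule above: a non-empty ID= field."""
    return bool(parse_os_release(text).get("ID"))


good = 'NAME="Fedora Linux"\nID=fedora\nVERSION_ID=38\n'
bad = 'NAME="Some Image"\n'
print(is_well_formed(good), is_well_formed(bad))  # True False
```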
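Because the lowercase-hex hwdb rule was reverted, local hwdb entries should spell vendor and product IDs in uppercase hexadecimal. The following is a minimal Python sketch for building and normalizing such match strings. It assumes the common "usb:vXXXXpYYYY*" match prefix, and the IDs shown are only example values.

```python
import re

_USB_MATCH_RE = re.compile(r"usb:v([0-9A-Fa-f]{4})p([0-9A-Fa-f]{4})")


def usb_hwdb_match(vendor_id: int, product_id: int) -> str:
    """Build a USB hwdb match with uppercase hexadecimal IDs."""
    return f"usb:v{vendor_id:04X}p{product_id:04X}*"


def uppercase_usb_match(line: str) -> str:
    """Rewrite lowercase vendor/product digits in an existing match line."""
    return _USB_MATCH_RE.sub(
        lambda m: f"usb:v{m.group(1).upper()}p{m.group(2).upper()}", line
    )


print(usb_hwdb_match(0x046D, 0xC52B))          # usb:v046DpC52B*
print(uppercase_usb_match("usb:v046dpc52b*"))  # usb:v046DpC52B*
```

Only the hex digits change; the literal "v" and "p" separators stay lowercase.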
Changes in the Boot Loader Specification, kernel-install and sd-boot:
* kernel-install's and bootctl's Boot Loader Specification Type #1
entry generation logic has been reworked. The user may now pick
explicitly by which "token" string to name the installation's boot
entries, via the new /etc/kernel/entry-token file or the new
--entry-token= switch to bootctl. By default — as before — the
entries are named after the local machine ID. However, in "golden
image" environments, where the machine ID shall be initialized on
first boot (as opposed to at installation time before first boot) the
machine ID will not be available at build time. In this case the
--entry-token= switch to bootctl (or the /etc/kernel/entry-token
file) may be used to override the "token" for the entries, for
example the IMAGE_ID= or ID= fields from /etc/os-release. This will
make the OS images independent of any machine ID, and ensure that the
images will not carry any identifiable information before first boot,
but on the other hand means that multiple parallel installations of
the very same image on the same disk cannot be supported.
Summary: if you are building golden images that shall acquire
identity information exclusively on first boot, make sure to both
remove /etc/machine-id *and* to write /etc/kernel/entry-token to the
value of the IMAGE_ID= or ID= field of /etc/os-release or another
suitable identifier before deploying the image.
* The Boot Loader Specification has been extended with a
/loader/entries.srel file located in the EFI System Partition (ESP)
that disambiguates the format of the entries in the /loader/entries/
directory (in order to discern them from incompatible uses of this
directory by other projects). For entries that follow the
Specification, the string "type1" is stored in this file.
bootctl will now write this file automatically when installing the
systemd-boot boot loader.
* kernel-install supports a new initrd_generator= setting in
/etc/kernel/install.conf, that is exported as
$KERNEL_INSTALL_INITRD_GENERATOR to kernel-install plugins. This
allows choosing different initrd generators.
* kernel-install will now create a "staging area" (an initially-empty
directory to gather files for a Boot Loader Specification Type #1
entry). The path to this directory is exported as
$KERNEL_INSTALL_STAGING_AREA to kernel-install plugins, which should
drop files there instead of writing them directly to the final
location. kernel-install will move them when all files have been
prepared successfully.
* New option sort-key= has been added to the Boot Loader Specification
to override the sorting order of the entries in the boot menu. It is
read by sd-boot and bootctl, and will be written by kernel-install,
with the default value of IMAGE_ID= or ID= fields from
os-release. Together, this means that on multiboot installations,
entries should be grouped and sorted in a predictable way.
* The sort order of boot entries has been updated: entries which have
the new field sort-key= are sorted by it first, and all entries
without it are ordered later. After that, entries are sorted by
version so that newest entries are towards the beginning of the list.
* The kernel-install tool gained a new 'inspect' verb which shows the
paths and other settings used.
* sd-boot can now optionally beep when the menu is shown and menu
entries are selected, which can be useful on machines without a
working display. (Controllable via a loader.conf setting.)
* The --make-machine-id-directory= switch to bootctl has been replaced
by --make-entry-directory=, given that the entry directory is not
necessarily named after the machine ID, but after some other suitable
ID as selected via --entry-token= described above. The old name of
the option is still understood to maximize compatibility.
* 'bootctl list' gained support for a new --json= switch to output boot
menu entries in JSON format.
* 'bootctl is-installed' now supports the --graceful option, and various
verbs omit output with the new option --quiet.
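The entry-token logic described above can be sketched as a fallback chain. The following is a minimal Python sketch, not bootctl's implementation. It assumes the order "explicit token, then machine ID, then IMAGE_ID=, then ID=", which is consistent with the golden-image advice above. It also uses a hypothetical "<token>-<kernel version>.conf" file name to show where the token ends up.

```python
from typing import Mapping, Optional


def resolve_entry_token(
    entry_token_file: Optional[str],
    machine_id: Optional[str],
    os_release: Mapping[str, str],
) -> str:
    """Pick the string used to name Type #1 entries.

    Assumed fallback order: explicit token, machine ID, IMAGE_ID=, ID=.
    """
    candidates = (
        entry_token_file,
        machine_id,
        os_release.get("IMAGE_ID"),
        os_release.get("ID"),
    )
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    raise LookupError("no usable entry token")


def entry_filename(token: str, kernel_version: str) -> str:
    """Hypothetical naming scheme: <token>-<kernel version>.conf."""
    return f"{token}-{kernel_version}.conf"


# Golden image: no machine ID yet, token taken from IMAGE_ID=.
os_release = {"IMAGE_ID": "myimage", "ID": "fedora"}
token = resolve_entry_token(None, None, os_release)
print(entry_filename(token, "6.2.9"))  # myimage-6.2.9.conf
```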
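The updated sort order described above (entries with sort-key= first, ordered by it, then newest version first) can be expressed as two stable sorts. Here is a minimal Python sketch. Its version comparison is a simplification that splits versions into numeric and alphabetic runs; it is not the Boot Loader Specification's exact version comparison algorithm. The Entry type is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    name: str
    version: str
    sort_key: Optional[str] = None


def version_key(version: str):
    """Simplified version key: numeric runs compare as numbers, letters as text."""
    return [
        (0, int(part)) if part.isdigit() else (1, part)
        for part in re.findall(r"\d+|[A-Za-z]+", version)
    ]


def sort_entries(entries: List[Entry]) -> List[Entry]:
    # Pass 1: newest version first.
    newest_first = sorted(entries, key=lambda e: version_key(e.version), reverse=True)
    # Pass 2 (stable): entries with sort-key= first, ordered by it; the rest after.
    return sorted(newest_first, key=lambda e: (e.sort_key is None, e.sort_key or ""))


entries = [
    Entry("a", "6.1.0", "fedora"),
    Entry("b", "6.2.0", "fedora"),
    Entry("c", "5.19"),
    Entry("d", "1.0", "debian"),
]
print([e.name for e in sort_entries(entries)])  # ['d', 'b', 'a', 'c']
```

Since Python's sort is stable, the version order from the first pass survives within each sort-key group.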
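The staging-area pattern used by kernel-install (plugins write into an initially empty directory, and results are moved only after everything succeeded) is sketched below in Python. Plugins are modelled as Python callables that receive the staging path, standing in for executables that read $KERNEL_INSTALL_STAGING_AREA. All names here are hypothetical, and this is not kernel-install's code.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable

Plugin = Callable[[Path], None]  # hypothetical plugin: writes files into a directory


def install_with_staging(final_dir: Path, plugins: Iterable[Plugin]) -> None:
    """Run plugins against an empty staging directory, then move the results.

    The staging directory is created next to final_dir so that os.replace()
    stays on one file system. If any plugin fails, nothing reaches final_dir.
    """
    final_dir.parent.mkdir(parents=True, exist_ok=True)
    staging = Path(tempfile.mkdtemp(prefix="staging-", dir=final_dir.parent))
    try:
        for plugin in plugins:
            plugin(staging)
        final_dir.mkdir(exist_ok=True)
        for item in staging.iterdir():
            os.replace(item, final_dir / item.name)
    finally:
        shutil.rmtree(staging, ignore_errors=True)


def write_kernel(area: Path) -> None:
    (area / "linux").write_text("kernel image placeholder")


def write_initrd(area: Path) -> None:
    (area / "initrd").write_text("initrd placeholder")


root = Path(tempfile.mkdtemp())
install_with_staging(root / "entry", [write_kernel, write_initrd])
print(sorted(p.name for p in (root / "entry").iterdir()))  # ['initrd', 'linux']
```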
Changes in systemd-homed:
* Starting with v250 systemd-homed uses UID/GID mapping on the mounts
of activated home directories it manages (if the kernel and selected
file systems support it). So far it mapped three UID ranges: the
range from 0…60000, the user's own UID, and the range 60514…65534,
leaving everything else unmapped (in other words, the 16bit UID range
...
v251-rc3
Backwards-incompatible changes:
* The minimum kernel version required has been bumped from 3.13 to 4.15,
and CLOCK_BOOTTIME is now assumed to always exist.
* C11 with GNU extensions (aka "gnu11") is now used to build our
components. Public API headers are still restricted to ISO C89.
* In v250, a systemd-networkd feature that automatically configures
routes to addresses specified in AllowedIPs= was added and enabled by
default. However, this causes network connectivity issues in many
existing setups. Hence, it has been disabled by default since
systemd-stable 250.3. The feature can still be used by explicitly
configuring the RouteTable= setting in .netdev files.
* Jobs started via StartUnitWithFlags() will no longer return 'skipped'
when a Condition*= check does not succeed, restoring the JobRemoved
signal to the behaviour it had before v250.
* The org.freedesktop.portable1 methods GetMetadataWithExtensions() and
GetImageMetadataWithExtensions() have been fixed to provide an extra
return parameter, containing the actual extension release metadata.
The current implementation was judged to be broken and unusable, and
thus the usual procedure of adding a new set of methods was skipped,
and backward compatibility broken instead on the assumption that
nobody can be affected given the current state of this interface.
* All kernels supported by systemd mix RDRAND (or similar) into the
entropy pool at early boot. This means that on those systems, even if
/dev/urandom is not yet initialized, it still returns bytes that
are at least as high quality as RDRAND. For that reason, we no longer
have reason to invoke RDRAND from systemd itself, which has
historically been a source of bugs. Furthermore, kernels ≥5.6 provide
the getrandom(GRND_INSECURE) interface for returning random bytes
before the entropy pool is initialized without warning into kmsg,
which is what we attempt to use if available. systemd's direct usage
of RDRAND has been removed. x86 systems ≥Broadwell that are running
an older kernel may experience kmsg warnings that were not seen with
250. For newer kernels, non-x86 systems, or older x86 systems, there
should be no visible changes.
* sd-boot will now measure the kernel command line into TPM PCR 12
rather than PCR 8. This improves usefulness of the measurements on
systems where sd-boot is chainloaded from Grub. Grub measures all
commands it executes into PCR 8, which makes it very hard to use
reasonably, hence we separate ourselves from that and use PCR 12
instead, which is what certain Ubuntu editions already do. To retain
compatibility with systems running older systemd versions, a new meson
option 'efi-tpm-pcr-compat' has been added (which defaults to false).
If enabled, the measurement is done twice: into the new-style PCR 12
*and* the old-style PCR 8. It's strongly advised to migrate all users
to PCR 12 for this purpose in the long run, as we intend to remove
this compatibility feature in two years' time.
* busctl capture now writes output in the newer pcapng format instead
of pcap.
* A udev rule that imported hwdb matches for USB devices with
lowercase hexadecimal vendor/product ID digits was added in systemd
250. This has been reverted, since uppercase hexadecimal digits are
supposed to be used, and we already had a rule with the
appropriate match.
Users might need to adjust their local hwdb entries.
* arch_prctl(2) has been moved to the @default set in the syscall filters
(as exposed via the SystemCallFilter= setting in service unit files).
It is apparently used by the linker now.
* The tmpfiles entries that create the /run/systemd/netif directory and
its subdirectories were moved from tmpfiles.d/systemd.conf to
tmpfiles.d/systemd-network.conf.
Users might need to adjust their files that override tmpfiles.d/systemd.conf
to account for this change.
Changes in the Boot Loader Specification, kernel-install and sd-boot:
* kernel-install's and bootctl's Boot Loader Specification Type #1
entry generation logic has been reworked. The user may now pick
explicitly by which "token" string to name the installation's boot
entries, via the new /etc/kernel/entry-token file or the new
--entry-token= switch to bootctl. By default — as before — the
entries are named after the local machine ID. However, in "golden
image" environments, where the machine ID shall be initialized on
first boot (as opposed to at installation time before first boot) the
machine ID will not be available at build time. In this case the
--entry-token= switch to bootctl (or the /etc/kernel/entry-token
file) may be used to override the "token" for the entries, for
example the IMAGE_ID= or ID= fields from /etc/os-release. This will
make the OS images independent of any machine ID, and ensure that the
images will not carry any identifiable information before first boot,
but on the other hand means that multiple parallel installations of
the very same image on the same disk cannot be supported.
Summary: if you are building golden images that shall acquire
identity information exclusively on first boot, make sure to both
remove /etc/machine-id *and* to write /etc/kernel/entry-token to the
value of the IMAGE_ID= or ID= field of /etc/os-release or another
suitable identifier before deploying the image.
* The Boot Loader Specification has been extended with a
/loader/entries.srel file located in the EFI System Partition (ESP)
that disambiguates the format of the entries in the /loader/entries/
directory (in order to discern them from incompatible uses of this
directory by other projects). For entries that follow the
Specification, the string "type1" is stored in this file.
bootctl will now write this file automatically when installing the
systemd-boot boot loader.
* kernel-install supports a new initrd_generator= setting in
/etc/kernel/install.conf, that is exported as
$KERNEL_INSTALL_INITRD_GENERATOR to kernel-install plugins. This
allows choosing different initrd generators.
* kernel-install will now create a "staging area" (an initially-empty
directory to gather files for a Boot Loader Specification Type #1
entry). The path to this directory is exported as
$KERNEL_INSTALL_STAGING_AREA to kernel-install plugins, which should
drop files there instead of writing them directly to the final
location. kernel-install will move them when all files have been
prepared successfully.
* New option sort-key= has been added to the Boot Loader Specification
to override the sorting order of the entries in the boot menu. It is
read by sd-boot and bootctl, and will be written by kernel-install,
with the default value of IMAGE_ID= or ID= fields from
os-release. Together, this means that on multiboot installations,
entries should be grouped and sorted in a predictable way.
* The sort order of boot entries has been updated: entries which have
the new field sort-key= are sorted by it first, and all entries
without it are ordered later. After that, entries are sorted by
version so that newest entries are towards the beginning of the list.
* The kernel-install tool gained a new 'inspect' verb which shows the
paths and other settings used.
* sd-boot can now optionally beep when the menu is shown and menu
entries are selected, which can be useful on machines without a
working display. (Controllable via a loader.conf setting.)
* The --make-machine-id-directory= switch to bootctl has been replaced
by --make-entry-directory=, given that the entry directory is not
necessarily named after the machine ID, but after some other suitable
ID as selected via --entry-token= described above. The old name of
the option is still understood to maximize compatibility.
* 'bootctl list' gained support for a new --json= switch to output boot
menu entries in JSON format.
* 'bootctl is-installed' now supports the --graceful option, and various
verbs omit output with the new option --quiet.
Changes in systemd-homed:
* Starting with v250 systemd-homed uses UID/GID mapping on the mounts
of activated home directories it manages (if the kernel and selected
file systems support it). So far it mapped three UID ranges: the
range from 0…60000, the user's own UID, and the range 60514…65534,
leaving everything else unmapped (in other words, the 16bit UID range
is mapped almost fully, with the exception of the UID subrange used
for systemd-homed users, with one exception: the user's own UID).
Unmapped UIDs may not be used for file ownership in the home
directory — any chown() attempts with them will fail. With this
re...