RHEL confidential virtual machines on Azure: A technical deep dive
The Red Hat Enterprise Linux 9.2 CVM Preview image for Azure confidential VMs has been released, and it represents an important step forward in confidential virtual machines. In this article, I focus on the changes implemented to support the emerging confidential computing use case, and some of the expected changes in the future.
For this article, I’m using confidential virtual machines (CVMs) with the Technology Preview of Red Hat Enterprise Linux 9.2, running as a guest on Microsoft Azure confidential VMs. This builds on my previous post in which I discussed the high-level requirements for a Linux operating system to support a CVM use case.
Azure Confidential VMs
In July 2022, Microsoft announced the general availability of AMD SEV-SNP CVMs on Azure, with two new instance types: DCasv5 and ECasv5. These instance types provide strong isolation between the host and the running VM, and offer integrity guarantees for the state of the running VM. Additionally, these VMs provide:
- SecureBoot capabilities
- A dedicated virtual Trusted Platform Module (vTPM) instance for each VM
- Cloud-based confidential disk encryption prior to first boot
These three features are crucial in providing the main advantage of a CVM: confidentiality, meaning that data inside the VM stays protected both at run time and at rest.
Data protection at run time
A VM enabled with SEV-SNP has its memory and CPU register state protected from the hypervisor and the host OS. Additionally, there are memory integrity guarantees. For example, a malicious or compromised hypervisor cannot swap two encrypted memory pages in an attempt to confuse the guest.
While all these protections are transparent to the userspace applications running inside of the VM, the guest operating system is aware of them. In particular, the guest kernel is responsible for sharing specific pieces of information, such as certain memory pages and CPU register values, with the hypervisor explicitly to allow input/output operations. Red Hat Enterprise Linux 9 gained support for AMD SEV-SNP technology on Azure in the 9.1 release.
Data protection at rest
In an Azure CVM, data protection at rest is achieved by using confidential disk encryption. The technology allows all important parts of the root volume to be protected with an encryption key, available only to the guest operating system. The key and the data on the root volume are never revealed to the hypervisor or host operating system. This way, even if the host running the VM gets compromised, your data remains safe. Red Hat Enterprise Linux 9 adds support for confidential disk encryption as a Technology Preview in its 9.2 release.
Confidential disk encryption in Azure
With confidential disk encryption, the root volume of a VM is pre-encrypted prior to the first boot on the target host. As I described in my previous article, some parts of the volume must remain unencrypted. In particular, all the code required to obtain the key to the encrypted part of the volume and perform the decryption must remain as cleartext. Red Hat Enterprise Linux 9 supports encrypted volumes with LUKS technology. When the root volume of the operating system is encrypted, the decryption must happen in the initramfs, which means that the unencrypted part of the volume must include, at a minimum, all pieces of the boot chain up to and including the initramfs.
The key to the encrypted part needs to be revealed during the initramfs boot phase so the boot sequence can continue to the operating system. This also means that all the code running in the VM up to that point must be verified to prevent a possible attack from a compromised host trying to reveal the encryption key. For example, if the initramfs is not verified, then the host can try injecting some malicious code into it to steal the key.
The established standard for boot time code verification is UEFI SecureBoot, which guarantees that only code trusted by the platform vendor is executed when the system boots. Traditionally, in Red Hat Enterprise Linux 9, the SecureBoot scheme covers the following:
- First stage bootloader (shim): Signed by the Microsoft key, this allows for booting in different environments that have Microsoft CA enlisted in the UEFI Secure variables. Shim ships with Red Hat CA embedded so that other parts of the boot chain do not have to be signed by Microsoft.
- Second stage bootloader (GNU GRUB): Signed by the Red Hat key.
- Red Hat Enterprise Linux kernel: Signed by the Red Hat key.
The initramfs is not covered by SecureBoot because it is normally built on the target system to include all required drivers and tools. Additionally, the Linux kernel command line (cmdline) is normally set in the bootloader configuration, or provided directly in the bootloader. To add both of these artifacts to the SecureBoot coverage, Red Hat Enterprise Linux 9.2 introduces support for the Unified Kernel Image (UKI) technology. This image combines the Linux kernel, initramfs, and the kernel command line into one UEFI binary, which is signed by the Red Hat key.
Unified kernel image (UKI)
Red Hat Enterprise Linux unified kernel image (UKI) uses systemd-stub as its base building block. Essentially, systemd-stub is a simplistic UEFI application (PE binary) that unpacks the Linux kernel, the initramfs, and the kernel command line (all of which are present as UKI sections) and transitions into the Linux kernel. In Red Hat Enterprise Linux 9.2, the UKI image can be built with the same dracut tool normally used to build initramfs images, for example:
$ dracut --conf /path/to/dracut.conf --kver 5.14.0-284.10.1.el9_2.x86_64 \
    --uefi --kernel-image /boot/vmlinuz-5.14.0-284.10.1.el9_2.x86_64 \
    --kernel-cmdline="console=ttyS0" /tmp/uki.efi
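To check what ended up inside such an image, one can list the sections of the resulting PE binary. The following is a minimal Python sketch and not part of dracut or systemd: the function name list_pe_sections is hypothetical, and it only walks the PE section table. For a UKI, one would expect to see sections such as .linux, .initrd, and .cmdline, but the exact set should be verified against the actual file.

```python
import struct


def list_pe_sections(image: bytes) -> list[str]:
    """Return the section names found in the section table of a PE binary."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE/DOS executable")
    # Offset 0x3c of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # COFF file header (20 bytes): machine, section count, timestamp,
    # symbol table pointer, symbol count, optional header size, flags.
    _, n_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", image, pe_offset + 4
    )
    # The section table follows the COFF header and the optional header.
    table = pe_offset + 4 + 20 + opt_size
    names = []
    for i in range(n_sections):
        # Each section header is 40 bytes; the first 8 are the padded name.
        start = table + 40 * i
        raw = image[start:start + 8]
        names.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))
    return names
```

Reading /tmp/uki.efi in binary mode and passing its contents to list_pe_sections would print the section names of the image built above.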
A dedicated ukify tool to produce UKIs was introduced in systemd-253. This tool is expected to be shipped with Red Hat Enterprise Linux in the future.
The main benefit of the UKI is that it can be wholly signed by the operating system vendor key, extending SecureBoot protection to the initramfs and the kernel command line. As of Red Hat Enterprise Linux 9.2, the Red Hat build infrastructure incorporates a UKI build into the regular kernel build process. The resulting image is shipped as a separate kernel-uki-virt package. With some adjustment to the boot sequence, you can use kernel-uki-virt instead of the traditional kernel-core to boot a UEFI-booted VM.
Normally, the Linux kernel is loaded by the second stage bootloader (GNU GRUB, in Red Hat Enterprise Linux 9). However, GRUB doesn’t currently support booting UKI. The work for support is happening in the upstream community, but even then GRUB may not be the best choice on a CVM. It is, in some ways, too "powerful", allowing the user to perform many things before transitioning to the Linux kernel. This includes modifying kernel parameters, accessing files on different volumes, and more. All of these actions would require auditing to ensure that confidentiality is preserved.
Red Hat Enterprise Linux 9.2 uses a simpler boot scheme for an Azure CVM: it boots a UKI directly from shim.
In this scenario, boot entries are managed from within the booted operating system with the efibootmgr tool. For example, to add a new UKI boot entry:
$ sudo efibootmgr -c -d /dev/sda -p 2 \
    -L "TestUKI" -l "\EFI\redhat\shimx64.efi" \
    -u "\EFI\Linux\vmlinuz-5.14.0-284.11.1.el9_2.x86_64-virt.efi"
To create the initial boot entry to be used upon first boot of the VM, you can use the shim fallback feature. This feature requires a CSV file (/boot/efi/EFI/redhat/BOOTX64.CSV) containing:
- Binary to boot (located in /boot/efi/EFI/redhat).
- Entry you want to create (this will be visible in UEFI boot menu).
- Path to the UKI as shim’s parameter. Generally, this can be anything that the booting binary (the first parameter you provided) accepts as an argument.
- Human-readable description.
For example:
shimx64.efi,redhat,\EFI\Linux\vmlinuz-5.14.0-282.el9.x86_64-virt.efi ,UKI
Upon the first boot, when no valid boot entries are present in the UEFI variables, shim uses this data to create a new boot entry and reboots the system. The next boot uses the specified UKI.
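The four fields described above can be assembled programmatically. The following Python sketch is only an illustration: the function name fallback_csv_entry is hypothetical, and the encoding (UTF-16 little-endian with a byte order mark) is an assumption about what shim's fallback code expects, so compare the result with an existing BOOTX64.CSV before relying on it.

```python
def fallback_csv_entry(binary: str, label: str, args: str, description: str) -> bytes:
    """Encode one fallback CSV line: binary,label,arguments,description."""
    fields = (binary, label, args, description)
    for field in fields:
        # A comma inside a field would shift all following fields.
        if "," in field:
            raise ValueError(f"field must not contain a comma: {field!r}")
    line = ",".join(fields) + "\n"
    # Assumption: the file is UTF-16LE text starting with a byte order mark.
    return "\ufeff".encode("utf-16-le") + line.encode("utf-16-le")
```

The returned bytes would be written in binary mode to the CSV file on the EFI System Partition of a test system.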
Red Hat Enterprise Linux 9.2 Azure CVM images include an experimental rhel-cvm-update-tool package to automatically add new UKIs when the kernel-uki-virt package is updated. The tool is triggered automatically upon a kernel-uki-virt package update. When a new UKI is installed, it creates a new boot entry in the UEFI variables and changes the boot order. Also, the tool is responsible for ensuring that the newly installed UKI is able to unlock the encrypted root volume upon the next boot.
While UKI is a powerful concept for improving boot time security, it also brings some limitations. In particular, both the initramfs and the kernel command line become immutable and cannot be changed on the target system without rebuilding the UKI.
Traditionally, the initramfs is built on a target system to add drivers required to discover and mount the root volume. The kernel-uki-virt package ships with the most common storage drivers for virtualized and cloud deployments: NVMe, Virtio, VMBus, and Xen. It also includes drivers for Ext4, XFS, and VFAT filesystems.
The most common reason to change the kernel command line is to specify the UUID (or a device name) for the root volume, for example:
root=UUID=3a0ba52e-fd50-486f-afe7-b1faaa791e17
Because this is not possible when UKI is used, Red Hat Enterprise Linux 9.2 relies on the systemd-gpt-auto-generator systemd feature to discover required partitions on the root volume. The discovery is done by observing partition type GUIDs (not filesystem UUIDs!). The minimal set for a guest:
- EFI System Partition (/boot/efi), GUID: c12a7328-f81f-11d2-ba4b-00a0c93ec93b
- x86_64 Root Partition, GUID: 4f68bce3-e8cd-4db1-96e7-fbcaf984b709
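The idea of discovery by partition type can be modeled in a few lines. The Python sketch below is a simplified illustration, not the logic of systemd-gpt-auto-generator itself (which applies further rules, such as which disk it inspects). The function name find_auto_partitions is hypothetical, and the input mapping is assumed to come from a tool that reports GPT partition type GUIDs, for example the PARTTYPE column of lsblk.

```python
ESP_TYPE = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"
ROOT_X86_64_TYPE = "4f68bce3-e8cd-4db1-96e7-fbcaf984b709"


def find_auto_partitions(partitions: dict[str, str]) -> dict[str, str]:
    """Map a role ("esp" or "root") to the first device with that type GUID.

    `partitions` maps a device name to its GPT partition type GUID.
    """
    roles = {ESP_TYPE: "esp", ROOT_X86_64_TYPE: "root"}
    found: dict[str, str] = {}
    for device, type_guid in partitions.items():
        # GUIDs are compared case-insensitively.
        role = roles.get(type_guid.lower())
        if role and role not in found:
            found[role] = device
    return found
```

Note that the filesystem UUID never appears in this lookup, which is why the same UKI command line can work for every deployed image.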
When the root partition volume contains a LUKS container, it is automatically unlocked with the systemd-cryptsetup tool. For an Azure CVM, the key to unlock the partition comes from a sealed (encrypted) object stored in the LUKS metadata. The object is unsealed (decrypted) by the vTPM device when the system is discovered to be in a known good state. For a Red Hat Enterprise Linux 9.2 Azure CVM image, the root partition has an Ext4 filesystem when confidential disk encryption is not used and contains a LUKS container otherwise.
The upstream systemd community is currently working on a “signed extensions” mechanism to make it possible to customize the kernel command line or the initramfs for specific use-cases when these objects can’t remain the same for all deployments and have to be changed on the target system.
Initial root volume key sealing
With confidential disk encryption in Azure, the root volume is encrypted before the image is placed in the particular host where the CVM is intended to run. This is done to prevent a malicious or compromised host from modifying a VM’s settings or executables, creating a backdoor for stealing data during runtime. The key to the root volume must also be protected from the host. In particular, it must not be revealed to the VM without checking that it is in a known good state. For example, when a VM is booted using the traditional kernel and not the UKI, revealing the key to the root volume is dangerous because a malicious or compromised host can alter the initramfs loaded from the unencrypted storage. Just verifying that the VM was booted with SecureBoot enabled is insufficient, because SecureBoot protection only covers the kernel itself.
To verify that the booting VM is in a known good state, the Red Hat Enterprise Linux 9.2 CVM image for Azure uses a “measured boot” process.
The key to the root volume is stored as a sealed TPM object in LUKS metadata along with the PCR policy. This allows the target vTPM device to unseal it if and only if the values of the selected PCR registers match the expected values. In the Red Hat Enterprise Linux 9.2 Azure CVM image, PCR4 and PCR7 are the required set.
- PCR7 keeps the information about the SecureBoot state, SecureBoot variables, and all certificates used to sign the shim and UKI. This guarantees that the system was booted with SecureBoot enabled, and that the certificates used to verify shim and UKI binaries are not changed.
- PCR4 contains hashes for the shim and UKI binaries that were loaded. This guarantees that the traditional RHEL kernel with an arbitrary initramfs cannot be used to extract the root volume key.
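The reason a different boot chain cannot reproduce the expected PCR value follows from how a PCR is extended: the new value is a hash of the old value concatenated with the new measurement. The Python sketch below models this for the SHA-256 bank. It is deliberately simplified: the function names are hypothetical, and real firmware records Authenticode digests of PE images plus additional events, so the values computed here will not match an actual PCR4.

```python
import hashlib


def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """Return the new PCR value: SHA-256(old value || measurement digest)."""
    return hashlib.sha256(pcr + measurement).digest()


def replay(binaries: list[bytes]) -> bytes:
    """Replay a boot chain into a PCR that starts as 32 zero bytes."""
    pcr = bytes(32)
    for blob in binaries:
        # Simplification: measure a plain SHA-256 digest of each binary.
        pcr = pcr_extend(pcr, hashlib.sha256(blob).digest())
    return pcr
```

Because each step depends on the previous value, replacing the UKI with another binary, or loading the same binaries in a different order, yields a different final value, and the vTPM refuses to unseal the key.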
To perform the initial encryption of the Red Hat Enterprise Linux CVM image, and to seal the root volume key, Microsoft Azure uses an experimental encrypt-rhel-image tool. The image must have the kernel-uki-virt package installed, and must use a partitioning scheme compatible with systemd-gpt-auto-generator. When encrypted, the root volume is converted to a LUKS container with a special tpm2-import token:
$ sudo cryptsetup luksDump /dev/sda3
LUKS header information
Version:        2
Epoch:          10
Metadata area:  16384 [bytes]
Keyslots area:  33521664 [bytes]
UUID:           b0612dd1-5ca2-4287-a0ae-12626fbba093
…
Keyslots:
  1: luks2
        Key:        256 bits
        Priority:   normal
        Cipher:     aes-xts-plain64
…
Tokens:
  1: tpm2-import
        Keyslot:    1
…
Because the format of remotely sealed TPM objects is different from locally created ones, this token has to be converted to the standard systemd-tpm2 token, which is compatible with systemd-cryptsetup. This happens upon the first boot of the VM on the target host:
$ sudo cryptsetup luksDump /dev/sda3
LUKS header information
Version:        2
Epoch:          12
Metadata area:  16384 [bytes]
Keyslots area:  33521664 [bytes]
UUID:           b0612dd1-5ca2-4287-a0ae-12626fbba093
…
Keyslots:
  1: luks2
        Key:        256 bits
        Priority:   normal
        Cipher:     aes-xts-plain64
…
Tokens:
  0: systemd-tpm2
        tpm2-hash-pcrs:   4+7
        tpm2-pcr-bank:    sha256
        tpm2-pubkey:      (null)
        tpm2-pubkey-pcrs: n/a
        tpm2-primary-alg: ecc
        tpm2-blob:        00 9e 00 20 db f5 23 78 93 f5 ab d1 61 14 05 f6 …
        tpm2-policy-hash: a8 2d ab b0 d2 bf eb 78 26 e1 07 ba db fe c6 05 …
        tpm2-pin:         false
        Keyslot:    1
…
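Whether the conversion has happened can also be checked from a script. The Python sketch below assumes the LUKS2 JSON metadata is available as a dictionary, for example parsed from the output of cryptsetup luksDump with its JSON metadata dump option where the installed version provides one. The function name tokens_by_type is hypothetical, and the code relies only on the "type" and "keyslots" fields that every LUKS2 token carries.

```python
def tokens_by_type(metadata: dict) -> dict[str, list[str]]:
    """Map each LUKS2 token type to the keyslots its tokens refer to."""
    result: dict[str, list[str]] = {}
    for token in metadata.get("tokens", {}).values():
        # Several tokens may share a type; collect all of their keyslots.
        result.setdefault(token["type"], []).extend(token.get("keyslots", []))
    return result
```

A result containing systemd-tpm2 and no tpm2-import suggests that the first-boot conversion has completed.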
The upstream systemd community is currently working on adding remote sealing capabilities to the standard systemd-cryptenroll/systemd-cryptsetup tools so that the additional step of importing the token won’t be necessary in the future.
While PCR7 normally remains unchanged during the lifetime of the VM, PCR4 changes every time a kernel or shim package is updated. This requires the root volume key to be re-sealed against the new expected state of the PCR registers. This is done by creating a new LUKS keyslot and a new LUKS token containing the sealed TPM object and the PCR policy. To avoid the need to do this manually, the Red Hat Enterprise Linux 9.2 CVM image for Azure ships with an experimental rhel-cvm-update-tool package.
Root volume resize
A VM on Microsoft Azure is normally provisioned from various VM images, including public Marketplace images, private Compute Galleries, private Virtual Machine Images, and so on. The image always has a certain (default) size but the desired size of the root volume may be larger. This is especially important for CVM because confidential disk encryption can slow down the initial VM provisioning when the VM image is using a large root volume size. Traditionally, the root partition of the VM is resized to the size of the root volume by using the growpart feature in the cloud-init package. In Red Hat Enterprise Linux 9.2, the cloud-init package was enhanced to add support for resizing encrypted partitions as well.
Resizing a LUKS container requires access to one of its encryption keys because the new block must be encrypted by the LUKS master key. This applies both to online (when a LUKS container is already open) and offline resizing. Currently, cloud-init consumes cleartext passwords stored in the /cc_growpart_keydata file:
# cat /cc_growpart_keydata
{"key": "dWUxT2s4ZzF2T05FUHE1Q1dhcmlXVW1aOEFEYlh6NzU=", "slot": 2}
Note that the file with the cleartext password is stored on the encrypted volume so that it’s inaccessible to the host. The file and the corresponding LUKS keyslot are created at the time of root volume encryption. The keyslot must be different from the keyslot used for automatic root volume decryption with vTPM. Both the /cc_growpart_keydata file and the corresponding keyslot are removed when the VM is booted for the first time and the data is consumed.
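The structure of the keydata file is simple enough to read with a few lines of Python. This sketch is not cloud-init's code: the function name read_growpart_keydata is hypothetical, and it assumes that the "key" field holds the passphrase in base64 encoding and that "slot" names the LUKS keyslot the passphrase unlocks.

```python
import base64
import json


def read_growpart_keydata(text: str) -> tuple[bytes, int]:
    """Parse the keydata JSON and return (passphrase bytes, keyslot number)."""
    data = json.loads(text)
    # Assumption: "key" is the base64-encoded passphrase for keyslot "slot".
    return base64.b64decode(data["key"]), int(data["slot"])
```

With the passphrase and keyslot in hand, a resize tool can open the container, grow it, and then discard both the file and the keyslot.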
The systemd project offers a pair of tools that can also be used for the same purpose: systemd-repart and systemd-growfs. When used during the initramfs boot phase, the key to resize a LUKS container can be obtained using vTPM, eliminating the need to have a file with a cleartext password. This solution seems to be superior to using cloud-init’s growpart feature, and is expected in future versions of Red Hat Enterprise Linux 9.
Red Hat Enterprise Linux 9.2 CVM Technology Preview limitations
The Red Hat Enterprise Linux 9.2 CVM image for Azure is provided as a Technology Preview. We want to gather as much feedback as possible before recommending it for production workloads. The image also comes with these limitations:
- Kdump technology is unsupported. We are working on adding support for UKIs to the kexec-tools package.
- Kernel command line in the UKI does not support customization. The upstream systemd community is working on a “signed extensions” mechanism to make it possible.
- The experimental rhel-cvm-update-tool package is not shipped as part of Red Hat Enterprise Linux, so updates to it require manual actions.
We are actively working on overcoming these limitations and making Azure CVM images a fully supported technology in a future release of Red Hat Enterprise Linux 9.
Conclusion
In this article, I’ve reviewed the changes made in the Red Hat Enterprise Linux 9.2 release in support of the Red Hat Enterprise Linux CVM TechPreview image. I discussed the importance of confidential disk encryption technology and how Red Hat Enterprise Linux enables it using SecureBoot, measured boot, and UKI technologies. In my next article, I’ll describe in detail how to launch a CVM on Microsoft Azure using the Red Hat Enterprise Linux CVM TechPreview image.