Confidential cluster: Running Red Hat OpenShift clusters on confidential nodes
This is the first in a series of articles in which we will share how confidential computing (a set of hardware and software technologies designed to protect data in use) can be integrated into Red Hat OpenShift clusters. Our goal is to enhance data security, so all data processed by workloads running on OpenShift can remain confidential at every stage.
In this article, we will focus on the public cloud and examine how confidential computing with OpenShift can effectively address the trust issues associated with cloud environments. Confidential computing removes some of the barriers that highly regulated and defense organizations face when considering public cloud adoption by tackling critical data privacy and security concerns. Specifically, we will discuss some of the common use cases where confidential clusters can be deployed. Red Hat is making a continuous investment in confidential computing, introducing support for a range of use cases in phases. This article describes what’s currently available with the OpenShift cluster and will touch on what to expect in the future.
We assume that the reader has some background knowledge of confidential computing and OpenShift. For additional details on both, we recommend reading the Confidential computing primer and the Red Hat OpenShift overview.
What is a confidential cluster?
Enterprises and users are increasingly drawn to the convenience, flexibility and scalability of cloud services. Cloud providers offer virtual machines (VMs) to isolate one tenant's workloads from another's. While a traditional VM can isolate the workload from other VMs running on the same host, it is still vulnerable to confidentiality breaches from the hypervisor, the virtual machine monitor and the host. Securing sensitive data from external and internal threats when running workloads on infrastructure managed by a third party (the cloud provider) remains a challenge.
One solution to this challenge is confidential computing. In the existing security model, we already have ways to implement security controls for data in transit (e.g. HTTPS and TLS) and for data at rest (e.g. encrypted disks). Confidential computing is a hardware-based technology that provides confidentiality while the data is in use.
Today, all the prominent architectures either already have or will soon have support for confidential computing in the processor. For instance, on x86, Intel provides Trust Domain Extensions (TDX), and AMD offers SEV-SNP (Secure Encrypted Virtualization-Secure Nested Paging, which supersedes the earlier SEV and SEV-ES generations of the technology). On s390x the feature is available as Secure Execution (SE), Power has the Protected Execution Facility (PEF), ARM is developing the Confidential Compute Architecture (CCA), and Confidential VM Extensions (COVE) have been proposed for the RISC-V platform. While the name, design and implementation vary between manufacturers and architectures, at the core all are meant to solve the same problem.
Confidential computing leverages a Trusted Execution Environment (TEE) to protect data while it is in use. A TEE is a protected region of memory and CPU state that forms a trust boundary. The hardware protects the TEE by isolation (limiting access from outside the TEE) and/or encryption. Therefore, data processed (data-in-use) within the TEE, and the code processing it, are protected.
A confidential virtual machine (CVM) acts as a TEE, excluding the host, hypervisor and VM monitor from its trust boundary. The diagram below illustrates the trust boundary changes for a user application running in a CVM.
Several security technologies are employed to protect the CVM and verify its integrity. Only trusted, digitally signed bootloaders and kernels are permitted to run on the CVM through a process known as Secure Boot. Additionally, the hardware generates a signed attestation report to confirm that the CVM is operating in confidential mode. A virtual Trusted Platform Module (vTPM) can also be utilized to measure and log components running at boot time, such as the bootloader, kernel and their signatures, a process referred to as “measured boot.”
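To make the idea of measured boot more concrete, here is a minimal sketch of the hash-chaining step that a TPM-style register performs when it "extends" a measurement. It is an illustration only: the function names and the byte strings standing in for the bootloader and kernel are hypothetical, and a real vTPM keeps several registers and an event log rather than a single value.

```python
import hashlib
from typing import Iterable

PCR_SIZE = 32  # length of a SHA-256 digest in bytes


def extend(pcr: bytes, component: bytes) -> bytes:
    """Fold one boot component into the running register value."""
    measurement = hashlib.sha256(component).digest()
    # The new value depends on the old one, so both content and order matter.
    return hashlib.sha256(pcr + measurement).digest()


def measure_boot_chain(components: Iterable[bytes]) -> bytes:
    """Measure every component in boot order, starting from an all-zero register."""
    pcr = bytes(PCR_SIZE)
    for blob in components:
        pcr = extend(pcr, blob)
    return pcr


# Hypothetical stand-ins for the real bootloader and kernel images.
expected = measure_boot_chain([b"bootloader-v1", b"kernel-v1"])
tampered = measure_boot_chain([b"bootloader-v1", b"kernel-v1-modified"])
print(expected.hex())
print("tampering detected:", expected != tampered)
```

Because each value is derived from the previous one, a verifier that knows the expected final value can detect a changed or reordered component without inspecting the components themselves.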
A key concept in confidential computing is remote attestation. In simple terms, attestation provides proof that the CVM is indeed confidential and trustworthy before any sensitive data is loaded into its memory. With remote attestation, the CVM sends signed evidence to an attestation server, demonstrating its confidentiality. The server then verifies this evidence to determine whether the CVM can be trusted. This evidence includes the signed attestation report from the hardware, the signed measurements log from the vTPM (if available), and any other relevant items to be attested. Sensitive data (including keys and certificates) can be fetched after a successful attestation.
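The attestation flow described above can be sketched in a few lines. This is a simplified model under stated assumptions, not how any particular attestation service is implemented: an HMAC with a shared demo key stands in for the hardware's signature (real platforms sign with an asymmetric key backed by a vendor certificate chain), the nonce used for freshness is our addition, and `make_evidence`, `AttestationServer` and the measurement strings are hypothetical names.

```python
import hashlib
import hmac
import json
import secrets
from typing import Optional

# Hypothetical shared key standing in for the hardware root of trust.
HARDWARE_KEY = b"demo-hardware-key"


def _sign(report: dict) -> str:
    payload = json.dumps(report, sort_keys=True).encode()
    return hmac.new(HARDWARE_KEY, payload, hashlib.sha256).hexdigest()


def make_evidence(nonce: bytes, measurement: str, confidential: bool = True) -> dict:
    """CVM side: build a report bound to the server's nonce and sign it."""
    report = {
        "confidential_mode": confidential,
        "measurement": measurement,
        "nonce": nonce.hex(),
    }
    return {"report": report, "signature": _sign(report)}


class AttestationServer:
    def __init__(self, trusted_measurements: set, secret: bytes):
        self._trusted = trusted_measurements
        self._secret = secret
        self._nonce: Optional[bytes] = None

    def challenge(self) -> bytes:
        """Issue a fresh nonce so old evidence cannot be replayed."""
        self._nonce = secrets.token_bytes(16)
        return self._nonce

    def verify_and_release(self, evidence: dict) -> Optional[bytes]:
        """Return the secret only if every check on the evidence passes."""
        report = evidence["report"]
        nonce, self._nonce = self._nonce, None  # each nonce is single use
        if nonce is None:
            return None
        if not hmac.compare_digest(_sign(report), evidence["signature"]):
            return None  # report was not signed by the trusted key
        if report["nonce"] != nonce.hex():
            return None  # evidence answers a different challenge
        if not report["confidential_mode"]:
            return None  # VM is not running in confidential mode
        if report["measurement"] not in self._trusted:
            return None  # boot measurement is not one we trust
        return self._secret


server = AttestationServer({"known-good-boot"}, b"disk-encryption-key")
evidence = make_evidence(server.challenge(), "known-good-boot")
print("secret released:", server.verify_and_release(evidence) is not None)
```

The point of the ordering is that the secret is returned only after every check has passed, which mirrors the rule that sensitive data is fetched only after a successful attestation.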
In an OpenShift cluster with confidential computing enabled, all nodes run on CVMs, so the memory for workloads and their management services is automatically hidden from the host. Further, every new node has to go through remote attestation, just as a standalone CVM does. Upon successful attestation the node is trusted, secrets are shared with it and it is allowed to join the cluster (we will dive deeper into attestation for confidential clusters in one of our following articles).
This integration allows users to benefit from the scalability, flexibility and convenience of cloud services through OpenShift while retaining control over their sensitive data. By isolating the OpenShift cluster from the underlying infrastructure, the attack surface is significantly reduced. Consequently, workloads can be executed more securely throughout their entire lifecycle, minimizing the risk of exposure to the cloud provider or unauthorized parties. From the application owner's perspective, the experience is nearly seamless: they launch a workload with confidential computing just as they would in a standard OpenShift cluster.
This opens up new possibilities for industries storing and processing sensitive information, such as financial services, healthcare and government agencies. They can now migrate their workload from on-prem to the cloud and also execute in the cloud in accordance with regulatory requirements. Some of these use cases are discussed in detail in the following section.
Main use cases for confidential cluster on public cloud
In talking to our customers, we are seeing several use cases for confidential clusters. The two main ones are Digital Sovereignty and secure cloud burst.
Digital Sovereignty
With the current state of cross-border security and increasing threats to private and governmental organizations, the risk of migrating workloads into the cloud has increased drastically. This has brought Digital Sovereignty (for a definition, see Digital Sovereignty: Why it pays to focus on independence in digital transformation - PwC) into the spotlight, especially for public IT organizations. Additionally, emphasis is being placed on implementing zero trust (for a definition, see What is Zero trust?). To achieve these objectives, we need a security environment that standardizes and continuously evaluates controls and their implementations. Confidential computing can support both of these objectives.
For Digital Sovereignty, it is important to identify a way to use the cloud compute power with all the advantages of that environment without the cloud concentration risk that comes with it. To do this, procedures need to be implemented which allow migrating from one environment to another while maintaining the necessary high level of security. With the help of confidential computing and especially with the help of confidential clusters, it is possible to move a complete environment from one cloud provider to another without sacrificing any security controls.
Additionally, zero trust is a foundation for Digital Sovereignty. Confidential computing adheres to this principle through remote attestation. Whenever a cluster is moved or, more precisely, a new cluster is created (and the old cluster is destroyed), the new environment is remotely attested to make sure that it is running in confidential mode. Deploying a confidential cluster therefore helps to enable a zero trust architecture.
The ability to move a component, application or cluster from one cloud to another is also part of several recent regulatory requirements, such as the European Digital Operational Resilience Act (DORA). DORA requires that data in use is secured. For additional reference, please refer to Chapter II, Section II, Article 8, Paragraphs 2 and 3 of DORA and the definition of securing data in use.
DORA also forces companies to document and test their exit strategy from one cloud provider to another in case of a catastrophic failure of the cloud environment. This means companies have to be able to move their whole cloud infrastructure from one environment (cloud provider) to another. OpenShift makes it easier to move components, applications and clusters between cloud providers. OpenShift also provides the same experience for developers and admins regardless of where OpenShift is deployed so they don’t have to spend time learning the nuances of each cloud provider’s Kubernetes distribution. Similarly, OpenShift makes it easier to move confidential clusters between cloud providers.
Secure cloud burst
There are different use cases when it comes to secure cloud burst scenarios in the public cloud. For example, when compute requirements are not foreseeable, you need the ability to dynamically grow your available resources. In that case, growing into the public cloud is the easiest option. When you take advantage of consumption-based pricing models (for GPUs, for example), secure cloud burst also becomes a way to keep costs under control.
Different GPU providers have extended the confidential computing model to GPUs. One example is NVIDIA's implementation, described in Confidential Computing on NVIDIA H100 GPUs for Secure and Trustworthy AI. Confidential computing becomes an essential technology if you want to securely cloud burst a whole application or even a whole cluster. This usually requires creating a new cluster, deploying the applications and then destroying the original cluster. When doing this with OpenShift, the advantage is that there is almost nothing you need to change in the configuration of the applications or cluster. The secure environment is provided by the confidential cluster.
Current availability and future work
We are rolling out confidential clusters as a solution in multiple phases to enable potential customers and interested users to try it and provide early feedback. These phases are currently planned as follows:
Phase 1
In Phase 1, users are able to deploy OpenShift clusters leveraging confidential computing technologies on:
- GCP with OpenShift 4.13 (Generally Available with AMD SEV-ES; tested with the n2d-standard-8 instance type)
- Azure with OpenShift 4.14 (Technology Preview with AMD SEV-SNP; tested with the Standard_DC8ads_v5 instance type)
No attestation support is available on either platform at the moment.
Phase 2
In Phase 2 we will extend the solution to include remote attestation capabilities, which will enable the cluster to be deployed as fully confidential and potentially extend its availability on other cloud platforms.
Phase 3
In Phase 3 we will introduce confidential clusters as a supported product with full support on both Azure and GCP with AMD SEV-SNP and Intel TDX.
More details for the next phases will be shared in future articles.