A cloud-native infrastructure must accommodate a large, scalable mix of service-oriented applications and their dependent components. These applications and components are generally microservices-based. The key to sustaining their operation is to have the right platform infrastructure and a sustainable management and control plane. The reference design that this guide describes helps you specify infrastructure requirements for building an on-premises OpenShift Container Platform 4.6 solution. The following figure shows this design:
This architectural design recognizes four host types that make up every OpenShift Container Platform cluster: the bootstrap node, the control-plane nodes, the compute nodes, and the storage nodes.
Note: Red Hat official documentation does not refer to a CSAH node in the deployment process.
The CSAH (Cluster System Admin Host) nodes are not part of the cluster, but they are required for OpenShift cluster administration and operation. CSAH nodes also provision the DHCP, PXE, DNS, and HAProxy services that cluster operation depends on. A single CSAH node can be used for development and testing purposes, but this approach does not provide resilient load balancing. For resilient load balancing, Dell Technologies recommends using two CSAH nodes running HAProxy and Keepalived. Further, Dell Technologies strongly discourages logging in directly to a control-plane node to manage the cluster. The OpenShift CLI tool, oc, and the authentication tokens that are required to administer the OpenShift cluster are installed on both CSAH nodes as part of the deployment process. For redundancy, store backups of the OpenShift authentication credentials outside the cluster.
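To make the load-balancing role concrete, the following minimal Python sketch probes the endpoints that the CSAH HAProxy instances typically front: the Kubernetes API (6443), the machine config server (22623), and the ingress routers (80/443). The host names are hypothetical placeholders, not values defined in this guide.

```python
# Minimal reachability probe for the endpoints balanced by the CSAH
# HAProxy instances. Host names are placeholders; substitute your own.
import socket

ENDPOINTS = {
    "Kubernetes API": ("api.ocp.example.com", 6443),
    "Machine config server": ("api.ocp.example.com", 22623),
    "Ingress (HTTP)": ("console-openshift-console.apps.ocp.example.com", 80),
    "Ingress (HTTPS)": ("console-openshift-console.apps.ocp.example.com", 443),
}

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, (host, port) in ENDPOINTS.items():
    state = "reachable" if reachable(host, port) else "UNREACHABLE"
    print(f"{name:22} {host}:{port} -> {state}")
```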
Note: Control-plane nodes are deployed using immutable infrastructure, further driving the preference for an administration host that is external to the cluster.
The CSAH nodes manage the installation and operation of the container ecosystem cluster. Installation of the cluster begins with the creation of a bootstrap VM on the primary CSAH node; the bootstrap VM is used to install the control-plane components on the control-plane nodes. The initial minimum cluster can consist of either three nodes that run both the control plane and applications, or three control-plane nodes and at least two compute nodes. OpenShift Container Platform requires three control-plane nodes in both scenarios.
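Once the bootstrap VM has handed control over to the control-plane nodes, the installer can confirm that the bootstrap phase is finished. The sketch below simply wraps the standard openshift-install wait-for bootstrap-complete command from Python; the asset directory path is a placeholder assumption.

```python
# Hedged sketch: confirm the bootstrap phase is finished by wrapping the
# standard openshift-install command. The --dir value is a placeholder
# for wherever the installation assets were generated.
import subprocess
import sys

INSTALL_DIR = "/home/admin/openshift"  # hypothetical asset directory

result = subprocess.run(
    [
        "openshift-install", "wait-for", "bootstrap-complete",
        f"--dir={INSTALL_DIR}",
        "--log-level=info",
    ],
    check=False,
)
if result.returncode == 0:
    print("Bootstrap complete; the bootstrap VM can be decommissioned.")
else:
    sys.exit("Bootstrap did not complete; review the installer logs.")
```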
Node components are installed and run on every node in the cluster; that is, on control-plane nodes and compute nodes. These components are responsible for all node run-time operations. The key components are the Kubelet agent, which registers the node and manages the pods scheduled to it, and the CRI-O container runtime, which runs the containers themselves.
Nodes that implement control-plane infrastructure management are called control-plane nodes. Three control-plane nodes establish a unified control plane for the operation of an OpenShift cluster. The control plane operates outside the application container workloads and is responsible for ensuring the overall continued viability, health, availability, and integrity of the container ecosystem. Removing control-plane nodes is not allowed.
OpenShift Container Platform also deploys additional control-plane infrastructure to manage OpenShift-specific cluster components.
The control plane provides the core cluster services: the Kubernetes API server, the etcd key-value store that holds cluster state, the scheduler, and the controller managers that reconcile the cluster toward its desired state.
Even though OpenShift Container Platform is resilient to node failure, it is recommended that you take regular backups of the etcd data store. Because an etcd backup is a blocking procedure, take backups during off-peak hours in production environments. When you update a cluster within a minor version (for example, from 4.6.2 to 4.6.3), take an etcd backup from the version of OpenShift Container Platform that is currently running on your cluster or clusters. Wait at least 24 hours after the cluster has been installed before taking the first etcd backup so that the initial rotation of certificates can occur; otherwise, the backup may contain expired certificates. For more information, see the Red Hat OpenShift Container Platform documentation.
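The backup itself runs on a control-plane node by using the cluster-backup.sh script that OpenShift places on those nodes. The following sketch drives that procedure through oc debug from a CSAH node; the node name and output directory are placeholder assumptions.

```python
# Hedged sketch: take an etcd backup by running cluster-backup.sh on one
# control-plane node through `oc debug`. Node name and output directory
# are placeholders; adjust them for your environment.
import subprocess

CONTROL_PLANE_NODE = "control-plane-0.ocp.example.com"  # hypothetical node name
BACKUP_DIR = "/home/core/assets/backup"                 # directory on the node

subprocess.run(
    [
        "oc", "debug", f"node/{CONTROL_PLANE_NODE}",
        "--",
        "chroot", "/host",
        "/usr/local/bin/cluster-backup.sh", BACKUP_DIR,
    ],
    check=True,
)
# The script writes an etcd snapshot and a static-resources archive into
# BACKUP_DIR; copy both files off the node and store them outside the cluster.
```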
Quorum requirements for etcd dictate that if enough control-plane nodes fail that a majority of etcd members is lost, restoring from a previous cluster state becomes the only option for cluster recovery. For the cluster restoration steps, see the Red Hat OpenShift Container Platform documentation. If a majority of control-plane nodes are still operating, meaning that quorum is maintained but there is no redundancy against further node failure, you must replace the unhealthy etcd members. To perform this task, follow the steps in the Red Hat OpenShift Container Platform documentation.
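To make the quorum arithmetic concrete: an etcd cluster of n members requires a majority of floor(n/2) + 1 healthy members, so the standard three-member control plane tolerates exactly one failed member. A short illustration:

```python
# Quorum size and fault tolerance for an etcd cluster of n members.
def quorum(n: int) -> int:
    """Smallest majority of an n-member etcd cluster."""
    return n // 2 + 1

for members in (1, 3, 5):
    print(f"{members} members: quorum={quorum(members)}, "
          f"tolerated failures={members - quorum(members)}")
# With 3 members, quorum is 2 and only 1 failure is tolerated; losing two
# control-plane nodes means restoring from backup is the only recovery path.
```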
In an OpenShift cluster, application containers are deployed to run on compute nodes by default. The term “compute node” denotes a role rather than a distinct hardware type; no special configuration is required to run compute nodes, and applications can also run on control-plane nodes if those nodes are designated as schedulable. Cluster nodes advertise their resources and resource utilization so that the scheduler can allocate containers and pods to these nodes while maintaining a reasonable workload distribution. The Kubelet service runs on all nodes in a Kubernetes cluster. This service receives container deployment requests and ensures that the requests are instantiated and put into operation. The Kubelet service also starts and stops container workloads and manages a service proxy that handles communication between pods that are running across compute nodes.
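The advertised capacity that the scheduler works from is visible on each Node object. The following sketch, assuming the Python kubernetes client and a kubeconfig with cluster read access (for example, the one installed on the CSAH nodes), prints each node's role labels and allocatable CPU and memory.

```python
# Hedged sketch: list each node's roles and advertised allocatable
# resources with the Kubernetes Python client (`pip install kubernetes`).
from kubernetes import client, config

config.load_kube_config()  # reads ~/.kube/config by default
core = client.CoreV1Api()

for node in core.list_node().items:
    roles = [
        label.rsplit("/", 1)[1]
        for label in node.metadata.labels
        if label.startswith("node-role.kubernetes.io/")
    ]
    alloc = node.status.allocatable
    print(f"{node.metadata.name:40} roles={','.join(roles) or 'none':15} "
          f"cpu={alloc['cpu']} memory={alloc['memory']}")
```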
Logical constructs called MachineSets define compute node resources. MachineSets can be used to match the requirements of a pod deployment to a suitable compute node. OpenShift Container Platform supports defining multiple machine types, each of which defines a compute node target type.
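On clusters where the machine API is in use, MachineSets live in the openshift-machine-api namespace under the machine.openshift.io/v1beta1 API group. The sketch below, again assuming the Python kubernetes client, lists each MachineSet with its desired and ready replica counts; whether MachineSets exist at all depends on how the cluster was provisioned.

```python
# Hedged sketch: list MachineSets and their replica counts through the
# custom-objects API. Assumes the machine API is active on the cluster.
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

machinesets = custom.list_namespaced_custom_object(
    group="machine.openshift.io",
    version="v1beta1",
    namespace="openshift-machine-api",
    plural="machinesets",
)

for ms in machinesets["items"]:
    name = ms["metadata"]["name"]
    desired = ms["spec"].get("replicas", 0)
    ready = ms.get("status", {}).get("readyReplicas", 0)
    print(f"{name}: desired={desired} ready={ready}")
```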
Compute nodes can be added to or deleted from a cluster if doing so does not compromise the viability of the cluster. If the control-plane nodes are not designated as schedulable, at least two viable compute nodes must always be operating to run router pods that manage ingress networking traffic. Further, enough compute platform resources must be available to sustain the overall cluster application container workload.
Storage can be provisioned either from dedicated storage nodes or shared with compute services. Provisioning occurs on disk drives that are locally attached to servers that have been added to the cluster as compute nodes. For more information, see the overview of Red Hat OpenShift Data Foundation that follows.
The Red Hat Data Services portfolio of solutions includes persistent software-defined storage (SDS) and data services that are integrated with and optimized for OpenShift Container Platform. As part of the portfolio, Red Hat OpenShift Data Foundation (formerly known as OpenShift Container Storage) delivers resilient and persistent SDS and data services based on Ceph, Rook, and NooBaa technologies.
Running as a Kubernetes service, OpenShift Data Foundation is engineered, tested, and certified to provide data services for OpenShift Container Platform on any infrastructure. OpenShift Data Foundation can be deployed within an OpenShift Container Platform cluster on existing worker nodes, infrastructure nodes, or dedicated nodes. Alternatively, OpenShift Data Foundation can be decoupled and managed as a separate, independently scalable data store, delivering data for one or many OpenShift Container Platform clusters. To streamline configuration options, Red Hat and Intel® have jointly developed three workload-optimized configurations for OpenShift Data Foundation external data nodes: edge, capacity, and performance. These configurations are optimized for Dell PowerEdge R750 servers, as described in Appendix A. It is also possible to use existing compute nodes if they meet OpenShift Data Foundation hardware requirements.
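Once OpenShift Data Foundation is deployed, applications consume it through standard Kubernetes storage classes. The following sketch requests a block volume by creating a persistent volume claim with the Python kubernetes client; the storage class name ocs-storagecluster-ceph-rbd is the usual default for ODF block storage but is an assumption here, as are the namespace and size.

```python
# Hedged sketch: request a 10 GiB block volume from OpenShift Data
# Foundation by creating a PVC against its RBD storage class. Storage
# class name, namespace, and size are assumptions; adjust as needed.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="demo-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="ocs-storagecluster-ceph-rbd",  # assumed ODF RBD class
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)

core.create_namespaced_persistent_volume_claim(namespace="demo", body=pvc)
print("PVC demo-data requested; it binds once ODF provisions the volume.")
```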
Note: At the time of publication of this guide, some Red Hat documentation and the operator and product interface of OpenShift Data Foundation may still use the product name OpenShift Container Storage for OpenShift Data Foundation.