Kubernetes has become a popular open-source platform for running containerized workloads. This reference architecture uses Kubernetes for many reasons. The primary factors were:
From a software perspective, Kubernetes has some parallels with the Linux kernel. It is capable on its own, but it is also highly configurable and extensible, and a fully functional production system typically needs some extensions. Several plug-in components have multiple alternative implementations, and some integration with external systems (such as authentication) is typically required.
This similarity to the Linux kernel has led to the development of multiple Kubernetes distributions, each taking a different approach to how Kubernetes should be configured, deployed, managed, and integrated with existing systems. Examples include Pivotal Container Service (PKS), Docker Enterprise, Rancher Kubernetes Engine, Ubuntu Kubernetes, and Red Hat OpenShift Container Platform.
Spark can run on any of these Kubernetes distributions. In general, the Spark runtime environment is uniform across them, although the underlying platform details differ. This reference architecture uses Red Hat OpenShift Container Platform as its reference platform.
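As an illustration of that uniformity, the following minimal sketch builds a spark-submit command line for two clusters. It assumes the standard Spark-on-Kubernetes submission options (a k8s:// master URL, cluster deploy mode, and the spark.kubernetes.container.image setting). The API server addresses, container image name, and jar path are hypothetical placeholders and are not taken from this reference architecture.

```python
def spark_submit_args(api_server, image, app_jar, main_class, executors=2):
    """Build a spark-submit argument list targeting a Kubernetes API server."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # the only distribution-specific value
        "--deploy-mode", "cluster",
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,
    ]


# Hypothetical API server endpoints for two different distributions.
clusters = {
    "openshift": "https://api.openshift.example.com:6443",
    "rancher": "https://rke.example.com:6443",
}

# Same image, application, and settings for every cluster.
commands = {
    name: spark_submit_args(
        api_server=url,
        image="registry.example.com/spark:latest",  # hypothetical image
        app_jar="local:///opt/spark/examples/jars/spark-examples.jar",  # hypothetical path
        main_class="org.apache.spark.examples.SparkPi",
    )
    for name, url in clusters.items()
}

for name, cmd in commands.items():
    print(name + ":", " ".join(cmd))
```

In this sketch only the master URL changes between clusters. Distribution-specific concerns such as authentication, namespaces, and service accounts are handled outside the command and are not modeled here.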