This document presents the key concepts of the solution: the modern data stack architecture with open table formats, the container platform, and decoupled compute and storage. It describes the solution architecture and components, including the Dell infrastructure and node configurations that support the functions of the modern data stack. It also describes the Dell storage configuration options for the modern data stack, including PowerStore for container runtime storage, PowerScale for HDFS storage, and ECS and ObjectScale for S3 object storage.
In addition to the modern data stack, this document describes the container platform architecture and components. The container platform in this validated design is Red Hat OpenShift, the industry-leading hybrid cloud application platform powered by containers and Kubernetes. Other industry-leading container platforms, such as SUSE Rancher or Symcloud, could also be used.
Beyond the Dell hardware infrastructure and Red Hat OpenShift Container Platform, the validated software components include the Starburst query engine, Apache Spark, Apache Kafka, and open table formats such as Delta Lake and Apache Iceberg.
Lastly, guidelines are presented for sizing and scaling the solution based on various workload requirements.