The Dell Validated Design for Analytics — Modern Data Stack has been developed to address the needs of organizations deploying advanced analytics, and AI and ML workloads. It incorporates the concepts of a modern data stack architecture along with a container platform using decoupled compute and storage.
This document provides design guidance for data analytics infrastructure managers and architects by describing a predesigned, validated, and scalable architecture for advanced analytics and machine learning on Dell hardware infrastructure. The topics covered include:
- The cluster architecture that was designed for this application, including the cluster server and storage infrastructure and the role of each in the system
- The cluster physical and logical network designs
- Details of the PowerEdge server; PowerScale, ECS, ObjectScale, and PowerStore storage; and PowerSwitch networking configurations
- The recommended software infrastructure components that were used in the architecture, including the Symcloud Platform, Starburst Enterprise, and the open table formats Delta Lake and Apache Iceberg
- Examples of workload packaging, deployment, and validation, including Apache Spark, Apache Kafka, Starburst Enterprise, and Databricks Delta Sharing
- Cluster sizing and scaling guidance