The Dell Validated Design for Analytics — Data Lakehouse has been developed to address the needs of organizations deploying advanced analytics. It incorporates the concepts of a lakehouse architecture along with a container platform using decoupled compute and storage.
This document provides design guidance for data analytics infrastructure managers and architects by describing a predesigned, validated, and scalable architecture for advanced analytics on Dell hardware infrastructure. Topics covered include:
- The cluster architecture designed for this application, including the cluster server and storage infrastructure and their roles in the system
- The cluster physical and logical network designs
- Details of the PowerEdge server, PowerScale storage, ECS storage, and PowerSwitch networking configurations
- The recommended software infrastructure components used in the architecture, including the Symcloud Platform
- Examples of workload packaging, deployment, and validation, including Apache Spark and Apache Kafka
- Cluster sizing and scaling guidance