In today’s data-driven world, organizations face the challenge of efficiently managing and delivering value from vast and diverse datasets. To address this challenge, a robust data management solution is essential.
This validated design offers an end-to-end modern data stack that runs on-premises or in a co-located facility, as an alternative to cloud-based modern data stacks. It is the product of a partnership between Dell Technologies and Red Hat to deliver a modern data platform powered by Dell infrastructure. This collaboration combines the latest generation Dell PowerEdge servers and software-defined storage with Red Hat OpenShift Container Platform, which serves as the cloud platform layer for the modern data stack. The design also includes a suite of data management technologies and approaches: Delta Lake, Apache Iceberg, Delta Sharing, data mesh, and the Starburst query engine.
Together, these integrated technologies enable customers to process, store, and analyze large-scale data workloads efficiently and cost-effectively. They empower organizations to perform advanced analytics, including machine learning and artificial intelligence, to gain valuable insights and make sound business decisions.
The Dell Validated Design for Analytics — Modern Data Stack solution combines the strengths of PowerEdge servers, Dell object storage such as PowerScale, ECS, and ObjectScale, Red Hat OpenShift Container Platform, and advanced data technologies that include:
- Delta Lake and Apache Iceberg—Both are open table formats well suited to a modern data management platform, and each has its own strengths and weaknesses. Delta Lake is best suited for data lakes and data pipelines. Iceberg is ideal for data warehousing and analytics on large tables that hold tens of petabytes of data.
- Delta Sharing—The industry’s first fully secured and compliant open-source protocol for data sharing. It enables organizations to share existing large-scale datasets stored in the open-source formats Apache Parquet and Delta Lake without moving the data, while maintaining data consistency, security, and governance. Dell has successfully validated Delta Sharing with a Databricks instance on AWS. For more information about how Dell validated Delta Sharing, see the Dell white paper Power Multicloud Data Analytics using Dell ECS and Databricks.
- Data mesh—A data architecture that treats data as a product, in which each domain or business unit owns and manages its own data. By implementing a data mesh architecture, organizations can link distinct data architecture designs across various hyperscalers or on-premises environments. This approach lets each team use the best technology for its domain while ensuring compatibility and cohesiveness at the company level.
- Starburst query engine—Starburst is a fully supported, enterprise-grade distributed SQL query engine designed for high-performance analytics. It allows users to query large amounts of data stored in various data sources throughout an organization using standard SQL syntax. One of the key features of Starburst is its ability to combine different data sources within a single query. These sources include relational databases, NoSQL databases, object storage systems, and more. Response times are fast enough to support real-time analysis.
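To make the idea of querying several data sources in one statement more concrete, the following Python sketch composes a single SQL query that joins a table held in object storage with a table held in a relational database. It assumes the `catalog.schema.table` naming used by Trino-based engines such as Starburst. The catalog names (`lakehouse`, `crm_postgres`), the table and column names, and the `TableRef` and `federated_join_sql` helpers are hypothetical and are not part of the validated design. The sketch only builds the SQL text and does not connect to a cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Fully qualified table name in catalog.schema.table form (hypothetical helper)."""
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def federated_join_sql(orders: TableRef, customers: TableRef) -> str:
    """Build one SQL statement that joins tables living in two different catalogs."""
    return (
        "SELECT c.region, SUM(o.amount) AS total_amount\n"
        f"FROM {orders.qualified()} AS o\n"
        f"JOIN {customers.qualified()} AS c ON o.customer_id = c.customer_id\n"
        "GROUP BY c.region"
    )


# Assumed catalogs: one backed by object storage, one by a relational database.
orders = TableRef("lakehouse", "sales", "orders")
customers = TableRef("crm_postgres", "public", "customers")

query = federated_join_sql(orders, customers)
print(query)
```

In a real deployment, the resulting statement would be submitted to the query engine through a SQL client, and the engine would read from each source on behalf of the user.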
Some key benefits of this solution include:
- Red Hat OpenShift Container Platform—The industry-leading hybrid cloud application platform powered by containers and Kubernetes. It enables a cloud-native development environment together with a cloud operations experience, giving you the ability to choose where you build, deploy, and run applications, all through a consistent interface.
- Unified data repository—Centralize data from diverse sources into a single, scalable repository, and simplify data access while supporting open table formats.
- Scalable and performant—Dell PowerEdge servers and object storage provide the scalability and performance needed to handle growing data volumes and analytics workloads.
- Advanced analytics—Enable data-driven decision-making with efficient analytics powered by the Starburst query engine.
- Time travel and schema evolution—Leverage time travel and schema evolution capabilities of Delta Lake and Apache Iceberg for agile data management.
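As a conceptual illustration of time travel and schema evolution, the following Python sketch models a table that keeps every committed snapshot, so that earlier versions remain readable after rows or columns are added. It is a toy in-memory model written for this explanation, not the Delta Lake or Apache Iceberg API. The `VersionedTable` class and its methods are hypothetical, and real table formats track changes through metadata rather than by copying rows as this sketch does.

```python
from copy import deepcopy


class VersionedTable:
    """Toy in-memory table that retains every committed snapshot (illustrative only)."""

    def __init__(self, columns):
        # Version 0 is the empty table with its initial schema.
        self._snapshots = [{"columns": list(columns), "rows": []}]

    @property
    def current_version(self):
        return len(self._snapshots) - 1

    def _commit(self, snapshot):
        self._snapshots.append(snapshot)
        return self.current_version

    def append(self, rows):
        """Commit a new snapshot with extra rows; older snapshots stay untouched."""
        snapshot = deepcopy(self._snapshots[-1])
        for row in rows:
            unknown = set(row) - set(snapshot["columns"])
            if unknown:
                raise ValueError(f"unknown columns: {sorted(unknown)}")
            snapshot["rows"].append(dict(row))
        return self._commit(snapshot)

    def add_column(self, name):
        """Schema evolution: add a column without rewriting existing rows."""
        snapshot = deepcopy(self._snapshots[-1])
        if name in snapshot["columns"]:
            raise ValueError(f"column already exists: {name}")
        snapshot["columns"].append(name)
        return self._commit(snapshot)

    def read(self, version=None):
        """Time travel: read the table as of a given version (latest by default)."""
        index = self.current_version if version is None else version
        snapshot = self._snapshots[index]
        # Rows written before a column existed report None for that column.
        return [
            {col: row.get(col) for col in snapshot["columns"]}
            for row in snapshot["rows"]
        ]


table = VersionedTable(["id", "amount"])
v1 = table.append([{"id": 1, "amount": 10.0}])
v2 = table.add_column("currency")
v3 = table.append([{"id": 2, "amount": 5.0, "currency": "USD"}])

print(table.read(version=v1))  # table as it was before the schema change
print(table.read())            # latest version, including the new column
```

Reading at `v1` returns the single original row with the original two columns, while the latest read returns both rows with the added `currency` column, which is empty for the row written before the column existed.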
The following figure shows a modern data stack architecture on Dell infrastructure.