Software components of the Dell Validated Design for Analytics — Modern Data Stack system include:
- Red Hat OpenShift Container Platform is the industry-leading hybrid cloud application platform powered by containers and Kubernetes. It enables a cloud-native development environment together with a cloud operations experience, giving you the ability to choose where you build, deploy, and run applications, all through a consistent interface.
- Starburst Enterprise is a fully supported, enterprise-grade distributed SQL query engine designed for high-performance analytics. It provides data virtualization and query federation across the enterprise, making it possible for all types of users to derive insights from various data sources. A short federated query sketch follows this list.
- Delta Lake is an open table format that provides high-performance ACID table storage over object stores. Delta Lake uses a transaction log and versioned Parquet files to enable update, insert, and delete operations on its tables, as the Delta Lake sketch after this list shows.
- Apache Iceberg is an open table format designed for large, petabyte-scale tables. It brings the reliability and simplicity of SQL tables to big data while making it possible for engines such as Spark and Trino to work safely with the same tables at the same time; a short Iceberg example also follows this list.
- Apache Spark is an open-source, multipurpose unified analytics engine for large-scale data processing. It supports a broad range of workloads, one of the most common being real-time processing of streaming data, as the streaming sketch after this list illustrates.
- Apache Kafka is an event streaming platform that allows multiple client applications to publish and subscribe to real-time data through a scalable, distributed message broker architecture; the streaming sketch after this list reads from a Kafka topic.
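To make the query federation that Starburst Enterprise provides concrete, the following sketch submits one SQL statement through the Trino Python client that joins a table in an object-store-backed catalog with a table in a relational catalog. The coordinator host, port, user, and all catalog, schema, and table names are illustrative assumptions, not values from this design.

```python
import trino

# Connect to the Starburst (Trino) coordinator. The host, port, user, and
# default catalog/schema are placeholders for illustration only.
conn = trino.dbapi.connect(
    host="starburst.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="sales",
)

# A single federated query: orders sit in a Hive catalog over object storage,
# customer records sit in a PostgreSQL catalog, and Starburst joins them.
cur = conn.cursor()
cur.execute(
    """
    SELECT c.region, count(*) AS order_count
    FROM hive.sales.orders AS o
    JOIN postgresql.crm.customers AS c
      ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY order_count DESC
    """
)
for region, order_count in cur.fetchall():
    print(region, order_count)
```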
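The next sketch exercises the Delta Lake behavior described above from PySpark: it writes a small table to an object-store path and then updates and deletes rows through the transaction log. The path and table name are placeholders, and the Delta Lake 2.4 package is assumed to be available to Spark.

```python
from pyspark.sql import SparkSession

# Enable the Delta Lake SQL extension and catalog. This assumes the Delta
# package is on the classpath, for example via
# --packages io.delta:delta-core_2.12:2.4.0 when submitting the job.
spark = (
    SparkSession.builder.appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Write a small table in the Delta format; the object-store path is a placeholder.
spark.range(0, 5).toDF("order_id").write.format("delta") \
    .mode("overwrite").save("s3a://demo-bucket/delta/orders")

# Register the path as a table, then update and delete rows in place,
# operations that a plain directory of Parquet files cannot support.
spark.sql("CREATE TABLE IF NOT EXISTS orders USING DELTA "
          "LOCATION 's3a://demo-bucket/delta/orders'")
spark.sql("UPDATE orders SET order_id = order_id + 100 WHERE order_id < 2")
spark.sql("DELETE FROM orders WHERE order_id = 4")
spark.sql("SELECT * FROM orders ORDER BY order_id").show()
```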
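Apache Iceberg tables can be handled in much the same way once an Iceberg catalog is registered with the Spark session; other engines, such as Trino through Starburst, can then be pointed at the same warehouse. The catalog name, warehouse path, and table below are assumptions, and the Iceberg Spark runtime jar is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession

# Register a Hadoop-type Iceberg catalog named "demo" backed by an
# object-store warehouse path (both the name and the path are placeholders).
spark = (
    SparkSession.builder.appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3a://demo-bucket/iceberg")
    .getOrCreate()
)

# Standard SQL DDL and DML against an Iceberg table; other engines can be
# configured against the same warehouse and work with the same table.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events "
          "(event_id BIGINT, payload STRING) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, 'created'), (2, 'updated')")
spark.sql("SELECT * FROM demo.db.events ORDER BY event_id").show()
```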
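Finally, the streaming pattern referenced for Spark and Kafka can be sketched with Spark Structured Streaming subscribing to a Kafka topic. The broker address and topic name are placeholders, and the spark-sql-kafka connector matching the Spark version is assumed to be available.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

# Subscribe to a Kafka topic; the bootstrap servers and topic are placeholders.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka-broker:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka records arrive as binary key/value pairs; cast them to strings and
# stream the results to the console for inspection.
query = (
    events.select(
        col("key").cast("string"),
        col("value").cast("string"),
        col("timestamp"),
    )
    .writeStream.format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```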
The following table, Partner software details, lists the software components used for the modern data stack.
| Type | Partner | Version |
|---|---|---|
| Kubernetes platform | Red Hat OpenShift | 4.12 |
| SQL query engine | Starburst Enterprise | 422-e |
| Data processing engine | Apache Spark | 3.4.1 |
| Stream processing platform | Apache Kafka | 3.5.1 |
| Open table format | Delta Lake | 2.4.0 and 3.0.0rc1 |
| Open table format | Apache Iceberg | 1.3.1 |