As data analytics evolves and the need to access large amounts of disparate data grows, organizations need a new approach to data access. Data warehouses provided access to structured data, and data lakes handled unstructured and semistructured data, but accessing both stores as an integrated entity was far too difficult. A modern data stack combines the best of data warehouses and data lakes. It supports business intelligence and machine learning technologies on one platform that can store all types of data and expose that data through a cloud-like, multiresource, self-service interface for data scientists. See A self-service modern data stack interface for data scientists.
This validated design offers an on-premises or co-located alternative to cloud-based modern data stacks. It includes Spark, Kafka, Delta Lake by Databricks, and Symcloud Platform, which provides a cloud-native, self-service, on-demand environment. The solution is built on Intel-based Dell PowerEdge servers, Dell PowerSwitch networking, and Dell PowerScale or Dell ECS storage. See The Dell Validated Design for Analytics — Modern Data Stack. The design lets users organize, orchestrate, and automate compute, storage, and environments from a single interface. That interface speeds access to results by removing many of the hurdles that data scientists and data engineers traditionally encounter when working with infrastructure.
This open data management platform combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses. The platform enables business intelligence and machine learning on structured, semistructured, and unstructured data from a single source.