Lakehouse architecture gained popularity in the late 2010s as an evolution of the well-established data warehouse and data lake architectures. The architecture provides the most significant capabilities of both data warehouses and data lakes in a single system, reducing cost and complexity without compromising functionality.
There is no formal definition of lakehouse architecture. It primarily describes a system that combines the open file formats and cost-effective scalable storage of data lakes with the ACID transactions and table-oriented schema definitions of data warehouses.
Lakehouse architecture is often based on a modern table format such as Delta Lake or Apache Iceberg. The table format provides a table abstraction above the underlying storage layer in the lakehouse. Other key features of a lakehouse include schema enforcement and evolution, support for concurrent readers and writers, and the ability to serve diverse workloads, from BI queries to machine learning, against the same data.
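The table abstraction can be illustrated with a toy transaction log, loosely modeled on the idea behind Delta Lake and Iceberg: table state lives in committed metadata entries, not in the data files themselves. The `commit` and `snapshot` helpers below are hypothetical and greatly simplified; they are not the real Delta Lake or Iceberg log format.

```python
import json
import os
import tempfile

# A lakehouse table is a directory of immutable data files plus a
# metadata log. Readers see only files recorded in committed log
# entries, which is what makes commits atomic (toy sketch only).

def commit(table_dir, version, added_files):
    # Writing one JSON log entry atomically publishes the new files.
    log = os.path.join(table_dir, "_log")
    os.makedirs(log, exist_ok=True)
    with open(os.path.join(log, f"{version:08d}.json"), "w") as f:
        json.dump({"add": added_files}, f)

def snapshot(table_dir):
    # The current table state is the union of all committed versions,
    # replayed in order.
    log = os.path.join(table_dir, "_log")
    files = []
    for entry in sorted(os.listdir(log)):
        with open(os.path.join(log, entry)) as f:
            files.extend(json.load(f)["add"])
    return files

table = tempfile.mkdtemp()
commit(table, 0, ["part-0000.parquet"])
commit(table, 1, ["part-0001.parquet"])
print(snapshot(table))  # ['part-0000.parquet', 'part-0001.parquet']
```

Because a data file becomes visible only when its log entry is written, a query never sees a half-finished write, which is how these formats layer ACID semantics onto plain object or file storage.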
The Dell Validated Design for Analytics — Data Lakehouse is built on a container platform. This approach provides agility, flexibility, and scalability while supporting diverse analytics workloads.
The container platform abstracts machine-level details and host operating system dependencies, exposing them as a pool of compute, storage, and communications resources. The platform also provides application orchestration capabilities to streamline the deployment and management of analytics workloads.
Applications on the platform use the industry standard Open Container Initiative (OCI) image format. The images include all the required software components, isolating them from operating system dependencies. These images are run in containers, providing a high level of run-time isolation from other applications.
Analytics workloads are packaged into application bundles that combine application images and deployment specifications. The bundles simplify application launching and specification of resource requirements.
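Conceptually, a bundle pairs an OCI image reference with a deployment specification that declares resource requirements. The sketch below illustrates that pairing; the class and field names are made up for illustration and are not Symcloud's actual bundle schema.

```python
# Toy model of an application bundle: an OCI image reference plus a
# deployment specification declaring resource requirements.
# Class and field names are illustrative, not the Symcloud schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class DeploymentSpec:
    replicas: int
    cpu: str      # cores per replica, e.g. "4"
    memory: str   # memory per replica, e.g. "16Gi"

@dataclass
class AppBundle:
    name: str
    image: str    # OCI image reference
    spec: DeploymentSpec

    def manifest(self):
        # Serialize to JSON so an orchestrator could launch the workload.
        return json.dumps(asdict(self), indent=2)

bundle = AppBundle(
    name="spark-analytics",
    image="registry.example.com/spark:3.5",  # placeholder registry
    spec=DeploymentSpec(replicas=3, cpu="4", memory="16Gi"),
)
print(bundle.manifest())
```

The point of the bundle is that the launch request carries everything the platform needs: which image to pull and how much of the resource pool to reserve for it.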
The container platform is Symcloud Platform, which is based on Kubernetes.
This validated design for data analytics decouples compute and storage resources. This approach provides increased resource utilization, increased flexibility, and lower costs.
The system supports independent provisioning of storage and compute and enables the use of heterogeneous storage and compute resources. This design provides better balance between storage and compute for varying workloads.
Decoupling also simplifies the life cycle and management of the system by allowing independent management, scaling, and upgrades of storage and compute resources.
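The utilization benefit of decoupling can be shown with simple sizing arithmetic. The node capacities and workload figures below are made-up numbers for illustration, not sizing guidance for any Dell platform.

```python
import math

# Hypothetical workload requirements and node capacities.
storage_needed_tb = 400
cores_needed = 256

# Coupled design: every node ships a fixed ratio of storage to compute.
coupled_node = {"storage_tb": 50, "cores": 64}

# Decoupled design: storage and compute nodes are sized independently.
storage_node_tb = 100
compute_node_cores = 64

# Coupled: buy enough nodes to satisfy the *larger* of the two needs,
# stranding whichever resource the workload uses less of.
coupled_nodes = max(
    math.ceil(storage_needed_tb / coupled_node["storage_tb"]),
    math.ceil(cores_needed / coupled_node["cores"]),
)

# Decoupled: each tier scales to exactly what the workload needs.
storage_nodes = math.ceil(storage_needed_tb / storage_node_tb)
compute_nodes = math.ceil(cores_needed / compute_node_cores)

print(coupled_nodes)                 # 8 nodes: 512 cores bought, 256 needed
print(storage_nodes, compute_nodes)  # 4 storage nodes + 4 compute nodes
```

In the coupled case the storage requirement forces eight nodes and half the purchased cores sit idle; the decoupled case provisions each resource to its actual demand, which is the utilization and cost argument made above.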
The design also separates runtime storage from data lake storage. Symcloud Storage provides runtime storage. PowerScale with the HDFS protocol, or ECS with the S3 protocol, provides data lake storage.
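In practice, this separation shows up in the paths a job uses: runtime scratch space sits on platform-provided volumes, while data lake tables are addressed through an external HDFS or S3 endpoint. The sketch below illustrates that split; the hostnames, bucket name, and mount path are placeholders, not real cluster addresses.

```python
# Illustrative only: how a job might address the two storage tiers.
# Hostnames, bucket, and mount path are placeholders.

DATA_LAKE_BACKENDS = {
    # PowerScale exposes the data lake over the HDFS protocol.
    "powerscale": "hdfs://powerscale.example.com:8020",
    # ECS exposes the data lake over the S3 protocol
    # (s3a is the scheme most analytics engines use for S3).
    "ecs": "s3a://analytics-bucket",
}

def data_lake_uri(backend, path):
    # Build a fully qualified URI for a dataset in the data lake.
    return f"{DATA_LAKE_BACKENDS[backend]}/{path.lstrip('/')}"

# Runtime storage: a volume mounted into the container by the platform.
RUNTIME_SCRATCH = "/mnt/runtime-volume/scratch"

print(data_lake_uri("powerscale", "/warehouse/sales"))
# hdfs://powerscale.example.com:8020/warehouse/sales
print(data_lake_uri("ecs", "warehouse/sales"))
# s3a://analytics-bucket/warehouse/sales
```

Keeping the two tiers behind distinct paths means a workload's temporary state can be rebuilt or discarded with the container, while the data lake remains the durable system of record.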