The modern data stack architecture gained popularity in the late 2010s as an evolution of the well-established data warehouse and data lake architectures. It combines the key capabilities of data warehouses and data lakes in a single system, reducing cost and complexity without sacrificing functionality.
There is no formal definition of the modern data stack architecture. The term generally describes a system that combines the open file formats and cost-effective, scalable storage of data lakes with the ACID transactions and table-oriented schema definitions of data warehouses.
A modern data stack is often built on a modern table format such as Delta Lake or Apache Iceberg, which provides a table abstraction over the underlying storage layer. The other key features of a modern data stack are:
- ACID transaction support for inserts, updates, and deletes
- Scalable metadata
- Schema enforcement and evolution
- Support for diverse data types, ranging from unstructured to structured data
- Data versioning or time travel capability
- Support for SQL access
- Support for direct table access through APIs such as DataFrames
- Support for scalable storage using open file formats
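
To make the table abstraction more concrete, the following is a minimal sketch of how a table format can layer atomic commits, schema enforcement, and time travel over plain files. It is a toy illustration only and not how Delta Lake or Apache Iceberg are actually implemented: the `ToyTable` class, the `_log` directory layout, and the JSON data files are all hypothetical names chosen for this example, and it assumes a local filesystem rather than object storage.

```python
import json
import os
import tempfile


class ToyTable:
    """Toy table format (hypothetical): immutable data files plus a numbered commit log."""

    def __init__(self, root, schema):
        self.root = root
        self.schema = schema  # column name -> expected Python type
        os.makedirs(os.path.join(root, "_log"), exist_ok=True)

    def _log_path(self, version):
        return os.path.join(self.root, "_log", f"{version:08d}.json")

    def latest_version(self):
        # One log entry per commit; -1 means nothing has been committed yet.
        return len(os.listdir(os.path.join(self.root, "_log"))) - 1

    def append(self, rows):
        # Schema enforcement: reject rows whose columns or types do not match.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"unexpected columns: {sorted(row)}")
            for name, expected in self.schema.items():
                if not isinstance(row[name], expected):
                    raise ValueError(f"column {name!r} expects {expected.__name__}")
        version = self.latest_version() + 1
        data_file = f"part-{version:08d}.json"
        with open(os.path.join(self.root, data_file), "w") as f:
            json.dump(rows, f)
        # Commit step: mode "x" raises FileExistsError if another writer
        # has already created the log entry for this version.
        with open(self._log_path(version), "x") as f:
            json.dump({"add": data_file}, f)
        return version

    def read(self, version=None):
        # Time travel: replay the log only up to the requested version.
        if version is None:
            version = self.latest_version()
        rows = []
        for v in range(version + 1):
            with open(self._log_path(v)) as f:
                entry = json.load(f)
            with open(os.path.join(self.root, entry["add"])) as f:
                rows.extend(json.load(f))
        return rows


def demo():
    with tempfile.TemporaryDirectory() as root:
        table = ToyTable(root, {"id": int, "name": str})
        table.append([{"id": 1, "name": "ada"}])    # becomes version 0
        table.append([{"id": 2, "name": "grace"}])  # becomes version 1
        try:
            table.append([{"id": "3", "name": "bad"}])  # wrong type for "id"
            rejected = False
        except ValueError:
            rejected = True
        return table.read(version=0), table.read(), rejected
```

Calling `demo()` returns the table as of version 0, the table at the latest version, and a flag showing that the badly typed row was rejected before anything was written. Readers only see data files that a log entry references, which is the basic idea that lets a table format offer consistent snapshots over otherwise unordered files. Production formats add much more, such as updates and deletes, scalable metadata, and commit protocols suited to object storage.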