Processing large volumes of data adds significant operational overhead over time, especially when existing records must be updated. A modern data stack helps in this scenario because it can process and upsert data incrementally. Delta Lake and Iceberg provide many features on top of a data lake. One of the most powerful is support for capabilities usually associated with a relational database management system (RDBMS), such as ACID transactions. Adding either framework to an existing data lake turns it into a transactional data lake, also called a modern data stack.
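To make the idea of an incremental upsert concrete, here is a minimal sketch in plain Python. It does not use the Delta Lake or Iceberg APIs. The `upsert` helper, the `customers` table, and the `changes` batch are hypothetical names chosen for illustration. The sketch only shows the semantics: rows in the incoming batch that match an existing key update that row, and rows with a new key are inserted.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def upsert(table: Dict[object, Row], batch: Iterable[Row], key: str = "id") -> Tuple[int, int]:
    """Apply a batch of changes to `table` in place; return (updated, inserted)."""
    updated = inserted = 0
    for row in batch:
        k = row[key]
        if k in table:
            # Matched key: overwrite only the columns present in the change.
            table[k] = {**table[k], **row}
            updated += 1
        else:
            # Unmatched key: add the row as a new record.
            table[k] = dict(row)
            inserted += 1
    return updated, inserted


# Hypothetical existing data, keyed by "id".
customers = {
    1: {"id": 1, "name": "Ana", "city": "Lisbon"},
    2: {"id": 2, "name": "Bo", "city": "Oslo"},
}

# Hypothetical incremental batch: one update and one new row.
changes = [
    {"id": 2, "city": "Bergen"},
    {"id": 3, "name": "Cy", "city": "Cairo"},
]

counts = upsert(customers, changes)
```

Only the rows named in `changes` are touched, and the rest of the table is left as it was. A transactional table format would additionally need to commit such a batch atomically, which this in-memory sketch does not attempt to model.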
Two storage formats are commonly used for big data workloads:
- Delta Lake storage format
- Iceberg storage format