Architecture Components
Federated Data Sources: The databases where CDC events originate. Debezium captures row-level changes (inserts, updates, and deletes) from these source databases.
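As a rough illustration of what such a change looks like downstream, the sketch below classifies a Debezium change event by its operation code. The envelope fields `op`, `before`, and `after` follow Debezium's documented event format; the sample `customers` row and the `describe_change` helper are hypothetical and only serve the example.

```python
import json

# Debezium operation codes mapped to readable names.
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def describe_change(raw_event: str) -> dict:
    """Summarize one Debezium change event (payload only, no schema wrapper)."""
    event = json.loads(raw_event)
    op = event["op"]
    return {
        "operation": OP_NAMES.get(op, "unknown"),
        # Deletes carry the row in "before"; inserts, updates, and
        # snapshot reads carry the new row image in "after".
        "row": event.get("before") if op == "d" else event.get("after"),
    }


# Hypothetical update event for a "customers" table.
sample = json.dumps({
    "op": "u",
    "before": {"id": 7, "email": "old@example.com"},
    "after": {"id": 7, "email": "new@example.com"},
    "source": {"db": "inventory", "table": "customers"},
    "ts_ms": 1700000000000,
})

print(describe_change(sample))
```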
CDC Platform: Debezium extracts transactional changes from the federated databases. It is not tied to Kafka and can also deliver events to other messaging systems, such as Apache Pulsar, Amazon Kinesis, or RabbitMQ.
Debezium Iceberg Consumer: A custom sink adapter built using Debezium’s Service Provider Interface (SPI). It consumes CDC events from Debezium, processes them, and writes the changes to Apache Iceberg tables.
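The consumer's core job, turning a stream of change events into table state, can be sketched as follows. This is a simplified in-memory model, not the consumer's actual code: a dictionary stands in for the Iceberg table, rows are assumed to have a primary-key column named `id`, and `apply_events` is a hypothetical helper.

```python
from typing import Any


def apply_events(table: dict, events: list, key: str = "id") -> dict:
    """Apply Debezium-style change events to an in-memory table keyed by primary key."""
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):
            # Inserts, snapshot reads, and updates all upsert the "after" image.
            row: dict[str, Any] = event["after"]
            table[row[key]] = row
        elif op == "d":
            # Deletes identify the row through the "before" image.
            table.pop(event["before"][key], None)
    return table


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

final_state = apply_events({}, events)
print(final_state)
```

A real sink would typically batch events and commit them to Iceberg in groups; whether it upserts rows as shown here or appends every event as history depends on how the consumer is configured.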
Data Storage: Iceberg tables serve as the destination for the CDC data. Iceberg provides scalable and high-performance storage with features like schema evolution and time-travel. Each Debezium source topic maps to an individual Iceberg table, allowing for organized and efficient data management.
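The one-topic-to-one-table mapping can be pictured with a small naming function. Debezium topic names conventionally take the form prefix.schema.table; the `cdc` namespace, the underscore substitution, and the `topic_to_table` helper below are assumptions made for illustration, not the consumer's documented naming rule.

```python
import re


def topic_to_table(topic: str, namespace: str = "cdc") -> str:
    """Derive an Iceberg table identifier from a Debezium topic name."""
    # Replace anything that is not a letter, digit, or underscore
    # so the result is safe to use as a table name.
    table_name = re.sub(r"[^A-Za-z0-9_]", "_", topic).lower()
    return f"{namespace}.{table_name}"


for topic in ["pg1.public.orders", "pg1.public.customers"]:
    print(topic, "->", topic_to_table(topic))
```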
Analytical Processing: The Dell Data Analytics Engine (DDAE) runs high-performance queries and analysis against the Iceberg tables. It uses the Hive Metastore to access and manage the metadata associated with those tables.
Metadata Management: The Hive Metastore acts as a central repository for metadata related to Iceberg tables. It maintains schema information, ensures data consistency, and supports schema evolution, providing seamless integration with various analytical tools.
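One way the metastore contributes to consistency is by holding, for each Iceberg table, a pointer to the table's current metadata file; a commit succeeds only if that pointer has not moved since the writer read it. The toy model below illustrates that compare-and-swap idea. The `ToyMetastore` class and the file names are hypothetical, and the real Hive Metastore relies on its own locking and Thrift calls rather than an in-memory dictionary.

```python
from typing import Optional


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class ToyMetastore:
    """In-memory stand-in for a metastore tracking one metadata pointer per table."""

    def __init__(self) -> None:
        self._pointers: dict[str, str] = {}

    def current(self, table: str) -> Optional[str]:
        return self._pointers.get(table)

    def commit(self, table: str, expected: Optional[str], new: str) -> None:
        # Swap the pointer only if nobody else has changed it in the meantime.
        if self._pointers.get(table) != expected:
            raise CommitConflict(f"{table}: pointer moved, retry the commit")
        self._pointers[table] = new


store = ToyMetastore()
store.commit("cdc.orders", None, "v1.metadata.json")

# Two writers start from the same base version.
base = store.current("cdc.orders")
store.commit("cdc.orders", base, "v2.metadata.json")  # first writer wins

try:
    store.commit("cdc.orders", base, "v3.metadata.json")  # stale base is rejected
    conflict = False
except CommitConflict:
    conflict = True

print(store.current("cdc.orders"), conflict)
```

Because readers always follow the pointer, they see either the old table version or the new one, never a partially written commit.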
Scalable Storage: Raw and processed data, including Iceberg tables, are stored in object storage. This provides a cost-effective and elastic solution for handling large volumes of data.