Kafka was developed primarily to address several limitations of messaging systems based on the queuing and publish-subscribe patterns, which are already well known and widely used by enterprise software architects. The relationship to those prior architectures is easy to see. The unit of data in Kafka is still called a message, and Kafka messages are organized into topics, each of which contains an ordered list of messages. Producers publish their messages to a topic, and consumers subscribe to a topic. It is not hard to understand why even the newest versions of Apache Kafka are still compared with traditional messaging systems.
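The topic, producer, and consumer relationship described above can be illustrated with a minimal in-memory sketch. This is a conceptual model only, not the Kafka client API: the class names `Broker`, `Topic`, `Producer`, and `Consumer` and their methods are hypothetical, and the model assumes a single partition per topic. It shows one behavior that Kafka is known for: a topic is an ordered log, and each consumer tracks its own read position (offset) instead of removing messages from the topic.

```python
class Topic:
    """An append-only, ordered list of messages (one partition, for simplicity)."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset assigned to the new message


class Broker:
    """Holds topics by name and creates them on first use (an assumption of this sketch)."""

    def __init__(self):
        self.topics = {}

    def topic(self, name):
        if name not in self.topics:
            self.topics[name] = Topic(name)
        return self.topics[name]


class Producer:
    def __init__(self, broker):
        self.broker = broker

    def publish(self, topic_name, message):
        return self.broker.topic(topic_name).append(message)


class Consumer:
    """Reads a topic from its own offset; reading does not delete messages."""

    def __init__(self, broker, topic_name):
        self.topic = broker.topic(topic_name)
        self.offset = 0

    def poll(self):
        batch = self.topic.log[self.offset:]
        self.offset += len(batch)
        return batch


broker = Broker()
producer = Producer(broker)
first = Consumer(broker, "sensor-readings")
second = Consumer(broker, "sensor-readings")

producer.publish("sensor-readings", "temp=21.5")
producer.publish("sensor-readings", "temp=21.7")

print(first.poll())   # both messages, in publish order
print(first.poll())   # nothing new since the last poll
print(second.poll())  # the second consumer still sees both messages
```

Because each consumer keeps its own offset, adding a second subscriber does not affect what the first one receives.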
The goals for developing Kafka went beyond creating a better messaging system. Despite the similarities noted above, Kafka introduced several important advances that contributed to the creation of a new class of software called streaming platforms. Kafka improved and extended the prior best practices of messaging systems by defining:
The flexibility of a Kafka-based streaming platform supports many streaming architectures. Common architectural patterns include microservice interfaces for existing or legacy systems, including:
This reference architecture addresses IIoT applications only. It focuses on replication, reliability, and stream processing for analytics at the edge.
The Kafka Streams API is a powerful, lightweight library that enables real-time data processing. Confluent KSQL is an alternative, open-source stream processing API with SQL-like programming and storage. It provides an easy-to-use yet powerful interactive SQL interface for stream processing on Kafka. KSQL is scalable, elastic, and fault-tolerant, and it supports a wide range of streaming operations, including:
MirrorMaker is a related open-source project that provides geo-replication support for Kafka clusters. MirrorMaker replicates events across multiple data centers or cloud regions. Common scenarios include:
For this IIoT solution, Dell EMC chose Confluent Replicator to replicate data from the edge to the core. Confluent Replicator is a more complete solution than MirrorMaker: it handles topic configuration as well as data, and it integrates with Kafka Connect and Control Center to improve availability, scalability, and ease of use.
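The general idea of edge-to-core replication can be sketched as a loop that copies messages not yet replicated from a source cluster to a destination cluster. The code below is a conceptual illustration only. It does not reflect how Confluent Replicator or MirrorMaker is implemented or configured, and the function `replicate`, the dictionary-based "clusters", and the topic name are all hypothetical. It assumes that a topic missing on the destination is created during replication, as a loose analogy to replicating topic configuration along with data.

```python
def replicate(source, destination, offsets, topics):
    """Copy not-yet-replicated messages from source to destination.

    source, destination: dict mapping topic name -> ordered list of messages
    offsets: dict mapping topic name -> next source offset to copy (updated in place)
    topics: names of the topics to replicate
    Returns the number of messages copied in this pass.
    """
    copied = 0
    for name in topics:
        source_log = source.get(name, [])
        # Create the topic on the destination if it does not exist yet.
        destination_log = destination.setdefault(name, [])
        start = offsets.get(name, 0)
        for message in source_log[start:]:
            destination_log.append(message)  # preserves source order
            copied += 1
        offsets[name] = len(source_log)
    return copied


# Hypothetical edge and core clusters, modeled as plain dictionaries.
edge = {"telemetry": ["t1", "t2"]}
core = {}
offsets = {}

replicate(edge, core, offsets, ["telemetry"])   # copies t1 and t2
edge["telemetry"].append("t3")
replicate(edge, core, offsets, ["telemetry"])   # copies only t3
print(core)
```

Tracking a per-topic offset means each pass copies only new messages, so repeated runs do not duplicate data on the destination in this simplified model.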