Apache Kafka at scale with blazing-fast PowerFlex
Mon, 21 Oct 2024 16:14:17 -0000
Apache Kafka is a leading messaging system in many real-time data architectures and is widely adopted for its scalable, performant, and resilient architecture. However, its performance is also tightly coupled with the infrastructure running it, particularly the disk and storage performance.
In this blog, we describe how to build a hyper-converged infrastructure with Kafka, Kubernetes, and PowerFlex that can scale beyond 1000 MB/s of messaging throughput.
Architecture
The benchmark architecture is a performance-oriented deployment of Confluent Kafka in KRaft mode on Kubernetes, running on Dell PowerFlex hyper-converged infrastructure.
We used the following hardware:
- 8 PowerFlex nodes on Dell PowerEdge R650 servers (Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz, 256 GB memory)
- 2 Dell S5248F top-of-rack (TOR) switches running OS10
And the following software versions:
- Confluent Platform (Apache Kafka): 7.5
- Kubernetes: 1.30.3
- RHEL: 9.4
- PowerFlex: 4.6
- PowerFlex CSI driver: 2.11
Volume performance testing
One of the most common ways to test the performance of a block device is FIO (Flexible I/O tester). The key metrics reported by FIO are throughput, IOPS, and latency for reads and writes, which can be measured at different queue depths. FIO can generate many types of I/O workloads, from sequential reads to random writes to mixed random reads and writes, or almost any other profile a user may want. The FIO project offers many examples: https://github.com/axboe/fio/tree/master/examples.
For this benchmark we used kubestr to easily run a FIO job against a Kubernetes cluster.
kubestr fio -s powerflex --fiofile pflex-fio-randrw.fio
The following graph provides performance insights from a test run on the PowerFlex hyper-converged infrastructure. The test simulates a Kafka workload using a profile of 50% random reads and 50% random writes with a 1 MB block size. It is repeated at several queue depths (0, 4, 8, 16, 32, 64) to assess how the storage system handles increasing workload concurrency. You can find the FIO configuration here.
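As an illustration of the profile described above, the following sketch generates one FIO job file per queue depth. It is not the configuration used for the benchmark: the file names, `ioengine`, `direct`, `runtime`, and `size` values are assumptions chosen for the example, and the loop starts at 4 because a queue depth of 0 has no obvious FIO equivalent. Only the 50/50 random read/write mix, the 1 MB block size, and the list of queue depths come from the text.

```shell
# Sketch only: one hypothetical job file per queue depth.
for qd in 4 8 16 32 64; do
  cat > "randrw-qd${qd}.fio" <<EOF
[global]
# Assumed values, not taken from the original benchmark
ioengine=libaio
direct=1
time_based
runtime=60s
size=2G

[randrw-qd${qd}]
# 50% random reads / 50% random writes with 1 MB blocks, as described above
rw=randrw
rwmixread=50
bs=1M
iodepth=${qd}
EOF
  # Uncomment to run the job against the "powerflex" storage class:
  # kubestr fio -s powerflex --fiofile "randrw-qd${qd}.fio"
done
```

Keeping each queue depth in its own file means every run starts from a fresh volume, so results for one depth are less likely to be skewed by the previous one.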
Kafka
Deployment
As mentioned above, Kafka is deployed in the same Kubernetes infrastructure as PowerFlex. For the sake of simplicity, we deployed a three-replica instance using the Confluent image for Apache Kafka in KRaft mode with a zero-configuration setup. The deployment configuration is available here. On Red Hat OpenShift, you must grant the privileged security context constraint (SCC) to the Kafka service account as follows:
oc adm policy add-scc-to-user privileged -z kafka -n kafka
Topic Creation
The number of partitions is critical for achieving high throughput. More partitions allow for greater parallelism and spread the load across consumers and brokers. In the following example, 108 partitions are used.
So, let’s create the perf-test topic:
kubectl exec -ti -n kafka kafka-0 -- \
  kafka-topics \
  --bootstrap-server kafka-headless.kafka.svc.cluster.local:9092 \
  --create \
  --topic perf-test \
  --partitions 108 \
  --replication-factor 3
If everything works, it returns:
Created topic perf-test.
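The post does not say why 108 was chosen, but a quick calculation shows how that partition count maps onto three brokers. The sketch below assumes that leadership and replicas are spread evenly, which Kafka aims for but does not guarantee.

```shell
partitions=108
brokers=3
replication_factor=3

# Partition leaders per broker, assuming leadership is evenly balanced.
leaders_per_broker=$(( partitions / brokers ))
leftover=$(( partitions % brokers ))

# Total replicas in the cluster and replicas hosted by each broker.
total_replicas=$(( partitions * replication_factor ))
replicas_per_broker=$(( total_replicas / brokers ))

echo "leaders/broker=${leaders_per_broker} leftover=${leftover} replicas/broker=${replicas_per_broker}"
```

With a replication factor equal to the number of brokers, every broker holds a copy of every partition, so each message written to the topic is stored three times.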
Producer Benchmark
The following command simulates a producer workload and evaluates the Kafka producer's performance in terms of message throughput and latency:
kubectl exec -ti -n kafka kafka-0 -- \
  kafka-producer-perf-test \
  --topic perf-test \
  --record-size 512 \
  --throughput 60000 \
  --num-records 54000000 \
  --producer-props bootstrap.servers=kafka-headless.kafka.svc.cluster.local:9092 acks=all linger.ms=10
If it works, the output looks similar to this:
299998 records sent, 59987.6 records/sec (29.29 MB/sec), 1.2 ms avg latency, 6.0 ms max latency.
300114 records sent, 60010.8 records/sec (29.30 MB/sec), 1.2 ms avg latency, 8.0 ms max latency.
299998 records sent, 59999.6 records/sec (29.30 MB/sec), 1.2 ms avg latency, 6.0 ms max latency.
54000000 records sent, 59999.200011 records/sec (29.30 MB/sec), 1.21 ms avg latency, 269.00 ms max latency, 1 ms 50th, 4 ms 95th, 4 ms 99th, 5 ms 99.9th.
The Kafka producer sustains the requested rate with efficient batching and low latency, and messages are evenly distributed across partitions. The table below summarizes the test results:
Records Per Second | Throughput | Avg. Latency | Overall Throughput |
59999 | 29.30 MB/sec | 1.21 ms | 87.9 MB/sec |
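The figures in the table can be cross-checked from the command-line parameters. The sketch below assumes that 1 MB means 1024 * 1024 bytes. It also assumes that the overall throughput column is the producer throughput multiplied by the replication factor of 3. That reading is an inference from the numbers, not something the post states.

```shell
# Values taken from the producer command and the topic creation step.
num_records=54000000     # --num-records
target_rate=60000        # --throughput, in records per second
record_size=512          # --record-size, in bytes
replication_factor=3     # --replication-factor

# Expected run time if the producer sustains the target rate.
duration_s=$(( num_records / target_rate ))

# Payload rate in MB/sec, assuming 1 MB = 1024 * 1024 bytes.
mb_per_sec=$(LC_ALL=C awk -v r="$target_rate" -v s="$record_size" \
  'BEGIN { printf "%.2f", r * s / (1024 * 1024) }')

# Assumption: "overall" is the payload rate times the replication factor.
overall_mb=$(LC_ALL=C awk -v t="$mb_per_sec" -v rf="$replication_factor" \
  'BEGIN { printf "%.1f", t * rf }')

echo "duration=${duration_s}s producer=${mb_per_sec}MB/sec overall=${overall_mb}MB/sec"
```

Under these assumptions the run lasts about 15 minutes, and the computed 29.30 MB/sec and 87.9 MB/sec match the reported values.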
Consumer Benchmark
Similar to the producer performance test tool, Kafka distributes a consumer performance test tool. This utility measures how quickly consumers can fetch and process messages, reporting throughput, fetch efficiency, and consumption rates.
In this case, we will launch three consumers in parallel:
kubectl exec -ti -n kafka kafka-0 -- \
  kafka-consumer-perf-test \
  --broker-list kafka-headless.kafka.svc.cluster.local:9092 \
  --topic perf-test \
  --messages 53950000 \
  --group=perf-test-1 \
  --show-detailed-stats \
  --hide-header \
  --timeout 60000

kubectl exec -ti -n kafka kafka-1 -- \
  kafka-consumer-perf-test \
  --broker-list kafka-headless.kafka.svc.cluster.local:9092 \
  --consumer.config /config \
  --topic perf-test \
  --messages 53950000 \
  --group=perf-test-2 \
  --show-detailed-stats \
  --hide-header \
  --timeout 60000

kubectl exec -ti -n kafka kafka-2 -- \
  kafka-consumer-perf-test \
  --broker-list kafka-headless.kafka.svc.cluster.local:9092 \
  --topic perf-test \
  --messages 53950000 \
  --group=perf-test-3 \
  --show-detailed-stats \
  --hide-header \
  --timeout 60000
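To start the three consumers at the same time from a single terminal, one option is to run them as background jobs and wait for all of them. The following is a minimal sketch, not the procedure used for the benchmark. The `RUNNER` variable, the `run_consumer` function, and the log file names are hypothetical helpers introduced for this example. The sketch drops `-ti` because background jobs have no terminal attached, and it leaves out `--consumer.config`, which appears in only one of the original commands.

```shell
# Dry-run switch (hypothetical, not part of the original post):
#   RUNNER unset -> commands are only printed (safe default)
#   RUNNER=""    -> commands are actually executed
RUNNER="${RUNNER-echo}"
BOOTSTRAP="kafka-headless.kafka.svc.cluster.local:9092"

run_consumer() {
  # $1 is the pod ordinal (0, 1 or 2); each run uses its own consumer group.
  local ordinal="$1"
  $RUNNER kubectl exec -n kafka "kafka-${ordinal}" -- \
    kafka-consumer-perf-test \
    --broker-list "$BOOTSTRAP" \
    --topic perf-test \
    --messages 53950000 \
    --group "perf-test-$((ordinal + 1))" \
    --show-detailed-stats \
    --hide-header \
    --timeout 60000
}

# Start one consumer per broker pod in the background, one log file each.
for ordinal in 0 1 2; do
  run_consumer "$ordinal" > "consumer-${ordinal}.log" 2>&1 &
done

# Block until all three background runs have finished.
wait
```

Because each consumer uses a distinct group, all three read the full topic independently instead of splitting the partitions among themselves.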
During this test, we observed a consumer throughput of 1000 MB/sec, which shows that consumers on this infrastructure can keep up with large volumes of messages.
Conclusion
Running Kafka on Dell PowerFlex with Kubernetes offers a highly scalable, resilient, and performant solution for managing real-time data streaming applications. The Dell PowerFlex software-defined infrastructure, combined with Kubernetes container orchestration, ensures seamless scaling of the Kafka cluster, robust storage management, and high availability.
PowerFlex's adaptability in handling I/O-intensive workloads, along with its lifecycle management and integration with Kubernetes, provides an ideal infrastructure for deploying and managing Kafka, enabling enterprises to handle large volumes of data with minimal latency and operational complexity.
Authors: Florian Coulombel, Syed Abrar