Apache Kafka is a leading messaging system in many real-time data architectures and is widely adopted for its scalable, performant, and resilient architecture. However, its performance is also tightly coupled with the infrastructure running it, particularly the disk and storage performance.

In this blog, we describe how to build a hyper-converged Kafka, Kubernetes, and PowerFlex infrastructure that can scale beyond 1000 MB/s of messaging throughput.

Architecture

The benchmark architecture is a performance-oriented Confluent Kafka deployment in KRaft mode on Kubernetes, running on Dell PowerFlex hyper-converged infrastructure.

We used the following hardware:

8 PowerFlex nodes on Dell PowerEdge R650 (Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz, 256 GB memory)
2 Dell S5248F top-of-rack (TOR) switches running OS10

And the following software versions:

Confluent Platform (Apache Kafka): 7.5
Kubernetes: 1.30.3
RHEL: 9.4
PowerFlex: 4.6
PowerFlex CSI driver: 2.11

Volume performance testing

One of the most common ways to test the performance of a block device is FIO (Flexible I/O Tester). The key metrics it reports are throughput, IOPS, and latency for a given queue depth and read/write mix. FIO can generate many types of I/O workloads, from sequential reads to random writes, mixed random reads and writes, or almost any other profile a user may want. The FIO project provides many examples: https://github.com/axboe/fio/tree/master/examples.

For this benchmark we used kubestr to easily run a FIO job against a Kubernetes cluster.

kubestr fio -s powerflex --fiofile pflex-fio-randrw.fio

The following graph provides performance insights from a test run on PowerFlex hyper-converged infrastructure. The test simulates a Kafka workload using a profile of 50% random reads and 50% random writes with a 1 MB block size. It is repeated at various queue depths (0, 4, 8, 16, 32, 64) to assess how the storage system handles increasing workload concurrency. You can find the FIO configuration here.

Chart: bandwidth (GB/s) by queue depth. The test achieved 10 GB/s of bandwidth with 20k IOPS.

IOPS and latency from the PowerFlex Manager UI
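The profile described above (50/50 random read/write, 1 MB blocks, varying queue depth) could be expressed as a fio job along the following lines. This is only an illustrative sketch, not the configuration used for the benchmark: the `fio_job` helper name, the section name, and the `runtime`, `ramp_time`, and `size` values are assumptions. No `filename` is set, on the assumption that kubestr points fio at the test volume.

```shell
#!/bin/sh
# Hypothetical helper: print a fio job file for one queue depth.
# Only rw mix, block size, and queue depth come from the text above;
# runtime, ramp_time, and size are placeholder assumptions.
fio_job() {
  depth="${1:-16}"
  cat <<EOF
[global]
ioengine=libaio
direct=1
time_based
ramp_time=5s
runtime=60s

[kafka-randrw]
readwrite=randrw
rwmixread=50
bs=1M
iodepth=${depth}
size=10G
EOF
}

# Print the job for queue depth 32.
fio_job 32
```

For example, redirecting the output with `fio_job 32 > pflex-fio-randrw.fio` and then running the `kubestr fio` command shown earlier would exercise the profile at a queue depth of 32.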

Kafka

Deployment

As mentioned above, Kafka is deployed in the same Kubernetes infrastructure as PowerFlex. For simplicity, we deployed a three-replica instance using the Confluent image for Apache Kafka in KRaft mode with a zero-configuration setup. The deployment configuration is available here. On Red Hat OpenShift, you must grant the privileged security context constraint (SCC) to the Kafka service account as follows:

oc adm policy add-scc-to-user privileged -z kafka -n kafka

Topic Creation

The number of partitions is critical for achieving high throughput. More partitions allow for greater parallelism and spread the load across more consumers and brokers. In the following example, 108 partitions are used.

So, let’s create the perf-test topic:

kubectl exec -ti -n kafka kafka-0 -- \
  kafka-topics \
  --bootstrap-server kafka-headless.kafka.svc.cluster.local:9092 \
  --create \
  --topic perf-test \
  --partitions 108 \
  --replication-factor 3

If everything works, it returns:

Created topic perf-test.
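As a quick sanity check on the layout this creates, the arithmetic below works out how 108 partitions with a replication factor of 3 land on the three brokers. It is a sketch that assumes replicas and partition leadership are spread evenly, which Kafka aims for but does not guarantee.

```shell
#!/bin/sh
# Values taken from the topic creation command and the three-replica deployment.
partitions=108
replication_factor=3
brokers=3

# Every partition is stored replication_factor times across the cluster.
total_replicas=$((partitions * replication_factor))
# Assumption: replicas and leaders are balanced evenly across brokers.
replicas_per_broker=$((total_replicas / brokers))
leaders_per_broker=$((partitions / brokers))

echo "total partition replicas: ${total_replicas}"
echo "replicas per broker:      ${replicas_per_broker}"
echo "leaders per broker:       ${leaders_per_broker}"
```

With three brokers and a replication factor of 3, each broker holds a copy of every partition and, if leadership is balanced, leads a third of them.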

Producer Benchmark

The following command simulates a producer workload and evaluates the Kafka producer's performance in terms of message throughput (records/sec and MB/sec) and latency:

kubectl exec -ti -n kafka kafka-0 -- \
  kafka-producer-perf-test \
  --topic perf-test \
  --record-size 512 \
  --throughput 60000 \
  --num-records 54000000 \
  --producer-props bootstrap.servers=kafka-headless.kafka.svc.cluster.local:9092 acks=all linger.ms=10

If it works, the output looks similar to this:

299998 records sent, 59987.6 records/sec (29.29 MB/sec), 1.2 ms avg latency, 6.0 ms max latency.
300114 records sent, 60010.8 records/sec (29.30 MB/sec), 1.2 ms avg latency, 8.0 ms max latency.
299998 records sent, 59999.6 records/sec (29.30 MB/sec), 1.2 ms avg latency, 6.0 ms max latency.
54000000 records sent, 59999.200011 records/sec (29.30 MB/sec), 1.21 ms avg latency, 269.00 ms max latency, 1 ms 50th, 4 ms 95th, 4 ms 99th, 5 ms 99.9th.
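When comparing several runs, it can help to reduce the final summary line to a few columns. The sketch below assumes the summary format shown above and picks fields by position, so it would need adjusting if the tool's output format differs. The `summarize` function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read kafka-producer-perf-test output on stdin and print
# records/sec, MB/sec, average latency (ms), and 99.9th percentile latency (ms),
# tab-separated. Only the final summary line contains "99.9th".
summarize() {
  awk '/99\.9th/ {
    mb = $6
    sub(/^\(/, "", mb)   # strip the opening parenthesis from "(29.30"
    printf "%s\t%s\t%s\t%s\n", $4, mb, $8, $25
  }'
}

# Example with the summary line from the run above.
printf '%s\n' '54000000 records sent, 59999.200011 records/sec (29.30 MB/sec), 1.21 ms avg latency, 269.00 ms max latency, 1 ms 50th, 4 ms 95th, 4 ms 99th, 5 ms 99.9th.' | summarize
```

In practice, the output of the producer command could be piped straight into `summarize`.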

The producer sustained the 60,000 records/sec target (29.30 MB/sec) with an average latency of 1.21 ms and a 99.9th percentile latency of 5 ms. Messages are evenly distributed across partitions. The table below summarizes the test results:

Records Per Second	Throughput	Avg. Latency	Overall Throughput
59999	29.30 MB/sec	1.21 ms	87.9 MB/sec
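The figures in the table can be cross-checked from the command-line parameters. The sketch below derives the expected producer throughput and run duration from the record size, target rate, and record count. It also assumes that the overall throughput column reflects the producer throughput multiplied by the replication factor of 3; that reading matches the numbers but is not stated explicitly in the results.

```shell
#!/bin/sh
# Parameters from the producer command and the topic definition.
record_size=512        # bytes per record
target_rate=60000      # records per second (--throughput)
num_records=54000000   # --num-records
replication_factor=3   # assumption: used for the "overall" figure

# Expected run time in seconds at the target rate.
duration_s=$((num_records / target_rate))

# Throughput in MB/sec, with 1 MB taken as 1024 * 1024 bytes.
producer_mb=$(awk -v r="$target_rate" -v s="$record_size" \
  'BEGIN { printf "%.2f", r * s / 1048576 }')
overall_mb=$(awk -v r="$target_rate" -v s="$record_size" -v f="$replication_factor" \
  'BEGIN { printf "%.2f", r * s * f / 1048576 }')

echo "duration:            ${duration_s} s"
echo "producer throughput: ${producer_mb} MB/sec"
echo "with replication:    ${overall_mb} MB/sec"
```

At the target rate the run lasts 900 seconds (15 minutes), and the computed values line up with the 29.30 MB/sec and roughly 87.9 MB/sec reported in the table.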

Consumer Benchmark

Similar to the producer performance test tool, Kafka ships with a consumer performance test tool. This utility evaluates the consumer's ability to process messages by measuring throughput, fetch efficiency, and consumption rate.

In this case, we launch three consumers in parallel, each in its own consumer group so that each one reads the full topic:

kubectl exec -ti -n kafka kafka-0 -- \
  kafka-consumer-perf-test \
  --broker-list kafka-headless.kafka.svc.cluster.local:9092 \
  --topic perf-test \
  --messages 53950000 \
  --group=perf-test-1 \
  --show-detailed-stats \
  --hide-header \
  --timeout 60000

kubectl exec -ti -n kafka kafka-1 -- \
  kafka-consumer-perf-test \
  --broker-list kafka-headless.kafka.svc.cluster.local:9092 \
  --consumer.config /config \
  --topic perf-test \
  --messages 53950000 \
  --group=perf-test-2 \
  --show-detailed-stats \
  --hide-header \
  --timeout 60000

kubectl exec -ti -n kafka kafka-2 -- \
  kafka-consumer-perf-test \
  --broker-list kafka-headless.kafka.svc.cluster.local:9092 \
  --topic perf-test \
  --messages 53950000 \
  --group=perf-test-3 \
  --show-detailed-stats \
  --hide-header \
  --timeout 60000

During this test, we observed a consumer throughput of 1000 MB/sec, which shows that consumers on this infrastructure can keep up with large volumes of messages.
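The three consumer commands above can also be started from a single shell so that they genuinely run at the same time. The following is a minimal sketch under a few assumptions: the `run_one` and `run_all` helpers, the `DRY_RUN` switch, and the log file names are hypothetical. The `-ti` flags are dropped because background jobs have no terminal, and the `--consumer.config` option from the second command is left out.

```shell
#!/bin/sh
# Sketch: start one consumer perf test per Kafka pod, all in parallel.
# DRY_RUN defaults to 1 (hypothetical switch): commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"
BOOTSTRAP="kafka-headless.kafka.svc.cluster.local:9092"

run_one() {
  idx="$1"
  # Build the command as positional parameters so it can be printed or executed.
  set -- kubectl exec -n kafka "kafka-${idx}" -- \
    kafka-consumer-perf-test \
    --broker-list "$BOOTSTRAP" \
    --topic perf-test \
    --messages 53950000 \
    --group "perf-test-$((idx + 1))" \
    --show-detailed-stats \
    --hide-header \
    --timeout 60000
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    # Run in the background and keep each consumer's output in its own log.
    "$@" > "consumer-${idx}.log" 2>&1 &
  fi
}

run_all() {
  for i in 0 1 2; do
    run_one "$i"
  done
  wait   # block until every background consumer has exited
}

run_all
```

Setting `DRY_RUN=0` would execute the commands instead of printing them, writing each consumer's statistics to `consumer-0.log`, `consumer-1.log`, and `consumer-2.log`.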

Conclusion

Running Kafka on Dell PowerFlex with Kubernetes offers a highly scalable, resilient, and performant solution for managing real-time data streaming applications. The Dell PowerFlex software-defined infrastructure, combined with Kubernetes container orchestration, ensures seamless scaling of the Kafka cluster, robust storage management, and high availability.

PowerFlex's adaptability in handling I/O-intensive workloads, along with its lifecycle management and integration with Kubernetes, provides an ideal infrastructure for deploying and managing Kafka, enabling enterprises to handle large volumes of data with minimal latency and operational complexity.

Authors: Florian Coulombel, Syed Abrar

Apache Kafka at scale with blazing-fast PowerFlex