
Scaling Kafka Streams Applications: Strategies for High-Volume Traffic

As the adoption of real-time data processing accelerates, the ability to scale stream processing applications to handle high-volume traffic is paramount. Apache Kafka®, the de facto standard for distributed event streaming, provides a powerful and scalable library in Kafka Streams for building such applications. 

Scaling a Kafka Streams application effectively involves a multi-faceted approach that encompasses architectural design, configuration tuning, and diligent monitoring. This guide will walk you through the essential strategies and best practices to ensure your Kafka Streams applications can gracefully handle massive throughput.

Put these best practices into action and build a serverless Kafka Streams application—get started for free on Confluent Cloud.

The Core Principle: Parallelism Through Partitioning in Apache Kafka® Topics

The fundamental concept underpinning the scalability of Kafka Streams is the direct relationship between the number of partitions in your input Kafka topics and the parallelism of your application.

The Direct Relationship Between Kafka Topics and Kafka Streams Tasks

The unit of parallelism in Kafka Streams is a task. Each task is responsible for processing a subset of the data from the input topics.

Partition-to-Task Assignment

Kafka Streams creates, for each sub-topology, a number of tasks equal to the maximum number of partitions across that sub-topology’s input topics. Each task is assigned one or more partitions from the input topics. For instance, if you have an application with a single sub-topology and two input topics, one with 10 partitions and another with 20, Kafka Streams will create 20 tasks.

When working with multiple sub-topologies in Kafka Streams, the partition-to-task assignment is determined independently for each sub-topology. Here’s how it works:

For each sub-topology, Kafka Streams calculates the number of tasks based on the highest partition count among all its input topics. Each task within a sub-topology is then assigned one or more partitions from those input topics. This means that if your application contains several sub-topologies, each one will have its own set of tasks, and the number of tasks per sub-topology matches the maximum partition count of its respective input topics.

For example, suppose your application has two sub-topologies:

  • Sub-topology A consumes from Topic X (10 partitions) and Topic Y (20 partitions).

  • Sub-topology B consumes from Topic Z (5 partitions).

Kafka Streams will create:

  • 20 tasks for Sub-topology A (since Topic Y has the most partitions among its inputs).

  • 5 tasks for Sub-topology B (since Topic Z is its only input).

Each task is responsible for processing one or more partitions from the input topics of its sub-topology. This approach ensures that each sub-topology can scale independently, according to the partitioning of its input topics. 
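The task-count rule above can be sketched as a simple calculation. The topic names and partition counts mirror the example; this is an illustration of the rule, not the actual assignor logic:

```java
import java.util.List;

public class TaskCount {
    // Tasks per sub-topology = max partition count among its input topics.
    static int tasksFor(List<Integer> inputTopicPartitionCounts) {
        return inputTopicPartitionCounts.stream()
                .mapToInt(Integer::intValue)
                .max()
                .orElse(0);
    }

    public static void main(String[] args) {
        // Sub-topology A: Topic X (10 partitions) and Topic Y (20 partitions)
        System.out.println(tasksFor(List.of(10, 20))); // 20 tasks
        // Sub-topology B: Topic Z (5 partitions)
        System.out.println(tasksFor(List.of(5)));      // 5 tasks
    }
}
```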

Scaling Limit

The number of tasks dictates the maximum parallelism you can achieve. If you have 20 tasks, you can run up to 20 instances or threads of your application, each processing a single task concurrently. Any additional instances will remain idle, acting as backups that can take over if an active instance fails.

That’s why the first and most critical step in designing a scalable Kafka Streams application is to choose an appropriate number of partitions for your input topics based on your expected workload and future growth. Choosing the right number of partitions allows you to balance maximizing throughput and avoiding overprovisioning infrastructure.

Advanced Scaling Strategies for Kafka Streams: Scaling Out vs. Scaling Up

Once your topic partitioning strategy is in place, you can scale your Kafka Streams application in two primary ways: scaling out or up.

Scaling Out (Horizontal Scaling)

This is the most common method for scaling Kafka Streams. It involves running multiple instances of your application on different machines. Kafka Streams has built-in consumer group management that automatically handles the distribution of tasks (and their associated partitions) across all running instances of the application with the same application.id.

Kafka Streams distributes tasks across application instances to balance the load

When to Scale Out:

  • Network-bound applications: If your application is limited by network bandwidth (e.g., sending and receiving large volumes of data)

  • Memory-bound applications: When your application manages large state stores—such as those used in extensive windowing or aggregations—that result in high disk I/O utilization. While Kafka Streams persists state to local disk (allowing the total state size to exceed available main memory), performance degrades if the active dataset cannot be cached effectively. Scaling out distributes the state across more instances, increasing the aggregate memory available for caching and reducing the frequency of costly disk reads.

  • Disk-bound applications: If the local state stores (by default, RocksDB) are causing I/O bottlenecks, or you need more overall disk space.

How to Scale Out: Simply launch a new instance of your application with the same application.id. Kafka Streams will trigger a rebalance, and the new instance will be assigned a share of the active tasks (and their underlying partitions) to process.
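As a minimal sketch, the only configuration requirement for a new instance to join the group is sharing the same application.id as the existing ones; the id and broker address below are hypothetical, and the keys are plain strings rather than library constants:

```java
import java.util.Properties;

public class ScaleOutConfig {
    // Every instance of the same logical application must use the same
    // application.id; Kafka Streams then rebalances tasks across them.
    static Properties baseConfig() {
        Properties props = new Properties();
        props.put("application.id", "orders-enrichment-v1"); // identical on every instance
        props.put("bootstrap.servers", "broker-1:9092");     // hypothetical broker
        return props;
    }

    public static void main(String[] args) {
        System.out.println(ScaleOutConfig.baseConfig().getProperty("application.id"));
    }
}
```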

Scaling Up (Vertical Scaling)

This strategy involves increasing the resources (CPU, memory, disk) of the machines running your Kafka Streams application. To utilize these resources, you must increase the num.stream.threads configuration. This allows a single instance to process more tasks in parallel across multiple CPU cores, provided you have enough tasks available to keep those threads busy.

A single Kafka Streams instance uses multiple threads to process its assigned tasks in parallel.

When to Scale Up:

  • CPU-bound applications: If your processing logic is computationally intensive (e.g., complex transformations, encryption/decryption, or regular expression matching), adding more CPU cores and increasing stream threads can improve throughput.

  • Memory-bound workloads: If your application maintains large state stores, adding more main memory allows for larger caches (RocksDB block cache and OS page cache). This keeps more of your "working set" in RAM, significantly reducing the latency caused by disk I/O.

  • Storage-bound workloads: If your local state stores (RocksDB) are projected to exceed the current disk capacity. Scaling up to a larger disk prevents application crashes due to "out of disk space" errors and ensures sufficient headroom for state restoration and log compaction.

How to Scale Up:

  • Increase Resources: Start by provisioning more powerful machines with additional CPU cores, memory, and disk, and update your application configuration to use them. Because Kafka Streams production deployments typically bind resource usage to explicit limits, simply adding hardware is insufficient: you must increase settings like num.stream.threads to leverage additional cores, raise JVM heap or RocksDB memory caps to use the extra RAM, and ensure the expanded disk volume is accessible to the state directory (state.dir) so local stores can grow without hitting capacity or IOPS limits. Otherwise, the application will fail to take advantage of the upgraded infrastructure.

  • Increase Stream Threads: Adjust the num.stream.threads configuration parameter in your Kafka Streams application. This allows a single instance to process multiple tasks in parallel, with each thread handling one or more tasks. The default is one thread.
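The steps above can be sketched as a hedged configuration snippet, using plain string keys and illustrative values (the application id and state directory path are hypothetical):

```java
import java.util.Properties;

public class ScaleUpConfig {
    // Build a config that actually uses the extra hardware: more stream
    // threads for the added cores, and a state directory on the larger disk.
    static Properties withThreads(int cores) {
        Properties props = new Properties();
        props.put("application.id", "orders-enrichment-v1"); // hypothetical id
        // For CPU-bound workloads, match thread count to cores (default is 1).
        props.put("num.stream.threads", String.valueOf(cores));
        props.put("state.dir", "/data/kstreams"); // point at the expanded volume
        return props;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(withThreads(cores).getProperty("num.stream.threads"));
    }
}
```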

Fine-Tuning for Performance: Key Configuration Parameters

Beyond the fundamental scaling strategies, optimizing your Kafka Streams application's configuration is crucial for handling high-volume traffic. Here are some of the most impactful settings:

  • num.stream.threads: The number of stream threads to run within a single application instance. For CPU-bound tasks, set this to a value less than or equal to the number of available CPU cores.

  • rocksdb.config.setter: Lets you customize RocksDB settings for Kafka Streams state stores, which is essential for tuning performance, memory usage, and other RocksDB behaviors. Switch CompactionStyle to UNIVERSAL to maximize write speed, and increase write buffers and background jobs to prevent disk stalls during heavy loads.

  • cache.max.bytes.buffering: The maximum memory used for buffering records across all threads. Increasing this can improve throughput by reducing the frequency of writes to state stores and Kafka, at the cost of higher memory consumption.

  • commit.interval.ms: The frequency with which to save processing progress. A larger interval can improve throughput but increases the amount of data reprocessed in case of a failure. The default is 30,000 ms; for lower latency, this can be reduced.

  • producer.* and consumer.* configs: The configurations of the underlying producer and consumer clients. Tune settings like producer.batch.size, producer.linger.ms, and producer.compression.type to optimize writes to Kafka, and adjust consumer.fetch.min.bytes and consumer.max.poll.records to control how much data a consumer fetches in each poll.
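The prefixed client overrides described above might be assembled like this; all values are illustrative starting points, not recommendations for any specific workload:

```java
import java.util.Properties;

public class ClientOverrides {
    // Kafka Streams forwards configs prefixed with "producer." / "consumer."
    // to its embedded producer and consumer clients.
    static Properties tuned() {
        Properties props = new Properties();
        props.put("producer.batch.size", "131072");      // larger batches per request
        props.put("producer.linger.ms", "20");           // wait briefly to fill batches
        props.put("producer.compression.type", "lz4");   // cheaper network and disk
        props.put("consumer.fetch.min.bytes", "65536");  // fewer, larger fetches
        props.put("consumer.max.poll.records", "1000");  // records returned per poll
        // Streams-level knobs from the list above:
        props.put("cache.max.bytes.buffering", String.valueOf(64 * 1024 * 1024));
        props.put("commit.interval.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(tuned().getProperty("producer.compression.type"));
    }
}
```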

Special Considerations for Stateful vs. Stateless Kafka Streams Applications

The nature of your application's processing logic—whether it's stateless or stateful—influences scaling decisions.

  • Stateless Applications: These are generally easier to scale as they don't maintain any state. Scaling is primarily a matter of adding more instances to handle the increased load.

  • Stateful Applications: Applications that perform operations like joins, aggregations, and windowing maintain local state stores. The size and performance of these state stores become critical scaling factors. For stateful operations, especially joins, it's crucial to ensure that the input topics are co-partitioned, meaning they have the same number of partitions and that records with the same key are written to the same partition number in both topics. This allows Kafka Streams to perform the join locally without needing to repartition the data over the network, which is a costly operation.
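A minimal co-partitioning sanity check might look like the following. Note it only verifies equal partition counts, which is necessary but not sufficient: records with the same key must also land in the same partition number, which additionally depends on the partitioner used by the upstream producers. The topic names are hypothetical:

```java
import java.util.Map;

public class CoPartitionCheck {
    // Join inputs are co-partitioned only if every topic has the same
    // partition count (and keys are partitioned identically).
    static boolean coPartitioned(Map<String, Integer> topicPartitions) {
        return topicPartitions.values().stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        System.out.println(coPartitioned(Map.of("orders", 12, "customers", 12))); // true
        System.out.println(coPartitioned(Map.of("orders", 12, "customers", 6)));  // false
    }
}
```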

Kafka Streams applications can perform joins, aggregations, windowing and other operations while maintaining local state stores

Monitoring Kafka Streams Application Performance at Scale

To effectively scale your Kafka Streams application, you need to monitor its performance to identify bottlenecks and make informed decisions. Key metrics to watch include:

  • Consumer Lag: The number of messages in a partition that have not yet been consumed. High and increasing lag is a clear indicator that your application cannot keep up with the incoming data rate.

  • CPU Utilization: High CPU utilization might indicate that your application is CPU-bound and could benefit from scaling up or adding more stream threads.

  • Thread Utilization: Monitor the utilization of your stream threads to see if they are being used effectively.

  • End-to-End Latency: The time it takes for a message to be processed from the source topic to the sink topic.

  • State Store Size and I/O: For stateful applications, monitor the size of your local state stores and the I/O operations on the underlying disk.
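Consumer lag, the first metric above, is simple per-partition arithmetic; a toy sketch:

```java
public class LagCheck {
    // Consumer lag per partition = log end offset - last committed offset.
    // Clamped at zero, since a freshly caught-up consumer has no lag.
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    public static void main(String[] args) {
        System.out.println(lag(1500L, 1200L)); // 300 records behind
        System.out.println(lag(100L, 100L));   // fully caught up: 0
    }
}
```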

If you run your Kafka Streams application on Confluent Cloud, the Confluent UI provides a rich dashboard of metrics for application monitoring. The new Kafka Streams page offers an at-a-glance health check of your entire application: you can see the overall application state, such as RUNNING, REBALANCING, PENDING_SHUTDOWN, or PENDING_ERROR, and drill down to view the state of each individual processing thread.

Start Building Robust Kafka Streams Apps

By understanding and applying these scaling principles, from topic partitioning and deployment strategies to fine-grained configuration and diligent monitoring, you can build robust and highly scalable Kafka Streams applications capable of handling the most demanding real-time data workloads. Ready to start building with serverless Kafka on Confluent Cloud?

Advanced Strategies for Kafka Streams – FAQs

How do Kafka Streams applications scale with partitions?

Kafka Streams scales by assigning tasks to threads. While task generation is based on partition counts, complex applications are composed of multiple sub-topologies (often due to repartitioning), each generating its own set of tasks. Therefore, the true maximum parallelism is dictated by the total number of tasks across the entire topology, which may exceed the partition count of the initial input topic.

Should I scale Kafka Streams by adding more instances or more threads?

Both are valid: adding instances (scale out) distributes tasks across machines, while adding threads (scale up) uses more CPU cores in one instance. The choice depends on whether you are CPU, memory, disk I/O, or network-bound.

What configuration settings are most important for Kafka Streams performance?

Key configs include num.stream.threads (parallelism), cache.max.bytes.buffering (throughput vs. memory), rocksdb.config.setter (state store tuning), and commit.interval.ms (latency vs. durability trade-off).

How can I monitor Kafka Streams scalability in production?

Track consumer lag, CPU utilization, thread utilization, and state store I/O. High lag or rising end-to-end latency indicates that your application is falling behind and needs scaling.


Apache®, Apache Kafka®, and Kafka® are registered trademarks of the Apache Software Foundation. No endorsement by the Apache Software Foundation is implied by the use of these marks. 

  • Bijoy Choudhury is a solutions engineering leader at Confluent, specializing in real-time data streaming, AI/ML integration, and enterprise-scale architectures. A veteran technical educator and architect, he focuses on driving customer success by leading a team of cloud enablement engineers to design and deliver high-impact proofs-of-concept and enable customers for use cases like real-time fraud detection and ML pipelines.

    As a technical author and evangelist, Bijoy actively contributes to the community by writing blogs on new streaming features, delivering technical webinars, and speaking at events. Prior to Confluent, he was a Senior Solutions Architect at VMware, guiding enterprise customers in their cloud-native transformations using Kubernetes and VMware Tanzu. He also spent over six years at Pivotal Software as a Principal Technical Instructor, where he designed and delivered official courseware for the Spring Framework, Cloud Foundry, and GemFire.

  • This blog was a collaborative effort between multiple Confluent employees.
