In a previous blog post in this series, we introduced Kafka Copy Paste (KCP), an open source CLI tool that automates the discovery, provisioning, and data migration steps of moving your Apache Kafka® environment to Confluent Cloud. We walked through how KCP and Cluster Linking work together to reduce a process that traditionally took weeks to a matter of hours. At the end of that post, we hinted that automated client migration was coming soon.
That day has arrived.
With the addition of Confluent Cloud Gateway (Gateway) support, KCP now orchestrates the entire migration workflow end to end, from the first kcp discover command all the way through the final client cutover. This reduces the client cutover step from a weeks-long effort of cross-team coordination and setup to a few KCP commands. In this post, we'll show you how KCP and Gateway work together to make client migration seamless, safe, and rollback-friendly.
As we covered in the previous post, a typical Kafka migration breaks down into four steps:
1. Discover and Plan: Scan your source environment to inventory clusters, topics, schemas, connectors, access control lists (ACLs), and active clients. Generate cost and metrics reports to inform your migration plan.
2. Provision Infrastructure: Generate Terraform configurations for your Confluent Cloud environment, networking (AWS PrivateLink, Amazon Virtual Private Cloud [VPC] peering, AWS Transit Gateway), and Cluster Linking.
3. Migrate Data: Replicate topics via Cluster Linking and migrate ACLs, schemas, and connectors using KCP-generated Terraform and migration utilities.
4. Migrate Clients: Move your producers and consumers to the new Confluent Cloud cluster.
Steps 1 through 3 were fully automated in the previous post. Step 4, client migration, was the missing piece. Let's dive into how KCP now closes that gap.
If you've ever attempted a Kafka migration, you know that moving data is only half the battle. The real challenge is migrating the clients: the producers and consumers that your applications depend on. Here's why it's so difficult:
Coordination Across Teams: Client applications are often owned by different teams, each with their own release cycles and risk tolerance. Orchestrating a synchronized cutover is a project management nightmare.
Authentication Differences: Your source cluster likely uses a different authentication mechanism (SASL/SCRAM, mTLS) than your destination. Updating every client's auth configuration is tedious and error-prone.
Offset Management: Consumer group offsets on the source cluster don't translate directly to the destination. Without offset preservation, consumers either reprocess messages or skip them entirely.
Rollback Risk: If something goes wrong mid-migration, you need a reliable way to roll back without data loss or corruption.
Traditionally, teams had two options: a "big bang" cutover with planned downtime or a complex dual-produce setup requiring significant application changes and technical overhead. Neither is ideal.
To solve these problems, KCP now integrates with Gateway, a cloud-native Kafka proxy that acts as an intelligent routing layer between your clients and your Kafka clusters. It routes Kafka protocol messages, rewrites metadata and broker addresses, and presents virtualized endpoints to clients. This enables KCP to transparently redirect traffic from the source Kafka cluster to the destination Confluent Cloud cluster without any client-side changes. Here’s why using KCP, Gateway, and Cluster Linking together is so powerful:
Capability | Cluster Linking Alone | KCP + Gateway + Cluster Linking |
Offset Preservation | ✅ Consumer offset syncing supported | ✅ Consumer offsets synced via Cluster Linking, seamless failover |
Auth Translation | ❌ Clients must update auth configuration | ✅ Gateway translates auth transparently (e.g., SASL/SCRAM → API keys) |
Group-Based Migration | ❌ All-or-nothing cutover | ✅ Migrate subsets of clients and topics incrementally |
Rollback Support | ❌ Complex and risky | ✅ Rollback possible before topic promotion completes |
Client Code Changes | ⚠️ Bootstrap server and auth updates required | ✅ No client changes after initial Gateway onboarding |
Coordination Overhead | ⚠️ Requires cross-team synchronization for cutover | ✅ KCP orchestrates the cutover automatically |
Downtime From Topic Promotion | ⚠️ Requires planned downtime | ⚠️ Seconds of retriable errors |
Before diving into commands, let's discuss strategy. A successful client migration with KCP + Gateway follows four stages:
The first stage is deploying Gateway and configuring it with two streaming domains: one pointing to your source cluster and one pointing to your destination Confluent Cloud cluster. At this point, Gateway is deployed, but no clients are connected to it yet. Think of it as setting up the infrastructure before you start moving applications.
Rather than migrating everything at once, KCP supports group-based migration. A group is your logical unit of migration—a set of topics and the client applications that depend on them (both producers and consumers) that need to move together.
You define groups by identifying which topics are related and which client applications produce to or consume from them. For example, a payments group might include payments.transactions and payments.events topics as well as the services that produce and consume them.
This gives you:
Team Autonomy: Each application team migrates on their own schedule.
Reduced Blast Radius: If an issue occurs, it's isolated to a single group rather than the entire cluster.
Incremental Validation: You can verify that each group is healthy on the destination before moving to the next one.
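To make the idea of a group concrete, here is a minimal Python sketch of how you might model groups while planning. It is not part of KCP: the `MigrationGroup` type, the client names, and the overlap check are all illustrative assumptions. The check reflects one planning concern that follows from groups cutting over as a unit: a topic claimed by two groups would be moved by whichever group cuts over first.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MigrationGroup:
    """Hypothetical planning record: topics plus the clients that use them."""
    name: str
    topics: frozenset[str]
    clients: frozenset[str] = field(default_factory=frozenset)


def find_overlapping_topics(groups: list[MigrationGroup]) -> dict[str, list[str]]:
    """Return topics claimed by more than one group, mapped to the group names."""
    owners: dict[str, list[str]] = {}
    for group in groups:
        for topic in sorted(group.topics):
            owners.setdefault(topic, []).append(group.name)
    return {topic: names for topic, names in owners.items() if len(names) > 1}


payments = MigrationGroup(
    name="payments",
    topics=frozenset({"payments.transactions", "payments.events"}),
    clients=frozenset({"payments-api", "ledger-writer"}),  # hypothetical services
)
orders = MigrationGroup(
    name="orders",
    topics=frozenset({"orders.created", "payments.events"}),  # deliberate overlap
    clients=frozenset({"orders-api"}),  # hypothetical service
)

overlaps = find_overlapping_topics([payments, orders])
print(overlaps)
```

In this example `payments.events` is flagged because both groups claim it, which is a signal to merge the two groups or reassign the topic before creating routes.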
Once you know your groups, create a Gateway route for each one. Each route gets its own bootstrap endpoint. Then share that group's bootstrap URL with the relevant client teams, who update their applications to point to Gateway instead of directly connecting to the source cluster.
The hands-on work required from you is minimal. Once the endpoints are ready, you simply distribute them, set a migration deadline, and continue focusing on your other priorities.
At this point, Gateway is proxying traffic to the source cluster. The client applications still produce and consume to the source cluster as before, but they now do it through Gateway. This decouples clients from the underlying cluster, setting the stage for a transparent cutover later.
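From a client team's point of view, onboarding can be as small as one changed property. The sketch below assumes standard Kafka client property names (`bootstrap.servers`, `security.protocol`, `sasl.mechanism`); the hostnames, ports, and the `onboard_to_gateway` helper are hypothetical and only illustrate that the authentication settings stay untouched.

```python
def onboard_to_gateway(client_config: dict[str, str], gateway_bootstrap: str) -> dict[str, str]:
    """Return a copy of the client config that connects through Gateway.

    Only the bootstrap endpoint changes; auth and consumer settings are kept.
    """
    updated = dict(client_config)
    updated["bootstrap.servers"] = gateway_bootstrap
    return updated


# Hypothetical config for a client that talks to the source cluster directly.
direct_config = {
    "bootstrap.servers": "b-1.source.example.internal:9096",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-512",
    "group.id": "payments-consumer",
}

# Hypothetical bootstrap endpoint of the Gateway route for the payments group.
gateway_config = onboard_to_gateway(direct_config, "payments.gateway.example.internal:9092")

changed_keys = {key for key in direct_config if direct_config[key] != gateway_config[key]}
print(changed_keys)
```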
For each group, execute the migration using three KCP commands as outlined in the migration flow shared below. The cutover moves all traffic in the group—both consumers and producers—simultaneously. Gateway routes to a single backend cluster at a time, so the group cuts over as a single migration unit. This means there will be no partial state in which clients are split between the source and destination clusters.
During the brief cutover window (~16 seconds), consumers experience zero effective downtime because their offsets are synced via Cluster Linking. The consumers pick up exactly where they left off on the destination cluster. Producers buffer during the block, and their pending writes succeed as soon as traffic is unblocked to the destination.
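Whether a producer rides out the block depends on how long it keeps retrying. In Apache Kafka clients, `delivery.timeout.ms` bounds the total time a record may spend being sent and retried, and its default is 120,000 ms. The following sketch is a simple sanity check you could run against your producer settings; the helper name and the safety factor of 2 are assumptions for illustration, not KCP behavior.

```python
# Apache Kafka producer default for delivery.timeout.ms (2 minutes).
DEFAULT_DELIVERY_TIMEOUT_MS = 120_000


def tolerates_cutover(
    producer_config: dict[str, str],
    cutover_window_ms: int,
    safety_factor: float = 2.0,  # assumed headroom, tune to your risk tolerance
) -> bool:
    """True if the producer keeps retrying for longer than the cutover window."""
    timeout_ms = int(producer_config.get("delivery.timeout.ms", DEFAULT_DELIVERY_TIMEOUT_MS))
    return timeout_ms >= cutover_window_ms * safety_factor


cutover_window_ms = 16_000  # the ~16 second window mentioned above

print(tolerates_cutover({}, cutover_window_ms))
print(tolerates_cutover({"delivery.timeout.ms": "10000"}, cutover_window_ms))
```

A producer on default settings passes this check comfortably, while one tuned down to a 10-second delivery timeout would start failing sends before traffic is unblocked.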
One of Gateway's most powerful features for migration is authentication swapping. Gateway can accept client connections authenticated with one mechanism and translate those credentials into a different mechanism when connecting to the destination cluster.
For example, your source cluster clients might authenticate using SASL/SCRAM with a username and password, while Confluent Cloud authenticates clients with API keys over SASL/PLAIN. Without Gateway, every client team would need to update their authentication configuration, which is a coordination nightmare.
With Gateway's auth translation, clients continue using their existing SASL/SCRAM credentials. Gateway handles the authentication itself, extracts the client's identity, looks up the corresponding destination credentials from a supported secret store (AWS Secrets Manager, HashiCorp Vault, or Azure Key Vault), and authenticates to the destination cluster on the client's behalf.
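Conceptually, the translation step is a lookup keyed by the authenticated client identity. The sketch below models that idea with an in-memory stand-in for the secret store. It is not Gateway's implementation or API: `DestinationCredential`, `InMemorySecretStore`, `translate`, and all credential values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DestinationCredential:
    """Hypothetical shape of a destination credential (a Confluent Cloud API key)."""
    api_key: str
    api_secret: str


class InMemorySecretStore:
    """Stand-in for a real secret store such as AWS Secrets Manager or Vault."""

    def __init__(self, secrets: dict[str, DestinationCredential]) -> None:
        self._secrets = dict(secrets)

    def get(self, principal: str) -> DestinationCredential:
        try:
            return self._secrets[principal]
        except KeyError:
            raise LookupError(f"no destination credential mapped for {principal!r}") from None


def translate(principal: str, store: InMemorySecretStore) -> DestinationCredential:
    """Map an already-authenticated source identity to its destination credential."""
    return store.get(principal)


store = InMemorySecretStore(
    {"payments-api": DestinationCredential("EXAMPLEKEY", "example-secret")}  # fake values
)
credential = translate("payments-api", store)
print(credential.api_key)
```

The useful property for planning is that a missing mapping fails loudly, so every client identity in a group should have a destination credential in the store before cutover.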
Supported translation paths include:
Client → Gateway | Gateway → Confluent Cloud |
SASL/PLAIN | SASL/PLAIN (API keys) |
SASL/PLAIN | SASL/OAUTHBEARER |
SASL/SCRAM-SHA-256/512 | SASL/OAUTHBEARER |
SASL/SCRAM-SHA-256/512 | SASL/PLAIN (API keys) |
mTLS (client certificates) | SASL/PLAIN (API keys) |
mTLS (client certificates) | SASL/OAUTHBEARER |
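When auditing a large client inventory, it can help to encode the table above as data and check each client against it. This sketch covers only the paths listed in the table; the mechanism labels and the `is_supported` helper are illustrative assumptions, and the list may not be exhaustive.

```python
# Client -> Gateway mechanism mapped to the Gateway -> Confluent Cloud
# mechanisms listed in the table above.
CLOUD_MECHANISMS = frozenset({"SASL/PLAIN", "SASL/OAUTHBEARER"})

SUPPORTED_PATHS: dict[str, frozenset[str]] = {
    "SASL/PLAIN": CLOUD_MECHANISMS,
    "SASL/SCRAM-SHA-256": CLOUD_MECHANISMS,
    "SASL/SCRAM-SHA-512": CLOUD_MECHANISMS,
    "mTLS": CLOUD_MECHANISMS,
}


def is_supported(client_mechanism: str, cloud_mechanism: str) -> bool:
    """True if the pair appears in the translation table above."""
    return cloud_mechanism in SUPPORTED_PATHS.get(client_mechanism, frozenset())


print(is_supported("SASL/SCRAM-SHA-512", "SASL/PLAIN"))
print(is_supported("SASL/GSSAPI", "SASL/PLAIN"))
```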
This means that most migrations require zero authentication changes on the client side. Gateway handles the translation transparently, so clients don't even know they're talking to a different cluster.
With your clients pointing to Gateway, groups defined, and Cluster Linking replicating data, the actual cutover for each group is three commands. In the examples below, we use Amazon Managed Streaming for Apache Kafka (MSK) as the source cluster.
The first command, kcp migration init, is your preflight check. The --topics flag defines the scope of your migration group by listing the topics that belong to it, and KCP validates that they're all being replicated by the cluster link. It also validates your Gateway streaming domains, Cluster Linking configuration, and consumer offset syncing. If everything checks out, it creates a migration plan and assigns a migration ID for tracking. No traffic is affected; this is purely validation.
You’ll also notice two important flags:
It’s important to note that KCP doesn’t generate these files for you. Instead, you create both the fenced and switchover variants yourself, based on your initial CR, before running kcp migration init. You then provide their file paths using these flags. The KCP repository includes working examples covering all supported authentication configurations.
Before cutting over, you need to be confident that your destination cluster is fully caught up. KCP connects to the Cluster Link REST API and shows real-time, per-topic replication lag in a live terminal dashboard. You can run this command for as long as needed until lag reaches an acceptable level.
What’s considered acceptable depends on your tolerance for downtime. A single high-lag topic can delay promotion and extend the cutover window for the entire group.
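The arithmetic behind a lag reading is simple, and the sketch below shows it with made-up offsets. It does not call the Cluster Link REST API or reproduce KCP's dashboard; the function names, topic offsets, and threshold are assumptions chosen to show how a single lagging topic holds back the whole group.

```python
def topic_lag(source_end_offsets: dict[int, int], mirror_offsets: dict[int, int]) -> int:
    """Sum per-partition lag: source end offset minus the mirror's replicated offset."""
    return sum(
        max(0, end - mirror_offsets.get(partition, 0))
        for partition, end in source_end_offsets.items()
    )


def topics_over_threshold(lags: dict[str, int], threshold: int) -> list[str]:
    """Return the topics whose lag exceeds the threshold, sorted by name."""
    return sorted(topic for topic, lag in lags.items() if lag > threshold)


# Made-up offsets, keyed by partition number.
lags = {
    "payments.transactions": topic_lag({0: 1200, 1: 950}, {0: 1200, 1: 948}),
    "payments.events": topic_lag({0: 500}, {0: 380}),
}

blocking = topics_over_threshold(lags, threshold=10)
print(lags)
print(blocking)
```

Here `payments.transactions` is only 2 messages behind, but `payments.events` is 120 behind, so the group as a whole is not ready at a threshold of 10.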
When you're ready, this single command orchestrates the entire cutover sequence:
Check Readiness: Are all topics caught up? If any topic has more unreplicated messages than your --lag-threshold allows, the cutover stops before anything changes.
Pause Traffic: KCP temporarily blocks all client traffic through Gateway. Clients see a retriable error and wait.
Promote Topics: Mirror topics on Confluent Cloud are promoted to regular, writable topics. Cluster Linking finishes replicating any remaining messages before each topic is promoted.
Switch and Resume: KCP updates the Gateway configuration in a single atomic update, switching the route from the source to Confluent Cloud and unblocking traffic. Clients reconnect and resume producing and consuming—now on the destination.
The typical producer blocking time is a few seconds. Consumers experience zero downtime because their offsets are synced via Cluster Linking. They seamlessly pick up on the destination exactly where they left off.
Rollback: Rollback is possible during the cutover sequence up until topic promotion completes. After the route is switched and traffic is unblocked, the cutover is complete, and the group is live on Confluent Cloud.
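The sequence above can be sketched as a small simulation. This is not KCP's code: `FakeGateway`, `CutoverResult`, `cutover`, and the `promote` callback are hypothetical stand-ins, and rollback is simplified to restoring the source route without modeling what happens to any topics already promoted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeGateway:
    """Hypothetical stand-in for a Gateway route with one backend at a time."""
    route: str = "source"
    blocked: bool = False

    def block(self) -> None:
        self.blocked = True

    def switch_and_unblock(self, target: str) -> None:
        # Modeled as one update: the route changes and traffic resumes together.
        self.route, self.blocked = target, False


@dataclass
class CutoverResult:
    ok: bool
    steps: list[str]


def cutover(
    gateway: FakeGateway,
    lags: dict[str, int],
    lag_threshold: int,
    promote: Callable[[str], None],
) -> CutoverResult:
    # 1. Check readiness: stop before anything changes if a topic is behind.
    behind = sorted(topic for topic, lag in lags.items() if lag > lag_threshold)
    if behind:
        return CutoverResult(False, [f"abort: lag above threshold for {behind}"])
    steps = ["readiness ok"]

    # 2. Pause traffic: clients see retriable errors and wait.
    gateway.block()
    steps.append("traffic blocked")

    # 3. Promote topics: on failure, resume traffic on the source route.
    try:
        for topic in sorted(lags):
            promote(topic)
    except RuntimeError as exc:
        gateway.switch_and_unblock("source")
        steps.append(f"rolled back: {exc}")
        return CutoverResult(False, steps)
    steps.append("topics promoted")

    # 4. Switch and resume.
    gateway.switch_and_unblock("destination")
    steps.append("route switched, traffic resumed")
    return CutoverResult(True, steps)


gateway = FakeGateway()
result = cutover(gateway, {"payments.events": 0, "payments.transactions": 3}, 10, lambda topic: None)
print(result.steps, gateway.route)
```

The point of the ordering is that the route only ever changes while traffic is blocked, so clients in the group never see a mix of source and destination.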
With this release, KCP now covers the complete migration journey:
What used to be a weeks-long, error-prone process is now a series of CLI commands with built-in validation, monitoring, and rollback at every step.
Today, KCP migrates all traffic in a group simultaneously; consumers and producers move together in a single cutover. We're working on traffic-plane-aware migration, which will let you migrate consumers and producers independently using a --traffic flag.
This will give you even more control over your migration strategy. Move consumers first to validate the destination, then switch producers when you're confident everything is working.
KCP and Gateway will also enable more granular principal and topic-based migrations to more precisely control the migration flow, reducing risk and blast radius even more.
Conclusion: From Weeks to Hours
Client migration has traditionally been the hardest and longest phase of Kafka migrations, driven more by coordination than technology and often taking weeks of effort. With KCP and Confluent Cloud Gateway, that changes entirely.
Instead of orchestrating complex, cross-team updates, your role becomes simple: Define groups, create Gateway routes, and share bootstrap endpoints. From there, KCP handles the heavy lifting, reducing a process that once took weeks to just hours.
Migrations don’t have to be painful. With built-in validation, monitoring, and rollback, they become predictable, incremental, and low risk. The last mile is no longer a bottleneck; it’s just a few commands.
KCP is open source (Apache 2.0) and free to use. Here's how to get started:
KCP GitHub Repository: Download KCP, explore the source, and contribute.
Migration Workshop: Access a hands-on, guided workshop that walks you through an end-to-end migration using KCP.
Confluent Cloud Free Trial: Try Confluent Cloud with promo code CLOUDBLOG60 for $60 worth of additional free usage.
Part 1: Automate Kafka Migration With KCP: If you haven't read it yet, start here for the full walk-through of Steps 1–3.
Apache®, Apache Kafka®, Kafka®, and the Kafka logo are registered trademarks of the Apache Software Foundation. No endorsement by the Apache Software Foundation is implied by the use of these marks.