
The Last (and Longest) Mile of Apache Kafka® Migrations: Client Migrations With KCP and Confluent Cloud Gateway

In a previous blog post in this series, we introduced Kafka Copy Paste (KCP), an open source CLI tool that automates the discovery, provisioning, and data migration steps of moving your Apache Kafka® environment to Confluent Cloud. We walked through how KCP and Cluster Linking work together to reduce a process that traditionally took weeks to a matter of hours. At the end of that post, we hinted that automated client migration was coming soon.

That day has arrived.

With the addition of Confluent Cloud Gateway (Gateway) support, KCP now orchestrates the entire migration workflow end to end, from the first kcp discover command all the way through the final client cutover. This reduces the client cutover step from a weeks-long effort of cross-team coordination and setup to a few KCP commands. In this post, we'll show you how KCP and Gateway work together to make client migration seamless, safe, and rollback-friendly.

A Quick Recap: The 4-Step Migration

As we covered in the previous post, a typical Kafka migration breaks down into four steps:

  1. Discover and Plan: Scan your source environment to inventory clusters, topics, schemas, connectors, access control lists (ACLs), and active clients. Generate cost and metrics reports to inform your migration plan.

  2. Provision Infrastructure: Generate Terraform configurations for your Confluent Cloud environment, networking (AWS PrivateLink, Amazon Virtual Private Cloud [VPC] peering, AWS Transit Gateway), and Cluster Linking.

  3. Migrate Data: Replicate topics via Cluster Linking and migrate ACLs, schemas, and connectors using KCP-generated Terraform and migration utilities.

  4. Migrate Clients: Move your producers and consumers to the new Confluent Cloud cluster.

Steps 1 through 3 were fully automated in the previous post. Step 4, client migration, was the missing piece. Let's dive into how KCP now closes that gap.

Why Client Migration Is the Hardest Step

If you've ever attempted a Kafka migration, you know that moving data is only half the battle. The real challenge is migrating the clients: the producers and consumers that your applications depend on. Here's why it's so difficult:

  • Coordination Across Teams: Client applications are often owned by different teams, each with their own release cycles and risk tolerance. Orchestrating a synchronized cutover is a project management nightmare.

  • Authentication Differences: Your source cluster likely uses a different authentication mechanism (SASL/SCRAM, mTLS) than your destination. Updating every client's auth configuration is tedious and error-prone.

  • Offset Management: Consumer group offsets on the source cluster don't translate directly to the destination. Without offset preservation, consumers either reprocess messages or skip them entirely.

  • Rollback Risk: If something goes wrong mid-migration, you need a reliable way to roll back without data loss or corruption.

Traditionally, teams had two options: a "big bang" cutover with planned downtime or a complex dual-produce setup requiring significant application changes and technical overhead. Neither is ideal.

Enter KCP + Confluent Cloud Gateway

To solve these problems, KCP now integrates with Gateway, a cloud-native Kafka proxy that acts as an intelligent routing layer between your clients and your Kafka clusters. It routes Kafka protocol messages, rewrites metadata and broker addresses, and presents virtualized endpoints to clients. This enables KCP to transparently redirect traffic from the source Kafka cluster to the destination Confluent Cloud cluster without any client-side changes. Here’s why using KCP, Gateway, and Cluster Linking together is so powerful:

| Capability | Cluster Linking Alone | KCP + Gateway + Cluster Linking |
| --- | --- | --- |
| Offset Preservation | ✅ Consumer offset syncing supported | ✅ Consumer offsets synced via Cluster Linking, seamless failover |
| Auth Translation | ❌ Clients must update auth configuration | ✅ Gateway translates auth transparently (e.g., SASL/SCRAM → API keys) |
| Group-Based Migration | ❌ All-or-nothing cutover | ✅ Migrate subsets of clients and topics incrementally |
| Rollback Support | ❌ Complex and risky | ✅ Rollback possible before topic promotion completes |
| Client Code Changes | ⚠️ Bootstrap server and auth updates required | ✅ No client changes after initial Gateway onboarding |
| Coordination Overhead | ⚠️ Requires cross-team synchronization for cutover | ✅ KCP orchestrates the cutover automatically |
| Downtime From Topic Promotion | ⚠️ Requires planned downtime | ⚠️ Seconds of retriable errors |

Client Migration Strategy: Deploy, Plan, Onboard, Migrate

Before diving into commands, let's discuss strategy. A successful client migration with KCP + Gateway follows four stages:

Stage 1: Deploy Gateway.

The first stage is deploying Gateway and configuring it with two streaming domains: one pointing to your source cluster and one to your destination Confluent Cloud cluster. At this point, Gateway is deployed, but no clients are connected to it yet. Think of it as setting up the infrastructure before you start moving applications.

Stage 2: Plan Your Migration Groups.

Rather than migrating everything at once, KCP supports group-based migration. A group is your logical unit of migration—a set of topics and the client applications that depend on them (both producers and consumers) that need to move together.

You define groups by identifying which topics are related and which client applications produce to or consume from them. For example, a payments group might include payments.transactions and payments.events topics as well as the services that produce and consume them.

This gives you:

  • Team Autonomy: Each application team migrates on their own schedule.

  • Reduced Blast Radius: If an issue occurs, it's isolated to a single group rather than the entire cluster.

  • Incremental Validation: You can verify that each group is healthy on the destination before moving to the next one.
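To make the group concept concrete, here is an illustrative sketch (not part of KCP itself) that models a migration group as a named set of topics plus the applications that depend on them, and checks that no topic is claimed by two groups. The group, topic, and app names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class MigrationGroup:
    """A logical unit of migration: topics plus the apps that depend on them."""
    name: str
    topics: set = field(default_factory=set)
    apps: set = field(default_factory=set)

def validate_groups(groups):
    """Each topic must belong to exactly one group, so cutovers can't overlap."""
    seen = {}
    for group in groups:
        for topic in group.topics:
            if topic in seen:
                raise ValueError(
                    f"topic {topic!r} is in both {seen[topic]!r} and {group.name!r}"
                )
            seen[topic] = group.name
    return seen  # topic -> owning group

payments = MigrationGroup(
    name="payments",
    topics={"payments.transactions", "payments.events"},
    apps={"payment-service", "fraud-detector"},
)
orders = MigrationGroup(
    name="orders", topics={"orders.created"}, apps={"order-service"}
)

assignment = validate_groups([payments, orders])
print(assignment["payments.events"])  # payments
```

Keeping each topic in exactly one group is what makes the per-group cutover a clean unit: every producer and consumer of a topic moves in the same step.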

Stage 3: Onboard Clients to Gateway.

Once you know your groups, create a Gateway route for each one. Each route gets its own bootstrap endpoint. Then share each group's bootstrap URL with the relevant client teams, who update their applications to point to Gateway instead of connecting directly to the source cluster.

The hands-on work required from you is minimal. Once the endpoints are ready, you simply distribute them, set a migration deadline, and continue focusing on your other priorities.

At this point, Gateway is proxying traffic to the source cluster. The client applications still produce and consume to the source cluster as before, but they now do it through Gateway. This decouples clients from the underlying cluster, setting the stage for a transparent cutover later.
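In practice, onboarding amounts to a one-line configuration change per client. The sketch below uses Python dicts with librdkafka-style property names to show a before/after; the endpoint hostnames and credentials are placeholders, not real Gateway addresses:

```python
# Before onboarding: the client connects straight to the source cluster.
before = {
    "bootstrap.servers": "b-1.source-cluster.example.com:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-512",
    "sasl.username": "payments-app",
    "sasl.password": "********",
}

# After onboarding: only the bootstrap endpoint changes, to the group's
# Gateway route. Authentication settings stay exactly as they were.
after = dict(before, **{"bootstrap.servers": "payments.gateway.example.com:9092"})

changed = {key for key in before if before[key] != after[key]}
print(changed)  # {'bootstrap.servers'}
```

Because only `bootstrap.servers` changes, each team can onboard on its own schedule with an ordinary config rollout, and no code changes are needed afterward.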

Stage 4: Migrate Group by Group.

For each group, execute the migration using three KCP commands, as outlined in the migration flow below. The cutover moves all traffic in the group—both consumers and producers—simultaneously. Gateway routes to a single backend cluster at a time, so the group cuts over as a single migration unit. This means there is no partial state in which clients are split between the source and destination clusters.

During the brief cutover window (~16 seconds), consumers experience zero effective downtime because their offsets are synced via Cluster Linking. The consumers pick up exactly where they left off on the destination cluster. Producers buffer during the block, and their pending writes succeed as soon as traffic is unblocked to the destination.
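Since producers see retriable errors during the block, it's worth verifying that their retry budget comfortably exceeds the cutover window. These are standard Kafka producer settings (Java/librdkafka naming); the values shown are illustrative, not KCP requirements:

```python
CUTOVER_WINDOW_S = 16  # approximate traffic-block duration during cutover

# Standard Kafka producer settings. delivery.timeout.ms is the overall
# budget for a record (120000 ms is the Java client default); within it,
# retriable errors such as the cutover block are retried automatically.
producer_overrides = {
    "retries": 2147483647,          # retry retriable errors indefinitely...
    "delivery.timeout.ms": 120000,  # ...within this overall delivery budget
    "retry.backoff.ms": 100,
    "reconnect.backoff.max.ms": 10000,
}

budget_s = producer_overrides["delivery.timeout.ms"] / 1000
assert budget_s > CUTOVER_WINDOW_S, "delivery budget must outlast the cutover block"
print(f"retry budget {budget_s:.0f}s covers the ~{CUTOVER_WINDOW_S}s window")
```

With defaults like these, pending writes simply queue during the block and flush once traffic is unblocked to the destination.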

Authentication Translation

One of Gateway's most powerful features for migration is authentication swapping. Gateway can accept client connections authenticated with one mechanism and translate those credentials into a different mechanism when connecting to the destination cluster.

For example, your source cluster clients might authenticate using SASL/SCRAM with a username and password, while Confluent Cloud uses SASL/PLAIN with API keys. Without Gateway, every client team would need to update their authentication configuration, a coordination nightmare.

With Gateway's auth translation, clients continue using their existing SASL/SCRAM credentials. Gateway handles the authentication itself, extracts the client's identity, looks up the corresponding destination credentials from a supported secret store (AWS Secrets Manager, HashiCorp Vault, or Azure Key Vault), and authenticates to the destination cluster on the client's behalf.

Supported translation paths include:

| Client → Gateway | Gateway → Confluent Cloud |
| --- | --- |
| SASL/PLAIN | SASL/PLAIN (API keys) |
| SASL/PLAIN | SASL/OAUTHBEARER |
| SASL/SCRAM-SHA-256/512 | SASL/OAUTHBEARER |
| SASL/SCRAM-SHA-256/512 | SASL/PLAIN (API keys) |
| mTLS (client certificates) | SASL/PLAIN (API keys) |
| mTLS (client certificates) | SASL/OAUTHBEARER |

This means that most migrations require zero authentication changes on the client side. Gateway handles the translation transparently, so clients don't even know they're talking to a different cluster.
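Conceptually, the translation is an identity-keyed lookup: authenticate the client with its source-style credentials, then fetch destination credentials for that identity from the secret store. The sketch below is purely illustrative of that idea; it is not Gateway's implementation, and the secret-store schema and names are hypothetical:

```python
# Conceptual sketch only: a dict stands in for AWS Secrets Manager,
# HashiCorp Vault, or Azure Key Vault. Real schemas will differ.
SECRET_STORE = {
    "payments-app": {"mechanism": "PLAIN", "api_key": "CCLOUD-KEY", "api_secret": "****"},
}

def translate(client_identity: str) -> dict:
    """Map an authenticated source identity to destination credentials."""
    try:
        return SECRET_STORE[client_identity]
    except KeyError:
        raise PermissionError(f"no destination credentials for {client_identity!r}")

# After Gateway authenticates the client via SASL/SCRAM, it looks up the
# Confluent Cloud credentials to use on that client's behalf.
creds = translate("payments-app")
print(creds["mechanism"])  # PLAIN
```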

The Migration Flow: 3 Commands

With your clients pointing to Gateway, groups defined, and Cluster Linking replicating data, the actual cutover for each group is three commands. In the examples below, we use Amazon Managed Streaming for Apache Kafka (MSK) as the source cluster.

1. Initialize the Migration.

kcp migration init \
  --k8s-namespace <gateway-namespace> \
  --initial-cr-name <gateway-cr-name> \
  --source-cluster-arn <source-cluster-arn> \
  --cluster-id <cc-cluster-id> \
  --cluster-rest-endpoint <cc-rest-endpoint> \
  --cluster-link-name <cluster-link-name> \
  --cluster-api-key <cc-api-key> \
  --cluster-api-secret <cc-api-secret> \
  --fenced-cr-yaml <path-to-fenced-cr> \
  --switchover-cr-yaml <path-to-switchover-cr> \
  --topics <topic1,topic2,...> \
  --use-sasl-scram \
  --sasl-scram-username <source-username> \
  --sasl-scram-password <source-password>

This command is your preflight check. The --topics flag defines the scope of your migration group by listing the topics that belong to it; KCP validates that each one is being replicated by the cluster link. It also validates your Gateway streaming domains, Cluster Linking configuration, and consumer offset syncing. If everything checks out, it creates a migration plan and assigns a migration ID for tracking. No traffic is affected; this is purely validation.

You’ll also notice two important flags:

  1. --fenced-cr-yaml specifies the Gateway custom resource (CR) YAML used to temporarily block traffic during the migration window.

  2. --switchover-cr-yaml defines the Gateway CR YAML that will route traffic to Confluent Cloud when you’re ready to switch over.

It’s important to note that KCP doesn’t generate these files for you. Instead, you create both the fenced and switchover variants yourself, based on your initial CR, before running kcp migration init. You then provide their file paths using these flags. The KCP repository includes working examples covering all supported authentication configurations.

2. Monitor Replication Status.

kcp migration lag-check \
  --rest-endpoint <CONFLUENT_CLOUD_CLUSTER_REST_ENDPOINT> \
  --cluster-id <CONFLUENT_CLOUD_CLUSTER_ID> \
  --cluster-link-name <CLUSTER_LINK_NAME> \
  --cluster-api-key <CONFLUENT_CLOUD_CLUSTER_API_KEY> \
  --cluster-api-secret <CONFLUENT_CLOUD_CLUSTER_API_SECRET>

Before cutting over, you need to be confident that your destination cluster is fully caught up. KCP connects to the Cluster Link REST API and shows real-time, per-topic replication lag in a live terminal dashboard. You can run this command for as long as needed until lag reaches an acceptable level.

What’s considered acceptable depends on your tolerance for downtime. A single high-lag topic can delay promotion and extend the cutover window for the entire group.
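The readiness rule that the cutover's lag threshold expresses can be sketched simply: the group is safe to cut over only when every topic's unreplicated-message count is at or below the threshold. An illustrative check (not KCP's code; topic names and lag values are made up):

```python
def ready_to_cut_over(lags_by_topic: dict[str, int], threshold: int = 0) -> bool:
    """True only when every mirror topic's lag is at or below the threshold.
    A single laggy topic holds back the entire group."""
    return all(lag <= threshold for lag in lags_by_topic.values())

lags = {"payments.transactions": 0, "payments.events": 342}
print(ready_to_cut_over(lags, threshold=0))    # False: payments.events is behind
print(ready_to_cut_over(lags, threshold=500))  # True: within tolerance
```

This is why it pays to watch the dashboard until lag is flat: a stricter threshold means less to replicate during the block and a shorter cutover window.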

3. Execute the Cutover.

kcp migration execute \
  --migration-id <MIGRATION_ID> \
  --lag-threshold 0 \
  --cluster-api-key <CONFLUENT_CLOUD_CLUSTER_API_KEY> \
  --cluster-api-secret <CONFLUENT_CLOUD_CLUSTER_API_SECRET> \
  --use-sasl-scram \
  --sasl-scram-username <source_username> \
  --sasl-scram-password <source_password>

When you're ready, this single command orchestrates the entire cutover sequence:

  1. Check Readiness: Are all topics caught up? If any topic has more unreplicated messages than your --lag-threshold allows, the cutover stops before anything changes.

  2. Pause Traffic: KCP temporarily blocks all client traffic through Gateway. Clients see a retriable error and wait.

  3. Promote Topics: Mirror topics on Confluent Cloud are promoted to regular, writable topics. Cluster Linking finishes replicating any remaining messages before each topic is promoted.

  4. Switch and Resume: KCP updates the Gateway configuration in a single atomic update, switching the route from the source to Confluent Cloud and unblocking traffic. Clients reconnect and resume producing and consuming—now on the destination.

The typical producer blocking time is a few seconds. Consumers experience zero downtime because their offsets are synced via Cluster Linking. They seamlessly pick up on the destination exactly where they left off.

Rollback: Rollback is possible during the cutover sequence up until topic promotion completes. After the route is switched and traffic is unblocked, the cutover is complete, and the group is live on Confluent Cloud.

Putting It All Together

With this release, KCP now covers the complete migration journey:

# Step 1: Discover your source environment (e.g., Amazon MSK)
kcp discover --region us-east-1

# Step 2: Generate and deploy infrastructure
kcp create-asset target-infra
kcp create-asset migration-infra --type 1

# Step 3: Migrate data, schemas, ACLs, and connectors
kcp create-asset migrate-topics
kcp create-asset migrate-schemas
kcp create-asset migrate-acls
kcp create-asset migrate-connectors

# Step 4: Migrate clients (NEW)
# Deploy Gateway, onboard clients, then for each migration group:
kcp migration init
kcp migration lag-check
kcp migration execute --lag-threshold 100

What used to be a weeks-long, error-prone process is now a series of CLI commands with built-in validation, monitoring, and rollback at every step.

What's Next

Today, KCP migrates all traffic in a group simultaneously; consumers and producers move together in a single cutover. We're working on traffic-plane-aware migration, which will let you migrate consumers and producers independently using a --traffic flag.

  • kcp migration execute --traffic consume: Migrate only consume-plane traffic first, without promoting topics. This is lower risk since it doesn't require topic promotion.
  • kcp migration execute --traffic produce: Migrate produce-plane traffic, which includes topic promotion and the routing switch.
  • kcp migration execute --traffic all: This is the default, which internally sequences consume-plane first, then produce-plane.

This will give you even more control over your migration strategy. Move consumers first to validate the destination, then switch producers when you're confident everything is working.

KCP and Gateway will also enable more granular principal and topic-based migrations to more precisely control the migration flow, reducing risk and blast radius even more. 

Conclusion: From Weeks to Hours

Client migration has traditionally been the hardest and longest phase of Kafka migrations, driven more by coordination than technology and often taking weeks of effort. With KCP and Confluent Cloud Gateway, that changes entirely.

Instead of orchestrating complex, cross-team updates, your role becomes simple: Define groups, create Gateway routes, and share bootstrap endpoints. From there, KCP handles the heavy lifting, reducing a process that once took weeks to just hours.

Migrations don’t have to be painful. With built-in validation, monitoring, and rollback, they become predictable, incremental, and low risk. The last mile is no longer a bottleneck; it’s just a few commands.

Get Started

KCP is open source (Apache 2.0) and free to use. Head over to the KCP repository to get started.

Apache®, Apache Kafka®, Kafka®, and the Kafka logo are registered trademarks of the Apache Software Foundation. No endorsement by the Apache Software Foundation is implied by the use of these marks. 

  • David Marsh is a Senior Technical Marketing Manager at Confluent. David is passionate about making complex cloud and data streaming concepts clear, practical, and actionable for both technical and non-technical audiences.

  • Ahmed Zamzam leads the Technical Marketing team at Confluent, where he and his team create blogs, demos, videos, workshops, and reference architectures that showcase the power of Confluent’s Data Streaming Platform. With over 15 years of experience spanning Solution Architecture and Technical Marketing, Ahmed has a deep passion for helping organizations unlock the value of real-time data. When he’s not diving into streaming technologies, you’ll likely find him traveling the world, playing tennis, or cycling.
