Ensure Deregister::Conn Is Always Triggered in CDC Connection Handling #18245

@wlwilliamx

Description

Bug Report

In the CDC connection handling logic, Deregister::Conn(conn_id) is not triggered if an error occurs in recv_req before the deregistration step. This can leak connections, because a connection that fails this way is never deregistered.

The issue occurs in the following code:

let recv_req = async move {
    let mut stream = stream.map_err(|e| format!("{:?}", e));
    if let Some(request) = stream.try_next().await? {
        Self::set_conn_version(&scheduler, conn_id, version, explicit_features)?;
        Self::handle_request(&scheduler, &peer, request, conn_id)?;
    }
    while let Some(request) = stream.try_next().await? {
        Self::handle_request(&scheduler, &peer, request, conn_id)?;
    }
    let deregister = Deregister::Conn(conn_id);
    if let Err(e) = scheduler.schedule(Task::Deregister(deregister)) {
        error!("cdc deregister failed"; "error" => ?e, "conn_id" => ?conn_id);
    }
    Ok::<(), String>(())
};

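Every `?` in this block returns early on error. If the stream yields an error, or if set_conn_version or handle_request fails, control never reaches the deregistration step, so Deregister::Conn(conn_id) is not scheduled and the connection is left dangling.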
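One way to close this gap is to run the fallible part separately, keep its result, schedule the deregistration unconditionally, and only then return the result. The following is a minimal synchronous sketch of that idea, not the actual TiKV fix. `ConnId`, `Task`, `Scheduler`, and `process_requests` are simplified, hypothetical stand-ins for the real types. In the async original, an inner `async` block awaited in place could play the role of the helper function.

```rust
// Hypothetical, simplified stand-ins for the TiKV types (assumptions for illustration).
#[derive(Debug, PartialEq, Clone, Copy)]
struct ConnId(usize);

#[derive(Debug, PartialEq)]
enum Task {
    Request(ConnId, String),
    Deregister(ConnId),
}

// A toy scheduler that just records the tasks it receives.
#[derive(Default)]
struct Scheduler {
    tasks: Vec<Task>,
}

impl Scheduler {
    fn schedule(&mut self, task: Task) -> Result<(), String> {
        self.tasks.push(task);
        Ok(())
    }
}

// The fallible part: `?` returns from this helper only, not from `recv_req`.
fn process_requests(
    scheduler: &mut Scheduler,
    conn_id: ConnId,
    requests: Vec<Result<String, String>>,
) -> Result<(), String> {
    for request in requests {
        let request = request?;
        scheduler.schedule(Task::Request(conn_id, request))?;
    }
    Ok(())
}

// Captures the result first, then deregisters on both the Ok and the Err path.
fn recv_req(
    scheduler: &mut Scheduler,
    conn_id: ConnId,
    requests: Vec<Result<String, String>>,
) -> Result<(), String> {
    let result = process_requests(scheduler, conn_id, requests);
    if let Err(e) = scheduler.schedule(Task::Deregister(conn_id)) {
        eprintln!("cdc deregister failed: {}, conn_id: {:?}", e, conn_id);
    }
    result
}
```

With this shape, a request stream that fails on its second item still ends with a `Task::Deregister` for the connection, and the original error is still returned to the caller.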

What version of TiKV are you using?

master

What did you expect?

Deregister::Conn(conn_id) should always be executed, regardless of whether recv_req completes successfully or encounters an error.
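Another way to get "always executed" behavior is a drop guard: a value whose `Drop` implementation schedules the deregistration, so it runs on every exit path from the scope that owns it. The sketch below is only an illustration under assumptions. `DeregisterGuard` is a hypothetical name, a standard-library `mpsc::Sender` stands in for the CDC scheduler, and the `fail_early` flag simulates an error in the request loop.

```rust
use std::sync::mpsc::Sender;

// Hypothetical, simplified stand-ins for the TiKV types (assumptions for illustration).
#[derive(Debug, PartialEq, Clone, Copy)]
struct ConnId(usize);

#[derive(Debug, PartialEq)]
enum Task {
    Deregister(ConnId),
}

// Sends a Deregister task when it goes out of scope.
struct DeregisterGuard {
    scheduler: Sender<Task>,
    conn_id: ConnId,
}

impl Drop for DeregisterGuard {
    fn drop(&mut self) {
        if let Err(e) = self.scheduler.send(Task::Deregister(self.conn_id)) {
            eprintln!("cdc deregister failed: {:?}, conn_id: {:?}", e, self.conn_id);
        }
    }
}

// `fail_early` simulates an error in the request handling.
fn recv_req(scheduler: Sender<Task>, conn_id: ConnId, fail_early: bool) -> Result<(), String> {
    // The guard is dropped on every return below, including the early one.
    let _guard = DeregisterGuard { scheduler, conn_id };
    if fail_early {
        return Err("stream error".to_string());
    }
    Ok(())
}
```

A guard held inside an async block is also dropped if the future is dropped before it completes, which the capture-then-deregister approach does not cover. The trade-off is that cleanup becomes implicit, and the name `_guard` matters: binding to a bare `_` would drop the guard immediately.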

Impact

  • Connections may not be properly deregistered.
  • Potential resource leaks affecting CDC stability.

Metadata

    Labels

    affects-7.5: This bug affects the 7.5.x(LTS) versions.
    affects-8.5: This bug affects the 8.5.x(LTS) versions.
    component/CDC: Component: Change Data Capture
    severity/minor
    type/bug: The issue is confirmed as a bug.