
Log Backup may get stuck due to async lock held across .await #19238

@3pointer

Description


Bug Report

Under certain configurations, Log Backup may get permanently stuck.

Once this happens:

  • Log backup no longer makes progress, and no related backup logs are emitted
  • Pause/Stop task commands have no effect

This issue is strongly correlated with holding an async lock across .await, which may stall the Tokio runtime when thread resources are constrained.
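The hazard can be illustrated with a minimal sketch. It assumes std::sync::Mutex and OS threads as a stand-in for Tokio's async mutex and tasks, so that it runs without external crates; it is not TiKV code. `FlushState`, `flush_holding_lock`, `flush_scoped`, and `pause_succeeds_during_upload` are hypothetical names. The upload closure blocks until it is released, simulating an S3 call that never returns, while a "pause" command tries to take the same lock.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared state guarded by one lock (not TiKV's real type).
pub struct FlushState {
    pub pending: Vec<String>,
    pub paused: bool,
}

/// Anti-pattern: the guard stays alive for the whole (possibly hung) upload.
pub fn flush_holding_lock<F: FnOnce(&[String])>(state: &Mutex<FlushState>, upload: F) {
    let mut guard = state.lock().unwrap();
    upload(&guard.pending); // slow I/O while the lock is held
    guard.pending.clear();
}

/// Preferred: take a snapshot under the lock, release it, then do slow I/O.
pub fn flush_scoped<F: FnOnce(&[String])>(state: &Mutex<FlushState>, upload: F) {
    let batch = {
        let mut guard = state.lock().unwrap();
        std::mem::take(&mut guard.pending)
    }; // guard dropped here, before the upload starts
    upload(&batch);
}

/// Starts a flush whose upload blocks until released, and reports whether a
/// hypothetical "pause" command could acquire the lock while it is stuck.
pub fn pause_succeeds_during_upload(use_scoped: bool) -> bool {
    let state = Arc::new(Mutex::new(FlushState {
        pending: vec!["kv".to_string()],
        paused: false,
    }));
    let (started_tx, started_rx) = mpsc::channel::<()>();
    let (release_tx, release_rx) = mpsc::channel::<()>();
    let worker_state = Arc::clone(&state);

    let worker = thread::spawn(move || {
        let upload = |_batch: &[String]| {
            started_tx.send(()).unwrap(); // signal: upload in progress
            release_rx.recv().unwrap(); // simulate an S3 call that hangs
        };
        if use_scoped {
            flush_scoped(&worker_state, upload)
        } else {
            flush_holding_lock(&worker_state, upload)
        }
    });

    started_rx.recv().unwrap(); // wait until the upload is stuck
    let paused = match state.try_lock() {
        Ok(mut guard) => {
            guard.paused = true;
            true
        }
        Err(_) => false, // lock is held by the stuck flush
    };

    release_tx.send(()).unwrap(); // let the "upload" finish
    worker.join().unwrap();
    paused
}
```

With `flush_holding_lock`, the pause attempt fails for as long as the upload hangs; with `flush_scoped`, it succeeds. In async code the analogous fix is to end the guard's scope (or call `drop(guard)`) before the `.await`, so a hung remote call cannot also block every other path that needs the lock.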

What version of TiKV are you using?

v8.5.3

What operating system and CPU are you using?

N/A

Steps to reproduce

It is not easy to reproduce, but a hypothesized reproduction is:

  1. Use a small Tokio runtime thread pool
    • e.g. reduce [log-backup] num-threads to a small number (1–2)
  2. Configure log backup with a short flush interval
  3. Start a normal write workload and wait

What did you expect?

Log backup works normally.

What happened instead?

Log backup flush stops making progress without reporting any error.

When this happens:

  1. Log backup no longer advances
  2. Querying /async-tasks shows that async tasks are blocked on S3-related operations
    - These S3 operations are wrapped in Tokio timeouts, but the timeouts never fire
  3. Pause and other control operations have no effect

Metadata


Assignees

No one assigned

    Labels

    • affects-8.5: This bug affects the 8.5.x (LTS) versions.
    • component/backup-restore: Component: backup, import, external_storage
    • severity/major
    • type/bug: The issue is confirmed as a bug.
