Description
Bug Report
What version of TiKV are you using?
v8.5.3, v8.5.2
What operating system and CPU are you using?
Steps to reproduce
Run a BR restore on a cluster of 20 TiKV nodes, with Huawei OBS (an S3-compatible service) as the data source.
The upstream service implements rate limiting through a token-bucket filter, which keeps the total download speed of the entire TiKV cluster below 1 GiB/s.
The Rust AWS SDK has a feature known as "stalled stream protection" (awslabs/aws-sdk-rust#1146), which aborts a request if its throughput stays below 1 B/s for some grace period (supposed to be 20s, but the actual timeout appears to be 6s).
So if the total download speed is too high and exhausts the token bucket, and the resulting penalty outlasts the grace period (6s), the AWS SDK kills the connection. TiKV is then forced to restart the download from the beginning, which both increases resource usage and leads to #18839.
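To illustrate the interaction described above, here is a minimal, stdlib-only sketch of a toy model. It is not the SDK's actual algorithm. The names `throttled_samples` and `first_stall`, the 3s burst, the 8s penalty, and the 50 MiB/s rate are all assumptions made for illustration; only the 1 B/s threshold and the 6s and 20s grace periods come from this report.

```rust
/// Hypothetical per-second throughput of one connection behind a rate limiter:
/// `burst_secs` seconds at `rate` bytes/s, then `penalty_secs` seconds of
/// silence while the token bucket refills, repeating for `total_secs` seconds.
fn throttled_samples(burst_secs: usize, penalty_secs: usize, rate: u64, total_secs: usize) -> Vec<u64> {
    let period = burst_secs + penalty_secs;
    (0..total_secs)
        .map(|t| if t % period < burst_secs { rate } else { 0 })
        .collect()
}

/// Toy stall detector: returns the second at which the stream has stayed
/// below `min_bytes_per_sec` for `grace_secs` consecutive seconds, if ever.
fn first_stall(samples: &[u64], min_bytes_per_sec: u64, grace_secs: usize) -> Option<usize> {
    let mut below = 0usize;
    for (t, &bytes) in samples.iter().enumerate() {
        if bytes < min_bytes_per_sec {
            below += 1;
            if below >= grace_secs {
                return Some(t);
            }
        } else {
            // Any second at or above the threshold resets the grace timer.
            below = 0;
        }
    }
    None
}

fn main() {
    // Assumed workload: 3s of data, then an 8s penalty, observed for 30s.
    let samples = throttled_samples(3, 8, 50 * 1024 * 1024, 30);

    // With the observed 6s grace period, the 8s penalty trips the detector.
    println!("grace 6s:  {:?}", first_stall(&samples, 1, 6));

    // With the documented 20s grace period, the same stream survives.
    println!("grace 20s: {:?}", first_stall(&samples, 1, 20));
}
```

Under these assumed numbers, the stream is declared stalled during the first penalty window with a 6s grace period, while a 20s grace period lets it ride out every penalty.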
What did you expect?
Rate limiting by the storage service should not kill the download request (at least not within 6s).
What happened?
The download request failed with "streaming error".