Socket-level throttling for S3 read and write requests #81837
Conversation
Conversation
We spoke about that.
Addressing review:
Not addressed yet
I have found the problem with the tests: it's an unsigned integer overflow that happens if UPD. Interestingly enough, Copilot was able to find it as well. I should have asked it before figuring it out myself...
Pull Request Overview
This PR implements socket-level throttling for S3 read and write requests to address issues with coarse-grained throttling when request sizes exceed token bucket capacity. The change moves throttling from the S3 request level to individual socket operations, enabling better distribution of traffic consumption and more effective use of token bucket volumes.
- Refactors the throttler interface from `add()` to a `throttle()` method with improved blocking semantics
- Implements socket-level throttling in the Poco networking stack with a budget-based approach
- Integrates throttlers into HTTP connection pool and S3 operations via thread-local scopes
Reviewed Changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Common/Throttler.h | Updates throttler interface to use throttle() method instead of add() |
| src/Common/Throttler.cpp | Implements new throttling logic with blocking semantics and thread safety |
| src/IO/WriteBufferFromS3.cpp | Replaces request-level throttling with socket-level throttling scopes |
| src/IO/ReadBufferFromS3.cpp | Removes manual throttling calls, delegates to socket-level implementation |
| base/poco/Net/src/SocketImpl.cpp | Adds socket-level throttling with budget management |
| src/Common/CurrentThread.h | Adds throttling scope classes for read/write operations |
| src/Common/HTTPConnectionPool.cpp | Integrates throttlers into HTTP session management |
| Multiple buffer files | Updates throttler method calls from add() to throttle() |
With the help of integration test debugging, I was able to find and fix the rest of the test failures:
This is a bit strange. I checked all the
All failed tests are unrelated:
00002_log_and_exception_messages_formatting #84538
Depends on #81753 and should be reviewed after it is merged
Follows similar logic as #65182 and resolves the same problem: Coarse-grained throttling based on S3 request (or response) size does not work well when request sizes are larger than the token bucket size (usually 1 sec * max_speed B/sec). Throttling such large chunks leads to underutilization of real network bandwidth and idle-waiting of threads on throttlers for multiple seconds. Throttling individual socket operations instead of whole S3 requests solves the problem by distributing traffic consumption over a longer duration. This allows effective replenishment and use of the token bucket volume.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
S3 read and write requests are throttled on the HTTP socket level (instead of whole S3 requests) to avoid issues with `max_remote_read_network_bandwidth_for_server` and `max_remote_write_network_bandwidth_for_server` throttling.