Follow-up from #21562 (raised in review by @domiwei).
Background
#21562 made cl/sentinel/httpreqresp stream the peer response through a per-topic size cap instead of buffering it in the handler, so memory is now bounded by the per-request byte budget. That fixes the original unbounded-buffering OOM.
Remaining improvement
The full response is still materialized once, in service.requestPeer:
data, err := io.ReadAll(resp.Body)
...
return &sentinelproto.ResponseData{Data: data, ...}
For a large by_range request the budget is large (~ requested-count x per-chunk), so this is still a big single allocation; and ResponseData.Data is a single gRPC bytes field, so the whole response also crosses the sentinel->cl/rpc boundary as one blob before parseResponseData decodes it.
Proposal
Decode req/resp chunks incrementally rather than buffering the whole response:
- keep
max_response_bytes as a wire-budget guard, enforced by the counting reader already in place;
- decode chunks incrementally from the
io.Reader, rejecting as soon as the byte budget / chunk count / per-chunk size is exceeded;
- avoid the second full copy in
requestPeer — this likely requires the sentinel SendRequest/SendPeerRequest gRPC to become server-streaming, since ResponseData.Data is one bytes field today.
Peak memory would then be ~ one chunk + the decoded object, the right shape for multi-chunk req/resp.
Note
Not a live OOM — #21562 already bounds the allocation by the per-request budget. This is memory-shape hardening, hence a regular enhancement.
Follow-up from #21562 (raised in review by @domiwei).
Background
#21562 made
cl/sentinel/httpreqrespstream the peer response through a per-topic size cap instead of buffering it in the handler, so memory is now bounded by the per-request byte budget. That fixes the original unbounded-buffering OOM.Remaining improvement
The full response is still materialized once, in
service.requestPeer:For a large
by_rangerequest the budget is large (~ requested-count x per-chunk), so this is still a big single allocation; andResponseData.Datais a single gRPCbytesfield, so the whole response also crosses the sentinel->cl/rpcboundary as one blob beforeparseResponseDatadecodes it.Proposal
Decode req/resp chunks incrementally rather than buffering the whole response:
max_response_bytesas a wire-budget guard, enforced by the counting reader already in place;io.Reader, rejecting as soon as the byte budget / chunk count / per-chunk size is exceeded;requestPeer— this likely requires the sentinelSendRequest/SendPeerRequestgRPC to become server-streaming, sinceResponseData.Datais onebytesfield today.Peak memory would then be ~ one chunk + the decoded object, the right shape for multi-chunk req/resp.
Note
Not a live OOM — #21562 already bounds the allocation by the per-request budget. This is memory-shape hardening, hence a regular enhancement.