Allow configuring Serve control loop interval, add related docs #45063
Signed-off-by: Josh Karpel <josh.karpel@gmail.com>
  You can set an end-to-end timeout for HTTP requests by setting the `request_timeout_s` in the `http_options` field of the Serve config. HTTP Proxies will wait for that many seconds before terminating an HTTP request. This config is global to your Ray cluster, and it cannot be updated during runtime. Use [client-side retries](serve-best-practices-http-requests) to retry requests that time out due to transient failures.
+ ### Give the Serve Controller more time to process requests
Took the liberty of adding a section here in case others run into the same issue. Please feel free to reword as desired, not sure what level of detail you want here :)
- # How often to call the control loop on the controller.
- CONTROL_LOOP_PERIOD_S = 0.1
+ # How long to sleep between control loop cycles on the controller.
+ CONTROL_LOOP_INTERVAL_S = float(os.getenv("RAY_SERVE_CONTROL_LOOP_INTERVAL_S", 0.1))
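As a minimal sketch of what the new line does, the snippet below reads the same environment variable with the same default. The helper name `read_control_loop_interval` and the `environ` parameter are hypothetical and exist only to make the behavior easy to try; the variable name and the `0.1` default are taken from the diff.

```python
import os

# Name and default copied from the diff; everything else is illustrative.
ENV_VAR = "RAY_SERVE_CONTROL_LOOP_INTERVAL_S"
DEFAULT_INTERVAL_S = 0.1


def read_control_loop_interval(environ=os.environ):
    """Hypothetical helper mirroring the one-line expression in the diff.

    Returns the configured interval in seconds, falling back to the
    default when the variable is unset. A non-numeric value raises
    ValueError, just as float() does in the original expression.
    """
    return float(environ.get(ENV_VAR, DEFAULT_INTERVAL_S))


# Unset: the default applies.
print(read_control_loop_interval({}))
# Set: the string value is parsed as a float.
print(read_control_loop_interval({ENV_VAR: "0.5"}))
```

Because the constant in the diff is assigned at module level, the variable presumably has to be present in the environment of the process that imports it before Serve starts; that is an inference from the code shown, not something this PR states.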
I thought INTERVAL made more sense than PERIOD as the name, since it's the time between cycles, not a target for when the next cycle starts.
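To make the naming distinction concrete, here is a small sketch that computes when each cycle would start under the two interpretations. Both functions and their names are hypothetical illustrations, not code from Serve; the loop body durations are made-up inputs.

```python
def start_times_with_interval(body_durations, interval_s):
    """Sleep a fixed interval after each cycle's body finishes."""
    now, starts = 0.0, []
    for duration in body_durations:
        starts.append(now)
        # Next cycle begins only after the body AND the sleep.
        now += duration + interval_s
    return starts


def start_times_with_period(body_durations, period_s):
    """Aim to start a cycle every period_s seconds.

    If a body overruns the period, the next cycle starts right away.
    """
    now, starts = 0.0, []
    for duration in body_durations:
        starts.append(now)
        now = max(now + duration, now + period_s)
    return starts


bodies = [0.5, 0.5, 0.5]  # hypothetical time spent in each loop body
print(start_times_with_interval(bodies, 1.0))  # [0.0, 1.5, 3.0]
print(start_times_with_period(bodies, 1.0))    # [0.0, 1.0, 2.0]
```

With an interval, the gap between cycle starts grows with the time spent in the loop body; with a period, it stays fixed as long as the body fits inside it.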
  # Only actually scale the replicas if we've made this decision for
  # 'scale_up_consecutive_periods' in a row.
- if decision_counter > int(config.upscale_delay_s / CONTROL_LOOP_PERIOD_S):
+ if decision_counter > int(config.upscale_delay_s / CONTROL_LOOP_INTERVAL_S):
Seems like the interval is used in a few other places to count control loop cycles - am I breaking some assumption by allowing it to be configurable to some larger value (e.g., does this still make sense if the loop interval is large)?
I don't believe so -- but @zcin should confirm
I don't think this breaks any assumptions. If the upscale delay is shorter than the control loop interval, the time the controller sleeps between cycles already inherently "covers" the required delay, so this code still makes sense.
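A rough sketch of the arithmetic behind this reasoning, using only the comparison from the diff. The helpers `cycles_required` and `should_scale` are hypothetical names, and the delay and interval values are chosen for illustration.

```python
def cycles_required(upscale_delay_s, interval_s):
    """Threshold from the diff: the delay expressed in whole loop cycles."""
    return int(upscale_delay_s / interval_s)


def should_scale(decision_counter, upscale_delay_s, interval_s):
    """Mirror of `decision_counter > int(upscale_delay_s / interval)`."""
    return decision_counter > cycles_required(upscale_delay_s, interval_s)


# Short interval: many consecutive decisions are needed.
print(cycles_required(30.0, 0.5))    # 60
print(should_scale(60, 30.0, 0.5))   # False, not yet past the threshold
print(should_scale(61, 30.0, 0.5))   # True

# Interval longer than the delay: the threshold truncates to 0,
# so a single positive decision is enough.
print(cycles_required(30.0, 60.0))   # 0
print(should_scale(1, 30.0, 60.0))   # True
```

In the second case the quotient truncates to zero, so the check passes on the first counted decision, which matches the point that the sleep between cycles is already longer than the configured delay.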
Thanks for the quick reviews! Much appreciated!
Why are these changes needed?
In our experiments, adjusting this value upward helps the Serve Controller keep up with a large number of autoscaling metrics pushes from a large number of DeploymentHandles. Because the loop body is blocking, increasing the interval lets more other code run when the control loop isn't running. The cost is control loop responsiveness, since the loop doesn't run as often.
Related issue number
Closes #44784 ... for now!
Checks
git commit -s) in this PR.
scripts/format.sh to lint the changes in this PR.
method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.