fix(cody): use client-provided timeout for completions requests#63875
Conversation
There was a problem hiding this comment.
@rafax, is it the correct place to perform this check? Do we need to apply this check for all requests or only for those that are going to be sent to the Cody Gateway (as CODY-2775 suggests)?
There was a problem hiding this comment.
I think this is the correct place to perform this check in the Sourcegraph backend (where it will apply to Cody Gateway and BYOK/BYOLLM deployments). I'm not 100% sure if SG is responsible for the cancellation propagation bugs, so we may need to do the same thing in Cody Gateway.
There was a problem hiding this comment.
Completion client's Stream/Complete methods are called with this context (with timeout). So I expect the context cancelation to be handled by the corresponding method of a matching client returned by codyGatewayClient.clientForParams.
I'm new to this part of the codebase, so this may be a wrong assumption. WDYT, @rafax?
There was a problem hiding this comment.
IMO, we should apply this check for all requests, regardless of the completions provider it is sent to. It would be super-lame if the X-Timeout-Ms HTTP header was only honored by Cody Gateway, and completely ignored if the Sourcegraph backend was configured any other way.
There was a problem hiding this comment.
This PR implements the same X-Tineout-Ms header checks both on the Sourcegraph backend and on the Cody Gateway.
valerybugakov
left a comment
There was a problem hiding this comment.
The fast path support for this task is critical because most of the autocomplete traffic goes directly to Cody Gateway.
|
@rafax, @valerybugakov, I updated the PR to use the timeout value provided via the |
|
|
||
| // maxRequestDuration is the maximum amount of time a request can take before | ||
| // being cancelled as DeadlineExceeded. | ||
| const maxRequestDuration = 5 * time.Minute |
There was a problem hiding this comment.
Link to https://github.com/sourcegraph/infrastructure/blob/a697a5c2f3f01fc0886f4c6b6efec2ae5f40f1cb/cody-gateway/modules/cloudrun/cloudrun.tf#L67 that has some extra context?
There was a problem hiding this comment.
Overall, it pains me to see the default timeout so high - we should really default to a shorter timeout (10s) and make long timeouts opt-in, but I know it's not in scope of this change.
There was a problem hiding this comment.
We may want timeout per feature (chat/completions). WDYT, @chrsmith?
There was a problem hiding this comment.
Having different timeouts per-request might make sense... but even then, 5 minutes is still kinda bonkers high.
Like, for example, I'm not willing to wait five minutes to microwave a piece of pizza. Let alone to receive a chat completion. I'd drop this down all the way to a minute at max. And just keep the same timeout for both completions and chat until we have a more compelling reason to. (i.e. we find ourselves consistently hitting this timeout.)
But again, even at one minute it's still overly generous.
There was a problem hiding this comment.
Reduced the timeout to 1 minute for Cody Gateway.
8 minutes timeout remains unchanged for the Sourcegraph backend.
There was a problem hiding this comment.
Reduced the max request duration to 1 minute on the Sourcegraph backend too.
chrsmith
left a comment
There was a problem hiding this comment.
Minor nits, suggestions, and comments. But this all looks good to me as well.
|
|
||
| // maxRequestDuration is the maximum amount of time a request can take before | ||
| // being cancelled as DeadlineExceeded. | ||
| const maxRequestDuration = 5 * time.Minute |
There was a problem hiding this comment.
Having different timeouts per-request might make sense... but even then, 5 minutes is still kinda bonkers high.
Like, for example, I'm not willing to wait five minutes to microwave a piece of pizza. Let alone to receive a chat completion. I'd drop this down all the way to a minute at max. And just keep the same timeout for both completions and chat until we have a more compelling reason to. (i.e. we find ourselves consistently hitting this timeout.)
But again, even at one minute it's still overly generous.
There was a problem hiding this comment.
IMO, we should apply this check for all requests, regardless of the completions provider it is sent to. It would be super-lame if the X-Timeout-Ms HTTP header was only honored by Cody Gateway, and completely ignored if the Sourcegraph backend was configured any other way.
Part of [CODY-2775](https://linear.app/sourcegraph/issue/CODY-2775/%5Bautocomplete-latency%5D-apply-the-same-timeout-on-the-cody-gateway-side) Sends the `timeoutMs` config value to the backend using via `X-Timeout-Ms` header in both default and fast paths. Backend support added in https://github.com/sourcegraph/sourcegraph/pull/63875 ## Test plan Manually tested (using a debugger) alongside https://github.com/sourcegraph/sourcegraph/pull/63875 on both default and fast paths, using a locally running Sourcegraph instance and Cody Gateway. <!-- Required. See https://sourcegraph.com/docs/dev/background-information/testing_principles. -->
Closes CODY-2775
Enables client control over the request processing timeout on the server (both Sourcegraph backend and Cody Gateway). The context timeout is set to the value provided in the
X-Timeout-Msheader of the client request. If the header is not provided, the default context timeout is used (1 minute on both Sourcegraph backend and Cody Gateway).Previously, we only had a default timeout on the Sourcegraph backend side (8 minutes).
Corresponding client change:
Test plan
X-Timeout-Msheader, its value is used. If not, the default maximum request duration is applied.Changelog