Forward LLM completion in a sequential model, concurrently with filtering.#62111
Conversation
| func (a *completionsFilter) Send(ctx context.Context, e types.CompletionResponse) error { | ||
| if err := ctx.Err(); err != nil { | ||
| a.blockSending() | ||
| return err |
There was a problem hiding this comment.
Send short-circuiting the caller on context error is a reasonable behavior for any filter implementation. Introduced here for compatibility (via test cases) with new implementation.
| require.Equal(t, want, got) | ||
| } | ||
|
|
||
| func TestWaitDoneErr(t *testing.T) { |
There was a problem hiding this comment.
I dropped this test since now on timeout I actually error in Send as well as WaitDone. The test bed with o.replay(ctx, f) call also calls WaitDone which is a funnel that unifies various erroring conduits. Conclusion: Does not seem that relevant to specifically test WaitDone returning an error, and it would require surgical changes to the objects at play.
There was a problem hiding this comment.
Apologies for the large test diff, but I promise it's 95% indentation because of using a table test with bothImplementations - running the test for V1 and V2 completion filter. The only meaningful change is dropping one irrelevant (now) test that is pointed out below.
There was a problem hiding this comment.
This is the file to review.
|
OK this is now ready to land. |
| } | ||
|
|
||
| // attributionRunFilter implementation of CompletionsFilter that runs attribution search for snippets | ||
| // aboce certain threshold defined by `SnippetLowerBound`. |
There was a problem hiding this comment.
| // aboce certain threshold defined by `SnippetLowerBound`. | |
| // above certain threshold defined by `SnippetLowerBound`. |
There was a problem hiding this comment.
Approving the PR to test on S2 and verify for a while to see if the panic errors stop coming up again.
I will enable the feature flag and observe. IF the panics go away and don't show up in the next few weeks we can get rid of the old implementation.
Since there is really not a standard way to replicate the panic race condition of the failure on my local machine I am taking the test on S2 route.
Closes https://github.com/sourcegraph/sourcegraph/issues/60439
Part of https://github.com/sourcegraph/sourcegraph/issues/61828
Guardrails attribution is panicking the likely cause is a race between (a) attribution search (b) LLM responses being streamed back (c) request timeout and (d) sending back the last seen LLM response after search finishes.
This PR changes the implementation of an LLM filter to a simpler one - in hopes of making code so simple there are obviously no bugs in it, as opposed to code so complex that there are no obvious bugs in it (link).
The way it works is that the handler interacts with the filter via
SendandWaitDonecalls:Sendis invoked every time LLM streams back a completion prefix (which gets bigger and bigger)In practice that last part can happen in any order with:
One thing former implementation did was that it would memoize the last LLM completion that was not forwarded to the client. Then on the event of attribution search coming back with no results (no attribution = can serve completion) - it would concurrently to the request serving respond back with that last memoized piece.
Proposed implementation makes sure forwarding completions is contained only in the execution stack of the handler (so only in context of
SendandWaitDonecalls). In case of successful attribution result - we forward the latest unsent prefix withinWaitDonean not how it was done previously - synchronized, but concurrently.Hopefully this way we'll clearly capture request infra lifecycle and not attempt writing to a recycled writer.
Test plan