Forward LLM completion in a sequential model, concurrently with filtering. by cbart · Pull Request #62111 · sourcegraph/sourcegraph-public-snapshot

cbart · 2024-04-23T09:25:44Z

Closes https://github.com/sourcegraph/sourcegraph/issues/60439
Part of https://github.com/sourcegraph/sourcegraph/issues/61828

Guardrails attribution is panicking. The likely cause is a race between (a) the attribution search, (b) LLM responses being streamed back, (c) the request timeout, and (d) sending back the last seen LLM response after the search finishes.

This PR replaces the LLM filter implementation with a simpler one, in the hope of making the code so simple that there are obviously no bugs in it, as opposed to code so complex that there are no obvious bugs in it (link).

The handler interacts with the filter via Send and WaitDone calls:

Send is invoked every time the LLM streams back a completion prefix (which grows with each call):
initially the filter forwards all completion prefixes back to the client,
once a prefix reaches 10 newline characters, the filter pauses forwarding,
an attribution search for the smallest 10-line-long prefix is fired asynchronously,
when the search comes back with no attribution, we carry on sending the LLM completions.
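
To make the threshold step concrete, here is a minimal sketch of picking the smallest prefix that contains the required number of newlines. The function name `smallestSearchablePrefix` and the constant value are assumptions for illustration (the value 10 comes from the description above); this is not the code in this PR.

```go
package main

import "fmt"

// snippetLowerBound mirrors the 10-newline threshold from the description.
// The name and its use here are assumptions for this sketch.
const snippetLowerBound = 10

// smallestSearchablePrefix returns the shortest prefix of completion that
// contains n newline characters. It reports false when the completion has
// fewer than n newlines, in which case the filter would keep forwarding.
func smallestSearchablePrefix(completion string, n int) (string, bool) {
	if n <= 0 {
		return "", true
	}
	seen := 0
	for i := 0; i < len(completion); i++ {
		if completion[i] == '\n' {
			seen++
			if seen == n {
				// Include the n-th newline itself in the prefix.
				return completion[:i+1], true
			}
		}
	}
	return "", false
}

func main() {
	_, ok := smallestSearchablePrefix("only\ntwo\n", snippetLowerBound)
	fmt.Println("short completion reaches threshold:", ok)

	prefix, ok := smallestSearchablePrefix("a\nb\nc\nrest", 2)
	fmt.Printf("prefix=%q ok=%v\n", prefix, ok)
}
```

Because each streamed prefix extends the previous one, the prefix found this way stays the same for every later Send call, so the search only needs to be fired once.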

In practice that last step can happen in any order relative to:

the request being cancelled (and its serving resources cleaned up),
the LLM finishing streaming its response to the Sourcegraph instance.

One thing the former implementation did was memoize the last LLM completion that had not been forwarded to the client. Then, when the attribution search came back with no results (no attribution = can serve the completion), it would respond with that last memoized piece concurrently with the request being served.

The proposed implementation makes sure that forwarding completions happens only on the execution stack of the handler (that is, only within Send and WaitDone calls). When the attribution result is successful, we forward the latest unsent prefix within WaitDone, and not the way it was done previously: synchronized, but concurrent with the handler.
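
A minimal sketch of that idea follows, under stated assumptions: the type `sequentialFilter`, its fields, and the `search` callback signature are hypothetical and are not taken from `attribution_filter2.go`. For brevity the search receives the whole current prefix instead of the smallest 10-line one. The point it illustrates is that the search goroutine only reports a verdict on a buffered channel, while every client write happens inside Send or WaitDone.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// sequentialFilter is a hypothetical stand-in for the new filter.
type sequentialFilter struct {
	forward func(prefix string) error                         // client write; called only from Send/WaitDone
	search  func(ctx context.Context, s string) (bool, error) // assumed: true means no attribution found
	limit   int                                               // newline threshold
	verdict chan bool                                         // buffered, so the search goroutine never blocks
	started bool
	decided bool
	allowed bool
	unsent  *string // latest prefix held back while the search runs
}

func (f *sequentialFilter) Send(ctx context.Context, prefix string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	if !f.started {
		if strings.Count(prefix, "\n") < f.limit {
			return f.forward(prefix)
		}
		f.started = true
		// The goroutine only reports a verdict; it never touches the writer.
		go func() {
			ok, err := f.search(ctx, prefix)
			f.verdict <- ok && err == nil
		}()
	}
	if !f.decided {
		select {
		case f.allowed = <-f.verdict:
			f.decided = true
		default:
			f.unsent = &prefix // search still running: hold the latest prefix
			return nil
		}
	}
	if !f.allowed {
		return nil
	}
	f.unsent = nil
	return f.forward(prefix)
}

func (f *sequentialFilter) WaitDone(ctx context.Context) error {
	if f.started && !f.decided {
		select {
		case f.allowed = <-f.verdict:
			f.decided = true
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	if f.allowed && f.unsent != nil {
		last := *f.unsent
		f.unsent = nil
		return f.forward(last) // flushed on the handler's own stack
	}
	return nil
}

// runDemo replays three prefixes with a search that finishes only after
// the LLM is done streaming, and returns what reached the client.
func runDemo() []string {
	var sent []string
	release := make(chan struct{})
	f := &sequentialFilter{
		limit:   2,
		verdict: make(chan bool, 1),
		forward: func(p string) error { sent = append(sent, p); return nil },
		search: func(ctx context.Context, _ string) (bool, error) {
			select {
			case <-release:
				return true, nil
			case <-ctx.Done():
				return false, ctx.Err()
			}
		},
	}
	ctx := context.Background()
	_ = f.Send(ctx, "a")
	_ = f.Send(ctx, "a\nb\n")
	_ = f.Send(ctx, "a\nb\nc")
	close(release)
	_ = f.WaitDone(ctx)
	return sent
}

func main() {
	fmt.Printf("%q\n", runDemo())
}
```

In this sketch the filter state needs no mutex because only the handler goroutine reads or writes it; the channel is the single point of contact with the search.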

Hopefully this way we clearly capture the lifecycle of the request-serving infrastructure and never attempt to write to a recycled writer.

Test plan

Existing unit tests
Manual testing

cbart · 2024-04-25T13:40:50Z

 func (a *completionsFilter) Send(ctx context.Context, e types.CompletionResponse) error {
 	if err := ctx.Err(); err != nil {
 		a.blockSending()
+		return err


Having Send short-circuit the caller on a context error is reasonable behavior for any filter implementation. It is introduced here for compatibility (via test cases) with the new implementation.
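
For illustration, the same short-circuit on a toy filter. `passThroughFilter` and `response` are hypothetical types made up for this sketch, not the `completionsFilter` or `types.CompletionResponse` from the diff; the only thing carried over is the `ctx.Err()` check at the top of Send.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// response is a hypothetical stand-in for a streamed completion event.
type response struct{ Completion string }

// passThroughFilter forwards every response unless the context is done.
type passThroughFilter struct {
	forward func(response) error
}

func (f *passThroughFilter) Send(ctx context.Context, r response) error {
	// Stop early once the request is cancelled or timed out, so the caller
	// sees the error and nothing is written to the client.
	if err := ctx.Err(); err != nil {
		return err
	}
	return f.forward(r)
}

// sendAfterCancel sends once on a live context and once after cancelling.
// It returns how many responses were forwarded and the second Send's error.
func sendAfterCancel() (forwarded int, err error) {
	f := &passThroughFilter{forward: func(response) error {
		forwarded++
		return nil
	}}
	ctx, cancel := context.WithCancel(context.Background())
	if err := f.Send(ctx, response{"partial"}); err != nil {
		cancel()
		return forwarded, err
	}
	cancel()
	err = f.Send(ctx, response{"partial completion"})
	return forwarded, err
}

func main() {
	n, err := sendAfterCancel()
	fmt.Println("forwarded:", n, "cancelled:", errors.Is(err, context.Canceled))
}
```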

cbart · 2024-04-25T13:57:40Z

-	require.Equal(t, want, got)
-}
-
-func TestWaitDoneErr(t *testing.T) {


I dropped this test since, on timeout, I now return an error from Send as well as from WaitDone. The test bed's o.replay(ctx, f) call also calls WaitDone, which is a funnel that unifies the various error paths. Conclusion: it does not seem that relevant to specifically test WaitDone returning an error, and doing so would require surgical changes to the objects at play.

cbart · 2024-04-25T13:59:01Z

Apologies for the large test diff, but I promise it's 95% indentation, caused by using a table test with bothImplementations, which runs the test for both the V1 and V2 completion filters. The only meaningful change is dropping one now-irrelevant test, which is pointed out below.

cbart · 2024-04-25T19:38:43Z

This is the file to review.

cbart · 2024-04-26T10:25:49Z

OK this is now ready to land.

arafatkatze · 2024-04-26T10:47:21Z

+}
+
+// attributionRunFilter implementation of CompletionsFilter that runs attribution search for snippets
+// aboce certain threshold defined by `SnippetLowerBound`.


Suggested change

-// aboce certain threshold defined by `SnippetLowerBound`.
+// above certain threshold defined by `SnippetLowerBound`.

arafatkatze

Approving the PR so we can test it on S2 and observe for a while whether the panics stop coming up.

I will enable the feature flag and observe. If the panics go away and do not show up in the next few weeks, we can get rid of the old implementation.

Since there is no standard way to reproduce the panic race condition on my local machine, I am taking the test-on-S2 route.

Tests broken

ab477a6

cla-bot Bot added the cla-signed label Apr 23, 2024

cbart added 2 commits April 25, 2024 15:34

Tests run OK

5e33936

Use guardrails v2 completions handler behind feature flag

b6991d6

cbart requested review from a team and keegancsmith April 25, 2024 13:39

cbart marked this pull request as ready for review April 25, 2024 13:39

Add some comments

bcee165

cbart requested a review from arafatkatze April 25, 2024 13:59

BAZEL fix

aa20dab

Comment thread internal/guardrails/attribution_filter2.go

cbart and others added 5 commits April 25, 2024 21:53

Update internal/guardrails/attribution_filter2.go

f600d55

gofmt

0a9e8a3

Merge branch 'cb/fix-guardrails' of github.com:sourcegraph/sourcegraph into cb/fix-guardrails

bc1c52d

Sync main

0087c9c

Sync main

2f5f3cb

cbart mentioned this pull request Apr 26, 2024

Cody Attributions panic #60439

Closed

arafatkatze reviewed Apr 26, 2024

arafatkatze approved these changes Apr 26, 2024

arafatkatze merged commit d10d4f0 into main Apr 26, 2024

arafatkatze deleted the cb/fix-guardrails branch April 26, 2024 11:54

arafatkatze merged 10 commits into main from cb/fix-guardrails
