Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Forward LLM completion in a sequential model, concurrently with filtering.#62111

Merged
arafatkatze merged 10 commits into
mainfrom
cb/fix-guardrails
Apr 26, 2024
Merged

Forward LLM completion in a sequential model, concurrently with filtering.#62111
arafatkatze merged 10 commits into
mainfrom
cb/fix-guardrails

Conversation

@cbart

@cbart cbart commented Apr 23, 2024

Copy link
Copy Markdown
Contributor

Closes https://github.com/sourcegraph/sourcegraph/issues/60439
Part of https://github.com/sourcegraph/sourcegraph/issues/61828

Guardrails attribution is panicking the likely cause is a race between (a) attribution search (b) LLM responses being streamed back (c) request timeout and (d) sending back the last seen LLM response after search finishes.

This PR changes the implementation of an LLM filter to a simpler one - in hopes of making code so simple there are obviously no bugs in it, as opposed to code so complex that there are no obvious bugs in it (link).

The way it works is that the handler interacts with the filter via Send and WaitDone calls:

  • Send is invoked every time LLM streams back a completion prefix (which gets bigger and bigger)
  • initially the filter forwards all completion prefixes back to the client,
  • internally the filter fires an attribution search once prefix reaches 10 newline characters and pauses forwarding,
  • an attribution search with the smallest 10-line-long prefix is fired async,
  • when the search comes back, we carry on sending the LLM completions.

In practice that last part can happen in any order with:

  • request being cancelled (and serving resources cleaned up)
  • LLM finishing streaming response to the sourcegraph instance

One thing former implementation did was that it would memoize the last LLM completion that was not forwarded to the client. Then on the event of attribution search coming back with no results (no attribution = can serve completion) - it would concurrently to the request serving respond back with that last memoized piece.

Proposed implementation makes sure forwarding completions is contained only in the execution stack of the handler (so only in context of Send and WaitDone calls). In case of successful attribution result - we forward the latest unsent prefix within WaitDone an not how it was done previously - synchronized, but concurrently.

Hopefully this way we'll clearly capture request infra lifecycle and not attempt writing to a recycled writer.

Test plan

  • Existing unit tests
  • Manual testing

@cla-bot cla-bot Bot added the cla-signed label Apr 23, 2024
@cbart cbart requested review from a team and keegancsmith April 25, 2024 13:39
@cbart cbart marked this pull request as ready for review April 25, 2024 13:39
func (a *completionsFilter) Send(ctx context.Context, e types.CompletionResponse) error {
if err := ctx.Err(); err != nil {
a.blockSending()
return err

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Send short-circuiting the caller on context error is a reasonable behavior for any filter implementation. Introduced here for compatibility (via test cases) with new implementation.

require.Equal(t, want, got)
}

func TestWaitDoneErr(t *testing.T) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped this test since now on timeout I actually error in Send as well as WaitDone. The test bed with o.replay(ctx, f) call also calls WaitDone which is a funnel that unifies various erroring conduits. Conclusion: Does not seem that relevant to specifically test WaitDone returning an error, and it would require surgical changes to the objects at play.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the large test diff, but I promise it's 95% indentation because of using a table test with bothImplementations - running the test for V1 and V2 completion filter. The only meaningful change is dropping one irrelevant (now) test that is pointed out below.

@cbart cbart requested a review from arafatkatze April 25, 2024 13:59

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the file to review.

Comment thread internal/guardrails/attribution_filter2.go Outdated
@cbart

cbart commented Apr 26, 2024

Copy link
Copy Markdown
Contributor Author

OK this is now ready to land.

@cbart cbart mentioned this pull request Apr 26, 2024
}

// attributionRunFilter implementation of CompletionsFilter that runs attribution search for snippets
// aboce certain threshold defined by `SnippetLowerBound`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// aboce certain threshold defined by `SnippetLowerBound`.
// above certain threshold defined by `SnippetLowerBound`.

@arafatkatze arafatkatze left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the PR to test on S2 and verify for a while to see if the panic errors stop coming up again.

I will enable the feature flag and observe. IF the panics go away and don't show up in the next few weeks we can get rid of the old implementation.

Since there is really not a standard way to replicate the panic race condition of the failure on my local machine I am taking the test on S2 route.

@arafatkatze arafatkatze merged commit d10d4f0 into main Apr 26, 2024
@arafatkatze arafatkatze deleted the cb/fix-guardrails branch April 26, 2024 11:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cody Attributions panic

2 participants