Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Return 'sourcegraph' as the CodyLLMConfigurationResolver.Provider#64276

Merged
chrsmith merged 4 commits into
mainfrom
chrsmith/PRIME-426/fix-autocomplete
Aug 6, 2024
Merged

Return 'sourcegraph' as the CodyLLMConfigurationResolver.Provider#64276
chrsmith merged 4 commits into
mainfrom
chrsmith/PRIME-426/fix-autocomplete

Conversation

@chrsmith

@chrsmith chrsmith commented Aug 5, 2024

Copy link
Copy Markdown
Contributor

This PR fixes Cody autocomplete in some situations, fixing PRIME-426.

Our story begins with this PR: https://github.com/sourcegraph/sourcegraph/pull/63886. The PR updated the CodyLLMConfigurationResolver.Provider GraphQL endpoint to no longer return the "provider" that was put into the site configuration, but to instead return the display name of the provider, or if multiple were configured the string "various".

- func (c *codyLLMConfigurationResolver) Provider() string {
- 	return string(c.config.Provider)
- }
+ func (c *codyLLMConfigurationResolver) Provider() string {
+	if len(r.modelconfig.Providers) != 1 {
+		return "various"
+	}
+	return r.modelconfig.Providers[0].DisplayName
+}

This change was wrong on several levels, and unfortunately stemmed from the original author (me) not tracking down and understanding how the GraphQL endpoint was actually used.

Once we discovered the problem, we quickly rectified this by changing the behavior to the more correct version of returning the Provider ID of the code completion model with https://github.com/sourcegraph/sourcegraph/pull/64165:

func (r *codyLLMConfigurationResolver) Provider() string {
	return string(r.modelconfig.DefaultModels.CodeCompletion.ProviderID())
}

However, after some more testing we discovered yet another edge case. We didn't realize that the provider "sourcegraph" (which is how you configure Sourcegraph to use Cody Gateway, using the older site config method) was required in some scenarios on the Cody client.

So the new logic, of returning the Provider ID was incorrect. (Because we report the model provider, e.g. "anthropic". Not the API provider, e.g. "sourcegraph"/"Cody Gateway".)

With this change, we should just have the behavior we had initially: returning whatever the admin had configured in the "completions.provider" section of the site config. If only the newer modelconfig settings were used, we return "sourcegraph" if using Cody Gatway. And if not, just return the Provider ID. (Though in some situations even that will lead to incorrect results, because ultimately we need to update the client here.)

Test plan

I wasn't able to test this super-well manually, and am relying on Stephen and Taras' knowledge of how the client uses this today.

Changelog

NA

@cla-bot cla-bot Bot added the cla-signed label Aug 5, 2024
@emidoots

emidoots commented Aug 5, 2024

Copy link
Copy Markdown
Member

I was able to confirm locally this appears to be returning the right results in the case of "completions" with a "sourcegraph" provider:

image

I wasn't able to confirm autocomplete actually works with that configuration locally, because of a different issue (probably the one you ran into, screenshot below) - but suspect it will work once deployed to non-dev environments.

image

@taras-yemets

Copy link
Copy Markdown
Contributor

Thanks for looking into it, Chris!

To test these changes we need to do the following:

  • sourcegraph/sourcegraph
    • Run sg start with the proper config in dev-private (we may start with the default one from the main branch).
  • sourcegraph/cody
    • Make resolveDefaultModelFromVSCodeConfigOrFeatureFlags always return null (so that we only rely on the config from the server).
    • Remove this.isLocalInstance check so that we don't use the fast path for "fireworks" (default provider) - it didn't work for me when connected ti the enterprise instance. I think found why, but anyway it's out of scope of this PR.
    • Launch VS Code extension from the debug menu (F5 or "Launch VS Code extension (Desktop, recommended).
    • In the new window opened by debugger ensure you're connected to the local instance and check if completions work.

@taras-yemets taras-yemets left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my recollection and understanding of the client code, a model name needs to be prefixed with the provider only when using an umbrella provider like “sourcegraph”. If the “provider” field in the site configuration is set to a specific LLM provider, the model name is not prefixed with the provider name.

Consider these configs:

{ provider: "sourcegraph", model: "fireworks/starcoder"}

and

{provider: "anthropic", model: "claude-instant-v1"

I’m not entirely certain, but I suspect that if the provider is not “sourcegraph,” the model name should not include the provider prefix. In other words, when the provider is “anthropic,” for example, the client likely expects “claude-instant-v1” as the completionModel, rather than “anthropic/claude-instant-v1”.
So we may need to update the logic for the CompletionModel resolver.

@taras-yemets

Copy link
Copy Markdown
Contributor

I wasn't able to confirm autocomplete actually works with that configuration locally, because of a different issue (probably the one you ran into, screenshot below) - but suspect it will work once deployed to non-dev environments.

@slimsag, removing isLocalInstance check https://github.com/sourcegraph/sourcegraph/pull/64276#issuecomment-2269936906 helped me when running this locally.

// just use that. This matches the behavior prior to modelconfig being used, and will support
// the case where "sourcegraph" is the configured provider.
if r.doNotUseCompletionsConfig != nil {
return r.doNotUseCompletionsConfig.Provider

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may(?) have an issue here: if both "completions" and "modelConfiguration" are specified, then doNotUseCompletionsConfig would be != nil and we currently don't have any site config validation that screams about both being set as an error

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that that would be a problem, if you have both "completions" and "modelConfiguration" specified then if the two disagree about the API provider for code completion, we'd return the wrong one.

... but I'm not sure if we care enough to fix this? Or that this is expected to be a legitimate problem?

I think the right solution for this would be to add the site config validator to bark if you have provided two different configuration blobs for Cody Enterprise. However, I don't know if we want to make that change ~a day before cutting the August release.

So I'd be able to be convinced otherwise, but I think this fall into the "won't fix" category. What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I think we should do:

  1. Do NOT introduce a site config validator for this case, because this complicates cloud rollout and is risky to do ~a day before the Aug release as you noted.
  2. Make sure this condition properly respects "modelConfiguration" NOT "completions" if both are specified.

I think you can achieve this by changing the conditional:

-if r.doNotUseCompletionsConfig != nil {
+if r.doNotUseCompletionsConfig != nil && !conf.UseExperimentalModelConfiguration() {

The reason I think we need to handle this now / in this PR is because this will introduce another variable - where before we can expect that if you have both specified, "completions" would be ignored.. after this PR if you have both specified 'weird things may happen' which is not good

@emidoots

emidoots commented Aug 5, 2024

Copy link
Copy Markdown
Member

I’m not entirely certain, but I suspect that if the provider is not “sourcegraph,” the model name should not include the provider prefix. In other words, when the provider is “anthropic,” for example, the client likely expects “claude-instant-v1” as the completionModel, rather than “anthropic/claude-instant-v1”.
So we may need to update the logic for the CompletionModel resolver.

@taras-yemets good catch. I think(?) that may only affect older Cody clients connecting to newer Sourcegraph instances - which absolutely happen.

@taras-yemets

Copy link
Copy Markdown
Contributor

I think(?) that only affect older Cody clients connecting to newer Sourcegraph instances

@slimsag, this branch of code is from current main. If I understand the parsing logic correctly, it's not able to parse { provider: "anthropic", modelId: "anthropic/claude-instant-v1" } properly: it'll just return the params as they are, and later the request to the upstream will probably fail as the model name/id "anthropic/claude-instant-v1" may not be found.

@emidoots

emidoots commented Aug 5, 2024

Copy link
Copy Markdown
Member

@taras-yemets note parseProviderAndModel is only called if ModelsService.getDefaultModel(ModelUsage.Autocomplete) doesn't return a model (code). I need to check which condition that returns a model or not.. but I think it's an old vs. new server split?

@taras-yemets

Copy link
Copy Markdown
Contributor

@slimsag, you're right! I missed that condition.

@taras-yemets

Copy link
Copy Markdown
Contributor

Cody for VS Code has a special case (delimiter) for "aws-bedrock" provider, but the new LLM config resolution returns provider "anthropic" when "aws-bedrock" is set as provider in the site config, so that special case won't work on clients when connected to the Sourcegraph instance with the new version.

☝🏻 one more observation for the "old" way to set up Cody in the site config (from PRIME-426)

@chrsmith

chrsmith commented Aug 5, 2024

Copy link
Copy Markdown
Contributor Author

@taras-yemets , @slimsag it isn't clear from the product documentation. But it does make sense that if you are using "completions" { "provider": "sourcegraph" } then you would need to qualify model names, e.g. "chatModel": "anthropic/claude-v1"

https://sourcegraph.com/docs/cody/core-concepts/cody-gateway#configuring-custom-models

I'll update this PR to qualify model names in those situations... does that sound right to you two? Or perhaps we should start a Slack thread or something?


Ugh, yes. Here's a crystal clear example from what looks to be like HTTP traffic recordings in the cody repo.
https://github.com/sourcegraph/cody/blob/066d9c6ff48beb96a834f17021affc4e62094415/agent/recordings/autocomplete_2136217965/recording.har.yaml#L[…]15

I'll update that PR to just serve that older behavior.

  • If you are using the older "completions" config, we'll just return the provider or model name directly as-is. (So if you entered "anthropic/claude-v1" as the chat mode, that's what we send to clients.)
  • If you are using the newer "modelConfiguration" config, then it'll require some more effort. But until we update Cody clients to use the newer style model data natively, we will want to keep older versions of the Cody client working as-is.

@emidoots emidoots left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're on the right track here. General approval to unblock, feel free to merge this whenever you please. Whatever you land on will be more correct than what is in main and will unblock testing, so I'm happy to review post-merge.

@chrsmith

chrsmith commented Aug 6, 2024

Copy link
Copy Markdown
Contributor Author

Merging to get these fixes in, but please take a look when you get in tomorrow @slimsag and @taras-yemets .

Rather than just complicate things, I just implemented two different GraphQL resolvers. One that matches the older behavior, which is used if you are only setting the older "completions" site config. (So there should not be any perceivable change for existing Cody Enterprise customers.)

The other resolver is only used if the "modelConfiguration" site config data is provided. And there we do all sorts of checks to refer to the API Provider (e.g. "aws-bedrock" or "sourcegraph" instead of the Model Provider like "anthropic".) Also, we return the model name instead of the model ID. Which is a small distinction, but hopefully something that will avoid errors because of a mismatch between what the client expects and the newer system.

I also added some unit tests to assert the behavior that we have in case we need to tweak this again in the future.

@chrsmith chrsmith merged commit a310035 into main Aug 6, 2024
@chrsmith chrsmith deleted the chrsmith/PRIME-426/fix-autocomplete branch August 6, 2024 02:15
@chrsmith

chrsmith commented Aug 6, 2024

Copy link
Copy Markdown
Contributor Author

With the unit tests and some manual testing from Aravind's doc I feel good enough about this to submit. Will continue to do more testing and see if there are more corner cases to account for.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants