self hosted models #63899
Conversation
…openaicompatible Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
chrsmith
left a comment
I see what you have going on here, and have some high-level notes about the direction. But I'm happy to approve it so you can keep moving forward on this.
Though it might be nice to put some stakes in the ground and plan on unifying the "generic provider x openai" with "openaicompatible". As right now there is { azureopenai, generic/openai, openaicompatible } and that is definitely not a good position to be in.
... but we probably aren't ready to say if or when the "openai compatible" configuration knobs can be tuned in such a way that the codepath is a drop-in replacement for our completions/client/openai package.
WDYT?
"items": {
  "type": "string",
  "enum": [
    "bigcode::v1::starcoder2-3b",
Just so I understand, this would be a "breaking change", since any existing site configs that were using some of these model references would start to fail?
So maybe a comment in the JSON file (if site.schema.json supports that non-standard JSON extension), to call that out?
It is technically a breaking change, yes. However, since we haven't documented or shared this configuration with anyone (and, if they did find and set it on their own already.. I am pretty sure their instance would not be functional) - I feel comfortable removing it.
Unfortunately we don't have doc comments in site.schema.json - just description fields.
"fireworks",
"google",
"openai",
"huggingface-tgi",
Maybe this is intentional, but this looks wrong.
Do we want to expose the "type" here and introduce huggingface-tgi as a "new form" of server-side provider config? I would have thought that we'd want an admin to express this with something like:
serverSideProviderConfig: {
type: "openaicompatible",
serviceProviderName: "huggingface-tgi",
...
}
It isn't a big deal, but again, I'd love for us to avoid the "sprawl" associated with making it look like we support N-different top-level providers. When in actuality, it's a smaller number. And "huggingface-tgi" is just a specific set of defaults/overrides on top of "openaicompatible".
Right?
update: we chatted about this in our zoom call earlier today, and agreed this is okay because we really are trying to communicate that this is a top-level provider, well-supported, well-tested, etc.
The intention is that:
- openaicompatible provider is the 'super generic' one, that can connect to arbitrary things.
- huggingface-tgi is very specific, tested by us, etc. - it just happens to use the same implementation as the openaicompatible provider.
  "pointer": true
},
"default": null,
"description": "Advanced configuration options that are only respected if the model is provided by an openaicompatible provider.",
This is good to call out. But two things:
- I'll mention the confusion around "openaicompatible provider" and the "huggingface-tgi" type of provider.
- I am 100% certain we'll have some sort of generic "ClientSideModelOverrides" type that will overlap considerably with this. (So we can run experiments, or fine-tune things, etc.)
So this is fine as-is, but just calling out where I think things will evolve.
package openaicompatible

// openAIChatCompletionsRequestParameters request object for openAI chat endpoint https://platform.openai.com/docs/api-reference/chat/create
type openAIChatCompletionsRequestParameters struct {
This is following the pattern we use for all of our other completion API providers. But having the data types not be exported makes it more difficult to test things in the frontend/internal/httpapi/completions package.
So if you are feeling saucy, we may want to export the data types from this package, so we can have integration tests that use the actual request/response types elsewhere. (Or not, just something to consider.)
arafatkatze
left a comment
I took a long look at the PR and I didn't find any major issues but I do have questions:
- Will you eventually be enabling Hugging Face TGI so anyone can test this PR too? Right now I don't have a reference point to compare this with besides the OpenAI code. I think the devil is in the details, and unless I can actually run it, get the response back to Cody, and step through it in a debugger, it would be difficult for me to confidently say that this works great.
- When you say "OpenAI compatible", does that mean that both the request format and the response format are EXACTLY the same as OpenAI's? Even if you send the same form of request that you would send to the OpenAI endpoint, the functionality you need to read the SSE events coming back from the API might be slightly different, and if that were the case it would be harder to use the same code. (This is exactly why I had issues with Claude in Google Vertex: it was similar to the Anthropic API but not exactly the same, so I had to create separate code.) You mention in the description that you have run this on a few models already, so I assume the answer is yes, but I wanted to confirm.
Yep, I fully agree with this.
@arafatkatze thanks for taking a look, appreciate it!
But for software that doesn't advertise 'we follow OpenAI-API specification', then this provider isn't intended for that.
These are the initial client changes needed for Cody Enterprise customers to be able to use self-hosted models. Specifically, this PR pairs with https://github.com/sourcegraph/sourcegraph/pull/63899

The first commit here is mostly changes @jamesmcnamara made to help me wire through the `Model` to the actual location where we write autocomplete providers.

The following commits are me introducing a new generic `openaicompatible` autocomplete provider, which will supersede/replace the older experimental one which was purely client-side.

Combined with https://github.com/sourcegraph/sourcegraph/pull/63899 - these changes are enough to have at least Chat and Autocomplete working fairly well using a self-hosted Starcoder2 and Mixtral 8x7b Instruct model.

In this PR, the autocomplete provider is mostly tailored towards models in those families (starcoder and mixtral/mistral instruct models) - to unblock some customer use-cases - but all of the `Model` `clientSideConfiguration` options piped down to this autocomplete provider should be enough to generalize this approach to any model in the future.

In a follow-up PR, I will begin making use of that `clientSideConfiguration` to enable using other self-hosted models and other use-cases (letting the site admin customize prompting, etc.)

## Test plan

* The first commit from James has nice unit tests.
* The new autocomplete provider doesn't yet, I will add these in a future PR. For now, it is manually tested on my end using:
  * A local dev Sourcegraph instance configured as described in https://github.com/sourcegraph/sourcegraph/pull/63899
  * A remote server hosting 2 huggingface TGI Docker containers, with starchat2 and mixtral models.
  * A local dev build of VS Code Cody.

---------

Signed-off-by: jamesmcnamara <james.mcnamara@sourcegraph.com>
Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
Co-authored-by: jamesmcnamara <james.mcnamara@sourcegraph.com>
This PR is stacked on top of all the prior work @chrsmith has done for shuffling configuration data around; it implements the new "Self hosted models" functionality.
Configuration
Configuring a Sourcegraph instance to use self-hosted models basically involves adding some configuration like this to the site config (if you set `modelConfiguration`, you are opting in to the new system, which is in early access):

More advanced configurations are possible; the above is our blessed configuration for today.
Hosting models
Another major component of this work is starting to build up recommendations around how to self-host models, which ones to use, how to configure them, etc.
For now, we've been testing with these two on a machine with dual A100s:
- `bigcode/starcoder2-15b` with `eetq` 8-bit quantization.
- `casperhansen/mixtral-instruct-awq` which uses `awq` 4-bit quantization.

This is our 'starter' configuration. Other models - specifically other starcoder 2, and mixtral instruct models - certainly work too, and higher parameter versions may of course provide better results.
Documentation for how to deploy Huggingface TGI, suggested configuration and debugging tips - coming soon.
Advanced configuration
As part of this effort, I have added a quite extensive set of configuration knobs to the client side model configuration (see `type ClientSideModelConfigOpenAICompatible` in this PR).

Some of these configuration options are needed for things to work at a basic level, while others (e.g. prompt customization) are not needed for basic functionality, but are very important for customers interested in self-hosting their own models.
Today, Cody clients have a number of different autocomplete provider implementations which tie model-specific logic to enable autocomplete, to a provider. For example, if you use a GPT model through Azure OpenAI, the autocomplete provider for that is entirely different from what you'd get if you used a GPT model through OpenAI officially. This can lead to some subtle issues for us, and so it is worth exploring ways to have a generalized autocomplete provider - and since with self-hosted models we must address this problem, these configuration knobs fed to the client from the server are a pathway to doing that - initially just for self-hosted models, but in the future possibly generalized to other providers.
Debugging facilities
Working with customers in the past to use OpenAI-compatible APIs, we've learned that debugging can be quite a pain. If you can't see what requests the Sourcegraph backend is making, and what it is getting back, problems are very hard to track down.
This PR implements quite extensive logging, and a `debugConnections` flag which can be turned on to enable logging of the actual request payloads and responses. This is critical when a customer is trying to add support for a new model, their own custom OpenAI API service, etc.

Robustness
Working with customers in the past, we also learned that various parts of our backend `openai` provider were not super robust. For example, if more than one message was present it was a fatal error, or if the SSE stream yielded `{"error"}` payloads, they would be ignored. Similarly, the SSE event stream parser we use is heavily tailored towards the exact response structure which OpenAI's official API returns, and is therefore quite brittle if connecting to a different SSE stream.

For this work, I have started by forking our `internal/completions/client/openai` - and made a number of major improvements to it to make it more robust, handle errors better, etc.

I have also replaced the usage of a custom SSE event stream parser - which was not spec compliant and brittle - with a proper SSE event stream parser that recently popped up in the Go community: https://github.com/tmaxmax/go-sse
My intention is that after more extensive testing, this new `internal/completions/client/openaicompatible` provider will be more robust, more correct, and all around better than `internal/completions/client/openai` (and possibly the azure one) so that we can just supersede those with this new `openaicompatible` one entirely.

Client implementation
Much of the work done in this PR is just "let the site admin configure things, and broadcast that config to the client through the new model config system."
Actually getting the clients to respect the new configuration is a task I am tackling in future `sourcegraph/cody` PRs.

Test plan
`debugConnections` mechanism (which customers would use) to directly confirm requests are going to the right places, with the right data and payloads.

Can we use more testing? Hell yeah, and I'm going to add it soon. Does it work quite well and have small room for error? Also yes.
Changelog
Cody Enterprise: added a new configuration for self-hosting models. Reach out to support if you would like to use this feature as it is in early access.