This repository was archived by the owner on Sep 30, 2024. It is now read-only.

self hosted models#63899

Merged
emidoots merged 34 commits into
mainfrom
sg/self-hosted-models
Jul 19, 2024

@emidoots

Member

This PR is stacked on top of all the prior work @chrsmith has done for shuffling configuration data around; it implements the new "Self hosted models" functionality.

Configuration

Configuring a Sourcegraph instance to use self-hosted models involves adding configuration like the following to the site config. Setting modelConfiguration opts you in to the new model configuration system, which is in early access:

  // Setting this field means we are opting into the new Cody model configuration system.
  "modelConfiguration": {
    // Disable use of Sourcegraph's servers for model discovery
    "sourcegraph": null,

    // Create two model providers
    "providerOverrides": [
      {
        // Our first model provider "mistral" will be a Huggingface TGI deployment which hosts our
        // mistral model for chat functionality.
        "id": "mistral",
        "displayName": "Mistral",
        "serverSideConfig": {
          "type": "huggingface-tgi",
          "endpoints": [{"url": "https://mistral.example.com/v1"}]
        },
      },
      {
        // Our second model provider "bigcode" will be a Huggingface TGI deployment which hosts our
        // bigcode/starcoder model for code completion functionality.
        "id": "bigcode",
        "displayName": "Bigcode",
        "serverSideConfig": {
          "type": "huggingface-tgi",
          "endpoints": [{"url": "http://starcoder.example.com/v1"}]
        }
      }
    ],

    // Make these two models available to Cody users
    "modelOverridesRecommendedSettings": [
      "mistral::v1::mixtral-8x7b-instruct",
      "bigcode::v1::starcoder2-7b"
    ],

    // Configure which models Cody will use by default
    "defaultModels": {
      "chat": "mistral::v1::mixtral-8x7b-instruct",
      "fastChat": "mistral::v1::mixtral-8x7b-instruct",
      "codeCompletion": "bigcode::v1::starcoder2-7b"
    }
  }

More advanced configurations are possible; the above is our blessed configuration for today.
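The model references in the configuration above appear to follow a three-part `provider::version::model` shape. As a rough illustration (not the code in this PR), the sketch below splits such a reference into its parts. The `ModelRef` type, its field names, and the reading of the middle segment as an API version are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelRef is a hypothetical type used only for this illustration.
type ModelRef struct {
	ProviderID string // e.g. "mistral"; matches a provider "id" in providerOverrides
	APIVersion string // e.g. "v1"; assumed meaning of the middle segment
	ModelID    string // e.g. "mixtral-8x7b-instruct"
}

// ParseModelRef splits a reference such as "bigcode::v1::starcoder2-7b"
// into its three "::"-separated parts and rejects malformed input.
func ParseModelRef(s string) (ModelRef, error) {
	parts := strings.Split(s, "::")
	if len(parts) != 3 {
		return ModelRef{}, fmt.Errorf("model reference %q: want 3 parts separated by \"::\", got %d", s, len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return ModelRef{}, fmt.Errorf("model reference %q: empty part", s)
		}
	}
	return ModelRef{ProviderID: parts[0], APIVersion: parts[1], ModelID: parts[2]}, nil
}

func main() {
	ref, err := ParseModelRef("mistral::v1::mixtral-8x7b-instruct")
	if err != nil {
		panic(err)
	}
	fmt.Printf("provider=%s version=%s model=%s\n", ref.ProviderID, ref.APIVersion, ref.ModelID)
}
```

The provider part is what ties a model back to one of the entries in providerOverrides, so a typo there is the first thing worth checking when a model does not show up.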

Hosting models

Another major component of this work is starting to build up recommendations around how to self-host models, which ones to use, how to configure them, etc.

For now, we've been testing with the following setup on a machine with dual A100s:

  • Huggingface TGI (this is a Docker container for model inference, which provides an OpenAI-compatible API - and is widely popular)
  • Two models:
    • Starcoder2 for code completion; specifically bigcode/starcoder2-15b with eetq 8-bit quantization.
    • Mixtral 8x7b instruct for chat; specifically casperhansen/mixtral-instruct-awq which uses awq 4-bit quantization.

This is our 'starter' configuration. Other models - specifically other starcoder 2, and mixtral instruct models - certainly work too, and higher parameter versions may of course provide better results.

Documentation for how to deploy Huggingface TGI, suggested configuration and debugging tips - coming soon.
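Until those docs land, here is a minimal sketch of what talking to an OpenAI-compatible endpoint such as TGI can look like from Go. It only builds the request and does not send it. The type and function names are made up for this example, the model name is a placeholder, and the `/chat/completions` path under the `/v1` base URL is assumed from the OpenAI-style API that TGI advertises.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// chatMessage and chatRequest are hypothetical, trimmed-down versions of
// the OpenAI-style chat completions request body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model     string        `json:"model"`
	Messages  []chatMessage `json:"messages"`
	MaxTokens int           `json:"max_tokens,omitempty"`
	Stream    bool          `json:"stream"`
}

// NewChatRequest builds (but does not send) a POST to <baseURL>/chat/completions.
func NewChatRequest(baseURL string, body chatRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := NewChatRequest("https://mistral.example.com/v1", chatRequest{
		Model:     "mixtral-8x7b-instruct", // placeholder model name
		Messages:  []chatMessage{{Role: "user", Content: "Explain SSE in one sentence."}},
		MaxTokens: 128,
		Stream:    true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```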

Advanced configuration

As part of this effort, I have added an extensive set of configuration knobs to the client-side model configuration (see type ClientSideModelConfigOpenAICompatible in this PR).

Some of these configuration options are needed for things to work at a basic level, while others (e.g. prompt customization) are not needed for basic functionality, but are very important for customers interested in self-hosting their own models.

Today, Cody clients have a number of different autocomplete provider implementations, each of which ties the model-specific logic needed for autocomplete to a particular provider. For example, if you use a GPT model through Azure OpenAI, the autocomplete provider for that is entirely different from the one you'd get if you used a GPT model through OpenAI directly. This can lead to some subtle issues for us, so it is worth exploring ways to have a generalized autocomplete provider. Since we must address this problem for self-hosted models anyway, these configuration knobs fed to the client from the server are a pathway to doing that: initially just for self-hosted models, but in the future possibly generalized to other providers.
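To make the idea of server-provided knobs concrete, here is a small sketch of prompt customization driven by configuration rather than by provider-specific code. It is not the ClientSideModelConfigOpenAICompatible type from this PR: `AutocompleteConfig`, its fields, and the `{prefix}`/`{suffix}` placeholder syntax are all hypothetical. The StarCoder-style fill-in-the-middle tokens in the example are an assumption about the model; check the model card before relying on them.

```go
package main

import (
	"fmt"
	"strings"
)

// AutocompleteConfig is a hypothetical stand-in for per-model settings
// that a server could send to a generic autocomplete provider.
type AutocompleteConfig struct {
	// FIMTemplate is a fill-in-the-middle prompt template with
	// {prefix} and {suffix} placeholders.
	FIMTemplate string
	// StopSequences would be passed to the inference API as stop tokens.
	StopSequences []string
}

// BuildPrompt substitutes the code before and after the cursor into the
// template. strings.NewReplacer makes a single pass, so placeholder-like
// text inside the user's code is not expanded a second time.
func (c AutocompleteConfig) BuildPrompt(prefix, suffix string) string {
	return strings.NewReplacer("{prefix}", prefix, "{suffix}", suffix).Replace(c.FIMTemplate)
}

func main() {
	cfg := AutocompleteConfig{
		// Assumed StarCoder-style FIM format.
		FIMTemplate:   "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>",
		StopSequences: []string{"<|endoftext|>"},
	}
	fmt.Printf("%q\n", cfg.BuildPrompt("func add(a, b int) int {\n\treturn ", "\n}"))
}
```

With this shape, supporting a model that uses a different prompt format means changing a string in configuration instead of shipping a new client.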

Debugging facilities

Working with customers in the past to use OpenAI-compatible APIs, we've learned that debugging can be quite a pain. If you can't see what requests the Sourcegraph backend is making and what it is getting back, it is very hard to work out what went wrong.

This PR implements quite extensive logging, and a debugConnections flag which can be turned on to enable logging of the actual request payloads and responses. This is critical when a customer is trying to add support for a new model, their own custom OpenAI API service, etc.
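One common way to get this kind of visibility in Go is to wrap the HTTP transport. The sketch below is not the implementation in this PR: `verboseTransport` and its fields are hypothetical, and it relies only on the standard library's `httputil.DumpRequestOut` and `httputil.DumpResponse`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"strings"
)

// verboseTransport is a hypothetical http.RoundTripper that writes full
// request and response payloads to Out when Enabled is true.
type verboseTransport struct {
	Base    http.RoundTripper
	Enabled bool
	Out     io.Writer
}

func (t *verboseTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	base := t.Base
	if base == nil {
		base = http.DefaultTransport
	}
	if !t.Enabled {
		return base.RoundTrip(req)
	}
	if dump, err := httputil.DumpRequestOut(req, true); err == nil {
		fmt.Fprintf(t.Out, ">>> request\n%s\n", dump)
	}
	resp, err := base.RoundTrip(req)
	if err != nil {
		fmt.Fprintf(t.Out, "!!! transport error: %v\n", err)
		return nil, err
	}
	// DumpResponse with body=true buffers the whole body before returning,
	// so a streaming response only becomes readable once it has finished.
	if dump, err := httputil.DumpResponse(resp, true); err == nil {
		fmt.Fprintf(t.Out, "<<< response\n%s\n", dump)
	}
	return resp, nil
}

// demo sends one request through the transport to a local test server
// and returns whatever was logged.
func demo(enabled bool) string {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, `{"ok":true}`)
	}))
	defer srv.Close()

	var log strings.Builder
	client := &http.Client{Transport: &verboseTransport{Enabled: enabled, Out: &log}}
	resp, err := client.Post(srv.URL, "application/json", strings.NewReader(`{"prompt":"hi"}`))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return log.String()
}

func main() {
	fmt.Print(demo(true))
}
```

Because the response dump buffers the body, a sketch like this suits one-off debugging better than always-on logging of streamed completions. Payloads can also contain customer code, which is a good reason to keep such logging behind an explicit flag.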

Robustness

Working with customers in the past, we also learned that various parts of our backend openai provider were not super robust. For example, if more than one message was present it was a fatal error, and if the SSE stream yielded {"error"} payloads, they would be silently ignored. Similarly, the SSE event stream parser we use is heavily tailored towards the exact response structure which OpenAI's official API returns, and is therefore quite brittle when connecting to a different SSE stream.
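As an illustration of the error-payload case (a sketch, not the code in this PR), a stream chunk decoder can check for an "error" field before looking at the completion text. The `streamChunk` shape below is a simplified assumption. The error is kept as raw JSON because servers may send it as either a string or an object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// streamChunk is a hypothetical, minimal shape for one streamed payload.
type streamChunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
	// Error is kept raw because servers disagree on its shape.
	Error json.RawMessage `json:"error"`
}

// DecodeChunk returns the text carried by one SSE data payload. It returns
// an error if the payload is not valid JSON or if it carries an "error"
// field, instead of silently treating it as an empty completion.
func DecodeChunk(data []byte) (string, error) {
	var c streamChunk
	if err := json.Unmarshal(data, &c); err != nil {
		return "", fmt.Errorf("decoding stream chunk: %w", err)
	}
	if len(c.Error) > 0 && string(c.Error) != "null" {
		return "", fmt.Errorf("upstream error in stream: %s", c.Error)
	}
	var text string
	for _, choice := range c.Choices {
		text += choice.Delta.Content
	}
	return text, nil
}

func main() {
	for _, payload := range []string{
		`{"choices":[{"delta":{"content":"Hello"}}]}`,
		`{"error":{"message":"model overloaded"}}`,
	} {
		text, err := DecodeChunk([]byte(payload))
		fmt.Printf("text=%q err=%v\n", text, err)
	}
}
```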

For this work, I have started by forking our internal/completions/client/openai - and made a number of major improvements to it to make it more robust, handle errors better, etc.

I have also replaced the usage of a custom SSE event stream parser - which was not spec compliant and brittle - with a proper SSE event stream parser that recently popped up in the Go community: https://github.com/tmaxmax/go-sse
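To show what "spec compliant" involves, here is a deliberately small standard-library sketch of the data-handling part of the SSE format. It does not use or imitate the go-sse API, and the function names are made up for this example. It handles comment lines, multi-line data fields, the optional single space after the colon, and dispatch on blank lines. It ignores the id, event, and retry fields and does not handle bare carriage-return line endings.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// ReadSSEData calls handle once per dispatched event with that event's data.
// Note that bufio.Scanner has a default 64 KiB limit per line.
func ReadSSEData(r io.Reader, handle func(data string) error) error {
	sc := bufio.NewScanner(r) // splits on \n and drops a trailing \r
	var data []string
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			// A blank line dispatches the event, unless no data was seen.
			if len(data) > 0 {
				payload := strings.Join(data, "\n")
				data = data[:0]
				if err := handle(payload); err != nil {
					return err
				}
			}
		case strings.HasPrefix(line, ":"):
			// Comment line, often used as a keep-alive; ignore it.
		default:
			field, value, _ := strings.Cut(line, ":")
			value = strings.TrimPrefix(value, " ") // at most one leading space
			if field == "data" {
				data = append(data, value)
			}
		}
	}
	// An event that is not terminated by a blank line is discarded.
	return sc.Err()
}

// collect is a small helper that gathers all dispatched payloads.
func collect(stream string) ([]string, error) {
	var events []string
	err := ReadSSEData(strings.NewReader(stream), func(data string) error {
		events = append(events, data)
		return nil
	})
	return events, err
}

func main() {
	events, err := collect("data: {\"a\":1}\n\ndata: [DONE]\n\n")
	if err != nil {
		panic(err)
	}
	for _, e := range events {
		fmt.Printf("%q\n", e)
	}
}
```

A parser hard-coded to one vendor's output tends to break on exactly these details, for example keep-alive comments or a payload split across several data lines.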

My intention is that after more extensive testing, this new internal/completions/client/openaicompatible provider will be more robust, more correct, and all around better than internal/completions/client/openai (and possibly the azure one) so that we can just supersede those with this new openaicompatible one entirely.

Client implementation

Much of the work done in this PR is just "let the site admin configure things, and broadcast that config to the client through the new model config system."

Actually getting the clients to respect the new configuration is a task I am tackling in future sourcegraph/cody PRs.

Test plan

  1. This change currently lacks any unit/regression tests, which is a major noteworthy point. I will follow up with those in a future PR.
    • However, these changes are incredibly isolated, clearly only affecting customers who opt in to this new self-hosted models configuration.
    • Most of the heavy lifting (SSE streaming, shuffling data around) is done in other well-tested codebases.
  2. Manual testing has played a big role here, specifically:
    • Running a dev instance with the new configuration, actually connected to Huggingface TGI deployed on a remote server.
    • Using the new debugConnections mechanism (which customers would use) to directly confirm requests are going to the right places, with the right data and payloads.
    • Confirming with a new client (changes not yet landed) that autocomplete and chat functionality work.

Can we use more testing? Hell yeah, and I'm going to add it soon. Does it work quite well and have small room for error? Also yes.

Changelog

Cody Enterprise: added a new configuration for self-hosting models. Reach out to support if you would like to use this feature as it is in early access.

Stephen Gutekanst added 21 commits July 17, 2024 17:08
…openaicompatible

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
@emidoots emidoots requested review from a team, arafatkatze and chrsmith July 18, 2024 01:35
"encoding/json"
"fmt"
"io"
"math/rand"

Check warning

Code scanning / Semgrep OSS

Semgrep Finding: security-semgrep-rules.semgrep-rules.golang.math-random-used

Do not use `math/rand`. Use `crypto/rand` instead.
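For reference, this is roughly what the suggested swap looks like. The page does not show what the flagged code uses randomness for, so the use case here (picking one index out of n, for example one of several configured endpoints) is an assumption, and `PickIndex` is a name made up for this sketch.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// PickIndex returns a uniformly random index in [0, n) using crypto/rand.
func PickIndex(n int) (int, error) {
	if n <= 0 {
		return 0, fmt.Errorf("n must be positive, got %d", n)
	}
	v, err := rand.Int(rand.Reader, big.NewInt(int64(n)))
	if err != nil {
		return 0, err
	}
	return int(v.Int64()), nil
}

func main() {
	endpoints := []string{"https://a.example.com/v1", "https://b.example.com/v1"}
	i, err := PickIndex(len(endpoints))
	if err != nil {
		panic(err)
	}
	fmt.Println("using endpoint", endpoints[i])
}
```

Unlike math/rand, crypto/rand can return an error, which is why the helper has an error result.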
@cla-bot cla-bot Bot added the cla-signed label Jul 18, 2024
Stephen Gutekanst added 4 commits July 17, 2024 18:37
Stephen Gutekanst added 3 commits July 17, 2024 20:27

@chrsmith chrsmith left a comment

Contributor

I see what you have going on here, and have some high-level notes about the direction. But I'm happy to approve it so you can keep moving forward on this.

Though it might be nice to put some stakes in the ground and plan on unifying the "generic provider x openai" with "openaicompatible". As right now there is { azureopenai, generic/openai, openaicompatible } and that is definitely not a good position to be in.

... but we probably aren't ready to say if or when the "openai compatible" configuration knobs can be tuned in such a way that the codepath is a drop-in replacement for our completions/client/openai package.

WDYT?

Comment thread internal/modelconfig/types/configuration.go Outdated
Comment thread internal/modelconfig/types/configuration.go
Comment thread internal/modelconfig/types/configuration.go Outdated
Comment thread internal/modelconfig/types/configuration.go Outdated
Comment thread internal/modelconfig/types/configuration.go Outdated
Comment thread schema/site.schema.json
"items": {
"type": "string",
"enum": [
"bigcode::v1::starcoder2-3b",

Contributor

Just so I understand, this would be a "breaking change", since any existing site configs that were using some of these model references would start to fail?

So maybe a comment in the JSON file (if site.schema.json supports that non-standard JSON extension), to call that out?

Member Author

It is technically a breaking change, yes. However, since we haven't documented or shared this configuration with anyone (and if they did find and set it on their own already, I am pretty sure their instance would not be functional), I feel comfortable removing it.

Unfortunately we don't have doc comments in site.schema.json - just description fields.

Comment thread schema/site.schema.json
"fireworks",
"google",
"openai",
"huggingface-tgi",

Contributor

Maybe this is intentional, but this looks wrong.

Do we want to expose the "type" here and introduce huggingface-tgi as a "new form" of server-side provider config? I would have thought that we'd want an admin to express this with something like:

serverSideProviderConfig: {
    type: "openaicompatible",
    serviceProviderName: "huggingface-tgi",
    ...
}

It isn't a big deal, but again, I'd love for us to avoid the "sprawl" associated with making it look like we support N-different top-level providers. When in actuality, it's a smaller number. And "huggingface-tgi" is just a specific set of defaults/overrides on top of "openaicompatible".

Right?

Member Author

update: we chatted about this in our zoom call earlier today, and agreed this is okay because we really are trying to communicate that this is a top-level provider, well-supported, well-tested, etc.

The intention is that:

  • openaicompatible provider is the 'super generic' one, that can connect to arbitrary things.
  • huggingface-tgi is very specific, tested by us, etc. - it just happens to use the same implementation as openaicompatible provider.
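A rough sketch of that arrangement, with two provider types in configuration sharing one implementation, could look like the following. The `Client` interface, the `openAICompatibleClient` type, and `ClientForProviderType` are hypothetical names for this illustration and are not taken from the PR.

```go
package main

import "fmt"

// Client is a hypothetical minimal interface for a completions backend.
type Client interface {
	Name() string
}

// openAICompatibleClient stands in for the shared implementation.
type openAICompatibleClient struct{}

func (openAICompatibleClient) Name() string { return "openaicompatible" }

// ClientForProviderType maps the "type" from serverSideConfig to an
// implementation. The well-tested "huggingface-tgi" type and the generic
// "openaicompatible" type resolve to the same client.
func ClientForProviderType(providerType string) (Client, error) {
	switch providerType {
	case "openaicompatible", "huggingface-tgi":
		return openAICompatibleClient{}, nil
	default:
		return nil, fmt.Errorf("unsupported provider type %q", providerType)
	}
}

func main() {
	for _, t := range []string{"huggingface-tgi", "openaicompatible"} {
		c, err := ClientForProviderType(t)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", t, c.Name())
	}
}
```

Keeping the distinction in configuration only means the supported type can later grow its own defaults or overrides without a second client implementation.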

Comment thread schema/site.schema.json Outdated
Comment thread schema/site.schema.json
"pointer": true
},
"default": null,
"description": "Advanced configuration options that are only respected if the model is provided by an openaicompatible provider.",

Contributor

This is good to call out. But two things:

  • I'll mention the confusion around "openaicompatible provider" and the "huggingface-tgi" type of provider.
  • I am 100% certain we'll have some sort of generic "ClientSideModelOverrides" type that will overlap considerably with this. (So we can run experiments, or fine-tune things, etc.)

So this is fine as-is, but just calling out where I think things will evolve.

package openaicompatible

// openAIChatCompletionsRequestParameters request object for openAI chat endpoint https://platform.openai.com/docs/api-reference/chat/create
type openAIChatCompletionsRequestParameters struct {

Contributor

This is following the pattern we use for all of our other completion API providers. But having the data types not be exported makes it more difficult to test things in the frontend/internal/httpapi/completions package.

So if you are feeling saucy, we may want to export the data types from this package, so we can have integration tests that use the actual request/response types elsewhere. (Or not, just something to consider.)

@arafatkatze arafatkatze left a comment

Contributor

I took a long look at the PR and I didn't find any major issues but I do have questions:

  1. Will you eventually be enabling Hugging Face TGI so anyone can test this PR too? Right now I don't have a reference point to compare this with besides the OpenAI code. I think the devil is in the details, and unless I can actually run it, get the response back to Cody, and step through it in a debugger, it would be difficult for me to confidently say that this works great.

  2. When you say "OpenAI compatible", does that mean that both the request format and the response format are EXACTLY the same as OpenAI's? Even if you send the same form of request as you would to the OpenAI endpoint, the code needed to read the SSE events coming back from the API might be slightly different, and in that case it's harder to reuse the same code. (This is exactly why I had issues with Claude in Google Vertex: it was similar to the Anthropic API but not exactly the same, so I had to create separate code.) You mention in the description that you have run this on a few models already, so I assume the answer is yes, but I wanted to confirm.

@emidoots

Member Author

> Though it might be nice to put some stakes in the ground and plan on unifying the "generic provider x openai" with "openaicompatible". As right now there is { azureopenai, generic/openai, openaicompatible } and that is definitely not a good position to be in.
>
> ... but we probably aren't ready to say if or when the "openai compatible" configuration knobs can be tuned in such a way that the codepath is a drop-in replacement for our completions/client/openai package.

Yep, I fully agree with this.

@emidoots

emidoots commented Jul 18, 2024

Member Author

@arafatkatze thanks for taking a look, appreciate it!

  1. Yeah, I have my environment but am discussing with Aravind about getting a more proper environment up for others at sourcegraph to bang against this configuration and see how it works. TGI is easy to run, but the exact configuration is a bit nuanced and you need linux+nvidia GPU with lots of RAM.

  2. There's lots of software that says 'we expose an OpenAI-style API to our inference' (some examples below) - the goal of this provider is to support all of these pieces of software that claim to implement OpenAI API specification, with enough knobs and configuration abilities to actually support them all (many of them differ in subtle ways):

This provider isn't intended for software that doesn't advertise 'we follow the OpenAI API specification'.

@emidoots

Member Author
  • Renamed DebugConnections -> EnableVerboseLogs
  • Entirely removed ServiceName enum and ServiceNameCustom, removed mention of nvidia-nim, etc.
  • Instead, kept top-level 'huggingface-tgi' provider as that is one we have well-tested and support.

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
emidoots referenced this pull request in sourcegraph/cody-public-snapshot Jul 19, 2024
These are the initial client changes needed for Cody Enterprise
customers to be able to use self-hosted models. Specifically, this PR
pairs with https://github.com/sourcegraph/sourcegraph/pull/63899

The first commit here is mostly changes @jamesmcnamara made to help me
wire through the `Model` to the actual location where we write
autocomplete providers. The following commits are me introducing a new
generic `openaicompatible` autocomplete provider, which will
supersede/replace the older experimental one which was purely
client-side.

Combined with https://github.com/sourcegraph/sourcegraph/pull/63899 -
these changes are enough to have at least Chat and Autocomplete working
fairly well using a self-hosted Starcoder2 and Mixtral 8x7b Instruct
model.

In this PR, the autocomplete provider is mostly tailored towards models
in those families (starcoder and mixtral/mistral instruct models) - to
unblock some customer use-cases - but all of the `Model`
`clientSideConfiguration` options piped down to this autocomplete
provider should be enough to generalize this approach to any model in
the future. In a follow-up PR, I will begin making use of that
`clientSideConfiguration` to enable using other self-hosted models and
other use-cases (letting the site admin customize prompting, etc.)

## Test plan

* The first commit from James has nice unit tests.
* The new autocomplete provider doesn't yet; I will add these in a
future PR. For now, it is manually tested on my end using:
  * A local dev Sourcegraph instance configured as described in
  https://github.com/sourcegraph/sourcegraph/pull/63899
  * A remote server hosting 2 huggingface TGI Docker containers, with
  starchat2 and mixtral models.
  * A local dev build of VS Code Cody.

---------

Signed-off-by: jamesmcnamara <james.mcnamara@sourcegraph.com>
Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
Co-authored-by: jamesmcnamara <james.mcnamara@sourcegraph.com>
Stephen Gutekanst added 3 commits July 18, 2024 18:13
…openaicompatible

@emidoots emidoots enabled auto-merge (squash) July 19, 2024 01:28
@emidoots emidoots merged commit dca1b96 into main Jul 19, 2024
@emidoots emidoots deleted the sg/self-hosted-models branch July 19, 2024 01:34