self hosted models by emidoots · Pull Request #63899 · sourcegraph/sourcegraph-public-snapshot

emidoots · 2024-07-18T01:35:08Z

This PR is stacked on top of all the prior work @chrsmith has done for shuffling configuration data around; it implements the new "Self hosted models" functionality.

Configuration

Configuring a Sourcegraph instance to use self-hosted models involves adding configuration like the following to the site config. Setting modelConfiguration opts you in to the new system, which is in early access:

  // Setting this field means we are opting into the new Cody model configuration system.
  "modelConfiguration": {
    // Disable use of Sourcegraph's servers for model discovery
    "sourcegraph": null,

    // Create two model providers
    "providerOverrides": [
      {
        // Our first model provider "mistral" will be a Huggingface TGI deployment which hosts our
        // mistral model for chat functionality.
        "id": "mistral",
        "displayName": "Mistral",
        "serverSideConfig": {
          "type": "huggingface-tgi",
          "endpoints": [{"url": "https://mistral.example.com/v1"}]
        }
      },
      {
        // Our second model provider "bigcode" will be a Huggingface TGI deployment which hosts our
        // bigcode/starcoder model for code completion functionality.
        "id": "bigcode",
        "displayName": "Bigcode",
        "serverSideConfig": {
          "type": "huggingface-tgi",
          "endpoints": [{"url": "http://starcoder.example.com/v1"}]
        }
      }
    ],

    // Make these two models available to Cody users
    "modelOverridesRecommendedSettings": [
      "mistral::v1::mixtral-8x7b-instruct",
      "bigcode::v1::starcoder2-7b"
    ],

    // Configure which models Cody will use by default
    "defaultModels": {
      "chat": "mistral::v1::mixtral-8x7b-instruct",
      "fastChat": "mistral::v1::mixtral-8x7b-instruct",
      "codeCompletion": "bigcode::v1::starcoder2-7b"
    }
  }

More advanced configurations are possible; the above is our blessed configuration for today.
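The model references in the config above all have the shape "a::b::c". A minimal sketch of how such references could be sanity-checked follows. It assumes the three parts are a provider ID, an API version, and a model ID, and that each default model should point at a configured provider. `ModelRef`, `ParseModelRef`, and `CheckDefaults` are hypothetical names for illustration, not the types used in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelRef is a hypothetical parsed form of a reference such as
// "mistral::v1::mixtral-8x7b-instruct". The meaning assigned to each part
// is an assumption based on the example config.
type ModelRef struct {
	ProviderID string
	APIVersion string
	ModelID    string
}

// ParseModelRef splits a reference on "::" and requires exactly three
// non-empty parts.
func ParseModelRef(ref string) (ModelRef, error) {
	parts := strings.Split(ref, "::")
	if len(parts) != 3 {
		return ModelRef{}, fmt.Errorf("model reference %q: want 3 parts separated by \"::\", got %d", ref, len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return ModelRef{}, fmt.Errorf("model reference %q: empty component", ref)
		}
	}
	return ModelRef{ProviderID: parts[0], APIVersion: parts[1], ModelID: parts[2]}, nil
}

// CheckDefaults returns an error if any default model is malformed or
// names a provider that is not in providerIDs.
func CheckDefaults(providerIDs []string, defaults map[string]string) error {
	known := make(map[string]bool, len(providerIDs))
	for _, id := range providerIDs {
		known[id] = true
	}
	for feature, ref := range defaults {
		parsed, err := ParseModelRef(ref)
		if err != nil {
			return fmt.Errorf("defaultModels.%s: %w", feature, err)
		}
		if !known[parsed.ProviderID] {
			return fmt.Errorf("defaultModels.%s: unknown provider %q", feature, parsed.ProviderID)
		}
	}
	return nil
}
```

With the example config, `CheckDefaults([]string{"mistral", "bigcode"}, ...)` would accept all three defaults, and a reference to an unconfigured provider would be rejected.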

Hosting models

Another major component of this work is starting to build up recommendations around how to self-host models, which ones to use, how to configure them, etc.

For now, we've been testing with these two on a machine with dual A100s:

Huggingface TGI (this is a Docker container for model inference, which provides an OpenAI-compatible API - and is widely popular)
Two models:
- Starcoder2 for code completion; specifically bigcode/starcoder2-15b with eetq 8-bit quantization.
- Mixtral 8x7b instruct for chat; specifically casperhansen/mixtral-instruct-awq which uses awq 4-bit quantization.

This is our 'starter' configuration. Other models - specifically other starcoder 2, and mixtral instruct models - certainly work too, and higher parameter versions may of course provide better results.

Documentation for how to deploy Huggingface TGI, suggested configuration and debugging tips - coming soon.

Advanced configuration

As part of this effort, I have added quite an extensive set of configuration knobs to the client-side model configuration (see type ClientSideModelConfigOpenAICompatible in this PR).

Some of these configuration options are needed for things to work at a basic level, while others (e.g. prompt customization) are not needed for basic functionality, but are very important for customers interested in self-hosting their own models.

Today, Cody clients have a number of different autocomplete provider implementations, each of which ties model-specific autocomplete logic to a provider. For example, if you use a GPT model through Azure OpenAI, the autocomplete provider is entirely different from the one you'd get using a GPT model through OpenAI's official API. This can lead to subtle issues for us, so it is worth exploring ways to have a generalized autocomplete provider. Since self-hosted models force us to address this problem anyway, these configuration knobs, fed to the client from the server, are a pathway to doing that: initially just for self-hosted models, but in the future possibly generalized to other providers.
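To make the idea of "model-specific logic expressed as configuration" concrete, here is a minimal sketch of a fill-in-the-middle prompt built from configurable sentinel tokens. The `fimTemplate` type and its field names are hypothetical and are not the fields of ClientSideModelConfigOpenAICompatible. The token values are the ones the StarCoder family uses for prefix-suffix-middle prompting, and it is written in Go only for consistency with the backend code in this PR.

```go
package main

// fimTemplate is a hypothetical bundle of prompt knobs that a server could
// send to a client so that one generic autocomplete provider can serve
// several model families.
type fimTemplate struct {
	Prefix string // token placed before the code preceding the cursor
	Suffix string // token placed before the code following the cursor
	Middle string // token after which the model generates the completion
}

// starcoderTemplate returns the sentinel tokens used by StarCoder-style
// models for fill-in-the-middle prompts.
func starcoderTemplate() fimTemplate {
	return fimTemplate{
		Prefix: "<fim_prefix>",
		Suffix: "<fim_suffix>",
		Middle: "<fim_middle>",
	}
}

// Prompt assembles a prefix-suffix-middle prompt from the text before and
// after the cursor.
func (t fimTemplate) Prompt(before, after string) string {
	return t.Prefix + before + t.Suffix + after + t.Middle
}
```

Supporting a different model family would then mean sending a different template rather than writing a new provider.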

Debugging facilities

Working with customers in the past to use OpenAI-compatible APIs, we've learned that debugging can be quite a pain. If you can't see what requests the Sourcegraph backend is making and what it is getting back, problems are very hard to diagnose.

This PR implements quite extensive logging, and a debugConnections flag which can be turned on to enable logging of the actual request payloads and responses. This is critical when a customer is trying to add support for a new model, their own custom OpenAI API service, etc.
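For readers unfamiliar with this kind of flag, a minimal sketch of opt-in payload logging in Go follows. It is not the implementation in this PR: `loggingTransport`, `stubTransport`, and `demoRoundTrip` are hypothetical names, and the stub stands in for a real server so the example runs offline. It only uses `httputil.DumpRequestOut` and `httputil.DumpResponse` from the standard library.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"strings"
)

// loggingTransport is a hypothetical http.RoundTripper that logs full
// request and response payloads only when Verbose is set.
type loggingTransport struct {
	Base    http.RoundTripper
	Verbose bool
	Logger  *log.Logger
}

func (t *loggingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if !t.Verbose {
		return t.Base.RoundTrip(req)
	}
	// DumpRequestOut reads the body and restores it so it can still be sent.
	// The dump includes headers such as Authorization; real code should
	// redact secrets before logging.
	if dump, err := httputil.DumpRequestOut(req, true); err == nil {
		t.Logger.Printf("request:\n%s", dump)
	}
	resp, err := t.Base.RoundTrip(req)
	if err != nil {
		t.Logger.Printf("request failed: %v", err)
		return nil, err
	}
	// DumpResponse buffers the entire body before returning, which would
	// delay a streaming response; a real implementation would tee instead.
	if dump, err := httputil.DumpResponse(resp, true); err == nil {
		t.Logger.Printf("response:\n%s", dump)
	}
	return resp, nil
}

// stubTransport returns a canned JSON response without touching the network.
type stubTransport struct{}

func (stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	body := `{"ok":true}`
	return &http.Response{
		Status: "200 OK", StatusCode: http.StatusOK,
		Proto: "HTTP/1.1", ProtoMajor: 1, ProtoMinor: 1,
		Header:        http.Header{"Content-Type": []string{"application/json"}},
		Body:          io.NopCloser(strings.NewReader(body)),
		ContentLength: int64(len(body)),
		Request:       req,
	}, nil
}

// demoRoundTrip sends one request through the logging transport and returns
// what was logged and the response body the caller still receives.
func demoRoundTrip(verbose bool) (logged, body string, err error) {
	var buf bytes.Buffer
	client := &http.Client{Transport: &loggingTransport{
		Base: stubTransport{}, Verbose: verbose, Logger: log.New(&buf, "", 0),
	}}
	resp, err := client.Post("https://mistral.example.com/v1/chat/completions",
		"application/json", strings.NewReader(`{"model":"demo"}`))
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return buf.String(), string(b), err
}
```

The point of the design is that the caller's view of the response is unchanged whether or not logging is on; only the log output differs.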

Robustness

Working with customers in the past, we also learned that various parts of our backend openai provider were not super robust. For example, if more than one message was present it was a fatal error, and if the SSE stream yielded {"error"} payloads, they were ignored. Similarly, the SSE event stream parser we use is heavily tailored towards the exact response structure which OpenAI's official API returns, and is therefore quite brittle when connecting to a different SSE stream.

For this work, I have started by forking our internal/completions/client/openai - and made a number of major improvements to it to make it more robust, handle errors better, etc.

I have also replaced the usage of a custom SSE event stream parser - which was not spec compliant and brittle - with a proper SSE event stream parser that recently popped up in the Go community: https://github.com/tmaxmax/go-sse
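To illustrate the two robustness points above (following the SSE framing rules rather than one vendor's exact output, and surfacing {"error"} payloads), here is a simplified standard-library sketch. It does not use the go-sse API and is not the code in this PR. The function names are hypothetical, the chunk shape is an assumption based on OpenAI-style streaming responses, and it handles only `data` fields with LF or CRLF line endings.

```go
package main

import (
	"bufio"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readEvents collects "data" fields and dispatches one event per blank
// line. Comment lines (starting with ":") and other fields are skipped.
// bufio.Scanner's default line limit applies; real code would raise it.
func readEvents(r io.Reader, handle func(data string) error) error {
	sc := bufio.NewScanner(r)
	var data []string
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			if len(data) > 0 {
				if err := handle(strings.Join(data, "\n")); err != nil {
					return err
				}
				data = data[:0]
			}
		case strings.HasPrefix(line, ":"):
			// Comment or keep-alive line: ignore.
		default:
			field, value, _ := strings.Cut(line, ":")
			if field == "data" {
				data = append(data, strings.TrimPrefix(value, " "))
			}
		}
	}
	return sc.Err()
}

// chunk is an assumed OpenAI-style streaming payload. Error is kept raw
// because servers differ on whether it is a string or an object.
type chunk struct {
	Error   json.RawMessage `json:"error"`
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

var errDone = errors.New("stream done")

// collect concatenates streamed content and fails on error payloads
// instead of silently dropping them.
func collect(r io.Reader) (string, error) {
	var out strings.Builder
	err := readEvents(r, func(data string) error {
		if data == "[DONE]" {
			return errDone
		}
		var c chunk
		if err := json.Unmarshal([]byte(data), &c); err != nil {
			return fmt.Errorf("malformed chunk %q: %w", data, err)
		}
		if len(c.Error) > 0 && string(c.Error) != "null" {
			return fmt.Errorf("upstream error: %s", c.Error)
		}
		for _, ch := range c.Choices {
			out.WriteString(ch.Delta.Content)
		}
		return nil
	})
	if errors.Is(err, errDone) {
		err = nil
	}
	return out.String(), err
}

// collectFromString is a convenience wrapper for trying the parser on a
// literal stream.
func collectFromString(stream string) (string, error) {
	return collect(strings.NewReader(stream))
}
```

A dedicated library additionally handles event names, IDs, retry fields, and lone-CR line endings, which is part of why swapping in one is attractive.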

My intention is that after more extensive testing, this new internal/completions/client/openaicompatible provider will be more robust, more correct, and all around better than internal/completions/client/openai (and possibly the azure one) so that we can just supersede those with this new openaicompatible one entirely.

Client implementation

Much of the work done in this PR is just "let the site admin configure things, and broadcast that config to the client through the new model config system."

Actually getting the clients to respect the new configuration is a task I am tackling in future sourcegraph/cody PRs.

Test plan

This change currently lacks any unit/regression tests, which is a major noteworthy point. I will follow up with those in a future PR.

However, these changes are incredibly isolated, clearly only affecting customers who opt-in to this new self-hosted models configuration.
Most of the heavy lifting (SSE streaming, shuffling data around) is done in other well-tested codebases.

Manual testing has played a big role here, specifically:

Running a dev instance with the new configuration, actually connected to Huggingface TGI deployed on a remote server.
Using the new debugConnections mechanism (which customers would use) to directly confirm requests are going to the right places, with the right data and payloads.
Confirming with a new client (changes not yet landed) that autocomplete and chat functionality work.

Can we use more testing? Hell yeah, and I'm going to add it soon. Does it work quite well and have small room for error? Also yes.

Changelog

Cody Enterprise: added a new configuration for self-hosting models. Reach out to support if you would like to use this feature as it is in early access.


chrsmith

I see what you have going on here, and have some high-level notes about the direction. But I'm happy to approve it so you can keep moving forward on this.

Though it might be nice to put some stakes in the ground and plan on unifying the "generic provider x openai" with "openaicompatible". As right now there is { azureopenai, generic/openai, openaicompatible } and that is definitely not a good position to be in.

... but we probably aren't ready to say if or when the "openai compatible" configuration knobs can be tuned in such a way that the codepath is a drop-in replacement for our completions/client/openai package.

WDYT?

chrsmith · 2024-07-18T17:46:33Z

          "items": {
            "type": "string",
            "enum": [
-              "bigcode::v1::starcoder2-3b",


Just so I understand, this would be a "breaking change", since any existing site configs that were using some of these model references would start to fail?

So maybe a comment in the JSON file (if site.schema.json supports that non-standard JSON extension), to call that out?

It is technically a breaking change, yes. However, since we haven't documented or shared this configuration with anyone (and, if they did find and set it on their own already, I am pretty sure their instance would not be functional), I feel comfortable removing it.

Unfortunately we don't have doc comments in site.schema.json - just description fields.

chrsmith · 2024-07-18T17:50:03Z

            "fireworks",
            "google",
            "openai",
+            "huggingface-tgi",


Maybe this is intentional, but this looks wrong.

Do we want to expose the "type" here and introduce huggingface-tgi as a "new form" of server-side provider config? I would have thought that we'd want an admin to express this with something like:

serverSideProviderConfig: { type: "openaicompatible", serviceProviderName: "huggingface-tgi", ... }

It isn't a big deal, but again, I'd love for us to avoid the "sprawl" associated with making it look like we support N-different top-level providers. When in actuality, it's a smaller number. And "huggingface-tgi" is just a specific set of defaults/overrides on top of "openaicompatible".

Right?

update: we chatted about this in our zoom call earlier today, and agreed this is okay because we really are trying to communicate that this is a top-level provider, well-supported, well-tested, etc.

The intention is that:

openaicompatible provider is the 'super generic' one, that can connect to arbitrary things.

huggingface-tgi is very specific, tested by us, etc. - it just happens to use the same implementation as openaicompatible provider.

chrsmith · 2024-07-18T17:55:42Z

+        "pointer": true
+      },
+      "default": null,
+      "description": "Advanced configuration options that are only respected if the model is provided by an openaicompatible provider.",


This is good to call out. But two things:

I'll mention the confusion around "openaicompatible provider" and the "huggingface-tgi" type of provider.

I am 100% certain we'll have some sort of generic "ClientSideModelOverrides" type that will overlap considerably with this. (So we can run experiments, or fine-tune things, etc.)

So this is fine as-is, but just calling out where I think things will evolve.

chrsmith · 2024-07-18T18:06:15Z

+package openaicompatible
+
+// openAIChatCompletionsRequestParameters request object for openAI chat endpoint https://platform.openai.com/docs/api-reference/chat/create
+type openAIChatCompletionsRequestParameters struct {


This is following the pattern we use for all of our other completion API providers. But having the data types not be exported makes it more difficult to test things in the frontend/internal/httpapi/completions package.

So if you are feeling saucy, we may want to export the data types from this package, so we can have integration tests that use the actual request/response types elsewhere. (Or not, just something to consider.)

arafatkatze

I took a long look at the PR and I didn't find any major issues but I do have questions:

Will you eventually be enabling Hugging Face TGI so anyone can test this PR too? Right now I don't have a reference point to compare this with besides the OpenAI code. The devil is in the details, and unless I can actually run it, get the response back to Cody, and step through it in a debugger, it is difficult for me to confidently say that this works great.
When you say OpenAI-compatible, does that mean both the request format and the response format are EXACTLY the same as OpenAI's? Even if you send the same form of request to the endpoint, the logic needed to read the SSE events coming back from the API might be slightly different, and in that case it's harder to use the same code. (This is exactly why I had issues with Claude in Google Vertex: it was similar to the Anthropic API but not exactly the same, so I had to create separate code.) You mention in the description that you have run this on a few models already, so I assume the answer is yes, but I wanted to confirm.

emidoots · 2024-07-18T23:16:55Z

Though it might be nice to put some stakes in the ground and plan on unifying the "generic provider x openai" with "openaicompatible". As right now there is { azureopenai, generic/openai, openaicompatible } and that is definitely not a good position to be in.

... but we probably aren't ready to say if or when the "openai compatible" configuration knobs can be tuned in such a way that the codepath is a drop-in replacement for our completions/client/openai package.

Yep, I fully agree with this.

emidoots · 2024-07-18T23:24:52Z

@arafatkatze thanks for taking a look, appreciate it!

Yeah, I have my environment but am discussing with Aravind about getting a more proper environment up for others at sourcegraph to bang against this configuration and see how it works. TGI is easy to run, but the exact configuration is a bit nuanced and you need linux+nvidia GPU with lots of RAM.
There's lots of software that says 'we expose an OpenAI-style API to our inference' (some examples below) - the goal of this provider is to support all of these pieces of software that claim to implement OpenAI API specification, with enough knobs and configuration abilities to actually support them all (many of them differ in subtle ways):

OpenAI's official API
Huggingface TGI https://huggingface.co/docs/text-generation-inference/en/messages_api
Ollama https://ollama.com/blog/openai-compatibility
AWS LISA https://github.com/awslabs/LISA?tab=readme-ov-file#background
NVIDIA NIM https://www.nvidia.com/en-us/ai/
and many others

This provider isn't intended for software that doesn't advertise that it follows the OpenAI API specification.
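As a rough illustration of what an "OpenAI-style API" looks like on the wire, here is a sketch that builds (but does not send) a streaming chat completions request against a configured endpoint URL such as the ones in the site config example. The type and function names are hypothetical, the model name passed in is an assumption, and which fields a given server honors differs between implementations, which is exactly why the configuration knobs exist.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"strings"
)

// chatMessage and chatRequest cover only the common core of an
// OpenAI-style chat completions request body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model     string        `json:"model"`
	Messages  []chatMessage `json:"messages"`
	Stream    bool          `json:"stream"`
	MaxTokens int           `json:"max_tokens,omitempty"`
}

// newChatRequest constructs the HTTP request separately from executing it,
// which keeps it easy to log and to test. baseURL is an endpoint URL such as
// "https://mistral.example.com/v1"; apiKey may be empty for servers that
// need no token. Production code would use http.NewRequestWithContext.
func newChatRequest(baseURL, apiKey string, body chatRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	url := strings.TrimSuffix(baseURL, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "text/event-stream")
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}
```

The returned request can be passed to any `http.Client`; because `stream` is set, the response would arrive as an SSE stream.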

emidoots · 2024-07-19T01:03:54Z

Renamed DebugConnections -> EnableVerboseLogs
Entirely removed ServiceName enum and ServiceNameCustom, removed mention of nvidia-nim, etc.
Instead, kept top-level 'huggingface-tgi' provider as that is one we have well-tested and support.


@jamesmcnamara

These are the initial client changes needed for Cody Enterprise customers to be able to use self-hosted models. Specifically, this PR pairs with https://github.com/sourcegraph/sourcegraph/pull/63899

The first commit here is mostly changes @jamesmcnamara made to help me wire through the `Model` to the actual location where we write autocomplete providers. The following commits are me introducing a new generic `openaicompatible` autocomplete provider, which will supersede/replace the older experimental one which was purely client-side.

Combined with https://github.com/sourcegraph/sourcegraph/pull/63899 - these changes are enough to have at least Chat and Autocomplete working fairly well using a self-hosted Starcoder2 and Mixtral 8x7b Instruct model.

In this PR, the autocomplete provider is mostly tailored towards models in those families (starcoder and mixtral/mistral instruct models) - to unblock some customer use-cases - but all of the `Model` `clientSideConfiguration` options piped down to this autocomplete provider should be enough to generalize this approach to any model in the future.

In a follow-up PR, I will begin making use of that `clientSideConfiguration` to enable using other self-hosted models and other use-cases (letting the site admin customize prompting, etc.)

## Test plan

* The first commit from James has nice unit tests.
* The new autocomplete provider doesn't yet, I will add these in a future PR. For now, it is manually tested on my end using:
  * A local dev Sourcegraph instance configured as described in https://github.com/sourcegraph/sourcegraph/pull/63899
  * A remote server hosting 2 huggingface TGI Docker containers, with starchat2 and mixtral models.
  * A local dev build of VS Code Cody.

---------

Signed-off-by: jamesmcnamara <james.mcnamara@sourcegraph.com>
Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
Co-authored-by: jamesmcnamara <james.mcnamara@sourcegraph.com>


Stephen Gutekanst added 21 commits July 17, 2024 17:08

cp -R internal/completions/client/openai internal/completions/client/…

335d393

…openaicompatible Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

initial modifications to openai completions client

71bc1cf

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

add proper SSE events client

b89928f

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

move JSON types to their own file

33f9566

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

rename client

1f3ea40

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

pass modelID to tokenusage correctly

db2495a

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

record distinct tokenusage.OpenAICompatible usage

2595b8a

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

refactor

cf3e59c

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

seperate HTTP request construction from execution

626ffd0

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

update to latest main

f53bd65

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

initial site config

3cb76d5

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

modelconfig: configuration

385275a

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

correct site.schema.json

81ae637

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

connection debugging option

0922218

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

site config + conversion logic

fe8f7f1

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

use new configuration

189abc5

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

openaicompatible: various improvements

8a05611

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

improved error handling & robustness

b67161d

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

improve endpoint/URL configuration, support non-v1 paths

723892a

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

improve logging

d515146

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

refine recommended settings

b1f73fe

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

emidoots requested review from a team, arafatkatze and chrsmith July 18, 2024 01:35

github-advanced-security AI found potential problems Jul 18, 2024

Comment thread internal/completions/client/openaicompatible/openaicompatible.go

"encoding/json"

"fmt"

"io"

"math/rand"

Check warning

Code scanning / Semgrep OSS

Semgrep Finding: security-semgrep-rules.semgrep-rules.golang.math-random-used

Do not use `math/rand`. Use `crypto/rand` instead.
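The thread does not say what `math/rand` is used for in openaicompatible.go. Assuming, purely for illustration, that it picks one of several configured endpoints, a sketch of the same selection using `crypto/rand` would look like the following. `pickEndpoint` is a hypothetical helper, not a function from this PR.

```go
package main

import (
	"crypto/rand"
	"errors"
	"math/big"
)

// pickEndpoint returns a uniformly random element of urls using the
// operating system's cryptographically secure random source.
func pickEndpoint(urls []string) (string, error) {
	if len(urls) == 0 {
		return "", errors.New("no endpoints configured")
	}
	// rand.Int returns a uniform value in [0, len(urls)).
	n, err := rand.Int(rand.Reader, big.NewInt(int64(len(urls))))
	if err != nil {
		return "", err
	}
	return urls[n.Int64()], nil
}
```

Whether the warning matters depends on whether the random value is security-sensitive; for spreading load across endpoints, that is a judgment call for the authors.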

cla-bot Bot added the cla-signed label Jul 18, 2024

Stephen Gutekanst added 4 commits July 17, 2024 18:37

fix linter

74737c0

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

fix linter

5f3f92c

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

bazel run //:gazelle-update-repos

90046d6

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

bazel run //:configure

cde7c6f

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

emidoots mentioned this pull request Jul 18, 2024

Self hosted models sourcegraph/cody-public-snapshot#4913

Merged

Stephen Gutekanst added 3 commits July 17, 2024 20:27

sg lint --fix format

425125d

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

bazel build --nobuild

86f8ed7

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

Merge remote-tracking branch 'origin/main' into sg/self-hosted-models

0f1b2c6

chrsmith approved these changes Jul 18, 2024

Merge remote-tracking branch 'origin/main' into sg/self-hosted-models

146fca4

arafatkatze reviewed Jul 18, 2024

Merge remote-tracking branch 'origin/main' into sg/self-hosted-models

061128f

address feedback

b51e1d9

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

Stephen Gutekanst added 3 commits July 18, 2024 18:13

Merge remote-tracking branch 'origin/main' into sg/self-hosted-models

e1ced13

bazel build --nobuild //internal/completions/client/openaicompatible:…

ff9d1cc

…openaicompatible Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

fix test

69092a6

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>

emidoots enabled auto-merge (squash) July 19, 2024 01:28

emidoots merged commit dca1b96 into main Jul 19, 2024

emidoots deleted the sg/self-hosted-models branch July 19, 2024 01:34
