Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

test(cody): Add unit tests for the Completions API#63434

Merged
chrsmith merged 4 commits into
mainfrom
chrsmith/entconfig-unit-tests
Jun 23, 2024
Merged

test(cody): Add unit tests for the Completions API#63434
chrsmith merged 4 commits into
mainfrom
chrsmith/entconfig-unit-tests

Conversation

@chrsmith

Copy link
Copy Markdown
Contributor

The Server-side Cody Model Selection project requires refactoring a lot of how the Completions API endpoint works for the Sourcegraph backend. It will need to update any place where we interact with the site config as it relates to LLMs, or any time we figure out which models can-or-can-not be used.

In order to safely land those refactoring without breaking existing users, who will be using the "older config format", we need tests. LOTS and LOTS of tests.

This PR adds the necessary infrastructure for writing unit tests against the Completions API, and adds a few basic ones for the Anthropic API provider. I wouldn't say it's as easy as we'd like to write these tests, but at least now it is possible. And we can further streamline things from here.

Overview

The bulk of functionality is in entconfig_base_test.go. That file contains some basic unit tests for how the enterprise configuration data is loaded, and provides the mocks and test infrastructure.

The crux of which is the apiProviderTestInfra struct. It will bundle mocks and the HTTP handlers to test, and provides high-level methods for invoking the completion API.

type apiProviderTestInfra struct {}

func (ti *apiProviderTestInfra) PushGetModelResult(model string, err error)
func (ti *apiProviderTestInfra) SetSiteConfig(siteConfig schema.SiteConfiguration)
func (ti *apiProviderTestInfra) CallChatCompletionAPI(t *testing.T, reqObj types.CodyCompletionRequestParameters) (int, string)
func (ti *apiProviderTestInfra) CallCodeCompletionAPI(t *testing.T, reqObj types.CodyCompletionRequestParameters) (int, string)
func (ti *apiProviderTestInfra) AssertCompletionsResponse(t *testing.T, rawResponseJSON string, wantResponse types.CompletionResponse)

What gets trick however, is how we actually hook into the implementation details of the completions API endpoint. You see, the user makes an HTTP request to the Sourcegraph instance. But we then figure out which LLM model to use, and then build an HTTP request to send to the specific LLM API Provider.

So the test infrastructure allows you to mock out that "middle part". The AssertCodyGatewayReceivesRequestWithResponse/AssertGenericExternalAPIRequestWithResponse functions will:

  1. Verify that the Sourcegraph instance is making an API call to the LLM provider that matches the format we expect. (e.g. using the correct Anthropic API request, etc.)
  2. The outbound HTTP request looks like it should, that it contains the right authorization headers, URL path, etc. (e.g. is it using the API key from the site config?)
  3. Finally, it returns the HTTP response from the API provider. (i.e. whatever Anthropic or Cody Gateway would have returned.)
type assertLLMRequestOptions struct {
	// WantRequestObj is what we expect the outbound HTTP request's JSON body
	// to be equal to. Required.
	WantRequestObj any
	// OutResponseObj is serialized to JSON and sent to the caller, i.e. our
	// LLM API Provider which is making the API request. Required.
	OutResponseObj any

	// WantRequestPath is the URL Path we expect in the outbound HTTP request.
	// No check is done if empty.
	WantRequestPath string

	// WantHeaders are HTTP header key/value pairs that must be present.
	WantHeaders map[string]string
}
func (ti *apiProviderTestInfra) AssertCodyGatewayReceivesRequestWithResponse(
	t *testing.T, opts assertLLMRequestOptions)
func (ti *apiProviderTestInfra) AssertGenericExternalAPIRequestWithResponse(
	t *testing.T, opts assertLLMRequestOptions)

Unfortunately it's super gnarly because we aren't really exposing the API data types from the API providers in a useful way. (e.g. perhaps we should just use the standard Anthropic Golang API client library.) So generating the test data relies on a lot of map[string]any to make it easier to construct arbitrary data types that can serialize to JSON the way that we need them to.

Anyways, with all of this test infrastructure in-place, you can write API provider tests like the following. Here's one such test from entconfig_anthropic_test.go which confirms that various aspects of the site-configuration are honored correctly when configured to use BYOK mode.

I've added "ℹ️ comments" to highlight some of the tricker parts.

t.Run("ViaBYOK", func(t *testing.T) {
		const (
			anthropicAPIKeyInConfig      = "secret-api-key"
			anthropicAPIEndpointInConfig = "https://byok.anthropic.com/path/from/config"
			chatModelInConfig            = "anthropic/claude-3-opus"
			codeModelInConfig            = "anthropic/claude-3-haiku"
		)

		infra.SetSiteConfig(schema.SiteConfiguration{
			CodyEnabled:                  pointers.Ptr(true),
			CodyPermissions:              pointers.Ptr(false),
			CodyRestrictUsersFeatureFlag: pointers.Ptr(false),

			// LicenseKey is required in order to use Cody.
			LicenseKey: "license-key",
			Completions: &schema.Completions{
				Provider:    "anthropic",
				AccessToken: anthropicAPIKeyInConfig,
				Endpoint:    anthropicAPIEndpointInConfig,

				ChatModel:       chatModelInConfig,
				CompletionModel: codeModelInConfig,
			},
		})

		t.Run("ChatModel", func(t *testing.T) {
			// ℹ️ Generating the "wantAnthropicRequest" and "outAnthropicResponse"
			// data is super-tedious. So we instead have a single function
			// that returns "a valid set", that we then customize.
			// So here, we just update the model we expect to see in the API
			// call to Anthropic.

			// Start with the stock test data, but customize it to reflect
			// what we expect to see based on the site configuration.
			testData := getValidTestData()
			testData.OutboundAnthropicRequest["model"] = "anthropic/claude-3-opus"

			// Register our hook to verify Cody Gateway got called with
			// the requested data.
			infra.AssertGenericExternalAPIRequestWithResponse(
				t, assertLLMRequestOptions{
					WantRequestPath: "/path/from/config",
					WantRequestObj:  &testData.OutboundAnthropicRequest,
					OutResponseObj:  &testData.InboundAnthropicRequest,
					WantHeaders: map[string]string{
						// Yes, Anthropic's API uses "X-Api-Key" rather than the "Authorization" header. 🤷
						"X-Api-Key": anthropicAPIKeyInConfig,
					},
				})

			// ℹ️ This `PushGetModelResult` is just a quirk of how the code
			// under test works. We mock out the `getModelFn` that is invoked
			// to resolve the _actual_ LLM model to use. (And not necessarily
			// use the one from the HTTP request.)
			infra.PushGetModelResult(chatModelInConfig, nil)

			status, responseBody := infra.CallChatCompletionAPI(t, testData.InitialCompletionRequest)

			assert.Equal(t, http.StatusOK, status)
			infra.AssertCompletionsResponse(t, responseBody, types.CompletionResponse{
				// ℹ️ The "totally rewrite it in Rust!" is coming from the
				// fake Anthropic response, from `getValidTestData`.
				Completion: "you should totally rewrite it in Rust!",
				StopReason: "max_tokens",
				Logprobs:   nil,
			})
		})
	})

Next steps

Once this gets checked-in, my plan is to carefully add new unit tests for the existing functionality before DISMANTLING the code to write through an entirely different site configuration object. 🤞 this will allow me to do so in such a way that I can confirm my changes won't alter any existing Sourcegraph instances that are using the older configuration format.

Test plan

Adds more tests.

Changelog

NA. Just trivial changes and adding more tests.

@chrsmith chrsmith requested review from a team, abeatrix and emidoots June 23, 2024 19:55
@cla-bot cla-bot Bot added the cla-signed label Jun 23, 2024
case types.HUMAN_MESSAGE_SPEAKER:
anthropicRole = "user"
default:
return nil, errors.Errorf("unexpected role: %s", text)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are just me fixing something I found while writing the tests. We duplicated incorrect code across all of our API provider implementations.

In all of these the "unexpected role" is referring to the speaker field. (i.e. what the switch statement is checking, and not the text of the message itself.)

@chrsmith chrsmith merged commit 1f9e72c into main Jun 23, 2024
@chrsmith chrsmith deleted the chrsmith/entconfig-unit-tests branch June 23, 2024 21:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants