Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

feat/cody-gateway: support wildcard models#62909

Merged
bobheadxi merged 1 commit into
mainfrom
cody-gateway-wildcard-models
May 31, 2024
Merged

feat/cody-gateway: support wildcard models#62909
bobheadxi merged 1 commit into
mainfrom
cody-gateway-wildcard-models

Conversation

@bobheadxi

@bobheadxi bobheadxi commented May 24, 2024

Copy link
Copy Markdown
Member

In https://sourcegraph.slack.com/archives/C05SZB829D0/p1715638980052279 we shared a decision we landed on as part of #62263:

Ignoring (then removing) per-subscription model allowlists: As part of the API discussions, we've also surfaced some opportunities for improvements - to make it easier to roll out new models to Enterprise, we're not including per-subscription model allowlists in the new API, and as part of the Cody Gateway migration (by end-of-June), we will update Cody Gateway to stop enforcing per-subscription model allowlists. Cody Gateway will still retain a Cody-Gateway-wide model allowlist. @chrsmith is working on a broader design here and will have more to share on this later.

To support this, we first need to extend Cody Gateway's model allowlist enforcement to respect a notion of "allow all models that are allowed in Cody Gateway". To ensure models are explicitly provided today, an empty AllowedModels is considered invalid, so we add a special single-element-slice-* configuration that can be used to indicate an actor's rate limit allows all models (prefixedMasterAllowlist).

This change also unifies somewhat the way we enforce allowed models in various places by introducing (*RateLimit).EvaluateAllowedModels(...) as the unified way to construct the final allowlist for a given rate limit.

I'm planning to roll this out before rolling out actual functionality changes (https://github.com/sourcegraph/sourcegraph/pull/62911) to ensure changes in cached rate limits don't end up confusing an older revision of Cody Gateway that doesn't yet support wildcard models. With #62911, rolling out new models to Enterprise customers no longer require additional code/override changes.

Part of https://linear.app/sourcegraph/issue/CORE-135

Test plan

Unit tests, and E2E test of this in https://github.com/sourcegraph/sourcegraph/pull/62911

bobheadxi commented May 24, 2024

Copy link
Copy Markdown
Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @bobheadxi and the rest of your teammates on Graphite Graphite

@bobheadxi bobheadxi force-pushed the cody-gateway-wildcard-models branch 2 times, most recently from acb70df to 60a9db4 Compare May 24, 2024 20:59

@rafax rafax May 27, 2024

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing that this is upstream.go, which already does many things and has a lot of dependencies, I think it would be good to hire the "*" as an implementation detail in limits.go (say, add limits.IsWildCard())?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, move the whole "is this prefixed or not" logic to a single place, handle wildcard and prefixing there?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says this is TEMPORARY, so I think this was only meant to tide us over until the cache state migrates... ideally, this whole block of code should be removed. But, it doesn't feel appropriate to do in this PR 😅

In light of this being a one-off thing, would you be okay with keeping this here? I can add this as a docstring as well. My intent is to discourage this type of checking because the actor model list should always be prefixed, and we should never manipulate the actor models list ourselves, instead depending only on EvaluateAllowedModels to do the check

@bobheadxi bobheadxi force-pushed the cody-gateway-wildcard-models branch 3 times, most recently from f911fcc to 34453c5 Compare May 29, 2024 03:44
@bobheadxi bobheadxi requested a review from a team May 29, 2024 04:21
@bobheadxi bobheadxi force-pushed the cody-gateway-wildcard-models branch 2 times, most recently from 3a7b6a4 to 90ae50d Compare May 30, 2024 18:07
@bobheadxi bobheadxi force-pushed the cody-gateway-wildcard-models branch from 90ae50d to 0f45af8 Compare May 31, 2024 16:32
@bobheadxi

Copy link
Copy Markdown
Member Author

Merging this first to start working my way through landing this large stack of PRs, as I shared in Slack: https://sourcegraph.slack.com/archives/C05SZB829D0/p1717173273215949?thread_ts=1715638980.052279&cid=C05SZB829D0

This is low-risk as it doesn't include any real functionality changes yet, just prepares them, as outlined in PR description.

Happy to address any post-merge reviews :)

@bobheadxi bobheadxi merged commit 5833a98 into main May 31, 2024
@bobheadxi bobheadxi deleted the cody-gateway-wildcard-models branch May 31, 2024 20:09
bobheadxi added a commit that referenced this pull request Jun 4, 2024
This change makes Cody Gateway always apply a wildcard model allowlist,
irrespective of what the configured model allowlist is for an Enterprise
subscription is in dotcom (see #62909).

The next PR in the stack,
https://github.com/sourcegraph/sourcegraph/pull/62912, makes the GraphQL
queries return similar results, and removes model allowlists from the
subscription management UI.

Closes https://linear.app/sourcegraph/issue/CORE-135

### Context

In https://sourcegraph.slack.com/archives/C05SZB829D0/p1715638980052279
we shared a decision we landed on as part of #62263:

> Ignoring (then removing) per-subscription model allowlists: As part of
the API discussions, we've also surfaced some opportunities for
improvements - to make it easier to roll out new models to Enterprise,
we're not including per-subscription model allowlists in the new API,
and as part of the Cody Gateway migration (by end-of-June), we will
update Cody Gateway to stop enforcing per-subscription model allowlists.
Cody Gateway will still retain a Cody-Gateway-wide model allowlist.
[@chrsmith](https://sourcegraph.slack.com/team/U061QHKUBJ8) is working
on a broader design here and will have more to share on this later.

This means there is one less thing for us to migrate as part of
https://github.com/sourcegraph/sourcegraph/pull/62934, and avoids the
need to add an API field that will be removed shortly post-migration.

As part of this, rolling out new models to Enterprise customers no
longer require additional code/override changes.

## Test plan

Set up Cody Gateway locally as documented, then `sg start dotcom`. Set
up an enterprise subscription + license with a high seat count (for a
high quota), and force a Cody Gateway sync:

```
curl -v -H 'Authorization: bearer sekret' http://localhost:9992/-/actor/sync-all-sources
```

Verify we are using wildcard allowlist:

```sh
$ redis-cli -p 6379 get 'v2:product-subscriptions:v2:slk_...'
"{\"key\":\"slk_...\",\"id\":\"6ad033f4-c6da-43a9-95ef-f653bf59aaac\",\"name\":\"bobheadxi\",\"accessEnabled\":true,\"endpointAccess\":{\"/v1/attribution\":true},\"rateLimits\":{\"chat_completions\":{\"allowedModels\":[\"*\"],\"limit\":660,\"interval\":86400000000000,\"concurrentRequests\":330,\"concurrentRequestsInterval\":10000000000},\"code_completions\":{\"allowedModels\":[\"*\"],\"limit\":66000,\"interval\":86400000000000,\"concurrentRequests\":33000,\"concurrentRequestsInterval\":10000000000},\"embeddings\":{\"allowedModels\":[\"*\"],\"limit\":220000000,\"interval\":86400000000000,\"concurrentRequests\":110000000,\"concurrentRequestsInterval\":10000000000}},\"lastUpdated\":\"2024-05-24T20:28:58.283296Z\"}"
```

Using the local enterprise subscription's access token, we run the QA
test suite:

```sh
$ bazel test --runs_per_test=2 --test_output=all //cmd/cody-gateway/qa:qa_test --test_env=E2E_GATEWAY_ENDPOINT=http://localhost:9992 --test_env=E2E_GATEWAY_TOKEN=$TOKEN
INFO: Analyzed target //cmd/cody-gateway/qa:qa_test (0 packages loaded, 0 targets configured).
INFO: From Testing //cmd/cody-gateway/qa:qa_test (run 1 of 2):
==================== Test output for //cmd/cody-gateway/qa:qa_test (run 1 of 2):
PASS
================================================================================
INFO: From Testing //cmd/cody-gateway/qa:qa_test (run 2 of 2):
==================== Test output for //cmd/cody-gateway/qa:qa_test (run 2 of 2):
PASS
================================================================================
INFO: Found 1 test target...
Target //cmd/cody-gateway/qa:qa_test up-to-date:
  bazel-bin/cmd/cody-gateway/qa/qa_test_/qa_test
Aspect @@rules_rust//rust/private:clippy.bzl%rust_clippy_aspect of //cmd/cody-gateway/qa:qa_test up-to-date (nothing to build)
Aspect @@rules_rust//rust/private:rustfmt.bzl%rustfmt_aspect of //cmd/cody-gateway/qa:qa_test up-to-date (nothing to build)
INFO: Elapsed time: 13.653s, Critical Path: 13.38s
INFO: 7 processes: 1 internal, 6 darwin-sandbox.
INFO: Build completed successfully, 7 total actions
//cmd/cody-gateway/qa:qa_test                                            PASSED in 11.7s
  Stats over 2 runs: max = 11.7s, min = 11.7s, avg = 11.7s, dev = 0.0s

Executed 1 out of 1 test: 1 test passes.
```
bobheadxi referenced this pull request Jun 7, 2024
…ns (#62934)

Migrates Cody Gateway to use the new Enterprise Portal's "read-only"
APIs. For the most part, this is an in-place replacement - a lot of the
diff is in testing and minor changes. Some changes, such as the removal
of model allowlists, were made down the PR stack in
https://github.com/sourcegraph/sourcegraph/pull/62911.

At a high level, we replace the data requested by
`cmd/cody-gateway/internal/dotcom/operations.graphql` and replace it
with Enterprise Portal RPCs:

- `codyaccessv1.GetCodyGatewayAccess`
- `codyaccessv1.ListCodyGatewayAccesses`

Use cases that previously required retrieving the active license tags
now:

1. Use the display name provided by the Cody Access API
https://github.com/sourcegraph/sourcegraph/pull/62968
2. Depend on the connected Enterprise Portal dev instance to only return
dev subscriptions https://github.com/sourcegraph/sourcegraph/pull/62966

Closes https://linear.app/sourcegraph/issue/CORE-98
Related to https://linear.app/sourcegraph/issue/CORE-135
(https://github.com/sourcegraph/sourcegraph/pull/62909,
https://github.com/sourcegraph/sourcegraph/pull/62911)
Related to https://linear.app/sourcegraph/issue/CORE-97

## Local development

This change also adds Enterprise Portal to `sg start dotcom`. For local
development, we set up Cody Gateway to connect to Enterprise Portal such
that zero configuration is needed - all the required secrets are sourced
from the `sourcegrah-local-dev` GCP project automatically when you run
`sg start dotcom`, and local Cody Gateway will talk to local Enterprise
Portal to do the Enterprise subscriptions sync.

This is actually an upgrade from the current experience where you need
to provide Cody Gateway a Sourcegraph user access token to test
Enterprise locally, though the Sourcegraph user access token is still
required for the PLG actor source.

The credential is configured in
https://console.cloud.google.com/security/secret-manager/secret/SG_LOCAL_DEV_SAMS_CLIENT_SECRET/overview?project=sourcegraph-local-dev,
and I've included documentation in the secret annotation about what it
is for and what to do with it:


![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/c61ad4e0-3b75-408d-a930-076a414336fb)

## Rollout plan

I will open PRs to set up the necessary configuration for Cody Gateway
dev and prod. Once reviews taper down I'll cut an image from this branch
and deploy it to Cody Gateway dev, and monitor it closely + do some
manual testing. Once verified, I'll land this change and monitor a
rollout to production.

Cody Gateway dev SAMS client:
sourcegraph/infrastructure#6108
Cody Gateway prod SAMS client update (this one already exists):

```
accounts=> UPDATE idp_clients
SET scopes = scopes || '["enterprise_portal::subscription::read", "enterprise_portal::codyaccess::read"]'::jsonb
WHERE id = 'sams_cid_018ea062-479e-7342-9473-66645e616cbf';
UPDATE 1
accounts=> select name, scopes from idp_clients WHERE name = 'Cody Gateway (prod)';
        name         |                                                              scopes                                                              
---------------------+----------------------------------------------------------------------------------------------------------------------------------
 Cody Gateway (prod) | ["openid", "profile", "email", "offline_access", "enterprise_portal::subscription::read", "enterprise_portal::codyaccess::read"]
(1 row)
```

Configuring the target Enterprise Portal instances:
sourcegraph/infrastructure#6127

## Test plan

Start the new `dotcom` runset, now including Enterprise Portal, and
observe logs from both `enterprise-portal` and `cody-gateway`:

```
sg start dotcom
```

I reused the test plan from
https://github.com/sourcegraph/sourcegraph/pull/62911: set up Cody
Gateway external dependency secrets, then set up an enterprise
subscription + license with a high seat count (for a high quota), and
force a Cody Gateway sync:

```
curl -v -H 'Authorization: bearer sekret' http://localhost:9992/-/actor/sync-all-sources
```

This should indicate the new sync against "local dotcom" fetches the
correct number of actors and whatnot.

Using the local enterprise subscription's access token, we run the QA
test suite:

```sh
$ bazel test --runs_per_test=2 --test_output=all //cmd/cody-gateway/qa:qa_test --test_env=E2E_GATEWAY_ENDPOINT=http://localhost:9992 --test_env=E2E_GATEWAY_TOKEN=$TOKEN
INFO: Analyzed target //cmd/cody-gateway/qa:qa_test (0 packages loaded, 0 targets configured).
INFO: From Testing //cmd/cody-gateway/qa:qa_test (run 1 of 2):
==================== Test output for //cmd/cody-gateway/qa:qa_test (run 1 of 2):
PASS
================================================================================
INFO: From Testing //cmd/cody-gateway/qa:qa_test (run 2 of 2):
==================== Test output for //cmd/cody-gateway/qa:qa_test (run 2 of 2):
PASS
================================================================================
INFO: Found 1 test target...
Target //cmd/cody-gateway/qa:qa_test up-to-date:
  bazel-bin/cmd/cody-gateway/qa/qa_test_/qa_test
Aspect @@rules_rust//rust/private:clippy.bzl%rust_clippy_aspect of //cmd/cody-gateway/qa:qa_test up-to-date (nothing to build)
Aspect @@rules_rust//rust/private:rustfmt.bzl%rustfmt_aspect of //cmd/cody-gateway/qa:qa_test up-to-date (nothing to build)
INFO: Elapsed time: 13.653s, Critical Path: 13.38s
INFO: 7 processes: 1 internal, 6 darwin-sandbox.
INFO: Build completed successfully, 7 total actions
//cmd/cody-gateway/qa:qa_test                                            PASSED in 11.7s
  Stats over 2 runs: max = 11.7s, min = 11.7s, avg = 11.7s, dev = 0.0s

Executed 1 out of 1 test: 1 test passes.
```
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants