Skip to content

Evaluator: enable per-tenant HTTP caching for GitHub API client#953

Merged
edigaryev merged 12 commits intomainfrom
larker-github-client-caching
Oct 6, 2025
Merged

Evaluator: enable per-tenant HTTP caching for GitHub API client#953
edigaryev merged 12 commits intomainfrom
larker-github-client-caching

Conversation

@edigaryev
Copy link
Copy Markdown
Contributor

@edigaryev edigaryev commented Sep 30, 2025

To reduce GitHub API rate limit usage for Starlark scripts.

httpClient := httpcache.NewClient("memcache://")

// GitHub has a 10-second timeout for API requests
httpClient.Timeout = 11 * time.Second
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a TTL for caching or just request timeout?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request timeout, this is copied from the previous code.

}

func (gh *GitHub) client() *github.Client {
httpClient := httpcache.NewClient("memcache://")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a singleton shared between these instances. Otherwise we'll only cache things within a single starlark evaluation.

Maybe pass fs_cachable in the proto request.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a singleton shared between these instances. Otherwise we'll only cache things within a single starlark evaluation.

Fixed in 267599f.

Maybe pass fs_cachable in the proto request.

Can you elaborate?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was thinking if it's safe to share the cache between repositories for a GH FS or should be have a cache of FSes per repo.

Copy link
Copy Markdown
Contributor Author

@edigaryev edigaryev Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the safest (and good enough) way to start with is to have a separate cache, HTTP client and GitHub client instance per EvaluateConfig() and EvaluateFunction().

The current implementation is almost like that, except that it instantiates a new cache, HTTP client and GitHub client instance on module loading (e.g. load("github.com/cirrus-modules/golang", "task", "container")).

It feels like we should avoid doing this if currentFS is already *github.GitHub here:

func findLocatorFS(
ctx context.Context,
currentFS fs.FileSystem,
env map[string]string,
location interface{},
) (fs.FileSystem, string, error) {
switch typedLocation := location.(type) {
case gitHubLocation:
token := env["CIRRUS_REPO_CLONE_TOKEN"]
ghFS, err := github.New(typedLocation.Owner, typedLocation.Name, typedLocation.Revision, token)
if err != nil {
return nil, "", err
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, perhaps a separate *github.Github is totally fine, we just need to have a shared *http.Client per EvaluateConfig() and EvaluateFunction() invocation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, perhaps a separate *github.Github is totally fine, we just need to have a shared *http.Client per EvaluateConfig() and EvaluateFunction() invocation.

Implemented in 792cb3a.

@edigaryev edigaryev force-pushed the larker-github-client-caching branch from 017de46 to 529ce35 Compare September 30, 2025 14:43
@edigaryev edigaryev requested a review from fkorotkov September 30, 2025 14:46

var defaultGitHubClient = github.NewClient(&http.Client{
Transport: &http.Transport{
var httpClient = httpcache.NewClient("memcache://", httpcache.WithUpstream(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it evict things ever?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think we need re-initialize the caching HTTP client per-evaluation 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in af92585.

@edigaryev edigaryev requested a review from fkorotkov September 30, 2025 15:13
}

func cachingHTTPClient() *http.Client {
httpClient := httpcache.NewClient("memcache://", httpcache.WithUpstream(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it ever invalidate entries?

Copy link
Copy Markdown
Contributor Author

@edigaryev edigaryev Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does:

  • NewClient(dsn, ...) calls NewTransport(dsn, ...)
  • NewTransport(dsn, ...) calls store.Open(dsn)
  • store.Open(dsn) calls registry.Default().OpenConn(dsn)
  • registry.Default().OpenConn(dsn) calls memcache.Open(), which creates a new in-memory map

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't see how it caches between invocations of EvaluateConfig. To me it seems it only caches within a single invocation of EvaluateConfig which is still good but probably not effective at all.

Copy link
Copy Markdown
Contributor Author

@edigaryev edigaryev Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't see how it caches between invocations of EvaluateConfig.

It does not for security reasons. Otherwise that would be considered a shared cache, which github.com/bartventer/httpcache is not:

Note: This package is intended for use as a private (client-side) cache. It is not a shared or proxy cache. It is designed to be used with an HTTP client to cache responses from origin servers, improving performance and reducing load on those servers.

Also for shared caches, storing responses to authenticated requests is not allowed as per RFC 9111, §3:

A cache MUST NOT store a response to a request unless:

[...]

  • if the cache is shared: the Authorization header field is not present in the request (see Section 11.6.2 of [HTTP]) or a response directive is present that explicitly allows shared caching (see Section 3.5); and

We do some exceptions to this in Chacha, but in case of Chacha we control much more variables (e.g. we specifically list HTTP targets for exclusion), compared to all of the GitHub API.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Then I'm not going crazy. Exactly why I initially proposed to have a cache per GitHub Repo.

@edigaryev edigaryev requested a review from fkorotkov October 2, 2025 12:23
@edigaryev edigaryev changed the title Larker: enable HTTP caching for GitHub API client Larker: enable per-tenant HTTP caching for GitHub API client Oct 2, 2025
@edigaryev edigaryev changed the title Larker: enable per-tenant HTTP caching for GitHub API client Evaluator: enable per-tenant HTTP caching for GitHub API client Oct 2, 2025
@edigaryev edigaryev merged commit 3af5e7d into main Oct 6, 2025
9 of 11 checks passed
@edigaryev edigaryev deleted the larker-github-client-caching branch October 6, 2025 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants