Evaluator: enable per-tenant HTTP caching for GitHub API client#953
Evaluator: enable per-tenant HTTP caching for GitHub API client#953
Conversation
pkg/larker/fs/github/github.go
Outdated
| httpClient := httpcache.NewClient("memcache://") | ||
|
|
||
| // GitHub has a 10-second timeout for API requests | ||
| httpClient.Timeout = 11 * time.Second |
There was a problem hiding this comment.
Is it a TTL for caching or just request timeout?
There was a problem hiding this comment.
Request timeout, this is copied from the previous code.
pkg/larker/fs/github/github.go
Outdated
| } | ||
|
|
||
| func (gh *GitHub) client() *github.Client { | ||
| httpClient := httpcache.NewClient("memcache://") |
There was a problem hiding this comment.
I think this should be a singleton shared between these instances. Otherwise we'll only cache things within a single starlark evaluation.
Maybe pass fs_cachable in the proto request.
There was a problem hiding this comment.
I think this should be a singleton shared between these instances. Otherwise we'll only cache things within a single starlark evaluation.
Fixed in 267599f.
Maybe pass
fs_cachablein the proto request.
Can you elaborate?
There was a problem hiding this comment.
Was thinking if it's safe to share the cache between repositories for a GH FS or should be have a cache of FSes per repo.
There was a problem hiding this comment.
I think the safest (and good enough) way to start with is to have a separate cache, HTTP client and GitHub client instance per EvaluateConfig() and EvaluateFunction().
The current implementation is almost like that, except that it instantiates a new cache, HTTP client and GitHub client instance on module loading (e.g. load("github.com/cirrus-modules/golang", "task", "container")).
It feels like we should avoid doing this if currentFS is already *github.GitHub here:
cirrus-cli/pkg/larker/resolver/module_fs.go
Lines 107 to 120 in fdfa078
There was a problem hiding this comment.
Actually, perhaps a separate *github.Github is totally fine, we just need to have a shared *http.Client per EvaluateConfig() and EvaluateFunction() invocation.
There was a problem hiding this comment.
Actually, perhaps a separate
*github.Githubis totally fine, we just need to have a shared*http.ClientperEvaluateConfig()andEvaluateFunction()invocation.
Implemented in 792cb3a.
017de46 to
529ce35
Compare
pkg/larker/fs/github/github.go
Outdated
|
|
||
| var defaultGitHubClient = github.NewClient(&http.Client{ | ||
| Transport: &http.Transport{ | ||
| var httpClient = httpcache.NewClient("memcache://", httpcache.WithUpstream( |
There was a problem hiding this comment.
Does it evict things ever?
There was a problem hiding this comment.
Good catch. I think we need re-initialize the caching HTTP client per-evaluation 🤔
internal/evaluator/evaluator.go
Outdated
| } | ||
|
|
||
| func cachingHTTPClient() *http.Client { | ||
| httpClient := httpcache.NewClient("memcache://", httpcache.WithUpstream( |
There was a problem hiding this comment.
Does it ever invalidate entries?
There was a problem hiding this comment.
It does:
NewClient(dsn, ...)callsNewTransport(dsn, ...)NewTransport(dsn, ...)callsstore.Open(dsn)store.Open(dsn)callsregistry.Default().OpenConn(dsn)registry.Default().OpenConn(dsn)callsmemcache.Open(), which creates a new in-memory map
There was a problem hiding this comment.
I just don't see how it caches between invocations of EvaluateConfig. To me it seems it only caches within a single invocation of EvaluateConfig which is still good but probably not effective at all.
There was a problem hiding this comment.
I just don't see how it caches between invocations of
EvaluateConfig.
It does not for security reasons. Otherwise that would be considered a shared cache, which github.com/bartventer/httpcache is not:
Note: This package is intended for use as a private (client-side) cache. It is not a shared or proxy cache. It is designed to be used with an HTTP client to cache responses from origin servers, improving performance and reducing load on those servers.
Also for shared caches, storing responses to authenticated requests is not allowed as per RFC 9111, §3:
A cache MUST NOT store a response to a request unless:
[...]
- if the cache is shared: the Authorization header field is not present in the request (see Section 11.6.2 of [HTTP]) or a response directive is present that explicitly allows shared caching (see Section 3.5); and
We do some exceptions to this in Chacha, but in case of Chacha we control much more variables (e.g. we specifically list HTTP targets for exclusion), compared to all of the GitHub API.
There was a problem hiding this comment.
OK. Then I'm not going crazy. Exactly why I initially proposed to have a cache per GitHub Repo.
To reduce GitHub API rate limit usage for Starlark scripts.