Embeddings: refactor job scheduling code#58787
Conversation
693a1b1 to
669d92c
Compare
There was a problem hiding this comment.
I couldn't see how it was helpful to create and call an anonymous function here, so I removed this indirection.
There was a problem hiding this comment.
it was important because we don't use a named variable for error in the return of ScheduleRepositories. If you just update it to a named var then the func will properly capture the error variable used to return. With this change your later "err" vars won't be the same as the err var captured in the defer.
There was a problem hiding this comment.
Ohh in retrospect this comment is obvious :) Thanks! A named return indeed seems best, I also think it's the only way for this assignment to work as intended:
defer func() { err = tx.Done(err) }()
Also updated our unit tests to make sure this case is covered.
669d92c to
6a28b80
Compare
669dee4 to
7c44dc0
Compare
7c44dc0 to
6f9515e
Compare
| return j != nil && (j.State == "completed" || j.State == "processing" || j.State == "queued") | ||
| return j.State == "completed" || j.State == "processing" || j.State == "queued" |
There was a problem hiding this comment.
any idea if this nil check was important? I'd be surprised if our worker gave us nil jobs.
Edit: I see later on we have an API that can return nil when we lookup a job. I guess people didn't handle that at those call sites.
There was a problem hiding this comment.
I moved the nil handling to the call sites, instead of doing it here. I'm not sure if there's a standard in Go, but in general nil checking at call sites feels more clear/ solid to me. Also checking for nil here was inconsistent with all the other methods on RepoEmbeddingJob, where we assume the job is non-nil.
There was a problem hiding this comment.
it was important because we don't use a named variable for error in the return of ScheduleRepositories. If you just update it to a named var then the func will properly capture the error variable used to return. With this change your later "err" vars won't be the same as the err var captured in the defer.
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
just checking, but this is one of the important changes. IE if a repo doesn't exist fail. And policy goes via something else to avoid this strange behaviour.
There was a problem hiding this comment.
Yes indeed, I tried to mention that in the PR description ("Make sure to fail entire GraphQL request if there is an error fetching repos, instead of silently ignoring it"). If you directly ask for a certain repo to be embedded through an API call, and it doesn't exist, then we should return an error.
For policies, I believe this should never really happen. For simplicity, we don't check or fail if the repos don't exist. In general, the main motivation for splitting things out wasn't fixing this bug, but rather making the logic clearer and being able to optimize the policy case.
Refactors the repository scheduling logic into two separate methods, one used by the GraphQL API, and the other by the policy framework. This makes the code easier to read and lets us make some improvements: * For policy scheduling, stop translating back-and-forth between repo IDs and names * Make sure to fail entire GraphQL request if there is an error fetching repos, instead of silently ignoring it ## Test plan Added new unit test
Refactors the repository scheduling logic into two separate methods, one used by the GraphQL API, and the other by the policy framework. This makes the code easier to read and lets us make some improvements: * For policy scheduling, stop translating back-and-forth between repo IDs and names * Make sure to fail entire GraphQL request if there is an error fetching repos, instead of silently ignoring it ## Test plan Added new unit test
#59090) * Embeddings: refactor job scheduling code (#58787) Refactors the repository scheduling logic into two separate methods, one used by the GraphQL API, and the other by the policy framework. This makes the code easier to read and lets us make some improvements: * For policy scheduling, stop translating back-and-forth between repo IDs and names * Make sure to fail entire GraphQL request if there is an error fetching repos, instead of silently ignoring it ## Test plan Added new unit test * Embeddings: avoid constantly rerunning job if it failed (#58980) The embeddings policy framework attempts to rerun a repo job even if a previous run failed at the exact same revision. This means that when a job failed, for example because of rate limits or a problematic file, it would immediately be rescheduled and fail again. This can be expensive and noisy. Now, the policy framework does **not** rerun failed jobs unless the revision changes. An admin can always kick off a job manually if they want to rerun a job at the revision. This reduces noise and feels like a better trade-off. * Fix compile error
Refactors the repository scheduling logic into two separate methods, one used
by the GraphQL API, and the other by the policy framework. This makes the code
easier to read and lets us make some improvements:
names
instead of silently ignoring it
Test plan
Added new unit test