Embeddings: refactor job scheduling code by jtibshirani · Pull Request #58787 · sourcegraph/sourcegraph-public-snapshot

jtibshirani · 2023-12-05T20:08:30Z

Refactors the repository scheduling logic into two separate methods, one used
by the GraphQL API, and the other by the policy framework. This makes the code
easier to read and lets us make some improvements:

* For policy scheduling, stop translating back-and-forth between repo IDs and names
* Make sure to fail entire GraphQL request if there is an error fetching repos, instead of silently ignoring it

Test plan

Added new unit test

jtibshirani · 2023-12-05T20:30:53Z

I couldn't see how it was helpful to create and call an anonymous function here, so I removed this indirection.

It was important because we don't use a named variable for the error in the return of ScheduleRepositories. If you just update it to a named variable, then the func will properly capture the error variable used to return. With this change, your later "err" vars won't be the same as the err var captured in the defer.

Ohh, in retrospect this comment is obvious :) Thanks! A named return indeed seems best; I also think it's the only way for this assignment to work as intended:

defer func() { err = tx.Done(err) }()

Also updated our unit tests to make sure this case is covered.
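To make the difference concrete, here is a minimal sketch of the two variants. The `fakeTx` type, `errSchedule`, and both function names are hypothetical stand-ins, not the code from this PR; the only assumption taken from the thread is that `Done` rolls back on a non-nil error and commits otherwise.

```go
package main

import "errors"

// fakeTx is a hypothetical stand-in for a database transaction. Done rolls
// back when it receives a non-nil error and commits otherwise, returning the
// error it was given.
type fakeTx struct {
	committed  bool
	rolledBack bool
}

func (tx *fakeTx) Done(err error) error {
	if err != nil {
		tx.rolledBack = true
		return err
	}
	tx.committed = true
	return nil
}

// errSchedule is a hypothetical error used to simulate a failure.
var errSchedule = errors.New("scheduling failed")

// scheduleNamed uses a named return value, so the deferred closure reads the
// error that is actually being returned, including one set by `return`.
func scheduleNamed(tx *fakeTx, fail bool) (err error) {
	defer func() { err = tx.Done(err) }()
	if fail {
		return errSchedule
	}
	return nil
}

// scheduleUnnamed declares a local err that is later shadowed. The deferred
// closure captures the outer local, which stays nil, so the transaction
// commits even though the function returns an error.
func scheduleUnnamed(tx *fakeTx, fail bool) error {
	var err error
	defer func() { err = tx.Done(err) }()
	if fail {
		if err := errSchedule; err != nil { // shadows the outer err
			return err
		}
	}
	return err
}
```

With the named return, a failure reaches `Done` and the transaction rolls back. In the unnamed variant the same failure is returned to the caller, but `Done` only ever sees nil.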

keegancsmith · 2023-12-06T13:11:57Z

-	return j != nil && (j.State == "completed" || j.State == "processing" || j.State == "queued")
+	return j.State == "completed" || j.State == "processing" || j.State == "queued"


Any idea if this nil check was important? I'd be surprised if our worker gave us nil jobs.

Edit: I see later on we have an API that can return nil when we look up a job. I guess people didn't handle that at those call sites.

I moved the nil handling to the call sites, instead of doing it here. I'm not sure if there's a standard in Go, but in general nil checking at call sites feels clearer and more solid to me. Also, checking for nil here was inconsistent with all the other methods on RepoEmbeddingJob, where we assume the job is non-nil.
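A small sketch of what handling nil at the call site can look like. Only the `State` comparison comes from the diff above; the method name `scheduledOrCompleted`, the `lookupJob` helper, and `needsScheduling` are hypothetical and chosen for illustration.

```go
package main

// RepoEmbeddingJob mirrors only the State field discussed above.
type RepoEmbeddingJob struct {
	State string
}

// scheduledOrCompleted (hypothetical name) assumes a non-nil receiver, like
// the other methods on the type.
func (j *RepoEmbeddingJob) scheduledOrCompleted() bool {
	return j.State == "completed" || j.State == "processing" || j.State == "queued"
}

// lookupJob is a hypothetical lookup that returns nil when no job exists.
func lookupJob(jobs map[int]*RepoEmbeddingJob, repoID int) *RepoEmbeddingJob {
	return jobs[repoID]
}

// needsScheduling handles the "no job yet" case explicitly at the call site
// instead of relying on a nil check inside the method.
func needsScheduling(jobs map[int]*RepoEmbeddingJob, repoID int) bool {
	job := lookupJob(jobs, repoID)
	if job == nil {
		return true // no previous job, so one should be scheduled
	}
	return !job.scheduledOrCompleted()
}
```

The caller decides what a missing job means, which keeps that decision visible where the lookup happens.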

keegancsmith · 2023-12-06T13:14:06Z

It was important because we don't use a named variable for the error in the return of ScheduleRepositories. If you just update it to a named variable, then the func will properly capture the error variable used to return. With this change, your later "err" vars won't be the same as the err var captured in the defer.

keegancsmith · 2023-12-06T13:15:23Z

+		if err != nil {
+			return err
+		}


Just checking, but this is one of the important changes, i.e. if a repo doesn't exist, fail. And policy goes via something else to avoid this strange behaviour.

Yes indeed, I tried to mention that in the PR description ("Make sure to fail entire GraphQL request if there is an error fetching repos, instead of silently ignoring it"). If you directly ask for a certain repo to be embedded through an API call, and it doesn't exist, then we should return an error.

For policies, I believe this should never really happen. For simplicity, we don't check or fail if the repos don't exist. In general, the main motivation for splitting things out wasn't fixing this bug, but rather making the logic clearer and being able to optimize the policy case.
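The split described here could be sketched roughly as follows. All names (`repo`, `scheduleForAPI`, `scheduleForPolicy`) are hypothetical and the in-memory map stands in for the real repo store; the sketch only shows the two behaviours discussed: the API path fails the whole request on a missing repo, while the policy path works on IDs directly and does no existence check.

```go
package main

import "fmt"

// repo is a hypothetical minimal repo record.
type repo struct {
	ID   int
	Name string
}

// scheduleForAPI resolves the requested names and fails the whole request
// as soon as one repo cannot be fetched, instead of silently skipping it.
func scheduleForAPI(byName map[string]repo, names []string) ([]int, error) {
	ids := make([]int, 0, len(names))
	for _, name := range names {
		r, ok := byName[name]
		if !ok {
			return nil, fmt.Errorf("repo %q not found", name)
		}
		ids = append(ids, r.ID)
	}
	return ids, nil
}

// scheduleForPolicy receives repo IDs directly, so there is no translation
// between IDs and names and no existence check.
func scheduleForPolicy(ids []int) []int {
	queued := make([]int, 0, len(ids))
	queued = append(queued, ids...)
	return queued
}
```

Keeping the two entry points separate lets each one pick its own error handling without the other inheriting it.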

#59090)

* Embeddings: refactor job scheduling code (#58787)

  Refactors the repository scheduling logic into two separate methods, one used by the GraphQL API, and the other by the policy framework. This makes the code easier to read and lets us make some improvements:

  * For policy scheduling, stop translating back-and-forth between repo IDs and names
  * Make sure to fail entire GraphQL request if there is an error fetching repos, instead of silently ignoring it

  ## Test plan

  Added new unit test

* Embeddings: avoid constantly rerunning job if it failed (#58980)

  The embeddings policy framework attempts to rerun a repo job even if a previous run failed at the exact same revision. This means that when a job failed, for example because of rate limits or a problematic file, it would immediately be rescheduled and fail again. This can be expensive and noisy.

  Now, the policy framework does **not** rerun failed jobs unless the revision changes. An admin can always kick off a job manually if they want to rerun a job at the revision. This reduces noise and feels like a better trade-off.

* Fix compile error
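The rerun rule from #58980 can be illustrated with a small decision function. This is a sketch under assumptions: `lastJob` and `shouldSchedule` are hypothetical names, and the handling of states other than the four named in this thread is a guess, not the actual implementation.

```go
package main

// lastJob is a hypothetical summary of the most recent job for a repo.
type lastJob struct {
	State    string
	Revision string
}

// shouldSchedule sketches the policy decision: schedule when there is no
// previous job or the revision changed; otherwise do not rerun jobs that are
// queued, processing, completed, or failed at that same revision.
func shouldSchedule(last *lastJob, latestRevision string) bool {
	if last == nil {
		return true // never embedded before
	}
	if last.Revision != latestRevision {
		return true // new revision, so a failed job may be retried
	}
	switch last.State {
	case "queued", "processing", "completed", "failed":
		return false
	default:
		// Assumption: any other state is eligible for rescheduling.
		return true
	}
}
```

An admin-triggered run would bypass this check, matching the note that a job can always be kicked off manually.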

cla-bot Bot added the cla-signed label Dec 5, 2023

jtibshirani force-pushed the jtibs/embeddings branch from 693a1b1 to 669d92c Compare December 5, 2023 20:28

jtibshirani commented Dec 5, 2023

jtibshirani force-pushed the jtibs/embeddings branch from 669d92c to 6a28b80 Compare December 5, 2023 20:35

jtibshirani requested review from a team December 5, 2023 20:37

jtibshirani force-pushed the jtibs/embeddings branch 2 times, most recently from 669dee4 to 7c44dc0 Compare December 6, 2023 02:31

Add dedicated method for policy scheduling

6f9515e

jtibshirani force-pushed the jtibs/embeddings branch from 7c44dc0 to 6f9515e Compare December 6, 2023 02:47

keegancsmith approved these changes Dec 6, 2023

jtibshirani added 4 commits December 6, 2023 11:58

Use ListMinimalRepos and simplify iteration

5a8d455

When scheduling through API, return error if fetching revision fails

99c5622

Use named return to properly roll back transaction

48b0c1e

Merge remote-tracking branch 'upstream/main' into jtibs/embeddings

5344f48

jtibshirani merged commit 98c64be into main Dec 6, 2023

jtibshirani deleted the jtibs/embeddings branch December 6, 2023 22:40

jtibshirani mentioned this pull request Dec 6, 2023

Embeddings: fix slow scheduling queries #58360

Closed

jtibshirani mentioned this pull request Dec 18, 2023

[Backport 5.2] Embeddings: avoid constantly rerunning job if it failed #59090

Merged

jtibshirani added the backported-to-5.2 label Dec 19, 2023

jtibshirani merged 5 commits into main from jtibs/embeddings

2 participants
