Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Embeddings: refactor job scheduling code#58787

Merged
jtibshirani merged 5 commits into
mainfrom
jtibs/embeddings
Dec 6, 2023
Merged

Embeddings: refactor job scheduling code#58787
jtibshirani merged 5 commits into
mainfrom
jtibs/embeddings

Conversation

@jtibshirani

Copy link
Copy Markdown
Contributor

Refactors the repository scheduling logic into two separate methods, one used
by the GraphQL API, and the other by the policy framework. This makes the code
easier to read and lets us make some improvements:

  • For policy scheduling, stop translating back-and-forth between repo IDs and
    names
  • Make sure to fail entire GraphQL request if there is an error fetching repos,
    instead of silently ignoring it

Test plan

Added new unit test

Comment thread internal/embeddings/schedule.go Outdated

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't see how it was helpful to create and call an anonymous function here, so I removed this indirection.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was important because we don't use a named variable for error in the return of ScheduleRepositories. If you just update it to a named var then the func will properly capture the error variable used to return. With this change your later "err" vars won't be the same as the err var captured in the defer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh in retrospect this comment is obvious :) Thanks! A named return indeed seems best, I also think it's the only way for this assignment to work as intended:

defer func() { err = tx.Done(err) }()

Also updated our unit tests to make sure this case is covered.

@jtibshirani jtibshirani requested review from a team December 5, 2023 20:37
@jtibshirani jtibshirani force-pushed the jtibs/embeddings branch 2 times, most recently from 669dee4 to 7c44dc0 Compare December 6, 2023 02:31
Comment on lines -40 to +41
return j != nil && (j.State == "completed" || j.State == "processing" || j.State == "queued")
return j.State == "completed" || j.State == "processing" || j.State == "queued"

@keegancsmith keegancsmith Dec 6, 2023

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea if this nil check was important? I'd be surprised if our worker gave us nil jobs.

Edit: I see later on we have an API that can return nil when we lookup a job. I guess people didn't handle that at those call sites.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the nil handling to the call sites, instead of doing it here. I'm not sure if there's a standard in Go, but in general nil checking at call sites feels more clear/ solid to me. Also checking for nil here was inconsistent with all the other methods on RepoEmbeddingJob, where we assume the job is non-nil.

Comment thread internal/embeddings/schedule.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was important because we don't use a named variable for error in the return of ScheduleRepositories. If you just update it to a named var then the func will properly capture the error variable used to return. With this change your later "err" vars won't be the same as the err var captured in the defer.

Comment on lines +29 to +31
if err != nil {
return err
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checking, but this is one of the important changes. IE if a repo doesn't exist fail. And policy goes via something else to avoid this strange behaviour.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, I tried to mention that in the PR description ("Make sure to fail entire GraphQL request if there is an error fetching repos, instead of silently ignoring it"). If you directly ask for a certain repo to be embedded through an API call, and it doesn't exist, then we should return an error.

For policies, I believe this should never really happen. For simplicity, we don't check or fail if the repos don't exist. In general, the main motivation for splitting things out wasn't fixing this bug, but rather making the logic clearer and being able to optimize the policy case.

Comment thread internal/embeddings/schedule.go Outdated
Comment thread internal/embeddings/schedule.go Outdated
@jtibshirani jtibshirani merged commit 98c64be into main Dec 6, 2023
@jtibshirani jtibshirani deleted the jtibs/embeddings branch December 6, 2023 22:40
vovakulikov pushed a commit that referenced this pull request Dec 12, 2023
Refactors the repository scheduling logic into two separate methods, one used
by the GraphQL API, and the other by the policy framework. This makes the code
easier to read and lets us make some improvements:
* For policy scheduling, stop translating back-and-forth between repo IDs and
names
* Make sure to fail entire GraphQL request if there is an error fetching repos,
instead of silently ignoring it

## Test plan

Added new unit test
jtibshirani added a commit that referenced this pull request Dec 18, 2023
Refactors the repository scheduling logic into two separate methods, one used
by the GraphQL API, and the other by the policy framework. This makes the code
easier to read and lets us make some improvements:
* For policy scheduling, stop translating back-and-forth between repo IDs and
names
* Make sure to fail entire GraphQL request if there is an error fetching repos,
instead of silently ignoring it

## Test plan

Added new unit test
camdencheek pushed a commit that referenced this pull request Jan 9, 2024
#59090)

* Embeddings: refactor job scheduling code (#58787)

Refactors the repository scheduling logic into two separate methods, one used
by the GraphQL API, and the other by the policy framework. This makes the code
easier to read and lets us make some improvements:
* For policy scheduling, stop translating back-and-forth between repo IDs and
names
* Make sure to fail entire GraphQL request if there is an error fetching repos,
instead of silently ignoring it

## Test plan

Added new unit test

* Embeddings: avoid constantly rerunning job if it failed (#58980)

The embeddings policy framework attempts to rerun a repo job even if a previous
run failed at the exact same revision. This means that when a job failed, for
example because of rate limits or a problematic file, it would immediately be
rescheduled and fail again. This can be expensive and noisy.

Now, the policy framework does **not** rerun failed jobs unless the revision
changes. An admin can always kick off a job manually if they want to rerun a
job at the revision. This reduces noise and feels like a better trade-off.

* Fix compile error
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants