Embeddings: avoid constantly rerunning job if it failed by jtibshirani · Pull Request #58980 · sourcegraph/sourcegraph-public-snapshot

jtibshirani · 2023-12-14T05:24:04Z

The embeddings policy framework attempts to rerun a repo job even if a previous
run failed at the exact same revision. This means that when a job failed, for
example because of rate limits or a problematic file, it would immediately be
rescheduled and fail again. This can be expensive and noisy.

Now, the policy framework does not rerun failed jobs unless the revision
changes. An admin can always kick off a job manually if they want to rerun a
job at the same revision. This reduces noise and feels like a better trade-off.
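The scheduling rule above can be sketched in Go. This is a minimal illustration, not the code from this PR. The `Job` and `JobState` types, the `shouldSchedule` function, and the `manual` flag are hypothetical names. The sketch also assumes the policy framework only looks at the most recent job for each repo.

```go
package main

import "fmt"

// JobState is a hypothetical set of embeddings job states; the real
// Sourcegraph types and names may differ.
type JobState string

const (
	StateCompleted JobState = "completed"
	StateFailed    JobState = "failed"
)

// Job is a hypothetical record of the most recent embeddings job for a repo.
type Job struct {
	Revision string
	State    JobState
}

// shouldSchedule reports whether a job should be enqueued at currentRevision.
// lastJob is nil if the repo has never been embedded. manual is true when an
// admin explicitly requests a run (hypothetical flag).
func shouldSchedule(lastJob *Job, currentRevision string, manual bool) bool {
	// An explicit admin request always schedules, even at the same revision.
	if manual {
		return true
	}
	// Never embedded before: schedule a first run.
	if lastJob == nil {
		return true
	}
	// A new revision is eligible no matter how the last job ended.
	if lastJob.Revision != currentRevision {
		return true
	}
	// Same revision: skip whether the last job completed or failed. Per the PR
	// description, the old behavior amounted to rescheduling here when
	// lastJob.State == StateFailed.
	return false
}

func main() {
	failed := &Job{Revision: "abc123", State: StateFailed}
	fmt.Println(shouldSchedule(failed, "abc123", false)) // false: same revision, no retry
	fmt.Println(shouldSchedule(failed, "def456", false)) // true: revision changed
	fmt.Println(shouldSchedule(failed, "abc123", true))  // true: manual run by an admin
}
```

With this rule, a job that fails because of rate limits or a problematic file is retried automatically only once the repo moves to a new revision. Anything sooner requires a deliberate manual run.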

Test plan

Modified unit tests

stefanhengl

LGTM!

jtibshirani · 2023-12-15T19:42:37Z

Thanks for the review. I'll backport this as part of a round of embeddings fixes.


#59090)

* Embeddings: refactor job scheduling code (#58787)

  Refactors the repository scheduling logic into two separate methods, one used by the GraphQL API and the other by the policy framework. This makes the code easier to read and lets us make some improvements:

  * For policy scheduling, stop translating back and forth between repo IDs and names
  * Make sure to fail the entire GraphQL request if there is an error fetching repos, instead of silently ignoring it

  Test plan: Added new unit test

* Embeddings: avoid constantly rerunning job if it failed (#58980)

* Fix compile error

Embeddings: avoid constantly rerunning job if it failed

cfc4c56

cla-bot added the cla-signed label Dec 14, 2023

Merge branch 'main' into jtibs/embeddings

f422ca6

jtibshirani marked this pull request as ready for review December 14, 2023 22:02

jtibshirani requested a review from a team December 14, 2023 22:52

stefanhengl approved these changes Dec 15, 2023


jtibshirani merged commit 550e077 into main Dec 15, 2023

jtibshirani deleted the jtibs/embeddings branch December 15, 2023 19:42

jtibshirani mentioned this pull request Dec 18, 2023

[Backport 5.2] Embeddings: avoid constantly rerunning job if it failed #59090

Merged

jtibshirani added the backported-to-5.2 label Dec 19, 2023

jtibshirani mentioned this pull request Jan 10, 2024

Search: add missing changelog entries for 5.2 #59486

Merged


jtibshirani merged 2 commits into main from jtibs/embeddings

jtibshirani commented Dec 14, 2023 •

edited

Loading

Uh oh!

stefanhengl left a comment

Uh oh!

jtibshirani commented Dec 15, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jtibshirani commented Dec 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

stefanhengl left a comment

Choose a reason for hiding this comment

Uh oh!

jtibshirani commented Dec 15, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jtibshirani commented Dec 14, 2023 •

edited

Loading