Embeddings: fail job immediately if rate limited exceeded by jtibshirani · Pull Request #58869 · sourcegraph/sourcegraph-public-snapshot

jtibshirani · 2023-12-08T22:38:23Z

Usually, during an embeddings job we allow 10% of embedding requests to fail,
simply skipping over failed chunks. If a customer has hit their rate limits,
this means we might continually send a huge number of embedding requests that
we know will immediately fail. With this change, we immediately fail a job if
the rate limit is exceeded.

It also increases the amount of time between attempting to run a job to 15
minutes. This won't make a big difference to user experience, since by default
embeddings jobs aren't allowed to be scheduled within 24h of the last run. But
it helps prevent jobs from continuously being scheduled then failing.

This change is unlikely to have a user-facing impact, but just helps cut down
on noise in logs and excessive requests to Cody Gateway.

Test plan

Added new unit test

Preview 🤩

Preview Link

keegancsmith · 2023-12-11T07:22:54Z

+				// To avoid failing large jobs on a flaky API, just mark all files
+				// as failed and continue. This means we may have some missing
+				// files, but they will be logged as such below and some embeddings
+				// are better than no embeddings.


do we know what errors we would get back in this case? Feels like it would be more robust to allow list the errors we have this behaviour with rather than assuming it is the flaky API. Either way this is an improvement so rather land this PR than directly address this comment. Also fine to say we don't know what the errors are :)

Good point! Looking at the PR where it was added (https://github.com/sourcegraph/sourcegraph/pull/57224), I don't think we have a precise handle on the causes of flakiness. So I'll leave this as-is for now.

sourcegraph-release-bot · 2023-12-11T16:22:21Z

The backport to 5.2 failed at https://github.com/sourcegraph/sourcegraph/actions/runs/7170316193:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-5.2 5.2
# Navigate to the new working tree
cd .worktrees/backport-5.2
# Create a new branch
git switch --create backport-58869-to-5.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6ea73218c13abcaf9938e912d6b4809d02e36ace
# Push it to GitHub
git push --set-upstream origin backport-58869-to-5.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.2

If you encouter conflict, first resolve the conflict and stage all files, then run the commands below:

git cherry-pick --continue
# Push it to GitHub
git push --set-upstream origin backport-58869-to-5.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.2

Follow above instructions to backport the commit.
Create a pull request where the base branch is 5.2 and the compare/head branch is backport-58869-to-5.2., click here to create the pull request.
Make sure to tag @sourcegraph/release-guild in the pull request description.
Once the backport pull request is created, kindly remove the release-blocker from this pull request.

Usually, during an embeddings job we allow 10% of embedding requests to fail, simply skipping over failed chunks. If a customer has hit their rate limits, this means we might continually send a huge number of embedding requests that we know will immediately fail. With this change, we immediately fail a job if the rate limit is exceeded. It also increases the amount of time between attempting to run a job to 15 minutes. This won't make a big difference to user experience, since by default embeddings jobs aren't allowed to be scheduled within 24h of the last run. But it helps prevent jobs from continuously being scheduled then failing. This change is unlikely to have a user-facing impact, but just helps cut down on noise in logs and excessive requests to Cody Gateway.

…ded (#58939) Embeddings: fail job immediately if rate limited exceeded (#58869) Usually, during an embeddings job we allow 10% of embedding requests to fail, simply skipping over failed chunks. If a customer has hit their rate limits, this means we might continually send a huge number of embedding requests that we know will immediately fail. With this change, we immediately fail a job if the rate limit is exceeded. It also increases the amount of time between attempting to run a job to 15 minutes. This won't make a big difference to user experience, since by default embeddings jobs aren't allowed to be scheduled within 24h of the last run. But it helps prevent jobs from continuously being scheduled then failing. This change is unlikely to have a user-facing impact, but just helps cut down on noise in logs and excessive requests to Cody Gateway.

jtibshirani added 2 commits December 8, 2023 14:07

Embeddings: fail job immediately if rate limited exceeded

e65a067

Increase time between scheduled jobs

8e9edc6

cla-bot Bot added the cla-signed label Dec 8, 2023

jtibshirani marked this pull request as ready for review December 8, 2023 22:58

jtibshirani requested a review from a team December 8, 2023 22:59

keegancsmith approved these changes Dec 11, 2023

View reviewed changes

jtibshirani merged commit 6ea7321 into main Dec 11, 2023

jtibshirani deleted the jtibs/embeddings branch December 11, 2023 16:20

jtibshirani added the backport 5.2 label Dec 11, 2023

sourcegraph-release-bot added backports failed-backport-to-5.2 release-blocker Prevents us from releasing: https://about.sourcegraph.com/handbook/engineering/releases labels Dec 11, 2023

jtibshirani mentioned this pull request Dec 12, 2023

[Backport 5.2] Embeddings: fail job immediately if rate limited exceeded #58939

Merged

jtibshirani removed release-blocker Prevents us from releasing: https://about.sourcegraph.com/handbook/engineering/releases failed-backport-to-5.2 labels Jan 10, 2024

jtibshirani mentioned this pull request Jan 10, 2024

Search: add missing changelog entries for 5.2 #59486

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Embeddings: fail job immediately if rate limited exceeded#58869

Embeddings: fail job immediately if rate limited exceeded#58869
jtibshirani merged 2 commits into
mainfrom
jtibs/embeddings

jtibshirani commented Dec 8, 2023 •

edited

Loading

Uh oh!

keegancsmith Dec 11, 2023

Uh oh!

jtibshirani Dec 11, 2023

Uh oh!

sourcegraph-release-bot commented Dec 11, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jtibshirani commented Dec 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Preview 🤩

Uh oh!

keegancsmith Dec 11, 2023

Choose a reason for hiding this comment

Uh oh!

jtibshirani Dec 11, 2023

Choose a reason for hiding this comment

Uh oh!

sourcegraph-release-bot commented Dec 11, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jtibshirani commented Dec 8, 2023 •

edited

Loading