This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Embeddings: fix low-hanging issues with scheduling job #58510

Merged
jtibshirani merged 1 commit into main from jtibs/embeddings on Nov 27, 2023

@jtibshirani jtibshirani commented Nov 22, 2023

Contributor

As part of the embeddings policy framework, a worker periodically checks what
repos can be embedded. For every candidate repo, it queries the DB to see if
there's a new revision to embed. This runs every minute and becomes
increasingly expensive as the jobs table fills up with more entries over time.

This change makes small optimizations to improve this:

  • Add an index to make selecting on repo_id and revision much faster
  • Check the repos every 5 minutes instead of 1 minute. This shouldn't make a
    huge difference in user experience, since by default embeddings jobs aren't
    allowed to be scheduled within 24h of the last run
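
To illustrate the first bullet, here is a minimal sketch of how a composite index changes the per-repo lookup. It uses Python's built-in `sqlite3` as a stand-in for the real database, and the table name `embedding_jobs`, the index name, and the helper functions are assumptions made for illustration, not names taken from this PR.

```python
import sqlite3

# Hypothetical lookup: "is there already a job for this repo at this revision?"
LOOKUP = "SELECT id FROM embedding_jobs WHERE repo_id = ? AND revision = ?"


def build_db(with_index):
    """Create an in-memory jobs table, optionally with a (repo_id, revision) index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE embedding_jobs ("
        " id INTEGER PRIMARY KEY,"
        " repo_id INTEGER NOT NULL,"
        " revision TEXT NOT NULL,"
        " state TEXT NOT NULL)"
    )
    if with_index:
        # Composite index covering both columns used in the WHERE clause.
        conn.execute(
            "CREATE INDEX embedding_jobs_repo_revision"
            " ON embedding_jobs (repo_id, revision)"
        )
    # Simulate a jobs table that has filled up over time.
    rows = [(repo, f"rev{n}", "completed") for repo in range(100) for n in range(20)]
    conn.executemany(
        "INSERT INTO embedding_jobs (repo_id, revision, state) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def query_plan(conn):
    """Return SQLite's plan description for the lookup query."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (42, "rev7")).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in plan)


def has_job(conn, repo_id, revision):
    """True if a job exists for the given repo and revision."""
    return conn.execute(LOOKUP, (repo_id, revision)).fetchone() is not None


if __name__ == "__main__":
    print("without index:", query_plan(build_db(False)))
    print("with index:   ", query_plan(build_db(True)))
```

Without the index the plan is a full table scan, so each per-repo check gets slower as the table grows. With the index the plan becomes an index search on both columns. The real migration targets the production database, so its exact syntax and planner output will differ from this SQLite sketch.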

Addresses #58360

Test plan

Covered by existing embeddings jobs tests and DB upgrade tests

@jtibshirani jtibshirani marked this pull request as ready for review November 22, 2023 23:32
@jtibshirani jtibshirani requested review from a team November 23, 2023 01:01
@jtibshirani

Contributor Author

I'm planning a follow-up that avoids issuing a separate query for every repo against the repo and jobs tables, which can be expensive when a large number of repos match the policy.
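
As a rough sketch of what avoiding per-repo queries could look like (this is an assumption about the general approach, not the actual follow-up), one query can fetch the most recent job revision for every repo at once, and the comparison against each repo's current revision then happens in memory. The table name `embedding_jobs`, the use of the highest `id` as "most recent", and the function names are all hypothetical, and `sqlite3` again stands in for the real database.

```python
import sqlite3

# One query for all repos: the revision of the newest job per repo.
# "Newest" is assumed to mean the row with the highest id.
LATEST_PER_REPO = """
SELECT j.repo_id, j.revision
FROM embedding_jobs AS j
JOIN (
    SELECT repo_id, MAX(id) AS max_id
    FROM embedding_jobs
    GROUP BY repo_id
) AS latest ON j.id = latest.max_id
"""


def repos_needing_embedding(conn, head_revisions):
    """Return repo ids whose current revision has no matching latest job.

    head_revisions maps repo_id -> current revision (hypothetical input).
    """
    latest = dict(conn.execute(LATEST_PER_REPO).fetchall())
    return sorted(
        repo_id
        for repo_id, head in head_revisions.items()
        if latest.get(repo_id) != head
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE embedding_jobs ("
        " id INTEGER PRIMARY KEY,"
        " repo_id INTEGER NOT NULL,"
        " revision TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO embedding_jobs (repo_id, revision) VALUES (?, ?)",
        [(1, "aaa"), (1, "bbb"), (2, "ccc")],
    )
    # Repo 1 is up to date, repo 2 has a new revision, repo 3 has no jobs yet.
    heads = {1: "bbb", 2: "ddd", 3: "eee"}
    return repos_needing_embedding(conn, heads)


if __name__ == "__main__":
    print(demo())  # [2, 3]
```

The trade-off is one larger query per scheduling pass instead of one small query per candidate repo, which tends to matter most when many repos match the policy.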

@sourcegraph-release-bot

Collaborator

The backport to 5.2 failed at https://github.com/sourcegraph/sourcegraph/actions/runs/7029567908:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-5.2 5.2
# Navigate to the new working tree
cd .worktrees/backport-5.2
# Create a new branch
git switch --create backport-58510-to-5.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7ea0c8bd07ddabe8e2f2dfb19fb048488a8b5977
# Push it to GitHub
git push --set-upstream origin backport-58510-to-5.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.2

If you encounter a conflict, first resolve it and stage all files, then run the commands below:

git cherry-pick --continue
# Push it to GitHub
git push --set-upstream origin backport-58510-to-5.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.2
  • Follow the above instructions to backport the commit.
  • Create a pull request where the base branch is 5.2 and the compare/head branch is backport-58510-to-5.2.
  • Make sure to tag @sourcegraph/release-guild in the pull request description.
  • Once the backport pull request is created, kindly remove the release-blocker from this pull request.

@sourcegraph-release-bot sourcegraph-release-bot added backports failed-backport-to-5.2 release-blocker Prevents us from releasing: https://about.sourcegraph.com/handbook/engineering/releases labels Nov 29, 2023
jtibshirani added a commit that referenced this pull request Nov 29, 2023
keegancsmith pushed a commit that referenced this pull request Nov 29, 2023
…#58651)

Embeddings: fix low-hanging issues with scheduling job (#58510)

@jtibshirani jtibshirani removed release-blocker Prevents us from releasing: https://about.sourcegraph.com/handbook/engineering/releases failed-backport-to-5.2 labels Dec 8, 2023
vovakulikov pushed a commit that referenced this pull request Dec 12, 2023