This repository was archived by the owner on Sep 30, 2024. It is now read-only.
Embeddings: fix low-hanging issues with scheduling job #58510
Merged
Contributor
Author
I'm planning a follow-up that avoids querying the repo and jobs tables once per repo, which can be expensive when a large number of repos match the policy.
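One possible shape for such a follow-up, as a rough sketch rather than the author's actual plan: fetch the already-scheduled (repo, revision) pairs for all candidate repos in a single query and compare in memory. SQLite stands in for the real database here, and the `embedding_jobs` table, its columns, and the `repos_needing_jobs` helper are all hypothetical names chosen for illustration.

```python
import sqlite3


def repos_needing_jobs(conn, latest_revisions):
    """Return repo IDs whose latest revision has no job yet, using one query.

    `latest_revisions` maps repo_id -> latest revision (hypothetical input).
    """
    if not latest_revisions:
        return []
    repo_ids = sorted(latest_revisions)
    placeholders = ",".join("?" for _ in repo_ids)
    # A single query for all candidates instead of one query per repo.
    rows = conn.execute(
        "SELECT DISTINCT repo_id, revision FROM embedding_jobs "
        f"WHERE repo_id IN ({placeholders})",
        repo_ids,
    ).fetchall()
    already_scheduled = set(rows)
    return [
        repo_id
        for repo_id in repo_ids
        if (repo_id, latest_revisions[repo_id]) not in already_scheduled
    ]


# Illustrative data: repo 1 is up to date, repo 2 has a newer revision,
# repo 3 has never had a job.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embedding_jobs ("
    " id INTEGER PRIMARY KEY, repo_id INTEGER NOT NULL, revision TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO embedding_jobs (repo_id, revision) VALUES (?, ?)",
    [(1, "aaa"), (2, "old")],
)
pending = repos_needing_jobs(conn, {1: "aaa", 2: "new", 3: "ccc"})
print(pending)
```

With many candidates, a real implementation would probably need to chunk the ID list or join against the repo table instead, since databases limit the number of bound parameters per statement.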
keegancsmith
approved these changes
Nov 23, 2023
Collaborator
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-5.2 5.2
# Navigate to the new working tree
cd .worktrees/backport-5.2
# Create a new branch
git switch --create backport-58510-to-5.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7ea0c8bd07ddabe8e2f2dfb19fb048488a8b5977
# Push it to GitHub
git push --set-upstream origin backport-58510-to-5.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.2
If you encounter conflicts, first resolve them and stage all files, then run the commands below:
git cherry-pick --continue
# Push it to GitHub
git push --set-upstream origin backport-58510-to-5.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.2
jtibshirani
added a commit
that referenced
this pull request
Nov 29, 2023
As part of the embeddings policy framework, a worker periodically checks what repos can be embedded. For every candidate repo, it queries the DB to see if there's a new revision to embed. This runs every minute and becomes increasingly expensive as the jobs table fills up with more entries over time.

This change makes small optimizations to improve this:
* Add an index to make selecting on `repo_id` and `revision` much faster
* Check the repos every 5 minutes instead of 1 minute. This shouldn't make a huge difference in user experience, since by default embeddings jobs aren't allowed to be scheduled within 24h of the last run
keegancsmith
pushed a commit
that referenced
this pull request
Nov 29, 2023
…#58651) Embeddings: fix low-hanging issues with scheduling job (#58510)
vovakulikov
pushed a commit
that referenced
this pull request
Dec 12, 2023
As part of the embeddings policy framework, a worker periodically checks what
repos can be embedded. For every candidate repo, it queries the DB to see if
there's a new revision to embed. This runs every minute and becomes
increasingly expensive as the jobs table fills up with more entries over time.
This change makes small optimizations to improve this:
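* Add an index to make selecting on `repo_id` and `revision` much faster
* Check the repos every 5 minutes instead of 1 minute. This shouldn't make a huge difference in user experience, since by default embeddings jobs aren't allowed to be scheduled within 24h of the last run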
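The effect of the first optimization can be illustrated with a small, self-contained sketch. It uses SQLite as a stand-in for the real database, and the table name `embedding_jobs`, the index name, and the lookup query are assumptions made for illustration, not taken from this PR's migration. It asks the query planner how it would run a per-repo lookup before and after adding a composite index on `repo_id` and `revision`.

```python
import sqlite3

# Hypothetical per-repo lookup: is there already a job for this revision?
LOOKUP = "SELECT id FROM embedding_jobs WHERE repo_id = ? AND revision = ?"


def query_plan(conn, sql, params):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embedding_jobs ("
    " id INTEGER PRIMARY KEY, repo_id INTEGER NOT NULL, revision TEXT NOT NULL)"
)
# Fill the table so it resembles a jobs table that has grown over time.
conn.executemany(
    "INSERT INTO embedding_jobs (repo_id, revision) VALUES (?, ?)",
    [(repo_id, f"rev{n}") for repo_id in range(100) for n in range(10)],
)

plan_before = query_plan(conn, LOOKUP, (42, "rev3"))

# Composite index on the two columns the lookup filters on (assumed name).
conn.execute(
    "CREATE INDEX embedding_jobs_repo_id_revision_idx"
    " ON embedding_jobs (repo_id, revision)"
)
plan_after = query_plan(conn, LOOKUP, (42, "rev3"))

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the planner reports a full table scan, so each lookup gets slower as the table grows. With the index it reports an index search instead. The exact plan wording varies between SQLite versions, and a different database engine will report its plans differently.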
Addresses #58360
Test plan
Covered by existing embeddings jobs tests and DB upgrade tests