This repository was archived by the owner on Sep 30, 2024. It is now read-only.

repo-updater: Hydrate schedule on startup #62891

Merged: eseliger merged 1 commit into main from es/05-24-repo-updaterhydratescheduleonstartup on Jun 4, 2024
@eseliger

Currently, when repo-updater restarts, it loses all the scheduling information it has collected over time. As a result, every repo is enqueued for an immediate update, which causes a large flood of git fetch requests right after the restart.

This PR fixes that by populating the scheduler with an initial delay per repo that is calculated with the same heuristic that the scheduler uses when it's fully warmed up.

This should avoid immediately refetching git repos that are very stale, which is most likely the majority on instances with many repos.

Test plan:

Ran it locally and verified the scheduler state using its instrumentation tool. The schedule looks as expected, and most repos aren't scheduled for the next 8h.

@cla-bot cla-bot Bot added the cla-signed label May 24, 2024

@github-actions github-actions Bot added the team/product-platform and team/source labels May 24, 2024
if len(managed) == 0 {
	return nil
}

Member Author

Fixes an edge case where the schedule is empty and all repos would accidentally be loaded into the scheduler.
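A minimal sketch of why such a guard matters, assuming a lookup that treats an empty ID list as "no filter". The `repo` type and the `listRepos` and `reposToSchedule` functions are hypothetical and are not the actual store API:

```go
package main

// repo is a hypothetical, simplified repository record.
type repo struct {
	ID   int
	Name string
}

// listRepos is a hypothetical lookup that mimics a common store convention:
// an empty ID list means "no filter", so every repo is returned.
func listRepos(all []repo, ids []int) []repo {
	if len(ids) == 0 {
		return all
	}
	want := make(map[int]struct{}, len(ids))
	for _, id := range ids {
		want[id] = struct{}{}
	}
	var out []repo
	for _, r := range all {
		if _, ok := want[r.ID]; ok {
			out = append(out, r)
		}
	}
	return out
}

// reposToSchedule returns early when nothing is managed. Without the guard,
// the empty list would be passed to listRepos and match every repo.
func reposToSchedule(all []repo, managed []int) []repo {
	if len(managed) == 0 {
		return nil
	}
	return listRepos(all, managed)
}
```

With two repos and an empty `managed` list, `listRepos` alone would return both repos, whereas `reposToSchedule` returns none.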

ctx, cancel := context.WithCancel(actor.WithInternalActor(context.Background()))
s.cancelCtx = cancel

if !dotcom.SourcegraphDotComMode() {

Member Author

This cannot work as easily for dotcom; it's too complex right now with the whole "only indexed repos and friends are indexed and fetched regularly" logic.

I didn't want to spend too many brain cycles on this, but we can get back to it.

@eseliger eseliger force-pushed the es/05-24-repo-updaterhydratescheduleonstartup branch from 3689395 to 971d93a Compare June 4, 2024 10:43
@eseliger eseliger force-pushed the es/05-24-repo-updaterhydratescheduleonstartup branch from 971d93a to 1a0ebd7 Compare June 4, 2024 10:46
@eseliger eseliger marked this pull request as ready for review June 4, 2024 11:41
@eseliger eseliger requested a review from a team June 4, 2024 11:41
@eseliger eseliger merged commit 03c05e5 into main Jun 4, 2024
@eseliger eseliger deleted the es/05-24-repo-updaterhydratescheduleonstartup branch June 4, 2024 17:00
eseliger referenced this pull request Jun 4, 2024
My local instance has few enough repos that this doesn't happen, but on larger instances this preloading fights with the new preloading.
Both are best effort and are meant to achieve the same thing.
Thus, this one is no longer required, and we can delete it now that we have added another one in https://github.com/sourcegraph/sourcegraph/pull/62891.

Test plan:

Verified with sleeps and logs locally that repos are correctly upserted in the schedule now.
eseliger referenced this pull request Jun 4, 2024
…#63086)