Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

syncer: Fix issues causing excessive fetches#62837

Merged
eseliger merged 1 commit into
mainfrom
es/05-22-syncerfixissuescausingexcessivefetches
May 23, 2024
Merged

syncer: Fix issues causing excessive fetches#62837
eseliger merged 1 commit into
mainfrom
es/05-22-syncerfixissuescausingexcessivefetches

Conversation

@eseliger

Copy link
Copy Markdown
Member

This PR is the result of quite a journey. You'll see a bunch of meaningless one-liners, and yes - this was a bit frustrating :D

tl;dr I noticed that S2 was fetching repos a ton of times, but the numbers didn't check out. Why would we need 9 fetches a second to keep 1100 repos fresh, of which most are very inactive?
Turns out, we trigger fetches for repos that are marked as modified during code host syncing.
That in itself seems mostly unnecessary (so I reduced the conditions under which that would trigger an immediate update),
and the condition for detecting when a repo was modified was also not correctly computed in a bunch of cases.
These are fixed by one-liners here as well. Most likely, this is not a complete list, but it's what I found locally.

Here's some more writeup:
https://www.notion.so/sourcegraph/May-22-2024-S2-fetches-repos-a-lot-e19c282bb3564140b09086ebce172450?pvs=4

Test plan:

See the notion journey as well, but I verified that the repos are no longer marked as modified erroneously in my local instance.

@cla-bot cla-bot Bot added the cla-signed label May 22, 2024

Copy link
Copy Markdown
Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @eseliger and the rest of your teammates on Graphite Graphite

@github-actions github-actions Bot added team/product-platform team/source Tickets under the purview of Source - the one Source to graph it all labels May 22, 2024
Comment thread internal/repos/azuredevops.go Outdated

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata is always a pointer when unmarshalled from the DB, so it should also be a pointer on this end, otherwise the changeset syncer always detects it as "Metadata changed" because it switched from non-pointer to pointer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

facepalm

Comment thread internal/extsvc/github/v4.go Outdated

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a condition below that adds it to the conditional fields, but those are never added here, so instead we add it here explicitly.
For dotcom, we would not have fetched visibility previously.

Comment thread internal/repos/github.go Outdated

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on if the value comes from GQL or REST it can be uppercase or lowercase. This unifies it, although I don't love that this exists in the first place.

Comment thread internal/extsvc/github/common.go Outdated

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parent wasn't persisted in the DB, so it would always trigger a "metadata changed" - this was added for supporting deduplication of objects but we stopped that project, so it's safe to remove it for now.

Comment thread cmd/repo-updater/internal/scheduler/scheduler.go Outdated
@eseliger eseliger force-pushed the es/05-22-syncerfixissuescausingexcessivefetches branch 2 times, most recently from 77ca8f5 to 2870ccd Compare May 22, 2024 14:51
@eseliger eseliger marked this pull request as ready for review May 22, 2024 14:52
@eseliger eseliger requested a review from a team May 22, 2024 14:52
Comment thread cmd/repo-updater/internal/scheduler/scheduler_test.go Outdated
Comment thread internal/extsvc/github/testdata/golden/TestV3Client_Fork_success_user Outdated
Comment thread internal/repos/testdata/sources/GITLAB/TestGitlabSource_ListRepos Outdated
This PR is the result of quite a journey. You'll see a bunch of meaningless one-liners, and yes - this was a bit frustrating :D

tl;dr I noticed that S2 was fetching repos a ton of times, but the numbers didn't check out. Why would we need 9 fetches a second to keep 1100 repos fresh, of which most are very inactive?
Turns out, we trigger fetches for repos that are marked as modified during code host syncing.
That in itself seems mostly unnecessary (so I reduced the conditions under which that would trigger an immediate update),
and the condition for detecting when a repo was modified was also not correctly computed in a bunch of cases.
These are fixed by one-liners here as well. Most likely, this is not a complete list, but it's what I found locally.

Here's some more writeup:
https://www.notion.so/sourcegraph/May-22-2024-S2-fetches-repos-a-lot-e19c282bb3564140b09086ebce172450?pvs=4

Test plan:

See the notion journey as well, but I verified that the repos are no longer marked as modified erroneously in my local instance.
@eseliger eseliger force-pushed the es/05-22-syncerfixissuescausingexcessivefetches branch from 2870ccd to ed5f9e5 Compare May 23, 2024 18:36
@eseliger eseliger merged commit d4489a8 into main May 23, 2024
@eseliger eseliger deleted the es/05-22-syncerfixissuescausingexcessivefetches branch May 23, 2024 18:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/product-platform team/source Tickets under the purview of Source - the one Source to graph it all

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants