syncer: Fix issues causing excessive fetches#62837
Conversation
|
This stack of pull requests is managed by Graphite. Learn more about stacking. |
There was a problem hiding this comment.
Metadata is always a pointer when unmarshalled from the DB, so it should also be a pointer on this end, otherwise the changeset syncer always detects it as "Metadata changed" because it switched from non-pointer to pointer.
There was a problem hiding this comment.
There's a condition below that adds it to the conditional fields, but those are never added here, so instead we add it here explicitly.
For dotcom, we would not have fetched visibility previously.
There was a problem hiding this comment.
Depending on if the value comes from GQL or REST it can be uppercase or lowercase. This unifies it, although I don't love that this exists in the first place.
There was a problem hiding this comment.
the parent wasn't persisted in the DB, so it would always trigger a "metadata changed" - this was added for supporting deduplication of objects but we stopped that project, so it's safe to remove it for now.
77ca8f5 to
2870ccd
Compare
This PR is the result of quite a journey. You'll see a bunch of meaningless one-liners, and yes - this was a bit frustrating :D tl;dr I noticed that S2 was fetching repos a ton of times, but the numbers didn't check out. Why would we need 9 fetches a second to keep 1100 repos fresh, of which most are very inactive? Turns out, we trigger fetches for repos that are marked as modified during code host syncing. That in itself seems mostly unnecessary (so I reduced the conditions under which that would trigger an immediate update), and the condition for detecting when a repo was modified was also not correctly computed in a bunch of cases. These are fixed by one-liners here as well. Most likely, this is not a complete list, but it's what I found locally. Here's some more writeup: https://www.notion.so/sourcegraph/May-22-2024-S2-fetches-repos-a-lot-e19c282bb3564140b09086ebce172450?pvs=4 Test plan: See the notion journey as well, but I verified that the repos are no longer marked as modified erroneously in my local instance.
2870ccd to
ed5f9e5
Compare


This PR is the result of quite a journey. You'll see a bunch of meaningless one-liners, and yes - this was a bit frustrating :D
tl;dr I noticed that S2 was fetching repos a ton of times, but the numbers didn't check out. Why would we need 9 fetches a second to keep 1100 repos fresh, of which most are very inactive?
Turns out, we trigger fetches for repos that are marked as modified during code host syncing.
That in itself seems mostly unnecessary (so I reduced the conditions under which that would trigger an immediate update),
and the condition for detecting when a repo was modified was also not correctly computed in a bunch of cases.
These are fixed by one-liners here as well. Most likely, this is not a complete list, but it's what I found locally.
Here's some more writeup:
https://www.notion.so/sourcegraph/May-22-2024-S2-fetches-repos-a-lot-e19c282bb3564140b09086ebce172450?pvs=4
Test plan:
See the notion journey as well, but I verified that the repos are no longer marked as modified erroneously in my local instance.