Code ownership: Parse GitHub style code owners file to get ownership information#39250
Conversation
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff 4907aa0...52c1dc8.
|
| filteredStream := streaming.StreamFunc(func(event streaming.SearchEvent) { | ||
| event.Results, _ = applyCodeOwnershipFiltering(ctx, s.includeOwners, s.excludeOwners, event.Results) | ||
| var err error | ||
| event.Results, err = applyCodeOwnershipFiltering(ctx, clients.DB, &mu, &rules, s.includeOwners, s.excludeOwners, event.Results) |
There was a problem hiding this comment.
I expect this will be a performance bottleneck. We need to make a network request (sometimes multiple) for every repo we return results for. This is serialized by the mutex, but ideally we'd run those in parallel while still maintaining the order of results. This is likely a good job for lib/group's StreamGroup.
Feel free to skip the performance tuning for now though until we get a working version merged.
There was a problem hiding this comment.
Okay this makes sense, so:
- We have a read through
RulesCacheor whatever that holds amapof fetched the ruleset for a repo and async.RWMutex. - We start by getting a read lock and read from the map.
- If the map contains the item we can keep reading the ruleset. We never mutate a ruleset so once it is read we can release the lock.
- If the map has no element, we start by creating a new channel and then enqueue a fetch task into a
StreamGroupmanaged by theRulesCache. We now wait for the channel to receive data before continuing:- At some point the task will run and resolve to a codeowners file. At some layer point the callback will be called
- The callbacks are resolved in-order and from a single go routine so we can use this to write to the map after getting a write lock.
- At the end of the callback we write to the channel to continue the in the
GetFromCacheOrFetchfunction.
How do we avoid that we start fetching ownership information for the same repo multiple times though? We may need an additional map of inflight requests but the problem I see then is that we'd need to "upgrade" from a read lock to a write lock if we have a cache miss which doesn't seem to be a supported pattern. Is there some pattern for this that would simplify my solution?
There was a problem hiding this comment.
How do we avoid that we start fetching ownership information for the same repo multiple times though?
That's a good question. The canonical answer is probably https://pkg.go.dev/golang.org/x/sync/singleflight, and I think this could be added to your read-through cache so requests fetches for the same repository are merged.
Realistically, we should probably not optimize this until needed because in most cases, a query will only return a result for a few repos. We can get fancy later 😄
…ry-codeowners-file-for-ownership-info
camdencheek
left a comment
There was a problem hiding this comment.
Looking good! Remaining comments are non-blocking
| (REPOSCOMPUTEEXCLUDED | ||
| ) | ||
| NoopJob)))))`), | ||
| }, { |
There was a problem hiding this comment.
I didn't find a working way to mock the feature flag value inside the tests. I can't simply create a new FlagSet since its values are not public and mocking the flag store also didn't really work so I decided to remove the test for now and will add it back once we open this up.
I've verified that this feature flag works locally by creating it via GraphQL and overwriting it for my test user though.
There was a problem hiding this comment.
Seems reasonable. FWIW, I think you can mock this out with featureflag.WithContext(), but we should probably just put the resolved features on the SearchInputs struct rather than doing it further down the line. A task for another time 🙂
| (REPOSCOMPUTEEXCLUDED | ||
| ) | ||
| NoopJob)))))`), | ||
| }, { |
There was a problem hiding this comment.
Seems reasonable. FWIW, I think you can mock this out with featureflag.WithContext(), but we should probably just put the resolved features on the SearchInputs struct rather than doing it further down the line. A task for another time 🙂
Closes #39158
Closes #39159
This PR adds support for reading the GitHub style code owner specs from repositories to infer code ownership information automatically. To do that, we add a new dependency: https://github.com/hmarr/codeowners (MIT license so I don't see any legal issues).
Here's a bullet point recap of what changes we can find here:
codeownership.Rulesettype that currently is a proxy to thecodeowners.Ruleset(from the new dependency). This allows us to expose a nice API but also will make it easier to later add other ways to get the code ownership mapping: E.g. we may want to fetch data rules from our postgres database instead..gitlabpath)mapso that we only have to create thecodeowners.Rulesetfor a specific repo once per search. So if one repo returns 100 results, we only have one ruleset.autogoldwhile at it.Test plan
I added a new test that mocks git server to return a specified code owners file. I've also tested it by running
sg start.