Code ownership: Parse GitHub style code owners file to get ownership information by philipp-spiess · Pull Request #39250 · sourcegraph/sourcegraph-public-snapshot

philipp-spiess · 2022-07-21T17:09:52Z

This PR adds support for reading the GitHub style code owner specs from repositories to infer code ownership information automatically. To do that, we add a new dependency: https://github.com/hmarr/codeowners (MIT license so I don't see any legal issues).

Here's a bullet point recap of what changes we can find here:

I've added a new codeownership.Ruleset type that currently is a proxy to the codeowners.Ruleset (from the new dependency). This allows us to expose a nice API but also will make it easier to later add other ways to get the code ownership mapping: E.g. we may want to fetch data rules from our postgres database instead.
We have a new method to create a ruleset based on querying the gitserver for a specific repo at a specific revision. This will download a static number of blobs that are allowed code owner locations (The plugin also supports GitLab style code owners so I've added the .gitlab path)
In the filter job, we create a new map so that we only have to create the codeowners.Ruleset for a specific repo once per search. So if one repo returns 100 results, we only have one ruleset.
When a code ownership mapping was found, we test the result paths against the mapping. If the owners contain the owner we need to include, we keep the result, otherwise we drop it.
I added a new test for the filter job.
I also migrated the filter job test to use autogold while at it.
I added a feature flag to gate out the work.

Test plan

I added a new test that mocks git server to return a specified code owners file. I've also tested it by running sg start.

α sourcegraph (ps/code-ownership-query-codeowners-file-for-ownership-info)✗ go test github.com/sourcegraph/sourcegraph/internal/search/codeownership
ok      github.com/sourcegraph/sourcegraph/internal/search/codeownership        1.448s

…information

sourcegraph-bot · 2022-07-21T17:11:29Z

Codenotify: Notifying subscribers in CODENOTIFY files for diff 4907aa0...52c1dc8.

Notify	File(s)
@beyang	internal/search/codeownership/job.go internal/search/codeownership/job_test.go internal/search/codeownership/rules_cache.go internal/search/codeownership/ruleset.go internal/search/job/jobutil/job.go internal/search/job/jobutil/job_test.go internal/search/types.go
@camdencheek	internal/search/codeownership/job.go internal/search/codeownership/job_test.go internal/search/codeownership/rules_cache.go internal/search/codeownership/ruleset.go internal/search/job/jobutil/job.go internal/search/job/jobutil/job_test.go internal/search/types.go
@keegancsmith	internal/search/codeownership/job.go internal/search/codeownership/job_test.go internal/search/codeownership/rules_cache.go internal/search/codeownership/ruleset.go internal/search/job/jobutil/job.go internal/search/job/jobutil/job_test.go internal/search/types.go

camdencheek · 2022-07-21T17:30:34Z

 	filteredStream := streaming.StreamFunc(func(event streaming.SearchEvent) {
-		event.Results, _ = applyCodeOwnershipFiltering(ctx, s.includeOwners, s.excludeOwners, event.Results)
+		var err error
+		event.Results, err = applyCodeOwnershipFiltering(ctx, clients.DB, &mu, &rules, s.includeOwners, s.excludeOwners, event.Results)


I expect this will be a performance bottleneck. We need to make a network request (sometimes multiple) for every repo we return results for. This is serialized by the mutex, but ideally we'd run those in parallel while still maintaining the order of results. This is likely a good job for lib/group's StreamGroup.

Feel free to skip the performance tuning for now though until we get a working version merged.

Okay this makes sense, so:

We have a read through RulesCache or whatever that holds a map of fetched the ruleset for a repo and a sync.RWMutex.

We start by getting a read lock and read from the map.

If the map contains the item we can keep reading the ruleset. We never mutate a ruleset so once it is read we can release the lock.

If the map has no element, we start by creating a new channel and then enqueue a fetch task into a StreamGroup managed by the RulesCache. We now wait for the channel to receive data before continuing:

At some point the task will run and resolve to a codeowners file. At some layer point the callback will be called

The callbacks are resolved in-order and from a single go routine so we can use this to write to the map after getting a write lock.

At the end of the callback we write to the channel to continue the in the GetFromCacheOrFetch function.

How do we avoid that we start fetching ownership information for the same repo multiple times though? We may need an additional map of inflight requests but the problem I see then is that we'd need to "upgrade" from a read lock to a write lock if we have a cache miss which doesn't seem to be a supported pattern. Is there some pattern for this that would simplify my solution?

How do we avoid that we start fetching ownership information for the same repo multiple times though?

That's a good question. The canonical answer is probably https://pkg.go.dev/golang.org/x/sync/singleflight, and I think this could be added to your read-through cache so requests fetches for the same repository are merged.

Realistically, we should probably not optimize this until needed because in most cases, a query will only return a result for a few repos. We can get fancy later 😄

…ry-codeowners-file-for-ownership-info

camdencheek

Looking good! Remaining comments are non-blocking

philipp-spiess · 2022-07-26T10:33:00Z

-          (REPOSCOMPUTEEXCLUDED
-            )
-          NoopJob)))))`),
-		}, {


I didn't find a working way to mock the feature flag value inside the tests. I can't simply create a new FlagSet since its values are not public and mocking the flag store also didn't really work so I decided to remove the test for now and will add it back once we open this up.

I've verified that this feature flag works locally by creating it via GraphQL and overwriting it for my test user though.

Seems reasonable. FWIW, I think you can mock this out with featureflag.WithContext(), but we should probably just put the resolved features on the SearchInputs struct rather than doing it further down the line. A task for another time 🙂

camdencheek · 2022-07-26T15:36:04Z

-          (REPOSCOMPUTEEXCLUDED
-            )
-          NoopJob)))))`),
-		}, {


Seems reasonable. FWIW, I think you can mock this out with featureflag.WithContext(), but we should probably just put the resolved features on the SearchInputs struct rather than doing it further down the line. A task for another time 🙂

philipp-spiess added 2 commits July 21, 2022 18:47

Code ownership: Parse GitHub style code owners file to get ownership …

abd2169

…information

Use a set to avoid linear search in owners list

30d1616

cla-bot Bot added the cla-signed label Jul 21, 2022

philipp-spiess requested review from a team, camdencheek and rvantonder July 21, 2022 17:10

philipp-spiess self-assigned this Jul 21, 2022

philipp-spiess changed the title ~~Code ownership: Parse GitHub style code owners file to get ownership information~~ Code Ownership: Parse GitHub style code owners file to get ownership information Jul 21, 2022

philipp-spiess changed the title ~~Code Ownership: Parse GitHub style code owners file to get ownership information~~ Code ownership: Parse GitHub style code owners file to get ownership information Jul 21, 2022

camdencheek reviewed Jul 21, 2022

View reviewed changes

philipp-spiess added 5 commits July 22, 2022 14:40

Merge remote-tracking branch 'origin/main' into ps/code-ownership-que…

52f8ef6

…ry-codeowners-file-for-ownership-info

Pass context and gitserver, avoid interim maps

3adfc8a

Use struct as map key

5a55ff9

Add codeowners as direct dependency

ab730eb

Implement a read-through cache for storing rulesets

3995850

camdencheek approved these changes Jul 22, 2022

View reviewed changes

Comment thread internal/search/codeownership/rules_cache.go

Comment thread internal/search/codeownership/ruleset.go Outdated

Comment thread internal/search/codeownership/ruleset.go Outdated

rvantonder approved these changes Jul 22, 2022

View reviewed changes

Comment thread internal/search/codeownership/job.go

Comment thread internal/search/codeownership/job.go Outdated

philipp-spiess added 6 commits July 25, 2022 13:48

Remove unecessary switch statement

b134e4f

Clean up NewRuleset

1880691

Save an empty ruleset in the cache to avoid re-fetching it

cb68cf3

Don't crash if we can't find the first CODEOWNERS location

433ce3d

Add code-ownership feature flag

2600405

Add note about why we don't return a custom owner representation yet

099e5e1

philipp-spiess commented Jul 26, 2022

View reviewed changes

Don't use fmt.Errorf

52c1dc8

camdencheek approved these changes Jul 26, 2022

View reviewed changes

philipp-spiess merged commit d9c71e7 into main Jul 27, 2022

philipp-spiess deleted the ps/code-ownership-query-codeowners-file-for-ownership-info branch July 27, 2022 09:09

Conversation

philipp-spiess commented Jul 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

sourcegraph-bot commented Jul 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

camdencheek Jul 21, 2022

Choose a reason for hiding this comment

Uh oh!

philipp-spiess Jul 22, 2022

Choose a reason for hiding this comment

Uh oh!

camdencheek Jul 22, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

camdencheek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

philipp-spiess Jul 26, 2022

Choose a reason for hiding this comment

Uh oh!

camdencheek Jul 26, 2022

Choose a reason for hiding this comment

Uh oh!

camdencheek Jul 26, 2022

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

philipp-spiess commented Jul 21, 2022 •

edited

Loading

sourcegraph-bot commented Jul 21, 2022 •

edited

Loading