Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Code ownership: Parse GitHub style code owners file to get ownership information#39250

Merged
philipp-spiess merged 14 commits into
mainfrom
ps/code-ownership-query-codeowners-file-for-ownership-info
Jul 27, 2022
Merged

Code ownership: Parse GitHub style code owners file to get ownership information#39250
philipp-spiess merged 14 commits into
mainfrom
ps/code-ownership-query-codeowners-file-for-ownership-info

Conversation

@philipp-spiess

@philipp-spiess philipp-spiess commented Jul 21, 2022

Copy link
Copy Markdown
Contributor

Closes #39158
Closes #39159

This PR adds support for reading the GitHub style code owner specs from repositories to infer code ownership information automatically. To do that, we add a new dependency: https://github.com/hmarr/codeowners (MIT license so I don't see any legal issues).

Here's a bullet point recap of what changes we can find here:

  • I've added a new codeownership.Ruleset type that currently is a proxy to the codeowners.Ruleset (from the new dependency). This allows us to expose a nice API but also will make it easier to later add other ways to get the code ownership mapping: E.g. we may want to fetch data rules from our postgres database instead.
  • We have a new method to create a ruleset based on querying the gitserver for a specific repo at a specific revision. This will download a static number of blobs that are allowed code owner locations (The plugin also supports GitLab style code owners so I've added the .gitlab path)
  • In the filter job, we create a new map so that we only have to create the codeowners.Ruleset for a specific repo once per search. So if one repo returns 100 results, we only have one ruleset.
  • When a code ownership mapping was found, we test the result paths against the mapping. If the owners contain the owner we need to include, we keep the result, otherwise we drop it.
  • I added a new test for the filter job.
  • I also migrated the filter job test to use autogold while at it.
  • I added a feature flag to gate out the work.

Test plan

I added a new test that mocks git server to return a specified code owners file. I've also tested it by running sg start.

α sourcegraph (ps/code-ownership-query-codeowners-file-for-ownership-info)✗ go test github.com/sourcegraph/sourcegraph/internal/search/codeownership
ok      github.com/sourcegraph/sourcegraph/internal/search/codeownership        1.448s

Screenshot 2022-07-21 at 19 01 17
Screenshot 2022-07-21 at 19 01 09

@cla-bot cla-bot Bot added the cla-signed label Jul 21, 2022
@philipp-spiess philipp-spiess requested review from a team, camdencheek and rvantonder July 21, 2022 17:10
@philipp-spiess philipp-spiess self-assigned this Jul 21, 2022
@philipp-spiess philipp-spiess changed the title Code ownership: Parse GitHub style code owners file to get ownership information Code Ownership: Parse GitHub style code owners file to get ownership information Jul 21, 2022
@philipp-spiess philipp-spiess changed the title Code Ownership: Parse GitHub style code owners file to get ownership information Code ownership: Parse GitHub style code owners file to get ownership information Jul 21, 2022
@sourcegraph-bot

sourcegraph-bot commented Jul 21, 2022

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff 4907aa0...52c1dc8.

Notify File(s)
@beyang internal/search/codeownership/job.go
internal/search/codeownership/job_test.go
internal/search/codeownership/rules_cache.go
internal/search/codeownership/ruleset.go
internal/search/job/jobutil/job.go
internal/search/job/jobutil/job_test.go
internal/search/types.go
@camdencheek internal/search/codeownership/job.go
internal/search/codeownership/job_test.go
internal/search/codeownership/rules_cache.go
internal/search/codeownership/ruleset.go
internal/search/job/jobutil/job.go
internal/search/job/jobutil/job_test.go
internal/search/types.go
@keegancsmith internal/search/codeownership/job.go
internal/search/codeownership/job_test.go
internal/search/codeownership/rules_cache.go
internal/search/codeownership/ruleset.go
internal/search/job/jobutil/job.go
internal/search/job/jobutil/job_test.go
internal/search/types.go

Comment thread internal/search/codeownership/ruleset.go Outdated
Comment thread internal/search/codeownership/ruleset.go Outdated
Comment thread internal/search/codeownership/job.go Outdated
Comment thread internal/search/codeownership/job.go Outdated
Comment thread internal/search/codeownership/job.go Outdated
filteredStream := streaming.StreamFunc(func(event streaming.SearchEvent) {
event.Results, _ = applyCodeOwnershipFiltering(ctx, s.includeOwners, s.excludeOwners, event.Results)
var err error
event.Results, err = applyCodeOwnershipFiltering(ctx, clients.DB, &mu, &rules, s.includeOwners, s.excludeOwners, event.Results)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this will be a performance bottleneck. We need to make a network request (sometimes multiple) for every repo we return results for. This is serialized by the mutex, but ideally we'd run those in parallel while still maintaining the order of results. This is likely a good job for lib/group's StreamGroup.

Feel free to skip the performance tuning for now though until we get a working version merged.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this makes sense, so:

  • We have a read through RulesCache or whatever that holds a map of fetched the ruleset for a repo and a sync.RWMutex.
  • We start by getting a read lock and read from the map.
  • If the map contains the item we can keep reading the ruleset. We never mutate a ruleset so once it is read we can release the lock.
  • If the map has no element, we start by creating a new channel and then enqueue a fetch task into a StreamGroup managed by the RulesCache. We now wait for the channel to receive data before continuing:
    • At some point the task will run and resolve to a codeowners file. At some layer point the callback will be called
    • The callbacks are resolved in-order and from a single go routine so we can use this to write to the map after getting a write lock.
    • At the end of the callback we write to the channel to continue the in the GetFromCacheOrFetch function.

How do we avoid that we start fetching ownership information for the same repo multiple times though? We may need an additional map of inflight requests but the problem I see then is that we'd need to "upgrade" from a read lock to a write lock if we have a cache miss which doesn't seem to be a supported pattern. Is there some pattern for this that would simplify my solution?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we avoid that we start fetching ownership information for the same repo multiple times though?

That's a good question. The canonical answer is probably https://pkg.go.dev/golang.org/x/sync/singleflight, and I think this could be added to your read-through cache so requests fetches for the same repository are merged.

Realistically, we should probably not optimize this until needed because in most cases, a query will only return a result for a few repos. We can get fancy later 😄

Comment thread internal/search/codeownership/job.go Outdated
Comment thread internal/search/codeownership/ruleset.go Outdated

@camdencheek camdencheek left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Remaining comments are non-blocking

Comment thread internal/search/codeownership/rules_cache.go
Comment thread internal/search/codeownership/ruleset.go Outdated
Comment thread internal/search/codeownership/ruleset.go Outdated
Comment thread internal/search/codeownership/job.go
Comment thread internal/search/codeownership/job.go Outdated
(REPOSCOMPUTEEXCLUDED
)
NoopJob)))))`),
}, {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a working way to mock the feature flag value inside the tests. I can't simply create a new FlagSet since its values are not public and mocking the flag store also didn't really work so I decided to remove the test for now and will add it back once we open this up.

I've verified that this feature flag works locally by creating it via GraphQL and overwriting it for my test user though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. FWIW, I think you can mock this out with featureflag.WithContext(), but we should probably just put the resolved features on the SearchInputs struct rather than doing it further down the line. A task for another time 🙂

(REPOSCOMPUTEEXCLUDED
)
NoopJob)))))`),
}, {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. FWIW, I think you can mock this out with featureflag.WithContext(), but we should probably just put the resolved features on the SearchInputs struct rather than doing it further down the line. A task for another time 🙂

@philipp-spiess philipp-spiess merged commit d9c71e7 into main Jul 27, 2022
@philipp-spiess philipp-spiess deleted the ps/code-ownership-query-codeowners-file-for-ownership-info branch July 27, 2022 09:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Code Ownership: Show code ownership information in the blob view Code Ownership: Filter code ownership based on CODEOWNERS in git

5 participants