Search backend: handle repo:contains predicates during repo resolution#38988
Conversation
fea0819 to
7ae7f84
Compare
842d7b0 to
904ec94
Compare
repo:contains.* during repo resolution
repo:contains.* during repo resolutionrepo:contains predicates during repo resolution
0e715fb to
80184fc
Compare
There was a problem hiding this comment.
Driving change: we move this information to the RepoOptions so this struct contains all the information we need to decide what repos apply to this basic query. RepoOptions is now the sole owner of "what repos should this search cover."
As such, I think an appropriate followup change would be to move isGlobalSearch to a method on RepoOptions. This will help to colocate this logic with the data it depends on, and it will help enforce that "repo stuff goes in RepoOptions".
There was a problem hiding this comment.
The resolver now needs the ability to talk to Zoekt in order to determine which repos are indexed and searcher in order to resolve repo:contains.file() for unindexed searches.
I'm not convinced this is the ideal situation, especially since zoekt can theoretically handle repo:contains() as part of the search query with type:repo. I think this is the best solution for now for two reasons:
- It allows us to do all repo resolution in one place. We'll eventually have to call
PartitionReposanyways, so doing it while we're resolving repos seems appropriate. - We don't have good machinery right now for pushing down optimizations in the job tree, especially ones as complex as "for any repos that are indexed, this backend can handle containment filters."
There was a problem hiding this comment.
This is the meat of the change. One method that filters a page of repos to only those that match the containment predicates.
There was a problem hiding this comment.
this is expected based on test. This test will go away very soon because we will no longer substitute.
There was a problem hiding this comment.
The state of RepoSearchJob is now only RepoOptions.
|
|
||
| foundRevs := Set[repoAndRev]{} | ||
| onMatch := func(res *zoekt.SearchResult) { | ||
| for _, file := range res.Files { |
There was a problem hiding this comment.
wow are you seeing this range keyword isn't syntax highlighted? Also the for and if keywords below? I think your PR broke GH
There was a problem hiding this comment.
Ha, I'll bet GH's syntax highlighting doesn't handle generics
| } | ||
| } | ||
|
|
||
| _, err := searcher.Search( |
There was a problem hiding this comment.
I'm sure there's a good reason, and couldn't easily tell why, but what's stopping us from (re)using backend jobs like TextSearchJob
There was a problem hiding this comment.
We technically could. However, we need to search each repo/rev individually to know whether they each return results, so all the overhead of the goroutines spun up by the job is unnecessary. Also, I just thought it was easier to read 🤷
There was a problem hiding this comment.
gotcha, makes sense! just thought all the unused values in this call looked odd but I see this call is actually used deeper in our stack (past searchFilesInRepos) makes sense to just go for the direct invocation
80184fc to
477ce5f
Compare
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff a754c1a...477ce5f.
|


This updates our repo resolution to handle
repo:contains.file(x),repo:contains.content(y),repo:contains(file:x content:y), andrepohasfile:x.Comments inline, but the summary is:
RepoOptionscontains everything we need to know about what repos to searchrepo_has_file.gohas been moved into the repo resolution stepRepoHasFileContentArgsstruct, which has optional fieldsAND/ORqueries withrepo:contains...Fixes https://github.com/sourcegraph/sourcegraph/issues/23566
Fixes https://github.com/sourcegraph/sourcegraph/issues/20337
Replaces https://github.com/sourcegraph/sourcegraph/pull/37637
Fixes https://github.com/sourcegraph/sourcegraph/issues/26340
Fixes https://github.com/sourcegraph/sourcegraph/issues/19688
I plan to wait to merge this until after the release cut. It's a somewhat large change, and I'd prefer to give it a release cycle in the wild.Release cut has happened. Will merge when approved.Don't believe the lines added. It's mostly a generated mock and associated tests. This actually removes a couple hundred lines of non-generated code.
Test plan
Solid backend integration tests, existing unit tests, and added some heavily-mocked tests that hit the partitioning logic a bit. I've done quite a bit of manual edge-case testing as well.