Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Search backend: handle repo:contains predicates during repo resolution#38988

Merged
camdencheek merged 3 commits into
mainfrom
backend-integration/cc/resolve-has-file-content
Jul 21, 2022
Merged

Search backend: handle repo:contains predicates during repo resolution#38988
camdencheek merged 3 commits into
mainfrom
backend-integration/cc/resolve-has-file-content

Conversation

@camdencheek

@camdencheek camdencheek commented Jul 18, 2022

Copy link
Copy Markdown
Member

This updates our repo resolution to handle repo:contains.file(x), repo:contains.content(y), repo:contains(file:x content:y), and repohasfile:x.

Comments inline, but the summary is:

  • RepoOptions contains everything we need to know about what repos to search
  • The logic in repo_has_file.go has been moved into the repo resolution step
  • Every combination of file and content filters is represented by the RepoHasFileContentArgs struct, which has optional fields
  • This works efficiently, and we can now reasonably do AND/OR queries with repo:contains...

Fixes https://github.com/sourcegraph/sourcegraph/issues/23566
Fixes https://github.com/sourcegraph/sourcegraph/issues/20337
Replaces https://github.com/sourcegraph/sourcegraph/pull/37637
Fixes https://github.com/sourcegraph/sourcegraph/issues/26340
Fixes https://github.com/sourcegraph/sourcegraph/issues/19688

I plan to wait to merge this until after the release cut. It's a somewhat large change, and I'd prefer to give it a release cycle in the wild. Release cut has happened. Will merge when approved.

Don't believe the lines added. It's mostly a generated mock and associated tests. This actually removes a couple hundred lines of non-generated code.

Test plan

Solid backend integration tests, existing unit tests, and added some heavily-mocked tests that hit the partitioning logic a bit. I've done quite a bit of manual edge-case testing as well.

@cla-bot cla-bot Bot added the cla-signed label Jul 18, 2022
@camdencheek camdencheek force-pushed the backend-integration/cc/resolve-has-file-content branch 2 times, most recently from fea0819 to 7ae7f84 Compare July 18, 2022 22:49
@camdencheek camdencheek force-pushed the backend-integration/cc/resolve-has-file-content branch 2 times, most recently from 842d7b0 to 904ec94 Compare July 18, 2022 23:15
@camdencheek camdencheek changed the base branch from main to cc/move-private-repos July 18, 2022 23:15
@camdencheek camdencheek changed the title Backend integration/cc/resolve has file content Search backend: handle repo:contains.* during repo resolution Jul 18, 2022
@camdencheek camdencheek changed the title Search backend: handle repo:contains.* during repo resolution Search backend: handle repo:contains predicates during repo resolution Jul 18, 2022
Base automatically changed from cc/move-private-repos to main July 19, 2022 13:21
@camdencheek camdencheek force-pushed the backend-integration/cc/resolve-has-file-content branch 3 times, most recently from 0e715fb to 80184fc Compare July 19, 2022 21:25
@camdencheek camdencheek marked this pull request as ready for review July 20, 2022 13:59
Comment thread internal/search/types.go Outdated
Comment on lines 313 to 315

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driving change: we move this information to the RepoOptions so this struct contains all the information we need to decide what repos apply to this basic query. RepoOptions is now the sole owner of "what repos should this search cover."

As such, I think an appropriate followup change would be to move isGlobalSearch to a method on RepoOptions. This will help to colocate this logic with the data it depends on, and it will help enforce that "repo stuff goes in RepoOptions".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! great.

Comment thread internal/search/repos/repos.go Outdated
Comment on lines 59 to 67

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolver now needs the ability to talk to Zoekt in order to determine which repos are indexed and searcher in order to resolve repo:contains.file() for unindexed searches.

I'm not convinced this is the ideal situation, especially since zoekt can theoretically handle repo:contains() as part of the search query with type:repo. I think this is the best solution for now for two reasons:

  1. It allows us to do all repo resolution in one place. We'll eventually have to call PartitionRepos anyways, so doing it while we're resolving repos seems appropriate.
  2. We don't have good machinery right now for pushing down optimizations in the job tree, especially ones as complex as "for any repos that are indexed, this backend can handle containment filters."

Comment thread internal/search/repos/repos.go Outdated

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the meat of the change. One method that filters a page of repos to only those that match the containment predicates.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 112 to 116

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is expected based on test. This test will go away very soon because we will no longer substitute.

Comment thread internal/search/run/repository.go Outdated
Comment on lines 17 to 19

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state of RepoSearchJob is now only RepoOptions.

@rvantonder rvantonder left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to nominate a new PR of the week. Or just give pre-assign it this to next week :P

Comment thread internal/search/repos/repos.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


foundRevs := Set[repoAndRev]{}
onMatch := func(res *zoekt.SearchResult) {
for _, file := range res.Files {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow are you seeing this range keyword isn't syntax highlighted? Also the for and if keywords below? I think your PR broke GH

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, I'll bet GH's syntax highlighting doesn't handle generics

Comment thread internal/search/types.go Outdated
Comment on lines 313 to 315

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! great.

}
}

_, err := searcher.Search(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there's a good reason, and couldn't easily tell why, but what's stopping us from (re)using backend jobs like TextSearchJob

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We technically could. However, we need to search each repo/rev individually to know whether they each return results, so all the overhead of the goroutines spun up by the job is unnecessary. Also, I just thought it was easier to read 🤷

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, makes sense! just thought all the unused values in this call looked odd but I see this call is actually used deeper in our stack (past searchFilesInRepos) makes sense to just go for the direct invocation

@camdencheek camdencheek force-pushed the backend-integration/cc/resolve-has-file-content branch from 80184fc to 477ce5f Compare July 21, 2022 16:42
@camdencheek camdencheek enabled auto-merge (squash) July 21, 2022 16:43
@sourcegraph-bot

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff a754c1a...477ce5f.

Notify File(s)
@beyang cmd/searcher/internal/search/zoekt_search.go
internal/search/alert/observer.go
internal/search/commit/commit.go
internal/search/job/jobutil/alert.go
internal/search/job/jobutil/job.go
internal/search/job/jobutil/job_test.go
internal/search/job/jobutil/repo_pager_job.go
internal/search/predicate/substitute_test.go
internal/search/query/predicate.go
internal/search/query/types.go
internal/search/repos/mocks_test.go
internal/search/repos/repos.go
internal/search/repos/repos_test.go
internal/search/run/repo_has_file.go
internal/search/run/repository.go
internal/search/run/repository_test.go
internal/search/searcher/search.go
internal/search/searcher/search_test.go
internal/search/structural/structural.go
internal/search/types.go
internal/search/zoekt/indexed_search.go
internal/search/zoekt/query.go
internal/search/zoekt/query_test.go
@keegancsmith cmd/searcher/internal/search/zoekt_search.go
internal/search/alert/observer.go
internal/search/commit/commit.go
internal/search/job/jobutil/alert.go
internal/search/job/jobutil/job.go
internal/search/job/jobutil/job_test.go
internal/search/job/jobutil/repo_pager_job.go
internal/search/predicate/substitute_test.go
internal/search/query/predicate.go
internal/search/query/types.go
internal/search/repos/mocks_test.go
internal/search/repos/repos.go
internal/search/repos/repos_test.go
internal/search/run/repo_has_file.go
internal/search/run/repository.go
internal/search/run/repository_test.go
internal/search/searcher/search.go
internal/search/searcher/search_test.go
internal/search/structural/structural.go
internal/search/types.go
internal/search/zoekt/indexed_search.go
internal/search/zoekt/query.go
internal/search/zoekt/query_test.go
@rvantonder internal/search/query/predicate.go
internal/search/query/types.go

@camdencheek camdencheek merged commit 9a0f825 into main Jul 21, 2022
@camdencheek camdencheek deleted the backend-integration/cc/resolve-has-file-content branch July 21, 2022 16:56
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

3 participants