(feat) add repo match search results aggregation by repo metadata by erzhtor · Pull Request #51248 · sourcegraph/sourcegraph-public-snapshot

erzhtor · 2023-04-28T09:46:47Z

Part of https://github.com/sourcegraph/pr-faqs/issues/96.

This PR introduces a new aggregation mode over repository metadata. It works with select:repo and shows the number of repositories for each key-value metadata pair.
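To make the counting concrete, here is a minimal sketch of grouping repositories by metadata label. It is not the code from this PR: `RepoMetadata`, `metadataLabel` and `CountReposByMetadata` are hypothetical names, and the "key" versus "key:value" labelling is an assumption based on the later commit "use key:value pair when value is not null or empty".

```go
package main

// RepoMetadata maps a metadata key to an optional value.
// A nil value means the key is set without a value (hypothetical type).
type RepoMetadata map[string]*string

// metadataLabel renders "key:value" when a non-empty value is present,
// and just "key" when the value is nil or empty.
func metadataLabel(key string, value *string) string {
	if value == nil || *value == "" {
		return key
	}
	return key + ":" + *value
}

// CountReposByMetadata returns, for each label, how many repositories carry it.
// Each element of repos is the metadata of one matched repository.
func CountReposByMetadata(repos []RepoMetadata) map[string]int32 {
	counts := make(map[string]int32)
	for _, md := range repos {
		for key, value := range md {
			counts[metadataLabel(key, value)]++
		}
	}
	return counts
}
```

With two repositories that both carry `team=sourcegraph`, the sketch yields a single group `team:sourcegraph` with a count of 2.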

Test plan

sg start
Set repository-metadata feature flag to true
Run a search with an additional select:repo filter and check that the aggregation panel has a new "Repo metadata" option

Screenshots

Screen.Recording.2023-05-12.at.14.41.27.mov

sourcegraph-buildkite · 2023-04-28T09:53:55Z

Bundle size report 📦

Initial size	Total size	Async size	Modules
0.00% (+0.06 kb)	0.00% (+0.56 kb)	0.00% (+0.49 kb)	0.00% (0)

Look at the Statoscope report for a full comparison between the commits 789d428 and 8af39fc or learn more.

Initial size is the size of the initial bundle (the one that is loaded when you open the page)
Total size is the size of the initial bundle + all the async loaded chunks
Async size is the size of all the async loaded chunks
Modules is the number of modules in the initial bundle

toolmantim · 2023-05-03T05:09:41Z

I think we might need to aggregate on value, and let them choose a key, to help them with the more common job to be done. e.g. What are the repo licenses across the company for this search? (e.g. choosing key license) Or, what is the department breakdown for these matching repositories (e.g. choosing key department)

I've designed up an idea for how that could work in https://github.com/sourcegraph/sourcegraph/pull/51392
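The suggestion above, aggregating on value for a user-chosen key, could look roughly like the following. This is only a sketch of the idea under discussion, not something this PR implements; `CountValuesForKey` and the plain `map[string]string` metadata shape are assumptions made for illustration.

```go
package main

// CountValuesForKey tallies the values of a single metadata key across
// repositories. Repositories that do not have the key are skipped.
func CountValuesForKey(repos []map[string]string, key string) map[string]int32 {
	counts := make(map[string]int32)
	for _, md := range repos {
		if value, ok := md[key]; ok {
			counts[value]++
		}
	}
	return counts
}
```

Choosing the key `license` would then give a license breakdown for the matching repositories, and choosing `department` a department breakdown.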

erzhtor · 2023-05-11T16:35:11Z

Thanks @toolmantim for the great feedback and for preparing the Storybook example. During the last sync call with Ryan, we agreed to finalize this PR and merge it without the repo metadata key-picking UI, as that would take more effort to implement right now. The current architecture of the aggregation panel doesn't support extra options/config, so adding them would require substantial code and architecture changes.

However, we put the repo metadata key picker on the list to address if we have time after we finish the rest of the issues/requirements.

sourcegraph-bot · 2023-05-12T08:49:02Z

Codenotify: Notifying subscribers in CODENOTIFY files for diff bad2254...789d428.

Notify	File(s)
@fkling	client/web/src/search/results/components/aggregation/components/aggregation-mode-controls/AggregationModeControls.tsx client/web/src/search/results/components/aggregation/hooks.ts
@limitedmage	client/web/src/search/results/components/aggregation/components/aggregation-mode-controls/AggregationModeControls.tsx client/web/src/search/results/components/aggregation/hooks.ts
@sourcegraph/code-insights-backend	enterprise/internal/insights/aggregation/BUILD.bazel enterprise/internal/insights/aggregation/aggregation.go enterprise/internal/insights/aggregation/aggregation_test.go enterprise/internal/insights/query/querybuilder/BUILD.bazel enterprise/internal/insights/query/querybuilder/builder.go enterprise/internal/insights/query/querybuilder/builder_test.go enterprise/internal/insights/types/types.go

sourcegraph-bot · 2023-05-12T08:55:31Z

📖 Storybook live preview

sashaostrikov

Nice! Left 1 minor comment about naming

toolmantim

Overall it's looking good.

I found one issue: I believe type:repo doesn't work, but it should be supported (same as we do for the other groups, i.e. type:file).

chwarwick

I understand why you did it, but I'd like to avoid leaking the aggregation mode into the result counting logic. It opens the possibility of doing that for many more reasons, and I'd like to see if we can compose solutions on top of the existing match types instead.

I'm also concerned about performance on instances with hundreds of thousands of repos and on dotcom. Is there a plan in place to ensure that performance is acceptable?

chwarwick · 2023-05-16T12:20:50Z

+		repoIDs.Add(r.RepoName().ID)
+	}
+
+	res, err := r.db.Repos().List(r.ctx, database.ReposListOptions{IDs: repoIDs.Values()})


I am concerned about the performance of doing this on dotcom. A batch of matches can contain thousands of results and I know querying the repo table is already an issue.

But these thousand repos are read at once for each batch. Or is your concern memory usage related?

I'm not so much worried about memory usage as about long-running queries on the repo table. This would issue a single query with potentially thousands of repo IDs. We've done similar things in Insights for filtering repos out of an insight and had performance issues that came from passing a long list of parameters. I don't know this repo query well enough to know how it performs in that situation, but you should make sure it handles it well.
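One common way to bound the length of such a parameter list is to split the IDs into fixed-size batches and issue one query per batch. The sketch below shows only the splitting step; `chunkIDs` is a hypothetical helper, it is not what this PR does, and whether batching actually helps depends on how the repo query behaves.

```go
package main

// chunkIDs splits ids into consecutive batches of at most size elements.
// The returned slices share memory with ids; a non-positive size yields nil.
func chunkIDs(ids []int32, size int) [][]int32 {
	if size <= 0 {
		return nil
	}
	var batches [][]int32
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}
```

Each batch could then be passed as the ID list of a separate lookup, keeping every individual query's parameter list below a chosen bound.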

Is the need for this DB call to load the metadata keys / values?

I thought yes, but I just looked and I see the metadata in the search result event stream, so it shouldn't be necessary.

I'm seeing matches come back as

[
  {
    "type": "repo",
    "repositoryID": 399,
    "repository": "github.com/sourcegraph/sourcegraph",
    "repoStars": 7857,
    "repoLastFetched": "2023-05-16T15:41:12.441465Z",
    "description": "Code Intelligence Platform",
    "metadata": {
      "code-intel": null,
      "code-search": null,
      "license": "multiple",
      "open-source": "",
      "team": "sourcegraph"
    }
  }
]

If that's the case, I don't fully understand why this lookup is happening. Could we just use the search results directly here @erzhtor?

The metadata is populated lazily right at the search API boundary, so it's not on the search results that are seen by the aggregation code.

@chwarwick, can you please point out where you checked the stream? I'm getting the following when checking here (where the aggregation is happening):

{
  "Results": [
    {
      "Name": "github.com/sourcegraph/sourcegraph",
      "ID": 7,
      "Rev": "",
      "DescriptionMatches": null,
      "RepoNameMatches": [{ "start": [0, 0, 0], "end": [34, 0, 34] }]
    }
  ],
  "Stats": {
    "IsLimitHit": false,
    "Repos": null,
    "Status": {},
    "BackendsMissing": 0,
    "ExcludedForks": 0,
    "ExcludedArchived": 0
  }
}

I see, you mentioned the search result stream. I think that part similarly runs an additional batch DB query for repo metadata and appends it to the eventMatch, https://sourcegraph.com/github.com/sourcegraph/sourcegraph@f3b65acc3e189c7b54586e7bd1d7eaa13dc77104/-/blob/cmd/frontend/internal/search/search.go?L697-713 (pointed out by @camdencheek). So I used the same approach here for the aggregation event stream.

chwarwick · 2023-05-16T14:38:33Z

 }

-func NewSearchResultsAggregatorWithContext(ctx context.Context, tabulator AggregationTabulator, countFunc AggregationCountFunc, db database.DB) SearchResultsAggregator {
+func NewSearchResultsAggregatorWithContext(ctx context.Context, tabulator AggregationTabulator, countFunc AggregationCountFunc, db database.DB, mode types.SearchAggregationMode) SearchResultsAggregator {


I would like to keep the AggregationMode from leaking into this aggregator. My initial goal here was that over time we would be able to reuse this for Code Insights, since it was previously a goal to be able to persist these aggregations to an Insights dashboard.

An alternative approach that I think is worth trying is to use the repoCount function and then swap out the Aggregator when you are in metadata mode. My understanding is that since it's select:repo, the result is going to be a long list of repo IDs, each with a count of 1.

I think you could compose your own aggregator on top of the existing limitedAggregator that builds up the repo slice and then converts it to metadata, maybe flushing the list every N Add calls or when the user requests it to be sorted.

type LimitedAggregator interface {
	Add(label string, count int32)
	SortAggregate() []*Aggregate
	OtherCounts() OtherCount
}
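A rough sketch of the composition described above: a wrapper buffers repo IDs, converts them to metadata labels every N additions, and forwards the labels to an inner `LimitedAggregator`. Everything except the interface itself is an assumption made for illustration: the `Aggregate` and `OtherCount` fields, `MetadataLookup`, `metadataAggregator`, and the in-memory `mapAggregator` stand-in are hypothetical, and this is not the approach that was eventually merged.

```go
package main

import "sort"

// Aggregate and OtherCount are simplified stand-ins for the real types.
type Aggregate struct {
	Label string
	Count int32
}

type OtherCount struct{ ResultCount, GroupCount int32 }

type LimitedAggregator interface {
	Add(label string, count int32)
	SortAggregate() []*Aggregate
	OtherCounts() OtherCount
}

// MetadataLookup resolves repo IDs to their metadata labels (hypothetical).
type MetadataLookup func(repoIDs []int32) map[int32][]string

// metadataAggregator buffers repo IDs and flushes them into inner as labels.
// It assumes each repo ID is added at most once, as with select:repo.
type metadataAggregator struct {
	inner      LimitedAggregator
	lookup     MetadataLookup
	pending    []int32
	flushEvery int
}

// AddRepo records one matched repository and flushes once the buffer is full.
func (m *metadataAggregator) AddRepo(id int32) {
	m.pending = append(m.pending, id)
	if len(m.pending) >= m.flushEvery {
		m.flush()
	}
}

// flush converts buffered repo IDs to labels and adds each label once per repo.
func (m *metadataAggregator) flush() {
	if len(m.pending) == 0 {
		return
	}
	for _, labels := range m.lookup(m.pending) {
		for _, label := range labels {
			m.inner.Add(label, 1)
		}
	}
	m.pending = m.pending[:0]
}

// SortAggregate flushes any remaining IDs before delegating to inner.
func (m *metadataAggregator) SortAggregate() []*Aggregate {
	m.flush()
	return m.inner.SortAggregate()
}

// mapAggregator is an unbounded in-memory LimitedAggregator used for the sketch.
type mapAggregator struct{ counts map[string]int32 }

func (a *mapAggregator) Add(label string, count int32) { a.counts[label] += count }

func (a *mapAggregator) OtherCounts() OtherCount { return OtherCount{} }

// SortAggregate orders by descending count, then by label for stability.
func (a *mapAggregator) SortAggregate() []*Aggregate {
	out := make([]*Aggregate, 0, len(a.counts))
	for label, count := range a.counts {
		out = append(out, &Aggregate{Label: label, Count: count})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Label < out[j].Label
	})
	return out
}
```

The design keeps the aggregation mode out of the inner aggregator: it only ever sees labels and counts, while the wrapper owns the repo-to-metadata conversion.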

Thank you for the suggestion, will take a look!

chwarwick

After talking through the other options, I'm good with these changes. If Insights revisits being able to convert an aggregation into a persisted Insight on a dashboard, this will require some additional work to support, but that isn't currently planned.

erzhtor requested a review from camdencheek April 28, 2023 09:46

erzhtor self-assigned this Apr 28, 2023

cla-bot Bot added the cla-signed label Apr 28, 2023

erzhtor force-pushed the erzhtor/add-results-aggregation-by-repo-metadata branch from 6f21764 to 5674f5d Compare May 2, 2023 12:26

toolmantim mentioned this pull request May 3, 2023

[Draft] Design for result grouping header w/ metadata #51392

Closed

erzhtor force-pushed the erzhtor/add-results-aggregation-by-repo-metadata branch from 5674f5d to 8c25b35 Compare May 11, 2023 09:16

erzhtor force-pushed the erzhtor/add-results-aggregation-by-repo-metadata branch from 8c25b35 to a5536d1 Compare May 12, 2023 08:24

erzhtor changed the title ~~(draft) (feat) add repo match search results aggregation by repo metadata~~ (feat) add repo match search results aggregation by repo metadata May 12, 2023

erzhtor requested review from a team May 12, 2023 08:46

erzhtor marked this pull request as ready for review May 12, 2023 08:47

sashaostrikov approved these changes May 12, 2023

Comment thread enterprise/internal/insights/aggregation/aggregation.go Outdated

erzhtor requested review from ryphil and toolmantim May 12, 2023 09:31

toolmantim reviewed May 16, 2023

Comment thread enterprise/cmd/frontend/internal/insights/resolvers/aggregates_resolvers.go Outdated

chwarwick suggested changes May 16, 2023

erzhtor requested a review from chwarwick May 22, 2023 14:19

chwarwick approved these changes May 23, 2023

erzhtor added 6 commits May 23, 2023 19:51

(feat) add repo match search results aggregation by repo metadata

816d177

refactor(aggregation.go): use collections.NewSet to simplify repoIDs …

c0c285c

…function

refactor aggregationFunc to accept repo as arg

4359a6a

add repo metadata default aggreation mode check

feat: use key:value pair when value is not null or empty

1119b69

feat: handle repometa filter url for cases when value is present

fixup! feat: use key:value pair when value is not null or empty

a580abf

fix: run bazel configure

9fe7b9d

erzhtor and others added 3 commits May 23, 2023 19:52

Update enterprise/internal/insights/aggregation/aggregation.go

137c278

Co-authored-by: Alex Ostrikov <alex.ostrikov@sourcegraph.com>

fix: hide new aggregation mode if feature flag is disabled

a208ea2

fix(aggregates_resolvers.go): update error message for grouping by re…

dce03b3

…po metadata to use "repository searches" instead of "repository match searches"

erzhtor force-pushed the erzhtor/add-results-aggregation-by-repo-metadata branch from 6023820 to 486c3d9 Compare May 23, 2023 18:04

update AddRepoMetadataFilter func

789d428

erzhtor force-pushed the erzhtor/add-results-aggregation-by-repo-metadata branch from 486c3d9 to 789d428 Compare May 24, 2023 06:11

erzhtor merged commit b76236c into main May 24, 2023

erzhtor deleted the erzhtor/add-results-aggregation-by-repo-metadata branch May 24, 2023 07:10

ErikaRS pushed a commit that referenced this pull request Jun 22, 2023

(feat) add repo match search results aggregation by repo metadata (#5…

b2140d7

…1248) Co-authored-by: Alex Ostrikov <alex.ostrikov@sourcegraph.com>

8 participants
