Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Search: add regex support to repo:has.meta()#63891

Merged
camdencheek merged 22 commits into
mainfrom
cc/metadata-regex
Jul 19, 2024
Merged

Search: add regex support to repo:has.meta()#63891
camdencheek merged 22 commits into
mainfrom
cc/metadata-regex

Conversation

@camdencheek

@camdencheek camdencheek commented Jul 17, 2024

Copy link
Copy Markdown
Member

This adds support to searching for repo metadata with a regex pattern.

Background: repo metadata is a useful feature for shoehorning business-specific information into the search query language. It allows tagging repos with arbitrary metadata (think ownership info, quality info, 3rd-party system IDs, etc.). This ends up being a useful escape hatch to shim in functionality that is not natively supported in Sourcegraph.

However it's currently limited to searching with an exact key/value pair. We've had a few requests to extend this to allow searching by pattern because it enables ingesting semi-structured data and making it searchable.

This adds the ability to use a /.../-delimited regex pattern to match against both keys and values. For example, repo:has.meta(team:/^my\/org/)

Fixes SRCH-731

Test plan

  • Added tests for parsing in the backend
  • Added tests for parsing and decoration in the frontend
  • Added tests for the database repo listing layer
  • Added a simple integration test
  • Manually tested that the new database index is usable, and provides good performance at dotcom scales
  • Visual testing that the syntax highlighting looks good and establishes confidence.
    screenshot-2024-07-18_15-24-13@2x

Changelog

  • repo:has.meta() predicate now supports regex patterns for keys and values

Comment on lines 13 to 14

@camdencheek camdencheek Jul 18, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a new GIN index on repo KVPs. This makes regex lookups an efficient operation. Tested against sourcegraph.com (3 million kvps for testing). Reduced a point query from 1.5 seconds (surprisingly fast full table scan) to 100ms (using the GIN index).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how long did adding the index take? Any concerns for this running in migrator for larger customers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't measure closely, but it was ~30s. That's for 3 million with one kvp each ("camdentest":<repo_name>), so long posting lists for the key and many posting lists for the value. I think that's fast enough that we don't need to be too worried

Comment thread internal/types/types.go Outdated
Comment on lines 2186 to 2191

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but I wanted some way to type a regex string. Too many times I've gotten confused about whether a pattern is a string literal or a regex string.

Comment on lines 10 to 30

@camdencheek camdencheek Jul 18, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated cleanup because every time I touch this code I get confused. This just flattens the access tree so a predicate is just a named method on a filter type.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary after flattening the access tree.

@camdencheek camdencheek changed the title WIP Search: add regex support to repo:has.meta() Search: add regex support to repo:has.meta() Jul 18, 2024
@camdencheek camdencheek marked this pull request as ready for review July 18, 2024 21:38
@camdencheek camdencheek requested review from a team July 18, 2024 21:38

@keegancsmith keegancsmith left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backend code LGTM. nice stuff.

Comment on lines 13 to 14

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how long did adding the index take? Any concerns for this running in migrator for larger customers?

@camdencheek camdencheek merged commit 033cb9d into main Jul 19, 2024
@camdencheek camdencheek deleted the cc/metadata-regex branch July 19, 2024 17:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants