Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Search: do not interpret content: as regex if it is quoted#57679

Merged
camdencheek merged 2 commits into
mainfrom
cc/no-insights-mangling
Oct 17, 2023
Merged

Search: do not interpret content: as regex if it is quoted#57679
camdencheek merged 2 commits into
mainfrom
cc/no-insights-mangling

Conversation

@camdencheek

@camdencheek camdencheek commented Oct 17, 2023

Copy link
Copy Markdown
Member

We have some logic that annotates a content: node as regex if patterntype:regex is set. This allows us to do things like content:^a:b, since escaping colons used to be a significant part of how content: was used.

However, another significant way content: is used is to just search for string literals that would otherwise be interpreted as search syntax. Like content:"a and b". It's a pretty solid heuristic to not treat a quoted string as regex, especially since we now have /.../-delimited regex literals, which makes using content: for regex patterns largely unnecessary.

This was causing an issue where a parsing roundtrip for code insights would mangle a query like content:"TEST" patterntype:regex to content:/"TEST"/, which is obviously not ideal.

Fixes #57323

Test plan

Added a unit test.

@cla-bot cla-bot Bot added the cla-signed label Oct 17, 2023
@camdencheek camdencheek marked this pull request as ready for review October 17, 2023 18:17
@camdencheek camdencheek requested a review from a team October 17, 2023 18:18
@sourcegraph-bot

sourcegraph-bot commented Oct 17, 2023

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff 4a29682...4844000.

Notify File(s)
@jtibshirani internal/search/query/transformer.go
@keegancsmith internal/search/query/transformer.go
@sourcegraph/code-insights-backend internal/insights/query/querybuilder/builder_test.go

@camdencheek camdencheek requested a review from a team October 17, 2023 18:28

@fkling fkling left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably add a changelog entry for that.

@camdencheek camdencheek enabled auto-merge (squash) October 17, 2023 18:33
@camdencheek camdencheek merged commit b621b8b into main Oct 17, 2023
@camdencheek camdencheek deleted the cc/no-insights-mangling branch October 17, 2023 18:40
sourcegraph-release-bot pushed a commit that referenced this pull request Dec 12, 2023
We have some logic that annotates a `content:` node as regex if `patterntype:regex` is set. This allows us to do things like `content:^a:b`, since escaping colons used to be a significant part of how `content:` was used.

However, another significant way `content:` is used is to just search for string literals that would otherwise be interpreted as search syntax. Like `content:"a and b"`. It's a pretty solid heuristic to not treat a quoted string as regex, especially since we now have `/.../`-delimited regex literals, which makes using `content:` for regex patterns largely unnecessary.

This was causing an issue where a parsing roundtrip for code insights would mangle a query like `content:"TEST" patterntype:regex` to `content:/"TEST"/`, which is obviously not ideal.

(cherry picked from commit b621b8b)
camdencheek added a commit that referenced this pull request Dec 13, 2023
…quoted (#58904)

Search: do not interpret `content:` as regex if it is quoted (#57679)

We have some logic that annotates a `content:` node as regex if `patterntype:regex` is set. This allows us to do things like `content:^a:b`, since escaping colons used to be a significant part of how `content:` was used.

However, another significant way `content:` is used is to just search for string literals that would otherwise be interpreted as search syntax. Like `content:"a and b"`. It's a pretty solid heuristic to not treat a quoted string as regex, especially since we now have `/.../`-delimited regex literals, which makes using `content:` for regex patterns largely unnecessary.

This was causing an issue where a parsing roundtrip for code insights would mangle a query like `content:"TEST" patterntype:regex` to `content:/"TEST"/`, which is obviously not ideal.

(cherry picked from commit b621b8b)

Co-authored-by: Camden Cheek <camden@ccheek.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clicking an Insight data point mangles query content param

3 participants