Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

[Backport 5.2] Search: do not interpret content: as regex if it is quoted#58904

Merged
camdencheek merged 1 commit into
5.2from
backport-57679-to-5.2
Dec 13, 2023
Merged

[Backport 5.2] Search: do not interpret content: as regex if it is quoted#58904
camdencheek merged 1 commit into
5.2from
backport-57679-to-5.2

Conversation

@sourcegraph-release-bot

Copy link
Copy Markdown
Collaborator

We have some logic that annotates a content: node as regex if patterntype:regex is set. This allows us to do things like content:^a:b, since escaping colons used to be a significant part of how content: was used.

However, another significant way content: is used is to just search for string literals that would otherwise be interpreted as search syntax. Like content:"a and b". It's a pretty solid heuristic to not treat a quoted string as regex, especially since we now have /.../-delimited regex literals, which makes using content: for regex patterns largely unnecessary.

This was causing an issue where a parsing roundtrip for code insights would mangle a query like content:"TEST" patterntype:regex to content:/"TEST"/, which is obviously not ideal.

Fixes #57323

Test plan

Added a unit test.


Backport b621b8b from #57679

We have some logic that annotates a `content:` node as regex if `patterntype:regex` is set. This allows us to do things like `content:^a:b`, since escaping colons used to be a significant part of how `content:` was used.

However, another significant way `content:` is used is to just search for string literals that would otherwise be interpreted as search syntax. Like `content:"a and b"`. It's a pretty solid heuristic to not treat a quoted string as regex, especially since we now have `/.../`-delimited regex literals, which makes using `content:` for regex patterns largely unnecessary.

This was causing an issue where a parsing roundtrip for code insights would mangle a query like `content:"TEST" patterntype:regex` to `content:/"TEST"/`, which is obviously not ideal.

(cherry picked from commit b621b8b)
@sourcegraph-bot

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff 021f43a...d13ea21.

Notify File(s)
@jtibshirani internal/search/query/transformer.go
@keegancsmith internal/search/query/transformer.go
@sourcegraph/code-insights-backend internal/insights/query/querybuilder/builder_test.go

@sourcegraph-bot

Copy link
Copy Markdown
Contributor

📖 Storybook live preview

@keegancsmith

Copy link
Copy Markdown
Member

I realise this is a backport, but I'm unsure if this makes sense. By default we search both filenames and content, so content is a way to limit it to just content. Additionally changing how the search language works to fix a bug in how code insights works gives me pause.

FYI We are currently rethinking some of this in search platform. In particular one of our ideas is getting rid of type: and patterntype:. If you want to just search files, do content: to be more specific. If you want to do a regex, enclose your string with /.../, etc. This isn't fully fleshed out (I am currently prototyping it to play around with the ideas).

@camdencheek

Copy link
Copy Markdown
Member

@keegancsmith do you think this is intentional behavior? My understanding is that patterntype: was only intended to apply to the "pattern" part of the query, which all the literals that are not filters. I would not expect content: to be subject to the patterntype rules, so I was under the impression that this was a bug (thus the fix) rather than an intentional search behavior. Note that repo: does not respect patterntype either. Maybe the better solution is just to not the /.../ delimiters for content:?

(unrelated, but I'd be like 300% in favor of getting rid of type: and patterntype: entirely, so I don't want to do anything to get in the way of that 😄)

@keegancsmith

Copy link
Copy Markdown
Member

ok you have convinced me that this is a correct change :)

@camdencheek camdencheek merged commit e6cb79c into 5.2 Dec 13, 2023
@camdencheek camdencheek deleted the backport-57679-to-5.2 branch December 13, 2023 15:17
@varungandhi-src varungandhi-src mentioned this pull request Jan 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants