[Backport 5.2] Search: do not interpret `content:` as regex if it is quoted #58904
We have some logic that annotates a `content:` node as regex if `patterntype:regex` is set. This allows us to do things like `content:^a:b`, since escaping colons used to be a significant part of how `content:` was used.

However, another significant way `content:` is used is to just search for string literals that would otherwise be interpreted as search syntax, for example `content:"a and b"`. It's a pretty solid heuristic to not treat a quoted string as regex, especially since we now have `/.../`-delimited regex literals, which make using `content:` for regex patterns largely unnecessary.

This was causing an issue where a parsing roundtrip for code insights would mangle a query like `content:"TEST" patterntype:regex` to `content:/"TEST"/`, which is obviously not ideal.

(cherry picked from commit b621b8b)
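To make the heuristic and the roundtrip bug concrete, here is a minimal Go sketch. It assumes a simplified, hypothetical `Pattern` type and hypothetical helpers (`annotateContentBefore`, `annotateContentAfter`, `stringify`); it is not the actual Sourcegraph query parser. It shows how ignoring quotes turns `content:"TEST" patterntype:regex` into `content:/"TEST"/` when the query is printed back out, and how checking for quotes first keeps the literal intact while unquoted values such as `^a:b` still become regex.

```go
package main

import (
	"fmt"
	"strconv"
)

// Pattern is a hypothetical, simplified stand-in for a parsed `content:` value.
// It is NOT the real query AST, just enough to illustrate the heuristic.
type Pattern struct {
	Value  string // pattern text, with surrounding quotes removed if it was quoted
	Regexp bool   // true if the value is interpreted as a regular expression
	Quoted bool   // true if the user wrote the value in double quotes
}

// annotateContentBefore sketches the old behavior: with patterntype:regex,
// every `content:` value is marked as regex, quotes included.
func annotateContentBefore(raw, patternType string) Pattern {
	return Pattern{Value: raw, Regexp: patternType == "regex"}
}

// annotateContentAfter sketches the fixed behavior: a double-quoted value is
// kept as a string literal; only unquoted values follow patterntype:regex.
func annotateContentAfter(raw, patternType string) Pattern {
	if len(raw) >= 2 && raw[0] == '"' && raw[len(raw)-1] == '"' {
		if unquoted, err := strconv.Unquote(raw); err == nil {
			return Pattern{Value: unquoted, Quoted: true}
		}
	}
	return Pattern{Value: raw, Regexp: patternType == "regex"}
}

// stringify sketches the printing half of a parse/print roundtrip:
// regex values are emitted as /.../ literals, quoted literals are re-quoted.
func stringify(p Pattern) string {
	switch {
	case p.Regexp:
		return "content:/" + p.Value + "/"
	case p.Quoted:
		return "content:" + strconv.Quote(p.Value)
	default:
		return "content:" + p.Value
	}
}

func main() {
	fmt.Println(stringify(annotateContentBefore(`"TEST"`, "regex"))) // content:/"TEST"/
	fmt.Println(stringify(annotateContentAfter(`"TEST"`, "regex")))  // content:"TEST"
	fmt.Println(stringify(annotateContentAfter(`^a:b`, "regex")))    // content:/^a:b/
}
```

The design choice mirrors the reasoning above: quoting is the user's explicit signal for "treat this literally", and regex intent can be expressed unambiguously with `/.../` literals, so the quote check runs before `patterntype:regex` is consulted.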
Codenotify: Notifying subscribers in CODENOTIFY files for diff 021f43a...d13ea21.
I realise this is a backport, but I'm unsure if this makes sense. By default we search both filenames and content, so `content:` is a way to limit the search to just content. Additionally, changing how the search language works to fix a bug in how code insights works gives me pause. FYI, we are currently rethinking some of this in search platform. In particular, one of our ideas is getting rid of `type:` and `patterntype:`. If you want to just search files, do `content:` to be more specific. If you want to do a regex, enclose your string with
@keegancsmith do you think this is intentional behavior? My understanding is that (unrelated, but I'd be like 300% in favor of getting rid of
ok you have convinced me that this is a correct change :)
Fixes #57323
Test plan
Added a unit test.
Backport b621b8b from #57679