LUCENE-9445 Add support for case insensitive regex searches in QueryParser #1708
markharwood wants to merge 2 commits into apache:master
jimczi left a comment:
+1 to add the support.
I wonder if the parsing should be stricter and only match if there is a separator, or the input ends, after the i? That would be "less" breaking, even though I think we should consider this change as breaking anyway. Maybe we can introduce it in 8.x under a flag? Is that what you had in mind?
Yep - end-of-string or space should be required after the end of the regex. Maybe brackets too for Boolean logic? I hadn't considered adding a flag for 8.x. If we do I'd prefer to see it support the new behaviour by default - the rationale being that it is better to give an escape hatch to the few admins concerned about BWC for an edge case than continue the legacy of silent failures for potentially many regex searchers assuming |
Progress update - I'm struggling a bit with how to make the parser stricter i.e. ensuring there's a space between |
@jimczi What is the behaviour for a non-match?
I'd prefer that we throw an error if any of the characters attached to a regexp is not recognized.
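The stricter behaviour discussed above (a separator or end of input must follow the modifiers, and any unrecognized attached character is an error) can be sketched in plain Java. This is only an illustration, not the code in this PR: the class `RegexModifiers`, the method `parseCaseInsensitive`, the choice of `IllegalArgumentException`, and treating `)` as a separator are all assumptions made for the example.

```java
/** Hypothetical helper, not part of Lucene: validates the text that follows a regex's closing '/'. */
final class RegexModifiers {
    private RegexModifiers() {}

    /**
     * Scans the characters immediately after the closing '/' of a regex term.
     * Returns true if the 'i' (case insensitive) modifier is present.
     * Throws if any other character is attached before a separator is reached.
     */
    static boolean parseCaseInsensitive(String tail) {
        boolean caseInsensitive = false;
        for (int pos = 0; pos < tail.length(); pos++) {
            char c = tail.charAt(pos);
            // Assumption: whitespace or a closing bracket ends the modifier run.
            if (Character.isWhitespace(c) || c == ')') {
                break;
            }
            if (c == 'i') {
                caseInsensitive = true;
            } else {
                // Strict mode: reject anything we do not recognize.
                throw new IllegalArgumentException(
                    "Unrecognized regex modifier '" + c + "' at offset " + pos);
            }
        }
        return caseInsensitive;
    }
}
```

With this sketch, `parseCaseInsensitive("i AND title:x")` returns true, `parseCaseInsensitive(" foo")` returns false, and `parseCaseInsensitive("ib")` throws, because `b` is attached to the regex but is not a known modifier.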
OK. @romseygeek suggested the BWC flag is called "allow_modifiers" and, if false, legacy behaviour is used, i.e. there would be no errors for characters after trailing |
@jimczi The TL;DR is that I think it's going to be too hard to implement the stricter parsing logic. I spoke with @romseygeek and we couldn't see a neat way that the string after the closing Eager option - pass everything immediately after closing
…arser using the standard /.../i regex syntax
This PR uses the standard /.../i regex syntax to denote case insensitive queries, exposing the underlying case insensitive regex support added in LUCENE-9386
This could be considered a breaking change if users had a regex immediately followed by the letter i, but I imagine a case insensitive search would have been the intention of the searcher all along.
Jira issue: https://issues.apache.org/jira/browse/LUCENE-9445