Search - add case insensitive support for regex queries.#59441
Search - add case insensitive support for regex queries.#59441markharwood merged 9 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-search (:Search/Search) |
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Outdated
Show resolved
Hide resolved
We do state in the docs that case sensitivity (or insensitivity) is in relation to the indexed content so we have a justification for the logic. I don't have a strong opinion either way on whether to error or not. The bit where I come unstuck is thinking about how to describe the default value (something we normally document). |
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Outdated
Show resolved
Hide resolved
e0f9244 to
98b0e04
Compare
|
Following discussion with @jimczi we may want to reconsider the implementation here. Rather than an explicit flag in the JSON we could consider adding passing the case insensitivity option as part of the regular expression string. There are a number of ways regex engines do this - either globally using The advantage of expressing the option as part of the regex string is that query-string and KQL based inputs could be extended to make these choices (if the users know to use this syntax). |
I made this comment for the |
Ah - I thought we'd agreed we would want KQL, query_string and RegExpQuery to all support the same string syntax if we were to expand it. If there's two ways to express the option (in-string and in JSON boolean) there's the potential for conflicting arguments. |
870371b to
6111094
Compare
|
run elasticsearch-ci/docs |
jimczi
left a comment
There was a problem hiding this comment.
The change looks good to me. I left some comments regarding the testing of the new options.
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTests.java
Outdated
Show resolved
Hide resolved
|
Thanks for the review, @jimczi Otherwise is this good to go? |
Can we be more lenient ? It's a breaking change imo so we could apply 0xff as a mask to clear out the extra bits ? |
|
Good to go @jimczi ? |
jimczi
left a comment
There was a problem hiding this comment.
I left one question for cleanup, LGTM otherwise
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Outdated
Show resolved
Hide resolved
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released.
(Backport) Added case insensitive support for regex queries. Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released. Closes elastic#59235
|
Hi, not sure if caused by this PR. We use regex in query_string and want the search to be case insensitive. It has been working until recently we upgraded our AWS ES cluster from 7.7 to 7.10. The same regex query became case sensitive now (it only works with lower case as in the standard analyzer). I'm unable to set Changing regex to wildcard makes it case insensitive again. |
While Lucene has more complex bitmask flags for regex match settings like case insensitivity the JSON representation we add here is a simple
case_sensitiveboolean parameter which defaults to true. Currently this case logic only works with ASCII characters.To implement this feature I have forked a copy of Lucene’s RegexpQuery and RegExp classes from Lucene master.
These classes can be removed when 8.7 Lucene is released which is the first release with the case insensitivity option.