Add preserve_original setting in edge ngram token filter#55766
Conversation
…c and new test class
1c4cf4e to
ba598da
Compare
|
Pinging @elastic/es-search (:Search/Analysis) |
cbuescher
left a comment
There was a problem hiding this comment.
HI @amitmbm, thanks for opening this PR, looks great. I only left a few very minor remarks around formatting etc., the rest is okay. I will enabling running the tests so everything should be run past CI once you push another commit.
| package org.elasticsearch.analysis.common; | ||
|
|
||
| import org.apache.lucene.analysis.Tokenizer; | ||
| import org.apache.lucene.analysis.core.WhitespaceTokenizer; |
There was a problem hiding this comment.
nit: this seems unused, our checkstyle rules will complain about unused imports, so better to remove it now before running the tests.
There was a problem hiding this comment.
My intelliJ removed unused import wasn't configured for elasticsearch project, enabled it now :)
|
|
||
| `preserve_original`:: | ||
| (Optional, boolean) | ||
| If set to true then it would also emit the original token. Defaults to `false`. |
There was a problem hiding this comment.
nit: wording might be better sth like "Emits original token then set to true. Defaults to false.
There was a problem hiding this comment.
changed to Emits original token when set to true. Defaults to false. --> notice changed to when from then in the suggested edit.
| /** | ||
| * Test class for edge_ngram token filter. | ||
| * | ||
| * @author Amit Khandelwal |
There was a problem hiding this comment.
nit: we usually don't add @author tags to classes or test classes but rely on the commit history rather than code comments to track authors
| * @author Amit Khandelwal | ||
| */ | ||
| public class EdgeNGramTokenFilterFactoryTests extends ESTokenStreamTestCase { | ||
| public void testDefault() throws IOException { |
There was a problem hiding this comment.
nit: maybe add newline befor first test method.
There was a problem hiding this comment.
Just observed this in so many other test classes and copy-pasted the initial test setup :). nvm removed this.
|
@elasticmahine ok to test |
|
@elasticmachine run elasticsearch-ci/bwc |
|
@cbuescher thanks for kicking another test try for |
I agree, we'd still like to get a clean test run. Lets try this again. |
|
@elasticmachine update branch |
…gram-token-filter
|
@cbuescher looks like merging master into my feature branch fixed the test failures. Let me know if you can merge it if all looks OK. |
|
Hi @amitmbm, I merged your change to master and will also port it to the latest 7.x branch. Thanks for picking this up. |
The Lucene `preserve_original` setting is currently not supported in the `edge_ngram` token filter. This change adds it with a default value of `false`. Closes #55767
|
@cbuescher I'm really glad as it's my first commit merged to Elastic code base, I had raised another similar PR #55432 which is almost reviewed by your colleague Mark Harwood, but then there is no update on this PR from last 4 days. Hope he is safe and if you get time please look into this. After this, I want to pick some more changes and one of them is deprecating |
|
Thanks, great to hear you enjoyed working on the PR. We try to review user PRs in a timely manner but please don't expect anyone to respond to new commits etc... immediately because we all handle this differently and asynchronously. |
|
@cbuescher I understand that Elastic as a whole company work in async mode and my intent is not to push my PRs for review, it was stuck so I thought to bring this to you notice. Anyway thanks a lot for explaining this and I would keep this in mind. Also, reg. the deprecation changes, As you pointed out it requires more discussion, I would open a new issue and will discuss it there. Have a great day ahead 😄 |
|
Do we have any update on this PR? Did you translate the discussion? |
|
@mmoreram I raised PR but as @cbuescher mention in this comment , AS it's related to deprecation, hence he wanted to discuss this more with his team, I am also waiting for his update. @cbuescher let me know if you have any update. |
Raised this PR to address the bug #55767