[Analysis] Support normalizer in request param #24767
Conversation
cbuescher
left a comment
@johtani I like this PR and had fun reviewing it and learning more about this analysis feature. I left some comments, but I have to apologize in advance that I'm not an expert in this area yet; I hope the comments are useful nonetheless.
Maybe we can start having a unit test for the AnalyzeRequest in which e.g. the validate method and the serialization can be checked.
Can you add a test for this parsing part to RestAnalyzeActionTests?
This can maybe go inside the following else branch.
A question out of curiosity: the analyzer we get here doesn't have to be closed (via closeAnalyzer) because its not a new instance? I don't know enough about the lifecycle of these objects yet I'm afraid.
Yes, it is an existing instance created by the IndexService (or similar). We only close the analyzer if TransportAnalyzeAction creates a CustomAnalyzer itself.
Wouldn't it be better to throw an error here? As far as I can see, specifying a normalizer together with an analyzer or tokenizer doesn't make sense. This combination could already be detected earlier on the request, I think (is validate() always called?).
I will add the check logic to the request.validate() method.
Unfortunately, it is not always called: if you invoke shardOperation directly, validate() is skipped.
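The mutual-exclusion rule being discussed could look roughly like this as a standalone sketch. This is an illustrative mock with hypothetical class and method names, not the actual AnalyzeRequest.validate() code:

```java
// Illustrative sketch of the check discussed above: a normalizer must not
// be combined with an analyzer or a tokenizer. Names are hypothetical.
public class AnalyzeValidationSketch {

    /** Returns an error message, or null if the combination is valid. */
    public static String validate(String normalizer, String analyzer, String tokenizer) {
        if (normalizer != null && (analyzer != null || tokenizer != null)) {
            return "normalizer is not allowed together with analyzer or tokenizer";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate("my_normalizer", null, null));       // valid
        System.out.println(validate("my_normalizer", "standard", null)); // invalid
    }
}
```

As noted above, a check like this only helps on code paths where validate() actually runs, so the transport action still needs its own guard.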
Thanks, I think it's better than nothing.
Could this be split into two separate else if blocks, one for request.normalizer() != null and one for (request.tokenFilters() != null && request.tokenFilters().size() > 0) || (request.charFilters() != null && request.charFilters().size() > 0), instead of separating these cases later? I'm not entirely sure this works, but I think it would make this part easier to read.
Would it be possible to add a test for the second code path added in this PR (the case where normalizer == null but filter or char_filter is not null and tokenizer/analyzer is null)? I don't know whether it is possible with this test setup, but it might be useful.
Ah, I added that test case to the REST API tests. Since we are moving filter/char_filter to the analysis-common module, I think that is a better place than this test class.
Replace twitter with the new index name.
nit: "can analyze",
nit: "... or if char_filter/filter is set and tokenizer/analyzer is not set"
@elasticmachine test this please
Force-pushed a2dbf1d to 39c3eec
@cbuescher CI passed, please review again after the conference :)
++ thanks for adding these checks
More of a question: I see we use this in other bwc tests as well; I guess it represents the serialized request. How did you get that String? Do we have tools for that?
I'm not sure... I generated the string using Base64.getEncoder() and printed it with System.out.
can you add a comment saying what request it represents and which version it has been generated with?
nit: maybe use VersionUtils#randomVersionBetween()
Oh, good to know. I didn't know about it :)
jpountz
left a comment
Please call getMultiTermComponent on factories, but otherwise it looks good to me!
looks like you are missing the call to MultiTermAwareComponent.getMultiTermComponent?
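For context, the wrapping pattern being requested looks roughly like the following self-contained sketch. The interfaces here are local stand-ins for Elasticsearch's TokenFilterFactory and MultiTermAwareComponent, and the mock factory is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in interfaces mimicking the Elasticsearch analysis SPI (hypothetical).
interface TokenFilterFactory { String name(); }
interface MultiTermAwareComponent { Object getMultiTermComponent(); }

public class MultiTermWrapSketch {

    /**
     * When building a normalizer chain, each factory that is multi-term aware
     * is replaced by its multi-term variant; other factories are kept as-is.
     */
    static List<Object> resolveForNormalizer(List<TokenFilterFactory> factories) {
        List<Object> resolved = new ArrayList<>();
        for (TokenFilterFactory factory : factories) {
            if (factory instanceof MultiTermAwareComponent) {
                resolved.add(((MultiTermAwareComponent) factory).getMultiTermComponent());
            } else {
                resolved.add(factory);
            }
        }
        return resolved;
    }

    /** Hypothetical multi-term-aware factory, for demonstration only. */
    static class MockMultiTermFactory implements TokenFilterFactory, MultiTermAwareComponent {
        public String name() { return "mock"; }
        public Object getMultiTermComponent() { return "multi-term:" + name(); }
    }
}
```

The real code may instead reject factories that are not multi-term aware when a normalizer is requested; this sketch only illustrates where the getMultiTermComponent call belongs.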
can you add a comment saying what request it represents and which version it has been generated with?
Support normalizer param
Support custom normalizer with char_filter/filter param
Closes elastic#23347
Add AnalyzeRequestTest
Fix some comments
Fix some comments
Remove unused imports elastic#23347
Force-pushed 6b06274 to 8d72356
Fix some comments
Force-pushed 8d72356 to c6dd360
@jpountz Rebased onto master and moved the check and call logic into parseTokenFilterFactories.
* master:
  [Analysis] Support normalizer in request param (elastic#24767)
  Remove deprecated IdsQueryBuilder constructor (elastic#25529)
  Adds check for negative search request size (elastic#25397)
  test: also inspect the upgrade api response to check whether the upgrade really ran
  [DOCS] restructure java clients docs pages (elastic#25517)
Support the normalizer param and custom normalizers built from the char_filter/filter params.
In this PR, I didn't change the response. If a user sends a request with a keyword field name or a normalizer name, the analyze API returns a response whose tokenizer is KeywordTokenizer.
Should we change the response format for normalizer?
Closes #23347
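For reference, the two request shapes this PR enables look roughly like the following. The index name my_index and normalizer name my_normalizer are hypothetical; a named normalizer must already be defined in the index settings:

```json
POST /my_index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "BaR"
}
```

and building an ad-hoc normalizer from filters, with no tokenizer or analyzer specified:

```json
POST /_analyze
{
  "filter": ["lowercase"],
  "text": "BaR"
}
```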