[Analysis] Support normalizer in request param #24767
Conversation
cbuescher
left a comment
@johtani I like this PR and had fun reviewing it and learning more about this analysis feature. I left some comments, but I have to apologize in advance that I'm not an expert in this area yet; I hope the comments are useful nonetheless.
Maybe we can start having a unit test for the AnalyzeRequest in which e.g. the validate method and the serialization can be checked.
Can you add a test for this parsing part to RestAnalyzeActionTests?
This can maybe go inside the following else branch.
A question out of curiosity: the analyzer we get here doesn't have to be closed (via closeAnalyzer) because its not a new instance? I don't know enough about the lifecycle of these objects yet I'm afraid.
Yes, it is an existing instance created by the IndexService (or similar). We only close the analyzer if TransportAnalyzeAction creates a CustomAnalyzer itself.
Wouldn't it be better to throw an error here? As far as I can see, specifying a normalizer together with an analyzer or tokenizer doesn't make sense. This combination could already be detected earlier on the request, I think (is validate() always called?).
I will add the check logic to the request.validate() method.
Unfortunately, it is not always called: if you invoke shardOperation directly, validate() is skipped.
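The mutual-exclusion rule being discussed could look roughly like this as a standalone sketch. This is an illustrative mock with hypothetical class and method names, not the actual AnalyzeRequest.validate() code:

```java
// Illustrative sketch of the check discussed above: a normalizer must not
// be combined with an analyzer or a tokenizer. Names are hypothetical.
public class AnalyzeValidationSketch {

    /** Returns an error message, or null if the combination is valid. */
    public static String validate(String normalizer, String analyzer, String tokenizer) {
        if (normalizer != null && (analyzer != null || tokenizer != null)) {
            return "normalizer is not allowed together with analyzer or tokenizer";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate("my_normalizer", null, null));       // valid
        System.out.println(validate("my_normalizer", "standard", null)); // invalid
    }
}
```

As noted above, a check like this only helps on code paths where validate() actually runs, so the transport action still needs its own guard.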
Thanks, I think it's better than nothing.
Could this be split into two separate else if blocks, one for request.normalizer() != null and one for (request.tokenFilters() != null && request.tokenFilters().size() > 0) || (request.charFilters() != null && request.charFilters().size() > 0), instead of separating these cases later? I'm not entirely sure this works, but I think it would make this part easier to read.
Would it be possible to add a test for the second code path added in this PR (the case where normalizer == null but filter or char_filter is not null and tokenizer/analyzer is null)? I don't know whether it is possible with this test setup, but it might be useful.
Ah, I added that test case to the REST API tests. Since we are moving filter/char_filter to the analysis-common module, I think that is a better place than this test class.
Replace twitter with the new index name.
nit: "can analyze",
nit: "... or if char_filter/filter is set and tokenizer/analyzer is not set"
@elasticmachine test this please
Force-pushed a2dbf1d to 39c3eec
@cbuescher CI passed, please review again after the conference :)
++ thanks for adding these checks
More of a question: I see we use this in other bwc tests as well; I guess it represents the serialized request. How did you get that String? Do we have tools for that?
I'm not sure... I generated the string using Base64.getEncoder() and printed it with System.out.
can you add a comment saying what request it represents and which version it has been generated with?
nit: maybe use VersionUtils#randomVersionBetween()
Oh, good to know. I didn't know about it :)
jpountz
left a comment
Please call getMultiTermComponent on factories, but otherwise it looks good to me!
looks like you are missing the call to MultiTermAwareComponent.getMultiTermComponent?
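For context, the wrapping pattern being requested looks roughly like the following self-contained sketch. The interfaces here are local stand-ins for Elasticsearch's TokenFilterFactory and MultiTermAwareComponent, and the mock factory is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in interfaces mimicking the Elasticsearch analysis SPI (hypothetical).
interface TokenFilterFactory { String name(); }
interface MultiTermAwareComponent { Object getMultiTermComponent(); }

public class MultiTermWrapSketch {

    /**
     * When building a normalizer chain, each factory that is multi-term aware
     * is replaced by its multi-term variant; other factories are kept as-is.
     */
    static List<Object> resolveForNormalizer(List<TokenFilterFactory> factories) {
        List<Object> resolved = new ArrayList<>();
        for (TokenFilterFactory factory : factories) {
            if (factory instanceof MultiTermAwareComponent) {
                resolved.add(((MultiTermAwareComponent) factory).getMultiTermComponent());
            } else {
                resolved.add(factory);
            }
        }
        return resolved;
    }

    /** Hypothetical multi-term-aware factory, for demonstration only. */
    static class MockMultiTermFactory implements TokenFilterFactory, MultiTermAwareComponent {
        public String name() { return "mock"; }
        public Object getMultiTermComponent() { return "multi-term:" + name(); }
    }
}
```

The real code may instead reject factories that are not multi-term aware when a normalizer is requested; this sketch only illustrates where the getMultiTermComponent call belongs.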
can you add a comment saying what request it represents and which version it has been generated with?
Support normalizer param
Support custom normalizer with char_filter/filter param
Closes elastic#23347
Add AnalyzeRequestTest
Fix some comments
Fix some comments
Remove unused imports elastic#23347
Force-pushed 6b06274 to 8d72356
Fix some comments
Force-pushed 8d72356 to c6dd360
@jpountz Rebased onto master and moved the check and call logic into parseTokenFilterFactories.
* master:
  [Analysis] Support normalizer in request param (elastic#24767)
  Remove deprecated IdsQueryBuilder constructor (elastic#25529)
  Adds check for negative search request size (elastic#25397)
  test: also inspect the upgrade api response to check whether the upgrade really ran
  [DOCS] restructure java clients docs pages (elastic#25517)
Support the normalizer param and custom normalizers built from the char_filter/filter params.
In this PR, I didn't change the response. If a user sends a request with a keyword field name or a normalizer name, the analyze API returns a response whose tokenizer is KeywordTokenizer.
Should we change the response format for normalizer?
Closes #23347
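For reference, the two request shapes this PR enables look roughly like the following. The index name my_index and normalizer name my_normalizer are hypothetical; a named normalizer must already be defined in the index settings:

```json
POST /my_index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "BaR"
}
```

and building an ad-hoc normalizer from filters, with no tokenizer or analyzer specified:

```json
POST /_analyze
{
  "filter": ["lowercase"],
  "text": "BaR"
}
```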