[DOCS] Reformat apostrophe token filter docs #48076

jrodewig merged 5 commits into elastic:master from jrodewig:reformat.apos-token-filter
Conversation
Pinging @elastic/es-search (:Search/Analysis)

Pinging @elastic/es-docs (>docs)
The `apostrophe` token filter produces the following tokens:

[source,text]
---------------------------
[ Istanbul, veya, Istanbul ]
---------------------------

/////////////////////
[source,console-result]
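To show how those tokens are produced, a request along these lines (a sketch in the same console style the docs use; the sample text is an assumption chosen to match the expected tokens) would exercise the filter through the _analyze API:

```
GET /_analyze
{
  "tokenizer" : "standard",
  "filter"    : ["apostrophe"],
  "text"      : "Istanbul'a veya Istanbul'dan"
}
```

The `apostrophe` filter strips the apostrophe and everything after it from each token, so `Istanbul'a` and `Istanbul'dan` both become `Istanbul`, giving `[ Istanbul, veya, Istanbul ]`.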
I would love any feedback here.
The token example is based on the one from the simple analyzer:
https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-simple-analyzer.html#_example_7
From my understanding, this token filter was built to be used with Turkish. For clarity, it could be worth mentioning that explicitly, perhaps linking to the Turkish analyzer (which makes use of this filter).
That's a great idea. I've added this information to the description at the top with 04a33a6.
jtibshirani left a comment:
The overall format seems like a nice improvement to me. It's great to give an example of how the text gets analyzed, and I think using the _analyze API will help more users become aware of this helpful endpoint.
One thought about the documentation format -- we could consider linking to the Lucene documentation when it exists, as it often contains more detailed information or paper references. This also gives more clarity around where the implementation lives, so users know where to go to dig into code or file a bug.
"settings" : {
  "analysis" : {
    "analyzer" : {
      "default" : {
Small comment: maybe we don't want to always call the analyzer `default`. We could use a more specific name like `apostrophe` or `standard_apostrophe`.
I changed the analyzer name to `standard_apostrophe` with 04a33a6.
To add to my comment: when experimenting with token filters, it seems more likely that the user will want to create a custom analyzer as opposed to overriding the default one. Maybe we could make a few small tweaks to clarify this, including using a specific analyzer name other than `default` and linking to the custom analyzer docs.
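A create index request along these lines illustrates that suggestion (a sketch; the index name `apostrophe_example` is illustrative, and the analyzer name mirrors the `standard_apostrophe` name adopted in 04a33a6) by defining a custom analyzer rather than overriding `default`:

```
PUT /apostrophe_example
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "standard_apostrophe" : {
          "tokenizer" : "standard",
          "filter"    : ["apostrophe"]
        }
      }
    }
  }
}
```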
Oops, I had a race condition with my comment. This looks good to me, we could also link to the custom analyzer docs if you think it's helpful.
Thanks for your feedback @jtibshirani. I made a few changes based on comments with 04a33a6. This includes linking to Lucene's docs, which is a good idea!
==== Add to an analyzer

The following <<indices-create-index,create index API>> request adds the
apostrophe token filter to an analyzer.
One last small suggestion -- instead of 'adding to an analyzer' it could be clearer/more precise to say 'uses the token filter to configure a new analyzer'.
Thanks for the clarification! Reworded with a847eeb.
jtibshirani left a comment:
This looks good to me.
Reformats the `apostrophe` token filter docs. I hope to re-use this format for other token filter docs. All feedback is welcome!