
[DOCS] Rewrite analysis intro #51184

Merged
jrodewig merged 5 commits into elastic:master from jrodewig:docs__analysis_intro on Jan 30, 2020



@jrodewig (Contributor) commented Jan 17, 2020

Changes

  • Rewrites 'Text analysis' page intro as high-level definition.
    Adds guidance on when users should configure text analysis
  • Rewrites and splits index/search analysis content:
    • Conceptual content -> 'Index and search analysis' under 'Concepts'
    • Task-based content -> 'Specify an analyzer' under 'Configure...'
  • Adds detailed examples for when to use the same index/search analyzer
    and when not.
  • Adds new example snippets for specifying search analyzers

Previews

@jrodewig added the >docs and :Search Relevance/Analysis labels Jan 17, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-docs (>docs)

@elasticmachine (Collaborator)

Pinging @elastic/es-search (:Search/Analysis)

@jrodewig jrodewig changed the title [DOCS] Rewrite analysis intro. Move index/search analysis content. [DOCS] Rewrite analysis intro. Jan 17, 2020
@jrodewig jrodewig changed the title [DOCS] Rewrite analysis intro. [DOCS] Rewrite analysis intro Jan 17, 2020
@jrodewig jrodewig added WIP and removed WIP labels Jan 18, 2020
* Rewrites 'Text analysis' page intro as high-level definition.
  Adds guidance on when users should configure text analysis
* Rewrites and splits index/search analysis content:
  * Conceptual content -> 'Index and search analysis' under 'Concepts'
  * Task-based content -> 'Specify an analyzer' under 'Configure...'
* Adds detailed examples for when to use the same index/search analyzer
  and when not.
* Adds new example snippets for specifying search analyzers
@jrodewig jrodewig marked this pull request as ready for review January 18, 2020 17:37
@jrodewig jrodewig requested a review from debadair January 18, 2020 17:37
@jrodewig (Contributor, Author)

@debadair Any feedback on this one? Thanks!


For instance, at index time the built-in <<english-analyzer,`english`>> _analyzer_
will first convert the sentence:
{es} comes with smart defaults for text analysis. These defaults work well for
@mayya-sharipova (Contributor), Jan 29, 2020

"Smart defaults"? ES just uses a default (`standard`) analyzer for text fields, which may not be the best option for many use cases.
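To make the gap between the default `standard` analyzer and the `english` analyzer concrete, here is a toy Python sketch. It is not Elasticsearch's implementation: it assumes an abbreviated stopword list and a hypothetical three-entry stem table in place of a real stemmer, and the example sentence is an assumption. The stems it produces (`fox`, `jump`, `lazi`) are the ones quoted in the diff.

```python
import re

# Hypothetical, abbreviated stopword list; the real `english` analyzer uses a longer one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is"}

# Hypothetical stem table standing in for a real stemming algorithm.
STEMS = {"foxes": "fox", "jumped": "jump", "lazy": "lazi"}


def standard_like(text):
    """Rough stand-in for `standard`: split on non-alphanumerics and lowercase.

    Like the default `standard` analyzer, it does not remove stopwords.
    """
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def english_like(text):
    """standard_like, then drop stopwords, then replace tokens with their stems."""
    return [STEMS.get(token, token) for token in standard_like(text) if token not in STOPWORDS]


sentence = "The QUICK brown foxes jumped over the lazy dog!"
print(standard_like(sentence))  # ['the', 'quick', 'brown', 'foxes', 'jumped', 'over', 'the', 'lazy', 'dog']
print(english_like(sentence))   # ['quick', 'brown', 'fox', 'jump', 'over', 'lazi', 'dog']
```

With only the `standard`-like behavior, a search for `fox` would not match `foxes`. That is the reviewer's point: the default is a reasonable starting place, not a tuned choice.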

stopwords ("the") and reduce the terms to their word stems (foxes -> fox,
jumped -> jump, lazy -> lazi). In the end, the following terms will be added
to the inverted index:
However, there are less common cases where configuring text analysis is
@mayya-sharipova (Contributor), Jan 29, 2020

I don't think configuring an analyzer is an uncommon use case; I would expect it to be quite common for text fields.
Maybe rephrase it to something like: "If your index doesn't use text fields, you can skip the chapters in this section."

@jrodewig (Contributor, Author)

Good point! It's clearer if we just directly state that if you use text fields, take a look. If not, go ahead and skip this section.

Made those changes with a2f08d2.


In most cases, a simple approach works best: Specify an analyzer for each
`text` field, as outlined in <<specify-index-field-analyzer>>. No other
analyzers need to be specified.
Contributor

No other analyzers need to be specified.

This is not very clear to me. Sorry if I misinterpreted this paragraph.

Analyzers are only specified for text fields, so it is impossible to specify analyzers for any other field types.

@jrodewig (Contributor, Author), Jan 29, 2020

That sentence was referring to specifying an index analyzer or field-level search analyzer. However, I agree with you: this paragraph is clearer without that sentence. Thanks!
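The advice in the hunk above (specify an analyzer for each `text` field) comes down to setting the `analyzer` mapping parameter. The Python sketch below only builds a create-index request body as a dict; the field names `title` and `body` and the helper `text_field` are hypothetical, while `whitespace` and `english` are built-in analyzers. Because no `search_analyzer` is set, each field's `analyzer` is also used at search time, unless a query or an index-level setting overrides it.

```python
import json


def text_field(analyzer, search_analyzer=None):
    """Hypothetical helper: build a `text` field mapping.

    `analyzer` applies at index time; `search_analyzer` is optional and,
    when omitted, the field's `analyzer` is used for searches as well.
    """
    field = {"type": "text", "analyzer": analyzer}
    if search_analyzer is not None:
        field["search_analyzer"] = search_analyzer
    return field


# Body for a create-index request; field names are illustrative only.
create_index_body = {
    "mappings": {
        "properties": {
            "title": text_field("whitespace"),
            "body": text_field("english"),
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```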

|`appli` | | X
|===

This means the search would erroneously match `apple`. Not only that, it would
Contributor

This is a very good example.

Another example could be a use case with synonyms, where we specify a synonym filter only at search time, because applying synonyms at both index and query time is redundant.
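The `apple`/`appli` case can be reproduced with a toy Python model. It assumes an edge n-gram analyzer that emits prefixes of 1 to 5 characters (assumed settings, not Elasticsearch's defaults) and treats a match as any shared term, roughly like a `match` query with the default `OR` operator. The helper names are hypothetical.

```python
def edge_ngrams(token, min_gram=1, max_gram=5):
    """Lowercase `token` and emit its prefixes from min_gram to max_gram characters."""
    token = token.lower()
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def matches(query_terms, indexed_terms):
    """Toy OR-style match: true if any query term is among the indexed terms."""
    return any(term in indexed_terms for term in query_terms)


# Index time: "Apple" becomes ['a', 'ap', 'app', 'appl', 'apple'].
indexed = set(edge_ngrams("Apple"))

# Search time, same analyzer: "appli" becomes ['a', 'ap', 'app', 'appl', 'appli'].
same_analyzer_query = edge_ngrams("appli")

# Search time, a plain analyzer that only lowercases and keeps the term whole.
plain_query = ["appli"]

print(matches(same_analyzer_query, indexed))  # True: 'a', 'ap', 'app', 'appl' overlap
print(matches(plain_query, indexed))          # False
```

Reusing the edge n-gram analyzer at search time yields the false match the diff describes, which is why a field like this would typically pair an edge n-gram `analyzer` with a simpler `search_analyzer`.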

@mayya-sharipova (Contributor) left a comment

@jrodewig Thanks, great PR.
I can't comment on or review the organization of files, but the content LGTM.

@jrodewig (Contributor, Author)

Thanks so much for your review @mayya-sharipova!

@jrodewig jrodewig merged commit 3c28a10 into elastic:master Jan 30, 2020
@jrodewig jrodewig deleted the docs__analysis_intro branch January 30, 2020 14:19
jrodewig added a commit that referenced this pull request Jan 30, 2020
* [DOCS] Rewrite analysis intro. Move index/search analysis content.

* Rewrites 'Text analysis' page intro as high-level definition.
  Adds guidance on when users should configure text analysis
* Rewrites and splits index/search analysis content:
  * Conceptual content -> 'Index and search analysis' under 'Concepts'
  * Task-based content -> 'Specify an analyzer' under 'Configure...'
* Adds detailed examples for when to use the same index/search analyzer
  and when not.
* Adds new example snippets for specifying search analyzers

* clarifications

* Add toc. Decrement headings.

* Reword 'When to configure' section

* Remove sentence from tip
jrodewig added a commit that referenced this pull request Jan 30, 2020
jrodewig added a commit that referenced this pull request Jan 30, 2020
@jrodewig (Contributor, Author)

Backport commits

master 3c28a10
7.x 4fcf5a9
7.6 2e0ed04
7.5 a6edbbd



5 participants