As part of the larger effort to update and improve the Elasticsearch docs, the Analysis section needs a revamp. Known issues: topics vary in depth and completeness; some examples are dated or inconsistent; chunking may need to be removed or added; and the organization is arbitrary and does not always show how one topic relates to another.
To make these changes, docs covered by this issue will adopt a revised, standardized structure. For example, in token filters, I'll add examples, configuration parameters, and customization options, and replace circular definitions such as "NGram Token Filter: A token filter of type ngram" with a complete definition and an explanation of when a user would employ that filter.
Proposed structure:
- Title (Level 2): Definition and explanation of topic
- Example (Level 3): Vanilla example and output
- Configure parameters (Level 3): Parameters available with descriptions
- Customize (Level 3): How to customize
- Example (Level 4): Customize example and output
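To give a sense of what a "vanilla example and output" entry could convey for a filter like ngram, here is a rough conceptual sketch in plain Python. It is not Elasticsearch code or its API: the function name `ngram_filter` is hypothetical, and the `min_gram`/`max_gram` values are chosen only for illustration.

```python
from typing import Iterable, List


def ngram_filter(tokens: Iterable[str], min_gram: int = 1, max_gram: int = 2) -> List[str]:
    """Hypothetical helper: emit character n-grams for each incoming token.

    This only mimics the general idea of an n-gram token filter; it is an
    illustration, not how Elasticsearch implements it.
    """
    grams: List[str] = []
    for token in tokens:
        # Walk each start position, then each allowed gram length.
        for start in range(len(token)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(token):
                    grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    print(ngram_filter(["fox"]))  # ['f', 'fo', 'o', 'ox', 'x']
```

A definition paired with a small input and its output like this tells the reader what the filter produces, which the circular one-line definition does not.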
PRs will be revised as I work through the topics. Some PRs may include more than one topic when changes are small and comparable across topics.
Top Level Docs
Sections
- Analyzers #58362
- Character Filters
  - `html_strip` char filter #57764
  - `mapping` char filter #57818
- Token Filters
  - `flatten_graph` token filter #54268
  - `hunspell` token filter #56955
  - `keyword_marker` token filter #54076
  - `keyword_repeat` token filter #54428
  - `kstem` token filter #55823
  - `min_hash` token filter docs #57181
  - `multiplexer` token filter #57555
  - `pattern_capture` token filter #57664
  - `pattern_replace` token filter #57699
  - `porter_stem` token filter #56053
  - `predicate_token_filter` token filter #57705
  - `remove_duplicates` token filter #53608
  - `shingle` token filter #57040
  - `snowball` token filter #56394
  - `stemmer_override` token filter #56840
  - `stemmer` token filter #55693
  - `stop` token filter #53059
  - `synonym_graph` token filter #53901
  - `word_delimiter_graph` token filter #53170
  - `word_delimiter` token filter #53387
- Tokenizers #58361