Commit 36ae8eb

[DOCS] Reformat porter_stem token filter (#56053)
Makes the following changes to the `porter_stem` token filter docs:

* Rewrites description and adds a Lucene link
* Adds detailed analyze example
* Adds an analyzer example
1 parent c0ee6d2 commit 36ae8eb

1 file changed

Lines changed: 108 additions & 12 deletions

docs/reference/analysis/tokenfilters/porterstem-tokenfilter.asciidoc

@@ -4,15 +4,111 @@
 <titleabbrev>Porter stem</titleabbrev>
 ++++
 
-A token filter of type `porter_stem` that transforms the token stream as
-per the Porter stemming algorithm.
-
-Note, the input to the stemming filter must already be in lower case, so
-you will need to use
-<<analysis-lowercase-tokenfilter,Lower
-Case Token Filter>> or
-<<analysis-lowercase-tokenizer,Lower
-Case Tokenizer>> farther down the Tokenizer chain in order for this to
-work properly!. For example, when using custom analyzer, make sure the
-`lowercase` filter comes before the `porter_stem` filter in the list of
-filters.
+Provides <<algorithmic-stemmers,algorithmic stemming>> for the English language,
+based on the http://snowball.tartarus.org/algorithms/porter/stemmer.html[Porter
+stemming algorithm].
+
+This filter tends to stem more aggressively than other English
+stemmer filters, such as the <<analysis-kstem-tokenfilter,`kstem`>> filter.
+
+The `porter_stem` filter is equivalent to the
+<<analysis-stemmer-tokenfilter,`stemmer`>> filter's
+<<analysis-stemmer-tokenfilter-language-parm,`english`>> variant.
+
+The `porter_stem` filter uses Lucene's
+{lucene-analysis-docs}/en/PorterStemFilter.html[PorterStemFilter].
+
+[[analysis-porterstem-tokenfilter-analyze-ex]]
+==== Example
+
+The following analyze API request uses the `porter_stem` filter to stem
+`the foxes jumping quickly` to `the fox jump quickli`:
+
+[source,console]
+----
+GET /_analyze
+{
+  "tokenizer": "standard",
+  "filter": [ "porter_stem" ],
+  "text": "the foxes jumping quickly"
+}
+----
+
+The filter produces the following tokens:
+
+[source,text]
+----
+[ the, fox, jump, quickli ]
+----
+
+////
+[source,console-result]
+----
+{
+  "tokens": [
+    {
+      "token": "the",
+      "start_offset": 0,
+      "end_offset": 3,
+      "type": "<ALPHANUM>",
+      "position": 0
+    },
+    {
+      "token": "fox",
+      "start_offset": 4,
+      "end_offset": 9,
+      "type": "<ALPHANUM>",
+      "position": 1
+    },
+    {
+      "token": "jump",
+      "start_offset": 10,
+      "end_offset": 17,
+      "type": "<ALPHANUM>",
+      "position": 2
+    },
+    {
+      "token": "quickli",
+      "start_offset": 18,
+      "end_offset": 25,
+      "type": "<ALPHANUM>",
+      "position": 3
+    }
+  ]
+}
+----
+////
+
+[[analysis-porterstem-tokenfilter-analyzer-ex]]
+==== Add to an analyzer
+
+The following <<indices-create-index,create index API>> request uses the
+`porter_stem` filter to configure a new <<analysis-custom-analyzer,custom
+analyzer>>.
+
+[IMPORTANT]
+====
+To work properly, the `porter_stem` filter requires lowercase tokens. To ensure
+tokens are lowercased, add the <<analysis-lowercase-tokenfilter,`lowercase`>>
+filter before the `porter_stem` filter in the analyzer configuration.
+====
+
+[source,console]
+----
+PUT /my_index
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "my_analyzer": {
+          "tokenizer": "whitespace",
+          "filter": [
+            "lowercase",
+            "porter_stem"
+          ]
+        }
+      }
+    }
+  }
+}
+----
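
The ordering requirement in the `[IMPORTANT]` note of the new docs can be checked mechanically before an index is created. The following Python sketch is not part of this commit. The function names `build_index_body` and `lowercase_precedes_porter_stem` are hypothetical, and the check assumes filters are referenced by their built-in names (`lowercase`, `porter_stem`) rather than through custom filter definitions. It only builds and inspects the request body; it does not contact a cluster.

```python
import json


def build_index_body(analyzer_name="my_analyzer", tokenizer="whitespace"):
    """Return a create-index body mirroring the docs example.

    The filter chain lists `lowercase` before `porter_stem`, as the
    IMPORTANT note in the docs recommends.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "tokenizer": tokenizer,
                        "filter": ["lowercase", "porter_stem"],
                    }
                }
            }
        }
    }


def lowercase_precedes_porter_stem(filters):
    """Check the ordering of a filter chain given as a list of names.

    Returns True when the chain has no `porter_stem` entry, or when a
    `lowercase` entry appears somewhere before the first `porter_stem`.
    Assumption: filters are plain built-in names, not custom definitions.
    """
    if "porter_stem" not in filters:
        return True
    first_stem = filters.index("porter_stem")
    return "lowercase" in filters[:first_stem]


if __name__ == "__main__":
    body = build_index_body()
    chain = body["settings"]["analysis"]["analyzer"]["my_analyzer"]["filter"]
    print(json.dumps(body, indent=2))
    print(lowercase_precedes_porter_stem(chain))
```

A check like this could run in a configuration test so that a reordered filter list is caught before the analyzer is deployed.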
