
Commit e6a469c

[DOCS] Reformat uppercase token filter docs (#50555)
* Updates the description and adds a Lucene link
* Adds analyze and custom analyzer snippets
1 parent ca0828b commit e6a469c

1 file changed

Lines changed: 99 additions & 2 deletions

docs/reference/analysis/tokenfilters/uppercase-tokenfilter.asciidoc

@@ -4,5 +4,102 @@
 <titleabbrev>Uppercase</titleabbrev>
 ++++
 
-A token filter of type `uppercase` that normalizes token text to upper
-case.
+Changes token text to uppercase. For example, you can use the `uppercase` filter
+to change `the Lazy DoG` to `THE LAZY DOG`.
+
+This filter uses Lucene's
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html[UpperCaseFilter].
+
+[WARNING]
+====
+Depending on the language, an uppercase character can map to multiple
+lowercase characters. Using the `uppercase` filter could result in the loss of
+lowercase character information.
+
+To avoid this loss but still have a consistent lettercase, use the <<analysis-lowercase-tokenfilter,`lowercase`>> filter instead.
+====
+
+[[analysis-uppercase-tokenfilter-analyze-ex]]
+==== Example
+
+The following <<indices-analyze,analyze API>> request uses the default
+`uppercase` filter to change `the Quick FoX JUMPs` to uppercase:
+
+[source,console]
+--------------------------------------------------
+GET _analyze
+{
+  "tokenizer" : "standard",
+  "filter" : ["uppercase"],
+  "text" : "the Quick FoX JUMPs"
+}
+--------------------------------------------------
+
+The filter produces the following tokens:
+
+[source,text]
+--------------------------------------------------
+[ THE, QUICK, FOX, JUMPS ]
+--------------------------------------------------
+
+/////////////////////
+[source,console-result]
+--------------------------------------------------
+{
+  "tokens" : [
+    {
+      "token" : "THE",
+      "start_offset" : 0,
+      "end_offset" : 3,
+      "type" : "<ALPHANUM>",
+      "position" : 0
+    },
+    {
+      "token" : "QUICK",
+      "start_offset" : 4,
+      "end_offset" : 9,
+      "type" : "<ALPHANUM>",
+      "position" : 1
+    },
+    {
+      "token" : "FOX",
+      "start_offset" : 10,
+      "end_offset" : 13,
+      "type" : "<ALPHANUM>",
+      "position" : 2
+    },
+    {
+      "token" : "JUMPS",
+      "start_offset" : 14,
+      "end_offset" : 19,
+      "type" : "<ALPHANUM>",
+      "position" : 3
+    }
+  ]
+}
+--------------------------------------------------
+/////////////////////
+
+[[analysis-uppercase-tokenfilter-analyzer-ex]]
+==== Add to an analyzer
+
+The following <<indices-create-index,create index API>> request uses the
+`uppercase` filter to configure a new
+<<analysis-custom-analyzer,custom analyzer>>.
+
+[source,console]
+--------------------------------------------------
+PUT uppercase_example
+{
+  "settings" : {
+    "analysis" : {
+      "analyzer" : {
+        "whitespace_uppercase" : {
+          "tokenizer" : "whitespace",
+          "filter" : ["uppercase"]
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
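
The `[WARNING]` block added in this diff says that several lowercase characters can share one uppercase form, so uppercasing can discard information. The sketch below illustrates that point with Python's built-in Unicode case mapping. It is not Lucene's `UpperCaseFilter`, and the helper names `uppercase_tokens` and `lowercase_tokens` are hypothetical; Lucene's mapping may differ for some characters.

```python
# Illustration only: Python's str.upper()/str.lower(), not Lucene's UpperCaseFilter.

def uppercase_tokens(tokens):
    """Uppercase every token (hypothetical stand-in for an uppercase filter)."""
    return [token.upper() for token in tokens]


def lowercase_tokens(tokens):
    """Lowercase every token (hypothetical stand-in for a lowercase filter)."""
    return [token.lower() for token in tokens]


# Dotless i (U+0131) and ASCII i are distinct lowercase letters,
# but both map to the same uppercase letter "I".
pair = ["\u0131rak", "irak"]

collapsed = uppercase_tokens(pair)   # both tokens become identical
preserved = lowercase_tokens(pair)   # tokens stay distinct

print(len(set(collapsed)), len(set(preserved)))
```

After uppercasing, the two tokens are indistinguishable, while lowercasing keeps them apart. This is the loss the warning describes, and the reason it recommends the `lowercase` filter when only a consistent lettercase is needed.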

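As a rough mental model of the `whitespace_uppercase` analyzer configured in the diff, the following sketch splits text on whitespace and uppercases each token, recording offsets and positions like the analyze API response shown in the diff. The Python function `whitespace_uppercase` is a hypothetical approximation written for illustration. It does not call Elasticsearch, and it omits the `type` field.

```python
import re


def whitespace_uppercase(text):
    """Approximate a whitespace tokenizer followed by an uppercase filter."""
    tokens = []
    # Each run of non-whitespace characters is treated as one token.
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group().upper(),
            "start_offset": match.start(),  # offsets refer to the original text
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


result = whitespace_uppercase("the Quick FoX JUMPs")
print([t["token"] for t in result])
```

For this input, the tokens and offsets agree with the `standard` tokenizer output listed in the diff, because the text contains no punctuation that the two tokenizers would treat differently.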