Commit 24a50eb

[DOCS] Reformat mapping charfilter (#57818) (#57885)

Changes:

* Adds title abbreviation
* Adds Lucene link to description
* Adds standard headings
* Simplifies analyze example
* Simplifies analyzer example and adds contextual text

1 parent c171214

1 file changed: 98 additions & 120 deletions
@@ -1,5 +1,8 @@
 [[analysis-mapping-charfilter]]
-=== Mapping Char Filter
+=== Mapping character filter
+++++
+<titleabbrev>Mapping</titleabbrev>
+++++
 
 The `mapping` character filter accepts a map of keys and values. Whenever it
 encounters a string of characters that is the same as a key, it replaces them
@@ -8,75 +11,53 @@ with the value associated with that key.
 Matching is greedy; the longest pattern matching at a given point wins.
 Replacements are allowed to be the empty string.
 
-[float]
-=== Configuration
+The `mapping` filter uses Lucene's
+{lucene-analysis-docs}/charfilter/MappingCharFilter.html[MappingCharFilter].
 
-The `mapping` character filter accepts the following parameters:
+[[analysis-mapping-charfilter-analyze-ex]]
+==== Example
 
-[horizontal]
-`mappings`::
-
-A array of mappings, with each element having the form `key => value`.
-
-`mappings_path`::
-
-A path, either absolute or relative to the `config` directory, to a UTF-8
-encoded text mappings file containing a `key => value` mapping per line.
-
-Either the `mappings` or `mappings_path` parameter must be provided.
-
-[float]
-=== Example configuration
-
-In this example, we configure the `mapping` character filter to replace Arabic
-numerals with their Latin equivalents:
+The following <<indices-analyze,analyze API>> request uses the `mapping` filter
+to convert Hindu-Arabic numerals (٠‎١٢٣٤٥٦٧٨‎٩‎) into their Arabic-Latin
+equivalents (0123456789), changing the text `My license plate is ٢٥٠١٥` to
+`My license plate is 25015`.
 
 [source,console]
-----------------------------
-PUT my_index
+----
+GET /_analyze
 {
-  "settings": {
-    "analysis": {
-      "analyzer": {
-        "my_analyzer": {
-          "tokenizer": "keyword",
-          "char_filter": [
-            "my_char_filter"
-          ]
-        }
-      },
-      "char_filter": {
-        "my_char_filter": {
-          "type": "mapping",
-          "mappings": [
-            "٠ => 0",
-            "١ => 1",
-            "٢ => 2",
-            "٣ => 3",
-            "٤ => 4",
-            "٥ => 5",
-            "٦ => 6",
-            "٧ => 7",
-            "٨ => 8",
-            "٩ => 9"
-          ]
-        }
-      }
+  "tokenizer": "keyword",
+  "char_filter": [
+    {
+      "type": "mapping",
+      "mappings": [
+        "٠ => 0",
+        "١ => 1",
+        "٢ => 2",
+        "٣ => 3",
+        "٤ => 4",
+        "٥ => 5",
+        "٦ => 6",
+        "٧ => 7",
+        "٨ => 8",
+        "٩ => 9"
+      ]
     }
-  }
-}
-
-POST my_index/_analyze
-{
-  "analyzer": "my_analyzer",
+  ],
   "text": "My license plate is ٢٥٠١٥"
 }
-----------------------------
+----
+
+The filter produces the following text:
 
-/////////////////////
+[source,text]
+----
+[ My license plate is 25015 ]
+----
 
+////
 [source,console-result]
-----------------------------
+----
 {
   "tokens": [
     {
@@ -88,37 +69,58 @@ POST my_index/_analyze
     }
   ]
 }
-----------------------------
+----
+////
 
-/////////////////////
+[[analysis-mapping-charfilter-configure-parms]]
+==== Configurable parameters
 
+`mappings`::
+(Required*, array of strings)
+Array of mappings, with each element having the form `key => value`.
++
+Either this or the `mappings_path` parameter must be specified.
 
-The above example produces the following term:
+`mappings_path`::
+(Required*, string)
+Path to a file containing `key => value` mappings.
++
+This path must be absolute or relative to the `config` location, and the file
+must be UTF-8 encoded. Each mapping in the file must be separated by a line
+break.
++
+Either this or the `mappings` parameter must be specified.
 
-[source,text]
----------------------------
-[ My license plate is 25015 ]
----------------------------
+[[analysis-mapping-charfilter-customize]]
+==== Customize and add to an analyzer
+
+To customize the `mappings` filter, duplicate it to create the basis for a new
+custom character filter. You can modify the filter using its configurable
+parameters.
 
-Keys and values can be strings with multiple characters. The following
-example replaces the `:)` and `:(` emoticons with a text equivalent:
+The following <<indices-create-index,create index API>> request
+configures a new <<analysis-custom-analyzer,custom analyzer>> using a custom
+`mappings` filter, `my_mappings_char_filter`.
+
+The `my_mappings_char_filter` filter replaces the `:)` and `:(` emoticons
+with a text equivalent.
 
 [source,console]
-----------------------------
-PUT my_index
+----
+PUT /my_index
 {
   "settings": {
     "analysis": {
       "analyzer": {
         "my_analyzer": {
           "tokenizer": "standard",
           "char_filter": [
-            "my_char_filter"
+            "my_mappings_char_filter"
           ]
         }
       },
       "char_filter": {
-        "my_char_filter": {
+        "my_mappings_char_filter": {
           "type": "mapping",
           "mappings": [
             ":) => _happy_",
@@ -129,67 +131,43 @@ PUT my_index
     }
   }
 }
+----
+
+The following <<indices-analyze,analyze API>> request uses the custom
+`my_mappings_char_filter` to replace `:(` with `_sad_` in
+the text `I'm delighted about it :(`.
 
-POST my_index/_analyze
+[source,console]
+----
+GET /my_index/_analyze
 {
-  "analyzer": "my_analyzer",
+  "tokenizer": "keyword",
+  "char_filter": [ "my_mappings_char_filter" ],
   "text": "I'm delighted about it :("
 }
-----------------------------
+----
+// TEST[continued]
 
+The filter produces the following text:
 
-/////////////////////
+[source,text]
+---------------------------
+[ I'm delighted about it _sad_ ]
+---------------------------
 
+////
 [source,console-result]
-----------------------------
+----
 {
   "tokens": [
     {
-      "token": "I'm",
+      "token": "I'm delighted about it _sad_",
       "start_offset": 0,
-      "end_offset": 3,
-      "type": "<ALPHANUM>",
-      "position": 0
-    },
-    {
-      "token": "delighted",
-      "start_offset": 4,
-      "end_offset": 13,
-      "type": "<ALPHANUM>",
-      "position": 1
-    },
-    {
-      "token": "about",
-      "start_offset": 14,
-      "end_offset": 19,
-      "type": "<ALPHANUM>",
-      "position": 2
-    },
-    {
-      "token": "it",
-      "start_offset": 20,
-      "end_offset": 22,
-      "type": "<ALPHANUM>",
-      "position": 3
-    },
-    {
-      "token": "_sad_",
-      "start_offset": 23,
       "end_offset": 25,
-      "type": "<ALPHANUM>",
-      "position": 4
+      "type": "word",
+      "position": 0
    }
   ]
 }
-----------------------------
-
-
-/////////////////////
-
-
-The above example produces the following terms:
-
-[source,text]
----------------------------
-[ I'm, delighted, about, it, _sad_ ]
----------------------------
+----
+////
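The behavior the reformatted page documents, greedy replacement in which the longest key matching at a given point wins and replacements may be empty strings, can be sketched in a few lines of plain Python. This is a standalone simulation for illustration only, not Elasticsearch or Lucene code; the helper name `mapping_char_filter` is invented for this sketch.

```python
def mapping_char_filter(text, mappings):
    """Simulate the `mapping` char filter: apply `key => value` rules
    greedily, preferring the longest key that matches at each position."""
    rules = []
    for rule in mappings:
        key, _, value = rule.partition(" => ")
        rules.append((key, value))
    out = []
    i = 0
    while i < len(text):
        # Longest matching key at position i wins (greedy matching).
        match = max(
            (r for r in rules if text.startswith(r[0], i)),
            key=lambda r: len(r[0]),
            default=None,
        )
        if match:
            out.append(match[1])       # may be the empty string
            i += len(match[0])
        else:
            out.append(text[i])        # no rule applies, copy as-is
            i += 1
    return "".join(out)

# The numeral example from the docs:
digits = ["٠ => 0", "١ => 1", "٢ => 2", "٣ => 3", "٤ => 4",
          "٥ => 5", "٦ => 6", "٧ => 7", "٨ => 8", "٩ => 9"]
print(mapping_char_filter("My license plate is ٢٥٠١٥", digits))
# My license plate is 25015

# The emoticon example; keys and values may span multiple characters:
emoticons = [":) => _happy_", ":( => _sad_"]
print(mapping_char_filter("I'm delighted about it :(", emoticons))
# I'm delighted about it _sad_

# Greedy matching: "ab => 2" beats "a => 1" when both match at a point.
print(mapping_char_filter("ab", ["a => 1", "ab => 2"]))
# 2
```

Note that this runs before tokenization, which is why the real filter is configured as a `char_filter` and why, with the `keyword` tokenizer, the docs' examples emit the rewritten string as a single token.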
