[[analysis-mapping-charfilter]]
=== Mapping character filter
++++
<titleabbrev>Mapping</titleabbrev>
++++

The `mapping` character filter accepts a map of keys and values. Whenever it
encounters a string of characters that is the same as a key, it replaces them
with the value associated with that key.

Matching is greedy; the longest pattern matching at a given point wins.
Replacements are allowed to be the empty string.
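
The greedy, longest-match-wins behavior can be sketched as follows. This is a
minimal Python illustration of the matching rule described above, not
Elasticsearch's or Lucene's actual implementation:

```python
# Hypothetical sketch of greedy, longest-match-wins character mapping.
def apply_mappings(text, mappings):
    # Try keys longest-first so the longest pattern at a position wins.
    keys = sorted(mappings, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])  # replacement may be ""
                i += len(key)
                break
        else:
            out.append(text[i])  # no key matches here; keep the character
            i += 1
    return "".join(out)

# At position 0, "ab" wins over "a" because it is longer:
print(apply_mappings("abc", {"a": "1", "ab": "2"}))  # 2c
```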

The `mapping` filter uses Lucene's
{lucene-analysis-docs}/charfilter/MappingCharFilter.html[MappingCharFilter].

[[analysis-mapping-charfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request uses the `mapping` filter
to convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin
equivalents (0123456789), changing the text `My license plate is ٢٥٠١٥` to
`My license plate is 25015`.

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "٠ => 0",
        "١ => 1",
        "٢ => 2",
        "٣ => 3",
        "٤ => 4",
        "٥ => 5",
        "٦ => 6",
        "٧ => 7",
        "٨ => 8",
        "٩ => 9"
      ]
    }
  ],
  "text": "My license plate is ٢٥٠١٥"
}
----

The filter produces the following text:

[source,text]
----
[ My license plate is 25015 ]
----

////
[source,console-result]
----
{
  "tokens": [
    {
      "token": "My license plate is 25015",
      "start_offset": 0,
      "end_offset": 25,
      "type": "word",
      "position": 0
    }
  ]
}
----
////

[[analysis-mapping-charfilter-configure-parms]]
==== Configurable parameters

`mappings`::
(Required*, array of strings)
Array of mappings, with each element having the form `key => value`.
+
Either this or the `mappings_path` parameter must be specified.

`mappings_path`::
(Required*, string)
Path to a file containing `key => value` mappings.
+
This path must be absolute or relative to the `config` location, and the file
must be UTF-8 encoded. Each mapping in the file must be separated by a line
break.
+
Either this or the `mappings` parameter must be specified.
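
For illustration only, a mappings file equivalent to the inline `mappings`
array in the numeral example above would contain one `key => value` pair per
line. The file location shown here, `analysis/mappings.txt`, is a hypothetical
name; any UTF-8 file under the `config` directory works:

```text
٠ => 0
١ => 1
٢ => 2
٣ => 3
٤ => 4
٥ => 5
٦ => 6
٧ => 7
٨ => 8
٩ => 9
```

Such a file would be referenced with `"mappings_path": "analysis/mappings.txt"`
in place of the inline `mappings` array.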

[[analysis-mapping-charfilter-customize]]
==== Customize and add to an analyzer

To customize the `mappings` filter, duplicate it to create the basis for a new
custom character filter. You can modify the filter using its configurable
parameters.

The following <<indices-create-index,create index API>> request
configures a new <<analysis-custom-analyzer,custom analyzer>> using a custom
`mappings` filter, `my_mappings_char_filter`.

The `my_mappings_char_filter` filter replaces the `:)` and `:(` emoticons
with a text equivalent.

[source,console]
----
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_mappings_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      }
    }
  }
}
----

The following <<indices-analyze,analyze API>> request uses the custom
`my_mappings_char_filter` to replace `:(` with `_sad_` in
the text `I'm delighted about it :(`.

[source,console]
----
GET /my_index/_analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "my_mappings_char_filter" ],
  "text": "I'm delighted about it :("
}
----
// TEST[continued]

The filter produces the following text:

[source,text]
----
[ I'm delighted about it _sad_ ]
----

////
[source,console-result]
----
{
  "tokens": [
    {
      "token": "I'm delighted about it _sad_",
      "start_offset": 0,
      "end_offset": 25,
      "type": "word",
      "position": 0
    }
  ]
}
----
////