Support analyzer for keyword type

Sometimes you want to analyze text to make it consistent when running aggregations on top of it.

For example, let's say I have a `city` field mapped as a `keyword`.

This field can contain `San Francisco`, `SAN FRANCISCO`, `San francisco`...

If I build a terms aggregation on top of it, I will end up with:

```
San Francisco: 1
SAN FRANCISCO: 1
San francisco: 1
```

I'd like to be able to analyze this text before it gets indexed. Of course, I could use a `text` field instead and set `fielddata: true`, but that would not create doc values for this field.

I imagine we could allow an analyzer at index time for this field.

We could restrict its usage if we wish and only allow analyzers built on tokenizers like `lowercase`, `keyword`, or `path_hierarchy`, but I would let the user decide.

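If we allow setting an analyzer that lowercases the value without splitting it (for example a `keyword` tokenizer with a `lowercase` filter; note that `analyzer: simple` would also split on the space), my aggregation will become: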

```
san francisco: 3
```
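To make the before/after concrete, here is a minimal sketch in plain Python. It does not call Elasticsearch; it only simulates a terms aggregation with a `Counter`, and it assumes the analyzer keeps the whole value as a single token and lowercases it. The helper name `terms_aggregation` is made up for this illustration.

```python
from collections import Counter

# The three spellings of the same city from the example above.
cities = ["San Francisco", "SAN FRANCISCO", "San francisco"]


def terms_aggregation(values, analyze=lambda value: value):
    """Count documents per term after applying `analyze` to each value.

    `analyze` stands in for index-time analysis (hypothetical helper,
    not an Elasticsearch API). The default leaves the value untouched,
    which is how a plain `keyword` field behaves.
    """
    return Counter(analyze(value) for value in values)


# No analysis: every spelling is its own bucket.
raw_counts = terms_aggregation(cities)

# Assumed analysis: keep the value as one token and lowercase it.
normalized_counts = terms_aggregation(cities, analyze=str.lower)

print(dict(raw_counts))
print(dict(normalized_counts))
```

The second call collapses the three buckets into one, which is the behaviour requested here for the `keyword` field.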

The same applies to the path tokenizer.

Let's say I'm building a directory tree like:

```
/tmp/dir1/file1.txt
/tmp/dir1/file2.txt
/tmp/dir2/file3.txt
/tmp/dir2/file4.txt
```

Applying a path tokenizer would help me to generate an aggregation like:

```
/tmp/dir1: 2
/tmp/dir2: 2
/tmp: 4
```
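The directory counts above can be reproduced with a small Python sketch. Again, this is only a simulation and not Elasticsearch code: `path_prefixes` is a hypothetical helper that assumes the tokenizer emits one term per path prefix, split on `/`. Under that assumption the full file paths are emitted as terms too, each with a count of 1, so the sketch filters them out to show only the directories.

```python
from collections import Counter

paths = [
    "/tmp/dir1/file1.txt",
    "/tmp/dir1/file2.txt",
    "/tmp/dir2/file3.txt",
    "/tmp/dir2/file4.txt",
]


def path_prefixes(path, delimiter="/"):
    """Return every prefix of an absolute path, shortest first.

    Hypothetical stand-in for a path tokenizer:
    "/tmp/dir1/file1.txt" -> ["/tmp", "/tmp/dir1", "/tmp/dir1/file1.txt"]
    """
    parts = [part for part in path.split(delimiter) if part]
    return [
        delimiter + delimiter.join(parts[:end])
        for end in range(1, len(parts) + 1)
    ]


# Each document contributes each of its prefixes once.
counts = Counter(prefix for path in paths for prefix in path_prefixes(path))

# Keep only the directory terms; the full file paths are also present in `counts`.
directory_counts = {term: n for term, n in counts.items() if term not in paths}

print(directory_counts)
```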


Support analyzer for keyword type #18064
