Support analyzer for keyword type #18064

@dadoonet

Sometimes you want to analyze text to make it consistent when running aggregations on top of it.

For example, let's say I have a city field mapped as a keyword.

This field can contain San Francisco, SAN FRANCISCO, San francisco...

If I build a terms aggregation on top of it, I will end up with:

San Francisco: 1
SAN FRANCISCO: 1
San francisco: 1

I'd like to be able to analyze this text before it gets indexed. Of course, I could use a text field instead and set fielddata: true, but that would not create doc values for this field.

I imagine we could allow an index-time analyzer for this field.

We could restrict its usage if we wish and only allow analyzers built on tokenizers such as lowercase, keyword, or path, but I would let the user decide.

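If we allow setting an analyzer that lowercases the whole value as a single token (a keyword tokenizer with a lowercase filter, for example; analyzer: simple would also split the value on the space), my aggregation will become: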

san francisco: 3
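
To make the expected effect concrete, here is a minimal sketch in plain Python that imitates a terms aggregation over the three city values, first on the raw strings and then after lowercasing each whole value. It does not call Elasticsearch. The helper name terms_aggregation and its normalize parameter are made up for illustration and are not part of any Elasticsearch API.

```python
from collections import Counter


def terms_aggregation(values, normalize=None):
    """Count distinct terms, optionally normalizing each value first.

    Hypothetical helper: it only imitates the bucket counts a terms
    aggregation would return for a single-valued field.
    """
    if normalize is None:
        return Counter(values)
    return Counter(normalize(value) for value in values)


cities = ["San Francisco", "SAN FRANCISCO", "San francisco"]

# Raw keyword values: every spelling gets its own bucket.
raw_buckets = terms_aggregation(cities)

# Lowercasing the whole value as a single token merges the buckets.
lowercased_buckets = terms_aggregation(cities, normalize=str.lower)

for term, count in lowercased_buckets.items():
    print(f"{term}: {count}")
```

The important detail is that the value stays a single token: an analyzer that also splits on whitespace would produce separate san and francisco buckets instead.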

The same applies to the path tokenizer.

Let's say I'm building a directory tree like:

/tmp/dir1/file1.txt
/tmp/dir1/file2.txt
/tmp/dir2/file3.txt
/tmp/dir2/file4.txt

Applying a path tokenizer would help me to generate an aggregation like:

/tmp/dir1: 2
/tmp/dir2: 2
/tmp: 4
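
The directory counts above can be reproduced with a small plain-Python sketch that emits every ancestor prefix of a path, roughly the way a path-style tokenizer would, and then counts the prefixes. The function name path_prefixes is hypothetical, and the sketch assumes absolute paths with / as the delimiter. Note that under this assumption each full file path also becomes its own bucket with a count of 1, which the listing above omits.

```python
from collections import Counter


def path_prefixes(path, delimiter="/"):
    """Return every ancestor prefix of an absolute path, shortest first.

    Hypothetical helper that approximates what a path-style tokenizer
    would emit for one value.
    """
    parts = [part for part in path.split(delimiter) if part]
    return [
        delimiter + delimiter.join(parts[:depth])
        for depth in range(1, len(parts) + 1)
    ]


paths = [
    "/tmp/dir1/file1.txt",
    "/tmp/dir1/file2.txt",
    "/tmp/dir2/file3.txt",
    "/tmp/dir2/file4.txt",
]

# One bucket per emitted prefix, counted across all documents.
buckets = Counter(prefix for path in paths for prefix in path_prefixes(path))

for prefix in ("/tmp/dir1", "/tmp/dir2", "/tmp"):
    print(f"{prefix}: {buckets[prefix]}")
```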
