
Make truncation of keyword field values easier #60329

@cbuescher

Description


Currently it is difficult for users who are not fully in control of the data they ingest into a keyword field to truncate those values (see #57984).
Lucene enforces a maximum term length of 32766 bytes; exceeding it causes the indexed document to be rejected, so a user reading e.g. from a database with uncontrolled values needs some way to prevent this.

Here are some things that don't immediately work:

  • the 'length' and 'truncate' token filters aren't currently allowed in keyword normalizers
  • the keyword field's 'ignore_above' option prevents the document from being rejected, but it also drops those values entirely, even in cases where it would be fine to index the truncated versions and e.g. sort on them
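For reference, this is roughly what the 'ignore_above' workaround looks like in a mapping (index and field names here are illustrative):

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
```

Values longer than the limit are kept in _source but are neither indexed nor stored in doc values, so they cannot be searched, aggregated, or sorted on at all.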

Using a 'script' ingest processor for truncation seems viable, but it is not the easiest option.
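A minimal sketch of that workaround, assuming a field called 'message' (the pipeline and field names are illustrative). Note that the Lucene limit is in UTF-8 bytes, so a character-based substring is only a rough cut for non-ASCII data:

```json
PUT _ingest/pipeline/truncate-keyword
{
  "processors": [
    {
      "script": {
        "description": "Cap 'message' below Lucene's term length limit",
        "source": "if (ctx.message != null && ctx.message.length() > 32766) { ctx.message = ctx.message.substring(0, 32766); }"
      }
    }
  ]
}
```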

I'm opening this issue to discuss the following options:

  • should we allow at least the 'truncate' token filter in normalizers?
  • should we add a keyword field option that safely truncates input values?
  • maybe this would also be a reason to introduce a dedicated 'truncate' ingest processor that's easier to use than the script?
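For context on the first option: the 'truncate' token filter already works in regular analyzers, e.g. combined with the keyword tokenizer (all names below are illustrative); it is only in normalizers that it is currently rejected:

```json
PUT my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "cap_length": {
          "type": "truncate",
          "length": 100
        }
      },
      "analyzer": {
        "capped_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "cap_length"]
        }
      }
    }
  }
}
```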
