
Make truncation of keyword field values easier #60329

@cbuescher

Description


Currently it is difficult for users who are not fully in control of the data they ingest into a keyword field to truncate those values (see #57984).
Lucene enforces a maximum term length of 32766 bytes; exceeding it causes the indexed document to be rejected, so a user reading e.g. from a database with uncontrolled values needs some way to prevent this.

Here are some things that don't immediately work:

  • the 'length' and 'truncate' token filters aren't currently allowed in keyword normalizers
  • the keyword field's 'ignore_above' option prevents the document from being rejected, but it also drops those values entirely, even in cases where it would be fine to index the truncated versions and e.g. sort on them
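For reference, this is roughly what the 'ignore_above' workaround looks like in a mapping (index and field names here are illustrative):

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
```

Values longer than the limit are kept in _source but are neither indexed nor stored in doc values, so they cannot be searched, aggregated, or sorted on at all.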

Using a 'script' ingest processor for truncation seems viable, but it is not the easiest option.
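A minimal sketch of that workaround, assuming a field called 'message' (the pipeline and field names are illustrative). Note that the Lucene limit is in UTF-8 bytes, so a character-based substring is only a rough cut for non-ASCII data:

```json
PUT _ingest/pipeline/truncate-keyword
{
  "processors": [
    {
      "script": {
        "description": "Cap 'message' below Lucene's term length limit",
        "source": "if (ctx.message != null && ctx.message.length() > 32766) { ctx.message = ctx.message.substring(0, 32766); }"
      }
    }
  ]
}
```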

I'm opening this issue to discuss the following options:

  • should we allow at least the 'truncate' token filter in normalizers?
  • should we add a keyword field option that safely truncates input values?
  • maybe this would also be a reason to introduce a dedicated 'truncate' ingest processor that's easier to use than the script?
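For context on the first option: the 'truncate' token filter already works in regular analyzers, e.g. combined with the keyword tokenizer (all names below are illustrative); it is only in normalizers that it is currently rejected:

```json
PUT my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "cap_length": {
          "type": "truncate",
          "length": 100
        }
      },
      "analyzer": {
        "capped_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "cap_length"]
        }
      }
    }
  }
}
```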
