Closed
Labels
- :Distributed/Ingest Node (Execution or management of Ingest Pipelines)
- :Search Foundations/Mapping (Index mappings, including merging and defining field types)
- :Search Relevance/Analysis (How text is split into tokens)
- >enhancement
- Team:Data Management (obsolete) (DO NOT USE. This team no longer exists.)
- Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)
- Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
- team-discuss
Description
Currently it seems difficult for users who are not fully in control of the data they ingest into a keyword field to truncate those values (see #57984).
Lucene enforces a maximum term length of 32766 bytes. A document containing a longer term is rejected at index time, so a user reading, for example, from a database whose values they do not control needs some way to prevent this.
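To make the constraint concrete: the limit applies to the UTF-8 encoded term, so a client that wants to stay under it has to count bytes rather than characters. Below is a minimal client-side sketch in Python, assuming truncation happens before the document is sent. The function name `truncate_utf8` is hypothetical and not part of Elasticsearch or any client library.

```python
# Lucene's maximum term length, counted in UTF-8 bytes.
MAX_TERM_BYTES = 32766


def truncate_utf8(value: str, max_bytes: int = MAX_TERM_BYTES) -> str:
    """Return `value` cut down to at most `max_bytes` UTF-8 bytes.

    Hypothetical helper for illustration only. It never splits a
    multi-byte character: a partial sequence at the cut is dropped.
    """
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # The input was valid UTF-8, so only the tail of the slice can be an
    # incomplete sequence; errors="ignore" discards exactly that tail.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    long_value = "x" * 40000
    print(len(truncate_utf8(long_value).encode("utf-8")))
```

Because the cut is made on bytes, the number of characters that survive depends on the content: ASCII text keeps 32766 characters, while text made of three-byte characters keeps only 10922.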
Here are some things that don't immediately work:
- using 'length' or 'truncate' token filters isn't currently allowed in keyword normalizers
- using the keyword field's 'ignore_above' option prevents the document from being rejected, but it also ignores those values completely, even when it would be okay to save the truncated versions and e.g. sort on them
Using a 'script' ingest processor for truncation seems viable, but it is not the easiest option.
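For reference, the 'script' processor approach could look roughly like the sketch below, which builds an ingest pipeline body as a Python dict with an embedded Painless script. This is only an illustration under assumptions: the helper name `build_truncate_pipeline`, the field name `my_keyword`, and the `max_chars` value are hypothetical, the field is assumed to be a top-level string, and the script counts UTF-16 code units rather than bytes, so the limit is chosen conservatively (32766 / 3 = 10922).

```python
import json

# Painless source for the script processor. It truncates a top-level
# string field to at most params.max_chars UTF-16 code units.
PAINLESS_TRUNCATE = """
def v = ctx[params.field];
if (v instanceof String && v.length() > params.max_chars) {
  ctx[params.field] = v.substring(0, params.max_chars);
}
""".strip()


def build_truncate_pipeline(field: str, max_chars: int = 10922) -> dict:
    """Build an ingest pipeline body that truncates `field`.

    Hypothetical helper: the default of 10922 assumes at most 3 UTF-8
    bytes per UTF-16 code unit, which keeps the term under 32766 bytes.
    """
    return {
        "description": f"Truncate '{field}' to {max_chars} characters",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": PAINLESS_TRUNCATE,
                    "params": {"field": field, "max_chars": max_chars},
                }
            }
        ],
    }


if __name__ == "__main__":
    # The printed JSON is the body one would send when creating the pipeline.
    print(json.dumps(build_truncate_pipeline("my_keyword"), indent=2))
```

Passing the field name and limit through `params` keeps the script source identical across fields. The amount of setup needed for what is conceptually a single option illustrates why this route is not the easiest one.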
I'm opening this issue to discuss the following options:
- should we allow at least the 'truncate' token filter in normalizers?
- should we add a keyword field option that safely truncates input values?
- maybe this would also be a reason to introduce a dedicated 'truncate' ingest processor that's easier to use than the script?