[Meta] Better handling of single-valued fields #80825
Open
Labels
- :Search Foundations/Mapping (Index mappings, including merging and defining field types)
- >enhancement
- Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)
Description
Background
For a long time Elasticsearch has been very permissive about JSON documents and has made no distinction between single values and arrays of values. This permissive approach has several downsides:
- Client code and scripts are made more complex. To be robust, code must be written to handle both single-valued fields and arrays of values.
- Kibana does some strange things, e.g. Kibana will happily try to "AND" multiple values from a bar chart/pie chart, which never makes sense for values taken from a single-valued field. This produces no matches because no document can be `OS:ios` and `OS:android` simultaneously.
- Administrators cannot easily "lock down" the mapping. Custom ingest scripts are required to prevent multi-valued documents being added (and ingest scripts can still be circumvented by clients sending documents?).
All of the above is unfortunate because the majority of fields in common use are single-valued. A weblog's fields are a good example (timestamp, IP, OS, user agent, URL, referrer, country, etc. are all single values).
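As an illustration of the first downside, here is a minimal sketch of the defensive normalisation that client code has to carry today. `FieldValues` and `asList` are hypothetical names invented for this example and are not part of any Elasticsearch client API; the sketch assumes the document source has already been parsed into plain Java objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not an Elasticsearch API.
final class FieldValues {
    private FieldValues() {}

    // Normalises a parsed field value, which may be absent, a scalar,
    // or an array, into a list so callers only handle one shape.
    static List<Object> asList(Object value) {
        if (value == null) {
            return Collections.emptyList();
        }
        if (value instanceof List) {
            // Copy so later changes to the source do not leak through.
            return new ArrayList<>((List<?>) value);
        }
        return Collections.singletonList(value);
    }
}
```

Every consumer of a field needs something like this wrapper (or an equivalent branch) as long as the mapping cannot promise a single value.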
Proposed changes
The solution is a two-pronged approach:
Enforcement: for new indices we can give administrators the option of rejecting documents with multiple values.
Reporting: for both new and old indices we can report whether the index contains only documents with single values.
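The reporting prong boils down to a yes/no answer per field. The sketch below shows that check over documents held in memory; this is an assumption made purely for illustration, since a real implementation would presumably consult index structures rather than re-read document sources. `SingleValuedReport` and `isSingleValued` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the "reporting" check; not Elasticsearch code.
final class SingleValuedReport {
    private SingleValuedReport() {}

    // Returns true if no document holds more than one value for the field.
    // Documents that lack the field do not affect the result.
    static boolean isSingleValued(String field, List<Map<String, Object>> docs) {
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(field);
            if (value instanceof List && ((List<?>) value).size() > 1) {
                return false;
            }
        }
        return true;
    }
}
```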
- Add an `is_single_valued` flag to field caps output which indicates if all documents have single values for a field (Field caps api - report back if fields are single-valued or not. #80730)
- Add a `boolean allowsMultipleValues()` method to FieldMapper and remove existing validation code in single-valued fields that is slow. The DocumentParser class should instead assume responsibility for checking single-valued fields don't receive multiple values
- Add an `allow_multiple_values` flag to field mappings that can reject documents presenting arrays (New field mapping flag - allow_multiple_values #80289)
- Remove existing code from always-singular fields like AggregateDoubleMetricFieldMapper that checks for arrays. This logic is sometimes slow and these classes can instead override FieldMapper.allowMultipleValues() to declare false and let DocumentParser do all the array detection/rejection.
- Optimise performance of the field-caps reporting to avoid looking at index contents when the `allow_multiple_values` field mapping is set and we know this is enforced at ingest time
- Consider enhancing the storage types used for fields where `allow_multiple_values` is set to false (using NumericDocValuesField instead of SortedNumericDocValuesField and SortedDocValuesField instead of SortedSetDocValuesField)
- Change ECS to support single-valued fields (RFC opened https://github.com/elastic/ecs/blob/main/rfcs/text/0029-enforce-single-value-fields.md )
- Any Kibana-related changes to make use of the `is_single_valued` feedback in field-caps (e.g. not ANDing values from this field in filter pills). Mention of related progress here
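The division of labour proposed above, where each mapper only declares whether it allows multiple values and the parser does all array detection, could look roughly like the following sketch. `SketchFieldMapper` and `SketchDocumentParser` are deliberately simplified, hypothetical stand-ins and do not reflect the real FieldMapper or DocumentParser classes. The sketch also assumes that only arrays with more than one element are rejected; whether a one-element array should be rejected as well is not settled by the text above.

```java
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-in for a field mapper.
class SketchFieldMapper {
    private final String name;
    private final boolean allowMultipleValues;

    SketchFieldMapper(String name, boolean allowMultipleValues) {
        this.name = name;
        this.allowMultipleValues = allowMultipleValues;
    }

    String name() {
        return name;
    }

    // Mirrors the proposed method: always-singular mappers would return false.
    boolean allowsMultipleValues() {
        return allowMultipleValues;
    }
}

// Simplified, hypothetical stand-in for the parser that centralises the check.
class SketchDocumentParser {
    // Throws if a field whose mapper disallows multiple values receives
    // an array with more than one element (an assumption of this sketch).
    static void validate(Map<String, Object> source, Map<String, SketchFieldMapper> mappers) {
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            SketchFieldMapper mapper = mappers.get(entry.getKey());
            if (mapper == null || mapper.allowsMultipleValues()) {
                continue; // unmapped or multi-valued fields are not checked here
            }
            Object value = entry.getValue();
            if (value instanceof List && ((List<?>) value).size() > 1) {
                throw new IllegalArgumentException(
                    "field [" + mapper.name() + "] does not allow multiple values");
            }
        }
    }
}
```

With the check in one place, individual mappers no longer need their own array-detection code, which is the simplification the list above is aiming for.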