[Meta] Better handling of single-valued fields #80825

@markharwood

Background

For a long time Elasticsearch has been very permissive about JSON documents, making no distinction between single values and arrays of values. This permissive approach has several downsides:

  1. Client code and scripts are made more complex. To be robust, code must be written to handle both single values and arrays of values for every field.
  2. Kibana does some strange things, e.g. it will happily try to AND multiple values selected from a bar chart or pie chart, which never makes sense for values taken from a single-valued field. This produces no matches because no document can be OS:ios and OS:android simultaneously.
  3. Administrators cannot easily "lock down" the mapping. Custom ingest scripts are required to prevent multi-valued documents being added (and ingest scripts can still be circumvented by clients sending documents?).
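
To illustrate the first point, here is a minimal sketch of the defensive normalisation that client code typically ends up carrying. The helper name `as_list` and the sample documents are made up for illustration; they are not part of any Elasticsearch client.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalise a field value read from _source into a list.

    Because a field may hold a single value, an array, or be absent,
    callers cannot assume any one shape.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hypothetical documents: the same field arrives in three different shapes.
docs = [
    {"os": "ios"},
    {"os": ["ios", "android"]},
    {},
]

seen = [as_list(doc.get("os")) for doc in docs]
```

If a field were known to be single-valued, the wrapper would be unnecessary and `doc.get("os")` could be used directly.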

All of the above is unfortunate because the majority of fields in common use are single-valued. A weblog's fields are a good example: timestamp, IP, OS, user agent, URL, referrer, country, etc. are all single values.

Proposed changes

The solution is a two-pronged approach:
Enforcement: for new indices, we can give administrators the option of rejecting documents with multiple values.
Reporting: for both new and old indices, we can report whether the index contains only documents with single values.
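
As a rough illustration of the two prongs, the following is a toy model in Python, not Elasticsearch code. It assumes flat documents, assumes that `allow_multiple_values` defaults to true when absent (preserving today's permissive behaviour), and assumes that a one-element array still counts as a single value; none of these details are settled by this issue. The function and exception names are hypothetical.

```python
from typing import Any, Dict, Iterable

Mapping = Dict[str, Dict[str, Any]]


class MultipleValuesError(ValueError):
    """Raised when a field that must be single-valued receives several values."""


def count_values(value: Any) -> int:
    """Number of values a flat field carries (0 if absent)."""
    if value is None:
        return 0
    return len(value) if isinstance(value, list) else 1


def enforce(doc: Dict[str, Any], mapping: Mapping) -> None:
    """Enforcement: reject the document if a locked-down field has several values."""
    for field, value in doc.items():
        # Assumption: a missing flag means multiple values are allowed.
        allowed = mapping.get(field, {}).get("allow_multiple_values", True)
        if not allowed and count_values(value) > 1:
            raise MultipleValuesError(
                f"field [{field}] does not allow multiple values"
            )


def is_single_valued(docs: Iterable[Dict[str, Any]], field: str) -> bool:
    """Reporting: True if no document holds more than one value for the field."""
    return all(count_values(doc.get(field)) <= 1 for doc in docs)
```

Enforcement only helps indices created with the flag, while reporting also works for existing indices whose mapping says nothing about arrays, which is why both are proposed.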

  • Add an is_single_valued flag to field caps output that indicates whether all documents have single values for a field (Field caps api - report back if fields are single-valued or not. #80730)
  • Add a boolean allowsMultipleValues() method to FieldMapper and remove the existing, slow validation code in single-valued fields. The DocumentParser class should instead assume responsibility for checking that single-valued fields don't receive multiple values
  • Add an allow_multiple_values flag to field mappings that can reject documents presenting arrays (New field mapping flag - allow_multiple_values #80289)
  • Remove existing code that checks for arrays from always-singular fields like AggregateDoubleMetricFieldMapper. This logic is sometimes slow; these classes can instead override FieldMapper.allowMultipleValues() to return false and let DocumentParser do all the array detection/rejection.
  • Optimise performance of the field-caps reporting to avoid looking at index contents when the allow_multiple_values field mapping is set to false, since we then know single values are enforced at ingest time
  • Consider enhancing the storage types used for fields where allow_multiple_values is set to false (using NumericDocValuesField instead of SortedNumericDocValuesField and SortedDocValuesField instead of SortedSetDocValuesField)
  • Change ECS to support single-valued fields (RFC opened https://github.com/elastic/ecs/blob/main/rfcs/text/0029-enforce-single-value-fields.md )
  • Any Kibana-related changes to make use of the is_single_valued feedback in field-caps (e.g. not ANDing values from this field in filter pills). Mention of related progress here
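
The field-caps optimisation in the list above could look roughly like the sketch below. It is a simplified Python model rather than the actual implementation: `report_is_single_valued` and the `scan_index` callback are hypothetical names, and the mapping is represented as a plain dictionary.

```python
from typing import Any, Callable, Dict

Mapping = Dict[str, Dict[str, Any]]


def report_is_single_valued(
    mapping: Mapping,
    field: str,
    scan_index: Callable[[str], bool],
) -> bool:
    """Answer is_single_valued for a field, scanning the index only if needed.

    scan_index stands in for the (potentially expensive) inspection of
    index contents; it returns True if every document has a single value.
    """
    flag = mapping.get(field, {}).get("allow_multiple_values")
    if flag is False:
        # Arrays are rejected at ingest time, so the answer is known
        # from the mapping alone and the index need not be inspected.
        return True
    return scan_index(field)
```

The design point is that the mapping acts as a guarantee: once enforcement is on, reporting becomes a metadata lookup instead of a scan.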
