Remove support for 'external values' in document parsing? #56063

@jtibshirani

For fields whose value can be a JSON array or object, such as geopoints, the standard way of supporting multi-fields doesn't work. When parsing a document, the parent mapper consumes the whole complex value, so by the time the parser is passed on to its subfields, they can no longer parse the source value. To support this case, parent fields can pass a parsed object through ParseContext#externalValue. Subfields can then use this parsed object instead of parsing the source value.
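
To make the mechanism concrete, here is a minimal toy model of it. This is a sketch, not the real Elasticsearch code: `Context`, `parsePoint`, and `parseKeyword` are hypothetical stand-ins for ParseContext, a parent mapper, and a subfield mapper, and a plain `lat,lon` string stands in for whatever parsed object the parent would actually pass.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class ExternalValueSketch {
    /** Toy stand-in for ParseContext: a token stream plus an optional external value. */
    static class Context {
        final Deque<String> tokens;
        Object externalValue; // null means "not set"

        Context(List<String> tokens) {
            this.tokens = new ArrayDeque<>(tokens);
        }
    }

    /** Subfield: prefers the external value, otherwise reads from the token stream. */
    static String parseKeyword(Context ctx) {
        if (ctx.externalValue != null) {
            return ctx.externalValue.toString();
        }
        return ctx.tokens.poll(); // null once the parent has consumed everything
    }

    /** Parent: consumes the whole [lat, lon] value, then delegates to its subfield. */
    static String parsePoint(Context ctx, boolean passExternalValue) {
        String lat = ctx.tokens.poll();
        String lon = ctx.tokens.poll();
        if (passExternalValue) {
            ctx.externalValue = lat + "," + lon;
        }
        return parseKeyword(ctx);
    }

    public static void main(String[] args) {
        // Without an external value the subfield finds nothing left to parse.
        System.out.println(parsePoint(new Context(List.of("41.12", "-71.34")), false)); // null
        // With an external value the subfield sees what the parent parsed.
        System.out.println(parsePoint(new Context(List.of("41.12", "-71.34")), true)); // 41.12,-71.34
    }
}
```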

This support for 'external values' has a few downsides:

  • It makes it harder to understand what data ended up getting indexed for a field, since a parent field is allowed to supply literally anything as an 'external value'. This in turn makes it hard to consult the _source to find field values, as we hope to do for the search 'fields' option (design + implementation, #55363).
  • In FieldMapper#parseCreateField, each field mapper must check whether an external value is set, regardless of whether it makes sense to pass custom values to the mapper. Making sure to check for an external value is a silent contract and is easy to forget.
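
Both downsides can be seen in a small sketch. The names below (`Context`, `carefulMapper`, `forgetfulMapper`) are hypothetical and only model the pattern; the `<geohash>` string is a placeholder, not a real computed value.

```java
public class SilentContractSketch {
    /** Toy stand-in for ParseContext (hypothetical, not the real class). */
    static class Context {
        final String sourceValue;   // what the document's _source holds
        final Object externalValue; // what a parent mapper chose to pass, or null

        Context(String sourceValue, Object externalValue) {
            this.sourceValue = sourceValue;
            this.externalValue = externalValue;
        }
    }

    /** Honors the contract: uses the external value when one is set. */
    static String carefulMapper(Context ctx) {
        return ctx.externalValue != null ? ctx.externalValue.toString() : ctx.sourceValue;
    }

    /** Forgets the check: compiles and runs, but ignores the parent's value. */
    static String forgetfulMapper(Context ctx) {
        return ctx.sourceValue;
    }

    public static void main(String[] args) {
        Context ctx = new Context("[41.12, -71.34]", "<geohash>");
        // The indexed value no longer matches what _source contains.
        System.out.println(carefulMapper(ctx));   // <geohash>
        // Nothing fails here; the mapper just indexes something different.
        System.out.println(forgetfulMapper(ctx)); // [41.12, -71.34]
    }
}
```

Nothing in the type system distinguishes the two mappers, which is what makes the contract silent.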

I wanted to raise the idea of removing support for external values. Within our own code, here are the mappers that would be affected:

  • ParentJoinFieldMapper passes the IDs to its internal ParentIdFieldMapper objects. This could be replaced by a custom method on ParentIdFieldMapper.
  • PercolatorFieldMapper passes the query builder to an internal BinaryFieldMapper. This could also be replaced by a custom method.
  • CompletionFieldMapper passes a parsed completion object to its sub-fields (Completion types with multi-fields support #34081). I wonder how important it is to support multi-fields for completion mappings, or if we could just drop support.
  • GeoPointFieldMapper passes a geohash to its sub-fields. This allows it to have multi-fields that are geopoints or even keywords. I also wonder about the use cases + importance of this functionality.
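
As a rough illustration of the "custom method" replacement suggested for the first two mappers, the owning mapper could call a dedicated, typed method on its internal mapper instead of going through the shared context. This is only a sketch under assumptions: `ParentIdMapper`, `JoinMapper`, `indexValue`, and the field name `join#question` are hypothetical and do not reflect the real classes' signatures.

```java
import java.util.ArrayList;
import java.util.List;

public class CustomMethodSketch {
    /** Hypothetical stand-in for an internal mapper such as ParentIdFieldMapper. */
    static class ParentIdMapper {
        final String fieldName;

        ParentIdMapper(String fieldName) {
            this.fieldName = fieldName;
        }

        /** Explicit, typed entry point: there is no shared context slot to check. */
        void indexValue(String id, List<String> indexedFields) {
            indexedFields.add(fieldName + "=" + id);
        }
    }

    /** Hypothetical stand-in for the owning mapper such as ParentJoinFieldMapper. */
    static class JoinMapper {
        final ParentIdMapper parentIdMapper = new ParentIdMapper("join#question");

        List<String> parse(String parentId) {
            List<String> indexedFields = new ArrayList<>();
            // The ID is handed over directly rather than via an external value.
            parentIdMapper.indexValue(parentId, indexedFields);
            return indexedFields;
        }
    }

    public static void main(String[] args) {
        System.out.println(new JoinMapper().parse("1")); // [join#question=1]
    }
}
```

With this shape, the value an internal mapper receives is visible in its method signature, so no mapper has to remember to look for an external value.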

It looks like external values were also used in the attachment mapper, but this mapper was removed and migrated to an ingest processor.

Here is a proposed process for removing this support
