Better storage of _source #9034

@jpountz

Today we store the _source as a single big binary stored field. While this is great for simplicity, it has the bad side effect of encouraging users to store fields individually in order to save some JSON parsing when a document has a couple of large field values and we are only interested in some short ones. Maybe we could be a bit smarter and store the _source across several stored fields so that this would no longer be an issue?

Random idea: given a document that looks like:

{
  "title": "short_string",
  "body": "very_very_very_very_long_string",
  "array": [2, 3, 10],
  "foo": {
    "foo": 42,
    "bar": "baz"
  }
}

we could, for instance, store each top-level field in its own stored field:

| Field | Value |
| --- | --- |
| title | "short_string" |
| body | "very_very_very_very_long_string" |
| array | [2, 3, 10] |
| foo | {"foo": 42, "bar": "baz"} |
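To make the top-level layout concrete, here is a minimal sketch in Python rather than in Elasticsearch's actual Java code. The helper names `split_top_level` and `load_fields` are hypothetical, and a plain dict stands in for Lucene stored fields. The point it illustrates is that reading `title` never has to parse the long `body` value.

```python
import json
from typing import Dict, Iterable


def split_top_level(source: dict) -> Dict[str, bytes]:
    """Serialize each top-level field separately (hypothetical helper).

    Each entry stands in for one stored field; nested objects and arrays
    are kept as a single JSON blob under their top-level name.
    """
    return {
        name: json.dumps(value, separators=(",", ":")).encode("utf-8")
        for name, value in source.items()
    }


def load_fields(stored: Dict[str, bytes], wanted: Iterable[str]) -> dict:
    """Parse only the requested top-level fields, skipping all others."""
    return {name: json.loads(stored[name]) for name in wanted if name in stored}


doc = {
    "title": "short_string",
    "body": "very_very_very_very_long_string",
    "array": [2, 3, 10],
    "foo": {"foo": 42, "bar": "baz"},
}

stored = split_top_level(doc)
partial = load_fields(stored, ["title"])   # "body" is never decoded
full = load_fields(stored, doc.keys())     # rebuilds the whole _source
```

Rebuilding the full document from the pieces preserves top-level key order here only because Python dicts keep insertion order; a real implementation would need its own guarantee if the original byte layout of _source matters.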

or maybe even each value individually (but it becomes more complicated with arrays of objects):

| Field | Value |
| --- | --- |
| title | "short_string" |
| body | "very_very_very_very_long_string" |
| array | [2, 3, 10] |
| foo.foo | 42 |
| foo.bar | "baz" |
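The per-value layout could be sketched as below, again in Python with a hypothetical helper name (`flatten_leaves`). It assumes field names do not themselves contain dots, and it keeps arrays whole, which sidesteps rather than solves the arrays-of-objects complication mentioned above.

```python
from typing import Any, Dict


def flatten_leaves(obj: dict, prefix: str = "") -> Dict[str, Any]:
    """Map each leaf value to its dotted path (hypothetical helper).

    Non-empty objects are descended into; arrays, scalars and empty
    objects are treated as leaves and stored as a single value.
    """
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict) and value:
            # Recurse into nested objects, extending the dotted path.
            flat.update(flatten_leaves(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat


doc = {
    "title": "short_string",
    "body": "very_very_very_very_long_string",
    "array": [2, 3, 10],
    "foo": {"foo": 42, "bar": "baz"},
}

flat = flatten_leaves(doc)
```

With this layout an array such as `[{"a": 1}, {"a": 2}]` would still be stored as one value, so a filter on `array.a` would have to parse the whole array.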

Then we would have to make _source filtering aware of the way fields are stored. For instance, if we store only top-level fields in their own stored fields, we could translate an include rule like foo.* to "retrieve field foo", and foo.bar.* to "get everything under bar for field foo".
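The translation of include rules could look roughly like this for the top-level layout. This is only a sketch: `plan_include` is a hypothetical name, not an existing Elasticsearch API, and it assumes the first path segment is a literal field name (patterns such as `*.bar` or `f*` would need to be expanded against the list of stored fields first).

```python
from typing import Optional, Tuple


def plan_include(pattern: str) -> Tuple[str, Optional[str]]:
    """Split an include rule into (stored field, remaining filter).

    A remaining filter of None means the stored field can be returned
    as-is; otherwise the field must be parsed and filtered further.
    Assumes the first segment contains no wildcard (hypothetical rule).
    """
    top, _, rest = pattern.partition(".")
    if rest in ("", "*"):
        # "foo" or "foo.*": the whole top-level field is wanted.
        return top, None
    # "foo.bar.*": load "foo", then keep only what matches "bar.*".
    return top, rest


plans = [plan_include(p) for p in ("title", "foo.*", "foo.bar.*")]
```

Only rules that return a non-None remainder require JSON parsing of the stored value, and even then only of the one field they name.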
