Synthetic source: load text from stored fields #87480
Conversation
Adds support for loading `text` fields that have `store: true`. We could likely load *any* stored fields, but I wanted to blaze the trail using something fairly useful.
Force push incoming to resolve merge conflicts.

cloud deploy robot, please build me an image

run elasticsearch-ci/part-2
```diff
  * Write values for this document.
  */
-void write(XContentBuilder b) throws IOException;
+void write(FieldsVisitor fieldsVisitor, XContentBuilder b) throws IOException;
```
I don't think I actually need fieldsVisitor here - I think advanceToDoc can grab it.
Yep. I wonder if we can avoid having it as a parameter in any of these methods and instead pass it to StoredFieldSourceLoader implementations directly? Having a method param that is only used by a specific subset of implementations feels off to me.
I could move it to the leaf method pretty easily. But it's kind of tricky because you have to advance the state in a specific way. And holding on to a reference to the thing for a while feels like it is more "at a distance". Like, we take a docId as a parameter, but we only use it if we're using doc values.
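The alternative being weighed here can be sketched roughly: instead of threading the `FieldsVisitor` through every `write` call, a per-leaf loader captures it once and `advanceToDoc` pulls the stored values from it. This is a hypothetical, simplified sketch — plain Java collections stand in for Lucene's stored-field machinery, and the class and method names are illustrative, not Elasticsearch's actual API:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the "visitor" (here a simple map of docId -> stored
// values) is captured when the per-leaf loader is built, so write() needs
// no visitor parameter. Names are illustrative, not Elasticsearch's.
interface SyntheticFieldLoader {
    void advanceToDoc(int docId);   // position on a document
    void write(StringBuilder b);    // emit values for the current document
}

class StoredFieldSourceLoader implements SyntheticFieldLoader {
    private final Map<Integer, List<String>> storedFields; // stand-in for FieldsVisitor
    private List<String> current = List.of();

    StoredFieldSourceLoader(Map<Integer, List<String>> storedFields) {
        this.storedFields = storedFields;
    }

    @Override
    public void advanceToDoc(int docId) {
        // The loader, not the caller, knows how to pull from the visitor.
        current = storedFields.getOrDefault(docId, List.of());
    }

    @Override
    public void write(StringBuilder b) {
        for (String value : current) {
            b.append(value).append(';');
        }
    }
}
```

The trade-off the comment describes is visible here: the visitor reference lives "at a distance" inside the loader, while `docId` is a parameter even though only some implementations would use it.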
@elasticmachine retest this please

@romseygeek I think this is ready for another round when you are ready for it!
romseygeek left a comment
I like the API! I left a few questions.
```java
        "field [" + name() + "] of type [" + typeName() + "] doesn't support synthetic source because it doesn't have doc values"
    );
}
if (fieldType().ignoreAbove() != Defaults.IGNORE_ABOVE) {
```
Does ignore_above not work if stored=true?
It doesn't store the field if it is above ignore_above.
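That behavior — values over `ignore_above` are neither indexed nor stored, so there is nothing left to rebuild synthetic `_source` from — can be illustrated with a toy filter. This is an assumption-laden sketch with a hypothetical helper name, not Elasticsearch's implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the ignore_above interaction described above: values
// longer than the limit are simply dropped at index time, so a later
// synthetic-source rebuild from stored fields cannot recover them.
// Hypothetical helper, not Elasticsearch code.
final class IgnoreAboveDemo {
    static List<String> storedValues(List<String> values, int ignoreAbove) {
        List<String> stored = new ArrayList<>();
        for (String v : values) {
            if (v.length() <= ignoreAbove) {
                stored.add(v); // longer values are silently ignored
            }
        }
        return stored;
    }
}
```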
```java
public abstract class SortedNumericDocValuesSyntheticFieldLoader implements SourceLoader.SyntheticFieldLoader {
    private final String name;
    private final String simpleName;
    private CheckedConsumer<XContentBuilder, IOException> writer = b -> {};
```
This reads a bit weirdly to me; does it make more sense to leave `write` as abstract and just overload it in the two implementations?
Those are per-segment writers. I'll see if I can make it less janky.
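The pattern in the quoted snippet — a mutable `writer` that starts as a no-op and is swapped in per segment — can be sketched like this. Names are hypothetical, and `Consumer<StringBuilder>` stands in for the real `CheckedConsumer<XContentBuilder, IOException>`:

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of the per-segment writer pattern: write() always delegates to
// whatever writer the most recently loaded segment installed, and is a
// harmless no-op before any segment has been seen. Illustrative only.
class PerSegmentWriterDemo {
    private Consumer<StringBuilder> writer = b -> {}; // no-op until a leaf is loaded

    void loadLeaf(List<String> segmentValues) {
        // Each segment installs its own writer over its own values.
        writer = b -> segmentValues.forEach(v -> b.append(v).append(','));
    }

    void write(StringBuilder b) {
        writer.accept(b);
    }
}
```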
server/src/main/java/org/elasticsearch/index/mapper/SourceLoader.java (outdated, resolved)
```diff
-    && kwd.hasNormalizer() == false
-    && kwd.fieldType().ignoreAbove() == KeywordFieldMapper.Defaults.IGNORE_ABOVE) {
+    if (kwd.hasNormalizer() == false
+        && kwd.fieldType().ignoreAbove() == KeywordFieldMapper.Defaults.IGNORE_ABOVE
```
Should this work with ignore_above=true and stored=true on the keyword subfield?
Same deal. We don't store the field if it is above ignore_above.
@romseygeek, I pushed a patch to remove the weird thing. I think it's more like what we want when we want to support

run elasticsearch-ci/bwc
romseygeek left a comment
I think there are more cleanups to do around stored field loading, but this is a great start. Thanks for all the iterations!
Woooh! Thanks for all the iterations too. I think we got something much nicer through them.

I'll work on adding some docs for this after I cover
When I added support for stored fields to synthetic _source (#87480) I accidentally caused a performance regression. Our friends working on building the nightly charts for tsdb caught it. It looked like:

```
| 50th percentile latency   | default_1k | 20.1228 | 41.289  | 21.1662  | ms | +105.18% |
| 90th percentile latency   | default_1k | 26.7402 | 42.5878 | 15.8476  | ms | +59.27%  |
| 99th percentile latency   | default_1k | 37.0881 | 45.586  | 8.49786  | ms | +22.91%  |
| 99.9th percentile latency | default_1k | 43.7346 | 48.222  | 4.48742  | ms | +10.26%  |
| 100th percentile latency  | default_1k | 46.057  | 56.8676 | 10.8106  | ms | +23.47%  |
```

This fixes the regression and puts us in line with how we were:

```
| 50th percentile latency   | default_1k | 20.1228 | 24.023  | 3.90022  | ms | +19.38%  |
| 90th percentile latency   | default_1k | 26.7402 | 29.7841 | 3.04392  | ms | +11.38%  |
| 99th percentile latency   | default_1k | 37.0881 | 36.8038 | -0.28428 | ms | -0.77%   |
| 99.9th percentile latency | default_1k | 43.7346 | 39.0192 | -4.71531 | ms | -10.78%  |
| 100th percentile latency  | default_1k | 46.057  | 42.9181 | -3.13889 | ms | -6.82%   |
```

A 20% bump in the 50th percentile latency isn't great, but it's about four microseconds per document, which is acceptable.
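The closing arithmetic can be sanity-checked: the 50th percentile moved from roughly 20.12 ms to 24.02 ms, and assuming the `default_1k` operation indexes 1,000 documents per request (an assumption inferred from the track name, not confirmed by the source), that works out to about 3.9 µs of extra latency per document:

```java
// Sanity check of the "about four microseconds per document" claim.
// Assumes default_1k means 1,000 documents per bulk request (inferred
// from the track name, not confirmed by the source).
class LatencyMath {
    static double extraMicrosPerDoc(double beforeMs, double afterMs, int docsPerRequest) {
        double extraMs = afterMs - beforeMs;      // extra ms per request
        return extraMs / docsPerRequest * 1000.0; // convert to microseconds per doc
    }
}
```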
Adds support for loading `text` and `keyword` fields that have `store: true`. We could likely load any stored fields, but I wanted to blaze the trail using something fairly useful.