Synthetic _source: support ignore_above by nik9000 · Pull Request #89466 · elastic/elasticsearch

nik9000 · 2022-08-18T14:38:57Z

This allows you to use ignore_above with keyword fields in synthetic
source. Ignored values are stored in a "backup" stored field and added
to the end of the list of results. This makes ignore_above work pretty
much the same way as it does when you don't have synthetic source. The
only difference is the order of the results. But synthetic source
changes the order of results anyway. That should be fine.
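
A rough sketch of the ordering described above, in plain Java with no Elasticsearch or Lucene classes. It assumes the values within the limit come back sorted and de-duplicated (as keyword doc values do) and that the ignored values come back from the backup stored field in indexing order. The names `SyntheticKeywordSketch` and `rebuild` are made up for illustration and are not part of this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SyntheticKeywordSketch {
    /**
     * Rebuilds the values of one keyword field: values within the limit first,
     * values over the limit appended at the end.
     */
    static List<String> rebuild(List<String> indexedValues, int ignoreAbove) {
        // Stand-in for doc values: sorted and de-duplicated (assumption).
        TreeSet<String> docValues = new TreeSet<>();
        // Stand-in for the "backup" stored field: kept in indexing order (assumption).
        List<String> ignored = new ArrayList<>();
        for (String value : indexedValues) {
            if (value.length() > ignoreAbove) {
                ignored.add(value);
            } else {
                docValues.add(value);
            }
        }
        List<String> result = new ArrayList<>(docValues);
        result.addAll(ignored);
        return result;
    }

    public static void main(String[] args) {
        // prints [bar, foo, much too long]
        System.out.println(rebuild(List.of("foo", "much too long", "bar", "foo"), 3));
    }
}
```

The over-limit value survives the round trip but moves to the end, which is the ordering difference the description mentions.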

elasticsearchmachine · 2022-08-18T14:39:21Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

elasticsearchmachine · 2022-08-18T14:39:21Z

Hi @nik9000, I've created a changelog YAML for you.

elasticsearchmachine · 2022-08-18T14:39:21Z

Pinging @elastic/es-search (Team:Search)

nik9000 · 2022-08-18T15:16:07Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/400_synthetic_source.yml

          query:
            ids:
              values: [1]
+  - is_false: hits.hits.0.fields


There's a different bug here that I'm fixing alongside ignore_above, though I only noticed it while working on this. Previously we added every stored field we loaded to the fetched field list. Now we only add the ones you ask for.

nik9000 · 2022-08-18T15:49:08Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

-                    return new CustomFieldsVisitor(sourceLoader.requiredStoredFields(), true);
+                    // add the stored fields needed to load the source mapping to an empty set so they aren't returned
+                    sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of()));
+                    return new CustomFieldsVisitor(storedToRequestedFields.keySet(), true);


This change fixes the stored fields leaking into the search response. It's tricky!

Sneaky. Nice catch.
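
For readers skimming the diff, here is a minimal sketch of the empty-set idea using only `java.util`. It is a simplification under stated assumptions: the real `FetchPhase` and `CustomFieldsVisitor` do much more, and `fetch`, `storedOnDisk`, and the field name `kwd._original` are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RequestedStoredFieldsSketch {
    /**
     * Loads every stored field named in storedToRequestedFields, but only returns
     * values under the names the caller actually asked for.
     */
    static Map<String, List<Object>> fetch(
        Map<String, Set<String>> storedToRequestedFields,
        Set<String> requiredForSource,
        Map<String, List<Object>> storedOnDisk
    ) {
        // Fields needed only to rebuild _source map to an empty set of requested names,
        // so they get loaded but never show up in the response.
        requiredForSource.forEach(name -> storedToRequestedFields.putIfAbsent(name, Set.of()));

        Map<String, List<Object>> returned = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : storedToRequestedFields.entrySet()) {
            List<Object> loaded = storedOnDisk.get(entry.getKey());
            if (loaded == null) {
                continue;
            }
            for (String requestedName : entry.getValue()) {
                returned.put(requestedName, loaded);
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> requested = new HashMap<>();
        requested.put("title", Set.of("title"));
        Map<String, List<Object>> disk = Map.of(
            "title", List.of("hello"),
            "kwd._original", List.of("a very long value")
        );
        // prints {title=[hello]}
        System.out.println(fetch(requested, Set.of("kwd._original"), disk));
    }
}
```

Because `putIfAbsent` is used, a field the user did request keeps its requested names even when it is also required for the source.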

nik9000 · 2022-08-18T15:51:47Z

server/src/main/java/org/elasticsearch/index/fieldvisitor/FieldsVisitor.java

            MappedFieldType fieldType = fieldTypeLookup.apply(entry.getKey());
+            if (fieldType == null) {
+                continue; // TODO this is lame
+            }


I don't like adding such looseness. I'm sure there is a better way than this, but I couldn't think of something off the bat and figured we'd be refactoring here before too long anyway. So I wanted to see what other folks think.

I don't particularly like that this valueForDisplay thing is mixed up into the Lucene loading. That feels like it should be something we take care of externally but I have no idea how mixed up it is with other things.

Yeah this really should be pulled out elsewhere, but I think I'm fine with it staying in here for the time being.

That's what I was thinking. I don't want us to start relying on this null-ok behavior in other places. I don't particularly want to build some fancy abstraction for this either. Any ideas on something a little more defensive than this but not super overkill if we're going to rip it out?
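
A simplified sketch of what the null check in the diff does, for anyone without the file open. It assumes the post-processing loop converts each loaded value with the field type's `valueForDisplay` and that a loaded stored field can lack a mapped type, presumably the hidden backup field. `DisplayType` and the field names are hypothetical stand-ins, not the real `MappedFieldType` API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PostProcessSketch {
    /** Hypothetical stand-in for a mapped field type: only the display conversion. */
    interface DisplayType {
        Object valueForDisplay(Object raw);
    }

    static void postProcess(Map<String, List<Object>> fieldsValues, Function<String, DisplayType> fieldTypeLookup) {
        for (Map.Entry<String, List<Object>> entry : fieldsValues.entrySet()) {
            DisplayType fieldType = fieldTypeLookup.apply(entry.getKey());
            if (fieldType == null) {
                continue; // no mapped type for this stored field: leave the raw values alone
            }
            entry.getValue().replaceAll(fieldType::valueForDisplay);
        }
    }

    public static void main(String[] args) {
        Map<String, List<Object>> fields = new HashMap<>();
        fields.put("kwd", new ArrayList<>(List.of("foo")));
        fields.put("kwd._original", new ArrayList<>(List.of("raw value")));

        DisplayType display = raw -> "display:" + raw;
        Function<String, DisplayType> lookup = name -> name.equals("kwd") ? display : null;

        postProcess(fields, lookup);
        System.out.println(fields.get("kwd"));           // prints [display:foo]
        System.out.println(fields.get("kwd._original")); // prints [raw value]
    }
}
```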

nik9000 · 2022-08-18T15:55:59Z

Note for those following along at home: this doesn't really affect the performance of loading synthetic source, even if there are many, many fields, all of which have ignore_above. Building the extra HashMap entries takes microseconds. If you are worrying about microseconds you want to keep the mapping small.

romseygeek

Thanks, I think this makes sense. I have a question around what to do if the field is already being stored.

romseygeek · 2022-08-23T09:03:58Z

server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java

            context.addIgnoredField(name());
+            if (context.isSyntheticSource()) {
+                // Save a copy of the field so synthetic source can load it
+                context.doc().add(new Field(originalName(), value, ORIGINAL_FIELD_TYPE));


You can just add a new StoredField here; there's no need to create a new FieldType.

If stored=true is set on here then we store it twice, right? Is it worth detecting that case and deferring to the 'normal' stored field rather than always using the hidden field?

If I did that then fields longer than the limit would start being loadable via stored fields. That feels like something folks could accidentally rely on.

To be clear - I don't think we store it twice with the code here.
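
To make the "not stored twice" point concrete, here is a small sketch of the decision as the diff reads: a value over the limit takes the ignored branch and never reaches the regular stored field. It assumes the ignored branch returns early, and it assumes the hidden field is named by appending `._original` to the field name. Both are assumptions for illustration, as is the helper name `storedFieldsFor`.

```java
import java.util.ArrayList;
import java.util.List;

public class IgnoreAboveStorageSketch {
    /** Returns the names of the stored fields that would be written for one value. */
    static List<String> storedFieldsFor(String name, String value, int ignoreAbove, boolean stored, boolean syntheticSource) {
        List<String> written = new ArrayList<>();
        if (value.length() > ignoreAbove) {
            // Over the limit: the value is ignored, so the regular stored field is skipped.
            if (syntheticSource) {
                // The hidden copy is the only copy (field name suffix is an assumption).
                written.add(name + "._original");
            }
            return written;
        }
        if (stored) {
            written.add(name);
        }
        return written;
    }

    public static void main(String[] args) {
        // prints [kwd._original]
        System.out.println(storedFieldsFor("kwd", "much too long", 3, true, true));
        // prints [kwd]
        System.out.println(storedFieldsFor("kwd", "foo", 3, true, true));
    }
}
```

Either branch writes at most one stored field per value, and an over-limit value never becomes loadable through the regular stored field.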

romseygeek · 2022-08-23T09:04:58Z

server/src/main/java/org/elasticsearch/index/mapper/SortedSetDocValuesSyntheticFieldLoader.java

    }

-    private interface Values {
+    private interface DocValues {


DocumentValues so that we don't get the name collision?

The point is that they wrap the doc values interface. Not sure if DocumentValues is as descriptive. DocValuesValues or ColumnarValues or something?

romseygeek · 2022-08-23T11:21:56Z

server/src/main/java/org/elasticsearch/index/fieldvisitor/FieldsVisitor.java

            MappedFieldType fieldType = fieldTypeLookup.apply(entry.getKey());
+            if (fieldType == null) {
+                continue; // TODO this is lame
+            }


Yeah this really should be pulled out elsewhere, but I think I'm fine with it staying in here for the time being.

romseygeek · 2022-08-23T11:26:43Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

-                    return new CustomFieldsVisitor(sourceLoader.requiredStoredFields(), true);
+                    // add the stored fields needed to load the source mapping to an empty set so they aren't returned
+                    sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of()));
+                    return new CustomFieldsVisitor(storedToRequestedFields.keySet(), true);


Sneaky. Nice catch.

nik9000 · 2022-08-31T17:37:06Z

Ready for you again @romseygeek !

romseygeek

LGTM

romseygeek · 2022-09-01T09:34:53Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/get/100_synthetic_source.yml

  - match:
      _source:
        kwd: foo
+#  - is_false: fields  TODO fix me


Is this something separate?

It's the same problem as we have on fetch! I just noticed it and didn't want to forget it. I thought maybe I'd fix it like I did the fetch one. But it was a little more complex so I didn't. But I'll grab it soon!

romseygeek · 2022-09-01T10:00:02Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

+            if (loadSource) {
+                if (false == sourceLoader.requiredStoredFields().isEmpty()) {
+                    // add the stored fields needed to load the source mapping to an empty set so they aren't returned
+                    sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of()));


Ooh sneaky. Yes, we should definitely make this less hacky...

nik9000 added the >feature, :Search Foundations/Mapping, :StorageEngine/TSDB, and v8.5.0 labels Aug 18, 2022

elasticsearchmachine added the Team:Search and Team:Analytics labels Aug 18, 2022

nik9000 requested a review from romseygeek August 18, 2022 14:39

nik9000 added 3 commits August 18, 2022 10:41

Update docs/changelog/89466.yaml

c61eaa8

Cleaner?

f225e86

nik9000 commented Aug 18, 2022

nik9000 mentioned this pull request Aug 18, 2022

Update synthetic source config elastic/rally-tracks#300

Merged

nik9000 added 2 commits August 18, 2022 12:09

Fixup changelog

94450cf

More error message

c05b6b3

nik9000 mentioned this pull request Aug 18, 2022

Synthetic Source #86603

Closed

50 tasks

romseygeek reviewed Aug 23, 2022

nik9000 added 2 commits August 23, 2022 11:41

More tests for enrich processor

0d9fe2d

Adds more tests for the enrich processor around different index types. Right now they all work fine (yay!) but this feels like a good amount of paranoia.

Merge branch 'main' into synthetic_source_ignore_above_3

8352a2c

nik9000 mentioned this pull request Aug 23, 2022

[Fleet] [Meta] Support for time series indexing, doc-value-only fields, and synthetic source elastic/kibana#132818

Closed

14 tasks

nik9000 added 2 commits August 23, 2022 14:36

Rename

fbce3f7

Merge branch 'main' into synthetic_source_ignore_above_3

91b3b96

axw mentioned this pull request Aug 31, 2022

Enable synthetic source for metrics data streams elastic/apm-server#9010

Closed

nik9000 added 2 commits August 31, 2022 11:17

Merge branch 'main' into synthetic_source_ignore_above_3

15e4e2f

Update after merge

904a662

Done?

5e0caaf

romseygeek approved these changes Sep 1, 2022

kpollich mentioned this pull request Sep 1, 2022

[Fleet] Create UI for experimental indexing features elastic/kibana#139862

Closed

14 tasks

nik9000 merged commit 703571a into elastic:main Sep 1, 2022

felixbarny mentioned this pull request Sep 12, 2022

Synthetic _source: support ignore_malformed #90007

Closed

lkts mentioned this pull request Mar 14, 2024

Text fields are stored by default in TSDB indices #106338

Merged
