Synthetic _source: support ignore_above#89466
Conversation
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
|
Hi @nik9000, I've created a changelog YAML for you. |
|
Pinging @elastic/es-search (Team:Search) |
This allows you to use `ignore_above` with `keyword` fields in synthetic source. Ignored values are stored in a "backup" stored field and added to the end of the list of results. This makes `ignore_above` work pretty much the same way as it does when you don't have synthetic source. The only difference is the order of the results. But synthetic source changes the order of results anyway. That should be fine.
| query: | ||
| ids: | ||
| values: [1] | ||
| - is_false: hits.hits.0.fields |
There was a problem hiding this comment.
There's a different bug here that I'm fixing along side ignore_above but I only noticed it when I was working on this. Previously we were adding all stored fields we loaded to the fetched field list. Now we only add if you ask for them.
| return new CustomFieldsVisitor(sourceLoader.requiredStoredFields(), true); | ||
| // add the stored fields needed to load the source mapping to an empty set so they aren't returned | ||
| sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of())); | ||
| return new CustomFieldsVisitor(storedToRequestedFields.keySet(), true); |
There was a problem hiding this comment.
This change fixes that leaking fields in the search response. It's tricky!
| MappedFieldType fieldType = fieldTypeLookup.apply(entry.getKey()); | ||
| if (fieldType == null) { | ||
| continue; // TODO this is lame | ||
| } |
There was a problem hiding this comment.
I don't like adding such looseness. I'm sure there is a better way than this, but I couldn't think of something off the bat and figured we'd be refactoring here before too long anyway. So I wanted to see what other folks think.
I don't particularly like that this valueForDisplay thing is mixed up into the Lucene loading. That feels like it should be something we take care of externally but I have no idea how mixed up it is with other things.
There was a problem hiding this comment.
Yeah this really should be pulled out elsewhere, but I think I'm fine with it staying in here for the time being.
There was a problem hiding this comment.
That's what I was thinking. I don't want us to start relying on this null-ok behavior in other places. I don't particularly want to build some fancy abstraction for this either. Any ideas on something a little more defensive than this but not super overkill if we're going to rip it out?
|
Note for those following along at home - this doesn't really effect the performance of loading synthetic source any - even if there are many, many fields, all of which have |
romseygeek
left a comment
There was a problem hiding this comment.
Thanks, I think this makes sense. I have a question around what to do if the field is already being stored.
| context.addIgnoredField(name()); | ||
| if (context.isSyntheticSource()) { | ||
| // Save a copy of the field so synthetic source can load it | ||
| context.doc().add(new Field(originalName(), value, ORIGINAL_FIELD_TYPE)); |
There was a problem hiding this comment.
You can just add a new StoredField here, no need to create a new FieldType
There was a problem hiding this comment.
If stored=true is set on here then we store it twice, right? Is it worth detecting that case and deferring to the 'normal' stored field rather than always using the hidden field?
There was a problem hiding this comment.
If I did that then fields longer than the limit would start being loadable via stored fields. That feels like something folks could accidentally rely on.
There was a problem hiding this comment.
To be clear - I don't think we store it twice with the code here.
| } | ||
|
|
||
| private interface Values { | ||
| private interface DocValues { |
There was a problem hiding this comment.
DocumentValues so that we don't get the name collision?
There was a problem hiding this comment.
The point is that they wrap the doc values interface. Not sure if DocumentValues is as descriptive. DocValuesValues or ColumnarValues or something?
| MappedFieldType fieldType = fieldTypeLookup.apply(entry.getKey()); | ||
| if (fieldType == null) { | ||
| continue; // TODO this is lame | ||
| } |
There was a problem hiding this comment.
Yeah this really should be pulled out elsewhere, but I think I'm fine with it staying in here for the time being.
| return new CustomFieldsVisitor(sourceLoader.requiredStoredFields(), true); | ||
| // add the stored fields needed to load the source mapping to an empty set so they aren't returned | ||
| sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of())); | ||
| return new CustomFieldsVisitor(storedToRequestedFields.keySet(), true); |
Adds more tests for the enrich processor around different index types. Right now they all work fine (yay!) but this feels like a good amount of paranoia.
|
Ready for you again @romseygeek ! |
| - match: | ||
| _source: | ||
| kwd: foo | ||
| # - is_false: fields TODO fix me |
There was a problem hiding this comment.
Is this something separate?
There was a problem hiding this comment.
It's the same problem as we have on fetch! I just noticed it and didn't want to forget it. I thought maybe I'd fix it like I did the fetch one. But it was a little more complex so I didn't. But I'll grab it soon!
| if (loadSource) { | ||
| if (false == sourceLoader.requiredStoredFields().isEmpty()) { | ||
| // add the stored fields needed to load the source mapping to an empty set so they aren't returned | ||
| sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of())); |
There was a problem hiding this comment.
Ooh sneaky. Yes, we should definitely make this less hacky...
This allows you to use
ignore_abovewithkeywordfields in syntheticsource. Ignored values are stored in a "backup" stored field and added
to the end of the list of results. This makes
ignore_abovework prettymuch the same way as it does when you don't have synthetic source. The
only difference is the order of the results. But synthetic source
changes the order of results anyway. That should be fine.