Add fetch fields support for runtime fields#60775
Add fetch fields support for runtime fields#60775nik9000 wants to merge 13 commits intoelastic:feature/runtime_fieldsfrom
Conversation
This adds support to the `fields` fetch phase for runtime fields. To do so it reworks the mechanism that the fetch phase uses to "prepare" to fetch fields which is much more compatible with runtime fields. Rather than implement fetching directly by running the script I chose to implement fetching from doc values, which in turn runs the script. This allowed me to reuse the doc values fetching code from the doc values fetching phase which bought me a few tests "automatically".
|
Pinging @elastic/es-search (:Search/Search) |
| if (sourceValue == null) { | ||
| return List.of(); | ||
| } | ||
| public abstract ValueFetcher valueFetcher(SearchLookup lookup, @Nullable String format); |
There was a problem hiding this comment.
I decided to go "leaf by leaf" here because that is the kind of iterations that you'd need to read doc values runtime fields. Another option would be to make fetch take both the LeafReaderContext and the docId and have the doc values and runtime fields based implementations build their own leaf readers. That feels worse to me but it is certainly an option.
| DocValueFormat dvFormat = fieldType().docValueFormat(format, null); | ||
| IndexFieldData<?> fd = lookup.doc().getForField(fieldType()); | ||
| return ctx -> fd.load(ctx).buildFetcher(dvFormat); | ||
| } |
There was a problem hiding this comment.
I decided to implement fetching runtime fields by pointing them at the doc values. This gave me a convenient excuse to remove most of the instanceofs in FetchDocValuesPhase while implementing my feature. As a bonus all of the tests for fetching doc values cover this.
There was a problem hiding this comment.
I can see why this was convenient. It also opens up the possibility retrieving doc values in the fetch fields phase (even outside of runtime fields).
I wonder if runtime fields will ever want to make a different choice about how to execute if they're called in fetch vs. aggs? For example if a script refers to a value that's present both in the _source and doc values, we could choose to load from _source just for fetch. I'm just brainstorming, it seems unlikely such an optimization will be important.
There was a problem hiding this comment.
I do imagine a day where we can "fake out" doc values from _source, but I think we'd do that by building a "funny SearchLookup".
| return true; | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
This doesn't feel like a great place for these test methods. We do have a FieldMapperTestCase but we don't use it consistently.
modules/parent-join/src/main/java/org/elasticsearch/join/mapper/ParentJoinFieldMapper.java
Outdated
Show resolved
Hide resolved
| return sb.toString(); | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
This one doesn't extend from ESSingleNodeTestCase! We probably should stop all of them from extending from it, but that is a job for another day.
| throw new UnsupportedOperationException(); | ||
| public ValueFetcher valueFetcher(SearchLookup lookup, String format) { | ||
| return docValuesFetcher(lookup, format); | ||
| } |
There was a problem hiding this comment.
After all that, this is the entire implementation! Everything else is shared and it all just works by virtue of supporting doc values.
There was a problem hiding this comment.
I didn't add any unit tests for this because we have unit tests for doc values. I did add a few integration tests just to make sure everything is plugged in.
|
@elasticmachine update branch |
8968ca3 to
41fa35b
Compare
|
@jtibshirani I've merged |
jtibshirani
left a comment
There was a problem hiding this comment.
I did an initial pass. It's exciting to see this come together! I'm assuming that after a couple rounds of review, some refactors will be pulled into master for easier maintainability?
x-pack/plugin/src/test/resources/rest-api-spec/test/runtime_fields/10_keyword.yml
Outdated
Show resolved
Hide resolved
x-pack/plugin/src/test/resources/rest-api-spec/test/runtime_fields/30_double.yml
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/RangeFieldMapper.java
Show resolved
Hide resolved
| DocValueFormat dvFormat = fieldType().docValueFormat(format, null); | ||
| IndexFieldData<?> fd = lookup.doc().getForField(fieldType()); | ||
| return ctx -> fd.load(ctx).buildFetcher(dvFormat); | ||
| } |
There was a problem hiding this comment.
I can see why this was convenient. It also opens up the possibility retrieving doc values in the fetch fields phase (even outside of runtime fields).
I wonder if runtime fields will ever want to make a different choice about how to execute if they're called in fetch vs. aggs? For example if a script refers to a value that's present both in the _source and doc values, we could choose to load from _source just for fetch. I'm just brainstorming, it seems unlikely such an optimization will be important.
|
|
||
| /** | ||
| * Given access to a document's _source, return this field's values. | ||
| * Build a {@linkplain ValueFetcher} to fetch values for the fields fetch api. |
There was a problem hiding this comment.
I'm getting worried that FieldMapper is growing too large and hard to understand. (This is partially my fault for adding _source retrieval logic to it in FieldMapper#lookupValues!) For me the extensive use of anonymous classes and function references also makes it harder to follow.
Some ideas:
- We could instantiate some of these anonymous classes, even though they're quite simple. For example there could be a concrete
SourceValueFetcherclass. ValueFetchercould live in its own file. Maybe it could contain static factory methods to create source and doc values fetchers?
There was a problem hiding this comment.
I'd be happy to move ValueFetcher to its own file and move the implementations over. After I do that we can see how we feel about it.
| List<?> values = fieldMapper.lookupValues(sourceLookup, context.format); | ||
| parsedValues.addAll(values); | ||
| for (LeafValueFetcher fetcher : context.leafFetchers) { | ||
| parsedValues.addAll(fetcher.fetch(hitContext.docId() - hitContext.readerContext().docBase)); |
There was a problem hiding this comment.
I think that HitContext.docId is already the leaf reader doc ID? It's passed in as subDocId:
It feels like we should have test coverage for this, maybe
FieldValueRetrieverTests would be a good place?
server/src/test/java/org/elasticsearch/search/fetch/subphase/FieldValueRetrieverTests.java
Show resolved
Hide resolved
| FieldsVisitor fieldsVisitor = createStoredFieldsVisitor(context, storedToRequestedFields); | ||
|
|
||
| try { | ||
| SearchLookup lookup = new SearchLookup(context.mapperService(), context.getQueryShardContext()::getForField); |
There was a problem hiding this comment.
One issue here is that the runtime scripts don't have access to the shared _source. It seems like the SearchLookup that's passed to fielddataBuilder will always be the one on QueryShardContext?
It actually seems confusing that SearchLookup lets you perform getForField to get field data, but then this method uses a potentially different SearchLookup to create it.
There was a problem hiding this comment.
Good call! I'll dig into it.
I'd hoped to land this in the branch and then cherry-pick it over to master, dropping out all of the "in the branch" stuff. |
| docMap = new DocLookup(mapperService, fieldDataLookup); | ||
| public SearchLookup( | ||
| MapperService mapperService, | ||
| BiFunction<MappedFieldType, Supplier<SearchLookup>, IndexFieldData<?>> fieldDataLookup |
There was a problem hiding this comment.
Supplier<SearchLookup> is a little weird here but it is compatible with #60318 and I think it fits well with how QueryShardContext uses it.
|
@jtibshirani I pushed some updates for all of the things you mentioned. I did move |
|
And I seem to have broken some tests. One moment! |
All better now. I hope. |
|
run elasticsearch-ci/1 |
This adds support to the
fieldsfetch phase for runtime fields. To doso it reworks the mechanism that the fetch phase uses to "prepare" to
fetch fields which is much more compatible with runtime fields.
Rather than implement fetching directly by running the script I chose to
implement fetching from doc values, which in turn runs the script. This
allowed me to reuse the doc values fetching code from the doc values
fetching phase which bought me a few tests "automatically".