Add a simple 'fetch fields' phase.#55639
Add a simple 'fetch fields' phase.#55639jtibshirani merged 9 commits intoelastic:field-retrievalfrom
Conversation
10d498d to
358ade3
Compare
There was a problem hiding this comment.
We'll no longer use a flat list of fields once we introduce support for formatting the values.
There was a problem hiding this comment.
You just have this temporarily because it is simpler to parse?
There was a problem hiding this comment.
I tried hard to write proper unit tests (in the style of FetchSourcePhaseTests), but it's currently challenging to stub out the required pieces like source lookup. I think some broader refactors are necessary before it's possible to unit test these classes.
There was a problem hiding this comment.
What if you made something like:
public void hitsExecute(Function<SearchHit, Map<String, Object>> sourceLookup, SearchHit[] hits)
and unit tested that? Then and the only thing missing a unit test would be the map from search context to "simple" classes?
There was a problem hiding this comment.
I had been hesitant to do this, because I thought it would make FetchFieldsPhase harder to read... but I actually think it turned out pretty cleanly. I don't think I followed your suggestion exactly though, curious to hear what you think.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
358ade3 to
facc1cc
Compare
|
Pinging @elastic/es-search (:Search/Search) |
👍 |
nik9000
left a comment
There was a problem hiding this comment.
Looks like a great start! I left some "for later" stuff and some "maybe now" stuff.
There was a problem hiding this comment.
You just have this temporarily because it is simpler to parse?
| // all. | ||
| // This call doesn't really need to support writing any kind of object, since the values | ||
| // here are always serializable to xContent. Each value could be a leaf types like a string, | ||
| // number, or boolean, a list of such values, or a map of such values with string keys. |
There was a problem hiding this comment.
Ah! Except it won't be for stuff like geo_points, right?
There was a problem hiding this comment.
I do wonder if it'd be nicer to have the "type" of the field available so we don't need all the nasty reflection in value.
There was a problem hiding this comment.
But that is a problem for another time, I think.
There was a problem hiding this comment.
I agree, this is a really unfortunate part of the current PR. It's on the top of my mind in terms of what needs figuring out for a follow-up.
| trackTotalHitsUpTo = in.readOptionalInt(); | ||
|
|
||
| if (in.getVersion().onOrAfter(Version.V_8_0_0)) { | ||
| if (in.readBoolean()) { |
| out.writeOptionalInt(trackTotalHitsUpTo); | ||
|
|
||
| if (out.getVersion().onOrAfter(Version.V_8_0_0)) { | ||
| out.writeBoolean(fetchFields != null); |
| while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) { | ||
| docValueFields.add(FieldAndFormat.fromXContent(parser)); | ||
| } | ||
| } else if (FETCH_FIELDS_FIELD.match(currentFieldName, parser.getDeprecationHandler())) { |
There was a problem hiding this comment.
I wonder when it'll be time to convert this to ObjectParser.....
| Object value = entry.getValue(); | ||
| List<Object> values = value instanceof List | ||
| ? (List<Object>) value | ||
| : List.of(value); |
There was a problem hiding this comment.
I could see wanting to switch this to something a little less instanceof soon. But this'll get the job done for now!
|
|
||
| public abstract SearchContext docValuesContext(FetchDocValuesContext docValuesContext); | ||
|
|
||
| public abstract FetchFieldsContext fetchFieldsContext(); |
There was a problem hiding this comment.
I think we should have javadoc for this.
| /** | ||
| * For each of the provided paths, return its value in the source. Note that | ||
| */ | ||
| public Map<String, Object> extractValues(Collection<String> paths) { |
There was a problem hiding this comment.
I wonder if this'd be better kept in the new fetch phase, especially if we're going to migrate away from it.
I wonder if a better interface for collecting the values is Map<String, Consumer<?>> or something shaped like that. I feel like it'd let you build flexible casting and formatting pretty naturally. But that can wait too.
There was a problem hiding this comment.
+1, I'll move this up into FetchFieldsPhase. I'll wait to refine the interface until I have a better handle on the approach to value formatting.
| SearchHit searchHit = createTestItem(xContentType, true, true); | ||
| BytesReference originalBytes = toXContent(searchHit, xContentType, true); | ||
| Predicate<String> pathsToExclude = path -> (path.endsWith("highlight") || path.endsWith("fields") || path.contains("_source") | ||
| Predicate<String> pathsToExclude = path -> (path.endsWith("highlight") || path.contains("fields") || path.contains("_source") |
There was a problem hiding this comment.
👍
This on looks like it took a while to find!
There was a problem hiding this comment.
What if you made something like:
public void hitsExecute(Function<SearchHit, Map<String, Object>> sourceLookup, SearchHit[] hits)
and unit tested that? Then and the only thing missing a unit test would be the map from search context to "simple" classes?
And none of it really blocks merging, especially to a feature branch. But I do think now would be a simpler time to talk about the unit test and where to put the source extraction logic. |
|
Thanks @nik9000 for the speedy review, it's ready for another look. |
| /** | ||
| * The context needed to retrieve fields. | ||
| */ | ||
| public class FetchFieldsContext { |
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Note: the PR is opened against the
field-retrievalfeature branch.Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. This PR just lays out the initial class structure and tests -- much of the interesting functionality will come in subsequent PRs.
Known issues that will be addressed in future commits:
SourceLookupmethod they have planned to efficiently load particular values: Partially parsesourcedocuments to speed upsourceaccess #52591. I plan to make use of this when it's available, but for now added a placeholder methodextractValues.Relates to #55363.