Add source fallback for keyword fields#87765
Conversation
|
Pinging @elastic/es-search (Team:Search) |
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
|
Pinging @elastic/clients-team (Team:Clients) |
|
I didn't review the change thouroughly, only the
If you asked me several years ago, I would have agreed on the trapiness of the performance implication. I'm less convinced now, we make it so easy to follow the slow path using runtime fields, which you can even provide as part of the search request. Maybe we should just embrace it. Given that fields have doc values by default, if it's slow, it means that users opted in for slowness so we shouldn't worry?
What inconsistencies do you have in mind? The fielddata instance returned by runtime fields, which you are reusing, is supposed to be consistent with the instance that is returned when fields have doc values, so this change should be transparent to scripting APIs? |
|
@jpountz Thank you for the feedback! I realize we discussed this already, and my main concern still lies in the expected results. Now users will get results from source when they may have anticipated an exception, but as you say it's probably just an improvement outside of performance implications. |
|
Hi @jdconrad, I've created a changelog YAML for you. |
|
Removing the WIP label as this is the pattern I would like to follow moving forward. |
There was a problem hiding this comment.
I tend to prefer removing the elseif you return from the if. No big deal either way.
There was a problem hiding this comment.
I don't have a strong opinion, so I will change this. :)
There was a problem hiding this comment.
It's probably worth javadoc to explain how this is different from PARSE_FROM_SOURCE.
There was a problem hiding this comment.
The runtime fields change is definitely the part I was least confident in the solution for. Should source paths be used for all source-only runtime fields? Would that make more sense than having two of these? Does it ever make sense to specify a source-only field as part of the runtime mappings where the source would be part of the parent field?
There was a problem hiding this comment.
Should source paths be used for all source-only runtime fields?
Truly runtime fields can't have sub-fields. That concept only exists in the mapping and runtime fields just have a single path.
There was a problem hiding this comment.
That isn't a simple thing, for what it's worth. We could have built sub-fields into runtime fields. We just didn't because we didn't know why we would have. We didn't think, at the time, that they'd do anything.
There was a problem hiding this comment.
It's probably worth adding javadoc to explain how this is different from emitFromSource.
There was a problem hiding this comment.
Agreed, at a minimum! Questions from the previous comment are relevant here too.
There was a problem hiding this comment.
I wonder if this is worth doing for all field scripts. I get that we shouldn't do it on every call of the ones that need it, but maybe we shouldn't have this up here. I wonder if it'd be better to move all of the changes in this file to StringFieldScript. I know you want to do this for other types too, but maybe it's not too bad? I'm not sure exactly what to do, but I do think it'd be nice to skip this if we don't need it.
There was a problem hiding this comment.
I'm happy to make this change. I do think this will be extended to all available runtime fields, though. Maybe it would make sense to provide an additional constructor or a boolean value to have this be null?
There was a problem hiding this comment.
I have an instinct that if you implemented it with copy and paste for two field types, like, keyword and ip or something, and then tried to remove the duplications something would pop out. I don't know what. I just think it might be worth trying before adding this for all runtime fields.
|
The design makes sense to me. I left some pretty small stuff to be honest. I'd shoot for adding a test to |
|
@nik9000 Thank you for the feedback! Left some questions in the comments. Also, do you know if there are any negative effects that could come from synthetic source or not having source at all? Would it just output no values? |
|
@javanna @jtibshirani Was wondering if either of you had any thoughts, suggestions, or concerns? I'd like to collect all the feedback before I make any further changes. Thanks! |
|
I posted my quick thoughts yesterday. I think this change deserves more discussion, but it may very well be that this is my feeling as I was not involved when the decision was made to go down this route. |
|
@javanna Thanks for the feedback! Apologies as I missed your comment in the other issue. |
|
Github closed this on me when I was trying to fix the changelog. Will open a new issue with the updated design. |
This change adds source fallback specifically for the keyword field type as part of its mapped field type as an example of how this could be done for all the fields types required for scripting (and possibly other areas such as aggs and sorting?)
I've tried to implement the design as discussed with @jpountz @jtibshirani and @nik9000 as closely as possible with the minimally invasive change. The design is discussed further in this comment (#80504 (comment)).
My questions for y'all are the following:
doc['field'].value) also depends on this code path, without the boolean there is a possible inconsistency in what values a user may get from a script.