Skip to content

Add source fallback for keyword fields#87765

Closed
jdconrad wants to merge 0 commit intoelastic:masterfrom
jdconrad:sourcefallback
Closed

Add source fallback for keyword fields#87765
jdconrad wants to merge 0 commit intoelastic:masterfrom
jdconrad:sourcefallback

Conversation

@jdconrad
Copy link
Copy Markdown
Contributor

This change adds source fallback specifically for the keyword field type as part of its mapped field type as an example of how this could be done for all the fields types required for scripting (and possibly other areas such as aggs and sorting?)

I've tried to implement the design as discussed with @jpountz @jtibshirani and @nik9000 as closely as possible with the minimally invasive change. The design is discussed further in this comment (#80504 (comment)).

My questions for y'all are the following:

  1. Are there any design changes that you think would be an improvement? Or is the design as it stands now good?
  2. I've only added one test section as part of Painless. What other tests would you expect to get decent coverage? Outside of scripting, I'm not very familiar with what else would need to be tested directly.
  3. Should we add a source fallback enabled boolean to fielddataBuilder? I think it's important at least initially that source fallback be optional since it has some heavy performance implications, and as I discussed with @jpountz, since the old-style scripting syntax for doc (doc['field'].value) also depends on this code path, without the boolean there is a possible inconsistency in what values a user may get from a script.

@jdconrad jdconrad added >enhancement WIP :Search Foundations/Mapping Index mappings, including merging and defining field types :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache v8.4.0 labels Jun 16, 2022
@elasticmachine elasticmachine added Team:Search Meta label for search team Team:Core/Infra Meta label for core/infra team labels Jun 16, 2022
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label Jun 16, 2022
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/clients-team (Team:Clients)

@javanna javanna self-requested a review June 17, 2022 15:12
@jpountz
Copy link
Copy Markdown
Contributor

jpountz commented Jun 17, 2022

I didn't review the change thouroughly, only the KeywordFieldMapper change, and it has the if (hasDocValues()) { /* use doc values */ } otherwise { /* use source */ } logic that I was expected, in my opinion this is the right way to implement source fallback.

Should we add a source fallback enabled boolean to fielddataBuilder? I think it's important at least initially that source fallback be optional since it has some heavy performance implications,

If you asked me several years ago, I would have agreed on the trapiness of the performance implication. I'm less convinced now, we make it so easy to follow the slow path using runtime fields, which you can even provide as part of the search request. Maybe we should just embrace it. Given that fields have doc values by default, if it's slow, it means that users opted in for slowness so we shouldn't worry?

since the old-style scripting syntax for doc (doc['field'].value) also depends on this code path, without the boolean there is a possible inconsistency in what values a user may get from a script.

What inconsistencies do you have in mind? The fielddata instance returned by runtime fields, which you are reusing, is supposed to be consistent with the instance that is returned when fields have doc values, so this change should be transparent to scripting APIs?

@jdconrad
Copy link
Copy Markdown
Contributor Author

@jpountz Thank you for the feedback! I realize we discussed this already, and my main concern still lies in the expected results. Now users will get results from source when they may have anticipated an exception, but as you say it's probably just an improvement outside of performance implications.

@jdconrad jdconrad removed the WIP label Jun 20, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @jdconrad, I've created a changelog YAML for you.

@jdconrad
Copy link
Copy Markdown
Contributor Author

Removing the WIP label as this is the pattern I would like to follow moving forward.

@jdconrad jdconrad removed the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Jun 20, 2022
@elasticmachine elasticmachine removed the Team:Core/Infra Meta label for core/infra team label Jun 20, 2022
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to prefer removing the elseif you return from the if. No big deal either way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion, so I will change this. :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth javadoc to explain how this is different from PARSE_FROM_SOURCE.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime fields change is definitely the part I was least confident in the solution for. Should source paths be used for all source-only runtime fields? Would that make more sense than having two of these? Does it ever make sense to specify a source-only field as part of the runtime mappings where the source would be part of the parent field?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should source paths be used for all source-only runtime fields?

Truly runtime fields can't have sub-fields. That concept only exists in the mapping and runtime fields just have a single path.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That isn't a simple thing, for what it's worth. We could have built sub-fields into runtime fields. We just didn't because we didn't know why we would have. We didn't think, at the time, that they'd do anything.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth adding javadoc to explain how this is different from emitFromSource.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, at a minimum! Questions from the previous comment are relevant here too.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is worth doing for all field scripts. I get that we shouldn't do it on every call of the ones that need it, but maybe we shouldn't have this up here. I wonder if it'd be better to move all of the changes in this file to StringFieldScript. I know you want to do this for other types too, but maybe it's not too bad? I'm not sure exactly what to do, but I do think it'd be nice to skip this if we don't need it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to make this change. I do think this will be extended to all available runtime fields, though. Maybe it would make sense to provide an additional constructor or a boolean value to have this be null?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an instinct that if you implemented it with copy and paste for two field types, like, keyword and ip or something, and then tried to remove the duplications something would pop out. I don't know what. I just think it might be worth trying before adding this for all runtime fields.

@nik9000
Copy link
Copy Markdown
Member

nik9000 commented Jun 20, 2022

The design makes sense to me. I left some pretty small stuff to be honest.

I'd shoot for adding a test to KeywordFieldMapperTests and I think you can test this with docvalue_fields in regular search without needing painless. Hitting it in painless is a nice bit of paranoia though.

@jdconrad
Copy link
Copy Markdown
Contributor Author

@nik9000 Thank you for the feedback! Left some questions in the comments. Also, do you know if there are any negative effects that could come from synthetic source or not having source at all? Would it just output no values?

@jdconrad
Copy link
Copy Markdown
Contributor Author

@javanna @jtibshirani Was wondering if either of you had any thoughts, suggestions, or concerns? I'd like to collect all the feedback before I make any further changes. Thanks!

@javanna
Copy link
Copy Markdown
Contributor

javanna commented Jun 21, 2022

I posted my quick thoughts yesterday. I think this change deserves more discussion, but it may very well be that this is my feeling as I was not involved when the decision was made to go down this route.

@jdconrad
Copy link
Copy Markdown
Contributor Author

@javanna Thanks for the feedback! Apologies as I missed your comment in the other issue.

@jdconrad jdconrad added the WIP label Jun 21, 2022
@jdconrad jdconrad removed the request for review from jtibshirani June 24, 2022 21:45
@jdconrad jdconrad requested a review from romseygeek June 24, 2022 21:45
@jdconrad jdconrad closed this Jun 24, 2022
@jdconrad
Copy link
Copy Markdown
Contributor Author

Github closed this on me when I was trying to fix the changelog. Will open a new issue with the updated design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Clients Meta label for clients team Team:Search Meta label for search team v8.4.0 WIP

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants