Skip to content

Pass SearchLookup through to fielddataBuilder#58527

Closed
javanna wants to merge 1 commit intoelastic:masterfrom
javanna:enhancement/fielddata_context
Closed

Pass SearchLookup through to fielddataBuilder#58527
javanna wants to merge 1 commit intoelastic:masterfrom
javanna:enhancement/fielddata_context

Conversation

@javanna
Copy link
Copy Markdown
Contributor

@javanna javanna commented Jun 25, 2020

I am opening this PR as an early draft to start the discussion on this. I don't expect this to be merged as-is as I am not incredibly happy with the proposed solution and I expect concerns to be raised around it, but it should help understanding the problem we need to solve.

Scripted runtime fields will have different fielddata implementations (depending on their runtime type) which will need access to SearchLookup to read from doc_values, source or stored fields of existing fields (note that doc_values will also return values of other runtime fields).

A script will be executed for each document in order to determine the value of the runtime field, and the script needs the SearchLookup in its context.

With this commit we add an argument to MappedFieldType#fielddataBuilder that allows us to pass through the SearchLookup and make it available when fielddata is being built/loaded.

Scripted runtime fields will have different fielddata implementations (depending on their runtime type) which will need access to SearchLookup to read from doc_values, source or stored fields of existing fields.

A script will be executed for each document in order to determine the value of the runtime field, and the script needs the SearchLookup in its context.

With this commit we add an argument to MappedFieldType#fielddataBuilder that allows us to pass through the SearchLookup and make it available when fielddata is being built/loaded.
@javanna javanna added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types v8.0.0 v7.9.0 labels Jun 25, 2020
Copy link
Copy Markdown
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some questions but the change makes sense to me. It seems useful to be able to access the context of the document when creating a field data for a specific field.

* when this error occurs in a request. see: {@link org.elasticsearch.ExceptionsHelper#status}
*/
public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName, FieldDataContext context) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a Supplier<SearchLookup> ? Why not pass the SearchLookup directly or null if not applicable (for index sorting for instance) ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder how we can protect against self-reference ? It should be forbidden to call getForField on the same field name since this would create an infinite loop ? Maybe a wrapper around SearchLookup is enough but this would only detect it at runtime, curious to hear @nik9000 thoughts on this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed it could just be a supplier. I tried the context idea because it gives room to add more in the future, and also because I was initially thinking of just an interface and make query shard context implement it but I changed approach on the way. I think supplier would be fine as well and we can always change it if we need something different. On the other hand, there was another recent request to pass shard info through, so maybe we should take that one into account? see #57231 .

One alternative we could consider, that is less invasive but also less visible, would be to not change the signature of fielddataBuilder and rather augment the fielddata that gets loaded only when needed.

Recursion is indeed something to think about. I am not sure if it can be checked at compile time and whether such check belongs in this PR, it could also be up to the field data impl once the script is known to check for self references.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we know enough about how this'll be used to know how to deal with recursive loops.

I do like the idea of a wrapper to detect the loops though. That was other parts of the code don't have to deal with the overhead. I'm fairly sure that we can't know up front what fields a script will try to access without some interesting changes either in the script infrastructure or the way users write scripts or both. But I don't think we know enough yet. I'd certainly be willing to play with it some in the next few days though.

I'm comfortable adding this without any explicit loop detection for now and figuring that out as we go.

Also, I would have suggested the Supplier route, but given #57231 maybe we should just add the int shardId to the interface? It won't be functional any more, but that isn't too bad.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am concerned because the pr targets master. I am totally fine leaving the detect recursive loop to later as long as this code lives in a branch in the meantime. @javanna why not targeting the scripted field branch ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like passing the lookup here because some fields will be able to know up front which fields they need so it'd be nice to be able to snag them early. Or at least validate that they are snag-able.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not targeting the scripted field branch

Ideally the branch has changes only related to the plugin that it adds. I thought that the need is clear and I am not so keen on having the branch grow big in terms of infra that is needed to wire dependencies. In short, we could have hacked this in the branch but I prefer to discuss and tackle this specific problem sooner rather than later. I also did not think that we would want to tackle recursion straight-away, but if we are seeing that we need to know more before we make this change, we can make it to the feature branch for now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like passing the lookup here because some fields will be able to know up front which fields they need so it'd be nice to be able to snag them early. Or at least validate that they are snag-able.

@nik9000 I don't follow, can you expand?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 I don't follow, can you expand?

Sure! I think that script-based fields won't know up front how they are going to use the lookup, but that a, say, a runtime field that runs an ml model will know which fields it needs from the SearchLookup. It could call getDoc().getForField(blah) up front, even failing if the field that it needs doesn't exist.

public boolean isAggregatable() {
try {
fielddataBuilder("");
fielddataBuilder("", () -> null);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit deserves a TODO or, well, something. Because it ain't gonna work properly for anything that needs the context for real.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only to see if the method throws error because the field is not aggregatable, when would the null lookup matter? Btw this method will most likely need to be overridden by the runtime fields mapped filed type to return true all the time.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I do think that runtime fields will just want to override this to true. Or something like that. I'm fine with the change as it stands.

* when this error occurs in a request. see: {@link org.elasticsearch.ExceptionsHelper#status}
*/
public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName, FieldDataContext context) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like passing the lookup here because some fields will be able to know up front which fields they need so it'd be nice to be able to snag them early. Or at least validate that they are snag-able.

@javanna
Copy link
Copy Markdown
Contributor Author

javanna commented Jun 28, 2020

Thanks for the discussion @jimczi and @nik9000 . I will close this and reopen once we know enough to work on protection against recursion. Note that the change I am going to make in the feature branch will have the goal to limit merge conflicts hence it will not modify the fielddataBuilder method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types v7.9.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants