Skip to content

Add fielddata accessors (.value/.values/.distance()/etc)#18169

Closed
rmuir wants to merge 3 commits intoelastic:masterfrom
rmuir:expand_painless
Closed

Add fielddata accessors (.value/.values/.distance()/etc)#18169
rmuir wants to merge 3 commits intoelastic:masterfrom
rmuir:expand_painless

Conversation

@rmuir
Copy link
Copy Markdown
Contributor

@rmuir rmuir commented May 5, 2016

Currently fields in painless are recommended to be accessed as input.doc['field'].0.

Instead we should work closer to the current syntax input.doc['field'].value. To do this, we have to add ScriptDocValues to the whitelist so we can see its getValue() method.

This has other benefits, it means we expose more of the scripting api (see https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-groovy.html#_doc_value_properties_and_methods)

So input.doc['field_name'].lat and doc['field_name'].arcDistance(34, 56) work too and so on, and geo fields become usable.

I TODO'd whitelisting any joda-time stuff to be a separate change.

Field loads for this common .value case with some temporary cheating. This is a 467% performance improvement for scripts accessing document fields from painless, so I think we should do it for now. We can do fancy stuff later and remove this, but it means at least common use cases are much faster.

I cutover the integration tests and docs to use this syntax.

copyStruct("Strings", "List<String>", "Collection<String>", "Object");
copyStruct("Longs", "List<Object>", "Collection<Object>", "Object");
copyStruct("Doubles", "List<Object>", "Collection<Object>", "Object");
copyStruct("GeoPoints", "List<Object>", "Collection<Object>", "Object");
Copy link
Copy Markdown
Contributor

@jdconrad jdconrad May 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for Longs, Doubles, and GeoPoints it's safer to use List (List < def > ) so as an example Longs would be

copyStruct("Longs", "List", "Collection", "Object");

The reason being is that if anyone ever casts to one of these then we probably want to be able to handle the Object coming out of it without requiring a second cast.

((Longs)longField).get(0) is now a def type instead of an Object type. Really rare that this would ever happen, but I think for consistency this could be important.

@jdconrad
Copy link
Copy Markdown
Contributor

jdconrad commented May 5, 2016

Looks great! Just one comment.

@rjernst
Copy link
Copy Markdown
Member

rjernst commented May 5, 2016

LGMT, awesome speedup!

@rmuir
Copy link
Copy Markdown
Contributor Author

rmuir commented May 5, 2016

@jdconrad i cut these over to <Def>

@jdconrad
Copy link
Copy Markdown
Contributor

jdconrad commented May 5, 2016

LGTM. Thanks!

@rmuir
Copy link
Copy Markdown
Contributor Author

rmuir commented May 5, 2016

thanks for reviewing. there is a lot to followup with for this one, I will take care of it.

@rmuir rmuir closed this in e3ce6c9 May 5, 2016
@jdconrad jdconrad mentioned this pull request May 6, 2016
18 tasks
@clintongormley
Copy link
Copy Markdown
Contributor

@rmuir should i move these properties/methods (https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-groovy.html#_doc_value_properties_and_methods) back to the general scripting section instead of leaving them in groovy? or perhaps add them to the painless docs?

@rmuir
Copy link
Copy Markdown
Contributor Author

rmuir commented May 6, 2016

@clintongormley I don't know what to do here!

These properties/methods, lets call it "the scripting api", is really exposed to all script engines via Java. So today groovy, javascript, python, and painless support it.

Expressions basically exposes what looks like a subset of this API to users, but its a lie: it in fact bypasses the whole thing entirely, and the expressions API is inconsistent in random annoying ways: e.g. doc['field'].count() vs doc['field'].size(). So not really a proper subset :(

On one hand, I think painless needs to support this API, to get people off of groovy. On the other hand, I'm not sure if we want to declare it front-and-center "the scripting api", because its not very good. I don't mean this in an offensive way, the API is both slow and bad, its just a fact. Expressions is 2x faster than anything else because it bypasses it completely.

One reason it is cumbersome and slow is that it exposes a document's fields as always being multivalued lists, but its not clear to me user's even care about multi-valuedness whatsoever: (4ddf916)

There are also tons and tons of geo functions, but many of these look very obscure. I think its ok to continue supporting them, to not break anyones scripts, but do we really have to document all the obscure stuff?

I think it makes things overwhelming. An easy win might be to try to drop some of these obscure methods from the documentation. I think all the multi-valued methods (e.g. .values, .lats, .lons) should probably be dropped from the documentation too. It would be better to instead have a footnote at the bottom that says "each field is really a list so to get access to the other values you can use .get(1)/.size()/etc"

@jdconrad
Copy link
Copy Markdown
Contributor

jdconrad commented May 6, 2016

I really like the proposal to drop the more obscure stuff other than as a footnote. On the differences in API between Expressions and others -- maybe we can deprecate the ones that don't match and add ones that do match?

@s1monw
Copy link
Copy Markdown
Contributor

s1monw commented May 9, 2016

On one hand, I think painless needs to support this API, to get people off of groovy. On the other hand, I'm not sure if we want to declare it front-and-center "the scripting api", because its not very good. I don't mean this in an offensive way, the API is both slow and bad, its just a fact. Expressions is 2x faster than anything else because it bypasses it completely.

I do understand that we need to provide an API which is similar in it's functionality but putting ourself into the position of building painless on top of an already known problem sounds weird to me. I think we can build a new API that allows for good performance based on a subset of the current API and deprecate the current one. All other scripting languages can potentially have access to both such that folks can move gradually but we can build a new and fast API? Does this make sense and / or is this feasible at all?

@clintongormley clintongormley changed the title Painless: add fielddata accessors (.value/.values/.distance()/etc) Add fielddata accessors (.value/.values/.distance()/etc) May 17, 2016
@clintongormley clintongormley added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Plugin Lang Painless labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement v5.0.0-alpha3

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants