Add fielddata accessors (.value/.values/.distance()/etc)#18169
Add fielddata accessors (.value/.values/.distance()/etc)#18169rmuir wants to merge 3 commits intoelastic:masterfrom
Conversation
| copyStruct("Strings", "List<String>", "Collection<String>", "Object"); | ||
| copyStruct("Longs", "List<Object>", "Collection<Object>", "Object"); | ||
| copyStruct("Doubles", "List<Object>", "Collection<Object>", "Object"); | ||
| copyStruct("GeoPoints", "List<Object>", "Collection<Object>", "Object"); |
There was a problem hiding this comment.
I think for Longs, Doubles, and GeoPoints it's safer to use List (List < def > ) so as an example Longs would be
copyStruct("Longs", "List", "Collection", "Object");
The reason being is that if anyone ever casts to one of these then we probably want to be able to handle the Object coming out of it without requiring a second cast.
((Longs)longField).get(0) is now a def type instead of an Object type. Really rare that this would ever happen, but I think for consistency this could be important.
|
Looks great! Just one comment. |
|
LGMT, awesome speedup! |
|
@jdconrad i cut these over to |
|
LGTM. Thanks! |
|
thanks for reviewing. there is a lot to followup with for this one, I will take care of it. |
|
@rmuir should i move these properties/methods (https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-groovy.html#_doc_value_properties_and_methods) back to the general scripting section instead of leaving them in groovy? or perhaps add them to the painless docs? |
|
@clintongormley I don't know what to do here! These properties/methods, lets call it "the scripting api", is really exposed to all script engines via Java. So today groovy, javascript, python, and painless support it. Expressions basically exposes what looks like a subset of this API to users, but its a lie: it in fact bypasses the whole thing entirely, and the expressions API is inconsistent in random annoying ways: e.g. On one hand, I think painless needs to support this API, to get people off of groovy. On the other hand, I'm not sure if we want to declare it front-and-center "the scripting api", because its not very good. I don't mean this in an offensive way, the API is both slow and bad, its just a fact. Expressions is 2x faster than anything else because it bypasses it completely. One reason it is cumbersome and slow is that it exposes a document's fields as always being multivalued lists, but its not clear to me user's even care about multi-valuedness whatsoever: (4ddf916) There are also tons and tons of geo functions, but many of these look very obscure. I think its ok to continue supporting them, to not break anyones scripts, but do we really have to document all the obscure stuff? I think it makes things overwhelming. An easy win might be to try to drop some of these obscure methods from the documentation. I think all the multi-valued methods (e.g. |
|
I really like the proposal to drop the more obscure stuff other than as a footnote. On the differences in API between Expressions and others -- maybe we can deprecate the ones that don't match and add ones that do match? |
I do understand that we need to provide an API which is similar in it's functionality but putting ourself into the position of building painless on top of an already known problem sounds weird to me. I think we can build a new API that allows for good performance based on a subset of the current API and deprecate the current one. All other scripting languages can potentially have access to both such that folks can move gradually but we can build a new and fast API? Does this make sense and / or is this feasible at all? |
Currently fields in painless are recommended to be accessed as
input.doc['field'].0.Instead we should work closer to the current syntax
input.doc['field'].value. To do this, we have to add ScriptDocValues to the whitelist so we can see itsgetValue()method.This has other benefits, it means we expose more of the scripting api (see https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-groovy.html#_doc_value_properties_and_methods)
So
input.doc['field_name'].latanddoc['field_name'].arcDistance(34, 56)work too and so on, and geo fields become usable.I TODO'd whitelisting any joda-time stuff to be a separate change.
Field loads for this common
.valuecase with some temporary cheating. This is a 467% performance improvement for scripts accessing document fields from painless, so I think we should do it for now. We can do fancy stuff later and remove this, but it means at least common use cases are much faster.I cutover the integration tests and docs to use this syntax.