Switch painless dynamic calls to invokedynamic, remove perf hack/cheat#18201
Switch painless dynamic calls to invokedynamic, remove perf hack/cheat#18201rmuir wants to merge 10 commits intoelastic:masterfrom
Conversation
Remove performance hack for accessing a document's fields, its not needed. Add support for accessing is-getter methods like List.isEmpty() as .empty
|
I cleaned up and ran benchmarks: this is currently a script that runs on all of geonames and looks like this: I altered the way we access document fields and values to show perf for different dynamic language features:
The static version (with casts) is ~ |
| * Called when a new type is encountered (or, when we have encountered more than {@code MAX_DEPTH} | ||
| * types at this call site and given up on caching). | ||
| */ | ||
| public static Object fallback(InliningCacheCallSite callSite, Object[] args) throws Throwable { |
There was a problem hiding this comment.
This method can be private, as we only refer via methodhandle.
If compiler emits warning, add a @SuppressWarnings to it
|
Also I want to defer trying to "improve" the signature we create here. Currently we often know the real types of parameters, e.g. We can later refactor to improve that (e.g. all tests pass if i do this: rmuir@e029b9e but thats really hacky). |
|
This is phenomenal - nice work! |
++ |
|
Ho Robert, I have seen you made the suggested changes already. Thanks! To me this looks fine - I only reviewed the CallSite part. Looks great. I would also do the other optimizations in separate issues, like Array.getLength() (which may be slower than a real access to the length field in bytecode) to some static&private getters (see your TODO). The problem is that getting the array length is only a property in Java syntax, but behind the scenes all is a separate bytecode for array load/store/length: https://en.wikipedia.org/wiki/Java_bytecode_instruction_listings |
|
P.S.: Thanks for the fruitful discussions this weekend! The whole MethodHandle stuff is so great. I was happy to help and add suggestions. I will review the remaining code a bit later and provide some feedback if there are questions open. |
|
We are now running simple script search performance tests in the nightly classic benchmarks: https://benchmarks.elastic.co/index.html#search_qps_scripts It shows already a big speedup (I will add annots!) for painless dynamic from the last few improvements. |
|
Hi Robert, This code has a static method where you can pass an primitive or x-extends-Object array and you get a MethodHandle to retun its length. This should be faster than the reflective Array.getLength (because it is statically compiled and we have a separate method for every primitive type). Maybe we shold benchmark it. Basically it holds a static map array->type to methodhandle (dynamically build) for primitives and Object[], and if the actual type is not found there (e.g, String[]), it adds a necessary cast to Object[]. The code has test as main() method inline. |
|
Thanks Uwe, I will fold in your ArrayUtil class instead of calling the slow Array.getLength. |
|
LGTM. Thanks for doing this! |
| final ExtNodeMetadata sourceenmd = metadata.getExtNodeMetadata(source); | ||
| final List<ExpressionContext> arguments = source.arguments().expression(); | ||
|
|
||
| StringBuilder signature = new StringBuilder(); |
There was a problem hiding this comment.
I would build the signature with the ASM Type class or Java's MethodType class maybe... Bt this is also fine - I just hate those hardcoded stringified bytecode constants!
Suggestion: MethodType genericMethodType:
MethodType.genericMethodType(arguments.size() + 1 /*receiver*/).toMethodDescriptorString();
|
@uschindler for the array length, i want to defer that to another issue. I don't want us having possibility of slowness, but Array.getLength is an intrinsic method, it may not really be slow. We can benchmark it to see. |
|
@rmuir that's a great improvement. :) I think that besides looking at the median, it's also worth looking at the percentile distribution (esp. higher percentiles like 99%, 99.9%) to get a better overall impression. Just out of curiosity: Did you have a chance to have a look at these too? |
|
@danielmitterdorfer The nightly benchmark is just to explain what the differences mean at a high level, it is not really what I used to develop the changes. That benchmark is extremely noisy and only outputs the median. This isn't a micro-optimization, I didn't tune anything with any benchmark. A branch in the code + exact methodhandle call is night and day vs looking up a methodhandle from our whitelist and adapting it, and the situation is well studied and understood. Its just important to use correct datastructures for the problem (e.g. polymorphic cache) so that we don't call lookups over and over. |
|
@rmuir Oh, I didn't expect that you'd use the benchmark to tune anything (and I also know that they're quite noisy. I think this is partly system-inherent but we should (a) also ensure that we eliminate as much noise as possible on the platform where we run nightlies and (b) adopt approaches that help us to deal with noise (both topics are on my todo list)). Thanks for your explanation. |
|
The nightlies could use big improvement. I think traditionally effort has been on indexing performance and not query performance. All the queries are noisy and nothing is setup for e.g. a developer to iterate/explore with (means not reindexing every time you run the benchmark, and being reasonably fast). For scripts, we also only test a fictional search script to at least have something. But for example, we don't even have a fictional update script yet, forget about having any variety in the types of scripts we want to benchmark or anything like that. |
Yes, there should be a separate issue abbout the arrays and array-like structures. The arrayLoad and arrayStore methods in Def can be removed completely and similarly handled by invokeDynamic. This is especially important because the reflective Array.get/Array.set are slow like hell (see http://stackoverflow.com/questions/30306160/performance-of-java-lang-reflect-array, https://bugs.openjdk.java.net/browse/JDK-8051447). If we use invokeDyanmic here, too, the bootstrap method (maybe a different bootstrap method - not sure) can select the right MethodHandles correctly based on types. Partly it is already implemented for the map gets, this can be done in a similar way for the list get/sets and using MethodHandles.arrayElementGetter/Setter. This should bring a huge improvement for array accesses! |
|
+1 as a first step. LGTM |
|
Thanks @uschindler for reviewing. |
|
@rmuir thanks for your feedback. I have created elastic/rally#96 and elastic/rally#97. Feel free to create more tickets if you find more shortcomings or just approach me. I really appreciate your input. |
|
@danielmitterdorfer thanks for opening those issues! FYI I hope it didn't come across as a criticism of rally, just our current state, lots of room for improvement. |
|
@rmuir You're welcome. Your feedback did not come as criticism at all, quite the contrary. I know that we have a lot of room for improvement and I also have lots of ideas on how to improve the situation but I really value your feedback. |
Currently using any dynamic language features of painless is slow. We even have in docs recommending people add a lot of casts to work around this. The hack in #18169 speeds up access to a document's fields, but this is fragile and not a general solution.
This PR switches all dynamic calls, dynamic loads, and dynamic stores over to use invokedynamic with inline caching (cascaded, goes megamorphic at 5 types). I based it on this one (BSD license) https://github.com/qmx/jsr292-cookbook/blob/master/inlining-cache/src/jsr292/cookbook/icache/RT.java
We don't need to do anything with dynamic array load/store, its already a very simple 3-way decision tree and never involves a dynamic lookup. Same goes for the operators: i left that stuff as is.
I removed the instanceof-performance hack for accessing a document's fields, we don't need it anymore.
I cleaned up some getter/setter stuff too, we compute these at compile time. Also its now possible to get access to
isEmpty()-style methods as.empty, we need that to support the scripting api the way its documented for groovy at least.For the nightly benchmark script I see:
SEARCH expression (median): 0.271813 sec
SEARCH painless_static (median): 0.517333 sec
SEARCH painless_dynamic (median): 0.582639 sec
So we should clean up and simplify the documentation, I don't think the dynamic access is trappy anymore, and we shouldn't recommend people add a bunch of casts to their scripts to try to make them faster. If someone wants to eek out extra speed, instead they should use lucene expressions which bypasses the slow scripting API.