Optimize strcmp calls for VM stackwalker#1662
Conversation
| if (ptr2 == n) return n != nullptr; | ||
|
|
||
| if (strcmp(n, "nmethod") == 0) { | ||
| ptr1 = n; |
There was a problem hiding this comment.
Initially I thought about using something like
__atomic_store_n(&ptr, n, __ATOMIC_RELAXED);but I realized the assembly is the same for both clang and gcc, on x86 and ARM.
apangin
left a comment
There was a problem hiding this comment.
TBH, 10% improvement looks unrealistic to me. I believe stack walking performance should be dominated by NMethod lookup, ScopeDesc parsing and jmethodID retrieval. It is worth validating measurement methodology.
In any case, I think this could be implemented better (if necessary). In particular, proposed PR only caches positive lookups, but once the canonical name is obtained it can be used to return both positives and negatives.
| if (ptr == n) return true; | ||
| if (strncmp(n, "StubRoutines", 12) == 0) { | ||
| // There might be multiple pointers in use for this name, | ||
| // we just keep the last one seen as an optimization |
There was a problem hiding this comment.
This is an anti-optimization.
If two or more threads have frames from different StubRoutines sections, they will start overwriting this variable over and over again, causing lots of CPU cache invalidations. Furthermore, this may slow down access to unrelated neighbor variables due to false sharing. FYI, there was a notorious JDK issue of the same kind - this bug is a great source of wisdom :)
There was a problem hiding this comment.
I see, my thinking here was that since we don't impose any strong synchronization mechanism the scenario you describe wouldn't be so bad, but that would require some measurements of course. It's probably best to leave this out.
|
Thank you for the review and for finding this this optimization opportunity. |
Description
I noticed
StackWalker::walkVMcallsstrcmpseveral times. We could save the lastchar*which matched thestrcmpas an optimization.The following table shows the value of
stackwalk_ns_avgonmasterfor several benchmarks in Renaissance, and how much it is reduced by this PR:mastersave-ptrRelated issues
n/a
Motivation and context
It's likely that the
char*refers to a string literal, and thus won't change throughout the whole lifetime of the program.How has this been tested?
There might be some noise as I ran all the benchmarks at the same time in Docker Compose. Setup here.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.