Skip to content

Optimize strcmp calls for VM stackwalker#1662

Closed
fandreuz wants to merge 3 commits into
async-profiler:masterfrom
fandreuz:save-ptr
Closed

Optimize strcmp calls for VM stackwalker#1662
fandreuz wants to merge 3 commits into
async-profiler:masterfrom
fandreuz:save-ptr

Conversation

@fandreuz

@fandreuz fandreuz commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

Description

I noticed StackWalker::walkVM calls strcmp several times. We could save the last char* which matched the strcmp as an optimization.

The following table shows the value of stackwalk_ns_avg on master for several benchmarks in Renaissance, and how much it is reduced by this PR:

Bench master save-ptr % reduction
akka-uct 19645 17634 10.24
chi-square 19530 18501 5.27
future-genetic 22795 22750 0.20
mnemonics 38996 34033 12.73
page-rank 20354 19890 2.28
scala-kmeans 22907 19887 13.18

Related issues

n/a

Motivation and context

It's likely that the char* refers to a string literal, and thus won't change throughout the whole lifetime of the program.

Thread 2 "java" hit Breakpoint 1.2, CodeBlob::CodeBlob (this=this@entry=0x7fee1b49a408, name=name@entry=0x7fee336461ab "Interpreter", kind=kind@entry=CodeBlobKind::Buffer, size=size@entry=1109016, 
    header_size=header_size@entry=80) at /jdk/src/hotspot/share/code/codeBlob.cpp:169
169     CodeBlob::CodeBlob(const char* name, CodeBlobKind kind, int size, uint16_t header_size) :
(gdb) bt
#0  CodeBlob::CodeBlob (this=this@entry=0x7fee1b49a408, name=name@entry=0x7fee336461ab "Interpreter", kind=kind@entry=CodeBlobKind::Buffer, size=size@entry=1109016, header_size=header_size@entry=80)
    at /jdk/src/hotspot/share/code/codeBlob.cpp:169
#1  0x00007fee320c6a42 in RuntimeBlob::RuntimeBlob (header_size=80, size=1109016, kind=CodeBlobKind::Buffer, name=0x7fee336461ab "Interpreter", this=0x7fee1b49a408)
    at /jdk/src/hotspot/share/code/codeBlob.hpp:332
#2  BufferBlob::BufferBlob (header_size=80, size=1109016, kind=CodeBlobKind::Buffer, name=0x7fee336461ab "Interpreter", this=0x7fee1b49a408) at /jdk/src/hotspot/share/code/codeBlob.cpp:392
#3  BufferBlob::create (name=name@entry=0x7fee336461ab "Interpreter", buffer_size=buffer_size@entry=1108928) at /jdk/src/hotspot/share/code/codeBlob.cpp:406
#4  0x00007fee331fab0c in StubQueue::StubQueue (this=this@entry=0x7fee2c1bd830, stub_interface=0x7fee2c1bd710, buffer_size=buffer_size@entry=1108928, lock=lock@entry=0x0, 
    name=name@entry=0x7fee336461ab "Interpreter") at /jdk/src/hotspot/share/code/stubs.cpp:70
#5  0x00007fee33269bc6 in TemplateInterpreter::initialize_stub () at /jdk/src/hotspot/share/interpreter/templateInterpreter.cpp:55

[...]

  _code = new StubQueue(new InterpreterCodeletInterface, code_size + max_aligned_bytes, nullptr,
                        "Interpreter");
#0  CodeBlob::CodeBlob (this=this@entry=0x7f97a3cf3008, name=name@entry=0x7f97bd77b3d2 "native nmethod", kind=kind@entry=CodeBlobKind::Nmethod, cb=cb@entry=0x7f97bb512ad0, size=1392, 
    header_size=header_size@entry=240, frame_complete_offset=47, frame_size=10, oop_maps=0x7f97b402d108, caller_must_gc_arguments=false, mutable_data_size=48) at /jdk/src/hotspot/share/code/codeBlob.cpp:125
#1  0x00007f97bcd6da4a in nmethod::nmethod (this=0x7f97a3cf3008, method=0xe235d88, type=<optimized out>, nmethod_size=<optimized out>, compile_id=<optimized out>, offsets=0x7f97bb511bf0, 
    code_buffer=<optimized out>, frame_size=<optimized out>, basic_lock_owner_sp_offset=<optimized out>, basic_lock_sp_offset=<optimized out>, oop_maps=<optimized out>, mutable_data_size=<optimized out>)
    at /jdk/src/hotspot/share/code/nmethod.cpp:1280
#2  0x00007f97bcd6e40f in nmethod::new_native_nmethod (method=..., compile_id=compile_id@entry=5, code_buffer=0x7f97bb512ad0, vep_offset=<optimized out>, frame_complete=frame_complete@entry=47, 
    frame_size=<optimized out>, basic_lock_owner_sp_offset=<optimized out>, basic_lock_sp_offset=<optimized out>, oop_maps=<optimized out>, exception_handler=<optimized out>)
    at /jdk/src/hotspot/share/code/nmethod.cpp:1102

[...]

// For native wrappers
nmethod::nmethod(
  Method* method,
  CompilerType type,
  int nmethod_size,
  int compile_id,
  CodeOffsets* offsets,
  CodeBuffer* code_buffer,
  int frame_size,
  ByteSize basic_lock_owner_sp_offset,
  ByteSize basic_lock_sp_offset,
  OopMapSet* oop_maps,
  int mutable_data_size)
  : CodeBlob("native nmethod", CodeBlobKind::Nmethod, code_buffer, nmethod_size, sizeof(nmethod),
             offsets->value(CodeOffsets::Frame_Complete), frame_size, oop_maps, false, mutable_data_size),
VtableBlob* blob = VtableBlob::create("vtable chunks", bytes);
#define DEFINE_BLOB_INIT_METHOD(blob_name)                              \
  void StubRoutines::initialize_ ## blob_name ## _stubs() {             \
    if (STUBGEN_BLOB_FIELD_NAME(blob_name) == nullptr) {                \
      BlobId blob_id = BlobId:: JOIN3(stubgen, blob_name, id);          \
      int size = _ ## blob_name ## _code_size;                          \
      int max_aligned_size = 10;                                        \
      const char* timer_msg = "StubRoutines generation " # blob_name " stubs"; \
->    const char* name = "StubRoutines (" # blob_name " stubs)";        \
      const char* assert_msg = "_" # blob_name "_code_size";            \
      STUBGEN_BLOB_FIELD_NAME(blob_name) =                              \
        initialize_stubs(blob_id, size, max_aligned_size, timer_msg,    \
                         name, assert_msg);                             \
    }                                                                   \
  }

How has this been tested?

There might be some noise as I ran all the benchmarks at the same time in Docker Compose. Setup here.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Comment thread src/vmStructs.h
if (ptr2 == n) return n != nullptr;

if (strcmp(n, "nmethod") == 0) {
ptr1 = n;

@fandreuz fandreuz Jan 20, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I thought about using something like

__atomic_store_n(&ptr, n, __ATOMIC_RELAXED);

but I realized the assembly is the same for both clang and gcc, on x86 and ARM.

Comment thread src/vmStructs.h Outdated
Comment thread src/vmStructs.h Outdated
Comment thread src/vmStructs.h Outdated
Comment thread src/vmStructs.h Outdated

@apangin apangin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, 10% improvement looks unrealistic to me. I believe stack walking performance should be dominated by NMethod lookup, ScopeDesc parsing and jmethodID retrieval. It is worth validating measurement methodology.

In any case, I think this could be implemented better (if necessary). In particular, proposed PR only caches positive lookups, but once the canonical name is obtained it can be used to return both positives and negatives.

Comment thread src/vmStructs.h Outdated
if (ptr == n) return true;
if (strncmp(n, "StubRoutines", 12) == 0) {
// There might be multiple pointers in use for this name,
// we just keep the last one seen as an optimization

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an anti-optimization.
If two or more threads have frames from different StubRoutines sections, they will start overwriting this variable over and over again, causing lots of CPU cache invalidations. Furthermore, this may slow down access to unrelated neighbor variables due to false sharing. FYI, there was a notorious JDK issue of the same kind - this bug is a great source of wisdom :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, my thinking here was that since we don't impose any strong synchronization mechanism the scenario you describe wouldn't be so bad, but that would require some measurements of course. It's probably best to leave this out.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apangin

apangin commented Mar 18, 2026

Copy link
Copy Markdown
Member

I propose an alternative implementation of this optimization: #1701
@fandreuz wdyt?

@fandreuz

Copy link
Copy Markdown
Contributor Author

I propose an alternative implementation of this optimization: #1701 @fandreuz wdyt?

Looks good!

@apangin

apangin commented Mar 18, 2026

Copy link
Copy Markdown
Member

Thank you for the review and for finding this this optimization opportunity.

@apangin apangin closed this Mar 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants