JEP draft: Unbiased Stack-Walk JVMTI extension API

Owner: Roman Kennke
Type: Feature
Scope: JDK
Status: Draft
Component: hotspot / jvmti
Effort: M
Duration: M
Created: 2026/03/30 13:45
Updated: 2026/03/31 10:37
Issue: 8381322

Summary

Provide an API for use by external tools to request and receive unbiased (aka asynchronous) stack-traces.

Goals

Non-Goals

Motivation

One of the great advantages of Java and the JVM is the vast ecosystem of tools that make developers' lives easier. Profilers are one such category of tools. The JVM comes with built-in facilities for profiling (JFR), and there are various external tools, both open source (e.g., async-profiler [0]) and commercial, that provide a wider and sometimes more useful set of features. One key feature that profiling solutions require from the JVM is the ability to obtain stack-traces. Stack-traces allow a profiling tool to tell the user exactly where in their code a potential performance bottleneck or other problem (e.g., cache-misses) is located, and how the execution of the program got there.

There are currently several ways for external profiling tools to obtain stack-traces, and all have drawbacks which make them an insufficient solution:

An example of what is possible using the new API is shown below. This is a flame-graph that represents ~27000 samples of cache-misses obtained using a small profiling agent running with one of the Renaissance benchmark workloads. The profiling agent obtains the samples by setting up a signal handler on Linux perf counter overflow on the hardware cache-misses counter, and requesting stack-traces whenever that signal fires.

[Figure: Flame-graph showing cache misses in a Renaissance workload]

Description

A new API is added as a JVMTI extension. Calling that API requests a stack-trace of the current thread or of a specified thread; the stack-trace is reported through the provided callback functions. The API has the following signature:

jvmtiError RequestStackTrace(jvmtiEnv* env, jthread* thread, void* ucontext, jvmtiStackTraceCallbacks callbacks, void* user_data)

Where the method arguments are:

The function returns an error code:

After calling the API, the JVM will call the agent-provided callback functions to report the stack-trace.

jvmtiStackTraceCallbacks is a struct that contains the callback functions:

/* Invoked when reporting of a stack-trace begins. */
typedef void (JNICALL *jvmtiBeginStackTraceCallback)
    (jthread thread,
     jboolean biased,
     void* user_data);
/* Invoked when reporting of a stack-trace ends. */
typedef void (JNICALL *jvmtiEndStackTraceCallback)
    (jthread thread,
     void* user_data);
/* Invoked to report a stack frame. */
typedef jint (JNICALL *jvmtiStackFrameCallback)
    (jvmtiFrameType frameType,
     jmethodID methodId,
     jlocation location,
     void* user_data);
/* Invoked to report a failure. */
typedef void (JNICALL *jvmtiStackTraceFailureCallback)
    (jthread thread,
     void* user_data);
typedef struct {
  jvmtiBeginStackTraceCallback beginStackTrace;
  jvmtiEndStackTraceCallback endStackTrace;
  jvmtiStackFrameCallback stackFrame;
  jvmtiStackTraceFailureCallback failure;
} jvmtiStackTraceCallbacks;
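As an illustration of how an agent might use these callbacks, the following minimal sketch collects the reported frames into a buffer that is passed through user_data. It is not part of the proposal. The JNI/JVMTI types are replaced by stand-ins so that the example compiles without jvmti.h (JNICALL is omitted for the same reason). TraceBuffer, Frame, MAX_FRAMES and the on_* function names are hypothetical, and so is the convention that a non-zero return value from the frame callback asks the JVM to stop reporting frames: the draft does not yet define the meaning of that return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for types that a real agent gets from <jvmti.h>. */
typedef void*         jthread;
typedef void*         jmethodID;
typedef int64_t       jlocation;
typedef int32_t       jint;
typedef unsigned char jboolean;
typedef int           jvmtiFrameType; /* hypothetical: values are not listed in the draft */

#define MAX_FRAMES 64 /* hypothetical per-sample limit */

typedef struct {
    jmethodID method;
    jlocation location;
} Frame;

typedef struct {
    Frame    frames[MAX_FRAMES];
    int      depth;    /* number of frames recorded so far */
    int      complete; /* set when the end callback has run */
    int      failed;   /* set when the failure callback has run */
    jboolean biased;   /* value passed to the begin callback */
} TraceBuffer;

/* Begin callback: reset the buffer for a new stack-trace. */
void on_begin(jthread thread, jboolean biased, void* user_data) {
    TraceBuffer* buf = (TraceBuffer*)user_data;
    (void)thread;
    assert(buf != NULL);
    buf->depth = 0;
    buf->complete = 0;
    buf->failed = 0;
    buf->biased = biased;
}

/* Frame callback: append one frame to the buffer. */
jint on_frame(jvmtiFrameType type, jmethodID method, jlocation location,
              void* user_data) {
    TraceBuffer* buf = (TraceBuffer*)user_data;
    (void)type;
    if (buf->depth >= MAX_FRAMES) {
        return 1; /* assumption: non-zero asks the JVM to stop reporting */
    }
    buf->frames[buf->depth].method = method;
    buf->frames[buf->depth].location = location;
    buf->depth++;
    return 0;
}

/* End callback: mark the stack-trace as complete. */
void on_end(jthread thread, void* user_data) {
    (void)thread;
    ((TraceBuffer*)user_data)->complete = 1;
}

/* Failure callback: mark the stack-trace as unusable. */
void on_failure(jthread thread, void* user_data) {
    (void)thread;
    ((TraceBuffer*)user_data)->failed = 1;
}
```

Passing the buffer through user_data keeps the callbacks free of global state, so one set of callbacks can serve several sampling sources.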

The functionality needs to be enabled before use by calling the following function:

jvmtiError EnableRequestStackTrace(jvmtiEnv* env)

This will typically be called from JVMTI's Agent_OnLoad to globally enable the functionality. However, it is also possible to call the function later.

The functionality can be disabled by calling the following function:

jvmtiError DisableRequestStackTrace(jvmtiEnv* env)

This could be called from JVMTI's Agent_OnUnload or at any earlier time to disable the functionality.
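Because the API is a JVMTI extension, an agent would presumably look up these functions at run time through JVMTI's GetExtensionFunctions, which returns an array of jvmtiExtensionFunctionInfo entries that each carry an identifier string (id) and a function pointer (func). The sketch below shows only the lookup step, under stated assumptions: ExtInfo and ExtFunc are stand-ins that mirror those two fields so that the example compiles without jvmti.h, the identifier string is hypothetical because the draft does not name the extension identifiers, and demo_enable is a placeholder function used for demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins mirroring the func and id fields of jvmtiExtensionFunctionInfo. */
typedef int (*ExtFunc)(void* env, ...);

typedef struct {
    ExtFunc     func;
    const char* id;
} ExtInfo;

/* Hypothetical identifier; the draft does not specify the real one. */
#define ENABLE_ID "com.example.EnableRequestStackTrace"

/* Returns the extension function registered under id, or NULL if absent. */
ExtFunc find_extension(const ExtInfo* exts, int count, const char* id) {
    assert(id != NULL);
    for (int i = 0; i < count; i++) {
        if (exts[i].id != NULL && strcmp(exts[i].id, id) == 0) {
            return exts[i].func;
        }
    }
    return NULL;
}

/* Placeholder standing in for the JVM-provided function. */
int demo_enable(void* env, ...) {
    (void)env;
    return 0; /* 0 corresponds to JVMTI_ERROR_NONE */
}
```

An agent would typically perform this lookup once, for example in Agent_OnLoad, and cache the returned pointers, since a signal handler should not search the extension table.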

Implementation

Much of the functionality that is required for this feature has already been implemented for JEP 509: JFR CPU-Time Profiling (Experimental). The mechanism for asynchronous stack-walking that has been implemented for the CPU-time sampler is generalized and re-used for the Unbiased Stack-Walk API.

In short, the stack-walker works as follows:

  1. The signal-handler (or any other trigger) calls into RequestStackTrace.
  2. RequestStackTrace records the thread's current PC, BCI and SP, and places a stack-walk-request with that information on a queue.
  3. It then arms the thread for safepoint-polling (aka handshaking).
  4. As soon as the thread arrives at the next safepoint-poll, it stops and starts processing all enqueued requests.
  5. For each request, the stack-walker fetches the PC, BCI and SP and reconstructs the top frame from them. Only the top frame (plus any frames inlined into it) needs to be reconstructed, never the frames below it, because a method return always runs into a safepoint poll.
  6. Once we have the top-frame, the thread walks the stack down by the usual mechanisms.
  7. While visiting frames, the stack-walker calls the agent-provided callback functions to report each frame.
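The steps above can be pictured with a small toy model. This is a single-threaded sketch and not HotSpot code: ToyThread, WalkRequest, FrameSink, QUEUE_CAPACITY and all function names are hypothetical, the real queue must be safe to use from a signal handler, and the walk over caller frames (step 6) is omitted, so only the recorded top frame of each request is reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 8 /* hypothetical bound on pending requests */

/* What the request records at the moment of the sample (step 2). */
typedef struct {
    uintptr_t pc;
    uintptr_t sp;
    int       bci;
} WalkRequest;

/* Stands in for the agent-provided frame callback. */
typedef void (*FrameSink)(uintptr_t pc, void* user_data);

typedef struct {
    WalkRequest queue[QUEUE_CAPACITY];
    int         count;      /* number of pending requests */
    int         poll_armed; /* non-zero once a safepoint poll is requested */
} ToyThread;

/* Steps 1-3: record the sample, enqueue it, arm the safepoint poll. */
int request_stack_trace(ToyThread* t, uintptr_t pc, uintptr_t sp, int bci) {
    assert(t != NULL);
    if (t->count == QUEUE_CAPACITY) {
        return -1; /* queue full: this sample is dropped */
    }
    t->queue[t->count++] = (WalkRequest){ pc, sp, bci };
    t->poll_armed = 1;
    return 0;
}

/* Steps 4-7: at the next poll, drain the queue and report frames.
   Returns the number of requests processed. */
int safepoint_poll(ToyThread* t, FrameSink sink, void* user_data) {
    int processed = 0;
    if (!t->poll_armed) {
        return 0;
    }
    for (int i = 0; i < t->count; i++) {
        /* A real walker would reconstruct the top frame from pc, bci and sp
           here and then continue down to the caller frames. */
        sink(t->queue[i].pc, user_data);
        processed++;
    }
    t->count = 0;
    t->poll_armed = 0;
    return processed;
}

/* Example sink: counts the reported frames. */
void count_sink(uintptr_t pc, void* user_data) {
    (void)pc;
    ++*(int*)user_data;
}
```

The model shows why the sample is unbiased even though the walk happens later: the PC, BCI and SP are captured at the moment of the request, and the safepoint poll only determines when the reporting work is done.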

Alternatives

The following alternatives have been considered:

Testing

Risks and Assumptions

TBD