JEP Draft: Unbiased Stack-Walk JFR event trigger
| Owner | Roman Kennke |
| Type | Feature |
| Scope | JDK |
| Status | Draft |
| Component | hotspot / jvmti |
| Effort | M |
| Created | 2026/03/17 12:49 |
| Updated | 2026/03/30 13:25 |
| Issue | 8380294 |
Summary
Provide an API for use by external tools to request and receive unbiased (aka asynchronous) stack-traces.
Goals
- Provide a JVMTI extension API to allow external tools to obtain a stack-trace for a given Java thread.
- The stack-trace is delivered through JFR as an event emitted in an ongoing recording (if any).
- The API is safe to call from signal handlers, so that profiling tools can invoke it from profiling signals, e.g., perf counter overflow or CPU timer signals.
- The API reports information about Java frames, e.g., method/class-name, byte-code-index.
- The API can obtain stack information that is free from safepoint bias.
Non-Goals
- It is currently not a goal to report native frames. (This feature may be added in a follow-up improvement.)
Motivation
One of the great advantages of Java and the JVM is its vast ecosystem of tools that make the lives of developers easier. One such category of tools is profilers. The JVM comes with built-in facilities for profiling (JFR), and there are various external tools, both open source (e.g., async-profiler [0]) and commercial, that provide a wider and sometimes more useful set of features. One key feature that profiling solutions require from the JVM is the ability to obtain stack-traces. This allows profiling tools to tell the user exactly where potential performance bottlenecks or various other problems (e.g., cache-misses) may be located in their code, and how the execution of the program got there.
There are currently several ways for external profiling tools to obtain stack-traces, and all have drawbacks which make them an insufficient solution:
- There is a family of official JVMTI APIs to get stack-traces, namely GetStackTrace, GetAllStackTraces and GetThreadListStackTraces. Those functions are fundamentally not signal-handler-safe: they may allocate, they transition the calling thread from native to VM (which may block), and they perform a handshake or even bring all threads to a safepoint and wait for that to finish. The stack-traces are only obtained when the thread arrives at a safepoint, which leads to the so-called safepoint bias. Safepoint bias is a problem because it skews profiling results such that they only ever point to the safepoints (e.g., method call sites, loop back-edges, etc.) as hot areas, even when the real problem is somewhere else.
- There is an unofficial API that is used by many profiling tools called AsyncGetCallTrace. That API is a JVM-internal API that is not exposed in a header or through any other means. While it is signal-handler-safe and avoids the safepoint bias problem, it has limitations that still make it insufficient. It walks the stack outside of a safepoint, but uses functions that were designed to be called only at a safepoint, which can potentially crash the JVM. Also, frames are reported as jmethodIDs. Tools need to collect them in a signal handler, but use them later, outside of the signal handler (to avoid allocations and VM transitions while trying to resolve the jmethodIDs). This is undefined behaviour per the JVMTI spec and can lead to crashes, e.g., when the method/class has been unloaded by the time the jmethodID is used. Both HotSpot and the external tools are trying hard to avoid that problem, but they can only ever make it 'very unlikely', which is not sufficient for a stable profiling solution.
- Some profiling tools hook into vmStructs directly to walk the stack by their own means. This is inherently dangerous and unspecified, and can change with every JVM release. Crashing is almost the best scenario for this 'solution': it can also lead to 'silent', more subtle failures, which could be more catastrophic and harder to debug than crashes.
An example of what is possible using the new API is shown below. This is a flame-graph that represents ~27000 samples of cache-misses obtained using a small profiling agent running with one of the Renaissance benchmark workloads. The profiling agent obtains the samples by setting up a signal handler on Linux perf counter overflow on the hardware cache-misses counter, and requesting stack-traces whenever that signal fires.
Description
A new API is added as a JVMTI extension. Calling that API requests a stack-trace from the current thread to be reported via JFR by emitting a StackTraceRequest event. The API has the following signature:
jvmtiError RequestStackTrace(jvmtiEnv* env, jthread* thread, void* ucontext, jlong user_data)
Where the method arguments are:
- thread: the Java thread for which a stack-trace is requested. Accepts NULL for the current thread. For threads other than the current thread, the stack-trace will be biased.
- ucontext: the thread context (e.g., as passed from POSIX signal-handlers). Accepts NULL (e.g., when not available, or when calling from a system that doesn't pass thread context). When NULL is passed, the stack-trace will be biased.
- user_data: arbitrary data passed by the caller. The data will be reported back in the JFR event. This can typically be used by the profiling agent to associate the stack-trace as reported by JFR with the original event (e.g., a cache-miss-counter-overflow) that triggered the stack-trace.
The function returns an error code:
- JVMTI_ERROR_NOT_AVAILABLE when the functionality is not available (e.g., due to JFR not being present)
- JVMTI_ERROR_INVALID_THREAD when the passed-in thread is invalid
- JVMTI_ERROR_NONE if the call succeeded
After calling the API, JFR will emit an event that is specified as follows:
- Name: StackTraceRequest, an experimental event
- Field: stackTrace of type StackTrace - the stack-trace
- Field: eventThread of type Thread - the thread
- Field: userData of type jlong - the user data that has been passed into RequestStackTrace
- Field: failed of type boolean - true if obtaining a stack-trace failed for some reason, false otherwise
- Field: biased of type boolean - true if the stack-trace is biased towards a safepoint, false otherwise
The functionality needs to be enabled before use by calling the following function:
jvmtiError EnableRequestStackTrace(jvmtiEnv* env)
This will typically be called from JVMTI's Agent_OnLoad to globally enable the functionality. However, it is also possible to call the function later. Notice that in order to use the functionality, one would also have to start a JFR recording.
The functionality can be disabled by calling the following function:
jvmtiError DisableRequestStackTrace(jvmtiEnv* env)
This could be called from JVMTI's Agent_OnUnload or at any earlier time to disable the functionality.
Implementation
Much of the functionality that is required for this feature has already been implemented for JEP 509: JFR CPU-Time Profiling (Experimental). The mechanism for asynchronous stack-walking that has been implemented for the CPU-time sampler is generalized and re-used for the Unbiased Stack-Walk API.
In short, the stack-walker works as follows:
- The signal-handler (or any other trigger) calls into RequestStackTrace.
- RequestStackTrace records the thread's current PC, BCI and SP, and places a stack-walk-request with that information on a queue.
- It then arms the thread for safepoint-polling (aka handshaking).
- As soon as the thread arrives at the next safepoint-poll, it stops and starts processing all enqueued requests.
- For each request, the stack-walker fetches the PC, BCI and SP, and reconstructs the top frame information from that. Notice that we only need to reconstruct the top frame (plus possibly inlined frames), but never any frames below that, because method returns would always run into a safepoint poll.
- Once we have the top-frame, the thread walks the stack down by the usual mechanisms.
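The queueing steps above can be illustrated with a minimal, self-contained sketch in C. It is not the HotSpot implementation: the names `walk_request`, `request_queue`, `enqueue_request`, `drain_requests` and the capacity of 8 are hypothetical, and the `poll_armed` flag merely stands in for arming the thread's safepoint poll. The sketch assumes one queue per thread, filled from a signal handler running on that same thread and drained by that thread when it reaches its poll.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_CAPACITY 8u  /* hypothetical size; must be a power of two */
static_assert((QUEUE_CAPACITY & (QUEUE_CAPACITY - 1u)) == 0,
              "capacity must be a power of two");

/* One pending request: the sampled top-frame state plus the caller's data. */
typedef struct {
    uintptr_t pc;
    uintptr_t sp;
    int32_t   bci;
    int64_t   user_data;
} walk_request;

/* Per-thread queue (assumption: the real data structure may differ). */
typedef struct {
    walk_request slots[QUEUE_CAPACITY];
    atomic_uint  head;        /* next slot to drain */
    atomic_uint  tail;        /* next slot to fill */
    atomic_bool  poll_armed;  /* stand-in for arming the safepoint poll */
    atomic_uint  dropped;     /* requests rejected because the queue was full */
} request_queue;

/* Producer side, as it might run inside a signal handler: no allocation,
 * no locks, only stores into preallocated memory. */
static bool enqueue_request(request_queue* q, uintptr_t pc, uintptr_t sp,
                            int32_t bci, int64_t user_data) {
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_CAPACITY) {
        atomic_fetch_add_explicit(&q->dropped, 1u, memory_order_relaxed);
        return false;
    }
    q->slots[tail % QUEUE_CAPACITY] = (walk_request){ pc, sp, bci, user_data };
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    atomic_store_explicit(&q->poll_armed, true, memory_order_release);
    return true;
}

typedef void (*walk_fn)(const walk_request* req, void* arg);

/* Consumer side, as it might run when the thread reaches its poll.
 * The flag is cleared first, so a request that arrives while draining
 * re-arms the poll instead of being lost. Returns the number drained. */
static unsigned drain_requests(request_queue* q, walk_fn walk, void* arg) {
    unsigned drained = 0;
    atomic_store_explicit(&q->poll_armed, false, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    while (head != atomic_load_explicit(&q->tail, memory_order_acquire)) {
        walk_request req = q->slots[head % QUEUE_CAPACITY];
        head++;
        atomic_store_explicit(&q->head, head, memory_order_release);
        walk(&req, arg);  /* rebuild the top frame from pc/sp/bci, then walk down */
        drained++;
    }
    return drained;
}

/* Example walker used only for illustration: sums the user_data values. */
static void sum_user_data(const walk_request* req, void* arg) {
    *(int64_t*)arg += req->user_data;
}
```

A full queue drops the request rather than blocking, because blocking or allocating inside a signal handler is not an option. How the actual implementation reports such a drop is not specified here.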
Alternatives
The following alternatives have been considered:
- Implement all possible functionality in JFR. This would be impossible: 1. There are too many different scenarios. For example, JFR events could be provided for various different Linux perf event overflows. 2. Many scenarios may be very platform-specific (e.g., Linux perf event overflows). Note that something like this has already been attempted in JEP 509, and while it works well for what it aims to do, it only covers one very special scenario, on one particular platform.
- Change the implementation of the JVMTI GetStackTrace family of functions to provide the desired functionality (signal-safety and avoiding the safepoint bias). While this may be possible in principle, it would require a change of the signature to also accept a void* ucontext, and it would also represent a significant change in behaviour, which is undesirable. It would also suffer from the same problems as AsyncGetCallTrace in that the caller would have to pre-allocate the stack-trace structure, and also have to deal with broken (undefined-behaviour) jmethodIDs.
- Fix AsyncGetCallTrace. As it currently is, ASGCT suffers from various fundamental problems in its design (see discussion above). In fact, this JEP is an attempt at fixing it, by providing a more sustainable alternative.
- A similar new JVMTI extension API that would not emit a JFR event, but would instead report the stack-trace back via JVMTI callbacks. The API would look like this:
jvmtiError RequestStackTrace(jvmtiEnv* env, jthread thread, void* ucontext, jvmtiStackTraceCallbacks* cb, void* user_data)
This would actually have been the author's preferred approach, because it avoids the coupling to JFR and gives the JVMTI agent more freedom in how it handles the stack-trace, while also providing signal-safety and no safepoint-bias. However, it has (so far) been rejected on the grounds that no new functionality is currently wanted in JVMTI.
Testing
- Several new jtreg tests are added to verify that the new functionality works, and that it does not interfere with existing JFR functionality.
Risks and Assumptions
- RequestStackTrace, as currently specified, assumes that JFR is present. If it is not, then the functionality is not available.
- The functionality relies on the JVMTI agent to somehow associate the JFR StackTraceRequest event with the original event (e.g., a cache-misses counter overflow) that caused the stack-trace request. This association would have to be done in a post-processing step. The API has the user_data argument to facilitate this post-processing. Other than that, it is outside of the scope of this JEP.
- If more than one JVMTI agent is requesting stack-traces via the API, then it becomes an interesting problem to associate the StackTraceRequest events with the original events. For example, the agents may use the same IDs (e.g., counters or timestamps) for their events, assuming that no other agent would do the same.
- It is problematic that the JVMTI agent has no control over whether JFR is present, whether a recording is ongoing (recordings can even be started/stopped at the user's discretion), or where the recording is going (to a file, streamed over the network, etc.).
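For the ID-collision risk between multiple agents noted above, one possible convention is to reserve the upper bits of user_data for an agent tag and use the remaining bits for a per-agent sequence number. This is an illustration only and not part of this proposal: the helper names, the 15/48 bit split, and the idea that agents agree on distinct tags out of band are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the 64-bit user_data value:
 *   bit 63      : always 0, so the value stays non-negative as a jlong
 *   bits 62..48 : agent tag (15 bits)
 *   bits 47..0  : per-agent sequence number (48 bits) */
#define AGENT_TAG_BITS 15
#define SEQ_BITS       48
#define AGENT_TAG_MASK ((UINT64_C(1) << AGENT_TAG_BITS) - 1)
#define SEQ_MASK       ((UINT64_C(1) << SEQ_BITS) - 1)
static_assert(AGENT_TAG_BITS + SEQ_BITS == 63, "layout must leave the sign bit clear");

/* Combine an agent tag and a sequence number into one user_data value.
 * Bits that do not fit are masked off. */
static int64_t pack_user_data(uint16_t agent_tag, uint64_t seq) {
    uint64_t raw = (((uint64_t)agent_tag & AGENT_TAG_MASK) << SEQ_BITS)
                 | (seq & SEQ_MASK);
    return (int64_t)raw;
}

/* Recover the agent tag during post-processing of the recording. */
static uint16_t user_data_agent_tag(int64_t user_data) {
    return (uint16_t)(((uint64_t)user_data >> SEQ_BITS) & AGENT_TAG_MASK);
}

/* Recover the per-agent sequence number during post-processing. */
static uint64_t user_data_seq(int64_t user_data) {
    return (uint64_t)user_data & SEQ_MASK;
}
```

During post-processing, each agent would then keep only the events whose userData field carries its own tag, and look up the sequence number in its own table of original samples.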
