JEP draft: Ahead-of-Time Code Compilation

Owner: John Rose
Type: Feature
Scope: Implementation
Status: Submitted
Component: hotspot / compiler
Relates to: JEP 483: Ahead-of-Time Class Loading & Linking,
            JEP 515: Ahead-of-Time Method Profiling
Reviewed by: Dan Heidinga, Vladimir Kozlov
Created: 2024/06/30 04:47
Updated: 2026/04/01 15:04
Issue: 8335368

Summary

Improve startup and warmup time by making native code for Java methods instantly available when the HotSpot Java Virtual Machine starts. This will greatly reduce the initial load on the JIT (Just-In-Time) compiler, reducing its interference with the application during startup, particularly in configurations with fewer cores. The JIT is then free to delay the generation of native code unless and until the previously generated code proves insufficiently performant.

Goals

Non-Goals

Motivation

To prepare the best possible native code for an application, we must first run the application.

This means that, initially, an application must execute by means of less-than-optimal techniques. During this initial period, called warmup, the actual application behavior must be observed (or profiled) in order to track which code paths and object types need to be prioritized for optimization. As profiles accumulate during warmup, the system is able to compile methods first with modest optimization and then recompile the most important methods with higher optimization levels. When application execution is fully transferred to this optimized code, it stays at peak performance, as long as the profiled code paths and object types continue to dominate performance.

It may seem that there is no shortcut: that peak application performance is only attained after a CPU-intensive warmup period, including application execution, profiling, and optimizing JIT compilation.

Recent work has reduced these warmup costs, in part. JEP 483 shifts application loading and linking to a training run by means of the AOT cache. JEP 515 shifts profiling work in the same way, so that a production run starts with ready-made profile data and the JIT compiler can run immediately. But warmup is still delayed, by seconds or even minutes, because the JIT compilation of optimized code uses many computing resources. On some platforms, the latency of JIT compilation can be hidden by running many JIT threads in parallel, but this trick requires the allocation of processors beyond those immediately useful to the application. Surely it would be helpful if the heavy work of JIT compilation could be shifted to a training run as well.

We will extend the existing AOT cache so that it can carry precompiled native code generated from the profiles collected during the training run. In a production run, the JVM can then satisfy requests for native code by loading the cached code immediately, rather than compiling it again with the JIT. This will preserve the existing execution model while reducing both startup and warmup costs.

Description

We extend the AOT cache, introduced by JEP 483 and previously extended by JEP 515, to store natively compiled method code assets, also known as AOT code. During a production run, a request for native method code, normally fulfilled by the JIT compiler, can be immediately fulfilled if a matching method is found in the AOT cache. Neither profiling nor JIT compilation needs to introduce delays into the application startup if appropriate AOT code is available. If matching AOT code is unavailable, incompatible, or later deoptimized, execution falls back to the existing interpreter and JIT mechanisms. This means that warmup happens quickly, and with less consumption of computing resources.

AOT code is generated by the HotSpot JVM's C1 and C2 JIT compilers during the AOT cache assembly phase (-XX:AOTMode=create). AOT compilation uses profiling information collected during the training run, as described by JEP 515, to generate native code.
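Putting this together with the workflow from JEP 483, the end-to-end flow might look like the sketch below. The application name and file paths (`app.jar`, `com.example.App`, `app.aotconf`, `app.aot`) are illustrative, not prescribed by this JEP:

```shell
# Training run: record observations (class loading, linking, profiles)
# into an AOT configuration file while exercising the application
java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf \
     -cp app.jar com.example.App

# Assembly phase: create the AOT cache; AOT code is generated here by
# the C1 and C2 compilers, guided by the recorded profiles
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
     -XX:AOTCache=app.aot -cp app.jar

# Production run: requests for native method code are satisfied
# immediately from the cache, instead of waiting for the JIT
java -XX:AOTCache=app.aot -cp app.jar com.example.App
```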

Code generated ahead of time is slightly different from normal code (generated just in time, in the production run). For example, some values normally treated as compile-time constants (normally embedded by the JIT in the native code) may be recomputed directly by AOT code, if their values might differ when the production run starts. As another example, if the AOT code compiler cannot predict the order of class initialization ahead of time, it may compile explicit checks into the AOT code, to ensure classes are initialized before use. Normally, the JIT observes that initialization has already happened by compile time, and emits no check at all.

From the user’s point of view, all JIT compilation activity is transparent, except for effects on application performance. Likewise, all uses of AOT code are equally transparent. There are no new requirements on application configuration or VM invocation. Applications that use AOT code will usually start up and warm up more quickly. Even when peak performance requires additional JIT activity (to generate newly optimized code), there is likely to be less overall consumption of machine resources by JIT activity, and such activity will tend to spread more evenly across the lifetime of the application.

Popular Java frameworks that already use an AOT cache will benefit without change.

The presence of AOT code has two low-level effects: It makes the AOT cache larger, usually by a modest amount. And it makes good native code appear quickly, making it seem as though the JIT compilation tasks complete almost instantly. The near-instant loading of AOT code will cause even the earliest phases of application startup to run faster, since it is much faster to load precompiled code than to generate it from scratch. Application warmup will also be accelerated, since much profiling and JIT activity will be skipped, in favor of immediate use of AOT code.

Of course, if the application’s behavior in the production run is significantly different from the training run, some AOT code might not be usable, or it might be deoptimized and replaced. This is nothing new: Both AOT and JIT code are always used conditionally (on proof of importance) and are then subject to deoptimization and replacement (if they become less useful). The JIT is therefore necessary even if AOT code is present. In the extreme case where the VM is running on a processor version that cannot execute the AOT methods, the JIT will be the only source of compiled code. When the JIT compiler must run, AOT profiles will be useful, enabling the JIT to predict the appropriate hot code paths and hot object types, as they were observed during the training run.

A new diagnostic VM flag, -XX:+AOTCodeCaching, has been added to control both creation (during the training run) and usage (during the production run) of AOT code. It is enabled by default. To disable AOT code generation (during training) or AOT code usage (during production), run with the flags -XX:+UnlockDiagnosticVMOptions and -XX:-AOTCodeCaching.
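For example, a production run could ignore any AOT code in an existing cache as follows (cache and application names are illustrative; the unlock flag must precede the diagnostic flag):

```shell
# -XX:+UnlockDiagnosticVMOptions is required before the JVM will
# accept the diagnostic flag -XX:-AOTCodeCaching
java -XX:+UnlockDiagnosticVMOptions -XX:-AOTCodeCaching \
     -XX:AOTCache=app.aot -cp app.jar com.example.App
```

The rest of the AOT cache (loaded classes and profiles) is still used in this configuration; only the AOT code is skipped.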

The same consistency constraints listed in JEP 483 apply to AOT code caching. In addition, AOT code caching requires that the CPU versions used by code generation and by the production run match, so as to ensure that the production CPU is actually capable of running the cached AOT code. If the CPU versions do not match, then the JVM issues a warning and does not load the AOT code, although the rest of the AOT cache (loaded classes and profiling information) will be used.

To check whether your JVM is correctly configured to use AOT code, you can add the option -XX:AOTMode=on to the command line. With this option, the JVM reports an error and exits if the AOT cache violates any constraint, including the CPU version match. See more details about this mode in JEP 483.
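A strict production run under this mode might look like the following sketch (cache and application names are illustrative):

```shell
# -XX:AOTMode=on turns a silently degraded run into a hard error:
# the JVM exits if app.aot cannot be used in full, e.g. on a CPU
# version mismatch
java -XX:AOTMode=on -XX:AOTCache=app.aot -cp app.jar com.example.App
```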

Testing

Alternatives

As has been demonstrated many times, Java can be supported by a pure static compiler. Static compilation is always accompanied by compromises to performance, agility, or compatibility. At present, best performance requires a balanced mix of AOT and JIT execution modes (plus the interpreter), as provided by this JEP.

Since AOT code can be loaded immediately on startup, it might seem that profiles in the AOT cache (added by JEP 515) are now useless. In fact, they are used to sequence the loading of optimized AOT code, as well as to help the JIT compiler regenerate code when necessary.

Therefore, it is not presently a goal to rely completely on AOT code, as if a Java application were the same as a C++ application. When appropriate, applications can still make use of the interpreter, the JIT, and AOT profiles. Future work may investigate further minimization of JIT usage, and/or interpreter usage. However, initial experiments suggest that totally excluding the JIT often leads to lower peak performance. Likewise, excluding the interpreter results in bloated AOT cache files, which can be more expensive to load than running the interpreter.

Unlike a C++ application, a Java application is always compiled to use the highest and best instruction set architecture available at production time, including any available optional instructions. Vector ISAs change and develop, affecting the details of vectorized code generated by the HotSpot virtual machine. When running with an AOT cache that contains AOT code, the VM checks that the present processor can correctly execute the AOT code. This check can fail if the AOT cache was created by a newer machine, but the production run is performed on an older model. The resulting execution is still correct, but it may exhibit lower performance, as some or all AOT code may be inappropriate for the current run.

Future work may investigate alternatives for finer control over optimization levels of AOT code, possibly allowing users to trade off speed for processor compatibility. Such work could potentially install several versions of a given AOT method, usable by differing processor levels. However, such fine control is not an initial goal.

Risks and Assumptions

There are no new risks beyond those already noted in JEP 483.

The base assumption of the AOT cache remains operative: A training run is assumed to be a good source of observations that, when passed through an AOT cache to a production run, will benefit the performance of that production run. This assumption applies fully to AOT code, which benefits similar production runs, without doing harm to divergent production runs.