Skip to content

Shenandoah support (Graal JIT)#10904

Closed
rkennke wants to merge 3 commits intooracle:masterfrom
rkennke:shenandoah-support
Closed

Shenandoah support (Graal JIT)#10904
rkennke wants to merge 3 commits intooracle:masterfrom
rkennke:shenandoah-support

Conversation

@rkennke
Copy link
Copy Markdown
Contributor

@rkennke rkennke commented Mar 21, 2025

This implements the barriers that are needed to run with Shenandoah GC in the Graal compiler. (Issue: #3472)

There are 3 basic kinds of barriers needed for Shenandoah:

  • SATB barriers (aka pre-write-barrier, also needed for Reference.get() support). Those are pretty similar to G1's SATB barriers. Those barriers are inserted before reference stores, or in the case of Reference.get(), after the load of the referent. The SATB barriers are inserted in the node graph, and expanded to assembly in the respective backends. (Compared to the G1 backend, we implemented a slight improvement, where we move the mid-path-section into an out-of-line stub, similar to the slow-path. This should improve performance by helping static branch prediction. We may want to change G1 barriers in a similar fashion.)
  • Load-reference-barriers (LRB). Those are conceptually similar to ZGC's read-barriers, but differ in the implementation. Those barriers, too, are inserted as nodes, and expanded to assembly in the backends.
  • Card-marking barriers. Those are only needed when running with generational Shenandoah, and are similar to Serial and Parallel GC's card-marking barriers. However, in contrast to Serial and Parallel GC, those Shenandoah card-barriers are again inserted as nodes, and expanded to assembly in the backends. (We may want to adapt this code in Serial and Parallel and ditch their snippets-based implementation.)

Notice that none of the barriers are implemented as snippets (like Serial/Parallel's card-barriers) or in the backend-only (as ZGC's read-barriers). We needed a way to efficiently deal with compressed-oops, which is not (easily) possible to do in the backend. In the node-graph this is pretty easy: insert the LRB with preceding and succeeding uncompress/compress after any load and before the (potential) uncompress (i.e. turn load->uncompress into load -> (uncompress -> lrb -> compress) -> uncompress) and then let the optimizer optimize away the trailing compress -> uncompress pairs.

In order to support this, we needed a few additions:

  • The compression nodes now have a method that allows to add them without using unique(). If we used unique(), then the uncompress before the LRB would be matched with the original uncompress after the load, and we would cut out the LRB.
  • We moved the barrier insertion for Shenandoah from the mid-tier to the low-tier. This is needed because we can't insert barriers to FloatingReadNodes. We moved the barrier insertion to after fixing the read-nodes, at which point this is safe to do. Other GCs keep adding their barriers in the mid-tier. The mechanics is that BarrierSet defaults to mid-tier, but implementations can override this to add barriers in low-tier (instead, or additionally).

X86 port contributed by @JohnTortugo.

Testing:

  • Renaissance
  • SPECjvm2008
  • SPECjbb2015
  • DaCapo

(We have run those workloads for correctness testing only, we have not (yet) conducted a performance study.)

@oracle-contributor-agreement oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Mar 21, 2025
@tkrodriguez
Copy link
Copy Markdown
Member

If you have any questions about how to structure your changes feel free to ping me over slack as I was the primary author of the new LIR support for barriers.

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Mar 25, 2025

If you have any questions about how to structure your changes feel free to ping me over slack as I was the primary author of the new LIR support for barriers.

Thanks, Tom! I will do that whenever I get stuck or have questions. So far I'm making progress. Structurally, Shenandoah will be look like a mix of ZGC (for the load-barrier, even though I am modeling it as a Node that consumes the loaded value, instead of replacing the ReadNode altogether) and G1 (for the SATB parts), and likely Serial/Parallel for the card-table parts.

@tkrodriguez
Copy link
Copy Markdown
Member

Sounds good. There's still some more work to finish out the switch to LIR only barriers but I think supporting G1 and ZGC covers the required strategies in a fairly pragmatic way.

@rkennke rkennke marked this pull request as ready for review May 15, 2025 17:43
@dougxc
Copy link
Copy Markdown
Member

dougxc commented May 19, 2025

@tkrodriguez can you please take another look at this. Once done, we can ask @davleopo and @gergo- to look at it.

@tkrodriguez
Copy link
Copy Markdown
Member

I'll take a look.

@tkrodriguez
Copy link
Copy Markdown
Member

Are the gate failures actual problems? It would be good to see a clean gate. Also, we should squash the history before committing.

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented May 19, 2025

Are the gate failures actual problems? It would be good to see a clean gate.

I don't know what those problems are. I fixed everything that looked related to my changes. Those failures look like infra problems, some volumes seem to have run out of memory or something. I doubt that it is related.

Also, we should squash the history before committing. Ok, I can do that - tomorrow.

Comment thread compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/core/phases/LowTier.java Outdated
@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented May 20, 2025

@tkrodriguez There seem to be GHA failures that report SerialWriteBarriers not being Lowerable, coming from SubstrateVM. Could this be related to moving the barrier addition phase from mid- to low-tier? I don't think I have changed anything in SerialWriteBarrier or related code.

@rkennke rkennke force-pushed the shenandoah-support branch from f8dac0f to 9f2c78d Compare May 20, 2025 15:37
@tkrodriguez
Copy link
Copy Markdown
Member

Yes this is something I mentioned in our slack discussions. Moving WriteBarrierAdditionPhase after LowTierLoweringPhase creates problems for GCs that still use snippets since they expect to be lowered by LowTierLoweringPhase. In the long term I would like it to be done there but I'm not sure how to bridge the gap. I've got an internal PR based on your branch that I'm testing and I was going to look into this.

The options are to conditionalize the placement of WriteBarrierAdditionPhase based on methods from BarrierSet but that's not available when constructing the suites. We could have early and late WriteBarrierAdditionPhase to handle each case during the transition but that's a bit ugly to me. Or we could beef up WriteBarrierAdditionPhase to perform any required lowering itself, though that might be a bit complicated. It would also complicate stuff like BarrierSetVerificationPhase and some barrier elimination that's part of enterprise.

I'm going to try putting appendPhase(new PlaceholderPhase<>(WriteBarrierAdditionPhase.class)); in both MidTier and LowTier and do the placeholder replacement based on the BarrierSet.

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented May 20, 2025

Yes this is something I mentioned in our slack discussions. Moving WriteBarrierAdditionPhase after LowTierLoweringPhase creates problems for GCs that still use snippets since they expect to be lowered by LowTierLoweringPhase. In the long term I would like it to be done there but I'm not sure how to bridge the gap. I've got an internal PR based on your branch that I'm testing and I was going to look into this.

The options are to conditionalize the placement of WriteBarrierAdditionPhase based on methods from BarrierSet but that's not available when constructing the suites. We could have early and late WriteBarrierAdditionPhase to handle each case during the transition but that's a bit ugly to me. Or we could beef up WriteBarrierAdditionPhase to perform any required lowering itself, though that might be a bit complicated. It would also complicate stuff like BarrierSetVerificationPhase and some barrier elimination that's part of enterprise.

I'm going to try putting appendPhase(new PlaceholderPhase<>(WriteBarrierAdditionPhase.class)); in both MidTier and LowTier and do the placeholder replacement based on the BarrierSet.

As far as I can see, it's only the SerialWriteBarrierNode which depends on snippets (is that right?). That should be relatively straightforward to implement without snippets and would look almost exactly like ShenandoahCardBarrierNode implementation - even a little simpler. I could work on implementing that, if you think that'd help.

@tkrodriguez
Copy link
Copy Markdown
Member

It would be easy to convert the HotSpot serial barrier to LIR but native image uses snippets for its barriers and its serial barrier is non-trivial. So we'll need to live with this mixed model for a little while I think. I'm beginning to think we might just need an early and late phase. I think the way barriers for vector writes work, we might have to do barrier addition for them before LowTierLowering, or at least before VectorLoweringPhase, which is before FixReadsPhase.

It might be too much to try to resolve all these issues in this PR. Since Shenandoah is currently HotSpot only maybe it would be best to special case its barrier insertion. I'll will play some more with this to see what would be best.

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented May 21, 2025

It would be easy to convert the HotSpot serial barrier to LIR but native image uses snippets for its barriers and its serial barrier is non-trivial. So we'll need to live with this mixed model for a little while I think. I'm beginning to think we might just need an early and late phase. I think the way barriers for vector writes work, we might have to do barrier addition for them before LowTierLowering, or at least before VectorLoweringPhase, which is before FixReadsPhase.

It might be too much to try to resolve all these issues in this PR. Since Shenandoah is currently HotSpot only maybe it would be best to special case its barrier insertion. I'll will play some more with this to see what would be best.

I implemented barrier addition to trigger in both mid- and low-tier, and the BarrierSet implementation gets to choose which one (or both, if it wishes) is appropriate. This choice defaults to mid-tier, and can be overridden in implementations, like I did in ShenandoahBarrierSet. This way we get a clean gate :-)

@tkrodriguez
Copy link
Copy Markdown
Member

Thanks. I'll see whether that works ok in our full gate. I'd tried something slightly different but the StageFlags makes it hard to be super flexible about when these phases run. I might be tempted to keep only the phase that actually does the work in the final suite.

@tkrodriguez
Copy link
Copy Markdown
Member

Your fix works though I don't love some of the details. I'll just put comments on those places. I was able to get a clean internal gate with just minor changes in enterprise. I wasn't actually able to test Shenandoah because we don't have a labsjdk that includes it at the moment.

@tkrodriguez
Copy link
Copy Markdown
Member

Overall I think this looks good. I've only lightly reviewed the actual shenandoah parts since I don't really know anything about how the collector works. A high level JavaDoc comment on each of your newly added classes would be appreciated.

@davleopo @gergo- I think it's in good shape for review.

@rkennke rkennke force-pushed the shenandoah-support branch from 5c691ed to e9de7db Compare May 23, 2025 07:55
@davleopo
Copy link
Copy Markdown
Member

LGTM

Copy link
Copy Markdown
Member

@gergo- gergo- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not joining earlier review rounds. I left some style/documentation comments, looks good overall.

//
// Try to CAS with given arguments. If successful, then we are done.

// There are two ways to reach this label. Initial entry into the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misplaced comment from the AArch64 version?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this whole block (except the first 3 lines) can be removed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By this whole block you mean the block comment starting with // There are two ways to reach this label. Initial entry into the?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

@tkrodriguez
Copy link
Copy Markdown
Member

So I have working truffle entry point read barrier changes but that exposed a deeper problem. The current changes rely on running after FixReadsPhase or not having any floating reads but the economy configuration doesn't run FixReadsPhase. We're also not strict about never emitting FloatingReadNodes in the mode so some reads end up without a barrier because they are floating. We're moving towards enforcing no floating reads in economy but it's not how it works yet.

So I'm looking at moving the read barrier into the LIR so it matches the ZGC implemenation. This would also remove all the changes to where WriteBarrierAdditionPhase is run. The way I'm approaching this is changing the oop reading node to always return uncompressed oops when a read barrier is used, which avoids any dancing around with compression in the barrier itself. It is largely straightforward and is mostly passing the unit tests. I'm chasing a crash or two but I should have something later today that we can evaluate.

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Jul 31, 2025

So I have working truffle entry point read barrier changes but that exposed a deeper problem. The current changes rely on running after FixReadsPhase or not having any floating reads but the economy configuration doesn't run FixReadsPhase. We're also not strict about never emitting FloatingReadNodes in the mode so some reads end up without a barrier because they are floating. We're moving towards enforcing no floating reads in economy but it's not how it works yet.

So I'm looking at moving the read barrier into the LIR so it matches the ZGC implemenation. This would also remove all the changes to where WriteBarrierAdditionPhase is run. The way I'm approaching this is changing the oop reading node to always return uncompressed oops when a read barrier is used, which avoids any dancing around with compression in the barrier itself. It is largely straightforward and is mostly passing the unit tests. I'm chasing a crash or two but I should have something later today that we can evaluate.

That sounds great, thank you! Let me know if I can help with anything!

@tkrodriguez
Copy link
Copy Markdown
Member

That idea didn't really pan out as it would have required changes to FloatingReadNode that I didn't really want to make. So I'm looking into enforcing the rule that there should be no floating reads in economy mode. That's likely to be true soon as part of another PR but it's not being enforced there. It's mostly straightforward to enforce and we can resolve any conflicts in how it's done once we're ready to merge. I kind of wanted to push this kind of enforcement anyway, and this gives me a good reason.

@tkrodriguez
Copy link
Copy Markdown
Member

I have a set of changes that disallow floating reads from reaching the backend which makes the read barrier strategy for shenandoah work in economy. It's not completely clear whether that will be pushed as part of this PR or if we might want to separate it. Could you rebase to the latest master? I need to update the CI tasks which recently changed. Once you've rebased I can update my internal PR and mirror that to github with my fixes on top. Then we can finish any required work there. Sound good?

@rkennke rkennke force-pushed the shenandoah-support branch from 2832afe to cf0d37d Compare August 5, 2025 16:00
@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Aug 5, 2025

I have a set of changes that disallow floating reads from reaching the backend which makes the read barrier strategy for shenandoah work in economy. It's not completely clear whether that will be pushed as part of this PR or if we might want to separate it. Could you rebase to the latest master? I need to update the CI tasks which recently changed. Once you've rebased I can update my internal PR and mirror that to github with my fixes on top. Then we can finish any required work there. Sound good?

Sounds good! I rebased my branch to latest master.

@tkrodriguez
Copy link
Copy Markdown
Member

So I've taken your changes and applied some fixes including the changes to have only fixed reads in backend, and the combined PR is at #11941. The fix reads change will be addressed separately for clarity under #11942 and I will rebase once that merges. I can address Gergos final comments there but it seems like it's all working. I had a full clean gate including all benchmarks with shenanadoah last week though I had seem a crash or two during early testing so there might still be some problems lurking. Once it's all clean and ready I'll do a final squash since the history is getting very ugly. Does that sound all good?

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Aug 12, 2025

So I've taken your changes and applied some fixes including the changes to have only fixed reads in backend, and the combined PR is at #11941. The fix reads change will be addressed separately for clarity under #11942 and I will rebase once that merges. I can address Gergos final comments there but it seems like it's all working. I had a full clean gate including all benchmarks with shenanadoah last week though I had seem a crash or two during early testing so there might still be some problems lurking. Once it's all clean and ready I'll do a final squash since the history is getting very ugly. Does that sound all good?

Yep, perfect! Thank you for your help!

@tkrodriguez
Copy link
Copy Markdown
Member

I have rebased to master and incorporated most of Gergos suggestions but wasn't sure about the two I left unresolved. The new state is over at #11941. @rkennke maybe you could provide guidance on the last 2 unresolved comments and my last set of review comment commits?

@rkennke
Copy link
Copy Markdown
Contributor Author

rkennke commented Aug 28, 2025

I have rebased to master and incorporated most of Gergos suggestions but wasn't sure about the two I left unresolved. The new state is over at #11941. @rkennke maybe you could provide guidance on the last 2 unresolved comments and my last set of review comment commits?

I commented on Gergos comments, I don't think we can use tbz/tbnz because of the jump offset limitation. The second part of the comment can be deleted. Your commit looks fine. Thank you for taking care of all this!

@tkrodriguez
Copy link
Copy Markdown
Member

This was merged under 3c7b5cb

@reneleonhardt
Copy link
Copy Markdown

@rkennke Thank you very much for all your work over half a year!

@simonis
Copy link
Copy Markdown
Contributor

simonis commented Jan 30, 2026

@fniephaus, @alina-yur could you please add this PR to the GraalVM Community Roadmap? I think it is worth being mentioned there.

Also, it seems that the GraalVM Community Roadmap view is a little bit out of sync, as it still mentions "GraalVM 25" which was released in September 2025 under "Upcoming Releases". Maybe you can fix that as well?

@alina-yur
Copy link
Copy Markdown
Member

Hi @simonis, congrats on reaching the first milestone :)

The topic is already in the roadmap as #12237. I linked this PR there and moved the item to "In Progress". I also moved the 25 items to released, thank you for the suggestions.

@simonis
Copy link
Copy Markdown
Contributor

simonis commented Jan 30, 2026

Hi @simonis, congrats on reaching the first milestone :)

The topic is already in the roadmap as #12237. I linked this PR there and moved the item to "In Progress".

@alina-yur, #12237 is about porting the Shenandoah GC to Substrate VM / Native Imaeg. This change here, which indeed might be badly named, is about adding Shenandoah support to the Graal JIT compiler running on HotSpot (similar to [GR-27475] Add ZGC support which is on the roadmap as Add support for ZGC on HotSpot. So could you please add it to the roadmap as something like "Add support for Shenandoah on HotSpot" with the status "Done".

I also moved the 25 items to released, thank you for the suggestions.

Thanks for moving 25 to released. Do you know why the ordering is so strange. E.g. for me "GraalVM 25" is sorted between "GraalVM for JDK 17" and "22.3.0 Release" instead of appearing on top of the list?

@alina-yur alina-yur changed the title Shenandoah support Shenandoah support (Graal JIT) Feb 11, 2026
@alina-yur alina-yur added this to the 25.1.0 Release milestone Feb 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Integrate the Shenandoah/Generational-Shenandoah GC into Native Image

10 participants