8252887: Zero VM is broken after JDK-8252661#64

Closed
DamonFool wants to merge 1 commit into openjdk:master from DamonFool:JDK-8252887

Conversation

@DamonFool
Member

@DamonFool DamonFool commented Sep 7, 2020

Hi all,

JBS: https://bugs.openjdk.java.net/browse/JDK-8252887

The Zero VM build is broken because 'block_if_requested' is no longer a member of 'SafepointMechanism'.
JDK-8252661 replaced 'block_if_requested' with 'process_if_requested'.

The fix simply replaces 'block_if_requested' with 'process_if_requested'.

Thanks.
Best regards,
Jie

/issue add DK-8252887
/cc hotspot-runtime
/test
/summary
8252887: Zero VM is broken after JDK-8252661


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/64/head:pull/64
$ git checkout pull/64

@bridgekeeper

bridgekeeper Bot commented Sep 7, 2020

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk Bot added the rfr Pull request is ready for review label Sep 7, 2020
@openjdk

openjdk Bot commented Sep 7, 2020

@DamonFool The issue identifier DK-8252887 is invalid: This PR can only solve issues in the JDK project.

@openjdk openjdk Bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Sep 7, 2020
@openjdk

openjdk Bot commented Sep 7, 2020

@DamonFool
The hotspot-runtime label was successfully added.

@openjdk

openjdk Bot commented Sep 7, 2020

@DamonFool Setting summary to 8252887: Zero VM is broken after JDK-8252661

@openjdk

openjdk Bot commented Sep 7, 2020

@DamonFool To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request.

In order to have an RFR email automatically sent to the correct mailing list, you will need to add one or more labels manually using the /label add "label" command. The following labels are valid: 2d awt beans build compiler core-libs hotspot hotspot-compiler hotspot-gc hotspot-jfr hotspot-runtime i18n javadoc jdk jmx kulla net nio security serviceability shenandoah sound swing.

@mlbridge

mlbridge Bot commented Sep 7, 2020

Webrevs

@DamonFool
Member Author

/label add hotspot-runtime
/issue add JDK-8252887
/cc hotspot-runtime

@openjdk

openjdk Bot commented Sep 7, 2020

@DamonFool The hotspot-runtime label was already applied.

@openjdk

openjdk Bot commented Sep 8, 2020

@DamonFool This issue is referenced in the PR title - it will now be updated.

@openjdk

openjdk Bot commented Sep 8, 2020

@DamonFool The hotspot-runtime label was already applied.

Member

@dholmes-ora dholmes-ora left a comment

Looks good and trivial.

@openjdk

openjdk Bot commented Sep 8, 2020

@DamonFool This change now passes all automated pre-integration checks. When the change also fulfills all project specific requirements, type /integrate in a new comment to proceed. After integration, the commit message will be:

8252887: Zero VM is broken after JDK-8252661

Zero VM is broken due to 'block_if_requested' is not a member of 'SafepointMechanism'.

Reviewed-by: dholmes
  • If you would like to add a summary, use the /summary command.
  • To credit additional contributors, use the /contributor command.
  • To add additional solved issues, use the /issue command.

There are currently no new commits on the master branch since the last update of the source branch of this PR. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you would like to avoid potential automatic rebasing, specify the current head hash when integrating, like this: /integrate 7686e871551c745f062f084eaaae594c857066c0.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk Bot added the ready Pull request is ready to be integrated label Sep 8, 2020
@DamonFool
Member Author

/summary Zero VM is broken due to 'block_if_requested' is not a member of 'SafepointMechanism'.

@openjdk

openjdk Bot commented Sep 8, 2020

@DamonFool Updating existing summary to Zero VM is broken due to 'block_if_requested' is not a member of 'SafepointMechanism'.

@DamonFool
Member Author

Thanks David for your review.

@DamonFool
Member Author

/integrate

@openjdk openjdk Bot closed this Sep 8, 2020
@openjdk openjdk Bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated labels Sep 8, 2020
@openjdk

openjdk Bot commented Sep 8, 2020

@DamonFool Pushed as commit 891886b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@openjdk openjdk Bot removed the rfr Pull request is ready for review label Sep 8, 2020
@DamonFool DamonFool deleted the JDK-8252887 branch September 8, 2020 03:00
lewurm added a commit to lewurm/openjdk that referenced this pull request Oct 6, 2021
Restore looks like this now:
```
  0x0000000106e4dfcc:   movk    x9, #0x5e4, lsl #16
  0x0000000106e4dfd0:   movk    x9, #0x1, lsl #32
  0x0000000106e4dfd4:   blr x9
  0x0000000106e4dfd8:   ldp x2, x3, [sp, #16]
  0x0000000106e4dfdc:   ldp x4, x5, [sp, #32]
  0x0000000106e4dfe0:   ldp x6, x7, [sp, #48]
  0x0000000106e4dfe4:   ldp x8, x9, [sp, #64]
  0x0000000106e4dfe8:   ldp x10, x11, [sp, #80]
  0x0000000106e4dfec:   ldp x12, x13, [sp, #96]
  0x0000000106e4dff0:   ldp x14, x15, [sp, #112]
  0x0000000106e4dff4:   ldp x16, x17, [sp, #128]
  0x0000000106e4dff8:   ldp x0, x1, [sp], #144
  0x0000000106e4dffc:   ldp xzr, x19, [sp], #16
  0x0000000106e4e000:   ldp x22, x23, [sp, #16]
  0x0000000106e4e004:   ldp x24, x25, [sp, #32]
  0x0000000106e4e008:   ldp x26, x27, [sp, #48]
  0x0000000106e4e00c:   ldp x28, x29, [sp, #64]
  0x0000000106e4e010:   ldp x30, xzr, [sp, #80]
  0x0000000106e4e014:   ldp x20, x21, [sp], #96
  0x0000000106e4e018:   ldur    x12, [x29, #-24]
  0x0000000106e4e01c:   ldr x22, [x12, #16]
  0x0000000106e4e020:   add x22, x22, #0x30
  0x0000000106e4e024:   ldr x8, [x28, #8]
```
caojoshua added a commit to caojoshua/jdk that referenced this pull request Jul 28, 2023
dansmithcode pushed a commit to dansmithcode/jdk that referenced this pull request Aug 31, 2024
pf0n pushed a commit to pf0n/jdk that referenced this pull request Jul 9, 2025
fg1417 added a commit to fg1417/jdk that referenced this pull request Mar 13, 2026
…marks after JDK-8340093

JDK-8340093 enabled auto-vectorization for more reduction loop cases
using 128-bit vector operations. As a result, the following
microbenchmarks are negatively affected:
VectorReduction2.longAddDotProduct
VectorReduction2.longMulDotProduct
VectorReduction2.longMulSimple

This patch fixes these regressions.

1. Improve code generation for MLA

For longAddDotProduct[1], the current implementation generates
vectorized code similar to:
```
ldr     q17, [x12, #16]
ldr     q18, [x11, #16]
mla     z16.d, p7/m, z17.d, z18.d
ldr     q17, [x11, #32]
ldr     q18, [x12, #32]
mla     z16.d, p7/m, z18.d, z17.d
...
ldr     q17, [x11, #128]
ldr     q18, [x12, #128]
mla     z16.d, p7/m, z18.d, z17.d
```
`z16` is the third source and destination register. There are
true dependencies between consecutive mla[2] instructions.
As a result, this vectorized code performs significantly worse
than the scalar version due to limited instruction-level
parallelism.
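
The loop-carried dependency described above is already visible at source level. The following is a minimal Java sketch, assuming a reduction kernel of roughly the shape a long dot-product benchmark would use. The class and method names are hypothetical and are not copied from VectorReduction2.

```java
// Hypothetical illustration only; not the benchmark source.
final class DotProductSketch {
    // Assumes a and b have the same length.
    // Every iteration reads and writes acc, so the additions form one serial chain.
    // The multiplications do not depend on acc and are independent of each other.
    static long longAddDotProduct(long[] a, long[] b) {
        long acc = 0;
        for (int i = 0; i < a.length; i++) {
            long product = a[i] * b[i]; // independent of earlier iterations
            acc += product;             // depends on the previous value of acc
        }
        return acc;
    }
}
```

When the multiply and the add are fused into a single mla on the accumulator, the multiply becomes part of that serial chain as well.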

These mla instructions are produced by a backend match rule that
fuses AddVL and MulVL into a vector MLA[3]. In this situation,
avoiding instruction fusion and instead generating separate SVE
mul and add instructions can improve instruction-level parallelism
and overall performance.

To address this, this patch introduces
is_multiply_accumulate_candidate() to determine whether a node is
a suitable vector MLA candidate. For node patterns that may
increase execution latency, instruction fusion into MLA is
disabled.

After applying this patch, the generated assembly looks like:
```
ldr     q17, [x12, #16]
ldr     q18, [x11, #16]
ldr     q19, [x11, #32]
mul     z17.d, p7/m, z17.d, z18.d
ldr     q18, [x12, #32]
ldr     q20, [x11, #48]
mul     z18.d, p7/m, z18.d, z19.d
ldr     q19, [x12, #48]
add     v16.2d, v17.2d, v16.2d
ldr     q17, [x11, #64]
add     v16.2d, v18.2d, v16.2d
ldr     q18, [x12, #64]
mul     z19.d, p7/m, z19.d, z20.d
ldr     q20, [x12, #80]
add     v16.2d, v19.2d, v16.2d
This sequence exposes more independent operations and reduces
dependency chains, leading to improved performance.
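
A rough way to see why the unfused sequence can win is to compare the length of the serial chain through the accumulator in both forms. The sketch below is a toy model under stated assumptions: the latency values are parameters supplied by the caller (not measured data for any core), and the model ignores issue width, load latency, and everything else a real pipeline does.

```java
// Toy latency model; all latencies are caller-supplied assumptions.
final class ChainLatencySketch {
    // Fused: each mla waits for the previous mla's result,
    // so the full mla latency is paid once per accumulation step.
    static long fusedChainCycles(int steps, int mlaLatency) {
        return (long) steps * mlaLatency;
    }

    // Unfused: the multiplies are independent of the accumulator.
    // Only the adds are serialized; the first add waits for one multiply,
    // and later multiplies are assumed to overlap with the add chain.
    static long unfusedChainCycles(int steps, int mulLatency, int addLatency) {
        if (steps == 0) {
            return 0;
        }
        return mulLatency + (long) steps * addLatency;
    }
}
```

With hypothetical latencies of 4 cycles for mla and mul and 2 cycles for add, eight accumulation steps give a chain of 32 cycles fused versus 20 cycles unfused in this model.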

Since SVE mls instructions may suffer from similar issues, the
same logic has been extended to cover MLS as well. Additional
microbenchmarks have been added accordingly.

2. Avoid vectorizing MUL-heavy loops

For longMulSimple[4], the generated vectorized code exhibits
long dependency chains of SVE mul instructions, which results
in worse performance than scalar execution:
```
ldr     q17, [x1, #16]
ldr     q18, [x1, #32]
mul     z17.d, p7/m, z17.d, z16.d
ldr     q16, [x1, #48]
mul     z17.d, p7/m, z17.d, z18.d
ldr     q18, [x1, #64]
mul     z16.d, p7/m, z16.d, z17.d
...
ldr     q16, [x1, #256]
mul     z17.d, p7/m, z17.d, z19.d
mul     z16.d, p7/m, z16.d, z17.d
```

To address this, the patch introduces a platform-specific interface:
`VTransformElementWiseVectorNode::node_weight()`.

For 128-bit operations, this interface detects consecutive vector
long multiply operations and increases the node weight to 4, which is
the minimum value required for the cost model to avoid vectorization
on both 128-bit and 256-bit platforms.

3. Results
Performance measurements on 128-bit and 256-bit SVE machines show that
these changes avoid harmful vectorization and improve overall
performance for the affected benchmarks.

patch: results obtained after applying this patch, using default
auto-vectorization settings (-XX:+UseSuperWord,
-XX:AutoVectorizationOverrideProfitability=1, cost-model decision mode)

main-default: results on mainline using the same default
auto-vectorization settings (-XX:+UseSuperWord,
-XX:AutoVectorizationOverrideProfitability=1, cost-model decision mode)

main-scalar: results on mainline with -XX:+UseSuperWord and
-XX:AutoVectorizationOverrideProfitability=0 (force scalar code)

The table below reports relative performance changes:
p/m1 = (patch - main-default) / main-default
p/m0 = (patch - main-scalar) / main-scalar

Mode: avgt
Unit: ns/op
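
The two ratios defined above can be computed as in the following minimal sketch. The class and method names are hypothetical, and the input values in the usage note are made up for illustration; they are not taken from the tables.

```java
// Hypothetical helper; not part of the patch or of JMH.
final class RelativeChangeSketch {
    // Returns (patch - baseline) / baseline as a percentage.
    // In avgt mode (ns/op), a negative result means the patched build is faster.
    static double relativeChangePercent(double patchNsPerOp, double baselineNsPerOp) {
        return (patchNsPerOp - baselineNsPerOp) / baselineNsPerOp * 100.0;
    }
}
```

For example, an illustrative patched score of 75 ns/op against a baseline of 100 ns/op yields -25%, meaning the patched run takes a quarter less time per operation.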

Arm Neoverse V2 machine (128 bit SVE):
Benchmark                                         (COUNT)    p/m1       p/m0
TypeVectorOperationsSuperWord.mlaL                  512     0.16%      -50.42%
TypeVectorOperationsSuperWord.mlaL                  2048    0.26%      -56.70%
TypeVectorOperationsSuperWord.mlsL                  512     -0.10%     -50.37%
TypeVectorOperationsSuperWord.mlsL                  2048    0.14%      -56.82%
TypeVectorOperationsSuperWord.mulBigL               512     0.06%      -25.77%
TypeVectorOperationsSuperWord.mulBigL               2048    -0.02%     -19.63%
TypeVectorOperationsSuperWord.mulI                  512     0.63%      -63.44%
TypeVectorOperationsSuperWord.mulI                  2048    0.28%      -63.07%
TypeVectorOperationsSuperWord.mulL                  512     -0.03%     -50.47%
TypeVectorOperationsSuperWord.mulL                  2048    0.29%      -50.82%
TypeVectorOperationsSuperWord.mulMediumL            512     -0.19%     -27.54%
TypeVectorOperationsSuperWord.mulMediumL            2048    0.24%      -25.18%
TypeVectorOperationsSuperWord.mulMlaLDependent      512     0.30%      -28.70%
TypeVectorOperationsSuperWord.mulMlaLDependent      2048    0.12%      -26.74%
TypeVectorOperationsSuperWord.mulMlaLIndependent    512     -10.43%    -43.09%
TypeVectorOperationsSuperWord.mulMlaLIndependent    2048    -14.82%    -42.68%
VectorReduction2.WithSuperword.longAddBig           2048    -15.15%    -44.01%
VectorReduction2.WithSuperword.longAddBigMixSub1    2048    -6.19%     -43.92%
VectorReduction2.WithSuperword.longAddBigMixSub2    2048    -15.18%    -43.90%
VectorReduction2.WithSuperword.longAddBigMixSub3    2048    -5.74%     -43.87%
VectorReduction2.WithSuperword.longAddDotProduct    2048    -33.36%    -18.16%
VectorReduction2.WithSuperword.longAddSimple        2048    -0.02%     -6.72%
VectorReduction2.WithSuperword.longAndBig           2048    -16.32%    -44.06%
VectorReduction2.WithSuperword.longAndDotProduct    2048    -0.01%     -3.74%
VectorReduction2.WithSuperword.longAndSimple        2048    0.00%      -6.35%
VectorReduction2.WithSuperword.longMaxBig           2048    -15.29%    -52.09%
VectorReduction2.WithSuperword.longMaxDotProduct    2048    -0.03%     -52.08%
VectorReduction2.WithSuperword.longMaxSimple        2048    -0.40%     -52.74%
VectorReduction2.WithSuperword.longMinBig           2048    -14.88%    -51.70%
VectorReduction2.WithSuperword.longMinDotProduct    2048    0.01%      -52.21%
VectorReduction2.WithSuperword.longMinSimple        2048    0.26%      -52.88%
VectorReduction2.WithSuperword.longMulBig           2048    -2.21%     -0.07%
VectorReduction2.WithSuperword.longMulDotProduct    2048    -15.47%    0.00%
VectorReduction2.WithSuperword.longMulSimple        2048    -17.87%    -0.33%
VectorReduction2.WithSuperword.longOrBig            2048    -15.23%    -43.94%
VectorReduction2.WithSuperword.longOrDotProduct     2048    -0.01%     -3.83%
VectorReduction2.WithSuperword.longOrSimple         2048    -0.01%     -6.60%
VectorReduction2.WithSuperword.longXorBig           2048    -10.03%    -41.62%
VectorReduction2.WithSuperword.longXorDotProduct    2048    0.01%      -38.61%
VectorReduction2.WithSuperword.longXorSimple        2048    0.02%      -53.18%

Arm Neoverse V1 machine (256 bit SVE):
Note: In the current mainline code, the AArch64 backend supports
only 128-bit multiply long operations. Auto-vectorization accounts
for this backend constraint and splits 256-bit vectors into 128-bit
chunks so that the loop can still be vectorized. This is why
256-bit platforms also benefit from this patch.

No obvious performance changes are observed for other benchmarks.

Benchmark                           (COUNT)       p/m1       p/m0
VectorReduction2.longMulDotProduct    2048       -28.23%    0.00%
VectorReduction2.longMulSimple        2048       -19.29%    0.01%

Tier 1 - 3 passed on both aarch64 and x86 platforms.

[1] https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/test/micro/org/openjdk/bench/vm/compiler/VectorReduction2.java#L1096
[2] https://developer.arm.com/documentation/ddi0602/2025-12/SVE-Instructions/MLA--vectors---Multiply-add--predicated--?lang=en
[3] https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2617
[4] https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/test/micro/org/openjdk/bench/vm/compiler/VectorReduction2.java#L1035
ruben-arm added a commit to ruben-arm/jdk that referenced this pull request Mar 30, 2026
Some vector operations do not have inputs and essentially initialize
vectors with a constant value. These operations can be marked for
spilling and subsequently rematerialized at every use. The result of
the transformation might look as follows:
   movi    v16.2d, #0x0
   str     q16, [x16, #64]
   movi    v16.2d, #0x0
   str     q16, [x16, #32]
   movi    v16.2d, #0x0
   str     q16, [x16, #16]
   movi    v16.2d, #0x0
   str     q16, [x16]
   movi    v16.2d, #0x0
   str     q16, [x16, #48]
   movi    v16.2d, #0x0
   str     q16, [x16, #112]
   movi    v16.2d, #0x0
   str     q16, [x16, #80]
   movi    v16.2d, #0x0
   str     q16, [x16, #96]

Introduce deduplication of these rematerialized vector
constant initializations reducing the above sequence to:
   movi    v16.2d, #0x0
   str     q16, [x16, #64]
   str     q16, [x16, #32]
   str     q16, [x16, #16]
   str     q16, [x16]
   str     q16, [x16, #48]
   str     q16, [x16, #112]
   str     q16, [x16, #80]
   str     q16, [x16, #96]
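
The effect of this deduplication can be illustrated on a flat instruction list. The sketch below is a simplified model, not HotSpot code: the `Instr` type, the string-based opcodes, and the `dedup` method are all hypothetical, and the real transformation works on the compiler's internal representation rather than on text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified instruction model; not HotSpot code.
final class RematDedupSketch {
    static final class Instr {
        final String op;      // "movi", "str", or any other opcode
        final String reg;     // destination for movi, source for str
        final String operand; // immediate for movi, address for str

        Instr(String op, String reg, String operand) {
            this.op = op;
            this.reg = reg;
            this.operand = operand;
        }
    }

    // Drops a constant initialization when the register already holds that constant.
    static List<Instr> dedup(List<Instr> block) {
        Map<String, String> known = new HashMap<>(); // register -> constant it holds
        List<Instr> out = new ArrayList<>();
        for (Instr in : block) {
            if (in.op.equals("movi")) {
                if (in.operand.equals(known.get(in.reg))) {
                    continue; // redundant: same constant is already in the register
                }
                known.put(in.reg, in.operand);
            } else if (!in.op.equals("str")) {
                // Conservatively assume any other instruction overwrites its register.
                known.remove(in.reg);
            }
            out.add(in);
        }
        return out;
    }
}
```

Stores only read the register, so they leave the tracked constant in place; anything else is treated as a write, which keeps the sketch conservative.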
Labels

hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated

2 participants