8252887: Zero VM is broken after JDK-8252661 by DamonFool · Pull Request #64 · openjdk/jdk

DamonFool · 2020-09-07T23:54:06Z

Hi all,

JBS: https://bugs.openjdk.java.net/browse/JDK-8252887

Zero VM is broken because 'block_if_requested' is no longer a member of 'SafepointMechanism'.
The reason is that 'block_if_requested' was replaced by 'process_if_requested' in JDK-8252661.

The fix simply replaces 'block_if_requested' with 'process_if_requested' in the Zero code.
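A minimal C++ sketch of the one-line change, for illustration only. Only the names `SafepointMechanism`, `block_if_requested` and `process_if_requested` come from the issue. The `JavaThread` fields, the method body and the `zero_safepoint_poll` call site are hypothetical simplifications, and the real HotSpot declarations may differ.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for HotSpot's JavaThread.
struct JavaThread {
  bool safepoint_pending = false;  // hypothetical: a safepoint request is pending
  int requests_processed = 0;      // hypothetical: counts requests handled so far
};

// Simplified stand-in for SafepointMechanism. After JDK-8252661 the polling
// entry point is named process_if_requested; there is no block_if_requested
// member any more, so an unchanged caller fails to compile.
class SafepointMechanism {
 public:
  static void process_if_requested(JavaThread* thread) {
    if (thread->safepoint_pending) {
      // The real runtime would block or run pending operations here;
      // this sketch only records that the request was handled.
      thread->safepoint_pending = false;
      thread->requests_processed++;
    }
  }
};

// Hypothetical Zero-style polling site showing the fix.
void zero_safepoint_poll(JavaThread* thread) {
  // Before (no longer compiles after JDK-8252661):
  //   SafepointMechanism::block_if_requested(thread);
  SafepointMechanism::process_if_requested(thread);
  assert(!thread->safepoint_pending);  // nothing should remain pending afterwards
}
```

As the build error suggests, the old name is gone rather than aliased, so each remaining caller becomes a compile error and a mechanical rename is enough to restore the build.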

Thanks.
Best regards,
Jie

/issue add DK-8252887
/cc hotspot-runtime
/test
/summary
8252887: Zero VM is broken after JDK-8252661

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8252887: Zero VM is broken after JDK-8252661

Reviewers

David Holmes (@dholmes-ora - Reviewer)

Download

$ git fetch https://git.openjdk.java.net/jdk pull/64/head:pull/64
$ git checkout pull/64

Reviewed-by:

bridgekeeper · 2020-09-07T23:55:23Z

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2020-09-07T23:55:51Z

@DamonFool The issue identifier DK-8252887 is invalid: This PR can only solve issues in the JDK project.

openjdk · 2020-09-07T23:56:16Z

@DamonFool
The hotspot-runtime label was successfully added.

openjdk · 2020-09-07T23:56:35Z

@DamonFool Setting summary to 8252887: Zero VM is broken after JDK-8252661

openjdk · 2020-09-07T23:57:08Z

@DamonFool To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request.

In order to have an RFR email automatically sent to the correct mailing list, you will need to add one or more labels manually using the /label add "label" command. The following labels are valid: 2d awt beans build compiler core-libs hotspot hotspot-compiler hotspot-gc hotspot-jfr hotspot-runtime i18n javadoc jdk jmx kulla net nio security serviceability shenandoah sound swing.

mlbridge · 2020-09-07T23:59:27Z

Webrevs

00: Full (4972875)

DamonFool · 2020-09-07T23:59:47Z

/label add hotspot-runtime
/issue add JDK-8252887
/cc hotspot-runtime

openjdk · 2020-09-07T23:59:56Z

@DamonFool The hotspot-runtime label was already applied.

openjdk · 2020-09-08T00:00:00Z

@DamonFool This issue is referenced in the PR title - it will now be updated.

openjdk · 2020-09-08T00:00:04Z

@DamonFool The hotspot-runtime label was already applied.

dholmes-ora

Looks good and trivial.

openjdk · 2020-09-08T00:15:49Z

@DamonFool This change now passes all automated pre-integration checks. When the change also fulfills all project specific requirements, type /integrate in a new comment to proceed. After integration, the commit message will be:

8252887: Zero VM is broken after JDK-8252661

Zero VM is broken due to 'block_if_requested' is not a member of 'SafepointMechanism'.

Reviewed-by: dholmes

If you would like to add a summary, use the /summary command.
To credit additional contributors, use the /contributor command.
To add additional solved issues, use the /issue command.

There are currently no new commits on the master branch since the last update of the source branch of this PR. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you would like to avoid potential automatic rebasing, specify the current head hash when integrating, like this: /integrate 7686e871551c745f062f084eaaae594c857066c0.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

DamonFool · 2020-09-08T02:24:47Z

/summary Zero VM is broken due to 'block_if_requested' is not a member of 'SafepointMechanism'.

openjdk · 2020-09-08T02:25:19Z

@DamonFool Updating existing summary to Zero VM is broken due to 'block_if_requested' is not a member of 'SafepointMechanism'.

DamonFool · 2020-09-08T02:34:01Z

Thanks David for your review.

DamonFool · 2020-09-08T02:34:58Z

/integrate

openjdk · 2020-09-08T02:36:01Z

@DamonFool Pushed as commit 891886b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.


8252887: Zero VM is broken after JDK-8252661

4972875


openjdk Bot added the rfr Pull request is ready for review label Sep 7, 2020

openjdk Bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Sep 7, 2020

dholmes-ora approved these changes Sep 8, 2020


openjdk Bot added the ready Pull request is ready to be integrated label Sep 8, 2020

openjdk Bot closed this Sep 8, 2020

openjdk Bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated labels Sep 8, 2020

openjdk Bot removed the rfr Pull request is ready for review label Sep 8, 2020

DamonFool deleted the JDK-8252887 branch September 8, 2020 03:00

mlbridge Bot mentioned this pull request Nov 9, 2020

8255949: AArch64: Add support for vectorized shift right and accumulate #1087

Closed


caojoshua added a commit to caojoshua/jdk that referenced this pull request Jul 28, 2023

[PEA] mark monitor inputs as escaped (openjdk#64)

b8e8eb2

dansmithcode pushed a commit to dansmithcode/jdk that referenced this pull request Aug 31, 2024

7903546: jdec,jdis: Enhance decompiler outputs flexibility and readability. (openjdk#64)

944d9b7

pf0n pushed a commit to pf0n/jdk that referenced this pull request Jul 9, 2025

Merge pull request openjdk#64 from kdnilsen/add-percentile-accounting

7f1b2a1

Add percentile accounting
