
8251152: ARM32: jtreg c2 Test8202414 test crash #48

Closed
fzhinkin wants to merge 1 commit into openjdk:master from fzhinkin:8251152-skip-unaligned-memory-accesses-related-tests

Conversation

@fzhinkin
Contributor

@fzhinkin fzhinkin commented Sep 7, 2020

Some CPUs (like ARM32) do not support unaligned memory accesses. To avoid JVM crashes, tests that perform such accesses should be skipped on the corresponding platforms.
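The gating pattern this PR adds can be sketched in plain Java as below. The real test calls `jdk.internal.misc.Unsafe.getUnsafe().unalignedAccess()` and throws jtreg's `SkippedException`; both are modeled here with simplified stand-ins so the sketch is self-contained.

```java
// Hedged, self-contained sketch of the skip pattern; SkippedException
// and the boolean parameter are stand-ins for jtreg's exception and
// Unsafe.unalignedAccess(), respectively.
public class UnalignedAccessGate {
    static class SkippedException extends RuntimeException {
        SkippedException(String msg) { super(msg); }
    }

    // In the real test, 'supported' would come from
    // jdk.internal.misc.Unsafe.getUnsafe().unalignedAccess().
    static boolean runOrSkip(boolean supported) {
        if (!supported) {
            throw new SkippedException(
                "CPU does not support unaligned memory accesses");
        }
        return true; // the actual test body would execute here
    }

    public static void main(String[] args) {
        try {
            runOrSkip(false);
        } catch (SkippedException e) {
            System.out.println("skipped: " + e.getMessage());
        }
    }
}
```

jtreg treats a test that throws `SkippedException` as skipped rather than failed, which is why the check throws instead of silently returning.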


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/48/head:pull/48
$ git checkout pull/48

@bridgekeeper

bridgekeeper Bot commented Sep 7, 2020

👋 Welcome back fzhinkin! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin Setting summary to Some CPUs (like ARM32) do not support unaligned memory accesses. To avoid JVM crashes, tests that perform such accesses should be skipped on the corresponding platforms.

@openjdk openjdk Bot changed the title 8251152: Skip Test8202414 on CPUs missing unaligned memory accesses support 8251152: ARM32: jtreg c2 Test8202414 test crash Sep 7, 2020
@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin This issue is referenced in the PR title - it will now be updated.

@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin The command reviewer cannot be used in the pull request body. Please use it in a new comment.

@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin The command reviewer cannot be used in the pull request body. Please use it in a new comment.

@fzhinkin
Contributor Author

fzhinkin commented Sep 7, 2020

/reviewer add iignatyev
/reviewer add clanger

@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin
Reviewer iignatyev successfully added.

@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin This change now passes all automated pre-integration checks. When the change also fulfills all project specific requirements, type /integrate in a new comment to proceed. After integration, the commit message will be:

8251152: ARM32: jtreg c2 Test8202414 test crash

Some CPUs (like ARM32) do not support unaligned memory accesses. To avoid JVM crashes, tests that perform such accesses should be skipped on the corresponding platforms.

Reviewed-by: iignatyev, clanger
  • If you would like to add a summary, use the /summary command.
  • To credit additional contributors, use the /contributor command.
  • To add additional solved issues, use the /issue command.

Since the source branch of this PR was last updated there have been 2 commits pushed to the master branch:

  • e0d5b5f: 8252627: Make it safe for JFR thread to read threadObj
  • e29c3f6: 8252661: Change SafepointMechanism terminology to talk less about "blocking"

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid automatic rebasing, please merge master into your branch, and then specify the current head hash when integrating, like this: /integrate e0d5b5f7f2c7290db0680d060acad66066b83499.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk Bot added the ready Pull request is ready to be integrated label Sep 7, 2020
@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin
Reviewer clanger successfully added.

@fzhinkin
Contributor Author

fzhinkin commented Sep 7, 2020

/issue add 8251152

@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin This issue is referenced in the PR title - it will now be updated.

@fzhinkin fzhinkin marked this pull request as ready for review September 7, 2020 11:19
@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin The following label will be automatically applied to this pull request: hotspot-compiler.

When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label (add|remove) "label" command.

@openjdk openjdk Bot added hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review labels Sep 7, 2020
@mlbridge

mlbridge Bot commented Sep 7, 2020

Webrevs

```
// memory accesses. This test may cause JVM crash due to
// alignment check failure on such CPUs.
if (!jdk.internal.misc.Unsafe.getUnsafe().unalignedAccess()) {
    throw new SkippedException(
```
Member


nit: I don't think we need a line break here.

Contributor Author


Previously, all lines within this file were no longer than 80 chars, so I decided to follow the same restriction.

@fzhinkin
Contributor Author

fzhinkin commented Sep 7, 2020

/reviewer remove clanger
/reviewer add @RealCLanger

@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin
Reviewer clanger successfully removed.

@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin
Reviewer clanger successfully added.

@fzhinkin
Contributor Author

fzhinkin commented Sep 7, 2020

/integrate

@openjdk openjdk Bot closed this Sep 7, 2020
@openjdk openjdk Bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 7, 2020
@openjdk

openjdk Bot commented Sep 7, 2020

@fzhinkin Since your change was applied there have been 2 commits pushed to the master branch:

  • e0d5b5f: 8252627: Make it safe for JFR thread to read threadObj
  • e29c3f6: 8252661: Change SafepointMechanism terminology to talk less about "blocking"

Your commit was automatically rebased without conflicts.

Pushed as commit 70d5cac.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

lewurm added a commit to lewurm/openjdk that referenced this pull request Oct 6, 2021
Restore looks like this now:
```
  0x0000000106e4dfcc:   movk    x9, #0x5e4, lsl #16
  0x0000000106e4dfd0:   movk    x9, #0x1, lsl #32
  0x0000000106e4dfd4:   blr x9
  0x0000000106e4dfd8:   ldp x2, x3, [sp, #16]
  0x0000000106e4dfdc:   ldp x4, x5, [sp, #32]
  0x0000000106e4dfe0:   ldp x6, x7, [sp, #48]
  0x0000000106e4dfe4:   ldp x8, x9, [sp, #64]
  0x0000000106e4dfe8:   ldp x10, x11, [sp, #80]
  0x0000000106e4dfec:   ldp x12, x13, [sp, #96]
  0x0000000106e4dff0:   ldp x14, x15, [sp, #112]
  0x0000000106e4dff4:   ldp x16, x17, [sp, #128]
  0x0000000106e4dff8:   ldp x0, x1, [sp], #144
  0x0000000106e4dffc:   ldp xzr, x19, [sp], #16
  0x0000000106e4e000:   ldp x22, x23, [sp, #16]
  0x0000000106e4e004:   ldp x24, x25, [sp, #32]
  0x0000000106e4e008:   ldp x26, x27, [sp, #48]
  0x0000000106e4e00c:   ldp x28, x29, [sp, #64]
  0x0000000106e4e010:   ldp x30, xzr, [sp, #80]
  0x0000000106e4e014:   ldp x20, x21, [sp], #96
  0x0000000106e4e018:   ldur    x12, [x29, #-24]
  0x0000000106e4e01c:   ldr x22, [x12, #16]
  0x0000000106e4e020:   add x22, x22, #0x30
  0x0000000106e4e024:   ldr x8, [x28, #8]
```
e1iu pushed a commit to e1iu/jdk that referenced this pull request Mar 29, 2022
This patch optimizes the backend implementation of VectorMaskToLong for
AArch64, using a more efficient approach to move mask value bits from a
predicate register to a general-purpose register, as x86 PMOVMSK[1]
does, by using BEXT[2], which is available in SVE2.

With this patch, the final code (input mask is byte type with
SPECIES_512, generated on a QEMU emulator with a 512-bit SVE vector
register size) changes as below:

Before:

        mov     z16.b, p0/z, #1
        fmov    x0, d16
        orr     x0, x0, x0, lsr #7
        orr     x0, x0, x0, lsr #14
        orr     x0, x0, x0, lsr #28
        and     x0, x0, #0xff
        fmov    x8, v16.d[1]
        orr     x8, x8, x8, lsr #7
        orr     x8, x8, x8, lsr #14
        orr     x8, x8, x8, lsr #28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl #8

        orr     x8, xzr, #0x2
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr #7
        orr     x8, x8, x8, lsr #14
        orr     x8, x8, x8, lsr #28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl #16

        orr     x8, xzr, #0x3
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr #7
        orr     x8, x8, x8, lsr #14
        orr     x8, x8, x8, lsr #28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl #24

        orr     x8, xzr, #0x4
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr #7
        orr     x8, x8, x8, lsr #14
        orr     x8, x8, x8, lsr #28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl #32

        mov     x8, #0x5
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr #7
        orr     x8, x8, x8, lsr #14
        orr     x8, x8, x8, lsr #28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl #40

        orr     x8, xzr, #0x6
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr #7
        orr     x8, x8, x8, lsr #14
        orr     x8, x8, x8, lsr #28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl #48

        orr     x8, xzr, #0x7
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr #7
        orr     x8, x8, x8, lsr #14
        orr     x8, x8, x8, lsr #28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl #56

After:

        mov     z16.b, p0/z, #1
        mov     z17.b, #1
        bext    z16.d, z16.d, z17.d
        mov     z17.d, #0
        uzp1    z16.s, z16.s, z17.s
        uzp1    z16.h, z16.h, z17.h
        uzp1    z16.b, z16.b, z17.b
        mov     x0, v16.d[0]

[1] https://www.felixcloutier.com/x86/pmovmskb
[2] https://developer.arm.com/documentation/ddi0602/2020-12/SVE-Instructions/BEXT--Gather-lower-bits-from-positions-selected-by-bitmask-

Change-Id: Ia983a20c89f76403e557ac21328f2f2e05dd08e0
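The semantics of VectorMaskToLong can be illustrated with a scalar Java sketch: pack one bit per mask lane into a long, lane 0 into bit 0. The commit above computes this in hardware via SVE2 BEXT; the method name and loop here are purely illustrative, not the HotSpot implementation.

```java
// Hedged scalar sketch of what VectorMaskToLong computes.
public class MaskToLong {
    static long maskToLong(boolean[] mask) {
        long bits = 0L;
        // At most 64 lanes fit into a long result.
        for (int i = 0; i < mask.length && i < 64; i++) {
            if (mask[i]) {
                bits |= 1L << i; // set bit i for an active lane
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // lanes {0, 2} active -> bits 0b101
        System.out.println(maskToLong(new boolean[] {true, false, true}));
    }
}
```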
e1iu pushed a commit to e1iu/jdk that referenced this pull request Apr 21, 2022
asotona added a commit to asotona/jdk that referenced this pull request Feb 9, 2023
asotona added a commit to asotona/jdk that referenced this pull request Feb 15, 2023
caojoshua pushed a commit to caojoshua/jdk that referenced this pull request Jun 9, 2023
robehn pushed a commit to robehn/jdk that referenced this pull request Aug 15, 2023
fg1417 pushed a commit to fg1417/jdk that referenced this pull request Nov 21, 2023
…ng into ldp/stp on AArch64

Macro-assembler on aarch64 can merge adjacent loads or stores
into ldp/stp[1]. For example, it can merge:
```
str     w20, [sp, #16]
str     w10, [sp, #20]
```
into
```
stp     w20, w10, [sp, #16]
```

But C2 may generate a sequence like:
```
str     x21, [sp, #8]
str     w20, [sp, #16]
str     x19, [sp, #24] <---
str     w10, [sp, #20] <--- Before sorting
str     x11, [sp, #40]
str     w13, [sp, #48]
str     x16, [sp, #56]
```
We can't do any merging for non-adjacent loads or stores.

This patch sorts the spilling or unspilling sequence by offset during
the instruction scheduling and bundling phase. After that, we get a
new sequence:
```
str     x21, [sp, #8]
str     w20, [sp, #16]
str     w10, [sp, #20] <---
str     x19, [sp, #24] <--- After sorting
str     x11, [sp, #40]
str     w13, [sp, #48]
str     x16, [sp, #56]
```

Then macro-assembler can do ld/st merging:
```
str     x21, [sp, #8]
stp     w20, w10, [sp, #16] <--- Merged
str     x19, [sp, #24]
str     x11, [sp, #40]
str     w13, [sp, #48]
str     x16, [sp, #56]
```

To justify the patch, we run `HelloWorld.java`
```
public class HelloWorld {
    public static void main(String [] args) {
        System.out.println("Hello World!");
    }
}
```
with `java -Xcomp -XX:-TieredCompilation HelloWorld`.

Before the patch, the macro-assembler performed ld/st merging
3688 times. After the patch, the number of merges increases to
3871, i.e. by ~5%.

Tested tier1~3 on x86 and AArch64.

[1] https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079
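The core idea of the sorting step above can be sketched in a few lines of Java: order independent spill stores by their stack offset so that adjacent ones become candidates for stp pairing. The `Store` record and names here are hypothetical stand-ins, not HotSpot's actual node representation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hedged sketch of sorting a spill sequence by offset.
public class SpillSort {
    record Store(String reg, int offset) {}

    static List<Store> sortByOffset(List<Store> spills) {
        List<Store> sorted = new ArrayList<>(spills);
        // Ascending offsets put adjacent slots next to each other,
        // which is what lets the macro-assembler merge into stp.
        sorted.sort(Comparator.comparingInt(Store::offset));
        return sorted;
    }

    public static void main(String[] args) {
        // Mirrors the example above: [sp, #24] and [sp, #20] swap.
        List<Store> spills = List.of(
            new Store("x19", 24), new Store("w10", 20));
        System.out.println(sortByOffset(spills));
    }
}
```

Note that this sketch only reorders stores that are mutually independent; the real patch applies the reordering inside the scheduling and bundling phase, where independence is already established.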
pf0n pushed a commit to pf0n/jdk that referenced this pull request Jul 9, 2025
fg1417 added a commit to fg1417/jdk that referenced this pull request Mar 12, 2026
JDK-8196064 added support for merging 4- and 8-byte scalar
load/store operations into load/store pairs. The AArch64 platform
also supports SIMD load/store pairs for SIMD&FP registers[1].

This patch extends that functionality to support merging 4-, 8-,
and 16-byte vector load/store operations into vector load/store
pairs.

For example, given the following assembly:
```
str q16, [x14, #32]
str q16, [x14, #48]
```
after this change, it can be merged into:
```
stp q16, q16, [x14, #32]
```
Tier 1~3 tests passed on aarch64 platform.

[1] https://developer.arm.com/documentation/ddi0602/2025-12/SIMD-FP-Instructions/LDP--SIMD-FP---Load-pair-of-SIMD-FP-registers-?lang=en
fg1417 added a commit to fg1417/jdk that referenced this pull request Mar 13, 2026
…marks after JDK-8340093

JDK-8340093 enabled auto-vectorization for more reduction loop cases
using 128-bit vector operations. As a result, the following
microbenchmarks are negatively affected:
VectorReduction2.longAddDotProduct
VectorReduction2.longMulDotProduct
VectorReduction2.longMulSimple

This patch fixes these regressions.

1. Improve code generation for MLA

For longAddDotProduct[1], the current implementation generates
vectorized code similar to:
```
ldr     q17, [x12, #16]
ldr     q18, [x11, #16]
mla     z16.d, p7/m, z17.d, z18.d
ldr     q17, [x11, #32]
ldr     q18, [x12, #32]
mla     z16.d, p7/m, z18.d, z17.d
...
ldr     q17, [x11, #128]
ldr     q18, [x12, #128]
mla     z16.d, p7/m, z18.d, z17.d
```
`z16` is the third source and destination register. There are
true dependencies between consecutive mla[2] instructions.
As a result, this vectorized code performs significantly worse
than the scalar version due to limited instruction-level
parallelism.

These mla instructions are produced by a backend match rule that
fuses AddVL and MulVL into a vector MLA[3]. In this situation,
avoiding instruction fusion and instead generating separate SVE
mul and add instructions can improve instruction-level parallelism
and overall performance.

To address this, this patch introduces
is_multiply_accumulate_candidate() to determine whether a node is
a suitable vector MLA candidate. For node patterns that may
increase execution latency, instruction fusion into MLA is
disabled.

After applying this patch, the generated assembly looks like:
```
ldr     q17, [x12, #16]
ldr     q18, [x11, #16]
ldr     q19, [x11, #32]
mul     z17.d, p7/m, z17.d, z18.d
ldr     q18, [x12, #32]
ldr     q20, [x11, #48]
mul     z18.d, p7/m, z18.d, z19.d
ldr     q19, [x12, #48]
add     v16.2d, v17.2d, v16.2d
ldr     q17, [x11, #64]
add     v16.2d, v18.2d, v16.2d
ldr     q18, [x12, #64]
mul     z19.d, p7/m, z19.d, z20.d
ldr     q20, [x12, #80]
add     v16.2d, v19.2d, v16.2d
```
This sequence exposes more independent operations and reduces
dependency chains, leading to improved performance.

Since SVE mls instructions may suffer from similar issues, the
same logic has been extended to cover MLS as well. Additional
microbenchmarks have been added accordingly.

2. Avoid vectorizing MUL-heavy loops

For longMulSimple[3], the generated vectorized code exhibits
long dependency chains of SVE mul instructions, which results
in worse performance than scalar execution:
```
ldr     q17, [x1, #16]
ldr     q18, [x1, #32]
mul     z17.d, p7/m, z17.d, z16.d
ldr     q16, [x1, #48]
mul     z17.d, p7/m, z17.d, z18.d
ldr     q18, [x1, #64]
mul     z16.d, p7/m, z16.d, z17.d
...
ldr     q16, [x1, #256]
mul     z17.d, p7/m, z17.d, z19.d
mul     z16.d, p7/m, z16.d, z17.d
```

To address this, the patch introduces a platform-specific interface:
`VTransformElementWiseVectorNode::node_weight()`.

For 128-bit operations, this interface detects consecutive vector
long multiply operations and increases the node weight to 4, which is
the minimum value required for the cost model to avoid vectorization
on both 128-bit and 256-bit platforms.

3. Results
Performance measurements on 128-bit and 256-bit SVE machines show that
these changes avoid harmful vectorization and improve overall
performance for the affected benchmarks.

patch: results obtained after applying this patch, using default
auto-vectorization settings (-XX:+UseSuperWord,
-XX:AutoVectorizationOverrideProfitability=1, cost-model decision mode)

main-default: results on mainline using the same default
auto-vectorization settings (-XX:+UseSuperWord,
-XX:AutoVectorizationOverrideProfitability=1, cost-model decision mode)

main-scalar: results on mainline with -XX:+UseSuperWord and
-XX:AutoVectorizationOverrideProfitability=0 (force scalar code)

The table below reports relative performance changes:
p/m1 = (patch - main-default) / main-default
p/m0 = (patch - main-scalar) / main-scalar

Mode: avgt
Unit: ns/op

Arm Neoverse V2 machine (128 bit SVE):
Benchmark                                         (COUNT)    p/m1       p/m0
TypeVectorOperationsSuperWord.mlaL                  512     0.16%      -50.42%
TypeVectorOperationsSuperWord.mlaL                  2048    0.26%      -56.70%
TypeVectorOperationsSuperWord.mlsL                  512     -0.10%     -50.37%
TypeVectorOperationsSuperWord.mlsL                  2048    0.14%      -56.82%
TypeVectorOperationsSuperWord.mulBigL               512     0.06%      -25.77%
TypeVectorOperationsSuperWord.mulBigL               2048    -0.02%     -19.63%
TypeVectorOperationsSuperWord.mulI                  512     0.63%      -63.44%
TypeVectorOperationsSuperWord.mulI                  2048    0.28%      -63.07%
TypeVectorOperationsSuperWord.mulL                  512     -0.03%     -50.47%
TypeVectorOperationsSuperWord.mulL                  2048    0.29%      -50.82%
TypeVectorOperationsSuperWord.mulMediumL            512     -0.19%     -27.54%
TypeVectorOperationsSuperWord.mulMediumL            2048    0.24%      -25.18%
TypeVectorOperationsSuperWord.mulMlaLDependent      512     0.30%      -28.70%
TypeVectorOperationsSuperWord.mulMlaLDependent      2048    0.12%      -26.74%
TypeVectorOperationsSuperWord.mulMlaLIndependent    512     -10.43%    -43.09%
TypeVectorOperationsSuperWord.mulMlaLIndependent    2048    -14.82%    -42.68%
VectorReduction2.WithSuperword.longAddBig           2048    -15.15%    -44.01%
VectorReduction2.WithSuperword.longAddBigMixSub1    2048    -6.19%     -43.92%
VectorReduction2.WithSuperword.longAddBigMixSub2    2048    -15.18%    -43.90%
VectorReduction2.WithSuperword.longAddBigMixSub3    2048    -5.74%     -43.87%
VectorReduction2.WithSuperword.longAddDotProduct    2048    -33.36%    -18.16%
VectorReduction2.WithSuperword.longAddSimple        2048    -0.02%     -6.72%
VectorReduction2.WithSuperword.longAndBig           2048    -16.32%    -44.06%
VectorReduction2.WithSuperword.longAndDotProduct    2048    -0.01%     -3.74%
VectorReduction2.WithSuperword.longAndSimple        2048    0.00%      -6.35%
VectorReduction2.WithSuperword.longMaxBig           2048    -15.29%    -52.09%
VectorReduction2.WithSuperword.longMaxDotProduct    2048    -0.03%     -52.08%
VectorReduction2.WithSuperword.longMaxSimple        2048    -0.40%     -52.74%
VectorReduction2.WithSuperword.longMinBig           2048    -14.88%    -51.70%
VectorReduction2.WithSuperword.longMinDotProduct    2048    0.01%      -52.21%
VectorReduction2.WithSuperword.longMinSimple        2048    0.26%      -52.88%
VectorReduction2.WithSuperword.longMulBig           2048    -2.21%     -0.07%
VectorReduction2.WithSuperword.longMulDotProduct    2048    -15.47%    0.00%
VectorReduction2.WithSuperword.longMulSimple        2048    -17.87%    -0.33%
VectorReduction2.WithSuperword.longOrBig            2048    -15.23%    -43.94%
VectorReduction2.WithSuperword.longOrDotProduct     2048    -0.01%     -3.83%
VectorReduction2.WithSuperword.longOrSimple         2048    -0.01%     -6.60%
VectorReduction2.WithSuperword.longXorBig           2048    -10.03%    -41.62%
VectorReduction2.WithSuperword.longXorDotProduct    2048    0.01%      -38.61%
VectorReduction2.WithSuperword.longXorSimple        2048    0.02%      -53.18%

Arm Neoverse V1 machine (256 bit SVE):
Note: In the current mainline code, the AArch64 backend supports
only 128-bit multiply long operations. Auto-vectorization accounts
for this backend constraint and splits 256-bit vectors into 128-bit
chunks so that the loop can still be vectorized. This is why
256-bit platforms also benefit from this patch.

No obvious performance changes are observed for other benchmarks.

Benchmark                           (COUNT)       p/m1       p/m0
VectorReduction2.longMulDotProduct    2048       -28.23%    0.00%
VectorReduction2.longMulSimple        2048       -19.29%    0.01%

Tier 1 - 3 passed on both aarch64 and x86 platforms.

[1] https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/test/micro/org/openjdk/bench/vm/compiler/VectorReduction2.java#L1096
[2] https://developer.arm.com/documentation/ddi0602/2025-12/SVE-Instructions/MLA--vectors---Multiply-add--predicated--?lang=en
[3] https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2617
[4] https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/test/micro/org/openjdk/bench/vm/compiler/VectorReduction2.java#L1035
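For reference, the reduction shape discussed above (longAddDotProduct, per the linked VectorReduction2 benchmark) boils down to a loop like the following sketch; every iteration adds into the same accumulator, which is exactly the serial dependency chain the MLA form exposes.

```java
// Hedged sketch of the long dot-product reduction shape.
public class LongDot {
    static long dotProduct(long[] a, long[] b) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            // The multiply results are independent, but every add
            // feeds the single accumulator 'sum'.
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(
            dotProduct(new long[] {1, 2, 3}, new long[] {4, 5, 6}));
    }
}
```

Splitting the fused MLA back into separate mul and add, as the patch does, lets the independent multiplies run in parallel while only the adds stay serialized.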
fg1417 added a commit to fg1417/jdk that referenced this pull request Mar 30, 2026
The microbenchmark ArraysFill.testLongFill[1] on
128-bit vector platforms generates vectorized store instructions
with non-monotonic memory offsets, e.g.:

str q16, [x12, #80]
str q16, [x12, #48]
str q16, [x12, #128]
...

This arises because SuperWord only considers true dependencies
when building edges (see [3]), and
therefore does not enforce ordering among independent vector
memory operations. These nodes are later scheduled using RPO,
which can result in an apparently unordered sequence of memory
accesses.

This patch replaces RPO-based scheduling with a priority-based
topological sort to improve ordering and locality.

The scheduling policy is:
1. Prefer nodes whose weak predecessors have already been
scheduled.
2. Prioritize node types in the following order: scalar
operations (loads/stores, address expressions), vector arithmetic,
vector loads, vector stores, then others.
3. For independent loads/stores sharing the same base address,
prefer ascending offsets.
4. Use VTransformNodeIDX to ensure stable ordering.

With this change, the generated code becomes monotonic in memory
offsets:

str q16, [x12, #16]
str q16, [x12, #32]
str q16, [x12, #48]
...

On Arm Neoverse V2 machine (128 bit SVE), this improves the
following benchmarks:

TypeVectorOperationsSuperWord.java[2]

Benchmark          (COUNT)   Mode    Units    Difference
absD                 512     avgt    ns/op    -27.05%
absD                 2048    avgt    ns/op    -27.05%
absL                 512     avgt    ns/op    -24.46%
absL                 2048    avgt    ns/op    -27.26%
convertD2LBitsRaw    512     avgt    ns/op    -20.39%
convertD2LBitsRaw    2048    avgt    ns/op    -23.92%
convertF2L           512     avgt    ns/op    -16.82%
convertF2L           2048    avgt    ns/op    -22.60%
convertI2D           512     avgt    ns/op    -12.50%
convertI2D           2048    avgt    ns/op    -17.92%
convertLBits2D       512     avgt    ns/op    -27.13%
convertLBits2D       2048    avgt    ns/op    -31.69%
negD                 512     avgt    ns/op    -26.85%
negD                 2048    avgt    ns/op    -27.09%

ArraysFill.java[1]:

Benchmark       (size)    Mode     Units     Difference
testDoubleFill    250     thrpt    ops/ms    26.46%
testDoubleFill    266     thrpt    ops/ms    32.69%
testDoubleFill    511     thrpt    ops/ms    33.83%
testDoubleFill    2047    thrpt    ops/ms    45.35%
testDoubleFill    2048    thrpt    ops/ms    45.38%
testDoubleFill    8195    thrpt    ops/ms    49.32%
testLongFill      250     thrpt    ops/ms    28.12%
testLongFill      266     thrpt    ops/ms    40.30%
testLongFill      511     thrpt    ops/ms    34.79%
testLongFill      2047    thrpt    ops/ms    45.71%
testLongFill      2048    thrpt    ops/ms    53.07%
testLongFill      8195    thrpt    ops/ms    49.52%

No significant performance changes are observed on wider vector
platforms (e.g., 256-bit or 512-bit), where fewer vector
operations are generated in SuperWord and scheduling has less
impact.

[1] https://github.com/openjdk/jdk/blob/34a0235ed30141e92f064d30fe4378709ea0e135/test/micro/org/openjdk/bench/java/util/ArraysFill.java#L92
[2] https://github.com/openjdk/jdk/blob/34a0235ed30141e92f064d30fe4378709ea0e135/test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java
[3] https://github.com/openjdk/jdk/blob/34a0235ed30141e92f064d30fe4378709ea0e135/src/hotspot/share/opto/superwordVTransformBuilder.cpp#L99
ruben-arm added a commit to ruben-arm/jdk that referenced this pull request Mar 30, 2026
Some vector operations do not have inputs and essentially initialize
vectors with a constant value. These operations can be marked for
spilling and subsequently rematerialized at every use. The result of
the transformation might look as follows:
   movi    v16.2d, #0x0
   str     q16, [x16, #64]
   movi    v16.2d, #0x0
   str     q16, [x16, #32]
   movi    v16.2d, #0x0
   str     q16, [x16, #16]
   movi    v16.2d, #0x0
   str     q16, [x16]
   movi    v16.2d, #0x0
   str     q16, [x16, #48]
   movi    v16.2d, #0x0
   str     q16, [x16, #112]
   movi    v16.2d, #0x0
   str     q16, [x16, #80]
   movi    v16.2d, #0x0
   str     q16, [x16, #96]

Introduce deduplication of these rematerialized vector
constant initializations reducing the above sequence to:
   movi    v16.2d, #0x0
   str     q16, [x16, #64]
   str     q16, [x16, #32]
   str     q16, [x16, #16]
   str     q16, [x16]
   str     q16, [x16, #48]
   str     q16, [x16, #112]
   str     q16, [x16, #80]
   str     q16, [x16, #96]
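The deduplication idea can be sketched over a plain instruction list: drop a rematerialized constant-init when the destination register still holds that constant. Plain strings and the clobber heuristic here are stand-ins for the compiler's real IR and liveness information, not the actual HotSpot logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of deduplicating rematerialized constant inits.
public class MoviDedup {
    static List<String> dedup(List<String> insns) {
        List<String> out = new ArrayList<>();
        String live = null; // last constant init still known valid
        for (String insn : insns) {
            if (insn.startsWith("movi")) {
                if (insn.equals(live)) {
                    continue; // register already holds this constant
                }
                live = insn;
            } else if (!insn.startsWith("str")) {
                live = null; // conservatively assume a clobber
            }
            out.add(insn);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of(
            "movi v16.2d, #0x0", "str q16, [x16]",
            "movi v16.2d, #0x0", "str q16, [x16, #16]")));
    }
}
```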
snake66 added a commit to snake66/jdk that referenced this pull request Apr 8, 2026

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

2 participants