
8249142: java/awt/FontClass/CreateFont/DeleteFont.sh is unstable #256

Closed
prrace wants to merge 1 commit into openjdk:master from prrace:deletefont

@prrace
Contributor

@prrace prrace commented Sep 18, 2020

This test is being marked intermittent, although at the same time I am trying to make it less likely to fail.
However, since we have a known issue with NIO mmap'd files not being directly unmappable, the deletes that the font system makes may be stymied on Windows, so marking it intermittent is probably for the best.
I also changed the test so that the tmp files it creates are now of different sizes, which lets us tell which createFont() call resulted in the font that can't be deleted. If it is always the Type 1 fonts, that will be good evidence that mmap is the problem.
We likely need to stop using mmap for this reason.
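A minimal sketch of the size-tagging idea follows. It uses plain temp files rather than the real createFont() path, and the actual DeleteFont test code may differ. `TaggedTempFiles`, `createTagged` and `identify` are hypothetical names. Each file gets a unique length, so a file that survives deletion points back to the call that created it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: tags temp files by size so that a file which
// cannot be deleted later identifies the call that created it.
final class TaggedTempFiles {

    // Creates one temp file per label in `dir`. File i contains `payload`
    // padded with i extra zero bytes, so every file length is distinct.
    // Returns a map from file size to the label of the creating call.
    static Map<Long, String> createTagged(Path dir, byte[] payload, List<String> labels) {
        Map<Long, String> bySize = new LinkedHashMap<>();
        try {
            for (int i = 0; i < labels.size(); i++) {
                Path file = Files.createTempFile(dir, "font", ".tmp");
                Files.write(file, Arrays.copyOf(payload, payload.length + i));
                bySize.put(Files.size(file), labels.get(i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bySize;
    }

    // Maps the size of a leftover file back to the label of its creator.
    static String identify(Map<Long, String> bySize, long leftoverSize) {
        return bySize.getOrDefault(leftoverSize, "unknown");
    }
}
```

After the deletes have been attempted, each leftover file can be looked up with `identify(bySize, Files.size(leftover))`. If the answer is always the Type 1 label, that supports the mmap theory.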


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8249142: java/awt/FontClass/CreateFont/DeleteFont.sh is unstable

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/256/head:pull/256
$ git checkout pull/256

@bridgekeeper

bridgekeeper Bot commented Sep 18, 2020

👋 Welcome back prr! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk Bot added the rfr Pull request is ready for review label Sep 18, 2020
@openjdk

openjdk Bot commented Sep 18, 2020

@prrace The following label will be automatically applied to this pull request: 2d.

When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label (add|remove) "label" command.

@openjdk openjdk Bot added the 2d client-libs-dev@openjdk.org label Sep 18, 2020
@mlbridge

mlbridge Bot commented Sep 18, 2020

Webrevs

@openjdk

openjdk Bot commented Sep 19, 2020

@prrace This change now passes all automated pre-integration checks. In addition to the automated checks, the change must also fulfill all project specific requirements

After integration, the commit message will be:

8249142: java/awt/FontClass/CreateFont/DeleteFont.sh is unstable

Reviewed-by: serb
  • If you would like to add a summary, use the /summary command.
  • To credit additional contributors, use the /contributor command.
  • To add additional solved issues, use the /issue command.

Since the source branch of this PR was last updated there have been 57 commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid automatic rebasing, please merge master into your branch, and then specify the current head hash when integrating, like this: /integrate 1438ce097f4b327570504066a3f999163802a14f.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk Bot added the ready Pull request is ready to be integrated label Sep 19, 2020
@prrace
Contributor Author

prrace commented Sep 19, 2020

/integrate

@prrace prrace closed this Sep 19, 2020
@prrace prrace reopened this Sep 19, 2020
@prrace
Contributor Author

prrace commented Sep 19, 2020

/integrate

@openjdk openjdk Bot closed this Sep 19, 2020
@openjdk openjdk Bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 19, 2020
@openjdk

openjdk Bot commented Sep 19, 2020

@prrace Since your change was applied there have been 57 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

Pushed as commit d27835b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@prrace prrace deleted the deletefont branch September 20, 2020 16:56
fg1417 added a commit to fg1417/jdk that referenced this pull request Mar 13, 2026
…marks after JDK-8340093

JDK-8340093 enabled auto-vectorization for more reduction loop cases
using 128-bit vector operations. As a result, the following
microbenchmarks are negatively affected:
VectorReduction2.longAddDotProduct
VectorReduction2.longMulDotProduct
VectorReduction2.longMulSimple

This patch fixes these regressions.
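For reference, the three affected reductions have roughly the shapes below. These are approximations inferred from the benchmark names, and the method names are hypothetical. The real loop bodies are in VectorReduction2.java ([1], [4]) and may differ in detail.

```java
// Approximate, hypothetical shapes of the affected long reductions.
final class ReductionShapes {

    // Dot-product add reduction: the accumulator is updated with a[i] * b[i].
    static long addDotProduct(long[] a, long[] b) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Dot-product multiply reduction: the accumulator is multiplied by a[i] * b[i].
    static long mulDotProduct(long[] a, long[] b) {
        long prod = 1;
        for (int i = 0; i < a.length; i++) {
            prod *= a[i] * b[i];
        }
        return prod;
    }

    // Simple multiply reduction: the accumulator is multiplied by a[i].
    static long mulSimple(long[] a) {
        long prod = 1;
        for (int i = 0; i < a.length; i++) {
            prod *= a[i];
        }
        return prod;
    }
}
```

In all three loops, every iteration updates the same accumulator. That loop-carried dependency is what the sections below are about.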

1. Improve code generation for MLA

For longAddDotProduct[1], the current implementation generates
vectorized code similar to:
```
ldr     q17, [x12, #16]
ldr     q18, [x11, #16]
mla     z16.d, p7/m, z17.d, z18.d
ldr     q17, [x11, #32]
ldr     q18, [x12, #32]
mla     z16.d, p7/m, z18.d, z17.d
...
ldr     q17, [x11, #128]
ldr     q18, [x12, #128]
mla     z16.d, p7/m, z18.d, z17.d
```
`z16` is both the third source operand (the accumulator) and the
destination register, so there are true dependencies between
consecutive mla[2] instructions. As a result, this vectorized code
performs significantly worse than the scalar version due to limited
instruction-level parallelism.

These mla instructions are produced by a backend match rule that
fuses AddVL and MulVL into a vector MLA[3]. In this situation,
avoiding instruction fusion and instead generating separate SVE
mul and add instructions can improve instruction-level parallelism
and overall performance.

To address this, this patch introduces
is_multiply_accumulate_candidate() to determine whether a node is
a suitable vector MLA candidate. For node patterns that may
increase execution latency, instruction fusion into MLA is
disabled.

After applying this patch, the generated assembly looks like:
```
ldr     q17, [x12, #16]
ldr     q18, [x11, #16]
ldr     q19, [x11, #32]
mul     z17.d, p7/m, z17.d, z18.d
ldr     q18, [x12, #32]
ldr     q20, [x11, #48]
mul     z18.d, p7/m, z18.d, z19.d
ldr     q19, [x12, #48]
add     v16.2d, v17.2d, v16.2d
ldr     q17, [x11, #64]
add     v16.2d, v18.2d, v16.2d
ldr     q18, [x12, #64]
mul     z19.d, p7/m, z19.d, z20.d
ldr     q20, [x12, #80]
add     v16.2d, v19.2d, v16.2d
```
This sequence exposes more independent operations and reduces
dependency chains, leading to improved performance.
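A toy critical-path model shows why splitting the fused mla helps. It assumes unlimited issue width and fixed latencies. The latency values are purely illustrative, not measured Neoverse numbers, and `ChainLatency` and its methods are hypothetical.

```java
// Toy critical-path model for an n-step multiply-accumulate chain.
// Assumes unlimited issue width and fixed latencies; the latency values
// passed in are placeholders, not measured hardware numbers.
final class ChainLatency {

    // Fused form: each mla reads the accumulator written by the previous
    // mla, so the n instructions run strictly one after another.
    static int fusedCycles(int n, int mlaLatency) {
        return n * mlaLatency;
    }

    // Unfused form: the n muls are mutually independent and can all
    // overlap, so only one mul latency lies on the critical path; the n
    // adds into the accumulator still form a serial chain after it.
    static int unfusedCycles(int n, int mulLatency, int addLatency) {
        return mulLatency + n * addLatency;
    }
}
```

Take placeholder latencies of 4 for mla and mul and 2 for add. Eight fused steps then cost 8 * 4 = 32 cycles on the critical path, while the unfused form costs 4 + 8 * 2 = 20. Under this model, unfusing wins whenever the add is cheaper than the mla and the chain is long enough.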

Since SVE mls instructions may suffer from similar issues, the
same logic has been extended to cover MLS as well. Additional
microbenchmarks have been added accordingly.

2. Avoid vectorizing MUL-heavy loops

For longMulSimple[4], the generated vectorized code exhibits
long dependency chains of SVE mul instructions, which result
in worse performance than scalar execution:
```
ldr     q17, [x1, #16]
ldr     q18, [x1, #32]
mul     z17.d, p7/m, z17.d, z16.d
ldr     q16, [x1, #48]
mul     z17.d, p7/m, z17.d, z18.d
ldr     q18, [x1, #64]
mul     z16.d, p7/m, z16.d, z17.d
...
ldr     q16, [x1, #256]
mul     z17.d, p7/m, z17.d, z19.d
mul     z16.d, p7/m, z16.d, z17.d
```

To address this, the patch introduces a platform-specific interface:
`VTransformElementWiseVectorNode::node_weight()`.

For 128-bit operations, this interface detects consecutive vector
long multiply operations and increases the node weight to 4, which is
the minimum value required for the cost model to avoid vectorization
on both 128-bit and 256-bit platforms.
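The node-weight rule described above can be modeled roughly as follows. The enum, the method shape and the default weight of 1 are assumptions for illustration, not HotSpot code. Only the behavior of giving consecutive 128-bit long multiplies a weight of 4 comes from the description.

```java
// Hypothetical model of the node-weight rule; not the HotSpot
// VTransformElementWiseVectorNode::node_weight() implementation.
final class NodeWeightSketch {

    // Simplified node kinds for long vectors (names are assumptions).
    enum Op { MUL_VL, ADD_VL, LOAD_VL }

    // Returns the weight of an element-wise vector node. `vectorBits` is
    // the width of the operation as emitted (after any splitting into
    // 128-bit chunks). A long multiply whose input is another long
    // multiply gets weight 4; everything else keeps an assumed weight of 1.
    static int nodeWeight(Op op, Op input, int vectorBits) {
        boolean consecutiveLongMul = op == Op.MUL_VL && input == Op.MUL_VL;
        return (vectorBits == 128 && consecutiveLongMul) ? 4 : 1;
    }
}
```

Raising the weight only for this pattern leaves other long reductions, and multiplies fed directly by loads, under the default cost.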

3. Results

Performance measurements on 128-bit and 256-bit SVE machines show that
these changes avoid harmful vectorization and improve overall
performance for the affected benchmarks.

patch: results obtained after applying this patch, using default
auto-vectorization settings (-XX:+UseSuperWord,
-XX:AutoVectorizationOverrideProfitability=1, cost-model decision mode)

main-default: results on mainline using the same default
auto-vectorization settings (-XX:+UseSuperWord,
-XX:AutoVectorizationOverrideProfitability=1, cost-model decision mode)

main-scalar: results on mainline with -XX:+UseSuperWord and
-XX:AutoVectorizationOverrideProfitability=0 (force scalar code)

The table below reports relative performance changes:
p/m1 = (patch - main-default) / main-default
p/m0 = (patch - main-scalar) / main-scalar

Mode: avgt
Unit: ns/op
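A small helper makes the sign convention of the p/m1 and p/m0 formulas above concrete. Because the unit is ns/op (lower is better), a negative percentage means the patched build is faster. `RelativeChange.percent` is a hypothetical name.

```java
// Relative change used for the p/m1 and p/m0 columns:
// (patch - baseline) / baseline, expressed as a percentage.
final class RelativeChange {

    // With ns/op timings, a negative result means `patch` is faster.
    static double percent(double patch, double baseline) {
        return (patch - baseline) / baseline * 100.0;
    }
}
```

For example, a benchmark that drops from 100 ns/op to 75 ns/op would be reported as -25%.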

Arm Neoverse V2 machine (128 bit SVE):
Benchmark                                         (COUNT)    p/m1       p/m0
TypeVectorOperationsSuperWord.mlaL                  512     0.16%      -50.42%
TypeVectorOperationsSuperWord.mlaL                  2048    0.26%      -56.70%
TypeVectorOperationsSuperWord.mlsL                  512     -0.10%     -50.37%
TypeVectorOperationsSuperWord.mlsL                  2048    0.14%      -56.82%
TypeVectorOperationsSuperWord.mulBigL               512     0.06%      -25.77%
TypeVectorOperationsSuperWord.mulBigL               2048    -0.02%     -19.63%
TypeVectorOperationsSuperWord.mulI                  512     0.63%      -63.44%
TypeVectorOperationsSuperWord.mulI                  2048    0.28%      -63.07%
TypeVectorOperationsSuperWord.mulL                  512     -0.03%     -50.47%
TypeVectorOperationsSuperWord.mulL                  2048    0.29%      -50.82%
TypeVectorOperationsSuperWord.mulMediumL            512     -0.19%     -27.54%
TypeVectorOperationsSuperWord.mulMediumL            2048    0.24%      -25.18%
TypeVectorOperationsSuperWord.mulMlaLDependent      512     0.30%      -28.70%
TypeVectorOperationsSuperWord.mulMlaLDependent      2048    0.12%      -26.74%
TypeVectorOperationsSuperWord.mulMlaLIndependent    512     -10.43%    -43.09%
TypeVectorOperationsSuperWord.mulMlaLIndependent    2048    -14.82%    -42.68%
VectorReduction2.WithSuperword.longAddBig           2048    -15.15%    -44.01%
VectorReduction2.WithSuperword.longAddBigMixSub1    2048    -6.19%     -43.92%
VectorReduction2.WithSuperword.longAddBigMixSub2    2048    -15.18%    -43.90%
VectorReduction2.WithSuperword.longAddBigMixSub3    2048    -5.74%     -43.87%
VectorReduction2.WithSuperword.longAddDotProduct    2048    -33.36%    -18.16%
VectorReduction2.WithSuperword.longAddSimple        2048    -0.02%     -6.72%
VectorReduction2.WithSuperword.longAndBig           2048    -16.32%    -44.06%
VectorReduction2.WithSuperword.longAndDotProduct    2048    -0.01%     -3.74%
VectorReduction2.WithSuperword.longAndSimple        2048    0.00%      -6.35%
VectorReduction2.WithSuperword.longMaxBig           2048    -15.29%    -52.09%
VectorReduction2.WithSuperword.longMaxDotProduct    2048    -0.03%     -52.08%
VectorReduction2.WithSuperword.longMaxSimple        2048    -0.40%     -52.74%
VectorReduction2.WithSuperword.longMinBig           2048    -14.88%    -51.70%
VectorReduction2.WithSuperword.longMinDotProduct    2048    0.01%      -52.21%
VectorReduction2.WithSuperword.longMinSimple        2048    0.26%      -52.88%
VectorReduction2.WithSuperword.longMulBig           2048    -2.21%     -0.07%
VectorReduction2.WithSuperword.longMulDotProduct    2048    -15.47%    0.00%
VectorReduction2.WithSuperword.longMulSimple        2048    -17.87%    -0.33%
VectorReduction2.WithSuperword.longOrBig            2048    -15.23%    -43.94%
VectorReduction2.WithSuperword.longOrDotProduct     2048    -0.01%     -3.83%
VectorReduction2.WithSuperword.longOrSimple         2048    -0.01%     -6.60%
VectorReduction2.WithSuperword.longXorBig           2048    -10.03%    -41.62%
VectorReduction2.WithSuperword.longXorDotProduct    2048    0.01%      -38.61%
VectorReduction2.WithSuperword.longXorSimple        2048    0.02%      -53.18%

Arm Neoverse V1 machine (256 bit SVE):
Note: In the current mainline code, the AArch64 backend supports
only 128-bit multiply long operations. Auto-vectorization accounts
for this backend constraint and splits 256-bit vectors into 128-bit
chunks so that the loop can still be vectorized. This is why
256-bit platforms also benefit from this patch.
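The lane arithmetic behind this note is sketched below; the helper names are hypothetical. A 256-bit vector holds four 64-bit longs. With only 128-bit long multiplies available, each such multiply is split into two 128-bit chunks of two lanes each. As we read the note, those chunks are the same 128-bit long multiplies that node_weight() targets.

```java
// Hypothetical helpers for the vector-splitting arithmetic.
final class ChunkSplit {

    // Number of 64-bit long lanes in a vector of the given width.
    static int longLanes(int vectorBits) {
        return vectorBits / Long.SIZE;
    }

    // Number of backend-supported chunks a wider vector operation is
    // split into (rounded up in case the widths do not divide evenly).
    static int chunks(int vectorBits, int maxSupportedBits) {
        return (vectorBits + maxSupportedBits - 1) / maxSupportedBits;
    }
}
```

So on the 256-bit machine, one 4-lane long multiply becomes `chunks(256, 128)` = 2 multiplies of `longLanes(128)` = 2 lanes each.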

No obvious performance changes are observed for other benchmarks.

Benchmark                           (COUNT)       p/m1       p/m0
VectorReduction2.longMulDotProduct    2048       -28.23%    0.00%
VectorReduction2.longMulSimple        2048       -19.29%    0.01%

Tier 1-3 tests passed on both aarch64 and x86 platforms.

[1] https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/test/micro/org/openjdk/bench/vm/compiler/VectorReduction2.java#L1096
[2] https://developer.arm.com/documentation/ddi0602/2025-12/SVE-Instructions/MLA--vectors---Multiply-add--predicated--?lang=en
[3] https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2617
[4] https://github.com/openjdk/jdk/blob/c5f288e2ae2ebe6ee4a0d39d91348f746bd0e353/test/micro/org/openjdk/bench/vm/compiler/VectorReduction2.java#L1035

Labels

2d client-libs-dev@openjdk.org integrated Pull request has been integrated
