Skip to content

Build: remove hard-coded Java versions from ecj.javadocs.prefs#14651

Merged
dsmiley merged 3 commits intoapache:mainfrom
dsmiley:ecjImproveJdk
May 14, 2025
Merged

Build: remove hard-coded Java versions from ecj.javadocs.prefs#14651
dsmiley merged 3 commits intoapache:mainfrom
dsmiley:ecjImproveJdk

Conversation

@dsmiley
Copy link
Contributor

@dsmiley dsmiley commented May 12, 2025

This small improvement to the build eliminates references to the Java version in ecj.javadocs.prefs. They were already being set at the CLI when ECJ Lint was invoked. What wasn't set that was the Java "compliance level", which is different from source & target. All 3 (compliance, source, target) can simply be set with a hyphen then the version.

Copy link
Contributor

@uschindler uschindler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is this breaks Eclipse IDE where the file is also used for setting up project.

@dsmiley
Copy link
Contributor Author

dsmiley commented May 12, 2025

I'm hoping an Eclipse user could improve this PR to address that side of the equation. I've never used Eclipse.

@uschindler
Copy link
Contributor

I'm hoping an Eclipse user could improve this PR to address that side of the equation. I've never used Eclipse.

Not sure how, but if we remove the java version from the ecj javadocs setting file, we have to add the back in the .settings directory for the Eclipse IDE:

from rootProject.file("${resources}/dot.settings")
into rootProject.file(".settings")
filter(ReplaceTokens, tokens: [
'ecj-lint-config': ecjLintFile.getText('UTF-8').replaceAll(/=error\b/, '=' + errorMode)
])

At this place it processes all files in the dot.settings directory and replaces the tag:

So maybe we just need to add the target version here and set those three properties.

Uwe

@uschindler
Copy link
Contributor

uschindler commented May 12, 2025

Something like:

filter(ReplaceTokens, tokens: [ 
         'ecj-lint-config': ecjLintFile.getText('UTF-8').replaceAll(/=error\b/, '=' + errorMode),

         'java-version-config': 'org.eclipse.jdt.core.compiler.codegen.targetPlatform='+targetVersion /..... and so on
 ]) 

And add a @java-version-config@ in above template.

@uschindler
Copy link
Contributor

uschindler commented May 12, 2025

But in general: I would keep the properties file as is and inject the Java version as described before. The command line of ECJ should not use release.

@dsmiley
Copy link
Contributor Author

dsmiley commented May 12, 2025

The command line of ECJ should not be modified.

Why; is it perfection? It seems the "compliance" wasn't being set.

@uschindler
Copy link
Contributor

I fixed the eclipse setup in 965a00b

It basically adds the follwoing to the settingss file generated for eclipse config:

org.eclipse.jdt.core.compiler.codegen.targetPlatform=24
org.eclipse.jdt.core.compiler.compliance=24
org.eclipse.jdt.core.compiler.source=24

To me all is fine, just try to figure out with @dweiss and @rmuir if the release flag is enabled or not. At least in my eclipse settings the "release" is not checked in the options, only source and compliance level are set. Target does not exist in eclipse config.

@rmuir
Copy link
Member

rmuir commented May 12, 2025

To me all is fine, just try to figure out with @dweiss and @rmuir if the release flag is enabled or not.

Please don't do it here. Please make separate PR if you want to enable that flag: it needs to be backed by benchmarks on JDK > releaseVersion, as the performance issues with it won't show with JDK == releaseVersion.

@uschindler
Copy link
Contributor

uschindler commented May 12, 2025

To me all is fine, just try to figure out with @dweiss and @rmuir if the release flag is enabled or not.

Please don't do it here. Please make separate PR if you want to enable that flag: it needs to be backed by benchmarks on JDK > releaseVersion, as the performance issues with it won't show with JDK == releaseVersion.

I can just give my observation: When the compliance level is set, the Eclipse IDE does not show the "release" setting enabled:
image

(the setting is also "off" by default. To enable it the properties file needs: org.eclipse.jdt.core.compiler.release=enabled

@uschindler
Copy link
Contributor

uschindler commented May 12, 2025

But of course it may be different with the command line parameters. But to me it looks like Eclipse disables the release flag by default and settings compliance is just a combination of the settings "source" and "target" (and version specific preview flags) in above picture.

@rmuir
Copy link
Member

rmuir commented May 12, 2025

Sorry, I don't have a better idea other than, try it on branch_10x, using a JDK24 where the "target" is 21.

Or maybe "template" the file the same way as is done for gradle eclipse and then use that resulting processed file for ecj-linting, rather than CLI flags.

@uschindler
Copy link
Contributor

Use Jdk 25 EA to check. Jenkins does this all the time.

@dweiss
Copy link
Contributor

dweiss commented May 13, 2025

I dug deep into this, fascinating. So the problem is indeed in the release check flag. Here is a flame graph from a slowed-down execution of ecj:
image

you'll notice a lot of time spent in Files.exists there, which in turn is called from ClasspathJep247Jdk12. JEP 247 (https://openjdk.org/jeps/247) introduced the --release option for the compiler and also introduced an - undocumented? - 'ct.sym' file, which is part of the JDK and is a zip file with class signatures from previous Java version. Couldn't find the docs on this file anywhere and ecj's source is a fun reading too:

 * Abstraction to the ct.sym file access (see https://openjdk.java.net/jeps/247). The ct.sym file is required to
 * implement JEP 247 feature (compile with "--release" option against class stubs for older releases) and is currently
 * (Java 15) a jar file with undocumented internal structure, currently existing in at least two different format
 * versions (pre Java 12 and Java 12 and later).
 * <p>
 * The only documentation known seem to be the current implementation of
 * com.sun.tools.javac.platform.JDKPlatformProvider and probably some JDK build tools that construct ct.sym file. Root
 * directories inside the file are somehow related to the Java release number, encoded as single digit or letter (single
 * digits for releases 7 to 9, capital letters for 10 and higher).
 * <p>
 * If a release directory contains "system-modules" file, it is a flag that this release files are not inside ct.sym
 * file because it is the current release, and jrt file system should be used instead.
 * <p>
 * All other release directories contain encoded signature (*.sig) files with class stubs for classes in the release.
 * <p>
 * Some directories contain files that are shared between different releases, exact logic how they are distributed is
 * not known.
 * <p>
 * ...

Anyway. The cause of the slowdown in ecj is caused by many repetitive calls to zip Filesystem provider, which doesn't seem to be too efficient. There does seem to be some caches around it in CtSym.java but evidently it depends not only on ecj version but also on the content of ct.sym file - how many patch releases there were for the target version (all patch sub-directories need to be checked).

It's quite insane. I would leave release checking out of ecj.

@uschindler
Copy link
Contributor

the compiler and also introduced an - undocumented? - 'ct.sym' file, which is part of the JDK and is a zip file with class signatures from previous Java version. Couldn't find the docs on this file anywhere and ecj's source is a fun reading too:

This is well documented in the source code of the JDK; I don't remember if there's also a JEP. The ct.sym file is generated by the JDK build and later committed when they start a new release. Actually the way how it is generated was my inspiration for the stub-only APIJAR files we have for compilation of MR-JARs. Basically it is more or less exactly the same: The ct.sym file is just a ZIP file with empty stubs and (in addition to Lucene's approach) with some additional logic to only record the differences between API versions. Because of this it is accessing the filesystem provider many times, because it has to load a symbol from many files until it has gotten all symbols by adding up all incremental changes. This is the reason for the file exists. It looks up for every previous Java relaese if a sym file exists. And as we are at 24 already it goes down to 7 or 8. This does not scale, so they should improve the lookup structure by having a pointer to previous release of a specific file in each class stub.

The most important question here: Did you find out if "compliance" flag uses release or not?

@uschindler
Copy link
Contributor

uschindler commented May 13, 2025

P.S.: It could be that the newest version of the ct.sym generator uses the Classfile API in a similar way like I refactored it last week. Have not checked source code, the last time I looked at it, it was NOT using ASM, but the javac compiler classfile parser used for symbol lookup from real JAR files. Nowadays all is merged together, so chances are high that it uses Classfile API nowadays.

@dweiss
Copy link
Contributor

dweiss commented May 13, 2025

Also - the final cherry on top of this cake - why we can't see such significant difference on main is very likely on this change: https://bugs.openjdk.org/browse/JDK-8283336

If you go back to commit 8d01037 and use the newest available ecj batch compiler with jdk17, the difference is still there (and large). If you run ecj with jdk 21 there (same invocation line but you need to do it manually because gradle bails out), the difference is much less pronounced.

Fun.

@dweiss
Copy link
Contributor

dweiss commented May 13, 2025

@dweiss
Copy link
Contributor

dweiss commented May 13, 2025

There is a lot of complexity involved in checking this release compatibility. I'd leave it out to javac and just use source/target here. No need to double-check the same stuff twice and slow down the build.

@uschindler
Copy link
Contributor

Thanks, so I'd like to revert those changes here with a clear warning next to it.

@dweiss
Copy link
Contributor

dweiss commented May 13, 2025

Yep, I'd stick to using what was there before. Not a bad idea to replace java properties if you've already done it though.

@uschindler
Copy link
Contributor

There is a lot of complexity involved in checking this release compatibility. I'd leave it out to javac and just use source/target here. No need to double-check the same stuff twice and slow down the build.

This is not true, I checked the code. When you specify -release on command line (and only then), it sets relaese version:

https://github.com/eclipse-jdt/eclipse.jdt.core/blob/85f1797ab9f379eeffd203d94522fbf3c6a4e7ce/org.eclipse.jdt.core.compiler.batch/src/org/eclipse/jdt/internal/compiler/batch/Main.java#L2588

If you just set compliance level it does not do this,

Later in the code it compares the current compliance level and release version via (this.releaseVersion != null && this.complianceLevel < jdkLevel) -- At this place it sets up the "Horrible slowness":

https://github.com/eclipse-jdt/eclipse.jdt.core/blob/85f1797ab9f379eeffd203d94522fbf3c6a4e7ce/org.eclipse.jdt.core.compiler.batch/src/org/eclipse/jdt/internal/compiler/batch/Main.java#L5191-L5198

So basically, the code in this PR is fine as it does not set the release version unless you use "release". If you just give compliance level it is identical to setting source/target.

@uschindler
Copy link
Contributor

But at end I don't care if we set source+target or just compliance. It is effectively doing the same at end. We should just make sure to not set "release".

Just to confirm that my observations are correct: Did you measure slowness with this PR if executed in Java 25? If yes, I am wrong with my analysis.

@dweiss
Copy link
Contributor

dweiss commented May 13, 2025

Ok, I think the second snippet of yours is spot-on - yes. Still, I'd prefer using javac options. I understand them. Everyone who uses javac will. ecj's "numeric" compliance options are a shortcut that save like 10 characters but they're rather obscure. Why bother?

@dweiss
Copy link
Contributor

dweiss commented May 13, 2025

Just to confirm that my observations are correct: Did you measure slowness with this PR if executed in Java 25?

I don't see a lot of slowdown at all when I run on Java 21+. It is slightly slower but nowhere near what it used to be. I still don't see the point of using this entire back-compat infrastructure if it's duplicated in javac anyway.

@rmuir
Copy link
Member

rmuir commented May 13, 2025

Please don't enable this slowness, there is zero benefit as javac already checks it.

We don't need it. Just avoid it.

@uschindler
Copy link
Contributor

I am complete pissed off, bye! Do whatever you want. Damn!

@github-actions
Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.

@dsmiley
Copy link
Contributor Author

dsmiley commented May 13, 2025

Thanks Dawid for the deep dive and Uwe for helping with the Eclipse support!
For this minor build improvement, I wasn't going to update CHANGES.txt but let me know if you think I should add it. I plan to back-port to 10x and also take it to Solr. I'll merge later tonight (like in ~8hrs).

@dsmiley dsmiley merged commit 78da2ee into apache:main May 14, 2025
7 checks passed
@dsmiley dsmiley deleted the ecjImproveJdk branch May 14, 2025 02:58
dsmiley added a commit that referenced this pull request May 14, 2025
And retain correct Eclipse IDE configuration generation by templating org.eclipse.jdt.core.prefs with the JDK version.

---------

Co-authored-by: Uwe Schindler <uschindler@apache.org>

(cherry picked from commit 78da2ee)
weizijun added a commit to weizijun/lucene that referenced this pull request May 16, 2025
* main: (31 commits)
  Fix termination condition in TestStressNRTReplication. (apache#14665)
  deps(java): bump com.gradle.develocity from 3.19 to 3.19.2 (apache#14662)
  Build: remove hard-coded Java versions from ecj.javadocs.prefs (apache#14651)
  Update verifier comment to show label (apache#14658)
  Catch and re-throw Throwable rather than using a success boolean (apache#14633)
  Mention label in changelog verifier comment (apache#14656)
  Enable PR actions in changelog verifier (apache#14644)
  Fix FuzzySet#getEstimatedNumberUniqueValuesAllowingForCollisions to properly account for hashCount (apache#14614)
  Don't perform additional KNN querying after timeout, fixes apache#14639 (apache#14640)
  Add instructions to help/IDEs.txt for VSCode and Neovim (apache#14646)
  build(deps): bump ruff from 0.11.7 to 0.11.8 in /dev-tools/scripts (apache#14603)
  deps(java): bump de.jflex:jflex from 1.8.2 to 1.9.1 (apache#14583)
  Use the preload hint on completion fields and memory terms dictionaries. (apache#14634)
  Clean up FileTypeHint a bit. (apache#14635)
  Expressions: Improve test to use a fully private class or method
  Remove deprecations in expressions (apache#14641)
  removing constructor with deprecated attribute 'onlyLongestMatch (apache#14356)
  Moving CHANGES entry for apache#14609 from 11.0 to 10.3 (apache#14638)
  Overrides rewrite in PointRangeQuery to optimize AllDocs/NoDocs cases (apache#14609)
  Adding benchmark for histogram collector over point range query (apache#14622)
  ...

# Conflicts:
#	lucene/CHANGES.txt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants