@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Dec 4, 2025

What changes were proposed in this pull request?

This PR aims to upgrade lz4-java to 1.10.0 and exclude the legacy groupID version.

Why are the changes needed?

Since lz4-java changed its repository, we had better depend on the live repository for future maintenance.

Does this PR introduce any user-facing change?

No Spark behavior change.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Dec 4, 2025

This is a dependency-only PR, cc @dbtsai, @HyukjinKwon, @LuciferYang, @yawkat, @SteNicholas.

To be clear, the security issue is not in the scope of this PR.

@dongjoon-hyun
Member Author

Thank you, @HyukjinKwon. I'm going to add one more commit to ban this library explicitly.

@dongjoon-hyun
Member Author

All tests passed at the first commit; the two subsequent commits only exclude the transitive dependencies, to be sure.

[Screenshot attached, 2025-12-04 5:23 PM]

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM

@SteNicholas
Member

SteNicholas commented Dec 5, 2025

@dongjoon-hyun, do we still need to switch fastDecompressor to safeDecompressor after the upgrade?
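
For context on the two names in this question, here is a minimal sketch of how the two decompressor flavors are obtained and called through lz4-java's `LZ4Factory`. It is not Spark's code, and it assumes the upgraded artifact keeps the `net.jpountz.lz4` package; the object and method names (`Lz4DecompressorSketch`, `viaFast`, `viaSafe`) are hypothetical. Whether Spark should switch is exactly what this thread leaves open.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

import net.jpountz.lz4.LZ4Factory

// Illustrative only: names in this object are hypothetical, not Spark's.
object Lz4DecompressorSketch {
  private val factory = LZ4Factory.fastestInstance()

  // "Fast" API: the caller must supply the exact decompressed length.
  def viaFast(compressed: Array[Byte], originalLength: Int): Array[Byte] =
    factory.fastDecompressor().decompress(compressed, originalLength)

  // "Safe" API: the caller supplies only an upper bound on the output size;
  // the returned array is trimmed to the actual decompressed length.
  def viaSafe(compressed: Array[Byte], maxLength: Int): Array[Byte] =
    factory.safeDecompressor().decompress(compressed, maxLength)

  def main(args: Array[String]): Unit = {
    val original = ("spark " * 100).getBytes(StandardCharsets.UTF_8)
    val compressed = factory.fastCompressor().compress(original)

    // Both paths should round-trip well-formed input.
    assert(Arrays.equals(viaFast(compressed, original.length), original))
    assert(Arrays.equals(viaSafe(compressed, original.length), original))
  }
}
```

On well-formed input both calls return the same bytes; the difference lies in which length the caller must know up front and how malformed input is handled, which may vary between library versions.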

@LuciferYang
Contributor

LuciferYang commented Dec 5, 2025

org.lz4:lz4-java

spark/NOTICE-binary

Lines 205 to 208 in eaaa7ff

* LICENSE:
* license/LICENSE.lz4.txt (Apache License 2.0)
* HOMEPAGE:
* https://github.com/jpountz/lz4-java

Should the LICENSE-binary and NOTICE-binary be updated?

@dongjoon-hyun
Member Author

dongjoon-hyun commented Dec 5, 2025

> @dongjoon-hyun, do we still need to switch fastDecompressor to safeDecompressor after the upgrade?

Exactly, that's @dbtsai's contribution, @SteNicholas. This PR doesn't aim to do that. He will rebase his PR after this one is merged independently.

@dongjoon-hyun
Member Author

Thank you, @LuciferYang! I'll update it.

@SteNicholas
Member

@dongjoon-hyun, I just want to confirm whether we should switch fastDecompressor to safeDecompressor after upgrading to 1.10.0.

@dongjoon-hyun
Member Author

> I just want to confirm whether we should switch fastDecompressor to safeDecompressor after upgrading to 1.10.0.

@SteNicholas What I can say here is that it's beyond the scope of this PR. Technically, we don't yet know what decision we will eventually make on the following, because it's still a Draft.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Dec 5, 2025

Please don't get me wrong. I'm trying to help that PR move forward by reducing the gap.

@LuciferYang
Contributor

LGTM ~

@dongjoon-hyun
Member Author

dongjoon-hyun commented Dec 5, 2025

Thank you all! Merged to master for Apache Spark 4.2.0 (for now)

@pan3793
Member

pan3793 commented Dec 16, 2025

Just FYI, lz4 is known for its ultra-fast speed, and the upgrade is not free: my test shows it has a performance impact, see #53453

@yawkat

yawkat commented Dec 16, 2025

@pan3793 please make an issue on the lz4-java project for this as well. there may be room for improvement on our side, maybe some compiler flags. not sure if i'll have time to look at it, but maybe someone else will.

@pan3793
Member

pan3793 commented Dec 16, 2025

@yawkat okay, will open an issue and forward this message.


update: opened yawkat/lz4-java#30

@yawkat

yawkat commented Dec 16, 2025

thanks!

@dongjoon-hyun
Member Author

@pan3793 Of course, it's one of the key considerations.

dongjoon-hyun pushed a commit that referenced this pull request Dec 22, 2025
… in SBT build

### What changes were proposed in this pull request?

This PR is a followup of #53327 that explicitly excludes lz4-java in the SBT build.

### Why are the changes needed?

For some reason, SBT still tries to resolve it:

```
2025-12-21T08:16:32.3447761Z [info] Jar hash: 61bb3bb74c3d32b7ae527652d9d8c46efa6d04fc
2025-12-21T08:16:33.2910680Z [error] lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.2912312Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2913430Z [error]
2025-12-21T08:16:33.2914325Z [error] 	at lmcoursier.internal.shaded.coursier.Artifacts$.$anonfun$fetchArtifacts$9(Artifacts.scala:365)
2025-12-21T08:16:33.2915570Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1(Task.scala:14)
2025-12-21T08:16:33.2916784Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1$adapted(Task.scala:14)
2025-12-21T08:16:33.2917884Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.wrap(Task.scala:82)
2025-12-21T08:16:33.2918859Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$2(Task.scala:14)
2025-12-21T08:16:33.2919771Z [error] 	at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
2025-12-21T08:16:33.2920635Z [error] 	at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:51)
2025-12-21T08:16:33.2921512Z [error] 	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2922869Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2924071Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2925145Z [error] 	at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2926563Z [error] Caused by: lmcoursier.internal.shaded.coursier.cache.ArtifactError$NotFound: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2928288Z [error] 	at lmcoursier.internal.shaded.coursier.cache.internal.Downloader.$anonfun$checkFileExists$1(Downloader.scala:603)
2025-12-21T08:16:33.2929450Z [error] 	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
2025-12-21T08:16:33.2930146Z [error] 	at scala.util.Success.$anonfun$map$1(Try.scala:255)
2025-12-21T08:16:33.2930723Z [error] 	at scala.util.Success.map(Try.scala:213)
2025-12-21T08:16:33.2931387Z [error] 	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
2025-12-21T08:16:33.2932190Z [error] 	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:42)
2025-12-21T08:16:33.2933052Z [error] 	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2934069Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2938645Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2939423Z [error] 	at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2940265Z [error] lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.2941556Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2942421Z [error]
2025-12-21T08:16:33.2943007Z [error] 	at lmcoursier.internal.shaded.coursier.Artifacts$.$anonfun$fetchArtifacts$9(Artifacts.scala:365)
2025-12-21T08:16:33.2944078Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1(Task.scala:14)
2025-12-21T08:16:33.2945450Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1$adapted(Task.scala:14)
2025-12-21T08:16:33.2946441Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.wrap(Task.scala:82)
2025-12-21T08:16:33.2947312Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$2(Task.scala:14)
2025-12-21T08:16:33.2948105Z [error] 	at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
2025-12-21T08:16:33.2948811Z [error] 	at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:51)
2025-12-21T08:16:33.2949547Z [error] 	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2950403Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2951391Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2952135Z [error] 	at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2953218Z [error] Caused by: lmcoursier.internal.shaded.coursier.cache.ArtifactError$NotFound: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2954841Z [error] 	at lmcoursier.internal.shaded.coursier.cache.internal.Downloader.$anonfun$checkFileExists$1(Downloader.scala:603)
2025-12-21T08:16:33.2955801Z [error] 	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
2025-12-21T08:16:33.2956376Z [error] 	at scala.util.Success.$anonfun$map$1(Try.scala:255)
2025-12-21T08:16:33.2956861Z [error] 	at scala.util.Success.map(Try.scala:213)
2025-12-21T08:16:33.2957389Z [error] 	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
2025-12-21T08:16:33.2958305Z [error] 	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:42)
2025-12-21T08:16:33.2959058Z [error] 	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2959915Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2960919Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2961677Z [error] 	at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2996977Z [error] (streaming-kafka-0-10 / update) lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.2998744Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.3000432Z [error] (sql-kafka-0-10 / update) lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.3002097Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.3032908Z [error] Total time: 361 s (0:06:01.0), completed Dec 21, 2025, 8:16:33 AM
```

which seems to be breaking the release build: https://github.com/apache/spark/actions/workflows/release.yml
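
As a rough illustration of what excluding a transitive artifact looks like in sbt, here is a minimal sketch. It is not the actual change made to Spark's build, which is not shown in this thread. The plain `build.sbt` layout and the kafka-clients version are assumptions for illustration, and the idea that the legacy coordinate arrives through the Kafka client is only inferred from the failing `streaming-kafka-0-10` and `sql-kafka-0-10` modules in the log. `excludeDependencies`, `ExclusionRule`, and `exclude` are standard sbt constructs.

```scala
// build.sbt (illustrative sketch, not Spark's actual build definition)

// Drop the legacy coordinate org.lz4:lz4-java wherever it appears
// transitively, so the resolver never tries to fetch its jar.
excludeDependencies += ExclusionRule("org.lz4", "lz4-java")

// Alternatively, exclude it on one dependency only.
// The kafka-clients version below is a placeholder, not Spark's.
libraryDependencies += ("org.apache.kafka" % "kafka-clients" % "3.9.1")
  .exclude("org.lz4", "lz4-java")
```

The project-wide rule is the blunter tool: it also covers dependencies added later, at the cost of hiding where the legacy artifact came from.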

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I cannot reproduce this properly in my local environment. This fix is inferred from the log. I will monitor the build.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53556 from HyukjinKwon/SPARK-54597-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>