@HyukjinKwon
Member

What changes were proposed in this pull request?

This PR is a followup of #53327 that explicitly excludes lz4-java in the SBT build.

Why are the changes needed?

For some reason, SBT still tries to look for it:

2025-12-21T08:16:32.3447761Z [info] Jar hash: 61bb3bb74c3d32b7ae527652d9d8c46efa6d04fc
2025-12-21T08:16:33.2910680Z [error] lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.2912312Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2913430Z [error] 
2025-12-21T08:16:33.2914325Z [error] 	at lmcoursier.internal.shaded.coursier.Artifacts$.$anonfun$fetchArtifacts$9(Artifacts.scala:365)
2025-12-21T08:16:33.2915570Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1(Task.scala:14)
2025-12-21T08:16:33.2916784Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1$adapted(Task.scala:14)
2025-12-21T08:16:33.2917884Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.wrap(Task.scala:82)
2025-12-21T08:16:33.2918859Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$2(Task.scala:14)
2025-12-21T08:16:33.2919771Z [error] 	at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
2025-12-21T08:16:33.2920635Z [error] 	at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:51)
2025-12-21T08:16:33.2921512Z [error] 	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2922869Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2924071Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2925145Z [error] 	at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2926563Z [error] Caused by: lmcoursier.internal.shaded.coursier.cache.ArtifactError$NotFound: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2928288Z [error] 	at lmcoursier.internal.shaded.coursier.cache.internal.Downloader.$anonfun$checkFileExists$1(Downloader.scala:603)
2025-12-21T08:16:33.2929450Z [error] 	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
2025-12-21T08:16:33.2930146Z [error] 	at scala.util.Success.$anonfun$map$1(Try.scala:255)
2025-12-21T08:16:33.2930723Z [error] 	at scala.util.Success.map(Try.scala:213)
2025-12-21T08:16:33.2931387Z [error] 	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
2025-12-21T08:16:33.2932190Z [error] 	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:42)
2025-12-21T08:16:33.2933052Z [error] 	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2934069Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2938645Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2939423Z [error] 	at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2940265Z [error] lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.2941556Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2942421Z [error] 
2025-12-21T08:16:33.2943007Z [error] 	at lmcoursier.internal.shaded.coursier.Artifacts$.$anonfun$fetchArtifacts$9(Artifacts.scala:365)
2025-12-21T08:16:33.2944078Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1(Task.scala:14)
2025-12-21T08:16:33.2945450Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1$adapted(Task.scala:14)
2025-12-21T08:16:33.2946441Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.wrap(Task.scala:82)
2025-12-21T08:16:33.2947312Z [error] 	at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$2(Task.scala:14)
2025-12-21T08:16:33.2948105Z [error] 	at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
2025-12-21T08:16:33.2948811Z [error] 	at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:51)
2025-12-21T08:16:33.2949547Z [error] 	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2950403Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2951391Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2952135Z [error] 	at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2953218Z [error] Caused by: lmcoursier.internal.shaded.coursier.cache.ArtifactError$NotFound: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2954841Z [error] 	at lmcoursier.internal.shaded.coursier.cache.internal.Downloader.$anonfun$checkFileExists$1(Downloader.scala:603)
2025-12-21T08:16:33.2955801Z [error] 	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
2025-12-21T08:16:33.2956376Z [error] 	at scala.util.Success.$anonfun$map$1(Try.scala:255)
2025-12-21T08:16:33.2956861Z [error] 	at scala.util.Success.map(Try.scala:213)
2025-12-21T08:16:33.2957389Z [error] 	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
2025-12-21T08:16:33.2958305Z [error] 	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:42)
2025-12-21T08:16:33.2959058Z [error] 	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2959915Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2960919Z [error] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2961677Z [error] 	at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2996977Z [error] (streaming-kafka-0-10 / update) lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.2998744Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.3000432Z [error] (sql-kafka-0-10 / update) lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.3002097Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.3032908Z [error] Total time: 361 s (0:06:01.0), completed Dec 21, 2025, 8:16:33 AM

which seems to be breaking the release build: https://github.com/apache/spark/actions/workflows/release.yml

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I cannot reproduce this properly in my local environment. The fix is inferred from the log. I will monitor the build.

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun
dongjoon-hyun previously approved these changes Dec 22, 2025
@dongjoon-hyun dongjoon-hyun dismissed their stale review December 22, 2025 02:51

Oh, it's 1.8 instead of 1.10, isn't it?

@dongjoon-hyun
Member

@HyukjinKwon Are you sure that SBT doesn't need lz4-java?

that explicitly exclude lz4-java in SBT build.

lz4-java 1.8.0 seems to be brought in by some third-party dependency, because we currently use 1.10.0.

2025-12-21T08:16:33.2953218Z [error] Caused by: lmcoursier.internal.shaded.coursier.cache.ArtifactError$NotFound: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2954841Z [error] 	at lmcoursier.internal.shaded.coursier.cache.internal.Downloader.$anonfun$checkFileExists$1(Downloader.scala:603)

@sarutak
Member

sarutak commented Dec 22, 2025

streaming-kafka-0-10 seems to depend on lz4-java:1.8.0 in the Test scope.

$ build/sbt Test/dependencyTree

[info] org.apache.spark:spark-streaming-kafka-0-10_2.13:4.2.0-SNAPSHOT [S]

...

[info]   +-org.apache.kafka:kafka-clients:3.9.1
[info]   | +-com.github.luben:zstd-jni:1.5.6-4 (evicted by: 1.5.7-6)
[info]   | +-com.github.luben:zstd-jni:1.5.7-6
[info]   | +-org.lz4:lz4-java:1.8.0

How about overriding lz4-java here rather than excluding it?

@dongjoon-hyun
Member

For me, dependencyOverrides sounds better than excludeDependencies.

@sarutak
Member

sarutak commented Dec 22, 2025

Ah, lz4-java 1.8.0 and 1.10.0 have different groupIds, right?
So it may be difficult to override.
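
A minimal sketch of why the override is awkward here, written as a build.sbt fragment rather than Spark's actual build definition. sbt's dependencyOverrides matches a module by organization and name together, so it can change the version of org.lz4:lz4-java but cannot swap it for an artifact published under another groupId. The groupId "at.yawk.lz4" below is an assumption; the thread does not name the groupId of the 1.10.0 artifact.

```scala
// build.sbt fragment (illustrative sketch, not taken from the Spark build).

// Pins the version of org.lz4:lz4-java wherever it appears in the graph.
// This only helps if the requested version is actually published under org.lz4.
dependencyOverrides += "org.lz4" % "lz4-java" % "1.10.0"

// ASSUMPTION: "at.yawk.lz4" stands in for the groupId of the 1.10.0 artifact.
// Because the organization differs, this entry refers to a different module:
// it does not replace org.lz4:lz4-java:1.8.0 pulled in by kafka-clients.
dependencyOverrides += "at.yawk.lz4" % "lz4-java" % "1.10.0"
```

With differing groupIds, both artifacts would be treated as unrelated modules, which is why an exclusion is the more direct tool.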

lz4-java seems intended to be excluded from kafka-clients.

<artifactId>lz4-java</artifactId>

It's strange that lz4-java:1.8.0 appears with build/sbt Test/dependencyTree while it doesn't appear with build/sbt dependencyTree.

Anyway, if it's difficult to override lz4-java, how about excluding it from streaming-kafka-0-10 for SBT?
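
The exclusion suggested above could look roughly like the following in sbt. This is a sketch under stated assumptions, not the change merged in this PR: the project value name `streamingKafka010` and the directory `connector/kafka-0-10` are hypothetical, while the coordinates org.lz4:lz4-java and kafka-clients 3.9.1 come from the dependency tree in this thread.

```scala
// build.sbt fragment (illustrative sketch; project name and path are hypothetical).

lazy val streamingKafka010 = (project in file("connector/kafka-0-10"))
  .settings(
    // Option 1: drop org.lz4:lz4-java from every dependency of this project,
    // in all configurations (Compile, Test, ...).
    excludeDependencies += ExclusionRule("org.lz4", "lz4-java"),

    // Option 2: drop it only from the transitive closure of kafka-clients.
    libraryDependencies +=
      ("org.apache.kafka" % "kafka-clients" % "3.9.1").exclude("org.lz4", "lz4-java")
  )
```

Option 1 is broader and does not depend on knowing which dependency brings the artifact in; option 2 keeps the exclusion next to the dependency that introduces it. Rerunning build/sbt Test/dependencyTree afterwards is one way to check that org.lz4:lz4-java:1.8.0 no longer appears.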

@dongjoon-hyun
Member

dongjoon-hyun commented Dec 22, 2025

Oh, right.

Ah, lz4-java 1.8.0 and 1.10.0 have different groupIds, right?

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@dongjoon-hyun
Member

Let me merge this to help the release manager. Hopefully, it will recover the CI.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-54597][BUILD][FOLLOW-UP] Explicitly exclude lz4-java in SBT build [SPARK-54597][BUILD][FOLLOW-UP] Explicitly exclude org.lz4:lz4-java in SBT build Dec 22, 2025
@dongjoon-hyun
Member

Merged to master for Apache Spark 4.2.0.

@dongjoon-hyun
Member

Thank you, @HyukjinKwon and @sarutak .

@sarutak
Member

sarutak commented Dec 22, 2025

The CI finally seems to have recovered.
https://github.com/apache/spark/actions/runs/20424655023

@HyukjinKwon
Member Author

Wow thanks guys!
