-
Notifications
You must be signed in to change notification settings - Fork 409
[CELEBORN-1402] SparkShuffleManager print warning log for spark.executor.userClassPathFirst=true with ShuffleManager defined in user jar #2482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
We ask users to copy celeborn client side jar to spark jars directory - that should ensure this problem does not occur, right ? So this would apply only when users are leveraging |
|
@mridulm, I left some points according to your reply:
Do you have any suggestion to the warning log? |
lifulong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@SteNicholas That makes sense, there are definitely scenarios where we cannot update the spark distribution with additional jars. The reason why I brought that up is, currently the warning will always be emitted ... even if celeborn jars are part of spark distribution and so not susceptible to the issue. Given this, can be add to the |
|
@mridulm, I got your main point of concern. I have addressed above comment and reduced the scope of printing warning logs. PTAL. |
FMX
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkShuffleManager.java
Outdated
Show resolved
Hide resolved
client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkShuffleManager.java
Outdated
Show resolved
Hide resolved
4c47827 to
cc39797
Compare
|
@cxzl25, I have followed the check |
|
@mridulm, PTAL. |
client-spark/spark-2/src/main/java/org/apache/spark/shuffle/celeborn/SparkShuffleManager.java
Outdated
Show resolved
Hide resolved
|
@mridulm, @pan3793, @cfmcgrady, @cxzl25, I have address above comments. PTAL. |
|
Hi @SteNicholas, which modes did you test this in ? This should cause the executor to fail, and so wont get to this PR. I tested this with: (basic celeborn config is there in my defaults config) |
|
Works fine with celeborn jar in spark jar directory, not with --jars for 3.1: which is what I would expect actually (given SPARK-45762). How were you able to test/validate this @SteNicholas ? It is possible I am missing some steps here ? |
|
@mridulm, I just tested with Spark SQL via above command. Did you ever try to test Spark SQL to verify? |
|
Hi @SteNicholas , I was able to reproduce this behavior when celeborn jar is there in both spark platform jars and is specified by users via If yes, in this case, it might simply be easier to ask users not to add celeborn jar to their --jars ? Please let me know if I am missing anything. |
|
@mridulm, my deployment experience is that the celeborn client jar is not placed in |
|
Before 4.0, Apache Spark does not support shuffle manager from user jars - so I am not sure how this is working for you ... unless the fix was cherry picked/independently fixed in the Spark fork you are running perhaps ? Once 4.0 is released, this patch will become relevant: so the PR is useful and I am in favor of merging it ... I want to make sure we have the right expectations: given spark does not support this feature before 4.0, trying this will fail for users 😃 |
cxzl25
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Because the final fallback is to Spark's SortShuffleWriter, this may lead to inaccurate data. See SPARK-48037
…tor.userClassPathFirst=true with ShuffleManager defined in user jar
5729a71 to
3bbdec4
Compare
|
@mridulm, I uses the internal Spark 3.1 version to verify this. Anyway, this pull request could help user to confirm whether to use Celeborn in |
|
Merged to main(v0.5.0). |
What changes were proposed in this pull request?
SparkShuffleManagerprint warning log forspark.executor.userClassPathFirst=truewithShuffleManagerdefined in user jar via--jarorspark.jars.Why are the changes needed?
When
spark.executor.userClassPathFirstis enabled with ShuffleManager defined in user jar, theClassLoaderofhandleisChildFirstURLClassLoader, which is different fromCelebornShuffleHandleof which theClassLoaderisAppClassLoaderinSparkShuffleManager#getWriter/getReader. The local test log is as follows:It causes that
SparkShuffleManagerfallback to vanilla SparkSortShuffleManagerforspark.executor.userClassPathFirst=truewithShuffleManagerdefined in user jar before apache/spark#43627. After SPARK-45762, theClassLoaderofhandleandCelebornShuffleHandleare bothChildFirstURLClassLoader.Therefore,
SparkShuffleManagershould print warning log to remind forspark.executor.userClassPathFirst=truewithShuffleManagerdefined in user jar.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manual test.