Skip to content

Conversation

@wForget
Copy link
Member

@wForget wForget commented Dec 13, 2023

What changes were proposed in this pull request?

Fix NPE for ShuffleWriteClientImpl.unregisterShuffle.

Why are the changes needed?

Since the id of RssShuffleManager in Executor is assigned in getWriter, this means that the id may be null before getWriter, so this may cause NPE to occur in ShuffleWriteClientImpl.unregisterShuffle.

error log:

23/12/12 14:25:47 ERROR RssShuffleManager: Errors on closing shuffle write client
java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
	at org.apache.uniffle.client.impl.ShuffleWriteClientImpl.unregisterShuffle(ShuffleWriteClientImpl.java:751)
	at org.apache.spark.shuffle.RssShuffleManager.stop(RssShuffleManager.java:591)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:90)
	at org.apache.spark.executor.Executor.stop(Executor.scala:335)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1$$anon$1.run(CoarseGrainedExecutorBackend.scala:203)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

(Please test your changes, and provide instructions on how to test it:

  1. If you add a feature or fix a bug, add a test to cover your changes.
  2. If you fix a flaky test, repeat it for many times to prove it works.)

@jerqi jerqi changed the title Fix NPE for ShuffleWriteClientImpl.unregisterShuffle [MINOR] fix(spark): Fix NPE for ShuffleWriteClientImpl.unregisterShuffle Dec 13, 2023
Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @wForget

@codecov-commenter
Copy link

codecov-commenter commented Dec 13, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (ecbf2e7) 53.22% compared to head (41af3ad) 54.12%.

Files Patch % Lines
...he/uniffle/client/impl/ShuffleWriteClientImpl.java 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1367      +/-   ##
============================================
+ Coverage     53.22%   54.12%   +0.90%     
  Complexity     2715     2715              
============================================
  Files           418      398      -20     
  Lines         23932    21573    -2359     
  Branches       2043     2043              
============================================
- Hits          12738    11677    -1061     
+ Misses        10408     9181    -1227     
+ Partials        786      715      -71     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your contribution. @wForget

@zuston zuston merged commit f17e060 into apache:master Dec 13, 2023
xianjingfeng pushed a commit that referenced this pull request Dec 13, 2023
…fle (#1367)

### What changes were proposed in this pull request?

Fix NPE for ShuffleWriteClientImpl.unregisterShuffle.

### Why are the changes needed?

Since the `id` of `RssShuffleManager` in Executor is assigned in `getWriter`, this means that the `id` may be null before `getWriter`, so this may cause NPE to occur in `ShuffleWriteClientImpl.unregisterShuffle`.

error log:

```
23/12/12 14:25:47 ERROR RssShuffleManager: Errors on closing shuffle write client
java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
	at org.apache.uniffle.client.impl.ShuffleWriteClientImpl.unregisterShuffle(ShuffleWriteClientImpl.java:751)
	at org.apache.spark.shuffle.RssShuffleManager.stop(RssShuffleManager.java:591)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:90)
	at org.apache.spark.executor.Executor.stop(Executor.scala:335)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1$$anon$1.run(CoarseGrainedExecutorBackend.scala:203)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

(cherry picked from commit f17e060)
lifeSo added a commit to lifeSo/incubator-uniffle that referenced this pull request Dec 22, 2023
* master: (23 commits)
  format
  use synchronized
  format
  modify to singleton
  format
  format
  format
  format
  format
  change Coordinator to singleton like ShuffleServerClientFactory.java
  [apache#1189] feat(server): Add netty used direct memory size metric (apache#1363)
  format
  format
  format
  rename
  file rename
  [apache#960] fix(dashboard): simplify dependency and correct the startup script (apache#1347)
  [apache#1348] improvement(metrics): Unify tags generation for shuffle-server metrics reporter (apache#1349)
  [MINOR] chore: fix kubernetes ci pipeline (apache#1368)
  [MINOR] fix(spark): Fix NPE for ShuffleWriteClientImpl.unregisterShuffle (apache#1367)
  ...
lifeSo added a commit to lifeSo/incubator-uniffle that referenced this pull request Dec 26, 2023
…re_1119_new_2023_12_23

* feature_1119_BUFFER_LIMIT_OF_HUGE_PARTITION:
  add file for commit
  [apache#1189] feat(server): Add netty used direct memory size metric (apache#1363)
  format
  format
  format
  rename
  file rename
  [apache#960] fix(dashboard): simplify dependency and correct the startup script (apache#1347)
  [apache#1348] improvement(metrics): Unify tags generation for shuffle-server metrics reporter (apache#1349)
  [MINOR] chore: fix kubernetes ci pipeline (apache#1368)
  [MINOR] fix(spark): Fix NPE for ShuffleWriteClientImpl.unregisterShuffle (apache#1367)
  format code
  format code
  添加堆外内存监控的代码
zuston pushed a commit to zuston/incubator-uniffle that referenced this pull request Jan 18, 2024
…fle (apache#1367)

### What changes were proposed in this pull request?

Fix NPE for ShuffleWriteClientImpl.unregisterShuffle.

### Why are the changes needed?

Since the `id` of `RssShuffleManager` in Executor is assigned in `getWriter`, this means that the `id` may be null before `getWriter`, so this may cause NPE to occur in `ShuffleWriteClientImpl.unregisterShuffle`.

error log:

```
23/12/12 14:25:47 ERROR RssShuffleManager: Errors on closing shuffle write client
java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
	at org.apache.uniffle.client.impl.ShuffleWriteClientImpl.unregisterShuffle(ShuffleWriteClientImpl.java:751)
	at org.apache.spark.shuffle.RssShuffleManager.stop(RssShuffleManager.java:591)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:90)
	at org.apache.spark.executor.Executor.stop(Executor.scala:335)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1$$anon$1.run(CoarseGrainedExecutorBackend.scala:203)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants