Skip to content

[DS-6616][WorkerServer] fix worker stop fail and fakes death#6621

Merged
lenboo merged 3 commits intoapache:devfrom
caishunfeng:fix_server_stop
Oct 29, 2021
Merged

[DS-6616][WorkerServer] fix worker stop fail and fakes death#6621
lenboo merged 3 commits intoapache:devfrom
caishunfeng:fix_server_stop

Conversation

@caishunfeng
Copy link
Copy Markdown
Contributor

this pr close #6616

Purpose of the pull request

Brief change log

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

Comment on lines +125 to +135
try {
String address = getLocalAddress();
Set<String> workerZkPaths = getWorkerZkPaths();
for (String workerZkPath : workerZkPaths) {
registryClient.remove(workerZkPath);
logger.info("worker node : {} unRegistry from ZK {}.", address, workerZkPath);
}
this.heartBeatExecutor.shutdownNow();
logger.info("heartbeat executor shutdown");
registryClient.close();
} catch (Exception ex) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might better just catch KeeperException.ConnectionLossException here?

Suggested change
try {
String address = getLocalAddress();
Set<String> workerZkPaths = getWorkerZkPaths();
for (String workerZkPath : workerZkPaths) {
registryClient.remove(workerZkPath);
logger.info("worker node : {} unRegistry from ZK {}.", address, workerZkPath);
}
this.heartBeatExecutor.shutdownNow();
logger.info("heartbeat executor shutdown");
registryClient.close();
} catch (Exception ex) {
try {
String address = getLocalAddress();
Set<String> workerZkPaths = getWorkerZkPaths();
for (String workerZkPath : workerZkPaths) {
registryClient.remove(workerZkPath);
logger.info("worker node : {} unRegistry from ZK {}.", address, workerZkPath);
}
} catch (KeeperException.ConnectionLossException ex) {
logger.error("remove worker zk path exception", ex)
}
this.heartBeatExecutor.shutdownNow();
logger.info("heartbeat executor shutdown");
registryClient.close();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should catch all exception here, because it is the final catch for remove registry info. If it juse catch some exception, the problem will arise again.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I concern is that when the connection lost, will it reconnect again? Since the catch scope here may cause the registry.close() will not be executed.

And other thing need to confirm, will it cause some thread fail to exit? I am not clear.

Copy link
Copy Markdown
Contributor Author

@caishunfeng caishunfeng Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. It should be better to just try catch of remove zk path, exclude heartBeatExecutor and registryClient close.
I have modified it.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

Merging #6621 (ae7cc57) into dev (cbeedba) will decrease coverage by 0.00%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##                dev    #6621      +/-   ##
============================================
- Coverage     38.51%   38.50%   -0.01%     
- Complexity     3221     3222       +1     
============================================
  Files           646      646              
  Lines         25840    25844       +4     
  Branches       2799     2799              
============================================
- Hits           9952     9951       -1     
- Misses        14974    14980       +6     
+ Partials        914      913       -1     
Impacted Files Coverage Δ
...r/server/worker/registry/WorkerRegistryClient.java 0.00% <0.00%> (ø)
...ache/dolphinscheduler/server/log/LoggerServer.java 78.26% <0.00%> (-8.70%) ⬇️
...er/master/dispatch/host/assign/RandomSelector.java 83.33% <0.00%> (+5.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cbeedba...ae7cc57. Read the comment docs.

@sonarqubecloud
Copy link
Copy Markdown

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

Copy link
Copy Markdown
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@lenboo lenboo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@lenboo lenboo merged commit 06e8e24 into apache:dev Oct 29, 2021
lenboo pushed a commit that referenced this pull request Nov 1, 2021
* [DS-6616][WorkerServer] fix worker stop fail and fakes death

* remove unuse test

* just add try catch to remove zk worker path

Co-authored-by: caishunfeng <534328519@qq.com>
@caishunfeng caishunfeng deleted the fix_server_stop branch December 15, 2021 04:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] [Worker] Worker fakes death when it stop itself fail.

4 participants