
[Bug] [Master] Worker failover will cause task cannot be failover #10631

Merged
ruanwenjun merged 6 commits into apache:dev from ruanwenjun:dev_wenjun_fixTaskDispatchEventLoss on Jun 28, 2022

@ruanwenjun
Member

Purpose of the pull request

close #10630

Brief change log

  • During worker failover, query the taskInstance directly from the cache.
  • Add more logging.
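
The first change-log item can be illustrated with a minimal sketch. It assumes the master keeps an in-memory cache of the task instances it is tracking and, when a worker fails, looks up the affected tasks in that cache rather than through another lookup path. Every name below (`WorkerFailoverSketch`, `TaskInstanceCache`, `findByHost`, `tasksToFailover`) is hypothetical and not taken from the DolphinScheduler code base; the actual change in this pull request may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from the DolphinScheduler code base.
class WorkerFailoverSketch {

    /** Simplified stand-in for a task instance tracked by the master. */
    static final class TaskInstance {
        final int id;
        final String host; // address of the worker the task was dispatched to

        TaskInstance(int id, String host) {
            this.id = id;
            this.host = host;
        }
    }

    /** In-memory cache of the task instances the master is currently tracking. */
    static final class TaskInstanceCache {
        private final Map<Integer, TaskInstance> byId = new ConcurrentHashMap<>();

        void put(TaskInstance task) {
            byId.put(task.id, task);
        }

        /** Returns the cached tasks that were dispatched to the given worker host. */
        List<TaskInstance> findByHost(String host) {
            List<TaskInstance> matches = new ArrayList<>();
            for (TaskInstance task : byId.values()) {
                if (host.equals(task.host)) {
                    matches.add(task);
                }
            }
            return matches;
        }
    }

    /**
     * Collects the ids of the tasks that need failover after a worker goes down,
     * reading only from the cache (this sketch performs no database query).
     */
    static List<Integer> tasksToFailover(TaskInstanceCache cache, String failedWorkerHost) {
        List<Integer> ids = new ArrayList<>();
        for (TaskInstance task : cache.findByHost(failedWorkerHost)) {
            ids.add(task.id);
        }
        Collections.sort(ids); // deterministic order for logging and testing
        return ids;
    }
}
```

In this sketch, a caller would populate the cache as tasks are dispatched and call `tasksToFailover(cache, host)` when a worker at `host` is detected as down; tasks on other workers are left untouched.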

@SbloodyS added the bug and backend labels Jun 27, 2022
@SbloodyS added this to the 3.0.0-beta-3 milestone Jun 27, 2022
codecov-commenter commented Jun 27, 2022

Codecov Report

Merging #10631 (eeff932) into dev (b518413) will decrease coverage by 0.11%.
The diff coverage is 26.79%.

❗ Current head eeff932 differs from pull request most recent head 3a8bde7. Consider uploading reports for the commit 3a8bde7 to get more accurate results

@@             Coverage Diff              @@
##                dev   #10631      +/-   ##
============================================
- Coverage     41.08%   40.96%   -0.12%     
+ Complexity     4898     4857      -41     
============================================
  Files           897      897              
  Lines         36239    36170      -69     
  Branches       3987     3988       +1     
============================================
- Hits          14887    14818      -69     
+ Misses        19890    19880      -10     
- Partials       1462     1472      +10     
Impacted Files | Coverage Δ
...ache/dolphinscheduler/common/enums/StateEvent.java | 0.00% <0.00%> (ø)
.../apache/dolphinscheduler/common/utils/OSUtils.java | 34.14% <0.00%> (ø)
...ver/master/consumer/TaskPriorityQueueConsumer.java | 0.00% <0.00%> (ø)
...r/server/master/runner/MasterSchedulerService.java | 0.00% <0.00%> (ø)
.../server/master/runner/StateWheelExecuteThread.java | 0.47% <0.00%> (ø)
.../server/master/runner/WorkflowExecuteRunnable.java | 7.76% <0.00%> (-0.01%) ⬇️
...erver/master/runner/WorkflowExecuteThreadPool.java | 1.36% <0.00%> (+0.05%) ⬆️
...ler/server/master/runner/task/TaskInstanceKey.java | 0.00% <0.00%> (ø)
...pache/dolphinscheduler/remote/command/Command.java | 60.60% <ø> (ø)
...uler/remote/command/TaskExecuteRequestCommand.java | 0.00% <0.00%> (ø)
... and 26 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b518413...3a8bde7.

@ruanwenjun
Member Author

@caishunfeng Please take a look.

kezhenxu94 previously approved these changes Jun 28, 2022
Member

LGTM

@ruanwenjun force-pushed the dev_wenjun_fixTaskDispatchEventLoss branch from f980a4a to 3a8bde7 on June 28, 2022 07:00
@sonarqubecloud

SonarCloud Quality Gate failed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 9 (rating A)

Coverage: 10.8%
Duplication: 0.0%

@ruanwenjun merged commit 66624c5 into apache:dev Jun 28, 2022
@ruanwenjun deleted the dev_wenjun_fixTaskDispatchEventLoss branch June 28, 2022 08:08
zhongjiajie pushed a commit to zhongjiajie/dolphinscheduler that referenced this pull request Jul 5, 2022
ruanwenjun added a commit to ruanwenjun/dolphinscheduler that referenced this pull request Jul 12, 2022
* [Bug] [Master] Worker failover will cause task cannot be failover (apache#10631)

* fix worker failover may lose event

Signed-off-by: ruanwenjun <wenjun@apache.org>

* fix compile error

* Fix t_ds_worker_group cannot find in startup
ruanwenjun added a commit that referenced this pull request Jul 19, 2022
…0631)

* fix worker failover may lose event

(cherry picked from commit 66624c5)
@huanhuande

Can I ask if 10631 mainly fixes the master deadlock?
