Skip to content

[Feature-3575][*] Force Task Success#3576

Closed
legen618 wants to merge 52 commits intoapache:devfrom
legen618:dev-FTS
Closed

[Feature-3575][*] Force Task Success#3576
legen618 wants to merge 52 commits intoapache:devfrom
legen618:dev-FTS

Conversation

@legen618
Copy link
Copy Markdown
Contributor

What is the purpose of the pull request

This pull request adds function of Force Task Success. See #3575

Brief change log

The main changes focus on ApiServer and MasterServer.

Verify this pull request

This pull request is already covered by existing tests, such as (please describe tests).

This change added tests and can be verified as follows:

  • Added unit test to verify the change.
  • Manually verified the change by testing locally.

legen618 and others added 30 commits July 22, 2020 10:08
1. 新添加executeType
2. 新添加对应的commandType
3. 修改controller对应的service部分
4. 补充processService中的verifyIsNeedCreateCommand的判断逻辑
1. 添加两个查询(mapper.java和mapper.xml)
2. 添加ProcessService中constructProcessInstance中对应的部分
1. 这样的话可以完整的构建整个dag,恢复之前的上下文;也没有任务可能重复执行的担忧
2. 不需要额外去处理process执行完之后的状态(主要是部分与整体的原因)
3. RecoverNodeIdList也不会重复
1. 失败任务强制成功后会继续submit后续结点
2. 失败重试的节点强制成功后会继续submit后面结点,并且停止retry
3. 对于失败的sub-process如果其中的结点强制成功了,在parent-process中不会有任何影响,只能等到结束后“从强制成功处继续执行”。
1. dataAnalysis
2. alertManager
3. businessTimeUtils的schedule选择
1. taskStateCount中添加了强制成功的状态
2. processStateCount中剔除强制成功的状态
regular update local dev
防止logger注入的问题
@yangyichao-mango
Copy link
Copy Markdown
Contributor

Please resolve three code smell from sonar.

@legen618
Copy link
Copy Markdown
Contributor Author

legen618 commented Sep 8, 2020

Please resolve three code smell from sonar.

There's still two code smells, but I think it's unnecessary to fix it. I'd like to explain in details:

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Sep 8, 2020

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 0 Security Hotspots to review)
Code Smell A 2 Code Smells

40.9% 40.9% Coverage
0.0% 0.0% Duplication

@legen618
Copy link
Copy Markdown
Contributor Author

legen618 commented Sep 8, 2020

Because the number of commit is relatively large and messy, which may affect the efficiency of code review, I hereby explain each modified file:

For the function that changes the task state to force success: Mainly adds an interface. For a single failed task specified by the user, the task state can be changed to force success (FORCED_SUCCESS).
The following files have been modified:

  • TaskInstanceController. Java : increase the API
  • 'TaskInstanceService. Java' : The logical implementation of the above API, the general process is to determine whether the user has operation permission, verify whether the specified task instance exists, determine whether the state of the task instance is typeIsFailure, if these conditions are met, then modify the state of the task instance in the database
  • TaskInstanceControllerTest. Java : API corresponding unit test
  • TaskInstanceServiceTest. Java : logic to implement the corresponding unit tests
  • 'executionstatus. Java' : Add the status FORCED_SUCCESS to the decision for the TypeIsSuccess function

For the running workflow to verify whether there is a mandatory success: is mainly to modify 'masterexecthread.java', in the 'runProcess()' method to continuously scan the current workflow to see whether any failed tasks have been successfully enforced, that is, to traverse the state of errorList in the database;
Remove from the errorList, if any, and submit the task after the node.

  • one other note here is that I have included a change in pull request#3375 to fix a bug that determines if the task dependency is satisfied.

Running from force success for stopped workflows: this is mainly the addition of a new commandType that operates on the workflows using the original Execute interface.
The process of execution reads in the entire DAG, then checks from start to finish. Tasks marked FORCED_SUCCESS that were not performed before are treated as' typeIsSuccess '.
Modified the following file:

  • 'executetype.java' : Add a new type called RESUME_FROM_FORCED_SUCCESS
  • 'ExecutorService.java' : Adds some validation of the new executeType, the workflow is stopped to perform the restore, and all task instances with flag 1 on the last run must have been enforced to succeed.
  • 'ExecutorServiceTest2' : Corresponding unit test
  • 'commandType. Java' : adds RESUME_FROM_FORCED_SUCCESS
  • 'processService.java' : Added handling for RESUME_FROM_FORCED_SUCCESS when building workflow.
    Contains a few small functions that recursively determine whether a task instance has been forced to succeed in a nested sub-process.
  • 'taskinstancemapper. Java', 'taskinstancemapper. XML' : the interface and SQL statement corresponding to the database access is the dependency of the new content in ProcessService. Java.
  • 'daghel.java' : This is mainly a modification of getStartVertex because the recovery process from the forced success will load the complete DAG, and the judgment process will require some additional judgment logic

For some Status codes: uniformly added in 'status.java'

In addition to the above, the four corresponding files have been modified as follows because TaskType and CommandType have been added:

  • AlertManager.java
  • BusinessTimeUtils.java

The above should be easy to understand

  • DataAnalysisServiceImpl. Java and TaskCountDto. Java : the two is a modification of various task state statistics, increased FORCED_SUCCESS this state.

因为我的commit数量比较多而且相对比较凌乱,可能影响code review的效率,所以在这里对每一个改动的文件加以说明:

**对于修改任务状态为强制成功这一功能:**主要是增加一个接口,对用户指定的单个失败任务,可以任务状态修改为强制成功过(FORCED_SUCCESS)。修改了以下几个文件:

  • TaskInstanceController.java: 增加api
  • TaskInstanceService.java: 上述api对应的逻辑实现,大概流程是判断用户是否有操作权限,检验指定的任务实例是否存在,判断任务实例的状态是否为typeIsFailure,如果这些条件都满足了则修改数据库中任务实例的状态
  • TaskInstanceControllerTest.java: api对应的单元测试
  • TaskInstanceServiceTest.java: 逻辑实现对应的单元测试
  • ExecutionStatus.java: 增加FORCED_SUCCESS这种状态,同时将其加入TypeIsSuccess函数的判断中

**对于运行的工作流检验是否有强制成功的发生:**主要是修改了MasterExecThread.java,在runProcess()方法中持续扫描当前工作流中是否有失败的任务被强制成功了,即遍历errorList在数据库中的状态;如果有的话就从errorList中移除,同时submit节点之后的任务。

  • 这里还需要说明一点,就是我把pull request#3375的改动也加进来了,就是修复一个判断任务依赖是否满足的bug。

对于停止的工作流从强制成功处运行:主要是增加一个新的commandType,利用原来的execute接口对工作流来进行操作。执行的过程会读入整个dag,然后从头到尾的检测,执行之前没有执行的,标记为FORCED_SUCCESS的任务会当做typeIsSuccess。修改了以下的文件:

  • ExecuteType.java: 增加一种新的类型,叫 RESUME_FROM_FORCED_SUCCESS
  • ExecutorService.java: 增加了对新executeType的一些检验,工作流处于停止状态才能执行恢复的操作,同时上一次运行中对应的所有flag为1的任务实例中必须要有被强制成功过的。
  • ExecutorServiceTest2: 对应的单元测试
  • CommandType.java: 增加 RESUME_FROM_FORCED_SUCCESS
  • ProcessService.java: 增加构建工作流时对 RESUME_FROM_FORCED_SUCCESS 的处理。包含了一些小的函数,用来递归的判断嵌套的sub-process中是否有强制成功过的任务实例。
  • TaskInstanceMapper.javaTaskInstanceMapper.xml: 对应的数据库访问的接口和SQL语句,就是ProcessService.java的中新增内容的依赖。
  • DagHelper.java: 主要是修改了getStartVertex这个函数,因为从强制成功处恢复的过程会载入完整的dag,判断的过程需要增加一些额外的判断逻辑

**对于一些状态码:**统一增加在Status.java

除上述而外,因为新增了TaskTypeCommandType,所以修改了4个对应的文件如下:

  • AlertManager.java
  • BusinessTimeUtils.java

以上应该很容易理解

  • DataAnalysisServiceImpl.javaTaskCountDto.java: 这两个是对各种任务状态统计的一个修改,增加了FORCED_SUCCESS这个状态。

@yangyichao-mango
Copy link
Copy Markdown
Contributor

I will trace it later.
image

@yangyichao-mango
Copy link
Copy Markdown
Contributor

yangyichao-mango commented Sep 12, 2020

When I force success the fail job. And refresh front end , it will show like this.
We should add the new FORCE_SUCCESS state to { tasksState } from '@/conf/home/pages/dag/_source/config', and it will be better when we just return the list of tasksState, it can dynamic to return the state.


我们可以添加返回所有taskstate的列表的api,这样之后就不用在添加一个新的状态时,还需要改动config文件了。@break60 #3726

image

Copy link
Copy Markdown
Contributor

@yangyichao-mango yangyichao-mango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yangyichao-mango
Copy link
Copy Markdown
Contributor

This pr also Fixed #3375

@CalvinKirs
Copy link
Copy Markdown
Member

Hi, please resolve the conflict..

@legen618
Copy link
Copy Markdown
Contributor Author

Hi, please resolve the conflict..

OK. I've fixed the conflicts.

@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #3576 into dev will increase coverage by 0.25%.
The diff coverage is 43.71%.

Impacted file tree graph

@@             Coverage Diff              @@
##                dev    #3576      +/-   ##
============================================
+ Coverage     39.46%   39.72%   +0.25%     
- Complexity     2900     2931      +31     
============================================
  Files           459      459              
  Lines         21802    21931     +129     
  Branches       2648     2676      +28     
============================================
+ Hits           8605     8712     +107     
- Misses        12394    12400       +6     
- Partials        803      819      +16     
Impacted Files Coverage Δ Complexity Δ
...heduler/api/controller/TaskInstanceController.java 18.18% <0.00%> (-15.16%) 2.00 <0.00> (ø)
...heduler/server/master/runner/MasterExecThread.java 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...he/dolphinscheduler/server/utils/AlertManager.java 0.91% <0.00%> (-0.01%) 1.00 <0.00> (ø)
...lphinscheduler/service/process/ProcessService.java 4.59% <0.00%> (-0.21%) 6.00 <0.00> (ø)
...dolphinscheduler/common/enums/ExecutionStatus.java 62.16% <50.00%> (+1.05%) 6.00 <0.00> (ø)
.../apache/dolphinscheduler/api/dto/TaskCountDto.java 56.75% <70.00%> (+2.06%) 5.00 <2.00> (+2.00)
...er/common/utils/placeholder/BusinessTimeUtils.java 71.42% <71.42%> (ø) 2.00 <2.00> (ø)
...uler/api/service/impl/DataAnalysisServiceImpl.java 57.82% <80.00%> (+0.48%) 17.00 <2.00> (+1.00)
.../dolphinscheduler/api/service/ExecutorService.java 48.16% <83.33%> (+14.25%) 29.00 <3.00> (+10.00)
...g/apache/dolphinscheduler/dao/utils/DagHelper.java 41.48% <86.66%> (+14.37%) 29.00 <10.00> (+11.00)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 05b248f...7817601. Read the comment docs.

@sonarqubecloud
Copy link
Copy Markdown

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 0 Security Hotspots to review)
Code Smell A 2 Code Smells

40.9% 40.9% Coverage
0.0% 0.0% Duplication

@davidzollo
Copy link
Copy Markdown
Contributor

davidzollo commented Jan 7, 2021

good job, good beginning, thanks for your participation and contribution,
PR #4267 merged the code and commits of this PR and do some more work , so I will close this PR

thanks again

@davidzollo davidzollo closed this Jan 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

first time contributor First-time contributor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature][*] Force Task Success

5 participants