Skip to content

[Improvement][Metrics] Switch to use tags to indicate task / workflow execution status for metrics#11128

Merged
EricGao888 merged 4 commits intoapache:devfrom
EricGao888:Fix-10867
Jul 26, 2022
Merged

[Improvement][Metrics] Switch to use tags to indicate task / workflow execution status for metrics#11128
EricGao888 merged 4 commits intoapache:devfrom
EricGao888:Fix-10867

Conversation

@EricGao888
Copy link
Copy Markdown
Member

@EricGao888 EricGao888 commented Jul 24, 2022

Purpose of the pull request

Brief change log

  • Use tags instead of names to indicate task / workflow instance execution state / status for metrics.
  • Will update docs / grafana-demos in following commits.

Verify this pull request

  • Verified by manual tests.

@EricGao888 EricGao888 self-assigned this Jul 24, 2022
@EricGao888 EricGao888 added this to the 3.1.0-alpha milestone Jul 24, 2022
@EricGao888 EricGao888 added metrics improvement make more easy to user or prompt friendly labels Jul 24, 2022
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jul 24, 2022

Codecov Report

Merging #11128 (011040f) into dev (71cf7e1) will increase coverage by 0.04%.
The diff coverage is 33.33%.

❗ Current head 011040f differs from pull request most recent head 9dd91b8. Consider uploading reports for the commit 9dd91b8 to get more accurate results

@@             Coverage Diff              @@
##                dev   #11128      +/-   ##
============================================
+ Coverage     40.37%   40.41%   +0.04%     
- Complexity     4873     4881       +8     
============================================
  Files           950      950              
  Lines         37187    37186       -1     
  Branches       4079     4078       -1     
============================================
+ Hits          15014    15030      +16     
+ Misses        20662    20632      -30     
- Partials       1511     1524      +13     
Impacted Files Coverage Δ
...r/api/service/impl/DqExecuteResultServiceImpl.java 70.37% <ø> (-1.06%) ⬇️
...che/dolphinscheduler/common/utils/HadoopUtils.java 15.42% <33.33%> (+9.45%) ⬆️
...r/plugin/task/sqoop/parameter/SqoopParameters.java 53.33% <0.00%> (-1.34%) ⬇️
...r/plugin/registry/zookeeper/ZookeeperRegistry.java 55.85% <0.00%> (-0.91%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us.

@sonarqubecloud
Copy link
Copy Markdown

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

34.9% 34.9% Coverage
2.2% 2.2% Duplication

Copy link
Copy Markdown
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM, some nit.

private static Map<String, Counter> PROCESS_INSTANCE_COUNTERS = new HashMap<>();

private static final Set<String> PROCESS_INSTANCE_STATES = ImmutableSet.of(
"submit", "timeout", "finish", "failover", "success", "fail", "stop");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to directly use ExecutionStatus.SUCCESS.getCode() to replace the success, and use

ProcessInstanceMetrics.incProcessInstanceByState(ExecutionStatus.SUCCESS.getCode());

to increase the counter.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thx for the suggestion. Will fix it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruanwenjun Hi I just checked this file

public enum ExecutionStatus {
/**
* status:
* 0 submit success
* 1 running
* 2 ready pause
* 3 pause
* 4 ready stop
* 5 stop
* 6 failure
* 7 success
* 8 need fault tolerance
* 9 kill
* 10 waiting thread
* 11 waiting depend node complete
* 12 delay execution
* 13 forced success
* 14 serial wait
* 15 ready block
* 16 block
* 17 dispatch
*/
SUBMITTED_SUCCESS(0, "submit success"),
RUNNING_EXECUTION(1, "running"),
READY_PAUSE(2, "ready pause"),
PAUSE(3, "pause"),
READY_STOP(4, "ready stop"),
STOP(5, "stop"),
FAILURE(6, "failure"),
SUCCESS(7, "success"),
NEED_FAULT_TOLERANCE(8, "need fault tolerance"),
KILL(9, "kill"),
WAITING_THREAD(10, "waiting thread"),
WAITING_DEPEND(11, "waiting depend node complete"),
DELAY_EXECUTION(12, "delay execution"),
FORCED_SUCCESS(13, "forced success"),
SERIAL_WAIT(14, "serial wait"),
READY_BLOCK(15, "ready block"),
BLOCK(16, "block"),
DISPATCH(17, "dispatch"),
;

It seems those states in workflow metrics and task metrics are not fully covered by ExecutionStatus e.g. failover and finish. What about keeping it this way for consistency and easier intelligibility?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is history problem, ExecutionStatus need to split into two class WorkflowExecutionStatus and TaskExecutionStatus, some states are only belong to workflow, and some states are only belong to task, and some status are deprecated.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is history problem, ExecutionStatus need to split into two class WorkflowExecutionStatus and TaskExecutionStatus, some states are only belong to workflow, and some states are only belong to task, and some status are deprecated.

@ruanwenjun Considering the legacy here, what should we do in this PR? Maybe I could open a new issue to split this ExecutionStatus ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a new issue to split ExecutionStatus and remove deprecated states in it: #11145

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing is if we use ExecutionStatus.SUCCESS.getCode() for tag, it might be a little bit unclear on the Grafana side as the tag value is integer in that case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend document improvement make more easy to user or prompt friendly metrics

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement][Metrics] Use tags to indicate task / workflow execution status for metrics

4 participants