[DSIP-8][Metrics] Improve DolphinScheduler Monitoring #9324
Closed
Labels: DSIP, feature, help wanted, metrics
Description
Search before asking
- I had searched in the issues and found no similar feature requirement.
Description
- Monitoring plays an essential role in software stability. However, DolphinScheduler currently offers only statistics and no metrics, which means users cannot export metrics to an external observability system to monitor their workflows, their tasks, or the performance of DS itself.
- To live up to our slogan, "Choose good tools, Back home early. Use Right Scheduler, Sleep Tight.", we need richer metrics to improve monitoring and give our users a better experience with DolphinScheduler, especially in production environments.
- Here are the Email Thread and Proposal.
Use case
- To make the improvements described in the Description section happen, we could take three steps:
- List all the metrics we need, classified by DolphinScheduler component, such as master, worker, API server, etc. Here's the doc link for the metrics list.
- Instrument the code in the right places and collect these metrics with our metrics-collection framework.
- Find a way to expose these metrics to external systems. Related: [Improvement][Common] Use JMX to expose configuration and metrics #5255
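Since #5255 proposes JMX as one way to expose metrics, steps two and three could look roughly like the sketch below. It uses only the JDK's `javax.management` API. `TaskMetricsBean`, the `SubmittedTaskCount` attribute and the `org.example.ds` domain are hypothetical names chosen for illustration, and a `DynamicMBean` is used only so the example fits in a single file. This is not how DolphinScheduler implements it.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;

class JmxExposureSketch {
    // Step 3: register the bean so an external JMX client can read it.
    static ObjectName expose(TaskMetricsBean bean) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("org.example.ds:type=TaskMetrics");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(bean, name);
            return name;
        } catch (Exception e) {
            throw new IllegalStateException("JMX registration failed", e);
        }
    }

    // Reads the value back through the MBean server, as a JMX client would.
    static long read(ObjectName name) {
        try {
            return (Long) ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(name, TaskMetricsBean.ATTR);
        } catch (Exception e) {
            throw new IllegalStateException("JMX read failed", e);
        }
    }

    public static void main(String[] args) {
        TaskMetricsBean bean = new TaskMetricsBean();
        ObjectName name = expose(bean);
        bean.recordSubmit();
        bean.recordSubmit();
        System.out.println(TaskMetricsBean.ATTR + " = " + read(name));
    }
}

// Hypothetical metrics holder; all names here are illustrative assumptions.
class TaskMetricsBean implements DynamicMBean {
    static final String ATTR = "SubmittedTaskCount";
    private final AtomicLong submitted = new AtomicLong();

    // Step 2: call this from the code path being measured.
    void recordSubmit() {
        submitted.incrementAndGet();
    }

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        if (ATTR.equals(name)) {
            return submitted.get();
        }
        throw new AttributeNotFoundException(name);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException("read-only: " + attribute.getName());
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList result = new AttributeList();
        for (String name : names) {
            if (ATTR.equals(name)) {
                result.add(new Attribute(name, submitted.get()));
            }
        }
        return result;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable in this sketch
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature)
            throws ReflectionException {
        throw new ReflectionException(new NoSuchMethodException(action));
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo info = new MBeanAttributeInfo(
                ATTR, "long", "Tasks submitted so far", true, false, false);
        return new MBeanInfo(getClass().getName(), "Sketch of a metrics bean",
                new MBeanAttributeInfo[] {info}, null, null, null);
    }
}
```

With the bean registered, a tool such as JConsole attached to the same JVM should list the attribute under the `org.example.ds` domain.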
Action Items
Stage I
- List the basic metrics for workflow / task / system and embed them in the code: [Feature][metrics] Add master, worker metrics #10326; [Improvement][Metrics] Use tags to indicate task / workflow execution status for metrics #10867
- Enable developers to test and debug metrics conveniently in standalone mode: [Feature][Metrics] Enable prometheus to collect metrics in standalone mode demo #10395
- Establish the naming convention for DS metrics: [Improvement][Metrics] Apply micrometer naming convention to metrics #10432; [Improvement][Metrics] Update some metrics names in grafana-demo dashboards #10552
- Add resource download related metrics for workers: [Feature][Metrics] Add resource download related metrics for workers #10749
- Add metrics for alert server: [Improvement][Metrics] Add metrics for alert server #11131
- Add metrics for api server: [Feature][Metrics] Add metrics for api server #11472
- Check the correctness of metrics when DS is deployed with multiple masters and workers.
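To illustrate two of the Stage I items, using tags for execution status (#10867) and following the Micrometer naming convention (#10432), here is a minimal standard-library sketch. It does not use Micrometer itself. The metric name `ds.task.instance.count` and the `state` tag are hypothetical, and the name mapping (dots to underscores plus a `_total` suffix for counters) reflects my understanding of how Micrometer's Prometheus convention renders counters.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

// Hypothetical sketch: the class, the metric name and the tag key are
// illustrative, not the names used in DolphinScheduler.
class TaggedCounterSketch {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Dotted lowercase name -> snake_case with a "_total" suffix for counters.
    static String prometheusName(String dottedName) {
        return dottedName.replace('.', '_') + "_total";
    }

    // One series per (name, tags) pair; tags are sorted so the id is stable.
    static String seriesId(String dottedName, Map<String, String> tags) {
        String labels = new TreeMap<>(tags).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return prometheusName(dottedName) + "{" + labels + "}";
    }

    // A single metric name covers every status; the tag tells them apart.
    void increment(String dottedName, Map<String, String> tags) {
        counters.computeIfAbsent(seriesId(dottedName, tags), k -> new LongAdder())
                .increment();
    }

    long count(String dottedName, Map<String, String> tags) {
        LongAdder adder = counters.get(seriesId(dottedName, tags));
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        TaggedCounterSketch registry = new TaggedCounterSketch();
        Map<String, String> success = Map.of("state", "success");
        registry.increment("ds.task.instance.count", success);
        registry.increment("ds.task.instance.count", success);
        registry.increment("ds.task.instance.count", Map.of("state", "fail"));
        System.out.println(seriesId("ds.task.instance.count", success)
                + " " + registry.count("ds.task.instance.count", success));
    }
}
```

Keeping the status in a tag rather than in the metric name means a dashboard can sum or filter across states without knowing every name in advance.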
Stage II
- Make the external monitoring system configurable and extensible.
- Add popular exporters supported by Micrometer besides Prometheus, such as CloudWatch, Datadog, StatsD, Influx, JMX, Elastic, etc. For a full list, visit the Micrometer Setup section. In addition, to give users a smooth experience, we should add Docker YAML files for each exporter for demo purposes.
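One possible shape for "configurable and extensible" is a factory lookup driven by a single setting, sketched below with the standard library only. `MetricsExporter`, `RecordingExporter`, the exporter ids and the comma-separated setting are all assumptions made for illustration. In a real Micrometer setup, the fan-out to several backends would more likely go through its `CompositeMeterRegistry`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class ExporterConfigSketch {
    private final Map<String, Supplier<MetricsExporter>> factories = new LinkedHashMap<>();

    // Extensibility point: a new backend only needs to register a factory.
    void register(String id, Supplier<MetricsExporter> factory) {
        factories.put(id, factory);
    }

    // Builds the exporters named in a setting such as "prometheus, jmx".
    List<MetricsExporter> fromConfig(String enabledCsv) {
        List<MetricsExporter> enabled = new ArrayList<>();
        for (String raw : enabledCsv.split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue;
            }
            Supplier<MetricsExporter> factory = factories.get(id);
            if (factory == null) {
                throw new IllegalArgumentException("Unknown exporter: " + id);
            }
            enabled.add(factory.get());
        }
        return enabled;
    }

    public static void main(String[] args) {
        ExporterConfigSketch config = new ExporterConfigSketch();
        config.register("prometheus", () -> new RecordingExporter("prometheus"));
        config.register("jmx", () -> new RecordingExporter("jmx"));
        for (MetricsExporter exporter : config.fromConfig("prometheus, jmx")) {
            exporter.export("ds.master.overload", 1.0); // hypothetical metric name
            System.out.println("exported to " + exporter.id());
        }
    }
}

// Hypothetical exporter contract; not a DolphinScheduler or Micrometer type.
interface MetricsExporter {
    String id();

    void export(String metricName, double value);
}

// Stand-in exporter that keeps samples in memory instead of sending them anywhere.
class RecordingExporter implements MetricsExporter {
    private final String id;
    final List<String> samples = new ArrayList<>();

    RecordingExporter(String id) {
        this.id = id;
    }

    @Override
    public String id() {
        return id;
    }

    @Override
    public void export(String metricName, double value) {
        samples.add(metricName + "=" + value);
    }
}
```

Failing fast on an unknown id is a deliberate choice here: a typo in the setting surfaces at startup instead of silently dropping a backend.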
Stage III
- Add user-configurable metrics filter: [Feature][Metrics] Add user-configurable metrics filters #10527
- Increase the granularity and richness of DS metrics to achieve the same or better observability than Apache Airflow: [Feature][Metrics] Increase granularity of metrics #10525
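For the user-configurable filter in #10527, a deny-list of metric name prefixes is one simple option. The sketch below uses only the standard library. `MetricsFilterSketch`, the setting format and the metric names are hypothetical. Micrometer offers comparable building blocks such as `MeterFilter.denyNameStartsWith`, which a real implementation would probably use instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: the setting format and the metric names are assumptions.
class MetricsFilterSketch {
    // Returns a predicate that accepts a metric name unless it starts with
    // one of the comma-separated prefixes in the user's setting.
    static Predicate<String> denyPrefixes(String deniedCsv) {
        List<String> prefixes = Arrays.stream(deniedCsv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
        return name -> prefixes.stream().noneMatch(name::startsWith);
    }

    public static void main(String[] args) {
        Predicate<String> accepted = denyPrefixes("jvm., ds.worker.");
        for (String name : List.of("jvm.memory.used", "ds.worker.task.count",
                "ds.master.task.count")) {
            System.out.println(name + " -> " + (accepted.test(name) ? "keep" : "drop"));
        }
    }
}
```

An empty setting yields a filter that keeps everything, so the default behaviour is unchanged for users who never configure it.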
Related issues
related: #5255
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct