[DSIP-8][Metrics] Improve DolphinScheduler Monitoring #9324

@EricGao888

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

  • Monitoring plays an essential role in software stability. However, DolphinScheduler currently provides only statistics, not metrics, which means users cannot export metrics to an external observability system to monitor their workflows, their tasks, or the performance of DolphinScheduler itself.
  • To live up to our slogan, "Choose good tools, Back home early. Use Right Scheduler, Sleep Tight.", we need richer metrics to strengthen monitoring and give users a better experience with DolphinScheduler, especially in production environments.
  • Here are the Email Thread and Proposal.

Use case

  • To achieve the improvement described in the Description section, we could take three steps:
  1. List all the metrics we need, grouped by DolphinScheduler component, such as master, worker, API server, etc. Here's the doc link for the metrics list.
  2. Instrument the code in the right places and collect these metrics with our metrics-collection framework.
  3. Find a way to expose these metrics to external systems. Related: [Improvement][Common] Use JMX to expose configuration and metrics #5255
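
Step 3 could, for example, build on JMX as discussed in #5255. The following is a minimal sketch under stated assumptions, not DolphinScheduler's implementation: it registers a hypothetical `TaskMetrics` standard MBean on the platform MBean server and reads the counter back the way an external JMX client would. The class names, the `org.example.metrics` domain, and the `TaskSubmittedCount` attribute are all illustrative.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxMetricsSketch {

    // Standard MBean convention: the interface is named <implementation class> + "MBean".
    // Each getter becomes a read-only JMX attribute (here: "TaskSubmittedCount").
    public interface TaskMetricsMBean {
        long getTaskSubmittedCount();
    }

    // Hypothetical metric holder; a real component would call markTaskSubmitted()
    // at the point where a task is actually submitted.
    public static class TaskMetrics implements TaskMetricsMBean {
        private final AtomicLong submitted = new AtomicLong();

        public void markTaskSubmitted() {
            submitted.incrementAndGet();
        }

        @Override
        public long getTaskSubmittedCount() {
            return submitted.get();
        }
    }

    // Registers the MBean, records two submissions, and reads the value back through JMX.
    public static long demo() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Assumed object name; any domain/type pair would work.
            ObjectName name = new ObjectName("org.example.metrics:type=TaskMetrics");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // keep the demo re-runnable in one JVM
            }
            TaskMetrics metrics = new TaskMetrics();
            server.registerMBean(metrics, name);

            metrics.markTaskSubmitted();
            metrics.markTaskSubmitted();

            // An external tool (e.g. JConsole) would read the same attribute remotely.
            return (Long) server.getAttribute(name, "TaskSubmittedCount");
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration or lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TaskSubmittedCount = " + demo());
    }
}
```

JMX is used here only because it ships with the JDK and is named in the related issue; a metrics facade with pluggable backends would cover the same step with less hand-written plumbing.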

Action Items

Stage I

Stage II

  • Make external monitoring system configurable and extensible.
  • Add popular exporters supported by Micrometer besides Prometheus, such as CloudWatch, Datadog, StatsD, Influx, JMX, Elastic, etc. For a full list, visit the Micrometer Setup section. In addition, to give users a smooth experience, we should add Docker YAML files for each exporter for demo purposes.
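
To illustrate what "configurable and extensible" could mean here, the sketch below selects an exporter by a configuration key and renders the same counter snapshot in two wire formats: the Prometheus text exposition format and the StatsD counter line format. It uses only the JDK and is a hypothetical design, not the proposal's; in practice the Micrometer registries mentioned above would play this role. The `ExporterSketch` class, the `"prometheus"` and `"statsd"` keys, and the metric name are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ExporterSketch {

    // Hypothetical registry: exporter type (as it might appear in a config file)
    // mapped to a function that renders a counter snapshot as text.
    private static final Map<String, Function<Map<String, Long>, String>> EXPORTERS =
            new LinkedHashMap<>();

    static {
        EXPORTERS.put("prometheus", ExporterSketch::toPrometheusText);
        EXPORTERS.put("statsd", ExporterSketch::toStatsdLines);
        // Supporting a new backend means adding one more entry here.
    }

    // Prometheus text exposition format: a "# TYPE" line followed by "<name> <value>".
    static String toPrometheusText(Map<String, Long> counters) {
        StringBuilder sb = new StringBuilder();
        counters.forEach((name, value) -> sb
                .append("# TYPE ").append(name).append(" counter\n")
                .append(name).append(' ').append(value).append('\n'));
        return sb.toString();
    }

    // StatsD counter line format: "<name>:<value>|c".
    static String toStatsdLines(Map<String, Long> counters) {
        StringBuilder sb = new StringBuilder();
        counters.forEach((name, value) -> sb
                .append(name).append(':').append(value).append("|c\n"));
        return sb.toString();
    }

    // Looks up the configured exporter and renders the snapshot with it.
    public static String export(String exporterType, Map<String, Long> counters) {
        Function<Map<String, Long>, String> exporter = EXPORTERS.get(exporterType);
        if (exporter == null) {
            throw new IllegalArgumentException("Unknown exporter type: " + exporterType);
        }
        return exporter.apply(counters);
    }

    public static void main(String[] args) {
        Map<String, Long> snapshot = Map.of("task_submitted_total", 3L);
        System.out.print(export("prometheus", snapshot));
        System.out.print(export("statsd", snapshot));
    }
}
```

Keeping instrumentation code unaware of the backend is the point of the design: only the lookup key changes when a user switches monitoring systems.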

Stage III

Related issues

related: #5255

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Projects

Status

Done
