In #7406, we added a Prometheus metric for total on-thread compute time per task prefix. Taking the rate of this metric gives total CPU-seconds per second, which is a useful overview to show on Grafana, since it can a) show generally which tasks were running when, and b) show how well-utilized the cluster was (with 20 threads, 20 seconds of compute time per second would be 100% utilization).
However, because this metric is only updated when tasks complete, it can look quite misleading when tasks run longer than the Prometheus scrape interval:
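As a rough illustration of the utilization arithmetic (the function name and numbers are hypothetical, not part of dask):

```python
def utilization(compute_seconds_per_second: float, nthreads: int) -> float:
    """Cluster utilization: rate of CPU-seconds accrued vs. threads available."""
    return compute_seconds_per_second / nthreads

# 20 threads, each fully busy, accrue 20 CPU-seconds per wall-clock second
print(utilization(20.0, 20))  # -> 1.0 (100% utilized)
print(utilization(5.0, 20))   # -> 0.25
```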

Here we're comparing `rate(dask_scheduler_tasks_compute_seconds_total[15s])` and `dask_worker_tasks{state="executing"}`. The second metric was just updated by @crusaderky in #7506; it's essentially a gauge of the current number of tasks in the executing state on each worker.
You'd expect these metrics to track each other closely: if nthreads tasks are in the executing state, you'd expect the rate of compute time to be about nthreads CPU-seconds per second. However, they look wildly different. I suspect most of this is because the tasks are (much?) slower than the Prometheus scrape interval.
With a 5-minute task, you'll get a series of scrapes with no increase in the tasks_compute_seconds_total metric, then one scrape where it jumps by 5 minutes. I'm hoping that explains the spikiness here.
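A toy simulation of this effect (numbers assumed: a 15 s scrape interval and a single 5-minute task) shows what rate() sees when the counter only moves at task completion:

```python
scrape_interval = 15  # seconds between Prometheus scrapes (assumed)
task_duration = 300   # one 5-minute task

counter = 0.0
samples = []
for t in range(scrape_interval, 2 * task_duration + 1, scrape_interval):
    if t >= task_duration and counter == 0.0:
        counter = task_duration  # all 300 s land in one scrape, at completion
    samples.append(counter)

# per-interval rate, as computed between adjacent scrapes
rates = [(b - a) / scrape_interval for a, b in zip(samples, samples[1:])]
# True compute rate during the task was 1 CPU-second/second, but the
# sampled rate is 0 everywhere except a single 20 CPU-s/s spike.
```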
Ideally we could add another metric (or update this metric) to include time tasks have spent on the threadpool so far, even when they haven't completed yet.
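A minimal sketch of one way to do that (the class and method names are hypothetical, not dask's internals): keep a start timestamp per executing task, and at scrape time report completed seconds plus the time accrued so far by in-flight tasks.

```python
import time


class ComputeTimeMetric:
    """Completed compute seconds plus time accrued by still-executing tasks."""

    def __init__(self):
        self.completed_seconds = 0.0
        self.started = {}  # task key -> start timestamp

    def task_started(self, key, now=None):
        self.started[key] = time.monotonic() if now is None else now

    def task_finished(self, key, now=None):
        now = time.monotonic() if now is None else now
        self.completed_seconds += now - self.started.pop(key)

    def collect(self, now=None):
        """Value to expose at scrape time; rises smoothly during long tasks."""
        now = time.monotonic() if now is None else now
        in_flight = sum(now - t0 for t0 in self.started.values())
        return self.completed_seconds + in_flight
```

With this, a scrape halfway through a long task already reflects the time spent so far, instead of reporting nothing until completion.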
It's also interesting that even with a very long aggregation interval like 30 min, the averaged-out rate is still much lower than the theoretical maximum:

Because dask_worker_tasks is a gauge, it's also possible that it is an overestimate: it's only sampled every 5 s (or whatever the scrape interval is), so any idle time between scrapes is not captured.
The high worker CPU in this case makes me think it's probably not that far off, though.
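To illustrate the gauge-sampling overestimate (the workload here is assumed: busy 2 s out of every 5 s, with scrapes that happen to land while busy):

```python
def executing(t):
    # Worker busy for the first 2 s of each 5 s window: true utilization 40%.
    return 1 if t % 5 < 2 else 0


scrape_interval = 5
samples = [executing(t) for t in range(0, 60, scrape_interval)]
estimate = sum(samples) / len(samples)
# Every scrape happens to catch the worker busy, so the gauge suggests
# 100% utilization even though the true figure is 40%.
```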