In #7406, we added a Prometheus metric for total on-thread compute time per task prefix. Taking the rate of this metric gives total CPU-seconds per second, which is a useful overview to show on Grafana, since it can a) show generally which tasks were running when, and b) show how well-utilized the cluster was (with 20 threads, 20 seconds of compute time per second would be 100% utilization).
However, because this metric is only updated when tasks complete, it can look quite misleading when tasks run longer than the Prometheus scrape interval:
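As a rough illustration of the utilization arithmetic (the function name and numbers are hypothetical, not part of dask):

```python
def utilization(compute_seconds_per_second: float, nthreads: int) -> float:
    """Cluster utilization: rate of CPU-seconds accrued vs. threads available."""
    return compute_seconds_per_second / nthreads

# 20 threads, each fully busy, accrue 20 CPU-seconds per wall-clock second
print(utilization(20.0, 20))  # -> 1.0 (100% utilized)
print(utilization(5.0, 20))   # -> 0.25
```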

Here we're comparing `rate(dask_scheduler_tasks_compute_seconds_total[15s])` and `dask_worker_tasks{state="executing"}`. The second metric was just updated by @crusaderky in #7506; it's essentially a gauge of the current number of tasks in the executing state on each worker.
You'd expect these metrics to track each other closely: if nthreads tasks are in the executing state, you'd expect the rate of compute time to be about nthreads CPU-seconds per second. However, they look wildly different. I suspect most of this is because the tasks are (much?) slower than the Prometheus scrape interval.
With a 5-minute task, you'll get a series of scrapes with no increase in the tasks_compute_seconds_total metric, then one scrape where it jumps by 5 minutes. I'm hoping that explains the spikiness here.
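A toy simulation of this effect (numbers assumed: a 15 s scrape interval and a single 5-minute task) shows what rate() sees when the counter only moves at task completion:

```python
scrape_interval = 15  # seconds between Prometheus scrapes (assumed)
task_duration = 300   # one 5-minute task

counter = 0.0
samples = []
for t in range(scrape_interval, 2 * task_duration + 1, scrape_interval):
    if t >= task_duration and counter == 0.0:
        counter = task_duration  # all 300 s land in one scrape, at completion
    samples.append(counter)

# per-interval rate, as computed between adjacent scrapes
rates = [(b - a) / scrape_interval for a, b in zip(samples, samples[1:])]
# True compute rate during the task was 1 CPU-second/second, but the
# sampled rate is 0 everywhere except a single 20 CPU-s/s spike.
```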
Ideally we could add another metric (or update this metric) to include time tasks have spent on the threadpool so far, even when they haven't completed yet.
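A minimal sketch of one way to do that (the class and method names are hypothetical, not dask's internals): keep a start timestamp per executing task, and at scrape time report completed seconds plus the time accrued so far by in-flight tasks.

```python
import time


class ComputeTimeMetric:
    """Completed compute seconds plus time accrued by still-executing tasks."""

    def __init__(self):
        self.completed_seconds = 0.0
        self.started = {}  # task key -> start timestamp

    def task_started(self, key, now=None):
        self.started[key] = time.monotonic() if now is None else now

    def task_finished(self, key, now=None):
        now = time.monotonic() if now is None else now
        self.completed_seconds += now - self.started.pop(key)

    def collect(self, now=None):
        """Value to expose at scrape time; rises smoothly during long tasks."""
        now = time.monotonic() if now is None else now
        in_flight = sum(now - t0 for t0 in self.started.values())
        return self.completed_seconds + in_flight
```

With this, a scrape halfway through a long task already reflects the time spent so far, instead of reporting nothing until completion.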
It's also interesting that even with a very long aggregation interval like 30 min, the averaged-out rate is still much lower than the theoretical maximum:

Because dask_worker_tasks is a gauge, it's also possible that it is an overestimate: it's only sampled every 5 s (or whatever the scrape interval is), so any idle time between scrapes is not captured.
The high worker CPU in this case makes me think it's probably not that far off, though.
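To illustrate the gauge-sampling overestimate (the workload here is assumed: busy 2 s out of every 5 s, with scrapes that happen to land while busy):

```python
def executing(t):
    # Worker busy for the first 2 s of each 5 s window: true utilization 40%.
    return 1 if t % 5 < 2 else 0


scrape_interval = 5
samples = [executing(t) for t in range(0, 60, scrape_interval)]
estimate = sum(samples) / len(samples)
# Every scrape happens to catch the worker busy, so the gauge suggests
# 100% utilization even though the true figure is 40%.
```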