[Dashboard] metrics page is extremely slow and unreliable #55499
Description
What happened + What you expected to happen
When I launch a simple job and open the "Metrics" page in the Ray Dashboard, it takes 15-30s to load, whereas the same dashboards load in <1s in Grafana. The page also often crashes my Chrome tab with SIGTRAP or "Error code 5", and it uses far more memory: opening the Default dashboard in Grafana uses 85MB and the Data dashboard uses 102MB, while viewing them in the Ray Dashboard uses 1.4GB.
As far as I can tell, this is happening because the page embeds 60 Grafana iframes. I don't believe the page load time is an issue with my Grafana instance, because most of the time is spent loading static assets and they're all loaded from the memory or disk caches. I'm using the official Grafana image, so it shouldn't be an application code issue there.
I mostly wanted to open this issue to check if there's a reason why the Ray Dashboard embeds each panel individually instead of just embedding the entire Default and Data dashboards. I'm happy to make a PR if this sounds like a good change. We can of course work around this by opening the dashboards in Grafana, but we really like the Ray Dashboard and want to centralize our user flows as much as possible there.
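To make the proposal concrete, here is a rough sketch of the two embedding strategies. It assumes Grafana's standard `d-solo` route for a single panel and its `d` route with the `kiosk` flag for a whole dashboard. The host, dashboard UIDs, slugs, and panel IDs are placeholders I made up for illustration, not the values the Ray Dashboard actually uses, and the split of the 60 panels across dashboards is not modeled.

```python
from urllib.parse import urlencode

# Placeholder values for illustration only (not Ray's real UIDs or slugs).
GRAFANA_HOST = "http://localhost:3000"
DASHBOARDS = [
    ("default-uid", "default-dashboard"),
    ("data-uid", "data-dashboard"),
]


def panel_embed_url(host, uid, slug, panel_id, org_id=1):
    """Build a URL that embeds one panel via Grafana's d-solo route."""
    query = urlencode({"orgId": org_id, "panelId": panel_id})
    return f"{host}/d-solo/{uid}/{slug}?{query}"


def dashboard_embed_url(host, uid, slug, org_id=1):
    """Build a URL that embeds a whole dashboard in kiosk mode."""
    query = urlencode({"orgId": org_id})
    return f"{host}/d/{uid}/{slug}?{query}&kiosk"


# Per-panel embedding: one iframe (and one full Grafana app load) per panel.
# All 60 panels are assigned to one placeholder dashboard to keep this short.
per_panel_urls = [
    panel_embed_url(GRAFANA_HOST, "default-uid", "default-dashboard", pid)
    for pid in range(1, 61)
]

# Whole-dashboard embedding: one iframe per dashboard.
whole_dashboard_urls = [
    dashboard_embed_url(GRAFANA_HOST, uid, slug) for uid, slug in DASHBOARDS
]

print(len(per_panel_urls), "iframes vs", len(whole_dashboard_urls))
```

Each iframe boots its own copy of the Grafana frontend, so going from 60 iframes to 2 should cut most of the repeated static-asset loading and memory overhead, assuming the overhead really does scale with the iframe count.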
Versions / Dependencies
Ray 2.47.1
Python 3.10
Ubuntu 24.04
Grafana 12.0.1
Reproduction script
Launch any Ray job and open the Metrics page in the Ray Dashboard.
Issue Severity
Low: It annoys or frustrates me.