Skip to content

[dashboard] embed entire grafana dashboards in metrics page#56307

Closed
eric-higgins-ai wants to merge 2 commits intoray-project:masterfrom
eric-higgins-ai:master
Closed

[dashboard] embed entire grafana dashboards in metrics page#56307
eric-higgins-ai wants to merge 2 commits intoray-project:masterfrom
eric-higgins-ai:master

Conversation

@eric-higgins-ai
Copy link
Copy Markdown
Contributor

@eric-higgins-ai eric-higgins-ai commented Sep 6, 2025

Why are these changes needed?

This improves the performance and stability of the Metrics page on the Ray dashboard by embedding the entire Default and Data dashboards instead of individual graphs.

A few notes:

  • I noticed that the Serve page also embeds individual panels from Grafana. I left these as-is because they embed only part of the Serve dashboard, so we would need to split it up to continue supporting this, they only embed a few graphs (3-6) so the performance shouldn't be as bad, and because we don't use Ray Serve and it would take me a bit of time to set it up + test the changes. Let me know if you'd like me to change the Serve page too though and I can do that.
  • The UI is a little weird now because it's not possible to have the dashboard iframe shrink or grow as the user collapses/expands sections. I tried to make it intuitive by decreasing the width of the iframes (so it's clear whether you're scrolling in the iframe or in the page) and setting the height of the dashboards to be relatively low so that the scrollbar for the iframe is always visible. This is what it looks like
Screen.Recording.2025-09-06.at.2.54.36.PM.mov

Related issue number

closes #55499

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: eric-higgins-ai <erichiggins@applied.co>
@eric-higgins-ai eric-higgins-ai requested a review from a team as a code owner September 6, 2025 22:29
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Metrics page to embed entire Grafana dashboards instead of individual panels, which is a great change for improving performance and stability. The related code cleanup, such as removing the now-unused panel configurations and state management, is well-executed. The tests have also been updated accordingly.

I've found a couple of minor but important issues in the src URLs for the new iframes in Metrics.tsx. They use a potentially undefined grafanaOrgId and have a duplicated var-datasource parameter. I've left specific suggestions to fix these. Other than that, the changes look solid.

or ""
)
global_filters = global_filters_str.split(",")
if global_filters_str == "":
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found this bug while trying to generate the latest dashboards with collapsible sections. If this string is empty then some queries in the dashboard contain ,, in their filters, which is invalid. I can split this into a separate PR if that's better

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find

Signed-off-by: eric-higgins-ai <erichiggins@applied.co>
@ray-gardener ray-gardener bot added dashboard Issues specific to the Ray Dashboard core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Sep 7, 2025
@alanwguo
Copy link
Copy Markdown
Contributor

alanwguo commented Sep 8, 2025

Thanks!

A few UI suggestions:

  1. let's make the background of the page match the background of the iframe for grafana.
  2. instead of collapsible sections for each dashboard, we may want to add tabs. This will let us have one dashboard per "page" at a time, which should help with the double scroll problem. Double scroll windows can be very confusing.

@simran-2797, @scottsun94 , do you mind mocking up the UI? You can go with my suggestions or try something different.

@eric-higgins-ai , let me know if you aren't comfortable doing more UI changes, we can have someone take over the PR as well.

@eric-higgins-ai
Copy link
Copy Markdown
Contributor Author

@alanwguo I'm happy to make the UI changes, and it's not that urgent for us (we're just using Grafana directly right now) so waiting for a UI design isn't a big deal.

However, @kshanmol mentioned that they want this to be in as early a Ray release as possible, and I'm sure it will get done faster if you can handle it within Anyscale. If you'd rather have somebody else take it over for that reason then that's ok with me too.

@alanwguo
Copy link
Copy Markdown
Contributor

alanwguo commented Sep 8, 2025

@alanwguo I'm happy to make the UI changes, and it's not that urgent for us (we're just using Grafana directly right now) so waiting for a UI design isn't a big deal.

However, @kshanmol mentioned that they want this to be in as early a Ray release as possible, and I'm sure it will get done faster if you can handle it within Anyscale. If you'd rather have somebody else take it over for that reason then that's ok with me too.

thanks, I'll let you know as the designs come out.

in the meanwhile, can put the global_filters_str empty string fix as a separate PR? It's a nice fix and we can get that pushed through quickly.

@eric-higgins-ai
Copy link
Copy Markdown
Contributor Author

here's a PR for that @alanwguo #56346

@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Sep 23, 2025
@edoakes edoakes added observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling and removed core Issues that should be addressed in Ray Core labels Sep 25, 2025
@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Sep 26, 2025
@kshanmol
Copy link
Copy Markdown
Contributor

kshanmol commented Oct 8, 2025

However, @kshanmol mentioned that they want this to be in as early a Ray release as possible, and I'm sure it will get done faster if you can handle it within Anyscale. If you'd rather have somebody else take it over for that reason then that's ok with me too.

Hi @eric-higgins-ai, just wanted to update you that we decided to go ahead with this change ourselves while iterating on the new designs. I will raise a fresh PR soon and link it here. After that we can close this one. Thank you for the effort though, learned a bunch from you here & the end result looks great!

@eric-higgins-ai
Copy link
Copy Markdown
Contributor Author

No problem at all @kshanmol , thanks for taking it over! I agree that it's likely faster for you to handle internally at Anyscale

The UI in your other PR looks great! I'll be looking forward to the change in an upcoming ray version 🎉

@alanwguo
Copy link
Copy Markdown
Contributor

Closing since #57561 is now merged.

Thanks @eric-higgins-ai for the idea and the initial implementation!

@alanwguo alanwguo closed this Oct 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community dashboard Issues specific to the Ray Dashboard observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Dashboard] metrics page is extremely slow and unreliable

4 participants