[dashboard] When autoscaler adds/removes worker, dashboard crashes #10319
Closed
Labels
bug: Something that is supposed to be working, but isn't
dashboard: Issues specific to the Ray Dashboard
Description
What is the problem?
This was originally reported by Pieterjan for 0.8.7, but it also appears to be an issue in the latest nightly. The issue would appear to occur across
Reproduction (REQUIRED)
This is not a script, but given that I will author the fix, I think this is sufficient:
- Start a cluster with autoscaling enabled.
- Increase the workload so that an additional worker (or workers) are spawned.
- When one of those workers is removed, their log entries still exist on the backend.
- The following stack trace occurs and crashes the dashboard:
```
File "/ray/dashboard/node_stats.py", line 63, in _insert_log_counts
    self._node_stats[hostname]["log_count"] = logs_by_pid
```

- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
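To make the failure mode concrete, here is a minimal sketch of the crash and one possible defensive fix. The data shapes (`node_stats` keyed by hostname, `log_counts` mapping hostname to per-PID counts) and the function name `insert_log_counts` are assumptions based on the traceback, not Ray's actual implementation:

```python
# Sketch of the suspected bug: log entries can reference a hostname whose
# node the autoscaler has already removed, so indexing node_stats by that
# hostname raises KeyError and crashes the dashboard update loop.

def insert_log_counts(node_stats, log_counts):
    """Merge per-node log counts into node stats, skipping removed nodes."""
    for hostname, logs_by_pid in log_counts.items():
        # Guard: the worker may have been torn down by the autoscaler while
        # its log entries still exist on the backend.
        if hostname not in node_stats:
            continue
        node_stats[hostname]["log_count"] = logs_by_pid
    return node_stats


stats = {"alive-node": {"log_count": {}}}
logs = {"alive-node": {"1234": 7}, "removed-node": {"5678": 3}}
insert_log_counts(stats, logs)
# "removed-node" is skipped instead of raising KeyError
```

Without the `hostname not in node_stats` guard, the second entry would reproduce the `KeyError` seen in the stack trace above.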