[dashboard] When autoscaler adds/removes worker, dashboard crashes #10319

@mfitton

What is the problem?

This was originally reported by Pieterjan for 0.8.7 but also appears to be an issue in the latest nightly. The issue would appear to occur across

Reproduction (REQUIRED)

This is not a script, but given I will author the fix, I think this is sufficient:

  • Start a cluster with autoscaling enabled

  • Increase workload so that an additional worker or workers are spawned

  • When one of those workers is removed, its log entries still exist on the backend

  • The following stack trace occurs and crashes the dashboard:

      /ray/dashboard/node_stats.py", line 63, in _insert_log_counts
        self._node_stats[hostname]["log_count"] = logs_by_pid

  • I have verified my script runs in a clean environment and reproduces the issue.

  • I have verified the issue also occurs with the latest wheels.
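To make the failure mode in the steps above concrete, here is a minimal sketch, not the actual dashboard code. It assumes the crash is a `KeyError` raised when log entries arrive for a hostname that is no longer in `_node_stats`; the trace above does not show the exception type, so that is a guess. The class `NodeStatsSketch` and its methods `update_nodes`, `insert_log_counts_unguarded`, and `insert_log_counts_guarded` are hypothetical names used only for illustration.

```python
from typing import Dict, Iterable


class NodeStatsSketch:
    """Hypothetical stand-in for the dashboard's per-node state (not Ray's class)."""

    def __init__(self) -> None:
        # Maps hostname -> stats dict, mirroring the shape implied by the trace.
        self._node_stats: Dict[str, dict] = {}

    def update_nodes(self, live_hostnames: Iterable[str]) -> None:
        # Keep entries only for nodes that are still alive; removed workers vanish.
        self._node_stats = {h: self._node_stats.get(h, {}) for h in live_hostnames}

    def insert_log_counts_unguarded(self, logs: Dict[str, dict]) -> None:
        # Same assignment as in the trace: fails if hostname was removed.
        for hostname, logs_by_pid in logs.items():
            self._node_stats[hostname]["log_count"] = logs_by_pid

    def insert_log_counts_guarded(self, logs: Dict[str, dict]) -> None:
        # One possible fix (an assumption): skip logs for nodes that no longer exist.
        for hostname, logs_by_pid in logs.items():
            if hostname not in self._node_stats:
                continue
            self._node_stats[hostname]["log_count"] = logs_by_pid


stats = NodeStatsSketch()
stats.update_nodes(["head", "worker-1"])
stats.update_nodes(["head"])  # autoscaler removed worker-1

# The backend still holds log entries for the removed worker.
logs = {"head": {101: 3}, "worker-1": {202: 5}}

try:
    stats.insert_log_counts_unguarded(logs)
except KeyError as exc:
    print("unguarded insert failed for hostname:", exc)

stats.insert_log_counts_guarded(logs)  # stale worker-1 logs are ignored
```

Skipping stale hostnames is only one option; pruning the backend's log entries when a worker is removed would address the same mismatch from the other side.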

Labels

bug: Something that is supposed to be working; but isn't
dashboard: Issues specific to the Ray Dashboard
