
Crash during cluster autoscaling: Check failed: node_info_opt Spilling back to a node manager, but no GCS info found for node (#16858)

@ericl

Reproduction can be found in #16342:

python3.7 cart_with_tree.py -n 1 (8-CPU head node, up to ten 16-CPU workers)

The good news is that the object SIGSEGV seems to be solved. However, about one run in five I now see:

(raylet) [2021-07-02 23:25:15,081 C 168 168] cluster_task_manager.cc:939: Check failed: node_info_opt Spilling back to a node manager, but no GCS info found for node 22219da4a60d5ff689033487697e80fa435ca6de7a2a69c71f3dc69c
(raylet, ip=172.31.94.242) [2021-07-02 23:25:15,822 C 86 86] cluster_task_manager.cc:939: Check failed: node_info_opt Spilling back to a node manager, but no GCS info found for node 22219da4a60d5ff689033487697e80fa435ca6de7a2a69c71f3dc69c
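
Both raylets above report the same missing node ID. For anyone triaging a longer log, here is a minimal sketch that pulls the reporting raylet and the missing node ID out of lines like these. The regex is inferred only from the two lines in this report, not from a documented raylet log format, and `parse_fatal_lines` is a hypothetical helper name.

```python
import re

# Assumption: field layout inferred from the two log lines in this issue,
# not from any documented raylet log format.
FATAL_RE = re.compile(
    r"\(raylet(?:, ip=(?P<ip>[\d.]+))?\) "            # optional remote raylet IP
    r"\[(?P<ts>[\d-]+ [\d:,]+) C \d+ \d+\] "          # timestamp, severity C, pid, tid
    r"(?P<src>[\w.]+:\d+): Check failed: node_info_opt .*"
    r"no GCS info found for node (?P<node>[0-9a-f]+)"
)


def parse_fatal_lines(lines):
    """Return (ip, timestamp, source, node_id) for each matching line.

    ip is None when the line carries no "ip=" field.
    """
    records = []
    for line in lines:
        m = FATAL_RE.search(line)
        if m:
            records.append(
                (m.group("ip"), m.group("ts"), m.group("src"), m.group("node"))
            )
    return records


# The two lines from this report, copied verbatim.
LOG = [
    "(raylet) [2021-07-02 23:25:15,081 C 168 168] cluster_task_manager.cc:939: "
    "Check failed: node_info_opt Spilling back to a node manager, but no GCS info "
    "found for node 22219da4a60d5ff689033487697e80fa435ca6de7a2a69c71f3dc69c",
    "(raylet, ip=172.31.94.242) [2021-07-02 23:25:15,822 C 86 86] "
    "cluster_task_manager.cc:939: Check failed: node_info_opt Spilling back to a "
    "node manager, but no GCS info found for node "
    "22219da4a60d5ff689033487697e80fa435ca6de7a2a69c71f3dc69c",
]

records = parse_fatal_lines(LOG)
# Distinct node IDs that raylets tried to spill back to without GCS info.
missing_nodes = {node for _, _, _, node in records}
```

Grouping by node ID rather than by raylet makes it easy to see whether several raylets are failing on one node or on different nodes.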

cc @wuisawesome

Labels

P0: Issues that should be fixed in short order
release-blocker: P0 issue that blocks the release
