This issue supersedes #49064, which will be closed.
The node shutdown API should provide a safe way for operators to shut down a node, ensuring that all relevant orchestration steps are taken to prevent cluster instability and data loss. The feature can be used to decommission, power cycle, or upgrade nodes.
An example of marking a node for shutdown:
PUT /_nodes/<node_id>/shutdown
{
  "type": "remove",¹
  "reason": "shutdown of node so we can remove it from the cluster"²
}
¹ The type of shutdown: either "remove" (the node is never coming back) or "restart"
² A free-text description, entered by the user, of why the node is being shut down
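As a rough illustration of the request above, the following Python sketch builds (but does not send) the PUT call. The endpoint path and the "type"/"reason" fields come from this proposal; the helper name `build_shutdown_request`, the `ALLOWED_TYPES` set, and the `http://localhost:9200` base URL used in the usage note are hypothetical, and the final API may differ.

```python
import json
import urllib.request

# Shutdown types named in this proposal; the final API may accept others.
ALLOWED_TYPES = {"remove", "restart"}


def build_shutdown_request(base_url, node_id, shutdown_type, reason):
    """Build (but do not send) the PUT request that marks a node for shutdown.

    Hypothetical helper: only the path and body fields are taken from the proposal.
    """
    if shutdown_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported shutdown type: {shutdown_type!r}")
    body = json.dumps({"type": shutdown_type, "reason": reason}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_nodes/{node_id}/shutdown",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

A caller could then send it with `urllib.request.urlopen(build_shutdown_request("http://localhost:9200", "node-id-1", "remove", "decommissioning"))`, assuming a cluster is reachable at that address.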
And retrieving the shutdown status:
GET /_nodes/<node_id>/shutdown
{
  "node": "data-node-1",
  "node_id": "node-id-1",
  "type": "remove",
  "reason": "shutdown of node so we can remove it from the cluster",
  "status": {¹
    "shutdown_status": "IN_PROGRESS",²
    "shard_migration": {
      "status": "IN_PROGRESS",
      "shard_migrations_remaining": 7,³
      "time_started": "<user readable date>",
      "time_started_millis": 234091892
    },
    "persistent_tasks": {⁴
      "status": "IN_PROGRESS",
      "tasks_remaining": 2,⁵
      "error": "ICouldntStopTheTasksException[i can't do that dave]...etc stacktrace etc...",
      "time_started": "<user readable date>",
      "time_started_millis": 128391987
    },
    "plugins": {⁶
      "status": "NOT_STARTED"
    },
    "data_loss_on_removal": false⁷
  },
  "time_since_shutdown": "1.2h",⁸
  "time_since_shutdown_millis": 4320000,
  "shutdown_started": "<user readable date>",⁹
  "shutdown_started_millis": 128391987
}
1. Shows the current state of the shutdown for this node. This can be used by operators to track progress.
2. Overall shutdown status. Possible values are: "IN_PROGRESS", "COMPLETE", "STALLED". If the shutdown is STALLED, an "error" field will also be returned containing the reason the shutdown is stalled (e.g. no nodes can take the remaining shards).
3. How many shards remain to be migrated off of this node.
4. Whether in-progress persistent tasks have been halted and new tasks have been blocked.
5. The number of tasks that need to be completed before shutdown.
6. Whether plugins have indicated that they are ready for shutdown.
7. Whether data loss could occur if the node were terminated now.
8. How long the shutdown has been ongoing.
9. When the shutdown was initiated.
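To show how an operator script might consume the status response, here is a minimal Python sketch. The field names ("status", "shutdown_status", "shard_migration", "persistent_tasks", "plugins", "data_loss_on_removal") are taken from the example above; the function names are hypothetical, and the sketch assumes that each phase reports "COMPLETE" when done, which the proposal only states for the overall status.

```python
# Phases shown under "status" in the example response.
PHASES = ("shard_migration", "persistent_tasks", "plugins")


def pending_phases(response):
    """Return the names of phases whose status is not "COMPLETE".

    Assumption: per-phase statuses use the same "COMPLETE" value as the
    overall shutdown_status; the proposal does not spell this out.
    """
    status = response.get("status", {})
    return [name for name in PHASES
            if status.get(name, {}).get("status") != "COMPLETE"]


def is_safe_to_terminate(response):
    """True only when the overall shutdown is complete and no data loss is flagged."""
    status = response.get("status", {})
    return (status.get("shutdown_status") == "COMPLETE"
            and status.get("data_loss_on_removal") is False)
```

For the example response above, `pending_phases` would list all three phases and `is_safe_to_terminate` would return `False`, since the overall status is still "IN_PROGRESS". Treating a missing "data_loss_on_removal" field as unsafe is a deliberately conservative choice.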
Here are some high-level tasks that need to be completed for this:
ShutdownAwarePlugin and stop its work while shutting down
Phase 2: