Possible improvement for orderly shutdown #4248

@nik9000

Last week I spent a few hours manually restarting nodes to upgrade to 0.90.7. Since we want to keep our configured level of redundancy at all non-emergency times, I did it by using the shard allocation API to disallow shards from the node being shut down, waiting until no shards were left on the node, restarting it, using the shard allocation API to allow shards back on the node, and then waiting for the cluster to re-balance shards.
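The drain step of that manual procedure might be scripted roughly as follows. This is a minimal Python sketch, not the script used for the upgrade. It assumes the drain is done with the `cluster.routing.allocation.exclude._ip` filter, and that `GET /_cluster/state` lists each node's shards under `routing_nodes.nodes`. `put_settings` and `get_state` are hypothetical helpers you would wire to `PUT /_cluster/settings` and `GET /_cluster/state`; they are passed in so the polling logic can be exercised without a live cluster.

```python
import time
from typing import Callable

# Cluster-level allocation filter used to drain a node by IP address.
EXCLUDE_IP = "cluster.routing.allocation.exclude._ip"


def exclude_body(ip: str) -> dict:
    """Transient-settings body that moves shards off `ip`; "" clears the filter."""
    return {"transient": {EXCLUDE_IP: ip}}


def shards_on_node(state: dict, node_id: str) -> int:
    """Count shard copies the routing table still assigns to `node_id`.

    Assumes the cluster-state layout routing_nodes.nodes.{node_id: [shard, ...]}.
    """
    return len(state.get("routing_nodes", {}).get("nodes", {}).get(node_id, []))


def drain_node(ip: str, node_id: str,
               put_settings: Callable[[dict], object],
               get_state: Callable[[], dict],
               poll_seconds: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Exclude the node, then block until no shards remain on it."""
    put_settings(exclude_body(ip))
    while shards_on_node(get_state(), node_id) > 0:
        sleep(poll_seconds)


def undrain_node(put_settings: Callable[[dict], object]) -> None:
    """Clear the exclusion so shards may be allocated to the node again."""
    put_settings(exclude_body(""))
```

The restart happens by hand between `drain_node` and `undrain_node`. After `undrain_node`, the cluster re-balances on its own, and that is the slow step the proposal that follows tries to shorten.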

I wonder if this process could be automated beyond the bash/curl/awk mess I've been using, and whether that might let the re-balance operation proceed more quickly. Would it be possible to:

  1. Ask the cluster to prepare a node for shutdown.
  2. The cluster stops allocating replicas to that node until the node has finished shutting down.
  3. The cluster allocates an extra copy of every replica that node is hosting.
  4. Once this is done, the node goes through the normal shutdown process.
  5. When the node comes back up, it should rejoin the cluster and announce the replicas that it still holds. At this point I think you can let the standard startup and re-balance logic take hold. The replicas that didn't change will stay on the restarted node and be removed from the node to which they were recently replicated. Those that did change will be removed from the restarted node, and cluster re-balancing will balance the shards again.
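To make the proposed sequence concrete, here is a toy in-memory model of it in Python. It is not Elasticsearch code and does not correspond to any existing API. `Cluster`, `prepare_shutdown`, `shutdown`, `write` and `rejoin` are hypothetical names. Each shard copy is reduced to a version counter, and the target for each extra copy is simply the least-loaded eligible node.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # node name -> {shard id: version of the data that copy holds}
    nodes: dict
    # nodes that have been asked to prepare for shutdown (step 2)
    draining: set = field(default_factory=set)
    # shard id -> node holding the temporary extra copy (step 3)
    extra_copies: dict = field(default_factory=dict)

    def prepare_shutdown(self, node: str) -> None:
        """Steps 1-3: stop allocating to `node`, then copy each of its shards elsewhere."""
        self.draining.add(node)
        for shard, version in self.nodes[node].items():
            eligible = [n for n in self.nodes
                        if n not in self.draining and shard not in self.nodes[n]]
            # Least-loaded eligible node; min() raises ValueError if none exists.
            target = min(eligible, key=lambda n: len(self.nodes[n]))
            self.nodes[target][shard] = version
            self.extra_copies[shard] = target

    def shutdown(self, node: str) -> dict:
        """Step 4: the node leaves; return the copies it still has on disk."""
        return self.nodes.pop(node)

    def write(self, shard: str) -> None:
        """Index into `shard` while a node is down: every live copy advances."""
        for copies in self.nodes.values():
            if shard in copies:
                copies[shard] += 1

    def rejoin(self, node: str, on_disk: dict) -> None:
        """Step 5: reconcile the restarted node's copies with the live ones."""
        self.draining.discard(node)
        self.nodes[node] = {}
        for shard, version in on_disk.items():
            holder = self.extra_copies.pop(shard)
            if version == self.nodes[holder][shard]:
                # Unchanged: keep the copy here and drop the temporary extra copy.
                self.nodes[node][shard] = version
                del self.nodes[holder][shard]
            # Changed: the stale copy is discarded and the extra copy stays;
            # ordinary re-balancing can move things around afterwards.
```

In this model, a shard whose on-disk version still matches the live copy stays on the restarted node, and the temporary extra copy is removed. A shard that received writes while the node was down loses its stale copy instead. That is the point where ordinary re-balancing would take over.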

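I think this strikes a nice balance between the exclude._ip way of shutting down (move every shard off the node first) and the disable_allocation way of shutting down (leave shards in place and run with reduced redundancy while the node is down).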

At some point it'd be nice if there were some kind of log that could be replayed against out-of-date replicas, so they could recover even if changes had been made while they were offline. I wonder if something could be synthesized from the _timestamp field...
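One way to picture the _timestamp idea is the following Python sketch. It assumes `_timestamp` is enabled and records when each document was last indexed. Under that assumption, it replays to a stale copy only the documents newer than the newest timestamp that copy already holds. The `catch_up` function and the dict-of-documents representation are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: doc id -> (_timestamp in ms, document source)
Docs = Dict[str, Tuple[int, dict]]


def catch_up(stale: Docs, primary: Docs) -> Docs:
    """Return `stale` updated with every primary document newer than its cutoff."""
    # The newest _timestamp the stale copy has seen marks where it fell behind.
    cutoff = max((ts for ts, _ in stale.values()), default=0)
    # Replay only documents indexed or updated after that point.
    newer = {doc_id: doc for doc_id, doc in primary.items() if doc[0] > cutoff}
    result = dict(stale)
    result.update(newer)
    return result
```

The sketch also shows the catch. A document deleted while the node was down leaves no newer `_timestamp` behind, so the stale copy keeps it. Timestamps that are not monotonic (for example, values supplied by clients with skewed clocks) would leave similar gaps. A real replay log would need to record deletes explicitly.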
