When starting a cluster with `elastic-package stack`, if the host where Docker is running has little free disk space, Elasticsearch fails to allocate shards and Kibana never reaches a healthy state.
This happens once disk usage exceeds the low and high disk watermarks, which are 90% and 95% respectively by default.
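A minimal Go sketch of the threshold check, assuming the 90%/95% values above and treating space that is not available to unprivileged users as used. The helper names (`usedPercent`, `watermarkLevel`, `diskUsage`) are hypothetical, and `syscall.Statfs` only exists on Unix-like systems. It measures the filesystem of the machine running the program, which is not the Docker disk when Docker runs inside a virtual machine.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Thresholds as stated in this issue (assumption: not read from the cluster).
const (
	lowWatermark  = 90.0
	highWatermark = 95.0
)

// usedPercent returns the share of the disk in use, counting space that is
// not available to unprivileged users as used.
func usedPercent(total, avail uint64) float64 {
	if total == 0 || avail > total {
		return 0
	}
	return 100 * float64(total-avail) / float64(total)
}

// watermarkLevel reports which watermark, if any, is exceeded (strictly above).
func watermarkLevel(used float64) string {
	switch {
	case used > highWatermark:
		return "high"
	case used > lowWatermark:
		return "low"
	default:
		return "ok"
	}
}

// diskUsage returns total and available bytes for the filesystem holding path.
func diskUsage(path string) (total, avail uint64, err error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, 0, err
	}
	bsize := uint64(st.Bsize)
	return st.Blocks * bsize, st.Bavail * bsize, nil
}

func main() {
	// Pass Docker's data root as the first argument (assumed, e.g. /var/lib/docker on Linux).
	path := "/"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	total, avail, err := diskUsage(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "statfs failed:", err)
		os.Exit(2)
	}
	used := usedPercent(total, avail)
	fmt.Printf("%s: %.1f%% used (watermark: %s)\n", path, used, watermarkLevel(used))
}
```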
Currently, `elastic-package` just waits until the timeout for Kibana to become healthy, and gives users no helpful information when it finally fails, which leads to issues like #838. When the command times out, it prints the following error:
container for service "kibana" is unhealthy
Error: booting up the stack failed: running docker-compose failed: running command failed: running Docker Compose up command failed: exit status 1
This situation can be detected by checking the available disk space, or by looking in the Elasticsearch logs for messages like `high disk watermark [95%] exceeded`.
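The log-based detection could look like the following Go sketch, which scans Elasticsearch log lines for the watermark message quoted above. The regular expression is built from that message; the `low` and `flood stage` variants and the helper names (`matchWatermark`, `scanLogs`) are assumptions for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"regexp"
)

// Matches e.g. "high disk watermark [95%] exceeded"; other wordings are assumed.
var watermarkRe = regexp.MustCompile(`(low|high|flood stage) disk watermark \[([^\]]+)\] exceeded`)

type watermarkHit struct {
	Level     string // "low", "high" or "flood stage"
	Threshold string // e.g. "95%"
	Line      string // the full log line
}

// matchWatermark returns a hit if the line is a watermark warning, else nil.
func matchWatermark(line string) *watermarkHit {
	m := watermarkRe.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	return &watermarkHit{Level: m[1], Threshold: m[2], Line: line}
}

// scanLogs returns the first watermark warning found in r, or nil.
func scanLogs(r io.Reader) (*watermarkHit, error) {
	sc := bufio.NewScanner(r)
	// JSON log lines can exceed bufio's default 64 KiB token size.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		if hit := matchWatermark(sc.Text()); hit != nil {
			return hit, nil
		}
	}
	return nil, sc.Err()
}

func main() {
	// Expects logs on stdin, e.g.: docker logs <elasticsearch-container> 2>&1 | go run .
	hit, err := scanLogs(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading logs:", err)
		os.Exit(2)
	}
	if hit != nil {
		fmt.Printf("%s disk watermark [%s] exceeded: free disk space in the Docker host\n", hit.Level, hit.Threshold)
		os.Exit(1)
	}
	fmt.Println("no disk watermark warnings found")
}
```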
The current workaround is to free disk space in the Docker host.
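Free space on the machine running `elastic-package` is not necessarily what Elasticsearch sees (for example when Docker runs in a virtual machine), so a check could instead ask Elasticsearch itself. A hedged Go sketch using the `_cat/allocation` API, whose `disk.percent` column reports per-node disk usage; the address `http://localhost:9200` without TLS or credentials and the helper name `checkAllocation` are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strconv"
	"time"
)

// allocationRow holds the columns used from GET _cat/allocation?format=json.
// disk.percent is a string and may be null, hence the pointer.
type allocationRow struct {
	Node        string  `json:"node"`
	DiskPercent *string `json:"disk.percent"`
}

// checkAllocation returns an error for the first node whose disk usage is
// above highWatermark (a percentage), or nil if all nodes are below it.
func checkAllocation(body []byte, highWatermark float64) error {
	var rows []allocationRow
	if err := json.Unmarshal(body, &rows); err != nil {
		return fmt.Errorf("decoding _cat/allocation response: %w", err)
	}
	for _, row := range rows {
		if row.DiskPercent == nil {
			continue // no disk information for this row
		}
		pct, err := strconv.ParseFloat(*row.DiskPercent, 64)
		if err != nil {
			continue // skip values that are not plain numbers
		}
		if pct > highWatermark {
			return fmt.Errorf("node %q uses %.0f%% of its disk, above the %.0f%% high watermark: free disk space in the Docker host", row.Node, pct, highWatermark)
		}
	}
	return nil
}

func main() {
	// Assumed address: plain HTTP, no credentials. Adjust for TLS/auth.
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get("http://localhost:9200/_cat/allocation?format=json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "querying Elasticsearch:", err)
		os.Exit(2)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading response:", err)
		os.Exit(2)
	}
	if resp.StatusCode != http.StatusOK {
		fmt.Fprintf(os.Stderr, "unexpected status %s\n", resp.Status)
		os.Exit(2)
	}
	if err := checkAllocation(body, 95); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("all nodes are below the high disk watermark")
}
```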
Proposed actions
- Detect when the watermarks are exceeded during boot (by parsing logs or checking the cluster state) and fail immediately, with an error message pointing to the lack of free disk space in the host.
- Modify the healthcheck of the `elasticsearch` service so that it fails if the cluster status is red.
- An alternative could be to disable these thresholds (`cluster.routing.allocation.disk.threshold_enabled: false`) and let Elasticsearch fail to start when there is no disk space left. However, this could still be misleading, especially on operating systems where Docker runs in a virtual machine.
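For the healthcheck proposal, a Go sketch of how a probe could map the `GET _cluster/health` response to Docker healthcheck exit codes (0 healthy, 1 unhealthy), so a red cluster is reported as unhealthy instead of waiting for the timeout. The function name `healthExitCode`, the address, and treating yellow as healthy (a single-node stack cannot assign replicas) are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

type clusterHealth struct {
	Status           string `json:"status"`
	UnassignedShards int    `json:"unassigned_shards"`
}

// healthExitCode maps a _cluster/health body to a Docker healthcheck exit
// code: 0 for green or yellow, 1 for red or an unreadable response.
func healthExitCode(body []byte) (int, string) {
	var h clusterHealth
	if err := json.Unmarshal(body, &h); err != nil {
		return 1, "cannot decode cluster health: " + err.Error()
	}
	switch h.Status {
	case "green", "yellow":
		return 0, "cluster status " + h.Status
	case "red":
		return 1, fmt.Sprintf("cluster status red with %d unassigned shards: check free disk space", h.UnassignedShards)
	default:
		return 1, fmt.Sprintf("unknown cluster status %q", h.Status)
	}
}

func main() {
	// Assumed address: plain HTTP, no credentials. Adjust for TLS/auth.
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://localhost:9200/_cluster/health")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	code, msg := healthExitCode(body)
	fmt.Println(msg)
	os.Exit(code)
}
```

Keeping yellow as healthy avoids failing a single-node stack for unassigned replicas, while a red status, as produced by the disk watermarks, fails the healthcheck.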