Issue Description
When using Sablier with a Docker socket proxy (rather than direct socket access), the application fails to properly handle connection interruptions. Specifically, if the socket proxy terminates a connection due to timeouts or other reasons, Sablier logs an error but continues to report itself as healthy while being unable to execute Docker operations.
Current Behavior
- Sablier establishes a connection to Docker via a socket proxy
- If the socket proxy times out or terminates the connection, Sablier logs:
ERR docker/docker.go:155 event stream error provider=docker error="unexpected EOF"
ERR docker/docker.go:148 event stream closed provider=docker
- Sablier's health check (the /health endpoint) continues to return a 200 status code with "OK"
- Subsequent Docker operations fail silently, and users must manually restart the Sablier container
Expected Behavior
- Sablier should attempt to reconnect to the Docker daemon when the connection is lost
- The health check should verify Docker connectivity, not just API responsiveness
- If reconnection fails after several attempts, Sablier should do at least one of the following:
  - Update its health status to unhealthy
  - Log a clear error that the Docker connection is permanently lost
  - Automatically restart itself (if possible)
Reproduction Steps
- Configure Sablier to use a socket proxy with a timeout value (e.g., PROXY_READ_TIMEOUT=8000)
- Wait for the timeout to occur
- Observe Sablier logs showing "event stream error" and "event stream closed"
- Verify that /health still returns 200 OK
- Attempt to start a container through Sablier, which will fail
Technical Details
This issue stems from how Sablier handles the Docker client connection in the provider implementation. In the Docker provider, there appear to be two key issues:
- Event stream resilience: The NotifyInstanceStopped method establishes a connection to the Docker events API, but doesn't automatically reconnect if this connection is lost.
- Docker client reuse: The Docker client is created once during provider initialization and reused, but there's no mechanism to verify its connectivity or recreate it if it becomes invalid.
Proposed Solutions
- Implement automatic reconnection to Docker daemon:
  - Add a reconnection loop in the event stream handler
  - Periodically verify Docker connectivity and recreate the client if needed
- Enhance health check:
  - Update the /health endpoint to verify Docker connectivity
  - Add a new /health/docker endpoint specifically for Docker connectivity
- Connection tracking:
  - Track the most recent successful Docker operation
  - If operations fail or too much time passes since the last successful operation, attempt to recreate the connection
Environment Information
- Sablier version: 1.8.4
- Docker version: 27.5.1
- Docker API version: 1.47
- Socket proxy: Used with PROXY_READ_TIMEOUT=8000
Additional Context
This issue becomes particularly problematic in production environments where Sablier is a critical component for managing containers, as it requires manual intervention to recover from connection issues.