simplify container start logic, taking advantage of idempotency #99
thaJeztah wants to merge 1 commit into docker:tocttou-cleanup
Conversation
The start endpoint should be idempotent when trying to start a container that is already started (see [Daemon.containerStart][1]), so we can take advantage of that.

Once we have either created the container, or received a "conflict" (name already in use), we can try starting the container. As there's a time window between a name being reserved and the container existing (which may produce a 404 during that window), we ignore "not found" errors and retry the start.

[1]: https://github.com/moby/moby/blob/5318877858c9fc94b5cccc3cf1a75d4ec46951b8/daemon/start.go#L76-L93

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
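The create-then-start flow described above can be sketched as a small retry loop. This is a hedged sketch, not the PR's actual code: `ensureStarted`, `errConflict`, and `errNotFound` are stand-ins for the Docker client's create/start calls and for `errdefs.IsConflict`/`errdefs.IsNotFound` checks.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Sentinel errors standing in for the Docker API's 409 (name conflict)
// and 404 (no such container) responses.
var (
	errConflict = errors.New("conflict: name already in use")
	errNotFound = errors.New("no such container")
)

// ensureStarted creates the container, tolerating a name conflict (the
// container may already exist), then starts it. Start is idempotent for
// an already-running container, so it is safe to call unconditionally;
// "not found" is retried to cover the window between name reservation
// and the container actually existing.
func ensureStarted(create func() error, start func() error) error {
	if err := create(); err != nil && !errors.Is(err, errConflict) {
		return err
	}
	for i := 0; i < 10; i++ {
		err := start()
		if err == nil || !errors.Is(err, errNotFound) {
			return err
		}
		time.Sleep(10 * time.Millisecond)
	}
	return fmt.Errorf("container did not appear in time")
}

func main() {
	attempts := 0
	err := ensureStarted(
		func() error { return errConflict }, // name already reserved
		func() error {
			attempts++
			if attempts < 3 {
				return errNotFound // container not visible yet
			}
			return nil
		},
	)
	fmt.Println(err, attempts)
}
```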
// Unfortunately the Docker API's /containers/{id}/wait API (and the
// corresponding Client.ContainerWait method) don't allow waiting for
// container startup, so instead we'll take a polling approach.
I actually forgot that this was not supported, and wondered "why?", because this could definitely be useful (probably?). I expected it would be a "quick fix" (just add additional conditions, or perhaps even to just accept "any" container state), but it looks like the conditions that are supported each have a dedicated channel for "waiters" https://github.com/moby/moby/blob/89aa33001e635c20707f6d4ed046590c22a24c4d/container/state.go#L193-L200
// Removal wakes up both removeOnlyWaiters and stopWaiters
// Container could be removed while still in "created" state
// in which case it is never actually stopped
if condition == container.WaitConditionRemoved {
	s.removeOnlyWaiters = append(s.removeOnlyWaiters, waitC)
} else {
	s.stopWaiters = append(s.stopWaiters, waitC)
}

I don't think that should rule out the option to add a "Wait running" condition, though, so perhaps it's worth opening a ticket in moby/moby.
Perhaps rewriting the logic to support "any" state (which could be a "map" for channels per state?) could even be an option after that, but I'd not be terribly upset if we'd add one more property there 😅
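The "map of channels per state" idea floated above might look roughly like this. It's a toy sketch, not moby's actual `container/state.go`; the type names mirror moby's, but `WaitConditionRunning` is the hypothetical new condition being discussed.

```go
package main

import (
	"fmt"
	"sync"
)

type WaitCondition string

const (
	WaitConditionNotRunning WaitCondition = "not-running"
	WaitConditionRemoved    WaitCondition = "removed"
	WaitConditionRunning    WaitCondition = "running" // hypothetical new condition
)

// State keeps one slice of waiter channels per condition, instead of
// hard-coded removeOnlyWaiters/stopWaiters fields.
type State struct {
	mu      sync.Mutex
	waiters map[WaitCondition][]chan struct{}
}

func NewState() *State {
	return &State{waiters: make(map[WaitCondition][]chan struct{})}
}

// Wait registers a waiter for the given condition and returns a channel
// that is closed when the condition is reached.
func (s *State) Wait(cond WaitCondition) <-chan struct{} {
	c := make(chan struct{})
	s.mu.Lock()
	s.waiters[cond] = append(s.waiters[cond], c)
	s.mu.Unlock()
	return c
}

// notify wakes every waiter registered for the given condition. A real
// removal transition would notify both "removed" and "not-running",
// mirroring the comment that removal wakes both waiter groups.
func (s *State) notify(cond WaitCondition) {
	s.mu.Lock()
	for _, c := range s.waiters[cond] {
		close(c)
	}
	s.waiters[cond] = nil
	s.mu.Unlock()
}

// SetRunning would be called from the running state transition.
func (s *State) SetRunning() { s.notify(WaitConditionRunning) }

func main() {
	s := NewState()
	done := s.Wait(WaitConditionRunning)
	s.SetRunning()
	<-done
	fmt.Println("woken")
}
```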
xenoscopic left a comment
I like this idea, but I'm having some trouble getting it working. I have a branch here where I've combined #98 and this PR: main...container-start-cleanup
I had to add a small fixup and clean up some rebase issues, but the main problem that I'm having is that I'm receiving an EOF from ContainerStart:
error during connect: Post "http://<omitted>.sock/v1.48/containers/docker-model-runner/start": EOF
I'll play with it more on Monday, but let me know if that sparks any thoughts.
I've added another commit onto the tip of main...container-start-cleanup to ignore EOF. In any case, if that branch looks okay to you, I'll replace #98 with it and close this PR.
Yes, feel free to push to your PR branch - merging this PR should merge into your branch as well (it was opened against that branch), so merging it would make this commit show up there. This one is interesting, though; if you're able to reproduce it, it might be worth capturing daemon logs to see if there's a bug somewhere, as that looks ... unexpected?
Closing in favor of #107, which incorporates these commits. I couldn't figure out the source of the EOF error.