Skip to content
This repository was archived by the owner on Oct 6, 2025. It is now read-only.

simplify container start logic, taking advantage of idempotency#99

Closed
thaJeztah wants to merge 1 commit intodocker:tocttou-cleanupfrom
thaJeztah:tocttou_cleanup_suggestion
Closed

simplify container start logic, taking advantage of idempotency#99
thaJeztah wants to merge 1 commit intodocker:tocttou-cleanupfrom
thaJeztah:tocttou_cleanup_suggestion

Conversation

@thaJeztah
Copy link
Member

The start endpoint should be idempotent when trying to start a container that is already started (see Daemon.containerStart, so we can take advantage of that.

Once we either created the container, or received a "conflict" (name already in use), we can try starting the container. As there's a time window between a name being reserved and the container to exist (which may produce a 404 during that time window), we ignore "not found" errors, and retry the start.

The start endpoint should be idempotent when trying to start a container
that is already started (see [Daemon.containerStart][1], so we can take
advantage of that.

Once we either created the container, or received a "conflict" (name
already in use), we can try starting the container. As there's a time
window between a name being reserved and the container to exist (which
may produce a 404 during that time window), we ignore "not found" errors,
and retry the start.

[1]: https://github.com/moby/moby/blob/5318877858c9fc94b5cccc3cf1a75d4ec46951b8/daemon/start.go#L76-L93

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Comment on lines -74 to -76
// Unfortunately the Docker API's /containers/{id}/wait API (and the
// corresponding Client.ContainerWait method) don't allow waiting for
// container startup, so instead we'll take a polling approach.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually forgot that this was not supported, and wondered "why?", because this could definitely be useful (probably?). I expected it would be a "quick fix" (just add additional conditions, or perhaps even to just accept "any" container state), but it looks like the conditions that are supported each have a dedicated channel for "waiters" https://github.com/moby/moby/blob/89aa33001e635c20707f6d4ed046590c22a24c4d/container/state.go#L193-L200

	// Removal wakes up both removeOnlyWaiters and stopWaiters
	// Container could be removed while still in "created" state
	// in which case it is never actually stopped
	if condition == container.WaitConditionRemoved {
		s.removeOnlyWaiters = append(s.removeOnlyWaiters, waitC)
	} else {
		s.stopWaiters = append(s.stopWaiters, waitC)
	}

I don't think that should rule out the option to add a "Wait running" condition though, so perhaps it's worth opening a ticket in moby/moby.

Perhaps rewriting the logic to support "any" state (which could be a "map" for channels per state?) could even be an option after that, but I'd not be terribly upset if we'd add one more property there 😅

Copy link
Collaborator

@doringeman doringeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think it's fine to remove the container status check as we're already using waitForStandaloneRunnerAfterInstall.

Copy link
Contributor

@xenoscopic xenoscopic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea, I'm having some trouble with it getting it working. I have a branch here where I've combined #98 and this PR: main...container-start-cleanup

I had to add a small fixup and clean up some rebase issues, but the main problem that I'm having is that I'm receiving an EOF from ContainerStart:

error during connect: Post "http://<omitted>.sock/v1.48/containers/docker-model-runner/start": EOF

I'll play with it more on Monday, but let me know if that sparks any thoughts.

@xenoscopic
Copy link
Contributor

I've added another commit onto the tip of main...container-start-cleanup to ignore io.EOFs from ContainerStart and that seems to work fine. I don't see any obvious codepaths in the Moby server that could lead to an io.EOF being returned from the call, maybe it's something specific to the Cloud forwarding.

In any case if that branch looks okay to you, I'll replace #98 with it and close this PR.

Copy link

@p1-0tr p1-0tr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member Author

I've added another commit onto the tip of main...container-start-cleanup to ignore io.EOFs from ContainerStart and that seems to work fine. I don't see any obvious codepaths in the Moby server that could lead to an io.EOF being returned from the call, maybe it's something specific to the Cloud forwarding.

In any case if that branch looks okay to you, I'll replace #98 with it and close this PR.

Yes, feel free to push to your PR branch - merging this PR should merge into your branch as well (it was opened against that branch), so merging it would make this commit show up there.

This one is interesting though; if you're able to reproduce, might be worth capturing daemon logs to see if there's a bug somewhere, as that looks ... unexpected?


		// For some reason, this error case can also manifest as an EOF on the
		// request (I'm not sure where this arises in the Moby server), so we'll
		// let that pass silently too.
		if !(errdefs.IsNotFound(err) || errors.Is(err, io.EOF)) {

@xenoscopic
Copy link
Contributor

Closing in favor of #107, which incorporates these commits.

I couldn't figure out the source of the io.EOF after a significant amount of digging, but I assume it's probably just a deconfliction / idempotentcy measure taking effect. It doesn't seem to affect the functionality.

@xenoscopic xenoscopic closed this Jun 26, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants