
Fix: a failed Start() breaks --volumes-from on subsequent Start()'s #8726

Merged
tiborvass merged 1 commit into moby:master from krallin:failed-start-breaks-volumes-from
Nov 11, 2014

Conversation

@krallin
Contributor

@krallin krallin commented Oct 23, 2014

When starting a container with --volumes-from, if one volume is unavailable (e.g. the container does not exist yet), then subsequent starts (i.e. even once the volume has been created) will ignore the container that failed during the first Start().

Example:

  • Create and start data_1
  • Create and start client, with --volumes-from data_1 --volumes-from data_2. This fails because data_2 does not exist.
  • Create and start data_2
  • Start client again. This passes, but data_2's volumes aren't mounted (note that although I haven't tested it, I think you actually get the same behavior even if you never create data_2).
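For illustration, the reproduction above can be sketched with docker CLI commands. This is a hypothetical sketch, not tested output: busybox is a stand-in image, and it assumes a running daemon from that era.

```shell
# Hypothetical reproduction sketch; busybox is a stand-in image.
docker run -d --name data_1 -v /data1 busybox top

# Create client, then Start() fails because data_2 does not exist yet.
docker create --name client --volumes-from data_1 --volumes-from data_2 busybox top
docker start client || echo "start failed, as expected"

docker run -d --name data_2 -v /data2 busybox top

# With the bug, this start succeeds but data_2's volumes are missing.
docker start client
```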

This seems to be due to container.Volumes being left in an inconsistent state if one of the containers in hostConfig.VolumesFrom is unavailable.

I included a failing test that demonstrates this issue (in a separate commit — running it without the fix will make the issue apparent).

NOTE: I'm not really a Go person. I'm happy to make edits to the provided PR, but a bit of guidance will go a long way ;)

@krallin krallin changed the title Fix: a failed Start() breaks --volumes-from on subsequent starts Fix: a failed Start() breaks --volumes-from on subsequent Start()'s Oct 23, 2014
@cpuguy83
Member

This is specifically because of https://github.com/docker/docker/blob/master/daemon/volumes.go#L41

And I'm not sure this should be "fixed", since effectively you've created something with an error, and it would be better to remove it and create it again with the error resolved.

Also, tests of this nature should be in integration-cli.

@krallin
Contributor Author

krallin commented Oct 23, 2014

It is indeed due to this line that container.Volumes isn't re-created the next time.

However, I'm not sure I agree that I "created something with an error": the error appeared when the container was started, not when it was created (though maybe this wasn't what you meant :) ).

In fact, if I did:

  • Create and start data_1
  • Create client with --volumes-from data_1 --volumes-from data_2 (data_2 still does not exist, but Create() succeeds).
  • Create and start data_2
  • Start client

Then everything would work (I actually have a "control_container" in the test code that does exactly this).

So, it's specifically the Start() step that "broke" the container. However, the key problem is that Docker did not inform me that my container was now "broken".

So, you're right, I can see two ways to solve the issue:

  • Have the user re-create the container. But then I think Docker should somehow inform the user (possibly even at the Create() step), if only by failing any subsequent Start() operation with an explicit error.
  • Restore the container to a consistent state

Note: this error happened to me beneath a couple layers of automation, so it wasn't immediately obvious (at all) that I had "created" something with an error.


I went with the second option (properly restoring the container to a consistent state) because of this comment, which seems to indicate that if Start() fails, the expectation is to clean up:

https://github.com/docker/docker/blob/v1.3.0/daemon/container.go#L296-L302

Incidentally, it seems that the behavior with e.g. network links is indeed to clean up rather than leave the container in an inconsistent state.


I'm certainly OK with exploring the first option, but failing pretty silently doesn't seem ideal?

Cheers,

@cpuguy83
Member

If we move volume initialization to container create instead of start (something I've seen people complain about now that there is a docker create CLI command), then I think that will take care of the issue.

I'll have to check with the core team if that's the way to go.
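As a rough illustration of that idea (this is not Docker's actual code; the helper and all names below are hypothetical), create-time validation might look like:

```go
package main

import (
	"fmt"
	"strings"
)

// validateVolumesFrom is a hypothetical sketch: reject a create request
// up front when a --volumes-from source is missing, instead of deferring
// the failure to Start().
func validateVolumesFrom(existing map[string]bool, specs []string) error {
	for _, spec := range specs {
		// A spec may carry a mode suffix, e.g. "data_1:ro"; keep the name part.
		name := strings.SplitN(spec, ":", 2)[0]
		if !existing[name] {
			return fmt.Errorf("volumes-from source %q does not exist", name)
		}
	}
	return nil
}

func main() {
	existing := map[string]bool{"data_1": true}
	// data_2 is missing, so this create request would be rejected immediately.
	fmt.Println(validateVolumesFrom(existing, []string{"data_1", "data_2:ro"}))
}
```

The trade-off discussed below still applies: since volumes-from could also be passed at start time, validating only at create would not cover every path.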

@krallin
Contributor Author

krallin commented Oct 23, 2014

It certainly would, yes (provided Docker would throw an error in the Create() call and abort everything if one "data container" is unavailable, but I expect it would ;) )

For the record, in the test code, I do pass the volume configuration in the Create() call, not in the Start() call (though, indeed, actual initialization only happens with Start()).

Cheers,

@cpuguy83
Member

Actually, this is a bigger PITA than it seems.
Since volumes-from can be specified on start (see the container start API), the volumes-from specs need to be parsed on start (they can be parsed on create too, of course, but must also be on start). But then, for instance, a volumes-from entry for a container that no longer exists will cause start to fail, even if the volume was already applied.

It can be worked around (and passing this option on start is being deprecated, I think), but I'm hesitant to modify the existing code before #8484 is applied.

@krallin
Contributor Author

krallin commented Oct 23, 2014

I guess this becomes a problem if a given container is started multiple times, then?

However, I'm not sure the proposed patch negatively affects this? It should do pretty much exactly what the old code did, except in the one edge case described here, which it seeks to address.

For what it's worth, it seems that #8484 applies cleanly on this one.

Just to give you some perspective on why I'm proposing this change here:

The reason why I'd like for Docker to be consistent here is so I don't have to check whether a data container exists before creating / starting a container that uses it. If the operation fails, I can retry it after having created the missing data container (i.e. ask for forgiveness, rather than permission).

This means I don't have to maintain my own view (i.e. in my API client code) of which containers exist, and which don't (since if I maintain my own data, it might become inconsistent with Docker's).

Cheers,

@cpuguy83
Member

I don't see how this fixes the issue, though.
applyVolumesFrom will only get run if there are no volumes present in container.Volumes.

@krallin
Contributor Author

krallin commented Oct 23, 2014

Which issue are you referring to? The use case I was describing, or the one you were raising about a volume no longer existing when starting?

@cpuguy83
Member

The issue described here: the subsequent starts will still ignore that container.

@krallin
Contributor Author

krallin commented Oct 23, 2014

The patch proposed here actually ensures that container.Volumes does stay empty in case container.Start() fails due to an error when processing VolumesFrom.

So, next time you call container.Start(), applyVolumesFrom runs again (and if the dependent container still doesn't exist, then it raises an error again).
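In pseudocode terms, the parse-everything-before-applying-anything idea looks like this. This is a simplified, hypothetical model of the daemon's logic, not the actual patch; the registry and helper names are stand-ins.

```go
package main

import "fmt"

type mount struct{ src, dst string }

// Simplified stand-in for the daemon's container registry.
var containers = map[string][]mount{
	"data_1": {{src: "/var/lib/data1", dst: "/data1"}},
}

// parseVolumesFrom resolves one --volumes-from spec without mutating anything.
func parseVolumesFrom(name string) ([]mount, error) {
	m, ok := containers[name]
	if !ok {
		return nil, fmt.Errorf("container %s does not exist", name)
	}
	return m, nil
}

// applyVolumesFrom parses ALL specs before applying any of them, so a
// failure leaves the volumes map untouched and a later Start() retries
// the whole group.
func applyVolumesFrom(volumes map[string]string, specs []string) error {
	var parsed []mount
	for _, s := range specs {
		m, err := parseVolumesFrom(s)
		if err != nil {
			return err // nothing applied yet: volumes stays consistent
		}
		parsed = append(parsed, m...)
	}
	for _, m := range parsed {
		volumes[m.dst] = m.src
	}
	return nil
}

func main() {
	vols := map[string]string{}
	err := applyVolumesFrom(vols, []string{"data_1", "data_2"})
	fmt.Printf("err=%v, volumes=%d\n", err, len(vols))
}
```

This mirrors the shape of the fix: every spec is parsed first, and mount state is only touched once the whole group has parsed cleanly.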

@cpuguy83
Member

Ooooooh, ok... I see now. 💡 If there is an error with any of the parsing it will bail, and it doesn't try to initialize anything that may have succeeded.

@krallin
Contributor Author

krallin commented Oct 23, 2014

Exactly : )

Note that this isn't perfect, just an incremental improvement and a bunch of tests.

Namely, we may still end up with something halfway initialized in case this fails:

https://github.com/krallin/docker/blob/failed-start-breaks-volumes-from/daemon/volumes.go#L85-L88

Happy to look into a test case and a fix for that, though it looks like a different problem altogether. Fixing it might also require a more involved refactor, which could make applying the other patch you mentioned more difficult.

@cpuguy83
Member

Testing this now. Can you move the integration test to integration-cli and squash the commits?

@krallin krallin force-pushed the failed-start-breaks-volumes-from branch from 8c433c9 to 752cc81 on October 23, 2014 16:33
@krallin
Contributor Author

krallin commented Oct 23, 2014

Squashed the commits and converted the test to a CLI one (does make it simpler!).

Waiting for the tests to finish running now. Also validated that the test does fail without the fix.

Cheers,

Member


This should be called TestStartVolumesFromFailsCleanly

@krallin
Contributor Author

krallin commented Oct 23, 2014

Updated both

@krallin krallin force-pushed the failed-start-breaks-volumes-from branch 3 times, most recently from ff47d72 to f79fee2 on October 23, 2014 17:48
@krallin
Contributor Author

krallin commented Oct 23, 2014

Looking back at this, it does look like the cmd() function does its own error testing:

https://github.com/krallin/docker/blob/failed-start-breaks-volumes-from/integration-cli/docker_utils.go#L338-L348

Let me know if I'm misreading this / you'd like me to keep the checks, though.

Cheers,

@cpuguy83
Member

ah, you're right, sorry :(

@krallin
Contributor Author

krallin commented Oct 23, 2014

Will push a new version momentarily : )

@krallin krallin force-pushed the failed-start-breaks-volumes-from branch from f79fee2 to de8014b on October 23, 2014 17:58
@krallin
Contributor Author

krallin commented Oct 23, 2014

And here goes : )

@cpuguy83
Member

LGTM

@jessfraz
Contributor

@cpuguy83 does this fix the issue, or do your initial concerns with the fix still remain? I know you said "looks good to me", but I just want to make sure your initial hesitancy was addressed, because I wasn't sure from reading the discussion :)

@cpuguy83
Member

@jfrazelle Yep, it fixes the issue at hand just perfectly.
I am really bad at reading diffs sometimes :)

Thanks for checking!

@jessfraz
Contributor

LGTM

@krallin krallin force-pushed the failed-start-breaks-volumes-from branch from de8014b to 1275699 on October 24, 2014 22:00
@krallin
Contributor Author

krallin commented Oct 24, 2014

Hey there,

A merge conflict came up due to new test code being added in fb6ee86

I just pushed a new rebased version that resolves the conflict (the "main" code applied cleanly; only the test code had to be resolved).

Cheers,

@krallin
Contributor Author

krallin commented Oct 29, 2014

Hey there,

Just a quick ping to check if there was anything I could do to help move this PR forward? : )

Cheers,

@jessfraz
Contributor

ping @tiborvass @unclejack

@krallin
Contributor Author

krallin commented Nov 10, 2014

Hi there,

Just a quick ping to check whether there was a chance for this to eventually get merged in?

Thanks!

Contributor


@krallin does this work with -v /foo as well? If so, I'd prefer using that form. If not, please use ioutil.TempDir() and defer os.RemoveAll(tmpdir), and bind-mount a folder inside tmpdir.

Running parseVolumesFromSpec on all VolumesFrom specs before initializing
any mounts ensures that we don't leave container.Volumes in an
inconsistent (partially initialized) state if one of our mount groups is
not available (e.g. the container we're trying to mount from does not
exist).

Keeping container.Volumes in a consistent state ensures that next time
we Start() the container, it'll run prepareVolumes() again.

The attached test demonstrates that when a container fails to start due
to a missing container specified in VolumesFrom, it "remembers" a Volume
that worked.

Fixes: moby#8726

Signed-off-by: Thomas Orozco <thomas@orozco.fr>
@krallin krallin force-pushed the failed-start-breaks-volumes-from branch from 1275699 to fb62e18 on November 10, 2014 16:32
@krallin
Contributor Author

krallin commented Nov 10, 2014

@tiborvass — I've made the change suggested in your inline comment.

The change doesn't affect the tests: they still fail without the companion change in daemon/volumes.go, and still pass when it's applied.

I pushed a new version.

Cheers,

@tiborvass
Contributor

LGTM

tiborvass added a commit that referenced this pull request Nov 11, 2014
Fix: a failed Start() breaks --volumes-from on subsequent Start()'s
@tiborvass tiborvass merged commit 2a517fe into moby:master Nov 11, 2014
@krallin
Contributor Author

krallin commented Nov 11, 2014

Thanks!

@krallin krallin deleted the failed-start-breaks-volumes-from branch November 11, 2014 16:01
yoheiueda pushed a commit to yoheiueda/docker that referenced this pull request Nov 21, 2014
tiborvass pushed a commit to tiborvass/docker that referenced this pull request Nov 24, 2014
Conflicts:
	integration-cli/docker_cli_start_test.go