Fix: a failed Start() breaks --volumes-from on subsequent Start()'s#8726
Conversation
|
This is specifically because of https://github.com/docker/docker/blob/master/daemon/volumes.go#L41 And I'm not sure this should be "fixed", since effectively you've created something with an error, and it would be better to remove it and create it again with the error resolved. Also, tests of this nature should be in integration-cli. |
|
It is indeed due to this line. However, I'm not sure I agree that I "created something with an error": the error appeared when the container was started, not when it was created (though maybe this wasn't what you meant :) ). In fact, if I did:
Then everything would work (I actually have a "control_container" in the test code that does exactly this). So, it's specifically the Start() step that "broke" the container. However, the key problem is that Docker did not inform me that my container was now "broken". So, you're right, I can see two ways to solve the issue:
Note: this error happened to me beneath a couple of layers of automation, so it wasn't immediately obvious (at all) that I had "created" something with an error. The reason I went with the second option (properly restoring the container to a consistent state) was this comment, which seems to indicate that if Start() fails, the expectation is to clean up: https://github.com/docker/docker/blob/v1.3.0/daemon/container.go#L296-L302 Incidentally, it seems that the behavior with e.g. network links is indeed to clean up rather than to leave the container in an inconsistent state. I'm certainly OK with exploring the first option, but failing pretty silently doesn't seem ideal? Cheers, |
|
If we move volume initialization to container create instead of start, this could be avoided. I'll have to check with the core team if that's the way to go. |
|
It certainly would, yes (provided Docker would throw an error in the Create() call and abort everything if one "data container" is unavailable, but I expect it would ;) ). For the record, in the test code I do pass the volume configuration in the Create() call, not in the Start() call (though, indeed, actual initialization only happens with Start()). Cheers, |
|
Actually, this is a bigger PITA than it seems. It can be worked around (and passing this option on start is being deprecated, I think), but I'm hesitant to modify the existing code before #8484 is applied. |
|
I guess this becomes a problem if a given container is started multiple times, then? However, I'm not sure the proposed patch negatively affects this (it does pretty much exactly what the old one did, except for the one edge case described here, which it seeks to address). For what it's worth, #8484 applies cleanly on top of this one. Just to give you some perspective on why I'm proposing this change: the reason I'd like Docker to be consistent here is so I don't have to check whether a data container exists before creating / starting a container that uses it. If the operation fails, I can retry it after having created the missing data container (i.e. ask for forgiveness, rather than permission). This means I don't have to maintain my own view (in my API client code) of which containers exist and which don't (since if I maintain my own data, it might become inconsistent with Docker's). Cheers, |
|
I don't see how this fixes the issue, though. |
|
Which issue are you referring to? The use case I was describing, or the one you were raising about a volume no longer existing when starting? |
|
The issue described here: subsequent starts will still ignore that container. |
|
The patch proposed here actually ensures that container.Volumes is not left in an inconsistent state when Start() fails. So, next time you call Start(), it'll run prepareVolumes() again. |
|
Ooooooh, ok... I see now. 💡 If there is an error with any of the parsing, it bails and doesn't try to initialize anything, even the parts that may have succeeded. |
|
Exactly :) Note that this isn't perfect: it's just an incremental improvement, plus a bunch of tests. Namely, we may still end up with something halfway initialized in case this fails: https://github.com/krallin/docker/blob/failed-start-breaks-volumes-from/daemon/volumes.go#L85-L88 Happy to look into a test case and a fix for that, though it looks like it's a different problem altogether. Fixing it might also require a more involved refactor, which could make applying that other patch you mentioned more difficult. |
|
Testing this now. Can you move the integration test to integration-cli and squash the commits? |
Force-pushed 8c433c9 to 752cc81
|
Squashed the commits and converted the test to a CLI one (does make it simpler!). Waiting for the tests to finish running now. Also validated that the test does fail without the fix. Cheers, |
This should be called TestStartVolumesFromFailsCleanly
|
Updated both |
Force-pushed ff47d72 to f79fee2
|
Looking back at this, it does look like the cmd() function does its own error testing: Let me know if I'm misreading this, or if you'd like me to keep the checks. Cheers, |
|
ah, you're right, sorry :( |
|
Will push a new version momentarily : ) |
Force-pushed f79fee2 to de8014b
|
And here goes : ) |
|
LGTM |
|
@cpuguy83 does this fix the issue, or do your initial concerns with the fix still remain? I know you said "looks good to me", but I just want to make sure your initial hesitancy was addressed, because I wasn't sure from reading the discussion :) |
|
@jfrazelle Yep, it fixes the issue at hand just perfectly. Thanks for checking! |
|
LGTM |
Force-pushed de8014b to 1275699
|
Hey there, A merge conflict came up due to new test code being added in fb6ee86. I just pushed a new rebased version that resolves the conflict (the "main" code applied cleanly; only the test code had to be resolved). Cheers, |
|
Hey there, Just a quick ping to check if there was anything I could do to help move this PR forward? : ) Cheers, |
|
ping @tiborvass @unclejack |
|
Hi there, Just a quick ping to check whether there was a chance for this to eventually get merged in? Thanks!
@krallin does this work with -v /foo as well? If so, i'd prefer using that form. If not, please use ioutil.TempDir() and defer os.RemoveAll(tmpdir), and bindmount a folder inside tmpdir.
Running parseVolumesFromSpec on all VolumesFrom specs before initializing any mounts ensures that we don't leave container.Volumes in an inconsistent (partially initialized) state if one of our mount groups is not available (e.g. the container we're trying to mount from does not exist). Keeping container.Volumes in a consistent state ensures that the next time we Start() the container, it'll run prepareVolumes() again. The attached test demonstrates that when a container fails to start due to a missing container specified in VolumesFrom, it "remembers" a Volume that worked. Fixes: moby#8726 Signed-off-by: Thomas Orozco <thomas@orozco.fr>
Force-pushed 1275699 to fb62e18
|
@tiborvass: I've made the change suggested in your inline comment. The change doesn't affect the tests: they still fail without the companion change. I pushed a new version. Cheers, |
|
LGTM |
Fix: a failed Start() breaks --volumes-from on subsequent Start()'s
|
Thanks! |
When starting a container with --volumes-from, if one volume is unavailable (e.g. the container does not exist yet), then subsequent starts (i.e. even once the volume has been created) will ignore the container that failed during the first Start().
Example:
1. Create a container, `data_1`.
2. Start a container, `client`, with `--volumes-from data_1 --volumes-from data_2`. This fails because `data_2` does not exist.
3. Create `data_2`.
4. Start the `client` container (which also mounts from `data_1`) again. This passes, but `data_2`'s volumes aren't mounted (note that although I haven't tested it, I actually think you get the same behavior even if you don't create `data_2`).

This seems to be due to `container.Volumes` being left in an inconsistent state if one of the containers in `hostConfig.VolumesFrom` is unavailable.

I included a failing test that demonstrates this issue (in a separate commit; running it without the fix will make the issue apparent).
NOTE: I'm not really a Go person. I'm happy to make edits to the provided PR, but a bit of guidance will go a long way ;)