devicemapper: Wait for device removal if deferredRemoval=true and deferredDeletion=…#33877
devicemapper: Wait for device removal if deferredRemoval=true and deferredDeletion=…#33877thaJeztah merged 1 commit intomoby:masterfrom
Conversation
There was a problem hiding this comment.
Can you make this
deferredRemove := devices.deferredRemove
if !devices.deferredDelete {
deferredRemove = false
}
which IMO is easier to read. Or can you have deferred delete without deferred removal.
There was a problem hiding this comment.
Makes sense. Will change it. Deferred deletion can't be enabled without deferred removal being on.
|
👍 This is what @vrothberg also suggested in #33846. |
There was a problem hiding this comment.
Also this can be one line too
return devices.deactivatedDeviceMode(info, devices.deferredRemove)
There was a problem hiding this comment.
Right. Will change.
|
LGTM (IANAM). |
|
@slp please give this a try and let me know if this fixes the issue you are facing or not. |
|
SGTM, but I'll wait for @cpuguy83 to have a look as well Thanks for the well-written commit message ❤️ |
|
@rhvgoyal Just tried your patch. Same environment as before. I'm getting frequent errors like this: Accompanied by these others in dmesg: The problem is that, on container exit, a RemoveDeviceDeferred order is issued. Then, while processing the container delete request (remember, has been started with "--rm"), a bunch of RemoveDevice calls are made by devices.removeDevice retry loop. In the deferred removal success, the synchronous removal will fail, returning ENXIO. Those are de debug logs frow dockerd, showing the two deactivateDevice calls for the same device: I must mention that retrying the devicemapper.DeleteDevice call in devices.deleteTransaction doesn't suffer from this issue. |
|
Aha! @slp thanks for confirming that the issues we've been seeing from production boxes are the same as the issues you've had, except that they are caused by not having deferred removal or deletion. It is a shame this patch doesn't fix the problem, but luckily we've been debugging this class of problem for a few days now. If you can apply #33845 and give the debug logs (
I'm not a maintainer, so I can't speak to what maintainers would prefer. But just because it works doesn't mean it's a proper fix to the problem, and I'm worried that patching the symptoms like this will make debugging future issues harder. |
Both changes are just patching the symptoms. The actual problem, as we discussed in the other PR, is that, while conainer A is trying to deactivate and remove a device, container B (accidentally) is keeping a temporary (is being dereferenced by Unmount) reference to a mount point from A's namespace, keeping the device busy. The only signifcant difference between both approaches is retrying RemoveDevice vs. retrying DeleteDevice. Why the latter worries you but the first one doesn't? |
|
Actually you're right, however I would argue this patch actually fixes a semantic issue in deferred removal and deletion. If you have deferred removal but not deferred deletion, then when you want to delete something it doesn't make sense to defer the removal and then do a blocking delete operation. I'm under the impression that this is one of the things that
I wish we didn't do any retrying, but as I said above it's possible that the retrying changes are orthogonal to this semantic issue. I have a feeling that the final "solution" is going to be retrying though, despite it's issues simply due to the complexity of solving this properly. |
Well, I guess that depends on each user's point of view. If I'm running dockerd with "dm.use_deferred_removal=true" and "dm.use_deferred_removal=false", I'd definitely expect it to always defer removals, while making deletions synchronously.
What do you mean by that? |
This patch doesn't change that removals will happen asynchronously, but if you have a delete operation that implies a removal it doesn't make sense to defer the removal because the delete will have to wait for the removal to finish before you can synchronously delete. I'm not sure I understand how a user could tell the difference between the two because both will synchronously finish.
From my understanding, if you have a deferred removal pending and then attempt to do a delete, the delete will fail because |
…false There have been some cases where umount, a device can be busy for a very short duration. Maybe its udev rules, or maybe it is runc related races or probably it is something else. We don't know yet. If deferred removal is enabled but deferred deletion is not, then for the case of "docker run -ti --rm fedora bash", a container will exit, device will be deferred removed and then immediately a call will come to delete the device. It is possible that deletion will fail if device was busy at that time. A device can't be deleted if it can't be removed/deactivated first. There is only one exception and that is when deferred deletion is on. In that case graph driver will keep track of deleted device and try to delete it later and return success to caller. Always make sure that device deactivation is synchronous when device is being deleted (except the case when deferred deletion is enabled). This should also take care of small races when device is busy for a short duration and it is being deleted. Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
|
@slp, ok, so it makes sense that if a device is already scheduled for deferred removal and then we are trying to remove it synchronously later, it will fail saying device does not exist (as long as device deferred removal succeeds). So I have updated the patch to not error out if device is already gone and parse Please give it a try now. |
|
@rhvgoyal Just tested in the same environment. Container deletion no longer fails, but messages like these are still being logged by kernel: In any case, this patch works around the main symptom, and can easily be backported to stable versions, so I'm happy with it. Thanks Vivek! |
|
@slp, kernel messages for failing to remove the device will continue to come as we are trying to remove device when it is busy. And also trying to remove it after it has been removed due to deferred removal. Glad that this patch finally fixes the issue you are seeing. Not sure why CI is not making any progress on this. @thaJeztah will you be able to restart CI for this job. |
|
I restarted CI, thanks for the ping |
|
03:26:46 re-exec error: exit status 1: output: BackupWrite [...] There is not enough space on the disk. I am pretty sure that's not caused by this commit. @thaJeztah, @rhvgoyal, another CI run? |
|
Right, this does not look related to my patch. @thaJeztah another restart? |
|
@cpuguy83 PTAL |
|
So this patch is basically saying either both deferred deletion and deferred removal must be enabled, or neither are enabled. The patch enforces the flip-side of that without erroring out, is that right? |
|
@cpuguy83 No, this patch is making sure that a user can run with deferred removal enabled but deferred deletion disabled. So graph driver will support following 3 combinations.
We don't support fourth possibility that is deferred removal disabled and deferred deletion enabled. And we already have a check for that. |
|
@cpuguy83 BTW, I have a question about Is it for the case where top level directory is not a mount point? If yes, we could probably check if directory being removed is a mount point or not and take action accordingly. |
|
@rhvgoyal It indeed should be enough. Originally it wasn't a detached unmount. It may be something we can look at changing. |
| // rather busy wait for device removal to take care of these cases. | ||
| deferredRemove := devices.deferredRemove | ||
| if !devices.deferredDelete { | ||
| deferredRemove = false |
There was a problem hiding this comment.
So here is what I was looking at.
We always disable deferred removal if deferred delete is disabled (this is where Remove() winds up).
There was a problem hiding this comment.
ok, yes. But we will do this only if container is being deleted. If container is just exiting/stopping but not going away, then deferred removal will still work.
|
@thaJeztah Can we restart the CI. I don't think this issue is due to my patch. I am hoping this PR can be merged soon now. |
|
can we merge this PR now. It has acks and build has passed. |
|
Thanks! |
…false
There have been some cases where umount, a device can be busy for a very
short duration. Maybe its udev rules, or maybe it is runc related races
or probably it is something else. We don't know yet.
If deferred removal is enabled but deferred deletion is not, then for the
case of "docker run -ti --rm fedora bash", a container will exit, device
will be deferred removed and then immediately a call will come to delete
the device. It is possible that deletion will fail if device was busy
at that time.
A device can't be deleted if it can't be removed/deactivated first. There
is only one exception and that is when deferred deletion is on. In that
case graph driver will keep track of deleted device and try to delete it
later and return success to caller.
Always make sure that device deactivation is synchronous when device is
being deleted (except the case when deferred deletion is enabled).
This should also take care of small races when device is busy for a short
duration and it is being deleted.
Signed-off-by: Vivek Goyal vgoyal@redhat.com