devicemapper: Wait for device removal if deferredRemoval=true and deferredDeletion=… by rhvgoyal · Pull Request #33877 · moby/moby

rhvgoyal · 2017-06-29T14:46:13Z

…false

There have been some cases where umount, a device can be busy for a very
short duration. Maybe its udev rules, or maybe it is runc related races
or probably it is something else. We don't know yet.

If deferred removal is enabled but deferred deletion is not, then for the
case of "docker run -ti --rm fedora bash", a container will exit, device
will be deferred removed and then immediately a call will come to delete
the device. It is possible that deletion will fail if device was busy
at that time.

A device can't be deleted if it can't be removed/deactivated first. There
is only one exception and that is when deferred deletion is on. In that
case graph driver will keep track of deleted device and try to delete it
later and return success to caller.

Always make sure that device deactivation is synchronous when device is
being deleted (except the case when deferred deletion is enabled).

This should also take care of small races when device is busy for a short
duration and it is being deleted.

Signed-off-by: Vivek Goyal vgoyal@redhat.com

rhvgoyal · 2017-06-29T14:47:45Z

cc @slp @cyphar

rhvgoyal · 2017-06-29T14:48:47Z

cc @thaJeztah @cpuguy83

cyphar · 2017-06-29T14:48:53Z

daemon/graphdriver/devmapper/deviceset.go

Can you make this

deferredRemove := devices.deferredRemove if !devices.deferredDelete { deferredRemove = false }

which IMO is easier to read. Or can you have deferred delete without deferred removal.

Makes sense. Will change it. Deferred deletion can't be enabled without deferred removal being on.

cyphar · 2017-06-29T14:49:18Z

👍 This is what @vrothberg also suggested in #33846.

cyphar · 2017-06-29T14:54:09Z

daemon/graphdriver/devmapper/deviceset.go

Also this can be one line too

return devices.deactivatedDeviceMode(info, devices.deferredRemove)

Right. Will change.

cyphar · 2017-06-29T15:00:14Z

LGTM (IANAM).

rhvgoyal · 2017-06-29T15:01:05Z

@slp please give this a try and let me know if this fixes the issue you are facing or not.

thaJeztah · 2017-06-29T20:52:54Z

SGTM, but I'll wait for @cpuguy83 to have a look as well

Thanks for the well-written commit message ❤️

slp · 2017-06-30T11:03:41Z

@rhvgoyal Just tried your patch. Same environment as before. I'm getting frequent errors like this:

Error response from daemon: driver "devicemapper" failed to remove root filesystem for 32857d293a7af65cbe69ccbbfc869d2cc94592178bf7d04aa1514d2960537b76: failed to remove device 0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4: devicemapper: Error running RemoveDevice dm_task_run failed

Accompanied by these others in dmesg:

[260515.276364] device-mapper: ioctl: unable to remove open device docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4
[260515.344597] XFS (dm-5): Unmounting Filesystem
[260515.376775] device-mapper: ioctl: unable to remove open device docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4
[260515.477380] device-mapper: ioctl: device doesn't appear to be in the dev hash table.

The problem is that, on container exit, a RemoveDeviceDeferred order is issued. Then, while processing the container delete request (remember, has been started with "--rm"), a bunch of RemoveDevice calls are made by devices.removeDevice retry loop. In the deferred removal success, the synchronous removal will fail, returning ENXIO.

Those are de debug logs frow dockerd, showing the two deactivateDevice calls for the same device:

DEBU[0173] devmapper: deactivateDevice START(0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devicemapper: RemoveDeviceDeferred START(docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devicemapper: RemoveDeviceDeferred END(docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devmapper: deactivateDevice END(0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devmapper: UnmountDevice END(hash=0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] Calling GET /v1.24/containers/32857d293a7af65cbe69ccbbfc869d2cc94592178bf7d04aa1514d2960537b76/json 
DEBU[0173] Calling DELETE /v1.24/containers/32857d293a7af65cbe69ccbbfc869d2cc94592178bf7d04aa1514d2960537b76?force=1&v=1 
DEBU[0173] devmapper: DeleteDevice START(hash=0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4 syncDelete=false) 
DEBU[0173] devmapper: deactivateDevice START(0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] devmapper: removeDevice START(docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0173] sandbox set key processing took 79.304296ms for container bd8784ba446aa43a0bd35adf62ee412c34f473766de9250fe9be0f15669ed049 
DEBU[0174] devmapper: removeDevice END(docker-253:0-17074721-0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0174] devmapper: deactivateDevice END(0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4) 
DEBU[0174] devmapper: Error deactivating device: devicemapper: Error running RemoveDevice dm_task_run failed 
DEBU[0174] devmapper: DeleteDevice END(hash=0d51051065a81924c0a8c2b1fd291cf480be54510d699164f1370dfe1c737af4 syncDelete=false)

I must mention that retrying the devicemapper.DeleteDevice call in devices.deleteTransaction doesn't suffer from this issue.

cyphar · 2017-06-30T11:13:49Z

Aha! @slp thanks for confirming that the issues we've been seeing from production boxes are the same as the issues you've had, except that they are caused by not having deferred removal or deletion. It is a shame this patch doesn't fix the problem, but luckily we've been debugging this class of problem for a few days now.

If you can apply #33845 and give the debug logs (--storage-opt dm.libdm_log_level=7 --debug) that would be incredibly helpful.

I must mention that retrying the devicemapper.DeleteDevice call in devices.deleteTransaction doesn't suffer from this issue.

I'm not a maintainer, so I can't speak to what maintainers would prefer. But just because it works doesn't mean it's a proper fix to the problem, and I'm worried that patching the symptoms like this will make debugging future issues harder.

slp · 2017-06-30T11:31:29Z

@cyphar

I'm not a maintainer, so I can't speak to what maintainers would prefer. But just because it works doesn't mean it's a proper fix to the problem, and I'm worried that patching the symptoms like this will make debugging future issues harder.

Both changes are just patching the symptoms. The actual problem, as we discussed in the other PR, is that, while conainer A is trying to deactivate and remove a device, container B (accidentally) is keeping a temporary (is being dereferenced by Unmount) reference to a mount point from A's namespace, keeping the device busy.

The only signifcant difference between both approaches is retrying RemoveDevice vs. retrying DeleteDevice. Why the latter worries you but the first one doesn't?

cyphar · 2017-06-30T11:37:10Z

@slp

Actually you're right, however I would argue this patch actually fixes a semantic issue in deferred removal and deletion. If you have deferred removal but not deferred deletion, then when you want to delete something it doesn't make sense to defer the removal and then do a blocking delete operation. I'm under the impression that this is one of the things that libdm doesn't like.

The only signifcant difference between both approaches is retrying RemoveDevice vs. retrying DeleteDevice. Why the latter worries you but the first one doesn't?

I wish we didn't do any retrying, but as I said above it's possible that the retrying changes are orthogonal to this semantic issue. I have a feeling that the final "solution" is going to be retrying though, despite it's issues simply due to the complexity of solving this properly.

slp · 2017-06-30T12:29:39Z

@cyphar

Actually you're right, however I would argue this patch actually fixes a semantic issue in deferred removal and deletion. If you have deferred removal but not deferred deletion, then when you want to delete something it doesn't make sense to defer the removal and then do a blocking delete operation.

Well, I guess that depends on each user's point of view. If I'm running dockerd with "dm.use_deferred_removal=true" and "dm.use_deferred_removal=false", I'd definitely expect it to always defer removals, while making deletions synchronously.

I'm under the impression that this is one of the things that libdm doesn't like.

What do you mean by that?

cyphar · 2017-06-30T12:48:30Z

I'd definitely expect it to always defer removals, while making deletions synchronously.

This patch doesn't change that removals will happen asynchronously, but if you have a delete operation that implies a removal it doesn't make sense to defer the removal because the delete will have to wait for the removal to finish before you can synchronously delete. I'm not sure I understand how a user could tell the difference between the two because both will synchronously finish.

What do you mean by that?

From my understanding, if you have a deferred removal pending and then attempt to do a delete, the delete will fail because libdm sees that the device has not been removed. It's stretch to say that "it doesn't like it" but I meant that it doesn't know internally that it should wait for the removal to complete and then synchronously carry out the deletion. But I could definitely be wrong in my understanding.

…false There have been some cases where umount, a device can be busy for a very short duration. Maybe its udev rules, or maybe it is runc related races or probably it is something else. We don't know yet. If deferred removal is enabled but deferred deletion is not, then for the case of "docker run -ti --rm fedora bash", a container will exit, device will be deferred removed and then immediately a call will come to delete the device. It is possible that deletion will fail if device was busy at that time. A device can't be deleted if it can't be removed/deactivated first. There is only one exception and that is when deferred deletion is on. In that case graph driver will keep track of deleted device and try to delete it later and return success to caller. Always make sure that device deactivation is synchronous when device is being deleted (except the case when deferred deletion is enabled). This should also take care of small races when device is busy for a short duration and it is being deleted. Signed-off-by: Vivek Goyal <vgoyal@redhat.com>

rhvgoyal · 2017-06-30T18:30:30Z

@slp, ok, so it makes sense that if a device is already scheduled for deferred removal and then we are trying to remove it synchronously later, it will fail saying device does not exist (as long as device deferred removal succeeds).

So I have updated the patch to not error out if device is already gone and parse devicemapper.ErrEnxio. Also updated RemoveDevice and RemoveDeviceDeferred code to return devicemapper.ErrEnxio if device is not present.

Please give it a try now.

slp · 2017-07-01T16:20:14Z

@rhvgoyal Just tested in the same environment. Container deletion no longer fails, but messages like these are still being logged by kernel:

[366626.745332] device-mapper: ioctl: unable to remove open device docker-253:0-17074721-6db24cc6a1617f7f934dd7f85109ce1403115f8a8f367723bdfde3fcf15cf9d0
[366626.845679] device-mapper: ioctl: device doesn't appear to be in the dev hash table.
[366626.870889] device-mapper: ioctl: remove_all left 6 open device(s)

In any case, this patch works around the main symptom, and can easily be backported to stable versions, so I'm happy with it.

Thanks Vivek!

rhvgoyal · 2017-07-03T12:47:14Z

@slp, kernel messages for failing to remove the device will continue to come as we are trying to remove device when it is busy. And also trying to remove it after it has been removed due to deferred removal.

Glad that this patch finally fixes the issue you are seeing.

Not sure why CI is not making any progress on this.

@thaJeztah will you be able to restart CI for this job.

thaJeztah · 2017-07-04T03:20:26Z

I restarted CI, thanks for the ping

vrothberg · 2017-07-05T06:00:29Z

03:26:46 re-exec error: exit status 1: output: BackupWrite [...] There is not enough space on the disk.

I am pretty sure that's not caused by this commit. @thaJeztah, @rhvgoyal, another CI run?

rhvgoyal · 2017-07-05T12:58:43Z

Right, this does not look related to my patch. @thaJeztah another restart?

rhvgoyal · 2017-07-07T18:19:02Z

@cpuguy83 PTAL

cpuguy83 · 2017-07-07T18:49:18Z

So this patch is basically saying either both deferred deletion and deferred removal must be enabled, or neither are enabled.
There is already a check on startup for the user requested deferred deletion then deferred removal must be enabled.

The patch enforces the flip-side of that without erroring out, is that right?

rhvgoyal · 2017-07-07T18:52:54Z

@cpuguy83 No, this patch is making sure that a user can run with deferred removal enabled but deferred deletion disabled. So graph driver will support following 3 combinations.

both disabled
deferred removal enabled, deferred deletion disabled
both enabled

We don't support fourth possibility that is deferred removal disabled and deferred deletion enabled. And we already have a check for that.

rhvgoyal · 2017-07-07T18:57:44Z

@cpuguy83 BTW, I have a question about EnsureRemoveAll. Why do we try to do recursive unmouting? Atleast for the case of graph driver it is not needed as we are trying to remove a directory which itself is a mount point. So doing a umount(dir, MNT_DETACH) should be enough?

Is it for the case where top level directory is not a mount point? If yes, we could probably check if directory being removed is a mount point or not and take action accordingly.

cpuguy83 · 2017-07-07T19:02:37Z

@rhvgoyal It indeed should be enough. Originally it wasn't a detached unmount. It may be something we can look at changing.

cpuguy83 · 2017-07-07T19:15:36Z

daemon/graphdriver/devmapper/deviceset.go

+	// rather busy wait for device removal to take care of these cases.
+	deferredRemove := devices.deferredRemove
+	if !devices.deferredDelete {
+		deferredRemove = false


So here is what I was looking at.
We always disable deferred removal if deferred delete is disabled (this is where Remove() winds up).

ok, yes. But we will do this only if container is being deleted. If container is just exiting/stopping but not going away, then deferred removal will still work.

cpuguy83

LGTM

rhvgoyal · 2017-07-10T12:52:58Z

@thaJeztah Can we restart the CI. I don't think this issue is due to my patch. I am hoping this PR can be merged soon now.

rhvgoyal · 2017-07-12T10:41:35Z

can we merge this PR now. It has acks and build has passed.

thaJeztah

LGTM

thaJeztah · 2017-07-13T00:35:51Z

Thanks!

GordonTheTurtle added the status/0-triage label Jun 29, 2017

rhvgoyal mentioned this pull request Jun 29, 2017

Use a retry loop for DeleteDevice in deleteTransaction #33846

Closed

cyphar reviewed Jun 29, 2017

View reviewed changes

rhvgoyal force-pushed the sync-removal branch from 714a738 to c897d40 Compare June 29, 2017 14:53

cyphar reviewed Jun 29, 2017

View reviewed changes

rhvgoyal force-pushed the sync-removal branch from c897d40 to 711f776 Compare June 29, 2017 14:58

thaJeztah added status/2-code-review and removed status/0-triage labels Jun 29, 2017

rhvgoyal force-pushed the sync-removal branch from 711f776 to 36cb6ef Compare June 30, 2017 18:27

cpuguy83 reviewed Jul 7, 2017

View reviewed changes

cpuguy83 approved these changes Jul 7, 2017

View reviewed changes

cpuguy83 added the rebuild/windowsRS1 label Jul 10, 2017

GordonTheTurtle removed the rebuild/windowsRS1 label Jul 10, 2017

thaJeztah approved these changes Jul 13, 2017

View reviewed changes

thaJeztah merged commit e04dbe5 into moby:master Jul 13, 2017

thaJeztah added the area/storage/devicemapper label Jul 13, 2017

rhvgoyal mentioned this pull request Jul 13, 2017

Backport: Fixes for supporting deferred removal enabled but deferred deletion disabled projectatomic/docker#266

Closed

Conversation

rhvgoyal commented Jun 29, 2017

Uh oh!

rhvgoyal commented Jun 29, 2017

Uh oh!

rhvgoyal commented Jun 29, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyphar commented Jun 29, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyphar commented Jun 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhvgoyal commented Jun 29, 2017

Uh oh!

thaJeztah commented Jun 29, 2017

Uh oh!

slp commented Jun 30, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cyphar commented Jun 30, 2017

Uh oh!

slp commented Jun 30, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cyphar commented Jun 30, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

slp commented Jun 30, 2017

Uh oh!

cyphar commented Jun 30, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhvgoyal commented Jun 30, 2017

Uh oh!

slp commented Jul 1, 2017

Uh oh!

rhvgoyal commented Jul 3, 2017

Uh oh!

thaJeztah commented Jul 4, 2017

Uh oh!

vrothberg commented Jul 5, 2017

Uh oh!

rhvgoyal commented Jul 5, 2017

Uh oh!

rhvgoyal commented Jul 7, 2017

Uh oh!

cpuguy83 commented Jul 7, 2017

Uh oh!

rhvgoyal commented Jul 7, 2017

Uh oh!

rhvgoyal commented Jul 7, 2017

Uh oh!

cpuguy83 commented Jul 7, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cpuguy83 left a comment

Choose a reason for hiding this comment

Uh oh!

rhvgoyal commented Jul 10, 2017

Uh oh!

rhvgoyal commented Jul 12, 2017

Uh oh!

thaJeztah left a comment

Choose a reason for hiding this comment

Uh oh!

cyphar commented Jun 29, 2017 •

edited

Loading

slp commented Jun 30, 2017 •

edited

Loading

slp commented Jun 30, 2017 •

edited

Loading

cyphar commented Jun 30, 2017 •

edited

Loading

cyphar commented Jun 30, 2017 •

edited

Loading