Skip to content

volumes: fix error-handling when removing volumes with swarm enabled#45148

Merged
thaJeztah merged 2 commits intomoby:masterfrom
thaJeztah:fix_volume_error_handling
Mar 14, 2023
Merged

volumes: fix error-handling when removing volumes with swarm enabled#45148
thaJeztah merged 2 commits intomoby:masterfrom
thaJeztah:fix_volume_error_handling

Conversation

@thaJeztah
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah commented Mar 12, 2023

integration/volumes: TestVolumesRemove: add coverage for force/no-force

Add additional test-cases for deleting non-existing volumes (with/without force).

With this patch:

make TEST_FILTER=TestVolumesRemove DOCKER_GRAPHDRIVER=vfs test-integration

Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m  -test.run TestVolumesRemove
...
=== RUN   TestVolumesRemove
=== RUN   TestVolumesRemove/volume_in_use
=== RUN   TestVolumesRemove/volume_not_in_use
=== RUN   TestVolumesRemove/non-existing_volume
=== RUN   TestVolumesRemove/non-existing_volume_force
--- PASS: TestVolumesRemove (0.04s)
    --- PASS: TestVolumesRemove/volume_in_use (0.00s)
    --- PASS: TestVolumesRemove/volume_not_in_use (0.01s)
    --- PASS: TestVolumesRemove/non-existing_volume (0.00s)
    --- PASS: TestVolumesRemove/non-existing_volume_force (0.00s)
PASS

volumes: fix error-handling when removing volumes with swarm enabled

Commit 3246db3 (#44224) added handling for removing
cluster volumes, but in some conditions, this resulted in errors not being
returned if the volume was in use;

docker swarm init
docker volume create foo
docker create -v foo:/foo busybox top
docker volume rm foo

This patch changes the logic for ignoring "local" volume errors if swarm
is enabled (and cluster volumes supported).

While working on this fix, I also discovered that Cluster.RemoveVolume()
did not handle the "force" option correctly; while swarm correctly handled
these, the cluster backend performs a lookup of the volume first (to obtain
its ID), which would fail if the volume didn't exist.

Before this patch:

make TEST_FILTER=TestVolumesRemoveSwarmEnabled DOCKER_GRAPHDRIVER=vfs test-integration
...
Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m  -test.run TestVolumesRemoveSwarmEnabled
...
=== RUN   TestVolumesRemoveSwarmEnabled
=== PAUSE TestVolumesRemoveSwarmEnabled
=== CONT  TestVolumesRemoveSwarmEnabled
=== RUN   TestVolumesRemoveSwarmEnabled/volume_in_use
    volume_test.go:122: assertion failed: error is nil, not errdefs.IsConflict
    volume_test.go:123: assertion failed: expected an error, got nil
=== RUN   TestVolumesRemoveSwarmEnabled/volume_not_in_use
=== RUN   TestVolumesRemoveSwarmEnabled/non-existing_volume
=== RUN   TestVolumesRemoveSwarmEnabled/non-existing_volume_force
    volume_test.go:143: assertion failed: error is not nil: Error response from daemon: volume no_such_volume not found
--- FAIL: TestVolumesRemoveSwarmEnabled (1.57s)
    --- FAIL: TestVolumesRemoveSwarmEnabled/volume_in_use (0.00s)
    --- PASS: TestVolumesRemoveSwarmEnabled/volume_not_in_use (0.01s)
    --- PASS: TestVolumesRemoveSwarmEnabled/non-existing_volume (0.00s)
    --- FAIL: TestVolumesRemoveSwarmEnabled/non-existing_volume_force (0.00s)
FAIL

With this patch:

make TEST_FILTER=TestVolumesRemoveSwarmEnabled DOCKER_GRAPHDRIVER=vfs test-integration
...
Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m  -test.run TestVolumesRemoveSwarmEnabled
...
=== RUN   TestVolumesRemoveSwarmEnabled
=== PAUSE TestVolumesRemoveSwarmEnabled
=== CONT  TestVolumesRemoveSwarmEnabled
=== RUN   TestVolumesRemoveSwarmEnabled/volume_in_use
=== RUN   TestVolumesRemoveSwarmEnabled/volume_not_in_use
=== RUN   TestVolumesRemoveSwarmEnabled/non-existing_volume
=== RUN   TestVolumesRemoveSwarmEnabled/non-existing_volume_force
--- PASS: TestVolumesRemoveSwarmEnabled (1.53s)
    --- PASS: TestVolumesRemoveSwarmEnabled/volume_in_use (0.00s)
    --- PASS: TestVolumesRemoveSwarmEnabled/volume_not_in_use (0.01s)
    --- PASS: TestVolumesRemoveSwarmEnabled/non-existing_volume (0.00s)
    --- PASS: TestVolumesRemoveSwarmEnabled/non-existing_volume_force (0.00s)
PASS

- Description for the changelog

Fix removing volumes not returning an error if swarm is enabled

- A picture of a cute animal (not mandatory but encouraged)

Add additional test-cases for deleting non-existing volumes (with/without force).

With this patch:

    make TEST_FILTER=TestVolumesRemove DOCKER_GRAPHDRIVER=vfs test-integration

    Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m  -test.run TestVolumesRemove
    ...
    === RUN   TestVolumesRemove
    === RUN   TestVolumesRemove/volume_in_use
    === RUN   TestVolumesRemove/volume_not_in_use
    === RUN   TestVolumesRemove/non-existing_volume
    === RUN   TestVolumesRemove/non-existing_volume_force
    --- PASS: TestVolumesRemove (0.04s)
        --- PASS: TestVolumesRemove/volume_in_use (0.00s)
        --- PASS: TestVolumesRemove/volume_not_in_use (0.01s)
        --- PASS: TestVolumesRemove/non-existing_volume (0.00s)
        --- PASS: TestVolumesRemove/non-existing_volume_force (0.00s)
    PASS

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah

This comment was marked as resolved.

@thaJeztah thaJeztah force-pushed the fix_volume_error_handling branch from 4f2fee7 to 53297e5 Compare March 12, 2023 14:38
// is enabled, in which case the remove won't return an error if no local
// volume existed, so we don't know if removal succeeded. In that case
// we always try to delete cluster volumes as well.
if errdefs.IsNotFound(err) || force {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that checking for errdefs.IsNotFound() is actually redundant here; I thought it made the code slightly more descriptive, but happy to revert that change.

// local volume, but could be a cluster volume, so we ignore "not found"
// errors at this stage. Note that no "not found" error is produced if
// "force" is enabled.
err := v.backend.Remove(ctx, vars["name"], opts.WithPurgeOnError(force))
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also looking if we can remove the force from the backends, and instead handle "not-exist" errors on the caller site, but looks like that may need more work ("purge" does more than only ignoring "not-exist" errors, as it also updates state in the backend to remove volumes from the store if the (remote ?) volume store (no longer) has a volume.

@thaJeztah thaJeztah force-pushed the fix_volume_error_handling branch from 53297e5 to b51a682 Compare March 12, 2023 15:02
@thaJeztah
Copy link
Copy Markdown
Member Author

@cpuguy83 @dperny @neersighted ptal

@dperny
Copy link
Copy Markdown
Contributor

dperny commented Mar 13, 2023

LGTM!

Comment on lines +167 to +169
if err != nil && !errdefs.IsNotFound(err) {
return err
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conditional flow here (and through all of these blocks) feels very clunky. Is there a better way to structure this that makes the flow more obvious, even if we have mode code movement?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was hoping to remove the "force" from the backend entirely, and just handle notfound errors (which are the ones we want to ignore) to the caller site; #45148 (comment)

Unfortunately, there is code in the backend that still needs it (if the plugin doesn't have the volume, then our local storage still has to remove it, and not fail)

Perhaps still possible, if we would have the backend ignore the error, for what it needs to do, but make it still return it, but that still requires more changes (probably out of scope for this PR)

if err != nil {
return err
}
if cvErr := v.cluster.RemoveVolume(vars["name"], force); cvErr != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we unify the branches here?
I think we can do err = v.cluster.RemoveVolume and then do nothing else in the branch.
This of course assumes cluster.RemoveVolume will return a not found error if force=false

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! You're right; I guess that would work; just overwrite the original err value if we reach this code.


t.Run("non-existing volume", func(t *testing.T) {
err = client.VolumeRemove(ctx, "no_such_volume", false)
assert.Check(t, is.ErrorType(err, errdefs.IsNotFound))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this error check works?
errdefs.IsNotFound is a function

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, so is.ErrorType supports either a type or an assertion function, which errdefs.IsXXX satisfies

(also didn't know this until not too long ago)

@thaJeztah thaJeztah force-pushed the fix_volume_error_handling branch from b51a682 to 2d554cd Compare March 13, 2023 18:15
Commit 3246db3 added handling for removing
cluster volumes, but in some conditions, this resulted in errors not being
returned if the volume was in use;

    docker swarm init
    docker volume create foo
    docker create -v foo:/foo busybox top
    docker volume rm foo

This patch changes the logic for ignoring "local" volume errors if swarm
is enabled (and cluster volumes supported).

While working on this fix, I also discovered that Cluster.RemoveVolume()
did not handle the "force" option correctly; while swarm correctly handled
these, the cluster backend performs a lookup of the volume first (to obtain
its ID), which would fail if the volume didn't exist.

Before this patch:

    make TEST_FILTER=TestVolumesRemoveSwarmEnabled DOCKER_GRAPHDRIVER=vfs test-integration
    ...
    Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m  -test.run TestVolumesRemoveSwarmEnabled
    ...
    === RUN   TestVolumesRemoveSwarmEnabled
    === PAUSE TestVolumesRemoveSwarmEnabled
    === CONT  TestVolumesRemoveSwarmEnabled
    === RUN   TestVolumesRemoveSwarmEnabled/volume_in_use
        volume_test.go:122: assertion failed: error is nil, not errdefs.IsConflict
        volume_test.go:123: assertion failed: expected an error, got nil
    === RUN   TestVolumesRemoveSwarmEnabled/volume_not_in_use
    === RUN   TestVolumesRemoveSwarmEnabled/non-existing_volume
    === RUN   TestVolumesRemoveSwarmEnabled/non-existing_volume_force
        volume_test.go:143: assertion failed: error is not nil: Error response from daemon: volume no_such_volume not found
    --- FAIL: TestVolumesRemoveSwarmEnabled (1.57s)
        --- FAIL: TestVolumesRemoveSwarmEnabled/volume_in_use (0.00s)
        --- PASS: TestVolumesRemoveSwarmEnabled/volume_not_in_use (0.01s)
        --- PASS: TestVolumesRemoveSwarmEnabled/non-existing_volume (0.00s)
        --- FAIL: TestVolumesRemoveSwarmEnabled/non-existing_volume_force (0.00s)
    FAIL

With this patch:

    make TEST_FILTER=TestVolumesRemoveSwarmEnabled DOCKER_GRAPHDRIVER=vfs test-integration
    ...
    Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m  -test.run TestVolumesRemoveSwarmEnabled
    ...
    make TEST_FILTER=TestVolumesRemoveSwarmEnabled DOCKER_GRAPHDRIVER=vfs test-integration
    ...
    Running /go/src/github.com/docker/docker/integration/volume (arm64.integration.volume) flags=-test.v -test.timeout=10m  -test.run TestVolumesRemoveSwarmEnabled
    ...
    === RUN   TestVolumesRemoveSwarmEnabled
    === PAUSE TestVolumesRemoveSwarmEnabled
    === CONT  TestVolumesRemoveSwarmEnabled
    === RUN   TestVolumesRemoveSwarmEnabled/volume_in_use
    === RUN   TestVolumesRemoveSwarmEnabled/volume_not_in_use
    === RUN   TestVolumesRemoveSwarmEnabled/non-existing_volume
    === RUN   TestVolumesRemoveSwarmEnabled/non-existing_volume_force
    --- PASS: TestVolumesRemoveSwarmEnabled (1.53s)
        --- PASS: TestVolumesRemoveSwarmEnabled/volume_in_use (0.00s)
        --- PASS: TestVolumesRemoveSwarmEnabled/volume_not_in_use (0.01s)
        --- PASS: TestVolumesRemoveSwarmEnabled/non-existing_volume (0.00s)
        --- PASS: TestVolumesRemoveSwarmEnabled/non-existing_volume_force (0.00s)
    PASS

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah force-pushed the fix_volume_error_handling branch from 2d554cd to 058a31e Compare March 13, 2023 18:17
@thaJeztah
Copy link
Copy Markdown
Member Author

Let me bring this one in, and do a cherry-pick later

@thaJeztah
Copy link
Copy Markdown
Member Author

cherry-pick in #45155

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Attempting to Remove Volume Used by Service Does Not Report Error with CLI 23.0.1 Docker volume rm

5 participants