Bug #58090

Non-existent pending clone shows up in snapshot info

Added by Sebastian Hasler over 3 years ago. Updated 5 months ago.

Status:
Pending Backport
Priority:
Normal
Category:
fsck/damage handling
Target version:
% Done:

0%

Source:
Community (user)
Backport:
reef,squid
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
mgr/volumes
Labels (FS):
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v19.3.0-8040-g842fb60510
Released In:
v20.2.0~978
Upkeep Timestamp:
2025-11-01T00:58:37+00:00

Description

Ceph version: v17.2.5

My CephFS somehow got into a state where a snapshot has a pending clone, but the pending clone doesn't exist. (This is problematic because the pending clone prevents me from deleting the snapshot.)

$ ceph fs subvolume --group_name=csi snapshot info ssd-fs csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e
{
    "created_at": "2021-11-27 19:54:16.134448",
    "data_pool": "ssd-fs-data0",
    "has_pending_clones": "yes",
    "pending_clones": [
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi" 
        }
    ]
}

$ ceph fs clone --group_name=csi status ssd-fs csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec
Error ENOENT: subvolume 'csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec' does not exist

I think the CephFS got into this state when a clone failed due to insufficient disk space. This happened some time ago, with an older version of Ceph, so the root cause may or may not have been fixed in the meantime.

The point of this ticket is that CephFS should be able to recover from this state, but currently that does not seem to be the case.

To try to recover from this state, I had the idea to re-create the clone with that exact name and then cancel it.

$ ceph fs subvolume --group_name=csi snapshot clone ssd-fs csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec --target_group_name=csi

$ ceph fs clone --group_name=csi status ssd-fs csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec
{
  "status": {
    "state": "in-progress",
    "source": {
      "volume": "ssd-fs",
      "subvolume": "csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e",
      "snapshot": "csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e",
      "group": "csi" 
    }
  }
}

$ ceph fs subvolume --group_name=csi snapshot info ssd-fs csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e
{
    "created_at": "2021-11-27 19:54:16.134448",
    "data_pool": "ssd-fs-data0",
    "has_pending_clones": "yes",
    "pending_clones": [
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi" 
        },
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi" 
        }
    ]
}

$ ceph fs clone --group_name=csi cancel ssd-fs csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec

$ ceph fs clone --group_name=csi status ssd-fs csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec
{
  "status": {
    "state": "canceled",
    "source": {
      "volume": "ssd-fs",
      "subvolume": "csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e",
      "snapshot": "csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e",
      "group": "csi" 
    },
    "failure": {
      "errno": "4",
      "error_msg": "user interrupted clone operation" 
    }
  }
}

$ ceph fs subvolume --group_name=csi snapshot info ssd-fs csi-vol-9ce73497-1be0-11ec-88f1-e6360fd42c9e csi-snap-cd27f06b-4fbb-11ec-978d-8af73a17386e
{
    "created_at": "2021-11-27 19:54:16.134448",
    "data_pool": "ssd-fs-data0",
    "has_pending_clones": "yes",
    "pending_clones": [
        {
            "name": "csi-vol-ff687f29-4fbd-11ec-830e-6ed86f62d6ec",
            "target_group": "csi" 
        }
    ]
}

However, as you can see, re-creating the clone leads to a duplicate entry in the `pending_clones` list, and cancelling the clone only removes one of those two entries. So the phantom pending clone remains, and I still cannot delete the snapshot.


Related issues (2: 1 open, 1 closed)

Copied to CephFS - Backport #70232: reef: Non-existent pending clone shows up in snapshot info (Resolved, Neeraj Pratap Singh)
Copied to CephFS - Backport #70233: squid: Non-existent pending clone shows up in snapshot info (In Progress, Neeraj Pratap Singh)
Actions #1

Updated by Venky Shankar over 3 years ago

Hi Sebastian,

There is a stray index entry causing this issue. Could you list the contents of `/volumes/_index/clone/` (under the CephFS mount point)?
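For anyone hitting the same state, the check can be sketched with stdlib Python (the mount point used in the usage comment is an assumption; substitute wherever your CephFS is mounted):

```python
import os

def find_dangling_clone_links(index_dir):
    """Return names of symlinks under index_dir whose targets no longer exist."""
    dangling = []
    for entry in os.scandir(index_dir):
        # entry.is_symlink() is True even when the link target is gone;
        # os.path.exists() follows the link, so it returns False for a
        # dangling symlink. The combination identifies stale index entries.
        if entry.is_symlink() and not os.path.exists(entry.path):
            dangling.append(entry.name)
    return sorted(dangling)

# Example (path is an assumption -- use your own CephFS mount point):
# find_dangling_clone_links("/mnt/cephfs/volumes/_index/clone")
```

The equivalent one-liner on the shell would be `find <mount>/volumes/_index/clone -maxdepth 1 -xtype l`.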

Actions #2

Updated by Sebastian Hasler over 3 years ago

Now the snapshot is deleted (finally). From the logs of our CSI provisioner, it seems that the snapshot was deleted shortly after I created this issue. So I guess the re-creation and cancellation of the clone did have an effect, just slightly delayed.

Actions #3

Updated by Sebastian Hasler over 3 years ago

The `/volumes/_index/clone/` directory is empty, by the way. But that's after the snapshot was deleted successfully. I don't know what this directory looked like during the previous year, when the CSI provisioner repeatedly tried to delete this snapshot (and failed due to the non-existent pending clones).

Actions #4

Updated by Venky Shankar over 3 years ago

Sebastian Hasler wrote:

The `/volumes/_index/clone/` directory is empty, by the way. But that's after the snapshot was deleted successfully. I don't know what this directory looked like during the previous year, when the CSI provisioner repeatedly tried to delete this snapshot (and failed due to the non-existent pending clones).

Most likely there would have been a dangling symlink in that directory. We have seen this before; one can run into it when there is insufficient disk space.

Actions #5

Updated by Venky Shankar over 3 years ago

  • Assignee set to Rishabh Dave
  • Target version set to v18.0.0
  • Backport set to pacific,quincy
  • Component(FS) mgr/volumes added

Rishabh, please take a look at this. I think the dangling symlink can be gracefully handled by deleting it.
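The graceful handling could look roughly like this (a stdlib-only sketch, not the actual mgr/volumes code; `index_dir` stands for whatever path the clone index resolves to):

```python
import os

def clear_dangling_clone_links(index_dir, dry_run=True):
    """Remove symlinks under index_dir whose targets no longer exist.

    With dry_run=True, only report what would be removed.
    Returns the sorted names of the dangling entries found.
    """
    removed = []
    for entry in os.scandir(index_dir):
        # A dangling symlink is a link whose target is gone:
        # is_symlink() is still True, but exists() (which follows
        # the link) is False.
        if entry.is_symlink() and not os.path.exists(entry.path):
            if not dry_run:
                os.unlink(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Running with `dry_run=True` first confirms that only the stale index entries are matched before anything is unlinked.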

Actions #6

Updated by Patrick Donnelly over 2 years ago

  • Target version deleted (v18.0.0)
Actions #7

Updated by Venky Shankar about 2 years ago

  • Assignee changed from Rishabh Dave to Neeraj Pratap Singh

Neeraj, please take this one.

Actions #8

Updated by Kotresh Hiremath Ravishankar about 2 years ago

Neeraj and I had a discussion regarding this.

We fixed a bunch of issues around clones and dangling index symlinks, so this issue should no longer occur. But we do need a mechanism to get out of this situation if it arose on an older version.
I think we can check for and clear the dangling symlinks in the `snapshot info` command.

Thanks,
Kotresh H R

Actions #9

Updated by Dhairya Parmar about 2 years ago

  • Pull request ID set to 55838
Actions #10

Updated by Rishabh Dave over 1 year ago

  • Status changed from New to Fix Under Review
Actions #11

Updated by Konstantin Shalygin over 1 year ago

  • Backport changed from pacific,quincy to quincy
Actions #12

Updated by Venky Shankar over 1 year ago

  • Target version set to v20.0.0
  • Backport changed from quincy to quincy,reef,squid
Actions #13

Updated by Konstantin Shalygin about 1 year ago

  • Backport changed from quincy,reef,squid to reef,squid
Actions #14

Updated by Rishabh Dave about 1 year ago

  • Status changed from Fix Under Review to Pending Backport
Actions #15

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #70232: reef: Non-existent pending clone shows up in snapshot info added
Actions #16

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #70233: squid: Non-existent pending clone shows up in snapshot info added
Actions #17

Updated by Upkeep Bot about 1 year ago

  • Tags (freeform) set to backport_processed
Actions #18

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 842fb60510fb7d18327e37d4ca0cbcb25c531d6d
  • Fixed In set to v19.3.0-8040-g842fb60510f
  • Upkeep Timestamp set to 2025-07-09T17:10:12+00:00
Actions #19

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-8040-g842fb60510f to v19.3.0-8040-g842fb60510
  • Upkeep Timestamp changed from 2025-07-09T17:10:12+00:00 to 2025-07-14T17:42:03+00:00
Actions #20

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~978
  • Upkeep Timestamp changed from 2025-07-14T17:42:03+00:00 to 2025-11-01T00:58:37+00:00