Bug #71648: mgr/volumes: switch to non-recursive implementation for directory tree removal - CephFS - Ceph

a8d092988fb1ffb354a20b66bcc89103faa94ed0

Category:

Correctness/Safety

Target version:

Ceph - v21.0.0

% Done:

Source:

other

Backport:

tentacle,squid,reef

Regression:

Severity:

1 - critical

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

mgr/volumes

Labels (FS):

Pull request ID:

63917

Tags (freeform):

backport_processed

Merge Commit:

Fixed In:

v20.3.0-3421-ga8d092988f

Released In:

Upkeep Timestamp:

2025-10-07T11:29:44+00:00

Description

Python has a limit on the number of call stack frames when using recursion. The python interpreter would error out if it sees ~1000 stack frames assuming an infinite recursive loop. The traceback would resemble something like

debug 2025-06-11T18:00:10.881+0000 7f716aa31640  0 [volumes WARNING volumes.fs.async_job] traceback: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/volumes/fs/async_job.py", line 51, in run
    self.async_job.execute_job(vol_job[0], vol_job[1], should_cancel=lambda: thread_id.should_cancel())
  File "/usr/share/ceph/mgr/volumes/fs/purge_queue.py", line 113, in execute_job
    purge_trash_entry_for_volume(self.fs_client, self.vc.volspec, volname, job, should_cancel)
  File "/usr/share/ceph/mgr/volumes/fs/purge_queue.py", line 89, in purge_trash_entry_for_volume
    trashcan.purge(pth, should_cancel)
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 87, in purge
    rmtree(trashpath)
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 72, in rmtree
    rmtree(d_full)
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 72, in rmtree
    rmtree(d_full)
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 72, in rmtree
    rmtree(d_full)
  [Previous line repeated 983 more times]
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 64, in rmtree
    log.debug("rmtree {0}".format(root_path))
  File "/usr/lib64/python3.9/logging/__init__.py", line 1434, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/usr/lib64/python3.9/logging/__init__.py", line 1587, in _log
    record = self.makeRecord(self.name, level, fn, lno, msg, args,
  File "/usr/lib64/python3.9/logging/__init__.py", line 1556, in makeRecord
    rv = _logRecordFactory(name, level, fn, lno, msg, args, exc_info, func,
  File "/usr/lib64/python3.9/logging/__init__.py", line 316, in __init__
    self.filename = os.path.basename(pathname)
  File "/usr/lib64/python3.9/posixpath.py", line 143, in basename
    sep = _get_sep(p)
  File "/usr/lib64/python3.9/posixpath.py", line 42, in _get_sep
    if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object

Related issues 6 (5 open — 1 closed)

Related to CephFS - Bug #72779: mds: generating full path regardless of the length slows down FS

Pending Backport

Related to CephFS - Bug #72992: pybind/cephfs: enhance rmtree() performance

Resolved

Related to CephFS - Bug #72993: client: very very long path slows client down

Pending Backport

Copied to CephFS - Backport #73391: tentacle: mgr/volumes: switch to non-recursive implementation for directory tree removal

QA Testing

Copied to CephFS - Backport #73392: squid: mgr/volumes: switch to non-recursive implementation for directory tree removal

QA Testing

Copied to CephFS - Backport #73393: reef: mgr/volumes: switch to non-recursive implementation for directory tree removal

QA Testing

Updated by Venky Shankar 9 months ago

Status changed from Need More Info to New

Actions

Updated by Patrick Donnelly 9 months ago

Priority changed from Immediate to Urgent

[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]

I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.

Actions

Updated by Greg Farnum 9 months ago

Priority changed from Urgent to Immediate

There are two issues here:
1) It turns out the isdir check will follow symlinks, so you can infinite loop on directory symlinks: https://docs.python.org/3/library/os.path.html#os.path.isdir
2) Due to recursion limits, it assumes a maximum depth anyway.

We need to resolve both of those issues.

Actions

Updated by Greg Farnum 9 months ago

Greg Farnum wrote in #note-3:

There are two issues here:
1) It turns out the isdir check will follow symlinks, so you can infinite loop on directory symlinks: https://docs.python.org/3/library/os.path.html#os.path.isdir

Oh, this is the Python builtin and the volumes code is using the Ceph library is_dir(), so I guess that's not relevant. Patrick doesn't think so at least. :)

Actions

Updated by Venky Shankar 9 months ago

Patrick Donnelly wrote in #note-2:

[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]

How is moving to a non-recursive implementation a "hack"?

I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.

That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.

Actions

Updated by Patrick Donnelly 9 months ago

Venky Shankar wrote in #note-5:

Patrick Donnelly wrote in #note-2:

[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]

How is moving to a non-recursive implementation a "hack"?

Sorry it's not a hack. It just could be done better (IMO).

I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.

That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.

Not sure I follow. The recursive removal is fairly basic to port except some of the dir listings may be tricky within the Client.

Actions

Updated by Rishabh Dave 9 months ago

Status changed from New to Fix Under Review
Pull request ID set to 63917

Actions

Updated by Venky Shankar 9 months ago

Patrick Donnelly wrote in #note-6:

Venky Shankar wrote in #note-5:

Patrick Donnelly wrote in #note-2:

[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]

How is moving to a non-recursive implementation a "hack"?

Sorry it's not a hack. It just could be done better (IMO).

I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.

That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.

Not sure I follow. The recursive removal is fairly basic to port except some of the dir listings may be tricky within the Client.

What I meant was - do we want to port the recursive delete to the client library for a part of the reason being saving Python's GIL acquire-release switch. If we go this route, the directory listing trickiness you mention is related to the readdir cache or something else?

Actions