Bug #71648
openmgr/volumes: switch to non-recursive implementation for directory tree removal
0%
Description
Python has a limit on the number of call stack frames when using recursion. The python interpreter would error out if it sees ~1000 stack frames assuming an infinite recursive loop. The traceback would resemble something like
debug 2025-06-11T18:00:10.881+0000 7f716aa31640 0 [volumes WARNING volumes.fs.async_job] traceback: Traceback (most recent call last):
File "/usr/share/ceph/mgr/volumes/fs/async_job.py", line 51, in run
self.async_job.execute_job(vol_job[0], vol_job[1], should_cancel=lambda: thread_id.should_cancel())
File "/usr/share/ceph/mgr/volumes/fs/purge_queue.py", line 113, in execute_job
purge_trash_entry_for_volume(self.fs_client, self.vc.volspec, volname, job, should_cancel)
File "/usr/share/ceph/mgr/volumes/fs/purge_queue.py", line 89, in purge_trash_entry_for_volume
trashcan.purge(pth, should_cancel)
File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 87, in purge
rmtree(trashpath)
File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 72, in rmtree
rmtree(d_full)
File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 72, in rmtree
rmtree(d_full)
File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 72, in rmtree
rmtree(d_full)
[Previous line repeated 983 more times]
File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 64, in rmtree
log.debug("rmtree {0}".format(root_path))
File "/usr/lib64/python3.9/logging/__init__.py", line 1434, in debug
self._log(DEBUG, msg, args, **kwargs)
File "/usr/lib64/python3.9/logging/__init__.py", line 1587, in _log
record = self.makeRecord(self.name, level, fn, lno, msg, args,
File "/usr/lib64/python3.9/logging/__init__.py", line 1556, in makeRecord
rv = _logRecordFactory(name, level, fn, lno, msg, args, exc_info, func,
File "/usr/lib64/python3.9/logging/__init__.py", line 316, in __init__
self.filename = os.path.basename(pathname)
File "/usr/lib64/python3.9/posixpath.py", line 143, in basename
sep = _get_sep(p)
File "/usr/lib64/python3.9/posixpath.py", line 42, in _get_sep
if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
Updated by Patrick Donnelly 9 months ago
- Priority changed from Immediate to Urgent
[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]
I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.
Updated by Greg Farnum 9 months ago
- Priority changed from Urgent to Immediate
There are two issues here:
1) It turns out the isdir check will follow symlinks, so you can infinite loop on directory symlinks: https://docs.python.org/3/library/os.path.html#os.path.isdir
2) Due to recursion limits, it assumes a maximum depth anyway.
We need to resolve both of those issues.
Updated by Greg Farnum 9 months ago
Greg Farnum wrote in #note-3:
There are two issues here:
1) It turns out the isdir check will follow symlinks, so you can infinite loop on directory symlinks: https://docs.python.org/3/library/os.path.html#os.path.isdir
Oh, this is the Python builtin and the volumes code is using the Ceph library is_dir(), so I guess that's not relevant. Patrick doesn't think so at least. :)
Updated by Venky Shankar 9 months ago
Patrick Donnelly wrote in #note-2:
[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]
How is moving to a non-recursive implementation a "hack"?
I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.
That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.
Updated by Patrick Donnelly 9 months ago
Venky Shankar wrote in #note-5:
Patrick Donnelly wrote in #note-2:
[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]
How is moving to a non-recursive implementation a "hack"?
Sorry it's not a hack. It just could be done better (IMO).
I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.
That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.
Not sure I follow. The recursive removal is fairly basic to port except some of the dir listings may be tricky within the Client.
Updated by Rishabh Dave 9 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 63917
Updated by Venky Shankar 9 months ago
Patrick Donnelly wrote in #note-6:
Venky Shankar wrote in #note-5:
Patrick Donnelly wrote in #note-2:
[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]
How is moving to a non-recursive implementation a "hack"?
Sorry it's not a hack. It just could be done better (IMO).
I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.
That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.
Not sure I follow. The recursive removal is fairly basic to port except some of the dir listings may be tricky within the Client.
What I meant was - do we want to port the recursive delete to the client library for a part of the reason being saving Python's GIL acquire-release switch. If we go this route, the directory listing trickiness you mention is related to the readdir cache or something else?
Updated by Patrick Donnelly 9 months ago
Venky Shankar wrote in #note-8:
Patrick Donnelly wrote in #note-6:
Venky Shankar wrote in #note-5:
Patrick Donnelly wrote in #note-2:
[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]
How is moving to a non-recursive implementation a "hack"?
Sorry it's not a hack. It just could be done better (IMO).
I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.
That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.
Not sure I follow. The recursive removal is fairly basic to port except some of the dir listings may be tricky within the Client.
What I meant was - do we want to port the recursive delete to the client library for a part of the reason being saving Python's GIL acquire-release switch. If we go this route, the directory listing trickiness you mention is related to the readdir cache or something else?
Mostly that the existing API for readdir is not tailored to be used internally by the client (as I recall). It may be easier than I think though.
Updated by Patrick Donnelly 9 months ago
- Backport changed from tentacle,squid,reef to tentacle,squid
Updated by Rishabh Dave 6 months ago
- Related to Bug #72779: mds: generating full path regardless of the length slows down FS added
Updated by Rishabh Dave 6 months ago
- Related to Bug #72992: pybind/cephfs: enhance rmtree() performance added
Updated by Rishabh Dave 6 months ago
- Related to Bug #72993: client: very very long path slows client down added
Updated by Rishabh Dave 6 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot 6 months ago
- Merge Commit set to a8d092988fb1ffb354a20b66bcc89103faa94ed0
- Fixed In set to v20.3.0-3421-ga8d092988f
- Upkeep Timestamp set to 2025-10-07T11:29:44+00:00
Updated by Rishabh Dave 6 months ago
- Copied to Backport #73391: tentacle: mgr/volumes: switch to non-recursive implementation for directory tree removal added
Updated by Rishabh Dave 6 months ago
- Copied to Backport #73392: squid: mgr/volumes: switch to non-recursive implementation for directory tree removal added
Updated by Rishabh Dave 6 months ago
- Backport changed from tentacle,squid to tentacle,squid,reef
Updated by Rishabh Dave 6 months ago
- Copied to Backport #73393: reef: mgr/volumes: switch to non-recursive implementation for directory tree removal added
Updated by Venky Shankar 4 months ago
@Rishabh Dave - What about the rmdir() slowness that was under investigation? (using perf dumps). Do we know what is causing the slowness?