Project

General

Profile

Actions

Bug #71648

open

mgr/volumes: switch to non-recursive implementation for directory tree removal

Added by Venky Shankar 9 months ago. Updated 4 months ago.

Status:
Pending Backport
Priority:
Immediate
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
other
Backport:
tentacle,squid,reef
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
mgr/volumes
Labels (FS):
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v20.3.0-3421-ga8d092988f
Released In:
Upkeep Timestamp:
2025-10-07T11:29:44+00:00

Description

Python has a limit on the number of call stack frames when using recursion. The python interpreter would error out if it sees ~1000 stack frames assuming an infinite recursive loop. The traceback would resemble something like

debug 2025-06-11T18:00:10.881+0000 7f716aa31640  0 [volumes WARNING volumes.fs.async_job] traceback: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/volumes/fs/async_job.py", line 51, in run
    self.async_job.execute_job(vol_job[0], vol_job[1], should_cancel=lambda: thread_id.should_cancel())
  File "/usr/share/ceph/mgr/volumes/fs/purge_queue.py", line 113, in execute_job
    purge_trash_entry_for_volume(self.fs_client, self.vc.volspec, volname, job, should_cancel)
  File "/usr/share/ceph/mgr/volumes/fs/purge_queue.py", line 89, in purge_trash_entry_for_volume
    trashcan.purge(pth, should_cancel)
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 87, in purge
    rmtree(trashpath)
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 72, in rmtree
    rmtree(d_full)
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 72, in rmtree
    rmtree(d_full)
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 72, in rmtree
    rmtree(d_full)
  [Previous line repeated 983 more times]
  File "/usr/share/ceph/mgr/volumes/fs/operations/trash.py", line 64, in rmtree
    log.debug("rmtree {0}".format(root_path))
  File "/usr/lib64/python3.9/logging/__init__.py", line 1434, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/usr/lib64/python3.9/logging/__init__.py", line 1587, in _log
    record = self.makeRecord(self.name, level, fn, lno, msg, args,
  File "/usr/lib64/python3.9/logging/__init__.py", line 1556, in makeRecord
    rv = _logRecordFactory(name, level, fn, lno, msg, args, exc_info, func,
  File "/usr/lib64/python3.9/logging/__init__.py", line 316, in __init__
    self.filename = os.path.basename(pathname)
  File "/usr/lib64/python3.9/posixpath.py", line 143, in basename
    sep = _get_sep(p)
  File "/usr/lib64/python3.9/posixpath.py", line 42, in _get_sep
    if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object 

Related issues 6 (5 open1 closed)

Related to CephFS - Bug #72779: mds: generating full path regardless of the length slows down FSPending BackportRishabh Dave

Actions
Related to CephFS - Bug #72992: pybind/cephfs: enhance rmtree() performanceResolvedRishabh Dave

Actions
Related to CephFS - Bug #72993: client: very very long path slows client downPending BackportRishabh Dave

Actions
Copied to CephFS - Backport #73391: tentacle: mgr/volumes: switch to non-recursive implementation for directory tree removalQA TestingRishabh DaveActions
Copied to CephFS - Backport #73392: squid: mgr/volumes: switch to non-recursive implementation for directory tree removalQA TestingRishabh DaveActions
Copied to CephFS - Backport #73393: reef: mgr/volumes: switch to non-recursive implementation for directory tree removalQA TestingRishabh DaveActions
Actions #1

Updated by Venky Shankar 9 months ago

  • Status changed from Need More Info to New
Actions #2

Updated by Patrick Donnelly 9 months ago

  • Priority changed from Immediate to Urgent

[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]

I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.

Actions #3

Updated by Greg Farnum 9 months ago

  • Priority changed from Urgent to Immediate

There are two issues here:
1) It turns out the isdir check will follow symlinks, so you can infinite loop on directory symlinks: https://docs.python.org/3/library/os.path.html#os.path.isdir
2) Due to recursion limits, it assumes a maximum depth anyway.

We need to resolve both of those issues.

Actions #4

Updated by Greg Farnum 9 months ago

Greg Farnum wrote in #note-3:

There are two issues here:
1) It turns out the isdir check will follow symlinks, so you can infinite loop on directory symlinks: https://docs.python.org/3/library/os.path.html#os.path.isdir

Oh, this is the Python builtin and the volumes code is using the Ceph library is_dir(), so I guess that's not relevant. Patrick doesn't think so at least. :)

Actions #5

Updated by Venky Shankar 9 months ago

Patrick Donnelly wrote in #note-2:

[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]

How is moving to a non-recursive implementation a "hack"?

I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.

That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.

Actions #6

Updated by Patrick Donnelly 9 months ago

Venky Shankar wrote in #note-5:

Patrick Donnelly wrote in #note-2:

[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]

How is moving to a non-recursive implementation a "hack"?

Sorry it's not a hack. It just could be done better (IMO).

I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.

That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.

Not sure I follow. The recursive removal is fairly basic to port except some of the dir listings may be tricky within the Client.

Actions #7

Updated by Rishabh Dave 9 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 63917
Actions #8

Updated by Venky Shankar 9 months ago

Patrick Donnelly wrote in #note-6:

Venky Shankar wrote in #note-5:

Patrick Donnelly wrote in #note-2:

[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]

How is moving to a non-recursive implementation a "hack"?

Sorry it's not a hack. It just could be done better (IMO).

I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.

That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.

Not sure I follow. The recursive removal is fairly basic to port except some of the dir listings may be tricky within the Client.

What I meant was - do we want to port the recursive delete to the client library for a part of the reason being saving Python's GIL acquire-release switch. If we go this route, the directory listing trickiness you mention is related to the readdir cache or something else?

Actions #9

Updated by Patrick Donnelly 9 months ago

Venky Shankar wrote in #note-8:

Patrick Donnelly wrote in #note-6:

Venky Shankar wrote in #note-5:

Patrick Donnelly wrote in #note-2:

[Moving this to Urgent because my understanding is that this is not needed right now for some broken cluster; it'd be better to get the right fix here rather than a hack. Please correct me if I'm wrong Venky.]

How is moving to a non-recursive implementation a "hack"?

Sorry it's not a hack. It just could be done better (IMO).

I think the better approach here is to move the recursive unlink to Client. We can do it more easily in a non-recursive way (std::queue<Inode*>) and also simultaneously solve an annoyance where the purge thread is repeatedly reacquiring and dropping the GIL for each unlink.

That did come my mind, but I couldn't convince myself the some python wierdness demands that the implementation be moved to the client library. FWIW, I had considered doing something similar somewhere around the mirror daemon development stage, but didn't for similar reasons.

Not sure I follow. The recursive removal is fairly basic to port except some of the dir listings may be tricky within the Client.

What I meant was - do we want to port the recursive delete to the client library for a part of the reason being saving Python's GIL acquire-release switch. If we go this route, the directory listing trickiness you mention is related to the readdir cache or something else?

Mostly that the existing API for readdir is not tailored to be used internally by the client (as I recall). It may be easier than I think though.

Actions #10

Updated by Patrick Donnelly 9 months ago

  • Backport changed from tentacle,squid,reef to tentacle,squid
Actions #11

Updated by Rishabh Dave 6 months ago

  • Related to Bug #72779: mds: generating full path regardless of the length slows down FS added
Actions #12

Updated by Rishabh Dave 6 months ago

  • Related to Bug #72992: pybind/cephfs: enhance rmtree() performance added
Actions #13

Updated by Rishabh Dave 6 months ago

  • Related to Bug #72993: client: very very long path slows client down added
Actions #14

Updated by Rishabh Dave 6 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #15

Updated by Upkeep Bot 6 months ago

  • Merge Commit set to a8d092988fb1ffb354a20b66bcc89103faa94ed0
  • Fixed In set to v20.3.0-3421-ga8d092988f
  • Upkeep Timestamp set to 2025-10-07T11:29:44+00:00
Actions #16

Updated by Rishabh Dave 6 months ago

  • Copied to Backport #73391: tentacle: mgr/volumes: switch to non-recursive implementation for directory tree removal added
Actions #17

Updated by Rishabh Dave 6 months ago

  • Copied to Backport #73392: squid: mgr/volumes: switch to non-recursive implementation for directory tree removal added
Actions #18

Updated by Rishabh Dave 6 months ago

  • Tags (freeform) set to backport_processed
Actions #19

Updated by Rishabh Dave 6 months ago

  • Backport changed from tentacle,squid to tentacle,squid,reef
Actions #20

Updated by Rishabh Dave 6 months ago

  • Copied to Backport #73393: reef: mgr/volumes: switch to non-recursive implementation for directory tree removal added
Actions #21

Updated by Venky Shankar 4 months ago

@Rishabh Dave - What about the rmdir() slowness that was under investigation? (using perf dumps). Do we know what is causing the slowness?

Actions

Also available in: Atom PDF