Bug #66560

closed

mgr/vol: get_next_job() from asyn_cloner failed because clone entry went missing

Added by Rishabh Dave almost 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Development
Backport: quincy,reef,squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v19.3.0-3396-ge392142c65
Released In: v20.2.0~2506
Upkeep Timestamp: 2025-11-01T01:26:47+00:00

Description

This is tough to reproduce, because the clone cancellation has to be timed to land just before get_next_job() from async_cloner.py runs.

Relevant entries captured from the MGR log:

8042] osd_op(unknown.0.0:1607 33.3e 33:7eaac5ee:::100000011a3.0000001c:head [write 0~4194304 in=4194304b] snapc 1=[] ondisk+write+known_if_redirected+supports_pool_eio e622) 0x55aac7595400 con 0x55aab6b60d80
4-06-17T18:41:06.116+0530 7fcae7a006c0 1 -- 192.168.29.219:0/255696103 --> [v2:192.168.29.219:6826/2005942128,v1:192.168.29.219:6827/2 942128] -- osd_op(unknown.0.0:1608 33.8 33:11072d26:::100000011a3.0000001d:head [write 0~4194304 in=4194304b] snapc 1=[] ondisk+write+known_if_redirected+supports_pool_eio e622) 0x55aac7555000 con 0x55aab88a1a80
4-06-17T18:41:06.121+0530 7fcb82a006c0 [volumes WARNING volumes.fs.async_job] traceback: Traceback (most recent call last):
  File "/home/rishabh/repos/ceph/mgr-vol-clone-stats/src/pybind/mgr/volumes/fs/async_job.py", line 52, in run
    vol_job = self.async_job.get_job()
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rishabh/repos/ceph/mgr-vol-clone-stats/src/pybind/mgr/volumes/fs/async_job.py", line 195, in get_job
    (ret, job) = self.get_next_job(volname, running_jobs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rishabh/repos/ceph/mgr-vol-clone-stats/src/pybind/mgr/volumes/fs/async_cloner.py", line 402, in get_next_job
    return get_next_clone_entry(self.fs_client, self.vc.volspec, volname, running_jobs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rishabh/repos/ceph/mgr-vol-clone-stats/src/pybind/mgr/volumes/fs/async_cloner.py", line 35, in get_next_clone_entry
    job = clone_index.get_oldest_clone_entry(running_jobs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rishabh/repos/ceph/mgr-vol-clone-stats/src/pybind/mgr/volumes/fs/operations/clone_index.py", line 80, in get_oldest_clone_en
    st = self.fs.lstat(dpath)
         ^^^^^^^^^^^^^^^^^^^^
  File "cephfs.pyx", line 1997, in cephfs.LibCephFS.lstat
  File "cephfs.pyx", line 1978, in cephfs.LibCephFS.stat
cephfs.ObjectNotFound: error in stat: /volumes/_index/clone/3efa5bf7-115c-4f27-8b83-8fecc16a690a: No such file or directory [Errno 2]
4-06-17T18:41:06.122+0530 7fcad9e006c0 1 192.168.29.219:0/3444413926 <== mds.0 v2:192.168.29.219:6834/1413894566 246 ==== client_re
(???:127 = 0 (0) Success) ==== 882+0+0 (secure 0 0 0 0x55aaccb68a00 con 0x55aab70cb680
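
The race in the traceback can be sketched outside Ceph. The example below is only an analogue on a local filesystem, not the fix merged for this issue: it assumes the index is scanned by listing a directory and calling lstat on each entry, and it shows one way such a scan could tolerate an entry that disappears in between. The names get_oldest_entry, demo and racy_lstat are hypothetical, and os.lstat with FileNotFoundError stands in for cephfs lstat with ObjectNotFound.

```python
import os
import tempfile


def get_oldest_entry(index_dir, running, lstat=os.lstat):
    """Return the name of the oldest entry not in `running`, or None.

    Hypothetical analogue of scanning a clone index: an entry that
    vanishes between listdir() and lstat() is skipped instead of
    letting the exception propagate to the caller.
    """
    candidates = []
    for name in os.listdir(index_dir):
        if name in running:
            continue
        try:
            st = lstat(os.path.join(index_dir, name))
        except FileNotFoundError:
            # The entry was removed concurrently (for example, the clone
            # was cancelled); ignore it and look at the next one.
            continue
        candidates.append((st.st_ctime_ns, name))
    return min(candidates)[1] if candidates else None


def demo():
    """Simulate a cancellation that removes entry 'a' just before lstat."""
    index_dir = tempfile.mkdtemp()
    for name in ("a", "b"):
        open(os.path.join(index_dir, name), "w").close()

    def racy_lstat(path):
        # Assumption for the demo: the "cancel" wins the race for entry 'a'.
        if os.path.basename(path) == "a":
            os.remove(path)
        return os.lstat(path)

    return get_oldest_entry(index_dir, set(), lstat=racy_lstat)
```

Calling demo() exercises the same ordering as in the log: the entry is listed, removed, and then stat'ed. With the try/except in place the scan falls through to the remaining entry instead of raising.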

Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #66927: reef: mgr/vol: get_next_job() from asyn_cloner failed because clone entry went missing (Resolved, Rishabh Dave)
Copied to CephFS - Backport #66928: squid: mgr/vol: get_next_job() from asyn_cloner failed because clone entry went missing (Resolved, Rishabh Dave)
Copied to CephFS - Backport #66929: quincy: mgr/vol: get_next_job() from asyn_cloner failed because clone entry went missing (Resolved, Rishabh Dave)
Actions #1

Updated by Rishabh Dave almost 2 years ago

  • Status changed from New to Fix Under Review
Actions #2

Updated by Venky Shankar almost 2 years ago

@Rishabh Dave The log entries in the description aren't totally clear since the entries overlap. How is that possible?

Actions #3

Updated by Venky Shankar almost 2 years ago

  • Category set to Correctness/Safety
  • Target version set to v20.0.0
  • Source set to Development
  • Backport set to quincy,reef,squid
Actions #4

Updated by Rishabh Dave over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
Actions #5

Updated by Rishabh Dave over 1 year ago

  • Copied to Backport #66927: reef: mgr/vol: get_next_job() from asyn_cloner failed because clone entry went missing added
Actions #6

Updated by Rishabh Dave over 1 year ago

  • Copied to Backport #66928: squid: mgr/vol: get_next_job() from asyn_cloner failed because clone entry went missing added
Actions #7

Updated by Rishabh Dave over 1 year ago

  • Copied to Backport #66929: quincy: mgr/vol: get_next_job() from asyn_cloner failed because clone entry went missing added
Actions #8

Updated by Rishabh Dave over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #9

Updated by Rishabh Dave over 1 year ago

  • Status changed from Pending Backport to Resolved

All backports have been merged, marking this as resolved.

Actions #10

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to e392142c65c4186e4ef7365acf58001c512b769a
  • Fixed In set to v19.3.0-3396-ge392142c65c
  • Upkeep Timestamp set to 2025-06-26T20:05:21+00:00
Actions #11

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-3396-ge392142c65c to v19.3.0-3396-ge392142c65
  • Upkeep Timestamp changed from 2025-06-26T20:05:21+00:00 to 2025-07-14T16:44:51+00:00
Actions #12

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2506
  • Upkeep Timestamp changed from 2025-07-14T16:44:51+00:00 to 2025-11-01T01:26:47+00:00