Bug #75070
libcephfs/cephfs-mirror: fsync hang
Description
While testing multithreaded cephfs-mirroring (https://github.com/ceph/ceph/pull/66572), all the data sync threads hung in fsync, as shown below. At that point, cephfs-mirroring still called fsync on each fd after syncing the file.
This tracker was raised well after the hang was observed, so the complete stack trace is not available.
-----
Thread 2 (Thread 0xffff644cc400 (LWP 74020) "d_replayer-0"):
0 0x0000ffff8e82656c in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
1 0x0000ffff8e828ff0 [PAC] in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libc.so.6
2 0x0000ffff8fc90fd4 [PAC] in ceph::condition_variable_debug::wait ...
3 0x0000ffff9080fc9c in ceph::condition_variable_debug::wait<Client::wait_on_context_list ...
4 Client::wait_on_context_list ... at /lsandbox/upstream/ceph/src/client/Client.cc:4540
5 0x0000ffff9083fae8 in Client::_fsync ... at /lsandbox/upstream/ceph/src/client/Client.cc:13299
6 0x0000ffff90840278 in Client::_fsync ...
7 0x0000ffff90840514 in Client::fsync ... at /lsandbox/upstream/ceph/src/client/Client.cc:13042
8 0x0000ffff907f06e0 in ceph_fsync ... at /lsandbox/upstream/ceph/src/libcephfs.cc:316
9 0x0000aaaaad5b2f88 in cephfs::mirror::PeerReplayer::copy_to_remote ...
-----
This hang has not been root-caused or fixed. For cephfs-mirroring, it makes sense to do a single sync fs once all the files in the snapshot
have been synced, before taking the snapshot on the remote, so cephfs-mirroring no longer calls fsync on each fd. That approach was chosen
for the multithreaded cephfs-mirroring PR, and the hang is no longer seen with it.
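The change in approach described above can be modelled with a small sketch. It is hypothetical and is not the cephfs-mirror code. Local files stand in for the remote filesystem, `os.fsync()` plays the role of `ceph_fsync()` on each fd, and `os.sync()` stands in for a single `ceph_sync_fs()` issued before the remote snapshot. Note that `os.sync()` flushes every filesystem on the host, so it is broader than syncing one mount. The `copy_snapshot` function and its `per_fd_fsync` flag are illustrative names only.

```python
import os


def copy_snapshot(files, dest_dir, per_fd_fsync=False):
    """Write each (name, data) pair into dest_dir and return the sync calls made.

    per_fd_fsync=True models the old behaviour (fsync every fd after its copy).
    per_fd_fsync=False models the new one (no per-fd fsync, one sync at the end).
    """
    sync_calls = 0
    for name, data in files:
        fd = os.open(os.path.join(dest_dir, name),
                     os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            view = memoryview(data)
            while view:                  # os.write() may write only part of the buffer
                view = view[os.write(fd, view):]
            if per_fd_fsync:
                os.fsync(fd)             # stand-in for ceph_fsync() on this fd
                sync_calls += 1
        finally:
            os.close(fd)
    if not per_fd_fsync:
        os.sync()                        # stand-in for one ceph_sync_fs() call
        sync_calls += 1
    # The remote snapshot would be taken here, once all data is durable.
    return sync_calls
```

For a snapshot of N files, the old strategy issues N sync calls, one per fd, while the new one issues exactly one. This is why the per-fd fsync path, and with it the hang, drops out of the multithreaded data sync.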
However, the hang should not happen when multiple fsync calls are issued on different fds over the same libcephfs connection.
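A reproducer for this expected behaviour could take the following shape. This is a hypothetical sketch. Each worker thread opens its own file, writes to it and calls fsync on its own fd, while the main thread enforces a deadline so that a hang is reported rather than blocking forever. To keep it standalone, it uses local files and `os.fsync()`. Against a cluster, the workers would instead share one mount and call `ceph_open()`, `ceph_write()` and `ceph_fsync()` on it. `run_concurrent_fsync`, the thread count and the timeout are illustrative choices.

```python
import os
import tempfile
import threading
import time


def run_concurrent_fsync(num_threads=8, timeout=10.0):
    """Call fsync on a distinct fd from each of num_threads threads at once.

    Returns how many workers completed within `timeout` seconds; a result
    below num_threads means some fsync calls never returned (a hang).
    """
    barrier = threading.Barrier(num_threads)
    lock = threading.Lock()
    completed = 0

    with tempfile.TemporaryDirectory() as dest_dir:
        def worker(i):
            nonlocal completed
            path = os.path.join(dest_dir, f"file-{i}")
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            try:
                os.write(fd, b"data-%d" % i)
                barrier.wait()   # release all threads together so the fsyncs overlap
                os.fsync(fd)     # stand-in for ceph_fsync() on one shared mount
            finally:
                os.close(fd)
            with lock:
                completed += 1

        # daemon=True so a hung worker cannot keep the process alive
        threads = [threading.Thread(target=worker, args=(i,), daemon=True)
                   for i in range(num_threads)]
        for t in threads:
            t.start()
        deadline = time.monotonic() + timeout
        for t in threads:
            t.join(max(0.0, deadline - time.monotonic()))
        with lock:
            return completed
```

The barrier makes the fsync calls overlap in time, roughly mirroring the multithreaded data sync setup described above. The shared deadline turns a hang into a countable shortfall instead of a stuck test.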
Updated by Venky Shankar 19 days ago
- Category set to Correctness/Safety
- Assignee set to Kotresh Hiremath Ravishankar
- Target version set to v21.0.0
- Source set to Development
- Backport set to tentacle
- Component(FS) Client added
@Kotresh Hiremath Ravishankar - please take this one.