Bug #75070

open

libcephfs/cephfs-mirror: fsync hang

Added by Kotresh Hiremath Ravishankar 28 days ago. Updated 19 days ago.

Status: New
Priority: Normal
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Development
Backport: tentacle
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client, cephfs-mirror, libcephfs
Labels (FS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

This was observed while testing multithreaded cephfs-mirroring (https://github.com/ceph/ceph/pull/66572).
At that time, cephfs-mirror called fsync on each fd after syncing the file.
All the data sync threads hung in fsync, as shown below. This tracker was raised well
after the hang was observed, so a complete stack trace is not available.

-----
Thread 2 (Thread 0xffff644cc400 (LWP 74020) "d_replayer-0"):
0  0x0000ffff8e82656c in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
1  0x0000ffff8e828ff0 [PAC] in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libc.so.6
2  0x0000ffff8fc90fd4 [PAC] in ceph::condition_variable_debug::wait ...
3  0x0000ffff9080fc9c in ceph::condition_variable_debug::wait<Client::wait_on_context_list ...
4  Client::wait_on_context_list ... at /lsandbox/upstream/ceph/src/client/Client.cc:4540
5  0x0000ffff9083fae8 in Client::_fsync ... at /lsandbox/upstream/ceph/src/client/Client.cc:13299
6  0x0000ffff90840278 in Client::_fsync ...
7  0x0000ffff90840514 in Client::fsync ... at /lsandbox/upstream/ceph/src/client/Client.cc:13042
8  0x0000ffff907f06e0 in ceph_fsync ... at /lsandbox/upstream/ceph/src/libcephfs.cc:316
9  0x0000aaaaad5b2f88 in cephfs::mirror::PeerReplayer::copy_to_remote ...
----

This has not been root caused or fixed. For cephfs-mirroring, it makes sense to do a single sync fs once all the files in the snapshot are synced,
before taking the snapshot on the remote. That approach was chosen, so cephfs-mirroring no longer does fsync on each fd, and this
hang is no longer seen with the multithreaded cephfs-mirroring PR.
However, the hang should not happen when multiple fsyncs are issued on different fds using the same libcephfs connection.
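
For illustration only, the two strategies described above can be sketched against local POSIX files. This is not the cephfs-mirror code and not a reproducer for the hang: `os.fsync` and `os.sync` stand in for `ceph_fsync` on each fd and a mount-wide sync fs (`ceph_sync_fs` in libcephfs), and the helper names `write_file` and `sync_files`, the file names, and the timeout are all hypothetical. Threads that have not finished by the deadline are reported as stuck, which is roughly how the hang showed up in the data sync threads.

```python
import os
import tempfile
import threading
import time


def write_file(path, payload, fsync_each):
    """Write payload to path; optionally flush this one fd before closing."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)
        if fsync_each:
            # Per-fd flush (local stand-in for ceph_fsync on each fd).
            os.fsync(fd)
    finally:
        os.close(fd)


def sync_files(directory, names, fsync_each, timeout=30.0):
    """Write each file in its own thread; return names of threads still
    running when the deadline expires (an empty list means no hang)."""
    threads = []
    for name in names:
        t = threading.Thread(
            target=write_file,
            args=(os.path.join(directory, name), name.encode(), fsync_each),
            name=name,
            daemon=True,  # a stuck thread must not block interpreter exit
        )
        t.start()
        threads.append(t)

    deadline = time.monotonic() + timeout
    stuck = []
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
        if t.is_alive():
            stuck.append(t.name)

    if not fsync_each and not stuck:
        # Single flush after all files are written (local stand-in for one
        # sync fs before the remote snapshot is taken).
        os.sync()
    return stuck


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    files = ["file-%d" % i for i in range(4)]
    print("per-fd fsync, stuck threads:", sync_files(workdir, files, True))
    print("single sync, stuck threads:", sync_files(workdir, files, False))
```

On a healthy local filesystem both calls are expected to return an empty list. The point of the bug is that the first pattern, concurrent fsync on different fds over one connection, should behave the same way on libcephfs.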

Actions #1

Updated by Venky Shankar 19 days ago

  • Category set to Correctness/Safety
  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version set to v21.0.0
  • Source set to Development
  • Backport set to tentacle
  • Component(FS) Client added

@Kotresh Hiremath Ravishankar - please take this one.
