Bug #64534
closedqa: test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirror suite
0%
Description
test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirror suite
https://pulpito.ceph.com/jcollin-2024-02-21_01:01:12-fs:mirror-wip-jcollin-testing_20Feb2024_2-distro-default-smithi/
test_cephfs_mirror_cancel_sync succeed when running alone 100 times.
https://pulpito.ceph.com/jcollin-2024-02-21_11:40:01-fs:mirror-wip-jcollin-testing_21Feb2024_2-distro-default-smithi/
This happens probably because one of the previous test was hung in a clean up activity. The previous test executed here is test_cephfs_mirror_cancel_mirroring_and_readd.
Updated by Jos Collin about 2 years ago
- Project changed from Ceph to CephFS
- Category changed from qa to Testing
Updated by Venky Shankar about 2 years ago
Jos Collin wrote:
test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirror suite
https://pulpito.ceph.com/jcollin-2024-02-21_01:01:12-fs:mirror-wip-jcollin-testing_20Feb2024_2-distro-default-smithi/test_cephfs_mirror_cancel_sync succeed when running alone 100 times.
https://pulpito.ceph.com/jcollin-2024-02-21_11:40:01-fs:mirror-wip-jcollin-testing_21Feb2024_2-distro-default-smithi/This happens probably because one of the previous test was hung in a clean up activity. The previous test executed here is test_cephfs_mirror_cancel_mirroring_and_readd.
Do you have details as in why test test was hung?
Updated by Jos Collin about 2 years ago
Venky Shankar wrote:
Jos Collin wrote:
test_cephfs_mirror_cancel_sync fails in a 100 jobs run of fs:mirror suite
https://pulpito.ceph.com/jcollin-2024-02-21_01:01:12-fs:mirror-wip-jcollin-testing_20Feb2024_2-distro-default-smithi/test_cephfs_mirror_cancel_sync succeed when running alone 100 times.
https://pulpito.ceph.com/jcollin-2024-02-21_11:40:01-fs:mirror-wip-jcollin-testing_21Feb2024_2-distro-default-smithi/This happens probably because one of the previous test was hung in a clean up activity. The previous test executed here is test_cephfs_mirror_cancel_mirroring_and_readd.
Do you have details as in why test test was hung?
In the previously executed test: test_cephfs_mirror_cancel_mirroring_and_readd, below failure occurs during after the last 'counter dump'. Then it seems success in the next trial of the command.
55117 2024-02-21T01:42:12.466 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\
hive/coverage timeout 120 ceph --cluster ceph fs snapshot mirror disable cephfs
55118 2024-02-21T01:42:12.907 INFO:teuthology.orchestra.run.smithi007.stdout:{}
55119 2024-02-21T01:42:22.930 INFO:teuthology.orchestra.run:Running command with timeout 30
55120 2024-02-21T01:42:22.930 DEBUG:teuthology.orchestra.run.smithi007:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ce\
ph/cephfs-mirror.asok fs mirror status cephfs@14
55121 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:no valid command found; 10 closest matches:
55122 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:0
55123 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:1
55124 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:2
55125 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:abort
55126 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:assert
55127 2024-02-21T01:42:23.046 INFO:teuthology.orchestra.run.smithi007.stderr:config diff
55128 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config diff get <var>
55129 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config get <var>
55130 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config help [<var>]
55131 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:config set <var> <val>...
55132 2024-02-21T01:42:23.047 INFO:teuthology.orchestra.run.smithi007.stderr:admin_socket: invalid command
55133 2024-02-21T01:42:23.048 DEBUG:teuthology.orchestra.run:got remote process result: 22
55134 2024-02-21T01:42:23.048 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "mirror status for fs: cephfs" fa\
iled: Command failed (mirror status for fs: cephfs) on smithi007 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mir\
ror.asok fs mirror status cephfs@14'
Then after self.disable_mirroring call, the test starts a series of 'osd dump's and 'fs dump's with the following error, which doesn't seems normal and the test should be hanging in this activity.
55894 2024-02-21T01:42:24.477 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:24.476+0000 7f1da70d9180 -1 mgr[py] Module ios\
tat has missing NOTIFY_TYPES member
55895 2024-02-21T01:42:24.641 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:24.636+0000 7f1da70d9180 -1 mgr[py] Module tel\
emetry has missing NOTIFY_TYPES member
55896 2024-02-21T01:42:24.740 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 1
55897 2024-02-21T01:42:24.758 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\
hive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_metadata pg_num
55898 2024-02-21T01:42:25.047 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 32
55899 2024-02-21T01:42:25.066 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\
hive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_data pg_num
55900 2024-02-21T01:42:25.086 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.084+0000 7f1da70d9180 -1 mgr[py] Module osd\
_support has missing NOTIFY_TYPES member
55901 2024-02-21T01:42:25.140 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.136+0000 7f1da70d9180 -1 mgr[py] Module sel\
ftest has missing NOTIFY_TYPES member
55902 2024-02-21T01:42:25.195 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.192+0000 7f1da70d9180 -1 mgr[py] Module pro\
gress has missing NOTIFY_TYPES member
55903 2024-02-21T01:42:25.275 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.272+0000 7f1da70d9180 -1 mgr[py] Module cra\
sh has missing NOTIFY_TYPES member
55904 2024-02-21T01:42:25.327 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.323+0000 7f1da70d9180 -1 mgr[py] Module tel\
egraf has missing NOTIFY_TYPES member
55905 2024-02-21T01:42:25.358 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 64
55906 2024-02-21T01:42:25.376 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\
hive/coverage timeout 120 ceph --cluster ceph osd pool get backup_fs_metadata pg_num
55907 2024-02-21T01:42:25.407 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.403+0000 7f1da70d9180 -1 mgr[py] Module pg_\
autoscaler has missing NOTIFY_TYPES member
55908 2024-02-21T01:42:25.459 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.455+0000 7f1da70d9180 -1 mgr[py] Module zab\
bix has missing NOTIFY_TYPES member
55909 2024-02-21T01:42:25.608 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.603+0000 7f1da70d9180 -1 mgr[py] Module nfs\
has missing NOTIFY_TYPES member
55910 2024-02-21T01:42:25.661 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:25.659+0000 7f1da70d9180 -1 mgr[py] Module bal\
ancer has missing NOTIFY_TYPES member
55949 2024-02-21T01:42:26.940 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\
hive/coverage timeout 120 ceph --cluster ceph osd pool get .mgr pg_num
55950 2024-02-21T01:42:26.947 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.943+0000 7f1da70d9180 -1 mgr[py] Module sta\
tus has missing NOTIFY_TYPES member
55951 2024-02-21T01:42:27.104 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.099+0000 7f1da70d9180 -1 mgr[py] Module vol\
umes has missing NOTIFY_TYPES member
55952 2024-02-21T01:42:27.182 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.179+0000 7f1da70d9180 -1 mgr[py] Module osd\
_perf_query has missing NOTIFY_TYPES member
55953 2024-02-21T01:42:27.230 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 1
55954 2024-02-21T01:42:27.249 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\
hive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_metadata pg_num
55955 2024-02-21T01:42:27.483 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.479+0000 7f1da70d9180 -1 mgr[py] Module pro\
metheus has missing NOTIFY_TYPES member
55956 2024-02-21T01:42:27.540 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.535+0000 7f1da70d9180 -1 mgr[py] Module sna\
p_schedule has missing NOTIFY_TYPES member
55957 2024-02-21T01:42:27.590 INFO:teuthology.orchestra.run.smithi007.stdout:pg_num: 32
55958 2024-02-21T01:42:27.608 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\
hive/coverage timeout 120 ceph --cluster ceph osd pool get cephfs_data pg_num
55959 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55960 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55961 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55962 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55963 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d325f1640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55964 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55965 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55966 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55967 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55968 2024-02-21T01:42:27.640 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.635+0000 7f1d2f5eb640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55969 2024-02-21T01:42:27.658 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.655+0000 7f1d2f5eb640 -1 client.0 error reg\
istering admin socket command: (17) File exists
55970 2024-02-21T01:42:27.658 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:27.655+0000 7f1d2f5eb640 -1 client.0 error reg\
istering admin socket command: (17) File exists
blocklist ls shows:
55919 2024-02-21T01:42:26.064 DEBUG:teuthology.orchestra.run.smithi007:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/arc\
hive/coverage timeout 120 ceph --cluster ceph osd blocklist ls
55920 2024-02-21T01:42:26.153 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.151+0000 7f1da70d9180 -1 mgr[py] Module rbd\
_support has missing NOTIFY_TYPES member
55921 2024-02-21T01:42:26.219 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.215+0000 7f1da70d9180 -1 mgr[py] Module ale\
rts has missing NOTIFY_TYPES member
55922 2024-02-21T01:42:26.359 INFO:tasks.ceph.mgr.x.smithi007.stderr:2024-02-21T01:42:26.355+0000 7f1da70d9180 -1 mgr[py] Module tes\
t_orchestrator has missing NOTIFY_TYPES member
55923 2024-02-21T01:42:26.364 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/1325339273 2024-02-21T02:42:15.181435+000\
0
55924 2024-02-21T01:42:26.364 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/574585810 2024-02-22T01:37:57.704079+0000
55925 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/3959480242 2024-02-22T01:37:57.704079+000\
0
55926 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/1985957039 2024-02-22T01:37:57.704079+000\
0
55927 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/2806878632 2024-02-22T01:37:57.704079+000\
0
55928 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/840312218 2024-02-22T01:37:57.704079+0000
55929 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/1395945597 2024-02-22T01:37:57.704079+000\
0
55930 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/2862425469 2024-02-22T01:37:57.704079+000\
0
55931 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:0/4005849246 2024-02-22T01:37:57.704079+000\
0
55932 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:6825/841721431 2024-02-22T01:37:57.704079+0\
000
55933 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stdout:172.21.15.7:6824/841721431 2024-02-22T01:37:57.704079+0\
000
55934 2024-02-21T01:42:26.365 INFO:teuthology.orchestra.run.smithi007.stderr:listed 11 entries
Updated by Jos Collin over 1 year ago
- Status changed from New to In Progress
As this doesn't happen always (only in 100 job runs) and the logs seems similar, let's reproduce this when the current cephfs-mirror fixes are merged.
Updated by Milind Changire about 1 year ago
2025-01-09T14:11:21.595 INFO:tasks.cephfs_test_runner:test_cephfs_mirror_cancel_sync (tasks.cephfs.test_mirroring.TestMirroring) ... FAIL
2025-01-09T14:11:21.595 INFO:tasks.cephfs_test_runner:
2025-01-09T14:11:21.595 INFO:tasks.cephfs_test_runner:======================================================================
2025-01-09T14:11:21.595 INFO:tasks.cephfs_test_runner:FAIL: test_cephfs_mirror_cancel_sync (tasks.cephfs.test_mirroring.TestMirroring)
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_5c75cd1fe96d802206a9e78c50b744b9025e749d/qa/tasks/ceph>
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner: self.assertTrue('snap0' not in snap_list)
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner:AssertionError: False is not true
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner:
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner:Ran 8 tests in 1986.652s
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner:
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner:FAILED (failures=1)
2025-01-09T14:11:21.596 INFO:tasks.cephfs_test_runner:
Updated by Jos Collin 8 months ago ยท Edited
- Status changed from In Progress to Resolved
- Pull request ID set to 62250
This sync issue should be resolved by the latest changes in do_synchronize: https://github.com/ceph/ceph/pull/62250 and it's not observed in a 100 times run of fs:mirror suite: https://pulpito.ceph.com/jcollin-2025-07-28_05:34:22-fs:mirror-wip-B64534-check-distro-default-smithi/. The failed jobs are unrelated and tracked separately here https://tracker.ceph.com/issues/68187. So closing this for now.
Updated by Upkeep Bot 8 months ago
- Merge Commit set to 53e32489fb72b7e12c93b7c413dee198cc1fbae5
- Fixed In set to v20.3.0-302-g53e32489fb
- Upkeep Timestamp set to 2025-07-28T09:43:09+00:00