Bug #67360

open

The following counters failed to be set on mds daemons: {'mds_server.req_rmsnap_latency.avgcount'}

Added by Venky Shankar over 1 year ago. Updated 5 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: Testing
Target version:
% Done: 0%
Source: Q/A
Backport: reef,squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS): qa, qa-failure
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v19.3.0-6328-g7fdc8a7a25
Released In: v20.2.0~1538
Upkeep Timestamp: 2025-11-01T01:02:48+00:00

Description

/a/vshankar-2024-07-31_11:42:32-fs-wip-vshankar-testing-20240730.074544-debug-testing-default-smithi/7828326

2024-07-31T19:52:06.272+0000 7f90021f6640  5 mds.beacon.l Sending beacon up:standby seq 244
2024-07-31T19:52:06.272+0000 7f90021f6640  1 -- [v2:172.21.15.150:6832/691845093,v1:172.21.15.150:6833/691845093] --> [v2:172.21.15.150:3300/0,v1:172.21.15.150:6789/0] -- mdsbeacon(24532/l up:standby seq=244 v2) -- 0x561e28f55500 con 0x561e29091180
2024-07-31T19:52:06.272+0000 7f90021f6640 20 mds.beacon.l sender thread waiting interval 4s
2024-07-31T19:52:06.272+0000 7f9007200640  2 --2- [v2:172.21.15.150:6832/691845093,v1:172.21.15.150:6833/691845093] >> [v2:172.21.15.150:3300/0,v1:172.21.15.150:6789/0] conn(0x561e29091180 0x561e29c7e000 secure :-1 s=READY pgs=51 cs=0 l=1 rev1=1 crypto rx=0x561e29cdd530 tx=0x561e290971a0 comp rx=0 tx=0).write_message sending message m=0x561e28f55500 seq=259 mdsbeacon(24532/l up:standby seq=244 v2)
2024-07-31T19:52:06.273+0000 7f90041fa640  1 -- [v2:172.21.15.150:6832/691845093,v1:172.21.15.150:6833/691845093] <== mon.1 v2:172.21.15.150:3300/0 270 ==== mdsbeacon(24532/l up:standby seq=244 v2) ==== 130+0+0 (secure 0 0 0) 0x561e28f55500 con 0x561e29091180
2024-07-31T19:52:06.273+0000 7f90041fa640  5 mds.beacon.l received beacon reply up:standby seq 244 rtt 0.000999981
2024-07-31T19:52:06.273+0000 7f90021f6640 20 mds.beacon.l sender thread waiting interval 3.999s
2024-07-31T19:52:09.218+0000 7f90019f5640  1 -- [v2:172.21.15.150:6832/691845093,v1:172.21.15.150:6833/691845093] --> [v2:172.21.15.88:6800/1827242782,v1:172.21.15.88:6801/1827242782] -- mgrreport(unknown.l +0-0 packed 6) -- 0x561e29d29c00 con 0x561e29d0cd80
2024-07-31T19:52:09.218+0000 7f90069ff640  2 --2- [v2:172.21.15.150:6832/691845093,v1:172.21.15.150:6833/691845093] >> [v2:172.21.15.88:6800/1827242782,v1:172.21.15.88:6801/1827242782] conn(0x561e29d0cd80 0x561e29c7f600 secure :-1 s=READY pgs=317 cs=0 l=1 rev1=1 crypto rx=0x561e29cdd6b0 tx=0x561e24ef8720 comp rx=0 tx=0).write_message sending message m=0x561e29d29c00 seq=17 mgrreport(mds.l +0-0 packed 6)
2024-07-31T19:52:10.273+0000 7f90021f6640  5 mds.beacon.l Sending beacon up:standby seq 245

perf dump was executed at 2024-07-31T19:52:07.448

2024-07-31T19:52:07.448 DEBUG:teuthology.orchestra.run.smithi150:> sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:ce234b0a267c47e2aea5499156e0aa83fa2b9bb6 shell --fsid de017642-4f72-11ef-bcc9-c7b262605968 -- ceph daemon mds.l perf dump

Since the MDS state is up:standby, certain counters will not be reported in perf dump, thereby causing task/counter to report the following (and there aren't any MDS logs ~19:52:07)

2024-07-31T19:52:09.791 INFO:journalctl@ceph.mon.c.smithi150.stdout:Jul 31 19:52:09 smithi150 ceph-mon[36797]: osdmap e90: 12 total, 12 up, 12 in
2024-07-31T19:52:10.150 INFO:journalctl@ceph.mon.a.smithi088.stdout:Jul 31 19:52:09 smithi088 ceph-mon[32742]: osdmap e90: 12 total, 12 up, 12 in
2024-07-31T19:52:10.163 WARNING:tasks.check_counter:Counter 'mds.exported' not found on daemon mds.l
2024-07-31T19:52:10.163 WARNING:tasks.check_counter:Counter 'mds.imported' not found on daemon mds.l
2024-07-31T19:52:10.163 WARNING:tasks.check_counter:Counter 'mds_cache.dir_update' not found on daemon mds.l
2024-07-31T19:52:10.163 WARNING:tasks.check_counter:Counter 'mds_cache.dir_update_receipt' not found on daemon mds.l
2024-07-31T19:52:10.163 WARNING:tasks.check_counter:Counter 'mds.root_rsnaps' not found on daemon mds.l
2024-07-31T19:52:10.163 WARNING:tasks.check_counter:Counter 'mds_server.req_mksnap_latency.avgcount' not found on daemon mds.l
2024-07-31T19:52:10.163 WARNING:tasks.check_counter:Counter 'mds_server.req_rmsnap_latency.avgcount' not found on daemon mds.l
2024-07-31T19:52:10.163 WARNING:tasks.check_counter:Counter 'mds.dir_split' not found on daemon mds.l
2024-07-31T19:52:10.164 ERROR:teuthology.run_tasks:Manager failed: check-counter
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_53ce1462e129f6eb4071986336534c740fdebd31/teuthology/run_tasks.py", line 154, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_teuthology_53ce1462e129f6eb4071986336534c740fdebd31/teuthology/task/__init__.py", line 132, in __exit__
    self.end()
  File "/home/teuthworker/src/github.com_vshankar_ceph_765d6d11439c6f768c7b822a6dcb95d1138473e3/qa/tasks/check_counter.py", line 125, in end
    raise RuntimeError("The following counters failed to be set " 
RuntimeError: The following counters failed to be set on mds daemons: {'mds_server.req_rmsnap_latency.avgcount'}

Related issues 2 (1 open, 1 closed)

Copied to CephFS - Backport #69141: reef: The following counters failed to be set on mds daemons: {'mds_server.req_rmsnap_latency.avgcount'} (Fix Under Review, Jos Collin)
Copied to CephFS - Backport #69142: squid: The following counters failed to be set on mds daemons: {'mds_server.req_rmsnap_latency.avgcount'} (Resolved, Jos Collin)

Actions #1

Updated by Jos Collin over 1 year ago

Venky Shankar wrote:

Since the MDS state is up:standby, certain counters will not be reported in perf dump, thereby causing task/counter to report the following (and there aren't any MDS logs ~19:52:07)

[...]

Yes, the 'perf dump' doesn't have mds, mds_server, mds_cache sections. Need to check why.
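
To make the observation above concrete, here is a minimal sketch of how one might report which top-level sections are absent from a perf dump. It assumes the JSON text of `ceph daemon mds.<id> perf dump` has already been captured as a string; the helper name `missing_sections` and the sample input are hypothetical and are not taken from the failed run.

```python
import json

# Sections that this comment reports as absent from the standby MDS's perf dump.
REQUIRED_SECTIONS = ("mds", "mds_server", "mds_cache")


def missing_sections(perf_dump_json, required=REQUIRED_SECTIONS):
    """Return the required top-level sections that the perf dump lacks.

    perf_dump_json is the raw JSON text of a perf dump (assumed to be a
    JSON object keyed by section name).
    """
    dump = json.loads(perf_dump_json)
    return [section for section in required if section not in dump]


# Made-up input for illustration only: a dump that carries just one of
# the three sections.
sample = '{"mds_cache": {"dir_update": 0}}'
print(missing_sections(sample))
```

A non-empty result for a given daemon would indicate that counters in those sections cannot be found on it, regardless of the workload.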

Actions #2

Updated by Milind Changire over 1 year ago

main: https://pulpito.ceph.com/mchangir-2024-08-02_07:51:06-fs-wip-mchangir-uninline-debug-distro-default-smithi/7831925

The following counters failed to be set on mds daemons: {'mds.imported', 'mds.exported'}

Actions #3

Updated by Venky Shankar over 1 year ago

Jos Collin wrote in #note-1:

Venky Shankar wrote:

Since the MDS state is up:standby, certain counters will not be reported in perf dump, thereby causing task/counter to report the following (and there aren't any MDS logs ~19:52:07)

[...]

Yes, the 'perf dump' doesn't have mds, mds_server, mds_cache sections. Need to check why.

Maybe task/counters has a racy check that picks an active mds and then when fetching the perf counters, the MDS respawns as standby-replay. That might be a plausible cause. Please check that.

Actions #4

Updated by Jos Collin over 1 year ago

Venky Shankar wrote in #note-3:

Jos Collin wrote in #note-1:

Venky Shankar wrote:

Since the MDS state is up:standby, certain counters will not be reported in perf dump, thereby causing task/counter to report the following (and there aren't any MDS logs ~19:52:07)

[...]

Yes, the 'perf dump' doesn't have mds, mds_server, mds_cache sections. Need to check why.

Maybe task/counters has a racy check that picks an active mds and then when fetching the perf counters, the MDS respawns as standby-replay. That might be a plausible cause. Please check that.

It looks like the task doesn't check for the active MDS. It takes the MDS list from https://github.com/ceph/ceph/blob/main/qa/cephfs/clusters/1a11s-mds-1c-client-3node.yaml, calls 'perf dump' on each MDS in turn, and continues when a counter is not found on a daemon, until it reaches 'mds.l'. I'll verify this against qa/tasks/check_counter.py.
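
The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in qa/tasks/check_counter.py: the function names `lookup_counter` and `find_unset_counters` are hypothetical, and the in-memory `perf_dumps` mapping stands in for the parsed output of 'perf dump' on each daemon.

```python
def lookup_counter(perf_dump, dotted_name):
    """Resolve 'section.counter[.field]' in a perf dump dict.

    Returns None when any path component is missing.
    """
    node = perf_dump
    for key in dotted_name.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def find_unset_counters(perf_dumps, expected):
    """Return the expected counters that no daemon reported as non-zero.

    perf_dumps maps a daemon name to its parsed perf dump (hypothetical
    stand-in for running 'perf dump' on each MDS in the list).
    """
    seen = set()
    for daemon, dump in perf_dumps.items():
        for name in expected:
            value = lookup_counter(dump, name)
            if value is None:
                # Counter absent on this daemon (e.g. a standby): warn and move on.
                print(f"Counter '{name}' not found on daemon {daemon}")
                continue
            if isinstance(value, (int, float)) and value > 0:
                seen.add(name)
    return set(expected) - seen


# Made-up data: one daemon with counters, one daemon with none.
dumps = {
    "mds.a": {
        "mds": {"exported": 2},
        "mds_server": {"req_rmsnap_latency": {"avgcount": 0}},
    },
    "mds.l": {},
}
wanted = ["mds.exported", "mds_server.req_rmsnap_latency.avgcount"]
print(find_unset_counters(dumps, wanted))
```

In this sketch a daemon that lacks a counter only produces a warning. The final failure depends on whether any daemon at all reported a non-zero value, which is consistent with the log above, where eight counters are warned about on mds.l but only one appears in the RuntimeError.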

Actions #5

Updated by Jos Collin over 1 year ago

  • Status changed from New to In Progress
  • Pull request ID set to 59383
Actions #6

Updated by Jos Collin over 1 year ago

  • Status changed from In Progress to Fix Under Review
Actions #7

Updated by Jos Collin over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
Actions #8

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #69141: reef: The following counters failed to be set on mds daemons: {'mds_server.req_rmsnap_latency.avgcount'} added
Actions #9

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #69142: squid: The following counters failed to be set on mds daemons: {'mds_server.req_rmsnap_latency.avgcount'} added
Actions #10

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #15

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 7fdc8a7a254aa82df6291d8984341f482049e441
  • Fixed In set to v19.3.0-6328-g7fdc8a7a254
  • Upkeep Timestamp set to 2025-07-08T18:45:47+00:00
Actions #16

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-6328-g7fdc8a7a254 to v19.3.0-6328-g7fdc8a7a254a
  • Upkeep Timestamp changed from 2025-07-08T18:45:47+00:00 to 2025-07-14T15:46:04+00:00
Actions #17

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-6328-g7fdc8a7a254a to v19.3.0-6328-g7fdc8a7a25
  • Upkeep Timestamp changed from 2025-07-14T15:46:04+00:00 to 2025-07-14T21:10:17+00:00
Actions #18

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~1538
  • Upkeep Timestamp changed from 2025-07-14T21:10:17+00:00 to 2025-11-01T01:02:48+00:00