Bug #65770

open

qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'}

Added by Rishabh Dave almost 2 years ago. Updated 8 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Q/A
Backport: tentacle,squid,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa, qa-failure
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v20.3.0-1773-gac782011e8
Released In:
Upkeep Timestamp: 2025-07-18T13:02:22+00:00


Related issues 5 (2 open, 3 closed)

Related to CephFS - Bug #69665: qa: The following counters failed to be set on mds daemons: {'mds.exported', 'mds.imported'} (Duplicate, Jos Collin)
Has duplicate CephFS - Bug #70990: qa: The following counters failed to be set on mds daemons: {'mds.imported', 'mds.exported'} (Duplicate, Jos Collin)
Copied to CephFS - Backport #72184: tentacle: qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'} (Resolved, Jos Collin)
Copied to CephFS - Backport #72185: squid: qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'} (Fix Under Review, Jos Collin)
Copied to CephFS - Backport #72186: reef: qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'} (Fix Under Review, Jos Collin)
Actions #1

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

  • Assignee set to Jos Collin
Actions #2

Updated by Venky Shankar almost 2 years ago

Jos, start by checking whether the workload isn't heavy enough to trigger subtree export/import (which would then update the respective perf counters). If that's the case, check-counters would trip, since it expects these counters to be present in `perf dump`. Also, do the same for other counters that show up in failed runs.

Actions #4

Updated by Jos Collin almost 2 years ago

I'll check that. check_counter.py and l_mds_imported/l_mds_exported have been there for a long time, so I'm wondering why the workload is suddenly not heavy enough.

Actions #5

Updated by Jos Collin almost 2 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Venky Shankar over 1 year ago

Any update on this @Jos Collin ?

Actions #8

Updated by Jos Collin over 1 year ago

Not yet. Will continue working on this soon.

Actions #10

Updated by Venky Shankar over 1 year ago

  • Category set to Correctness/Safety
  • Target version set to v20.0.0
  • Source set to Q/A
  • Backport set to quincy,reef,squid
Actions #11

Updated by Jos Collin over 1 year ago

  • Status changed from In Progress to Duplicate

This is a duplicate of https://tracker.ceph.com/issues/67360. Closing this one, as that issue contains better debug info.

Actions #12

Updated by Rishabh Dave about 1 year ago · Edited

@Jos Collin This ticket was marked as a duplicate, but we never stopped seeing this failure in QA runs even though the PR for #67360 has been merged. Perhaps this was a separate issue.

Actions #13

Updated by Venky Shankar about 1 year ago

  • Related to Bug #69665: qa: The following counters failed to be set on mds daemons: {'mds.exported', 'mds.imported'} added
Actions #15

Updated by Jos Collin about 1 year ago

  • Status changed from Duplicate to In Progress
  • Pull request ID set to 62247
Actions #16

Updated by Jos Collin about 1 year ago

  • Status changed from In Progress to Fix Under Review
Actions #17

Updated by Jos Collin 9 months ago

  • Status changed from Fix Under Review to In Progress
  • Pull request ID deleted (62247)

The test never reaches Migrator::handle_export_dir or Migrator::export_go_synced, where the l_mds_imported and l_mds_exported counters are incremented. Both counters therefore remain 0, and check_counter fails because they are never marked as `seen`: https://github.com/ceph/ceph/blob/main/qa/tasks/check_counter.py#L141.
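
To illustrate why a counter that stays at 0 is reported as not set, here is a simplified sketch of the "seen" bookkeeping. It is not the code from check_counter.py: the function names and the shape of the `perf_dumps` input (daemon id mapped to a parsed `perf dump`) are assumptions made for this example.

```python
from typing import Any, Dict, Iterable, Set


def lookup(dump: Dict[str, Any], dotted_name: str) -> Any:
    """Walk a nested perf-dump dict along a dotted path such as 'mds.exported'."""
    node: Any = dump
    for key in dotted_name.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # counter is absent from this dump
        node = node[key]
    return node


def unseen_counters(perf_dumps: Dict[str, Dict[str, Any]],
                    expected: Iterable[str]) -> Set[str]:
    """Return the expected counters that are zero or missing on every daemon.

    perf_dumps is a hypothetical mapping of daemon id -> parsed perf dump.
    """
    expected_set = set(expected)
    seen: Set[str] = set()
    for dump in perf_dumps.values():
        for name in expected_set:
            # A counter counts as seen only if some daemon reports a
            # truthy (non-zero) value; a value of 0 is treated like absence.
            if lookup(dump, name):
                seen.add(name)
    return expected_set - seen


# No rank ever exported or imported a subtree, so both counters stay at 0.
idle = {
    "a": {"mds": {"exported": 0, "imported": 0}},
    "b": {"mds": {"exported": 0, "imported": 0}},
}
print(sorted(unseen_counters(idle, ["mds.exported", "mds.imported"])))
```

In this sketch a single non-zero value on any daemon is enough to mark a counter as seen, so the failure only appears when no rank migrated a subtree at all during the run.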

Actions #18

Updated by Jos Collin 8 months ago

@Venky Shankar
This is caused by qa/suites/fs/workload/ranks/multi/export-check.yaml.
This is the only YAML in which max_mds is not set.
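
One way to test this hypothesis on a job description is sketched below. The key layout (`overrides.ceph.cephfs.max_mds` and `overrides.check-counter.counters.mds`) is an assumption about how such a fragment is structured and is not confirmed in this thread; the function names are hypothetical as well.

```python
from typing import Any, Dict, List, Optional


def dig(config: Dict[str, Any], *keys: str) -> Optional[Any]:
    """Follow a chain of keys through nested dicts; None if any key is missing."""
    node: Any = config
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def migration_counters_without_multimds(config: Dict[str, Any]) -> List[str]:
    """List migration counters a job expects while max_mds is unset or below 2.

    The key paths used here are assumed for illustration only.
    """
    counters = dig(config, "overrides", "check-counter", "counters", "mds") or []
    max_mds = dig(config, "overrides", "ceph", "cephfs", "max_mds")
    # Only plain string entries are considered in this sketch.
    names = {c for c in counters if isinstance(c, str)}
    wanted = names & {"mds.exported", "mds.imported"}
    if wanted and (max_mds is None or max_mds < 2):
        return sorted(wanted)
    return []


# A fragment that expects the migration counters but never sets max_mds.
job = {"overrides": {"check-counter": {"counters": {"mds": ["mds.exported", "mds.imported"]}}}}
print(migration_counters_without_multimds(job))
```

A non-empty result only flags a suspicious combination. It does not show that directories would actually be exported once max_mds is set, since that also depends on the balancer.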

Actions #19

Updated by Venky Shankar 8 months ago

qa/tasks/check_counter.py should get the counters from all active MDSs. Have you checked why directories aren't exported to other ranks? Maybe it's related to the balancer configuration (random, etc.), which might be causing the directories to not be exported. Note that the balancer is disabled by default and we specifically turn on a balancer in QA. E.g.: ranks/multi/{balancer/random

Actions #20

Updated by Jos Collin 8 months ago

Update:

In the logs of all the failed jobs where mds.imported/mds.exported failed to be set, the job was run with fs/workload/ranks/multi/balancer/random.yaml.

Actions #21

Updated by Venky Shankar 8 months ago

  • Has duplicate Bug #70990: qa: The following counters failed to be set on mds daemons: {'mds.imported', 'mds.exported'} added
Actions #22

Updated by Jos Collin 8 months ago

  • Pull request ID set to 64549
Actions #23

Updated by Venky Shankar 8 months ago

  • Status changed from In Progress to Fix Under Review
  • Backport changed from quincy,reef,squid to tentacle,squid,reef
Actions #24

Updated by Jos Collin 8 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #25

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to ac782011e8d37d848c123b6e2f85a0ea6a10cc27
  • Fixed In set to v20.3.0-1773-gac782011e8
  • Upkeep Timestamp set to 2025-07-18T13:02:22+00:00
Actions #26

Updated by Upkeep Bot 8 months ago

  • Copied to Backport #72184: tentacle: qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'} added
Actions #27

Updated by Upkeep Bot 8 months ago

  • Copied to Backport #72185: squid: qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'} added
Actions #28

Updated by Upkeep Bot 8 months ago

  • Copied to Backport #72186: reef: qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'} added
Actions #29

Updated by Upkeep Bot 8 months ago

  • Tags (freeform) set to backport_processed