Bug #65770
qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'}
Description
This issue has been seen in QA runs for a couple of months, but it was incorrectly marked as a known issue. https://tracker.ceph.com/issues/54108 was used to mark it known, but it can't be that issue since a patch fixing it was merged 2 years ago.
Following are some instances where this issue occurred:
https://pulpito.ceph.com/pdonnell-2024-04-30_05:04:19-fs-wip-pdonnell-testing-20240429.210911-debug-distro-default-smithi/7680522
https://pulpito.ceph.com/pdonnell-2024-04-30_05:04:19-fs-wip-pdonnell-testing-20240429.210911-debug-distro-default-smithi/7680641
https://pulpito.ceph.com/pdonnell-2024-04-20_23:33:17-fs-wip-pdonnell-testing-20240420.180737-debug-distro-default-smithi/7666020
https://pulpito.ceph.com/pdonnell-2024-04-02_11:52:43-fs-wip-batrick-testing-20240402.004512-distro-default-smithi/7635633
Updated by Kotresh Hiremath Ravishankar almost 2 years ago
- Assignee set to Jos Collin
Updated by Venky Shankar almost 2 years ago
Jos, start by checking whether the workload isn't heavy enough to trigger subtree export/import (which would then update the respective perf counters). If that's the case, check-counters would trip since it expects these counters to be present in `perf dump`. Also, do the same for the other counters that show up in failed runs.
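The failure mode described above can be illustrated with a minimal sketch (this is not the actual check_counter.py code; the function and sample data are hypothetical): the QA task expects certain counters to appear in `perf dump` with a nonzero value, and a workload too light to trigger any subtree export/import leaves them at 0.

```python
# Minimal sketch (assumed semantics, not the real check_counter.py) of the
# kind of check the check-counters task performs on an MDS `perf dump`.

def missing_counters(perf_dump, expected):
    """Return the expected counters that are absent or still zero."""
    mds = perf_dump.get("mds", {})
    return {name for name in expected if mds.get(name, 0) == 0}

# A workload too light to trigger a subtree export/import leaves both at 0,
# so the check trips on exactly these counters:
dump = {"mds": {"exported": 0, "imported": 0, "request": 1234}}
print(sorted(missing_counters(dump, {"exported", "imported"})))
# → ['exported', 'imported']
```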
Updated by Venky Shankar almost 2 years ago
Updated by Jos Collin almost 2 years ago
Venky Shankar wrote in #note-2:
Jos, start by checking whether the workload isn't heavy enough to trigger subtree export/import (which would then update the respective perf counters). If that's the case, check-counters would trip since it expects these counters to be present in `perf dump`. Also, do the same for the other counters that show up in failed runs.
I'll check that. check_counter.py and l_mds_imported/l_mds_exported have been there for a long time, so I'm wondering what suddenly made the workload not heavy enough.
Updated by Jos Collin almost 2 years ago · Edited
Updated by Venky Shankar over 1 year ago
Jos Collin wrote in #note-4:
Venky Shankar wrote in #note-2:
Jos, start by checking whether the workload isn't heavy enough to trigger subtree export/import (which would then update the respective perf counters). If that's the case, check-counters would trip since it expects these counters to be present in `perf dump`. Also, do the same for the other counters that show up in failed runs.
I'll check that. check_counter.py and l_mds_imported/l_mds_exported have been there for a long time, so I'm wondering what suddenly made the workload not heavy enough.
Any update on this @Jos Collin ?
Updated by Jos Collin over 1 year ago
Venky Shankar wrote in #note-7:
Jos Collin wrote in #note-4:
Venky Shankar wrote in #note-2:
Jos, start by checking whether the workload isn't heavy enough to trigger subtree export/import (which would then update the respective perf counters). If that's the case, check-counters would trip since it expects these counters to be present in `perf dump`. Also, do the same for the other counters that show up in failed runs.
I'll check that. check_counter.py and l_mds_imported/l_mds_exported have been there for a long time, so I'm wondering what suddenly made the workload not heavy enough.
Any update on this @Jos Collin ?
Not yet. Will continue working on this soon.
Updated by Venky Shankar over 1 year ago
Updated by Venky Shankar over 1 year ago
- Category set to Correctness/Safety
- Target version set to v20.0.0
- Source set to Q/A
- Backport set to quincy,reef,squid
Updated by Jos Collin over 1 year ago
- Status changed from In Progress to Duplicate
This is a duplicate of https://tracker.ceph.com/issues/67360. Closing this as https://tracker.ceph.com/issues/67360 contains better debug info.
Updated by Rishabh Dave about 1 year ago · Edited
Jos Collin wrote in #note-11:
This is a duplicate of https://tracker.ceph.com/issues/67360. Closing this as https://tracker.ceph.com/issues/67360 contains better debug info.
@Jos Collin This ticket was marked duplicate, but we never stopped seeing this failure in QA runs even though the PR for #67360 has been merged. Perhaps this was a separate issue.
Updated by Venky Shankar about 1 year ago
- Related to Bug #69665: qa: The following counters failed to be set on mds daemons: {'mds.exported', 'mds.imported'} added
Updated by Venky Shankar about 1 year ago
Updated by Jos Collin about 1 year ago
- Status changed from Duplicate to In Progress
- Pull request ID set to 62247
Updated by Jos Collin about 1 year ago
- Status changed from In Progress to Fix Under Review
Updated by Jos Collin 9 months ago
- Status changed from Fix Under Review to In Progress
- Pull request ID deleted (62247)
The test never hits Migrator::handle_export_dir and Migrator::export_go_synced, where the l_mds_imported and l_mds_exported counters are incremented. So they remain 0, and the counter check fails because they are never `seen`: https://github.com/ceph/ceph/blob/main/qa/tasks/check_counter.py#L141.
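The "seen" logic can be sketched roughly as follows (assumed semantics, not the actual check_counter.py code; the function and sample data are illustrative): a counter only counts as seen once it is observed with a nonzero value at some point during the run.

```python
# Rough sketch of the assumed "seen" semantics: a counter is marked seen
# only if some perf-dump sample ever reports it nonzero.

def unseen_counters(samples, expected):
    """Return expected counters never observed nonzero across the samples."""
    seen = set()
    for sample in samples:
        for name in expected:
            if sample.get(name, 0) > 0:
                seen.add(name)
    return sorted(set(expected) - seen)

# If Migrator::handle_export_dir / export_go_synced are never reached,
# every sample reports 0 and both counters stay unseen:
samples = [{"mds.exported": 0, "mds.imported": 0}] * 3
print(unseen_counters(samples, {"mds.exported", "mds.imported"}))
# → ['mds.exported', 'mds.imported']
```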
Updated by Jos Collin 8 months ago
@Venky Shankar
This is caused by qa/suites/fs/workload/ranks/multi/export-check.yaml. It is the only yaml in which max_mds is not set.
Updated by Venky Shankar 8 months ago
Jos Collin wrote in #note-18:
@Venky Shankar
This is caused by qa/suites/fs/workload/ranks/multi/export-check.yaml. It is the only yaml in which max_mds is not set.
qa/tasks/check_counter.py should get the counters from all active MDSs. Have you checked why directories aren't exported to other ranks? Maybe it's related to the balancer configuration (random, etc.), which might be causing the directories to not be exported. Note that the default balancer is disabled and we specifically turn on a balancer in QA. E.g.: ranks/multi/{balancer/random
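The first point above can be sketched as follows (a hypothetical illustration, not code from check_counter.py; the function and numbers are made up): an export is recorded on the source rank and the matching import on the destination rank, so the check has to aggregate counters across every active MDS rather than look at a single daemon.

```python
# Hypothetical sketch: sum the named counters over per-rank perf dumps,
# since export/import activity is split across the participating ranks.

def aggregate_counters(per_rank_dumps, names):
    """Sum the named counters over per-rank perf-dump dicts."""
    totals = {n: 0 for n in names}
    for dump in per_rank_dumps:
        for n in names:
            totals[n] += dump.get(n, 0)
    return totals

# Example (made-up numbers): rank 0 exported a subtree, rank 1 imported it.
rank0 = {"mds.exported": 1, "mds.imported": 0}
rank1 = {"mds.exported": 0, "mds.imported": 1}
print(aggregate_counters([rank0, rank1], ["mds.exported", "mds.imported"]))
# → {'mds.exported': 1, 'mds.imported': 1}
```

Looking at either rank alone would report one of the two counters as zero; only the aggregate shows both were set.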
Updated by Jos Collin 8 months ago
Update:
From the logs where mds.imported/mds.exported failed to be set, all the failed jobs were run with fs/workload/ranks/multi/balancer/random.yaml.
Updated by Venky Shankar 8 months ago
- Has duplicate Bug #70990: qa: The following counters failed to be set on mds daemons: {'mds.imported', 'mds.exported'} added
Updated by Venky Shankar 8 months ago
- Status changed from In Progress to Fix Under Review
- Backport changed from quincy,reef,squid to tentacle,squid,reef
Updated by Jos Collin 8 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot 8 months ago
- Merge Commit set to ac782011e8d37d848c123b6e2f85a0ea6a10cc27
- Fixed In set to v20.3.0-1773-gac782011e8
- Upkeep Timestamp set to 2025-07-18T13:02:22+00:00
Updated by Upkeep Bot 8 months ago
- Copied to Backport #72184: tentacle: qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'} added
Updated by Upkeep Bot 8 months ago
- Copied to Backport #72185: squid: qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'} added
Updated by Upkeep Bot 8 months ago
- Copied to Backport #72186: reef: qa: failed to be set on mds daemons: {'mds.imported', 'mds.exported'} added