Bug #66758
closedBug #66714: osd/scheduler: add perfcounter for mclock
src/common: dead loop in PerfCountersCollectionImpl::add() when adding the same logger_ptr to logger_collection
100%
Description
discover from https://github.com/ceph/ceph/pull/58302
**unittest_mclock_scheduler** failed at here, while 1 dead loop: In unittest_mclock_scheduler, - multiple test cases use the same logger ADDR (ptr) - and add it to the logger collection of the same g_ceph_context cct - Caused a dead loop - I finally understood why perfcounter is hardly used in src/test
Solution: - I don't think it's mclock's fault. - I think it's a problem with perfcounter itself (add/remove...) - This PR does not introduce this issue, It should be introduced by e9ecd1b38447 @sseshasa - So I added ignore_perfcounter=true parameter to TEST
https://github.com/ceph/ceph/blob/main/src/common/perf_counters.cc#L41
```
void PerfCountersCollectionImpl::add(PerfCounters *l)
{
// make sure the name is unique
perf_counters_set_t::iterator i;
i = m_loggers.find(l); //ptr1
while (i != m_loggers.end()) { // dead loop here
ostringstream ss;
ss << l->get_name() << "-" << (void*)l;
l->set_name(ss.str());
i = m_loggers.find(l); //ptr1=ptr1, cannot be m_loggers.end()
}
m_loggers.insert(l);
... ...
}
```
https://github.com/ceph/ceph/blob/main/src/common/perf_counters.h#L359
https://github.com/ceph/ceph/blob/main/src/common/perf_counters.h#L312
```
class SortPerfCountersByName {
public:
bool operator()(const PerfCounters* lhs, const PerfCounters* rhs) const {
return (lhs->get_name() < rhs->get_name());
}
};
typedef std::set <PerfCounters*, SortPerfCountersByName> perf_counters_set_t; ///ptr set
//PerfCountersCollectionImpl
perf_counters_set_t m_loggers;
```
``` [----------] 5 tests from mClockSchedulerTest [ RUN ] mClockSchedulerTest.TestEmpty _init_logger zhangjianwei2 mclock-shard-queue-0 _init_logger zhangjianwei2 logger=0x5610ac84e0a0 //same addr (ptr) add zhangjianwei2 111111 add zhangjianwei2 121111 l=0x5610ac84e0a0 //first time add to logger collection add zhangjianwei2 131111 add zhangjianwei2 141111 add zhangjianwei2 151111 _init_logger zhangjianwei2 add logger=0x5610ac84e0a0 [ OK ] mClockSchedulerTest.TestEmpty (0 ms) [ RUN ] mClockSchedulerTest.TestSingleClientOrderedEnqueueDequeue _init_logger zhangjianwei2 mclock-shard-queue-0 _init_logger zhangjianwei2 logger=0x5610ac84e0a0 //same addr (ptr) add zhangjianwei2 111111 //PerfCountersCollectionImpl::add add zhangjianwei2 121111 l=0x5610ac84e0a0 //second time add to logger collection, dead loop here add zhangjianwei2 120111 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0 add zhangjianwei2 120211 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0 add zhangjianwei2 120111 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0 add zhangjianwei2 120211 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0 add zhangjianwei2 120111 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0-0x5610ac84e0a0 add zhangjianwei2 120211 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0-0x5610ac84e0a0 add zhangjianwei2 120111 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0-0x5610ac84e0a0-0x5610ac84e0a0 ```
Updated by Laura Flores over 1 year ago
@zhangjianwei do you plan to work on this? If not, we will assign it to someone else.
(note from bug scrub)
Updated by jianwei zhang over 1 year ago
Laura Flores wrote in #note-1:
@zhangjianwei do you plan to work on this? If not, we will assign it to someone else.
(note from bug scrub)
I'm not working for this issue,
You can assign it to someone else to solve,
Thanks
Updated by Radoslaw Zarzynski over 1 year ago
- Assignee set to MOHIT AGRAWAL
Hi @MOHIT AGRAWAL! Would you mind taking a look?
Updated by Radoslaw Zarzynski over 1 year ago ยท Edited
Hi @MOHIT AGRAWAL! I don't recall whether we already talked about this one offline. Would you mind refreshing my memory?
We did not discuss it but i will check the same and discuss with you.
Updated by MOHIT AGRAWAL over 1 year ago
The issue is resolve after apply the below patch on top of this https://github.com/ceph/ceph/pull/58302
diff --git a/src/dmclock b/src/dmclock
--- a/src/dmclock
+++ b/src/dmclock
@@ -1 +1 @@
-Subproject commit e4ccdcfa828c84b8ea775a928118f2b8012d0f42
+Subproject commit e4ccdcfa828c84b8ea775a928118f2b8012d0f42-dirty
diff --git a/src/osd/scheduler/mClockScheduler.cc b/src/osd/scheduler/mClockScheduler.cc
index 7efa46dacf2..5241cdcd382 100644
--- a/src/osd/scheduler/mClockScheduler.cc
+++ b/src/osd/scheduler/mClockScheduler.cc
@@ -697,6 +697,7 @@ mClockScheduler::~mClockScheduler()
{
cct->_conf.remove_observer(this);
if (logger) {
+ cct->get_perfcounters_collection()->remove(logger);
delete logger;
logger = nullptr;
}
diff --git a/src/test/osd/TestMClockScheduler.cc b/src/test/osd/TestMClockScheduler.cc
index 1499493159b..3f46d6e50ee 100644
--- a/src/test/osd/TestMClockScheduler.cc
+++ b/src/test/osd/TestMClockScheduler.cc
@@ -54,7 +54,7 @@ public:
is_rotational(false),
cutoff_priority(12),
monc(nullptr),
- init_perfcounter(false),
+ init_perfcounter(true),
q(g_ceph_context, whoami, num_shards, shard_id, is_rotational,
cutoff_priority, monc, init_perfcounter),
client1(1001),
Updated by MOHIT AGRAWAL over 1 year ago
- Status changed from New to Duplicate
- Parent task set to #66714
Updated by jianwei zhang over 1 year ago
- % Done changed from 0 to 100