Project

General

Profile

Actions

Bug #66758

closed

Bug #66714: osd/scheduler: add perfcounter for mclock

src/common: dead loop in PerfCountersCollectionImpl::add() when adding the same logger_ptr to logger_collection

Added by jianwei zhang over 1 year ago. Updated over 1 year ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
06/29/2024
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

discover from https://github.com/ceph/ceph/pull/58302

**unittest_mclock_scheduler**  failed at here,  while 1 dead loop:

In unittest_mclock_scheduler, 
- multiple test cases use the same logger ADDR (ptr)
- and add it to the logger collection of the same g_ceph_context cct
- Caused a dead loop
- I finally understood why perfcounter is hardly used in src/test

Solution:
- I don't think it's mclock's fault.
- I think it's a problem with perfcounter itself (add/remove...)
- This PR does not introduce this issue, It should be introduced by e9ecd1b38447  @sseshasa 
- So I added ignore_perfcounter=true parameter to TEST

https://github.com/ceph/ceph/blob/main/src/common/perf_counters.cc#L41

```
void PerfCountersCollectionImpl::add(PerfCounters *l)
{
  // make sure the name is unique
  perf_counters_set_t::iterator i;
  i = m_loggers.find(l);  //ptr1
  while (i != m_loggers.end()) {  // dead loop here
    ostringstream ss;
    ss << l->get_name() << "-" << (void*)l;
    l->set_name(ss.str());
    i = m_loggers.find(l); //ptr1=ptr1,  cannot be m_loggers.end()
  }

  m_loggers.insert(l);
  ... ...
}
```

https://github.com/ceph/ceph/blob/main/src/common/perf_counters.h#L359
https://github.com/ceph/ceph/blob/main/src/common/perf_counters.h#L312

```
class SortPerfCountersByName {
public:
  bool operator()(const PerfCounters* lhs, const PerfCounters* rhs) const {
    return (lhs->get_name() < rhs->get_name());
  }
};

typedef std::set <PerfCounters*, SortPerfCountersByName> perf_counters_set_t; ///ptr set

//PerfCountersCollectionImpl
  perf_counters_set_t m_loggers;
```


```
[----------] 5 tests from mClockSchedulerTest
[ RUN      ] mClockSchedulerTest.TestEmpty
_init_logger zhangjianwei2 mclock-shard-queue-0
_init_logger zhangjianwei2 logger=0x5610ac84e0a0  //same addr (ptr)
add zhangjianwei2 111111
add zhangjianwei2 121111 l=0x5610ac84e0a0          //first time add to logger collection
add zhangjianwei2 131111
add zhangjianwei2 141111
add zhangjianwei2 151111
_init_logger zhangjianwei2 add logger=0x5610ac84e0a0  
[       OK ] mClockSchedulerTest.TestEmpty (0 ms)
[ RUN      ] mClockSchedulerTest.TestSingleClientOrderedEnqueueDequeue
_init_logger zhangjianwei2 mclock-shard-queue-0
_init_logger zhangjianwei2 logger=0x5610ac84e0a0   //same addr (ptr)
add zhangjianwei2 111111                                            //PerfCountersCollectionImpl::add
add zhangjianwei2 121111 l=0x5610ac84e0a0           //second time add to logger collection, dead loop here 
add zhangjianwei2 120111 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0
add zhangjianwei2 120211 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0
add zhangjianwei2 120111 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0
add zhangjianwei2 120211 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0
add zhangjianwei2 120111 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0-0x5610ac84e0a0
add zhangjianwei2 120211 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0-0x5610ac84e0a0
add zhangjianwei2 120111 l=0x5610ac84e0a0 mclock-shard-queue-0-0x5610ac84e0a0-0x5610ac84e0a0-0x5610ac84e0a0-0x5610ac84e0a0
```
Actions #1

Updated by Laura Flores over 1 year ago

@zhangjianwei do you plan to work on this? If not, we will assign it to someone else.
(note from bug scrub)

Actions #2

Updated by jianwei zhang over 1 year ago

Laura Flores wrote in #note-1:

@zhangjianwei do you plan to work on this? If not, we will assign it to someone else.
(note from bug scrub)

I'm not working for this issue,

You can assign it to someone else to solve,

Thanks

Actions #3

Updated by Radoslaw Zarzynski over 1 year ago

  • Assignee set to MOHIT AGRAWAL

Hi @MOHIT AGRAWAL! Would you mind taking a look?

Actions #4

Updated by Radoslaw Zarzynski over 1 year ago ยท Edited

Hi @MOHIT AGRAWAL! I don't recall whether we already talked about this one offline. Would you mind refreshing my memory?
We did not discuss it but i will check the same and discuss with you.

Actions #5

Updated by MOHIT AGRAWAL over 1 year ago

The issue is resolve after apply the below patch on top of this https://github.com/ceph/ceph/pull/58302


diff --git a/src/dmclock b/src/dmclock
--- a/src/dmclock
+++ b/src/dmclock
@@ -1 +1 @@
-Subproject commit e4ccdcfa828c84b8ea775a928118f2b8012d0f42
+Subproject commit e4ccdcfa828c84b8ea775a928118f2b8012d0f42-dirty
diff --git a/src/osd/scheduler/mClockScheduler.cc b/src/osd/scheduler/mClockScheduler.cc
index 7efa46dacf2..5241cdcd382 100644
--- a/src/osd/scheduler/mClockScheduler.cc
+++ b/src/osd/scheduler/mClockScheduler.cc
@@ -697,6 +697,7 @@ mClockScheduler::~mClockScheduler()
 {
   cct->_conf.remove_observer(this);
   if (logger) {
+    cct->get_perfcounters_collection()->remove(logger);
     delete logger;
     logger = nullptr;
   }
diff --git a/src/test/osd/TestMClockScheduler.cc b/src/test/osd/TestMClockScheduler.cc
index 1499493159b..3f46d6e50ee 100644
--- a/src/test/osd/TestMClockScheduler.cc
+++ b/src/test/osd/TestMClockScheduler.cc
@@ -54,7 +54,7 @@ public:
     is_rotational(false),
     cutoff_priority(12),
     monc(nullptr),
-    init_perfcounter(false),
+    init_perfcounter(true),
     q(g_ceph_context, whoami, num_shards, shard_id, is_rotational,
       cutoff_priority, monc, init_perfcounter),
     client1(1001),

Actions #6

Updated by MOHIT AGRAWAL over 1 year ago

  • Status changed from New to Duplicate
  • Parent task set to #66714
Actions #7

Updated by jianwei zhang over 1 year ago

  • % Done changed from 0 to 100
Actions

Also available in: Atom PDF