unittest_osdmap aborted during OSDMapTest.BUG_42485 #57988

Merged
ljflores merged 1 commit into ceph:main from mohit84:issue_62934
Jun 17, 2024
Conversation

@mohit84
Contributor

@mohit84 mohit84 commented Jun 12, 2024

The test case aborts in the clean_upmap_tp thread. The function clean_pg_upmaps spawns a number of worker threads to process PGMapper jobs. Each worker thread fetches a job from the queue, processes it by calling _process, and then calls _process_finish on the same job. The _process function of the PGMapper class destroys the job object, so when the worker thread goes on to call _process_finish, it crashes because the job pointer has become a dangling pointer.

Solution: To avoid the crash, destroy the object in _process_finish
instead of in _process.
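
A minimal sketch of the ownership change described above, not the actual Ceph code. `Job`, `FixedQueue`, and `run_one_job` are hypothetical names used only for illustration; the real hooks also take a `TPHandle` and run under the thread pool lock, which is omitted here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a PGMapper job; not the real Ceph type.
struct Job {
  std::vector<std::string>* log;
  explicit Job(std::vector<std::string>* l) : log(l) {}
  ~Job() { log->push_back("destroyed"); }
};

// Hypothetical queue that follows the fix: _process only does the work,
// and _process_finish is the single place where the job is destroyed.
struct FixedQueue {
  std::vector<std::string>* log;

  void _process(Job* job) {
    (void)job;  // the job would be worked on here
    log->push_back("process");
    // Before the fix, the job was destroyed at this point, so the pointer
    // the worker still held was dangling when _process_finish ran.
  }

  void _process_finish(Job* job) {
    log->push_back("finish");
    delete job;  // last use of the pointer, so destroying it here is safe
  }
};

// Mimics the worker's call order: _process first, then _process_finish
// with the same pointer.
std::vector<std::string> run_one_job() {
  std::vector<std::string> log;
  FixedQueue q{&log};
  Job* job = new Job(&log);
  q._process(job);
  q._process_finish(job);
  return log;
}
```

Calling `run_one_job()` records the order of events; with the destruction moved into `_process_finish`, the job outlives both hook calls.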

Fixes: https://tracker.ceph.com/issues/62934

Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
@mohit84 mohit84 requested a review from a team as a code owner June 12, 2024 11:58
@github-actions github-actions bot added the core label Jun 12, 2024
Contributor

@rzarzynski rzarzynski left a comment

LGTM. I think the relevant code is:

  template<class T>
  class WorkQueue : public WorkQueue_ {
    // ...
    void _void_process(void *p, TPHandle &handle) override {
      _process(static_cast<T *>(p), handle);
    }
    void _void_process_finish(void *p) override {
      _process_finish(static_cast<T *>(p));
    }
    // ...

  void ThreadPool::worker(WorkThread *wt)
  {
    // ...
          wq->_void_process(item, tp_handle);
          ul.lock();
          wq->_void_process_finish(item);
          processing--;
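
To make the point of the excerpt concrete, here is a simplified model of the type-erased dispatch. It only assumes what the quoted lines show: the worker passes the same `item` pointer to both hooks. `QueueBase`, `TypedQueue`, and `run_item` are hypothetical names, and the `TPHandle` argument and locking are left out.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical type-erased interface, modeled on the two hooks quoted above.
struct QueueBase {
  virtual ~QueueBase() = default;
  virtual void _void_process(void* p) = 0;
  virtual void _void_process_finish(void* p) = 0;
};

// Hypothetical typed queue: records the address each hook receives and
// destroys the item only in _process_finish.
template <class T>
struct TypedQueue : QueueBase {
  std::vector<std::uintptr_t> seen;

  void _process(T* item) {
    seen.push_back(reinterpret_cast<std::uintptr_t>(item));
  }
  void _process_finish(T* item) {
    seen.push_back(reinterpret_cast<std::uintptr_t>(item));
    delete item;  // safe: nothing touches the item after this hook
  }

  void _void_process(void* p) override { _process(static_cast<T*>(p)); }
  void _void_process_finish(void* p) override {
    _process_finish(static_cast<T*>(p));
  }
};

// One iteration of a simplified worker loop: both hooks get the same pointer.
inline void run_item(QueueBase& wq, void* item) {
  wq._void_process(item);
  wq._void_process_finish(item);
}
```

Because `_void_process_finish` always receives the pointer that `_void_process` just handled, any destruction inside `_process` leaves the second call working on freed memory.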

@rzarzynski
Contributor

I think this is all about a problem that is exercised by the make check bot, so we don't need a teuthology run.
Let's merge after all the required checks are green.

@ljflores
Member

jenkins test api

@ljflores
Member

jenkins test make check

@Svelar
Member

Svelar commented Jun 13, 2024

jenkins test make check arm64

@Svelar
Member

Svelar commented Jun 13, 2024

jenkins test api

@ljflores ljflores merged commit cd090c6 into ceph:main Jun 17, 2024
4 participants