Bug #51835

closed

mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)

Added by Neha Ojha over 4 years ago. Updated 8 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 100%
Pull request ID:
Tags (freeform):
Fixed In: v17.0.0-13048-g6d95d4383b
Released In: v18.2.0~2004
Upkeep Timestamp: 2025-07-14T17:44:11+00:00

Description

{
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "archived": "2021-04-28 11:23:38.548252",
    "crash_id": "2021-04-28T10:56:41.541431Z_ba9764d3-8860-427d-bc94-7f8333db7a3f",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7fc69e5f6b20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7fc69fa0b52d]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x2766f6) [0x7fc69fa0b6f6]",
        "(DaemonServer::got_service_map()+0xb2d) [0x55d30d90886d]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x1b6) [0x55d30d9374a6]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x894) [0x55d30d93a0e4]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xa5) [0x55d30d943695]",
        "(DispatchQueue::entry()+0x126a) [0x7fc69fc463fa]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7fc69fcf60f1]",
        "/lib64/libpthread.so.0(+0x814a) [0x7fc69e5ec14a]",
        "clone()" 
    ],
    "stack_sig": "3ec3e614ab29042467cd5a3774212f599db755986c38bc99d273a7070238f8dc",
    "timestamp": "2021-04-28T10:56:41.541431Z",
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.1/rpm/el8/BUILD/ceph-16.2.1/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7fc6961e4700 time 2021-04-28T10:56:41.536299+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.1/rpm/el8/BUILD/ceph-16.2.1/src/mgr/DaemonServer.cc: 2925: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "os_version": "8",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.1/rpm/el8/BUILD/ceph-16.2.1/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2925,
    "entity_name": "mgr.05334b5141c222302e04d9cc04d44e194e47a598",
    "ceph_version": "16.2.1",
    "process_name": "ceph-mgr",
    "os_version_id": "8",
    "utsname_machine": "x86_64",
    "utsname_release": "4.19.0-14-amd64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Debian 4.19.171-2 (2021-01-30)",
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_thread_name": "ms_dispatch" 
}

Related issues: 3 (0 open, 3 closed)

Related to mgr - Bug #48022: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) (Resolved)

Copied to mgr - Backport #56053: pacific: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) (Resolved, Mykola Golub)
Copied to mgr - Backport #56096: quincy: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) (Resolved, Mykola Golub)
#1

Updated by Neha Ojha over 4 years ago

  • Related to Bug #48022: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
#2

Updated by Neha Ojha over 4 years ago

This happened in a version that already includes the fix for https://tracker.ceph.com/issues/48022.

#3

Updated by Sage Weil over 4 years ago

  • Related to Bug #51929: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#4

Updated by Sage Weil over 4 years ago

  • Has duplicate Bug #51916: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#5

Updated by Sage Weil over 4 years ago

  • Related to Bug #51926: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#6

Updated by Sage Weil over 4 years ago

  • Has duplicate Bug #51922: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#7

Updated by Sage Weil over 4 years ago

  • Related to deleted (Bug #51926: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch))
#8

Updated by Sage Weil over 4 years ago

  • Has duplicate Bug #51926: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#9

Updated by Sage Weil over 4 years ago

  • Has duplicate Bug #51913: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#10

Updated by Sage Weil over 4 years ago

  • Related to deleted (Bug #51929: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch))
#11

Updated by Sage Weil over 4 years ago

  • Has duplicate Bug #51929: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#12

Updated by Sage Weil over 4 years ago

  • Has duplicate deleted (Bug #51916: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch))
#13

Updated by Sage Weil over 4 years ago

  • Has duplicate deleted (Bug #51922: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch))
#14

Updated by Sage Weil over 4 years ago

  • Has duplicate deleted (Bug #51926: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch))
#15

Updated by Sage Weil over 4 years ago

  • Has duplicate deleted (Bug #51929: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch))
#16

Updated by Sage Weil over 4 years ago

  • Has duplicate deleted (Bug #51913: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch))
#17

Updated by Sage Weil over 4 years ago

  • Status changed from New to Can't reproduce

My theory is that this affected the mgr daemon during the upgrade process, while the mon was still running octopus. (The fix for the original bug included both a mgr fix and a mon patch.) Let's see if this pops up again; if so, we can dig deeper into the raw telemetry to see whether it coincides with an upgrade.

#19

Updated by Neha Ojha over 4 years ago

  • Status changed from Can't reproduce to New
[ceph: root@magna031 /]# ceph crash info 2021-11-09T16:58:47.494357Z_13443875-1308-4c2a-8be8-9d0bfad08681
{
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/builddir/build/BUILD/ceph-16.2.6/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2934,
    "assert_msg": "/builddir/build/BUILD/ceph-16.2.6/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f438e385700 time 2021-11-09T16:58:47.491291+0000\n/builddir/build/BUILD/ceph-16.2.6/src/mgr/DaemonServer.cc: 2934: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7f43967a1b20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f4397970def]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x276fb8) [0x7f4397970fb8]",
        "(DaemonServer::got_service_map()+0xb2d) [0x55ceda2060ed]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x1b6) [0x55ceda234e36]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x894) [0x55ceda237b04]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xb0) [0x55ceda2416f0]",
        "(DispatchQueue::entry()+0x126a) [0x7f4397bae66a]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f4397c5ef81]",
        "/lib64/libpthread.so.0(+0x814a) [0x7f439679714a]",
        "clone()" 
    ],
    "ceph_version": "16.2.6-20.el8cp",
    "crash_id": "2021-11-09T16:58:47.494357Z_13443875-1308-4c2a-8be8-9d0bfad08681",
    "entity_name": "mgr.magna006.vxieja",
    "os_id": "rhel",
    "os_name": "Red Hat Enterprise Linux",
    "os_version": "8.4 (Ootpa)",
    "os_version_id": "8.4",
    "process_name": "ceph-mgr",
    "stack_sig": "cd47423882a28f890d30b420551e2a67e76e9ce5432da2dcbce2e69e2213a2bb",
    "timestamp": "2021-11-09T16:58:47.494357Z",
    "utsname_hostname": "magna006",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-305.el8.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Thu Apr 29 08:54:30 EDT 2021" 
}

This happened in 16.2.6, according to https://bugzilla.redhat.com/show_bug.cgi?id=1984881#c13

#20

Updated by Telemetry Bot about 4 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v16.1.0, v16.2.0, v16.2.1, v16.2.2, v16.2.4, v16.2.5, v16.2.6, v16.2.7 added

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=172fd5df40a73f4369e7240613305d57096a0a8bfae90c9e74170eff9b7065d8

Assert condition: pending_service_map.epoch > service_map.epoch
Assert function: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>

Sanitized backtrace:

    /lib64/libpthread.so.0(
    /usr/lib64/ceph/libceph-common.so.2(
    DaemonServer::got_service_map()
    Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)
    Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)
    MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)
    DispatchQueue::entry()
    DispatchQueue::DispatchThread::entry()
    /lib64/libpthread.so.0(
    clone()

Crash dump sample:
{
    "archived": "2022-03-05 08:06:21.796494",
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2934,
    "assert_msg": "mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7ff3f1a21700 time 2022-03-05T14:41:03.715505+0700\nmgr/DaemonServer.cc: 2934: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12c20) [0x7ff3f9429c20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7ff3fa5fcba3]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x276d6c) [0x7ff3fa5fcd6c]",
        "(DaemonServer::got_service_map()+0xb2d) [0x557973e23fdd]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x1b6) [0x557973e52cb6]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x894) [0x557973e55984]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xa5) [0x557973e5f925]",
        "(DispatchQueue::entry()+0x126a) [0x7ff3fa840aba]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3fa8f25d1]",
        "/lib64/libpthread.so.0(+0x817a) [0x7ff3f941f17a]",
        "clone()" 
    ],
    "ceph_version": "16.2.7",
    "crash_id": "2022-03-05T07:41:03.717812Z_78da9a2c-57c6-464d-a13e-a93423b91c58",
    "entity_name": "mgr.af8f19c9cd2b637d2255a58d4c9f3c0965d78c0d",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "0998b43e2acf7885c0d520cb04bcac94422785326e2c3e613066c40ddbb222d1",
    "timestamp": "2022-03-05T07:41:03.717812Z",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-348.2.1.el8_5.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Tue Nov 16 14:42:35 UTC 2021" 
}

#21

Updated by Telemetry Bot about 4 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
#22

Updated by Telemetry Bot about 4 years ago

  • Crash signature (v1) updated (diff)
  • Affected Versions v15.2.10, v15.2.11, v15.2.12, v15.2.13, v15.2.14, v15.2.15, v15.2.7, v15.2.8, v15.2.9 added
#23

Updated by Aishwarya Mathuria almost 4 years ago

  • Crash signature (v1) updated (diff)

/a/yuriw-2022-04-06_16:35:43-rados-wip-yuri5-testing-2022-04-05-1720-distro-default-smithi/6780002

2022-04-06T21:15:42.882 DEBUG:teuthology.orchestra.run.smithi057:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr dump --format=json-pretty
2022-04-06T21:15:42.889 INFO:tasks.ceph.mgr.y.smithi107.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7fbc3e634700 time 2022-04-06T21:15:39.735575+0000
2022-04-06T21:15:42.890 INFO:tasks.ceph.mgr.y.smithi107.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/mgr/DaemonServer.cc: 2992: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)
2022-04-06T21:15:42.890 INFO:tasks.ceph.mgr.y.smithi107.stderr: ceph version 17.0.0-11491-g37ef971d (37ef971db5d69256a78734330cbd85e2b14fd088) quincy (dev)
2022-04-06T21:15:42.890 INFO:tasks.ceph.mgr.y.smithi107.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7fbc47e14604]
2022-04-06T21:15:42.890 INFO:tasks.ceph.mgr.y.smithi107.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x284825) [0x7fbc47e14825]
2022-04-06T21:15:42.891 INFO:tasks.ceph.mgr.y.smithi107.stderr: 3: (DaemonServer::got_service_map()+0xb2d) [0x55e9fd789b0d]
2022-04-06T21:15:42.891 INFO:tasks.ceph.mgr.y.smithi107.stderr: 4: (Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0xee) [0x55e9fd7bb83e]
2022-04-06T21:15:42.891 INFO:tasks.ceph.mgr.y.smithi107.stderr: 5: (Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x8c4) [0x55e9fd7be744]
2022-04-06T21:15:42.891 INFO:tasks.ceph.mgr.y.smithi107.stderr: 6: (MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xae) [0x55e9fd7c99de]
2022-04-06T21:15:42.891 INFO:tasks.ceph.mgr.y.smithi107.stderr: 7: (DispatchQueue::entry()+0x14fa) [0x7fbc4809c1fa]
2022-04-06T21:15:42.892 INFO:tasks.ceph.mgr.y.smithi107.stderr: 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fbc48153eb1]
2022-04-06T21:15:42.892 INFO:tasks.ceph.mgr.y.smithi107.stderr: 9: /lib64/libpthread.so.0(+0x81cf) [0x7fbc46c291cf]
2022-04-06T21:15:42.892 INFO:tasks.ceph.mgr.y.smithi107.stderr: 10: clone()
2022-04-06T21:15:42.892 INFO:tasks.ceph.mgr.y.smithi107.stderr:*** Caught signal (Aborted) **
2022-04-06T21:15:42.893 INFO:tasks.ceph.mgr.y.smithi107.stderr: in thread 7fbc3e634700 thread_name:ms_dispatch
2022-04-06T21:15:42.893 INFO:tasks.ceph.mgr.y.smithi107.stderr:2022-04-06T21:15:39.734+0000 7fbc3e634700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7fbc3e634700 time 2022-04-06T21:15:39.735575+0000
2022-04-06T21:15:42.893 INFO:tasks.ceph.mgr.y.smithi107.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/mgr/DaemonServer.cc: 2992: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)

#24

Updated by Neha Ojha almost 4 years ago

  • Assignee set to Mykola Golub
  • Priority changed from Normal to High

Hi Mykola, it looks like this issue has not been resolved yet. We saw it during the latest LRC upgrade to quincy:

root@reesi001:~# ceph crash info 2022-04-18T17:29:40.498001Z_2b39c813-cae1-43d2-9185-5b7149f48e8d
{
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.1.0-206-g4fb951d2/rpm/el8/BUILD/ceph-17.1.0-206-g4fb951d2/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2946,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.1.0-206-g4fb951d2/rpm/el8/BUILD/ceph-17.1.0-206-g4fb951d2/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f1193216700 time 2022-04-18T17:29:40.493572+0000\n/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.1.0-206-g4fb951d2/rpm/el8/BUILD/ceph-17.1.0-206-g4fb951d2/src/mgr/DaemonServer.cc: 2946: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12ce0) [0x7f119ba28ce0]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x7f119cc08082]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x283245) [0x7f119cc08245]",
        "(DaemonServer::got_service_map()+0xb2d) [0x561293f6f94d]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0xee) [0x561293fa18de]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x8c4) [0x561293fa47f4]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xae) [0x561293fafc1e]",
        "(DispatchQueue::entry()+0x14fa) [0x7f119ce8e3aa]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f119cf44bd1]",
        "/lib64/libpthread.so.0(+0x81cf) [0x7f119ba1e1cf]",
        "clone()" 
    ],
    "ceph_version": "17.1.0-206-g4fb951d2",
    "crash_id": "2022-04-18T17:29:40.498001Z_2b39c813-cae1-43d2-9185-5b7149f48e8d",
    "entity_name": "mgr.reesi004.tplfrt",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "55ad203b14ba86896fb522b12a64c16ae6acefd0736d21d85ee042e9ff121474",
    "timestamp": "2022-04-18T17:29:40.498001Z",
    "utsname_hostname": "reesi004",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-66-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#74~18.04.2-Ubuntu SMP Fri Feb 5 11:17:31 UTC 2021" 
}

See also https://tracker.ceph.com/issues/51835#note-23, which has some useful logs. Could you please take a look? In this case, it appears that the active mgr hits the assert just after becoming active.

2022-04-06T21:15:38.829+0000 7fbc3e634700  1 -- 172.21.15.107:0/38689 <== mon.1 v2:172.21.15.107:3300/0 11 ==== service_map(e18 4 svc) v1 ==== 1052+0+0 (secure 0 0 0) 0x55ea06cb5b00 con 0x55ea06b7a800
2022-04-06T21:15:38.829+0000 7fbc3e634700 10 mgr ms_dispatch2 active (starting) service_map(e18 4 svc) v1
2022-04-06T21:15:38.829+0000 7fbc3e634700 10 mgr ms_dispatch2 service_map(e18 4 svc) v1
2022-04-06T21:15:38.829+0000 7fbbf6cf0700  5 AuthRegistry(0x55ea069b7340) adding auth protocol
...
2022-04-06T21:15:39.733+0000 7fbc3e634700  1 -- 172.21.15.107:0/38689 <== mon.1 v2:172.21.15.107:3300/0 30 ==== service_map(e19 4 svc) v1 ==== 2812+0+0 (secure 0 0 0) 0x55ea06a2be00 con 0x55ea06b7a800
2022-04-06T21:15:39.733+0000 7fbc3e634700 10 mgr ms_dispatch2 active service_map(e19 4 svc) v1
2022-04-06T21:15:39.733+0000 7fbc3e634700 10 mgr ms_dispatch2 service_map(e19 4 svc) v1
2022-04-06T21:15:39.733+0000 7fbc3e634700 10 mgr handle_service_map e19
2022-04-06T21:15:39.733+0000 7fbc3e634700 10 mgr.server operator() got updated map e19
2022-04-06T21:15:39.734+0000 7fbc3e634700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7fbc3e634700 time 2022-04-06T21:15:39.735575+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/mgr/DaemonServer.cc: 2992: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)

 ceph version 17.0.0-11491-g37ef971d (37ef971db5d69256a78734330cbd85e2b14fd088) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7fbc47e14604]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x284825) [0x7fbc47e14825]
 3: (DaemonServer::got_service_map()+0xb2d) [0x55e9fd789b0d]
 4: (Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0xee) [0x55e9fd7bb83e]
 5: (Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x8c4) [0x55e9fd7be744]
 6: (MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xae) [0x55e9fd7c99de]
 7: (DispatchQueue::entry()+0x14fa) [0x7fbc4809c1fa]
 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fbc48153eb1]
 9: /lib64/libpthread.so.0(+0x81cf) [0x7fbc46c291cf]
 10: clone()

#25

Updated by Laura Flores almost 4 years ago

/a/yuriw-2022-04-18_21:23:05-rados-wip-yuri2-testing-2022-04-18-1150-distro-default-smithi/6795328
Description: rados/dashboard/{0-single-container-host debug/mgr mon_election/classic random-objectstore$/{bluestore-comp-lz4} tasks/dashboard}
Test failure: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest)

/a/yuriw-2022-04-18_21:23:05-rados-wip-yuri2-testing-2022-04-18-1150-distro-default-smithi/6795328/remote/smithi185/crash/posted/2022-04-18T22:05:55.427559Z_9db23c98-7234-4044-9e8c-6a51e12c5164/log

/BUILD/ceph-17.0.0-11670-g14f114d5/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f0117ccd700 time 2022-04-18T22:05:55.425868+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11670-g14f114d5/rpm/el8/BUILD/ceph-17.0.0-11670-g14f114d5/src/mgr/DaemonServer.cc: 2992: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)

 ceph version 17.0.0-11670-g14f114d5 (14f114d5aa5304e8bd79f8addd90f60680cfce27) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f01214ad604]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x284825) [0x7f01214ad825]
 3: (DaemonServer::got_service_map()+0xb2d) [0x563cec74db3d]
 4: (Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0xee) [0x563cec77f86e]
 5: (Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x8c4) [0x563cec782774]
 6: (MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xae) [0x563cec78da0e]
 7: (DispatchQueue::entry()+0x14fa) [0x7f01217351ea]
 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f01217ecea1]
 9: /lib64/libpthread.so.0(+0x81cf) [0x7f01202c21cf]
 10: clone()

This teuthology job has some useful logs as well. The relevant tracebacks are pasted below:

/a/yuriw-2022-04-18_21:23:05-rados-wip-yuri2-testing-2022-04-18-1150-distro-default-smithi/6795328/teuthology.log

2022-04-18T22:06:05.178 ERROR:teuthology.orchestra.daemon.state:Error while waiting for process to exit
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_f0db28781c751636a2f4758c956e38df311ffefc/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_f0db28781c751636a2f4758c956e38df311ffefc/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_f0db28781c751636a2f4758c956e38df311ffefc/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_f0db28781c751636a2f4758c956e38df311ffefc/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi185 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mgr -f --cluster ceph -i y'

...

2022-04-18T22:11:56.425 INFO:tasks.cephfs_test_runner:ERROR: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest)
2022-04-18T22:11:56.425 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2022-04-18T22:11:56.425 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-04-18T22:11:56.425 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_14f114d5aa5304e8bd79f8addd90f60680cfce27/qa/tasks/mgr/dashboard/helper.py", line 271, in setUp
2022-04-18T22:11:56.425 INFO:tasks.cephfs_test_runner:    self.wait_for_health_clear(self.TIMEOUT_HEALTH_CLEAR)
2022-04-18T22:11:56.425 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_14f114d5aa5304e8bd79f8addd90f60680cfce27/qa/tasks/ceph_test_case.py", line 175, in wait_for_health_clear
2022-04-18T22:11:56.426 INFO:tasks.cephfs_test_runner:    self.wait_until_true(is_clear, timeout)
2022-04-18T22:11:56.426 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_14f114d5aa5304e8bd79f8addd90f60680cfce27/qa/tasks/ceph_test_case.py", line 212, in wait_until_true
2022-04-18T22:11:56.426 INFO:tasks.cephfs_test_runner:    raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2022-04-18T22:11:56.426 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 60s and 0 retries

/a/yuriw-2022-04-18_21:23:05-rados-wip-yuri2-testing-2022-04-18-1150-distro-default-smithi/6795328/remote/smithi073/log/ceph-mgr.x.log.gz

2022-04-18T22:07:27.997+0000 7f474c572700  0 [dashboard DEBUG auth] checking authorization...
2022-04-18T22:07:27.997+0000 7f474c572700  0 [dashboard ERROR exception] Internal Server Error
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 238, in get_client_version
    cherrypy.request.headers['Accept'])
  File "/usr/share/ceph/mgr/dashboard/controllers/_version.py", line 41, in from_mime_type
    return cls.from_string(cls.__MIME_TYPE_REGEX.match(mime_type).group(1))
AttributeError: 'NoneType' object has no attribute 'group'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 260, in inner
    client_version = BaseController.get_client_version()
  File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 241, in get_client_version
    415, "Unable to find version in request header")
cherrypy._cperror.HTTPError: (415, 'Unable to find version in request header')

#26

Updated by Mykola Golub almost 4 years ago

  • Status changed from New to In Progress

Looking at one example [1].

The current mgr implementation, when processing a service map received from a mon in DaemonServer::got_service_map [2], assumes one of two cases:

1) it is the initial service_map the mgr receives from a mon on activation;
2) it is the mgr's own pending map, which it sent to a mon to commit.

The problem is that, as in example [1], an activating mgr may receive several service_map versions sent by the previously active mgr, and the second of these maps triggers the assertion failure.

In the mgr crash log we see:

The mgr activation started:

 -4109> 2022-04-06T21:15:38.823+0000 7fbc3e634700  1 mgr handle_mgr_map Activating!
 -4108> 2022-04-06T21:15:38.824+0000 7fbc3e634700  1 mgr handle_mgr_map I am now activating

Then it receives service_map e18:

 -3789> 2022-04-06T21:15:38.829+0000 7fbc3e634700  1 -- 172.21.15.107:0/38689 --> [v2:172.21.15.107:3300/0,v1:172.21.15.107:6789/0] -- mon_subscribe({osdmap=26}) v3 -- 0x55ea06cf2b60 con 0x55ea06b7a800
 -3788> 2022-04-06T21:15:38.829+0000 7fbc3e634700  1 -- 172.21.15.107:0/38689 <== mon.1 v2:172.21.15.107:3300/0 11 ==== service_map(e18 4 svc) v1 ==== 1052+0+0 (secure 0 0 0) 0x55ea06cb5b00 con 0x55ea06b7a800

It then uses this map as the initial map:

 -3736> 2022-04-06T21:15:38.830+0000 7fbc3e634700 10 mgr handle_service_map e18
 -3735> 2022-04-06T21:15:38.830+0000 7fbc3e634700 10 mgr.server operator() got initial map e18

And later it receives service_map e19, which causes the assertion failure:

    -6> 2022-04-06T21:15:39.733+0000 7fbc3e634700  1 -- 172.21.15.107:0/38689 <== mon.1 v2:172.21.15.107:3300/0 30 ==== service_map(e19 4 svc) v1 ==== 2812+0+0 (secure 0 0 0) 0x55ea06a2be00 con 0x55ea06b7a800
    -5> 2022-04-06T21:15:39.733+0000 7fbc3e634700 10 mgr ms_dispatch2 active service_map(e19 4 svc) v1
    -4> 2022-04-06T21:15:39.733+0000 7fbc3e634700 10 mgr ms_dispatch2 service_map(e19 4 svc) v1
    -3> 2022-04-06T21:15:39.733+0000 7fbc3e634700 10 mgr handle_service_map e19
    -2> 2022-04-06T21:15:39.733+0000 7fbc3e634700 10 mgr.server operator() got updated map e19
    -1> 2022-04-06T21:15:39.734+0000 7fbc3e634700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7fbc3e634700 time 2022-04-06T21:15:39.735575+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11491-g37ef971d/rpm/el8/BUILD/ceph-17.0.0-11491-g37ef971d/src/mgr/DaemonServer.cc: 2992: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)

The log of the other mgr (mgr.z) shows that it generated and sent these maps just before it was deactivated:

2022-04-06T21:15:36.738+0000 7f74324a8700 10 mgr.server operator() sending service_map e18
...
2022-04-06T21:15:38.738+0000 7f74324a8700 10 mgr.server operator() sending service_map e19
...
2022-04-06T21:15:38.822+0000 7f74605e9700  4 mgr handle_mgr_map received map epoch 20
2022-04-06T21:15:38.822+0000 7f74605e9700  4 mgr handle_mgr_map active in map: 0 active is 4681
2022-04-06T21:15:38.823+0000 7f74605e9700 -1 mgr handle_mgr_map I was active but no longer am
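Since the previously active mgr can still publish an epoch after the new one has taken its initial map, one way to tolerate the race is to resynchronize instead of asserting. The following is a hypothetical sketch under the same simplified, epoch-only model (`PendingMapModel` and `handle_received_epoch` are illustrative names, and a real implementation would also need to copy the map contents); it is not necessarily how the fix in pull request 45984 handles it.

```cpp
#include <cstdint>

// Hypothetical epoch-only model of the pending service map; not Ceph code.
struct PendingMapModel {
  std::uint64_t epoch = 0;  // 0 means "no map received yet"
};

// Returns the pending epoch after handling a map with epoch `received`.
uint64_t_placeholder_guard_do_not_use();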

[1] /a/yuriw-2022-04-06_16:35:43-rados-wip-yuri5-testing-2022-04-05-1720-distro-default-smithi/6780002
[2] https://github.com/ceph/ceph/blob/8b7ee35c3a0a758b24cfdee68c1fa666d0a1d408/src/mgr/DaemonServer.cc#L2978

Actions #27

Updated by Mykola Golub almost 4 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 45984
Actions #28

Updated by Laura Flores almost 4 years ago

/a/yuriw-2022-05-27_21:59:17-rados-wip-yuri-testing-2022-05-27-0934-distro-default-smithi/6851266

Actions #29

Updated by Laura Flores almost 4 years ago

  • Backport set to pacific

/a/yuriw-2022-05-31_21:35:41-rados-wip-yuri2-testing-2022-05-31-1300-pacific-distro-default-smithi/6856512

Actions #30

Updated by Mykola Golub almost 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #31

Updated by Upkeep Bot almost 4 years ago

  • Copied to Backport #56053: pacific: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
Actions #32

Updated by Neha Ojha almost 4 years ago

  • Backport changed from pacific to pacific,quincy

Also, we need a quincy backport.

Actions #33

Updated by Upkeep Bot almost 4 years ago

  • Copied to Backport #56096: quincy: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
Actions #34

Updated by Vikhyat Umrao almost 4 years ago

We have seen this issue in the Gibba cluster while upgrading from 17.2.0 to the 17.2.1 RC, because the quincy backport https://github.com/ceph/ceph/pull/46738 has not yet been merged into the quincy branch and is not part of the RC.

[root@gibba001 f9d4cf6a-edcf-11ec-a96a-3cecef3d8fb8]# ceph crash info 2022-06-17T01:46:09.396216Z_598785fc-c69d-4d7b-b315-3cec6e289181
{
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0-436-gda36d2c9/rpm/el8/BUILD/ceph-17.2.0-436-gda36d2c9/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2946,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0-436-gda36d2c9/rpm/el8/BUILD/ceph-17.2.0-436-gda36d2c9/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7fbc0ebef700 time 2022-06-17T01:46:09.394730+0000\n/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0-436-gda36d2c9/rpm/el8/BUILD/ceph-17.2.0-436-gda36d2c9/src/mgr/DaemonServer.cc: 2946: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12ce0) [0x7fbc17404ce0]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x7fbc185e4c32]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x283df5) [0x7fbc185e4df5]",
        "(DaemonServer::got_service_map()+0xb2d) [0x55eae853ceed]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0xee) [0x55eae856ee7e]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x8c4) [0x55eae8571d94]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xae) [0x55eae857d1be]",
        "(DispatchQueue::entry()+0x14fa) [0x7fbc1886b43a]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7fbc18922581]",
        "/lib64/libpthread.so.0(+0x81ca) [0x7fbc173fa1ca]",
        "clone()" 
    ],
    "ceph_version": "17.2.0-436-gda36d2c9",
    "crash_id": "2022-06-17T01:46:09.396216Z_598785fc-c69d-4d7b-b315-3cec6e289181",
    "entity_name": "mgr.gibba008.tfggyq",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "5d711175d9ef3767a3e8f9de1a229853f45300f71d5d966e94bc9ffa6360673b",
    "timestamp": "2022-06-17T01:46:09.396216Z",
    "utsname_hostname": "gibba008",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-301.1.el8.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Tue Apr 13 16:24:22 UTC 2021" 
}
[root@gibba001 f9d4cf6a-edcf-11ec-a96a-3cecef3d8fb8]# ceph crash info 2022-06-17T02:11:50.434785Z_e7b604fb-aaea-4cf2-87a9-231ee226a9ed
{
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0-436-gda36d2c9/rpm/el8/BUILD/ceph-17.2.0-436-gda36d2c9/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2946,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0-436-gda36d2c9/rpm/el8/BUILD/ceph-17.2.0-436-gda36d2c9/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f6a6787c700 time 2022-06-17T02:11:50.433270+0000\n/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0-436-gda36d2c9/rpm/el8/BUILD/ceph-17.2.0-436-gda36d2c9/src/mgr/DaemonServer.cc: 2946: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12ce0) [0x7f6a6fe7ece0]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x7f6a7105ec32]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x283df5) [0x7f6a7105edf5]",
        "(DaemonServer::got_service_map()+0xb2d) [0x56046a99deed]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0xee) [0x56046a9cfe7e]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x8c4) [0x56046a9d2d94]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xae) [0x56046a9de1be]",
        "(DispatchQueue::entry()+0x14fa) [0x7f6a712e543a]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f6a7139c581]",
        "/lib64/libpthread.so.0(+0x81ca) [0x7f6a6fe741ca]",
        "clone()" 
    ],
    "ceph_version": "17.2.0-436-gda36d2c9",
    "crash_id": "2022-06-17T02:11:50.434785Z_e7b604fb-aaea-4cf2-87a9-231ee226a9ed",
    "entity_name": "mgr.gibba008.tfggyq",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "5d711175d9ef3767a3e8f9de1a229853f45300f71d5d966e94bc9ffa6360673b",
    "timestamp": "2022-06-17T02:11:50.434785Z",
    "utsname_hostname": "gibba008",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-301.1.el8.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Tue Apr 13 16:24:22 UTC 2021" 
}
[root@gibba001 f9d4cf6a-edcf-11ec-a96a-3cecef3d8fb8]# ceph crash info 2022-06-17T07:34:07.419170Z_1529efe0-3620-4b36-b813-329f1225b04d
{
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0-436-gda36d2c9/rpm/el8/BUILD/ceph-17.2.0-436-gda36d2c9/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2946,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0-436-gda36d2c9/rpm/el8/BUILD/ceph-17.2.0-436-gda36d2c9/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f846ae19700 time 2022-06-17T07:34:07.417730+0000\n/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.0-436-gda36d2c9/rpm/el8/BUILD/ceph-17.2.0-436-gda36d2c9/src/mgr/DaemonServer.cc: 2946: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12ce0) [0x7f847341bce0]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x7f84745fbc32]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x283df5) [0x7f84745fbdf5]",
        "(DaemonServer::got_service_map()+0xb2d) [0x560a2d31aeed]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0xee) [0x560a2d34ce7e]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x8c4) [0x560a2d34fd94]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xae) [0x560a2d35b1be]",
        "(DispatchQueue::entry()+0x14fa) [0x7f847488243a]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f8474939581]",
        "/lib64/libpthread.so.0(+0x81ca) [0x7f84734111ca]",
        "clone()" 
    ],
    "ceph_version": "17.2.0-436-gda36d2c9",
    "crash_id": "2022-06-17T07:34:07.419170Z_1529efe0-3620-4b36-b813-329f1225b04d",
    "entity_name": "mgr.gibba006.enemnj",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "5d711175d9ef3767a3e8f9de1a229853f45300f71d5d966e94bc9ffa6360673b",
    "timestamp": "2022-06-17T07:34:07.419170Z",
    "utsname_hostname": "gibba006",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-301.1.el8.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Tue Apr 13 16:24:22 UTC 2021" 
}

Actions #35

Updated by Telemetry Bot over 3 years ago

  • Crash signature (v1) updated (diff)
  • Affected Versions v15.2.16, v16.2.9, v17.0.0, v17.1.0, v17.2.0 added
Actions #36

Updated by Telemetry Bot over 3 years ago

  • Crash signature (v1) updated (diff)
  • Affected Versions v17.2.1 added
Actions #38

Updated by Konstantin Shalygin over 3 years ago

  • Status changed from Pending Backport to Resolved
  • Crash signature (v1) updated (diff)
Actions #39

Updated by Konstantin Shalygin over 3 years ago

  • % Done changed from 0 to 100
Actions #40

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 6d95d4383b100fcbabdd7ea88de134070926e359
  • Fixed In set to v17.0.0-13048-g6d95d4383b
  • Released In set to v18.2.0~2004
  • Upkeep Timestamp set to 2025-07-14T17:44:11+00:00