Bug #57628

open

osd:PeeringState.cc: FAILED ceph_assert(info.history.same_interval_since != 0)

Added by Laura Flores over 3 years ago. Updated over 2 years ago.

Status: In Progress
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite: rados
Component(RADOS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

/a/yuriw-2022-09-09_14:59:25-rados-wip-yuri2-testing-2022-09-06-1007-pacific-distro-default-smithi/7022809

2022-09-09T20:41:28.514 INFO:tasks.ceph.osd.4.smithi134.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10-813-g928e03bd/rpm/el8/BUILD/ceph-16.2.10-813-g928e03bd/src/osd/PeeringState.cc: 649: FAILED ceph_assert(info.history.same_interval_since != 0)

...

2022-09-09T20:41:28.615 INFO:tasks.ceph.osd.4.smithi134.stderr: ceph version 16.2.10-813-g928e03bd (928e03bd0c8ce53d78c1f3dddd6852e2ffd05b7f) pacific (stable)
2022-09-09T20:41:28.615 INFO:tasks.ceph.osd.4.smithi134.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x689824]
2022-09-09T20:41:28.615 INFO:tasks.ceph.osd.4.smithi134.stderr: 2: ceph-osd(+0x581a3e) [0x689a3e]
2022-09-09T20:41:28.616 INFO:tasks.ceph.osd.4.smithi134.stderr: 3: (PeeringState::start_peering_interval(std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> > const&, int, std::vector<int, std::allocator<int> > const&, int, ceph::os::Transaction&)+0x1453) [0xa16fd3]
2022-09-09T20:41:28.616 INFO:tasks.ceph.osd.4.smithi134.stderr: 4: (PeeringState::Reset::react(PeeringState::AdvMap const&)+0x293) [0xa32453]
2022-09-09T20:41:28.616 INFO:tasks.ceph.osd.4.smithi134.stderr: 5: (boost::statechart::simple_state<PeeringState::Reset, PeeringState::PeeringMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xf5) [0xa6eeb5]
2022-09-09T20:41:28.617 INFO:tasks.ceph.osd.4.smithi134.stderr: 6: (boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xa7) [0xa58be7]
2022-09-09T20:41:28.617 INFO:tasks.ceph.osd.4.smithi134.stderr: 7: (PeeringState::advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PeeringCtx&)+0x269) [0xa12ce9]
2022-09-09T20:41:28.617 INFO:tasks.ceph.osd.4.smithi134.stderr: 8: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PeeringCtx&)+0x1e6) [0x8491c6]
2022-09-09T20:41:28.617 INFO:tasks.ceph.osd.4.smithi134.stderr: 9: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x303) [0x7bc813]
2022-09-09T20:41:28.618 INFO:tasks.ceph.osd.4.smithi134.stderr: 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xa4) [0x7be964]
2022-09-09T20:41:28.618 INFO:tasks.ceph.osd.4.smithi134.stderr: 11: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x9f5586]
2022-09-09T20:41:28.618 INFO:tasks.ceph.osd.4.smithi134.stderr: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28) [0x7b0908]
2022-09-09T20:41:28.618 INFO:tasks.ceph.osd.4.smithi134.stderr: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0xe33374]
2022-09-09T20:41:28.619 INFO:tasks.ceph.osd.4.smithi134.stderr: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0xe36254]
2022-09-09T20:41:28.619 INFO:tasks.ceph.osd.4.smithi134.stderr: 15: /lib64/libpthread.so.0(+0x81ca) [0xc8471ca]
2022-09-09T20:41:28.619 INFO:tasks.ceph.osd.4.smithi134.stderr: 16: clone()


Related issues 4 (1 open, 3 closed)

Related to RADOS - Bug #39659: FAILED ceph_assert(info.history.same_interval_since != 0) (New, 05/10/2019)
Related to RADOS - Bug #45991: PG merge: FAILED ceph_assert(info.history.same_interval_since != 0) (Resolved, xie xingguo)
Related to RADOS - Bug #37654: FAILED ceph_assert(info.history.same_interval_since != 0) in PG::start_peering_interval() (Resolved, xie xingguo)
Related to RADOS - Bug #63881: Inaccurate pg splits/merges and pool deletion/creation on OSD mapgap (Resolved, Matan Breizman)

Actions #1

Updated by Laura Flores over 3 years ago

  • Related to Bug #39659: FAILED ceph_assert(info.history.same_interval_since != 0) added
Actions #2

Updated by Laura Flores over 3 years ago

This might be Tracker #39659, but the logs are no longer available, so there is no way to be sure.

Actions #4

Updated by Laura Flores over 3 years ago

  • Affected Versions v16.2.10 added
Actions #6

Updated by Yaarit Hatuka over 3 years ago

  • Affected Versions v14.0.0, v15.0.0 added

The same issue was also reported in telemetry on version 15.0.0:
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?var-sig_v2=43429c06cd8a3e57052f2bcc913b85f2571f3f66774aa9c8a9027be5b8f0f22a&orgId=1

Three different signatures were created because of differences in the sanitized backtraces and the assert functions.
The differences between 14.1.1 and the other two versions (15.0.0 and 16.2.7), in both the sanitized backtraces and the assert functions, are easy to spot.

The only difference between the sanitized backtraces of 15.0.0 and 16.2.7 is a single frame:

15.0.0:

boost::statechart::simple_state<PeeringState::Reset, PeeringState::PeeringMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)>::react_impl(boost::statechart::event_base const&, void const*)

See dashboard search for this frame:
http://telemetry.front.sepia.ceph.com:4000/d/Nvj6XTaMk/spec-search?orgId=1&var-substr_1=boost::statechart::simple_state%3CPeeringState::Reset,%20PeeringState::PeeringMachine,%20boost::mpl::list%3Cmpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na%3E,%20(boost::statechart::history_mode)%3E::react_impl(boost::statechart::event_base%20const&,%20void%20const*)

16.2.7:

boost::statechart::simple_state<PeeringState::Reset, PeeringState::PeeringMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)>::react_impl(boost::statechart::event_base const&, void const*)

See dashboard search for this frame:
http://telemetry.front.sepia.ceph.com:4000/d/Nvj6XTaMk/spec-search?orgId=1&var-substr_1=boost::statechart::simple_state%3CPeeringState::Reset,%20PeeringState::PeeringMachine,%20boost::mpl::list%3Cmpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na,%20mpl_::na%3E,%20(boost::statechart::history_mode)%3E::react_impl(boost::statechart::event_base%20const&,%20void%20const*)
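
For illustration only: the two frames above appear to differ just in how many `mpl_::na` placeholders pad the `boost::mpl::list` (20 versus 30 by a count of the pasted text). The Python sketch below shows why that is enough to produce separate signatures, and how collapsing the padding would make the frames compare equal. `normalize_frame`, `signature`, `reset_frame` and the `assert_func` value are hypothetical names made up for this sketch; this is not the telemetry sanitizer or the real signature algorithm.

```python
import hashlib
import re

# Matches a whole run of `mpl_::na` placeholders, however long it is.
_NA_RUN = re.compile(r"mpl_::na(?:,\s*mpl_::na)*")


def normalize_frame(frame: str) -> str:
    """Collapse the mpl_::na padding so its length no longer matters (hypothetical)."""
    return _NA_RUN.sub("mpl_::na...", frame)


def signature(frames, assert_func: str) -> str:
    """Toy signature: SHA-256 over the frames plus the assert function (an assumption)."""
    h = hashlib.sha256()
    for frame in frames:
        h.update(frame.encode("utf-8"))
    h.update(assert_func.encode("utf-8"))
    return h.hexdigest()


def reset_frame(padding: int) -> str:
    """Rebuild the Reset::react_impl frame quoted above with `padding` placeholders."""
    na = ", ".join(["mpl_::na"] * padding)
    return (
        "boost::statechart::simple_state<PeeringState::Reset, "
        "PeeringState::PeeringMachine, boost::mpl::list<" + na + ">, "
        "(boost::statechart::history_mode)>::react_impl("
        "boost::statechart::event_base const&, void const*)"
    )


frame_15 = reset_frame(20)  # padding length counted from the 15.0.0 frame
frame_16 = reset_frame(30)  # padding length counted from the 16.2.7 frame
assert_func = "start_peering_interval"  # illustrative placeholder value

raw_match = signature([frame_15], assert_func) == signature([frame_16], assert_func)
normalized_match = (
    signature([normalize_frame(frame_15)], assert_func)
    == signature([normalize_frame(frame_16)], assert_func)
)
print("raw signatures match:", raw_match)
print("normalized signatures match:", normalized_match)
```

If the sanitizer collapsed this padding, the 15.0.0 and 16.2.7 reports would presumably land under one signature; whether that is desirable is a separate question for the telemetry side.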

Updated the affected versions - picked v14.0.0 since v14.1.1 does not exist.

Actions #7

Updated by Neha Ojha over 3 years ago

  • Assignee set to Matan Breizman
Actions #8

Updated by Matan Breizman over 3 years ago

  • Related to Bug #45991: PG merge: FAILED ceph_assert(info.history.same_interval_since != 0) added
Actions #9

Updated by Matan Breizman over 3 years ago

  • Related to Bug #37654: FAILED ceph_assert(info.history.same_interval_since != 0) in PG::start_peering_interval() added
Actions #10

Updated by Laura Flores over 2 years ago

/a/lflores-2023-09-08_20:36:06-rados-wip-lflores-testing-2-2023-09-08-1755-distro-default-smithi/7391621

Actions #11

Updated by Laura Flores over 2 years ago

  • Affected Versions v18.0.0 added
  • Affected Versions deleted (v14.0.0, v15.0.0, v16.2.10)
Actions #12

Updated by Laura Flores over 2 years ago

  • Affected Versions v14.0.0, v14.2.0, v14.2.1, v14.2.10, v14.2.11, v14.2.12, v14.2.13, v14.2.14, v14.2.15, v14.2.16, v14.2.17, v14.2.18, v14.2.19, v14.2.2, v14.2.20, v14.2.21, v14.2.22, v14.2.23, v14.2.3, v14.2.4, v14.2.5, v14.2.6, v14.2.7, v14.2.8, v14.2.9, v15.0.0, v15.2.1, v15.2.10, v15.2.11, v15.2.12, v15.2.13, v15.2.14, v15.2.15, v15.2.16, v15.2.17, v15.2.2, v15.2.3, v15.2.4, v15.2.5, v15.2.6, v15.2.7, v15.2.8, v15.2.9, v16.0.0, v16.0.1, v16.1.0, v16.1.1, v16.2.0, v16.2.1, v16.2.10, v16.2.11, v16.2.12, v16.2.13, v16.2.14, v16.2.15, v16.2.2, v16.2.3, v16.2.4, v16.2.5, v16.2.6, v16.2.7, v16.2.8, v16.2.9, v17.0.0, v17.2.1, v17.2.2, v17.2.3, v17.2.4, v17.2.4, v17.2.5, v17.2.6, v17.2.6, v17.2.7 added
  • ceph-qa-suite ceph-ansible added
Actions #13

Updated by Laura Flores over 2 years ago

  • Affected Versions v18.1.0 added
  • Affected Versions deleted (v14.0.0, v14.2.0, v14.2.1, v14.2.10, v14.2.11, v14.2.12, v14.2.13, v14.2.14, v14.2.15, v14.2.16, v14.2.17, v14.2.18, v14.2.19, v14.2.2, v14.2.20, v14.2.21, v14.2.22, v14.2.23, v14.2.3, v14.2.4, v14.2.5, v14.2.6, v14.2.7, v14.2.8, v14.2.9, v15.0.0, v15.2.1, v15.2.10, v15.2.11, v15.2.12, v15.2.13, v15.2.14, v15.2.15, v15.2.16, v15.2.17, v15.2.2, v15.2.3, v15.2.4, v15.2.5, v15.2.6, v15.2.7, v15.2.8, v15.2.9, v16.0.0, v16.0.1, v16.1.0, v16.1.1, v16.2.0, v16.2.1, v16.2.10, v16.2.11, v16.2.12, v16.2.13, v16.2.14, v16.2.15, v16.2.2, v16.2.3, v16.2.4, v16.2.5, v16.2.6, v16.2.7, v16.2.8, v16.2.9, v17.0.0, v17.2.1, v17.2.2, v17.2.3, v17.2.4, v17.2.4, v17.2.5, v17.2.6, v17.2.6, v17.2.7, v18.0.0)
Actions #14

Updated by Laura Flores over 2 years ago

  • Affected Versions v18.0.0 added
  • Affected Versions deleted (v18.1.0)
  • ceph-qa-suite rados added
  • ceph-qa-suite deleted (ceph-ansible)

If anyone knows how to properly select multiple affected versions, please go ahead.

v18.0.0, v14.0.0, v15.0.0, and v16.2.10 are all affected versions.

Actions #15

Updated by Yaarit Hatuka over 2 years ago

  • Affected Versions v14.0.0, v15.0.0, v16.2.10 added

selected by holding the ALT key :-)

Actions #16

Updated by Matan Breizman over 2 years ago

WIP: https://gist.github.com/Matan-B/40b5a7ee30e9e73d20c052594365aae8

This seems to be closely related to a map gap event.

Actions #17

Updated by Matan Breizman over 2 years ago

  • Status changed from New to In Progress
Actions #18

Updated by Matan Breizman almost 2 years ago

  • Related to Bug #63881: Inaccurate pg splits/merges and pool deletion/creation on OSD mapgap added