Issue with no import stats during hot restart merge #7227

@mattklein123

I was running a smoke test using a debug build and hit this call stack/assert:

#0  0x00007f3a85eb1c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f3a85eb5028 in __GI_abort () at abort.c:89
#2  0x000000000107c688 in Envoy::Stats::GaugeImpl<Envoy::Stats::HeapStatData>::sub (this=0x1206c990, amount=15409887979504015509) at bazel-out/k8-dbg/bin/external/envoy/source/common/stats/_virtual_includes/stat_data_allocator_lib/common/stats/stat_data_allocator_impl.h:150
#3  0x000000000108471a in Envoy::Stats::StatMerger::mergeGauges (this=0x112f65c0, gauges=...) at external/envoy/source/common/stats/stat_merger.cc:50
#4  0x00000000010847d5 in Envoy::Stats::StatMerger::mergeStats (this=0x112f65c0, counter_deltas=..., gauges=...) at external/envoy/source/common/stats/stat_merger.cc:58
#5  0x00000000008b7b43 in Envoy::Server::HotRestartingChild::mergeParentStats (this=0x3b0f4a8, stats_store=..., stats_proto=...) at external/envoy/source/server/hot_restarting_child.cc:97
#6  0x00000000008b47c9 in Envoy::Server::HotRestartImpl::mergeParentStatsIfAny (this=0x3b0f4a0, stats_store=...) at external/envoy/source/server/hot_restart_impl.cc:131
#7  0x00000000008f121e in Envoy::Server::InstanceImpl::flushStats()::$_0::operator()() const (this=0x7ffc3726a308) at external/envoy/source/server/server.cc:173
#8  0x00000000008f10bd in std::_Function_handler<void (), Envoy::Server::InstanceImpl::flushStats()::$_0>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:316
#9  0x0000000000628e5e in std::function<void ()>::operator()() const (this=0x7ffc3726a308) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:706
#10 0x0000000000a52333 in Envoy::Stats::ThreadLocalStoreImpl::mergeInternal(std::function<void ()>) (this=0x3b0f680, merge_complete_cb=...) at external/envoy/source/common/stats/thread_local_store.cc:192
#11 0x0000000000a56a80 in Envoy::Stats::ThreadLocalStoreImpl::mergeHistograms(std::function<void ()>)::$_2::operator()() const (this=0x8f3f200) at external/envoy/source/common/stats/thread_local_store.cc:180
#12 0x0000000000a5689d in std::_Function_handler<void (), Envoy::Stats::ThreadLocalStoreImpl::mergeHistograms(std::function<void ()>)::$_2>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:316
#13 0x0000000000628e5e in std::function<void ()>::operator()() const (this=0x7ffc3726a3b8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:706
#14 0x00000000009e6afc in Envoy::Event::DispatcherImpl::runPostCallbacks (this=0x3a68b40) at external/envoy/source/common/event/dispatcher_impl.cc:198
#15 0x00000000009e7158 in Envoy::Event::DispatcherImpl::DispatcherImpl(std::unique_ptr<Envoy::Buffer::WatermarkFactory, std::default_delete<Envoy::Buffer::WatermarkFactory> >&&, Envoy::Api::Api&, Envoy::Event::TimeSystem&)::$_1::operator()() const (this=0x3a2f660) at external/envoy/source/common/event/dispatcher_impl.cc:39
#16 0x00000000009e700d in std::_Function_handler<void (), Envoy::Event::DispatcherImpl::DispatcherImpl(std::unique_ptr<Envoy::Buffer::WatermarkFactory, std::default_delete<Envoy::Buffer::WatermarkFactory> >&&, Envoy::Api::Api&, Envoy::Event::TimeSystem&)::$_1>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:316
#17 0x0000000000628e5e in std::function<void ()>::operator()() const (this=0x3a2f660) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:706
#18 0x0000000000a1eded in Envoy::Event::TimerImpl::TimerImpl(Envoy::CSmartPtr<event_base, &event_base_free>&, std::function<void ()>)::$_0::operator()(int, short, void*) const (this=0xffffffff, arg=0x3a2f5e0) at external/envoy/source/common/event/timer_impl.cc:22
#19 0x0000000000a1edb9 in Envoy::Event::TimerImpl::TimerImpl(Envoy::CSmartPtr<event_base, &event_base_free>&, std::function<void ()>)::$_0::__invoke(int, short, void*) (arg=0x3a2f5e0) at external/envoy/source/common/event/timer_impl.cc:22
#20 0x0000000000ffca81 in event_process_active_single_queue (base=0x3a9c2c0, activeq=0x3a60450, max_to_process=2147483647, endtime=0x0) at /root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/sandbox/processwrapper-sandbox/167/execroot/envoy/external/com_github_libevent_libevent/event.c:1707
#21 0x0000000000ff9830 in event_process_active (base=<optimized out>) at /root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/sandbox/processwrapper-sandbox/167/execroot/envoy/external/com_github_libevent_libevent/event.c:1799
#22 event_base_loop (base=0x3a9c2c0, flags=<optimized out>) at /root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/sandbox/processwrapper-sandbox/167/execroot/envoy/external/com_github_libevent_libevent/event.c:2041
#23 0x0000000000a1dadc in Envoy::Event::LibeventScheduler::run (this=0x3a68b88, mode=Envoy::Event::Dispatcher::Block) at external/envoy/source/common/event/libevent_scheduler.cc:47
#24 0x00000000009e6a0e in Envoy::Event::DispatcherImpl::run (this=0x3a68b40, type=Envoy::Event::Dispatcher::Block) at external/envoy/source/common/event/dispatcher_impl.cc:178
#25 0x00000000008efcdf in Envoy::Server::InstanceImpl::run (this=0x3b34000) at external/envoy/source/server/server.cc:491
#26 0x00000000004bb3ee in Envoy::MainCommonBase::run (this=0x3a46440) at external/envoy/source/exe/main_common.cc:102
#27 0x00000000004b914c in Envoy::MainCommon::run (this=0x3a46280) at bazel-out/k8-dbg/bin/external/envoy/source/exe/_virtual_includes/envoy_main_common_lib/exe/main_common.h:91
#28 0x00000000004b8d77 in main (argc=15, argv=0x7ffc3726aa78) at external/envoy/source/exe/main.cc:39

The stat in question is this one:

GAUGE(version, NeverImport) \

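The issue is that for gauges whose value is written with set(), as opposed to the standard increment/decrement calls, there is no guarantee that the value stays stable between merges. The gauge merging code applies changes through add()/sub(), and the consistency check inside sub() (frame #2 above) can then fail.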
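To make the failure mode concrete, here is a minimal sketch of a delta-based merge. It is a simplified model, not Envoy's actual code: `ToyGauge`, `ToyMerger`, and both scenario functions are hypothetical names, and `sub()` returns `false` at the point where a debug build would assert instead of aborting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a gauge; not Envoy's GaugeImpl.
struct ToyGauge {
  uint64_t value = 0;

  void set(uint64_t v) { value = v; }
  void add(uint64_t amount) { value += amount; }

  // Returns false where a debug build would assert: the caller is trying to
  // subtract more than the gauge currently holds.
  bool sub(uint64_t amount) {
    if (amount > value) {
      return false;
    }
    value -= amount;
    return true;
  }
};

// Hypothetical delta-based merger: on every merge it applies the change in the
// parent's value since the previous merge to the child's gauge.
struct ToyMerger {
  uint64_t last_parent_value = 0;

  bool merge(ToyGauge& child, uint64_t parent_value) {
    bool ok = true;
    if (parent_value >= last_parent_value) {
      child.add(parent_value - last_parent_value);
    } else {
      ok = child.sub(last_parent_value - parent_value);
    }
    last_parent_value = parent_value;
    return ok;
  }
};

// Gauge touched only through add()/sub(): the deltas always balance.
bool incDecScenario() {
  ToyGauge child;
  ToyMerger merger;
  child.add(3);                       // child's own contribution
  bool ok = merger.merge(child, 5);   // parent reports 5, child becomes 8
  ok = merger.merge(child, 2) && ok;  // parent drops to 2, child becomes 5
  return ok && child.value == 5;
}

// Gauge overwritten with set() between merges: the next delta no longer fits.
bool setScenario() {
  ToyGauge child;
  ToyMerger merger;
  bool ok = merger.merge(child, 100);  // child becomes 100
  child.set(7);                        // child overwrites the merged value
  ok = merger.merge(child, 40) && ok;  // delta is -60, but child only holds 7
  return ok;
}
```

In this model `incDecScenario()` stays consistent, while `setScenario()` trips the check because `set()` discards the amount the merger previously added.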

I'm not sure of the best way of handling this other than potentially removing the assert or relaxing it somehow for the merge. @jmarantz WDYT? Are you willing to pick this up?
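As a rough illustration of what "relaxing" could mean, the sketch below shows two options in one function: skip gauges that are tagged as never imported, and clamp at zero instead of asserting for the rest. This is only one possible reading, not a proposed patch. `ImportPolicy`, `SketchGauge`, and `relaxedMerge` are hypothetical names, loosely modelled on the `NeverImport` tag in the macro above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical import policy; only the NeverImport name comes from the issue.
enum class ImportPolicy { Accumulate, NeverImport };

// Hypothetical gauge model carrying its import policy.
struct SketchGauge {
  uint64_t value = 0;
  ImportPolicy policy = ImportPolicy::Accumulate;
};

// Applies the change in the parent's value (old_parent -> new_parent) to the
// child gauge. NeverImport gauges are left untouched, and a decrease larger
// than the current value saturates at zero rather than asserting.
void relaxedMerge(SketchGauge& child, uint64_t old_parent, uint64_t new_parent) {
  if (child.policy == ImportPolicy::NeverImport) {
    return;  // never merged from the parent
  }
  if (new_parent >= old_parent) {
    child.value += new_parent - old_parent;
  } else {
    const uint64_t decrease = old_parent - new_parent;
    child.value = decrease > child.value ? 0 : child.value - decrease;
  }
}
```

Skipping keeps a set()-style gauge such as `version` entirely under the child's control, whereas clamping hides an inconsistency rather than preventing it, so the two options are not equivalent.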
