Fix seg fault issue #3

Draft
ryanhristovski wants to merge 44 commits into nezdolik:ip-tagging-async-file-reload from ryanhristovski:add-verbose-logging-ip-tagging-debug
@ryanhristovski ryanhristovski commented Jun 24, 2025

Fix the IP tagging filter segfault and implement reliable file reloading.
Replace the temporary loader approach with direct file parsing to eliminate use-after-free crashes.

Signed-off-by: Ryan Hristovski <ryan.hristovski@docker.com>
@ryanhristovski ryanhristovski changed the title from "set verbose logging" to "Fix seg fault issue" on Jun 25, 2025
nezdolik pushed a commit that referenced this pull request Dec 17, 2025
…voyproxy#42554)

## Description

Today, when a filesystem watch callback returns a non-OK status or
throws an exception, the error is propagated to `FileEventImpl`, which
uses `THROW_IF_NOT_OK`.

Since there's no exception handler in the `libevent` loop, this causes
`std::terminate` to be called, which crashes Envoy.

**Stack Trace:**
```
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.119][234999][warning][misc] [source/common/protobuf/message_validator_impl.cc:23] Deprecated field: type envoy.config.core.v3.HeaderValueOption Using deprecated option 'envoy.config.core.v3.HeaderValueOption.append' from file base.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/version_history/version_history for details. If continued use of this field is absolutely necessary, see https://www.envoyproxy.io/docs/envoy/latest/configuration/operations/runtime#using-runtime-overrides-for-deprecated-features for how to apply a temporary and highly discouraged override.
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.120][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '0_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.123][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '1_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.126][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '2_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.127][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '3_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.128][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '4_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.130][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '5_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.132][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener '6_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.134][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener 'mtls_untrusted_regional_transparent_tunnel_listener'
Dec 11 00:11:26 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:26.135][234999][info][upstream] [source/common/listener_manager/lds_api.cc:109] lds: add/update listener 'mtls_app_trusted_regional_transparent_tunnel_listener'
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.097][234999][critical][main] [source/exe/terminate_handler.cc:36] std::terminate called! Uncaught unknown exception, see trace.
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.097][234999][critical][backtrace] [./source/server/backtrace.h:113] Backtrace (use tools/stack_decode.py to get line numbers):
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.097][234999][critical][backtrace] [./source/server/backtrace.h:114] Envoy version: 5eaabe0bbaad4612cb85473cd151039d8f1a2760/1.34.2-dev/Clean/RELEASE/BoringSSL
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.097][234999][critical][backtrace] [./source/server/backtrace.h:116] Address mapping: 558d8afcc000-558d8ee2f000 /usr/local/bin/envoy
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.100][234999][critical][backtrace] [./source/server/backtrace.h:123] #0: [0x558d8da5784f]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.102][234999][critical][backtrace] [./source/server/backtrace.h:123] #1: [0x558d8edd8673]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.104][234999][critical][backtrace] [./source/server/backtrace.h:123] #2: [0x558d8e3b120b]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.106][234999][critical][backtrace] [./source/server/backtrace.h:121] #3: Envoy::Filesystem::WatcherImpl::onInotifyEvent() [0x558d8e3990c3]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.108][234999][critical][backtrace] [./source/server/backtrace.h:123] #4: [0x558d8e3998d2]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.109][234999][critical][backtrace] [./source/server/backtrace.h:123] #5: [0x558d8e393de6]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.111][234999][critical][backtrace] [./source/server/backtrace.h:121] #6: Envoy::Event::FileEventImpl::mergeInjectedEventsAndRunCb() [0x558d8e394eb5]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.113][234999][critical][backtrace] [./source/server/backtrace.h:123] #7: [0x558d8e710823]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.115][234999][critical][backtrace] [./source/server/backtrace.h:121] #8: event_base_loop [0x558d8e70d4a1]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.117][234999][critical][backtrace] [./source/server/backtrace.h:121] #9: Envoy::Server::InstanceBase::run() [0x558d8daa2b99]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.119][234999][critical][backtrace] [./source/server/backtrace.h:121] #10: Envoy::MainCommonBase::run() [0x558d8da4327a]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.121][234999][critical][backtrace] [./source/server/backtrace.h:121] #11: Envoy::MainCommon::main() [0x558d8da44234]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.123][234999][critical][backtrace] [./source/server/backtrace.h:121] #12: main [0x558d8afcc11c]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.123][234999][critical][backtrace] [./source/server/backtrace.h:123] #13: [0x7f1d54073efb]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.123][234999][critical][backtrace] [./source/server/backtrace.h:121] #14: __libc_start_main [0x7f1d54073fbb]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:121] #15: _start [0x558d8afcc02e]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:129] Caught Aborted, suspect faulting address 0x395f7
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:113] Backtrace (use tools/stack_decode.py to get line numbers):
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:114] Envoy version: 5eaabe0bbaad4612cb85473cd151039d8f1a2760/1.34.2-dev/Clean/RELEASE/BoringSSL
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:116] Address mapping: 558d8afcc000-558d8ee2f000 /usr/local/bin/envoy
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:123] #0: [0x7f1d54089c90]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:121] #1: gsignal [0x7f1d54089bde]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.124][234999][critical][backtrace] [./source/server/backtrace.h:121] #2: abort [0x7f1d54072832]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.126][234999][critical][backtrace] [./source/server/backtrace.h:123] #3: [0x558d8da5785c]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.128][234999][critical][backtrace] [./source/server/backtrace.h:123] #4: [0x558d8edd8673]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.129][234999][critical][backtrace] [./source/server/backtrace.h:123] #5: [0x558d8e3b120b]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.129][234999][critical][backtrace] [./source/server/backtrace.h:121] #6: Envoy::Filesystem::WatcherImpl::onInotifyEvent() [0x558d8e3990c3]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.131][234999][critical][backtrace] [./source/server/backtrace.h:123] #7: [0x558d8e3998d2]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.133][234999][critical][backtrace] [./source/server/backtrace.h:123] #8: [0x558d8e393de6]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.133][234999][critical][backtrace] [./source/server/backtrace.h:121] #9: Envoy::Event::FileEventImpl::mergeInjectedEventsAndRunCb() [0x558d8e394eb5]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:123] #10: [0x558d8e710823]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #11: event_base_loop [0x558d8e70d4a1]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #12: Envoy::Server::InstanceBase::run() [0x558d8daa2b99]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #13: Envoy::MainCommonBase::run() [0x558d8da4327a]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #14: Envoy::MainCommon::main() [0x558d8da44234]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #15: main [0x558d8afcc11c]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:123] #16: [0x7f1d54073efb]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #17: __libc_start_main [0x7f1d54073fbb]
Dec 11 00:11:30 dbletE9433T node-envoy[234999]: [2025-12-11 00:11:30.135][234999][critical][backtrace] [./source/server/backtrace.h:121] #18: _start [0x558d8afcc02e]
```

This change makes the `inotify` and `kqueue` watchers handle callback
errors gracefully: exceptions are caught using `TRY_ASSERT_MAIN_THREAD`,
errors are logged instead of propagated, and `OkStatus` is always
returned to the event loop.
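
The pattern described above can be sketched in plain C++ as follows. This is a minimal illustration, not the actual Envoy change: `Status`, `WatchCallback`, and `runWatchCallbackGuarded` are hypothetical names, `Status` is a dependency-free stand-in for a real status type, and an ordinary `try`/`catch` is used in place of the `TRY_ASSERT_MAIN_THREAD` macro.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a status type, to keep the sketch dependency-free.
struct Status {
  bool ok = true;
  std::string message;
};

// Hypothetical callback signature for a file watch.
using WatchCallback = std::function<Status(uint32_t events)>;

// Invokes the callback and logs any failure instead of letting it propagate.
// Always returns an OK status, so the caller (the event loop in the scenario
// above) never sees an error or an exception from the callback.
Status runWatchCallbackGuarded(const WatchCallback& cb, uint32_t events) {
  assert(static_cast<bool>(cb)); // Precondition: a callback must be set.
  try {
    const Status status = cb(events);
    if (!status.ok) {
      std::cerr << "watch callback failed: " << status.message << '\n';
    }
  } catch (const std::exception& e) {
    std::cerr << "watch callback threw: " << e.what() << '\n';
  } catch (...) {
    std::cerr << "watch callback threw an unknown exception\n";
  }
  return Status{};
}
```

With a wrapper like this, a callback that returns a non-OK status or throws only produces a log line. The trade-off is that the failure is no longer visible to the caller, so logging becomes the only signal that a reload went wrong.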

---

**Commit Message:** filesystem: Fix crash when watch callback returns
error or throws
**Additional Description:** Make `inotify` and `kqueue` watchers handle
callback errors gracefully.
**Risk Level:** Low
**Testing:** CI
**Docs Changes:** N/A
**Release Notes:** N/A

---------

Signed-off-by: Rohit Agrawal <rohit.agrawal@salesforce.com>
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
nezdolik pushed a commit that referenced this pull request Mar 24, 2026
…proxy#43667)

Commit Message:
The LEDS subscription callback lambda captured `used_load_assignment` by
value as a raw pointer to the object owned by the
`cluster_load_assignment_` unique_ptr. When a subsequent EDS update
reassigned `cluster_load_assignment_`, the old object was destroyed but
existing LEDS subscriptions (not recreated for unchanged configs) still
held the dangling pointer. When the LEDS subscription later fired its
callback (e.g. onConfigUpdateFailed), dereferencing this pointer caused
a segfault.

Stack trace:
```
  #0: [0x77b9d6de8330]
  #1: Envoy::Upstream::EdsClusterImpl::BatchUpdateHelper::batchUpdate()
  #2: Envoy::Upstream::PrioritySetImpl::batchHostUpdate()
  #3: std::__1::__function::__func<>::operator()()
  #4: Envoy::Upstream::LedsSubscription::onConfigUpdateFailed()
  #5: Envoy::Config::GrpcSubscriptionImpl::onConfigUpdateFailed()
  #6: event_process_active_single_queue
  #7: event_base_loop
  #8: Envoy::Server::InstanceBase::run()
```

Fix by capturing `this` and accessing `cluster_load_assignment_`
directly, which always reflects the current valid assignment.
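
The difference between the two capture styles can be sketched with a small standalone example. This is an illustration under assumptions, not the Envoy code: `Assignment`, `Cluster`, and both `makeCallback...` methods are hypothetical names, and the sketch assumes the owning object outlives the callback.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a cluster load assignment.
struct Assignment {
  std::string cluster_name;
};

class Cluster {
public:
  explicit Cluster(std::string name)
      : assignment_(std::make_unique<Assignment>(Assignment{std::move(name)})) {}

  // Simulates an update that replaces the owned object; the old one is destroyed.
  void update(std::string name) {
    assignment_ = std::make_unique<Assignment>(Assignment{std::move(name)});
  }

  // Unsafe pattern, shown for contrast only: the raw pointer is copied into
  // the lambda and dangles after the next update().
  std::function<std::string()> makeCallbackCapturingPointer() {
    Assignment* captured = assignment_.get();
    return [captured]() -> std::string { return captured->cluster_name; };
  }

  // Safer pattern: capture `this` and read the member when the callback runs,
  // so the callback always sees the current assignment.
  std::function<std::string()> makeCallbackCapturingThis() {
    return [this]() -> std::string {
      assert(assignment_ != nullptr);
      return assignment_->cluster_name;
    };
  }

private:
  std::unique_ptr<Assignment> assignment_;
};
```

A callback created with `makeCallbackCapturingThis()` before an `update()` call returns the new name afterwards, while one created with `makeCallbackCapturingPointer()` would dereference freed memory. Capturing `this` only moves the lifetime requirement to the owning object, so it is safe only if that object outlives every callback.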

Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>