feat: inbound only graceful draining#38430
Conversation
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
|
Can someone from @envoyproxy/envoy-mobile-triage help me with these Android test failures? The smoking gun appears to be errors like these: I tried to add a link command to the jni bazel config but no dice. I've never touched Envoy native development, so I truly have no idea what I'm doing |
You added the Line 110 in 87b4ae8 std::atomic linker error with your change, as we've had std::atomic in Envoy code prior and no linker issues on Android.
|
That's what I thought, too but when I tested it in 40faf7e, it didn't help, though: https://github.com/envoyproxy/envoy/actions/runs/13312980039/job/37180020453 I'm also not able to reproduce this locally. |
Oh looks like it did help. I just need to link it with |
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
|
Android is now passing after integrating @fredyw's changes; thanks for the assist! This is now ready for review |
|
/retest |
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
yanavlasov
left a comment
There was a problem hiding this comment.
Which test was flaky?
/wait
| # leave room for compiler/linker. | ||
| # The number 3G is chosen heuristically to both support large VM and small VM with RBE. | ||
| # Startup options cannot be selected via config. | ||
| # TODO: Adding just to test android |
There was a problem hiding this comment.
Yes android doesn't run in PRs by default; I wanted to get an approval on the code first before removing so there aren't surprises on main post merge
| std::atomic<DrainPair> draining_{DrainPair{false, Network::DrainDirection::None}}; | ||
| // A map of timers keyed by the direction that triggered the drain | ||
| std::map<Network::DrainDirection, Event::TimerPtr> drain_tick_timers_; | ||
| std::map<Network::DrainDirection, MonotonicTime> drain_deadlines_ = { |
There was a problem hiding this comment.
This looks like a struct would be a better choice? Are the timer keys dynamic?
There was a problem hiding this comment.
No we only expect a max of two keys (All and InboundOnly) at any time (and we have an assertion in the code to check).
AdminDrainInboundOnlyIdempotent (or something similar). I believe I addressed it by adding the stopped_listener flatHashSet check into a callback function that runs on the main thread which should prevent the race conditions that led to the flaky test. Here's a failed run: https://github.com/envoyproxy/envoy/actions/runs/13172376287/job/36764947882 |
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
|
@yanavlasov do you have any additional feedback? |
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
|
/retest |
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
|
/retest |
Trying envoyproxy#37873 again to get to the bottom of the flakes that caused us to need to revert. This got a lot more complex after envoyproxy#38333, so I may need @yanavlasov to help me out with the proper semantics Commit Message: ``` Fixes envoyproxy#35020. The inbound_only query param for the drain_listeners admin endpoint doesn't work with the graceful query parameter. As a result, outbound listeners will send connection: close headers to upstreams which is undesired. This PR adds the ability for drain_manager to drain in a single direction. ``` Additional Description: Risk Level: Medium Testing: Docs Changes: Release Notes: Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional [API Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):] --------- Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Trying envoyproxy#37873 again to get to the bottom of the flakes that caused us to need to revert. This got a lot more complex after envoyproxy#38333, so I may need @yanavlasov to help me out with the proper semantics Commit Message: ``` Fixes envoyproxy#35020. The inbound_only query param for the drain_listeners admin endpoint doesn't work with the graceful query parameter. As a result, outbound listeners will send connection: close headers to upstreams which is undesired. This PR adds the ability for drain_manager to drain in a single direction. ``` Additional Description: Risk Level: Medium Testing: Docs Changes: Release Notes: Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional [API Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):] --------- Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Trying envoyproxy#37873 again to get to the bottom of the flakes that caused us to need to revert. This got a lot more complex after envoyproxy#38333, so I may need @yanavlasov to help me out with the proper semantics Commit Message: ``` Fixes envoyproxy#35020. The inbound_only query param for the drain_listeners admin endpoint doesn't work with the graceful query parameter. As a result, outbound listeners will send connection: close headers to upstreams which is undesired. This PR adds the ability for drain_manager to drain in a single direction. ``` Additional Description: Risk Level: Medium Testing: Docs Changes: Release Notes: Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional [API Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):] --------- Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Trying #37873 again to get to the bottom of the flakes that caused us to need to revert. This got a lot more complex after #38333, so I may need @yanavlasov to help me out with the proper semantics
Commit Message:
Additional Description:
Risk Level: Medium
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]