tap: fix tap response streams never ending#299
Merged
Conversation
A bug currently exists in the tap code where response streams never end when reaching the limit on the number of events. Since these streams never end, the tap controller never restarts them, causing the user-facing tap output to hang. This is due to a bug introduced in 073a1be. The `Dispatch` future previously was responsible for holding a sender to the events channel, and cloning it into each tapped service. In that change, the `Dispatch` type was removed, and the cloned sender was moved to the `Tap` struct. The `Dispatch` future was dropped when the limit is reached, but the `Tap` struct is not. This means that the final sender is never dropped, and the events stream never ends. This commit fixes the bug by moving the sender into `Shared`, which `Tap` holds a `Weak` reference to. This ensures that the sender that's cloned into each tapped service is dropped when the limit is reached, allowing the channel to close when all other senders are dropped. Fixes linkerd/linkerd2#2891 Signed-off-by: Eliza Weisman <eliza@buoyant.io>
scottcarol
pushed a commit
to linkerd/linkerd2
that referenced
this pull request
Aug 8, 2019
* update-rust-version: Check usage (linkerd/linkerd2-proxy#298) * tap: fix tap response streams never ending (linkerd/linkerd2-proxy#299) * Require identity on tap requests (linkerd/linkerd2-proxy#295) * Authority label should reflect logical dst (linkerd/linkerd2-proxy#300) * Replace futures_watch with tokio::sync::watch (linkerd/linkerd2-proxy#301)
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Aug 9, 2019
* Update h2 to v0.1.26 * Properly fall back in the dst_router (linkerd/linkerd2-proxy#291) * Tap server authorizes clients when identity is expected (linkerd/linkerd2-proxy#290) * update-rust-version: Check usage (linkerd/linkerd2-proxy#298) * tap: fix tap response streams never ending (linkerd/linkerd2-proxy#299) * Require identity on tap requests (linkerd/linkerd2-proxy#295) * Authority label should reflect logical dst (linkerd/linkerd2-proxy#300) * Replace futures_watch with tokio::sync::watch (linkerd/linkerd2-proxy#301) * metrics: add `request_handle_us` histogram (linkerd/linkerd2-proxy#294) * linkerd2-proxy: Adopt Rust 2018 (linkerd/linkerd2-proxy#302) * Remove futures-mpsc-lossy (linkerd/linkerd2-proxy#305) * Adopt std::convert::TryFrom (linkerd/linkerd2-proxy#304) * lib: Rename directories to match crate names (linkerd/linkerd2-proxy#303)
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Aug 9, 2019
* update-rust-version: Check usage (linkerd/linkerd2-proxy#298) * tap: fix tap response streams never ending (linkerd/linkerd2-proxy#299) * Require identity on tap requests (linkerd/linkerd2-proxy#295) * Authority label should reflect logical dst (linkerd/linkerd2-proxy#300) * Replace futures_watch with tokio::sync::watch (linkerd/linkerd2-proxy#301) * metrics: add `request_handle_us` histogram (linkerd/linkerd2-proxy#294) * linkerd2-proxy: Adopt Rust 2018 (linkerd/linkerd2-proxy#302) * Remove futures-mpsc-lossy (linkerd/linkerd2-proxy#305) * Adopt std::convert::TryFrom (linkerd/linkerd2-proxy#304) * lib: Rename directories to match crate names (linkerd/linkerd2-proxy#303)
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Aug 9, 2019
* update-rust-version: Check usage (linkerd/linkerd2-proxy#298) * tap: fix tap response streams never ending (linkerd/linkerd2-proxy#299) * Require identity on tap requests (linkerd/linkerd2-proxy#295) * Authority label should reflect logical dst (linkerd/linkerd2-proxy#300) * Replace futures_watch with tokio::sync::watch (linkerd/linkerd2-proxy#301) * metrics: add `request_handle_us` histogram (linkerd/linkerd2-proxy#294) * linkerd2-proxy: Adopt Rust 2018 (linkerd/linkerd2-proxy#302) * Remove futures-mpsc-lossy (linkerd/linkerd2-proxy#305) * Adopt std::convert::TryFrom (linkerd/linkerd2-proxy#304) * lib: Rename directories to match crate names (linkerd/linkerd2-proxy#303)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
A bug currently exists in the tap code where response streams never end
when reaching the limit on the number of events. Since these streams
never end, the tap controller never restarts them, causing the
user-facing tap output to hang.
This is due to a bug introduced in 073a1be. The
Dispatchfuturepreviously was responsible for holding a sender to the events channel,
and cloning it into each tapped service. In that change, the
Dispatchtype was removed, and the cloned sender was moved to the
Tapstruct.The
Dispatchfuture was dropped when the limit is reached, but theTapstruct is not. This means that the final sender is never dropped,and the events stream never ends.
This commit fixes the bug by moving the sender into
Shared, whichTapholds aWeakreference to. This ensures that the sender that'scloned into each tapped service is dropped when the limit is reached,
allowing the channel to close when all other senders are dropped.
I've manually verified that the hang as described in
linkerd/linkerd2#2891 no longer occurs after making this change.
Fixes linkerd/linkerd2#2891
Signed-off-by: Eliza Weisman eliza@buoyant.io