Skip to content

tap: fix tap response streams never ending#299

Merged
hawkw merged 1 commit intomasterfrom
eliza/fix-tap-hang
Aug 6, 2019
Merged

tap: fix tap response streams never ending#299
hawkw merged 1 commit intomasterfrom
eliza/fix-tap-hang

Conversation

@hawkw
Copy link
Contributor

@hawkw hawkw commented Aug 6, 2019

A bug currently exists in the tap code where response streams never end
when reaching the limit on the number of events. Since these streams
never end, the tap controller never restarts them, causing the
user-facing tap output to hang.

This is due to a bug introduced in 073a1be. The Dispatch future
previously was responsible for holding a sender to the events channel,
and cloning it into each tapped service. In that change, the Dispatch
type was removed, and the cloned sender was moved to the Tap struct.
The Dispatch future was dropped when the limit is reached, but the
Tap struct is not. This means that the final sender is never dropped,
and the events stream never ends.

This commit fixes the bug by moving the sender into Shared, which
Tap holds a Weak reference to. This ensures that the sender that's
cloned into each tapped service is dropped when the limit is reached,
allowing the channel to close when all other senders are dropped.

I've manually verified that the hang as described in
linkerd/linkerd2#2891 no longer occurs after making this change.

Fixes linkerd/linkerd2#2891

Signed-off-by: Eliza Weisman eliza@buoyant.io

A bug currently exists in the tap code where response streams never end
when reaching the limit on the number of events. Since these streams
never end, the tap controller never restarts them, causing the
user-facing tap output to hang.

This is due to a bug introduced in
073a1be. The `Dispatch` future
previously was responsible for holding a sender to the events channel,
and cloning it into each tapped service. In that change, the `Dispatch`
type was removed, and the cloned sender was moved to the `Tap` struct.
The `Dispatch` future was dropped when the limit is reached, but the
`Tap` struct is not. This means that the final sender is never dropped,
and the events stream never ends.

This commit fixes the bug by moving the sender into `Shared`, which
`Tap` holds a `Weak` reference to. This ensures that the sender that's
cloned into each tapped service is dropped when the limit is reached,
allowing the channel to close when all other senders are dropped.

Fixes linkerd/linkerd2#2891

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw added the bug Something isn't working label Aug 6, 2019
@hawkw hawkw requested review from olix0r and seanmonstar August 6, 2019 21:56
@hawkw hawkw self-assigned this Aug 6, 2019
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome! lgtm.

@hawkw hawkw merged commit 767806b into master Aug 6, 2019
scottcarol pushed a commit to linkerd/linkerd2 that referenced this pull request Aug 8, 2019
* update-rust-version: Check usage (linkerd/linkerd2-proxy#298)
* tap: fix tap response streams never ending (linkerd/linkerd2-proxy#299)
* Require identity on tap requests (linkerd/linkerd2-proxy#295)
* Authority label should reflect logical dst (linkerd/linkerd2-proxy#300)
* Replace futures_watch with tokio::sync::watch (linkerd/linkerd2-proxy#301)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Aug 9, 2019
* Update h2 to v0.1.26
* Properly fall back in the dst_router (linkerd/linkerd2-proxy#291)
* Tap server authorizes clients when identity is expected (linkerd/linkerd2-proxy#290)
* update-rust-version: Check usage (linkerd/linkerd2-proxy#298)
* tap: fix tap response streams never ending (linkerd/linkerd2-proxy#299)
* Require identity on tap requests (linkerd/linkerd2-proxy#295)
* Authority label should reflect logical dst (linkerd/linkerd2-proxy#300)
* Replace futures_watch with tokio::sync::watch (linkerd/linkerd2-proxy#301)
* metrics: add `request_handle_us` histogram (linkerd/linkerd2-proxy#294)
* linkerd2-proxy: Adopt Rust 2018 (linkerd/linkerd2-proxy#302)
* Remove futures-mpsc-lossy (linkerd/linkerd2-proxy#305)
* Adopt std::convert::TryFrom (linkerd/linkerd2-proxy#304)
* lib: Rename directories to match crate names (linkerd/linkerd2-proxy#303)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Aug 9, 2019
* update-rust-version: Check usage (linkerd/linkerd2-proxy#298)
* tap: fix tap response streams never ending (linkerd/linkerd2-proxy#299)
* Require identity on tap requests (linkerd/linkerd2-proxy#295)
* Authority label should reflect logical dst (linkerd/linkerd2-proxy#300)
* Replace futures_watch with tokio::sync::watch (linkerd/linkerd2-proxy#301)
* metrics: add `request_handle_us` histogram (linkerd/linkerd2-proxy#294)
* linkerd2-proxy: Adopt Rust 2018 (linkerd/linkerd2-proxy#302)
* Remove futures-mpsc-lossy (linkerd/linkerd2-proxy#305)
* Adopt std::convert::TryFrom (linkerd/linkerd2-proxy#304)
* lib: Rename directories to match crate names (linkerd/linkerd2-proxy#303)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Aug 9, 2019
* update-rust-version: Check usage (linkerd/linkerd2-proxy#298)
* tap: fix tap response streams never ending (linkerd/linkerd2-proxy#299)
* Require identity on tap requests (linkerd/linkerd2-proxy#295)
* Authority label should reflect logical dst (linkerd/linkerd2-proxy#300)
* Replace futures_watch with tokio::sync::watch (linkerd/linkerd2-proxy#301)
* metrics: add `request_handle_us` histogram (linkerd/linkerd2-proxy#294)
* linkerd2-proxy: Adopt Rust 2018 (linkerd/linkerd2-proxy#302)
* Remove futures-mpsc-lossy (linkerd/linkerd2-proxy#305)
* Adopt std::convert::TryFrom (linkerd/linkerd2-proxy#304)
* lib: Rename directories to match crate names (linkerd/linkerd2-proxy#303)
@olix0r olix0r deleted the eliza/fix-tap-hang branch August 17, 2019 01:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tap and Top stops streaming intermittently

2 participants