Skip to content

hubble: Fix reply state unknown being interpreted as false #13750

Merged
pchaigno merged 2 commits intomasterfrom
pr/gandro/hubble-fix-port-distribution-metric
Oct 27, 2020
Merged

hubble: Fix reply state unknown being interpreted as false #13750
pchaigno merged 2 commits intomasterfrom
pr/gandro/hubble-fix-port-distribution-metric

Conversation

@gandro
Copy link
Copy Markdown
Member

@gandro gandro commented Oct 26, 2020

This fixes a long-standing issue with Hubble not being able to
distingiush between the reply field being false and the reply field
being absent. To address the issue, the old reply field is replaced
with a tri-state is_reply field which can be either nil, true or
false. All uses of the old field are removed in the source tree, but the
field itself is preserved for backwards-compatibility.

This incidentally also fixes a bug where the port-distribution metric
suffered from the same confusion, which caused it to report source ports
as destination ports.

Ref: #13248
Fixes: #13744

This replaces the existing `reply` field, which suffers from false
negatives due to protobuf not being able to distinguish between a false
value and an absent value.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/hubble labels Oct 26, 2020
@gandro gandro requested a review from a team as a code owner October 26, 2020 10:59
@gandro gandro requested a review from a team October 26, 2020 10:59
Comment thread pkg/hubble/filters/reply.go Outdated
Comment on lines 49 to 53
Copy link
Copy Markdown
Member Author

@gandro gandro Oct 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewer note: This assumption is not presumed anymore with this change, so there is a slight change in behavior here: The reply filter will never report any Drop or Policy Verdict events anymore (as they have is_reply always nil).

If people feel strongly about this, I'm happy to add back an exception in the filter logic here to keep this previous behavior. But I'm not 100% confident that the exception is always correct (e.g. we could have dropped a spurious reply packet), which is why I removed it.

@gandro gandro force-pushed the pr/gandro/hubble-fix-port-distribution-metric branch from 7c24be6 to 18dcdf3 Compare October 26, 2020 11:03
@gandro
Copy link
Copy Markdown
Member Author

gandro commented Oct 26, 2020

Unfortunately we don't have proper CI for the metrics yet. But here are some stats from my local testing.

Hubble with the connectivity-check.yaml running and the port-distribution metric enabled.

Without the bugfix, after ~15 minutes:

$ curl -s localhost:9091/metrics | grep hubble_port_distribution_total | wc -l
8343

With the bug fix applied:

$ curl -s localhost:9091/metrics | grep hubble_port_distribution_total | wc -l
11

@gandro gandro force-pushed the pr/gandro/hubble-fix-port-distribution-metric branch from 18dcdf3 to 5132516 Compare October 26, 2020 11:50
This fixes a long-standing issue with Hubble not being able to
distingiush between the reply field being false and the reply field
being absent. To address the issue, the old `reply` field is replaced
with a tri-state `is_reply` field which can be either nil, true or
false. All uses of the old field are removed in the source tree, but the
field itself is preserved for backwards-compatibility.

This incidentally also fixes a bug where the `port-distribution` metric
suffered from the same confusion, which caused it to report source ports
as destination ports.

Ref: #13248
Fixes: #13744

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/hubble-fix-port-distribution-metric branch from 5132516 to 94e1c60 Compare October 26, 2020 11:51
Copy link
Copy Markdown
Member

@rolinh rolinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Comment thread pkg/hubble/parser/threefour/parser.go
Copy link
Copy Markdown
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍 one question around nil check

@gandro
Copy link
Copy Markdown
Member Author

gandro commented Oct 26, 2020

test-me-please

@gandro
Copy link
Copy Markdown
Member Author

gandro commented Oct 27, 2020

Two CI flakes:
Provisioning error GKE (timeout in "selecting random cluster"): https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/2949/console
Runtime-4.9 hit: #12891 https://jenkins.cilium.io/job/Cilium-PR-Runtime-4.9/2319/

@gandro
Copy link
Copy Markdown
Member Author

gandro commented Oct 27, 2020

retest-gke

@gandro
Copy link
Copy Markdown
Member Author

gandro commented Oct 27, 2020

retest-4.9

@gandro
Copy link
Copy Markdown
Member Author

gandro commented Oct 27, 2020

retest-runtime

@maintainer-s-little-helper maintainer-s-little-helper Bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 27, 2020
@pchaigno pchaigno merged commit 0507c62 into master Oct 27, 2020
@pchaigno pchaigno deleted the pr/gandro/hubble-fix-port-distribution-metric branch October 27, 2020 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hubble possibly misrepresenting replies as requests for UDP traffic

8 participants