
hubble-relay: add initial multi-node support #11171

Merged
qmonnet merged 8 commits into master from pr/rolinh/hubble-relay-multi-node on Apr 30, 2020

Conversation

@rolinh rolinh (Member) commented on Apr 27, 2020

Please, see individual commits for details.

$ hubble-relay serve --debug 
level=debug msg="os.Hostname() returned" nodeName=k8s1 subsys=node
level=info msg="Starting server..." options="{HubbleTarget:unix:///var/run/cilium/hubble.sock DialTimeout:5s RetryTimeout:30s ListenAddress::4245 Debug:true}" subsys=hubble-relay
level=debug msg="Received peer change notification" change notification="name:\"k8s1\" address:\"192.168.34.11\" type:PEER_ADDED " subsys=hubble-relay
level=debug msg="Received peer change notification" change notification="name:\"k8s2\" address:\"192.168.34.12\" type:PEER_ADDED " subsys=hubble-relay
$ hubble observe --server 'localhost:4245' -f
Apr 27 14:29:45.370 [k8s1]: [f00d::a10:0:0:4bf6]:4240(cilium-health) -> [fd00::b]:51018 from-endpoint FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.370 [k8s1]: [f00d::a10:0:0:4bf6]:4240(cilium-health) -> [fd00::b]:51018 to-stack FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.370 [k8s1]: [fd00::b]:51018 -> [f00d::a10:0:0:4bf6]:4240(cilium-health) from-host FORWARDED (TCP Flags: ACK)
Apr 27 14:29:36.038 [k8s2]: [fd00::c]:38070 -> [f00d::a11:0:0:cd56]:4240(cilium-health) to-endpoint FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.489 [k8s2]: 10.17.142.26:4240(cilium-health) -> 192.168.34.11:49364 from-endpoint FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.489 [k8s2]: 10.17.142.26:4240(cilium-health) -> 192.168.34.11:49364 to-stack FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.489 [k8s2]: [f00d::a11:0:0:cd56]:4240(cilium-health) -> [fd00::b]:41642 from-endpoint FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.489 [k8s2]: [f00d::a11:0:0:cd56]:4240(cilium-health) -> [fd00::b]:41642 to-stack FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.490 [k8s2]: [fd00::b]:41642 -> [f00d::a11:0:0:cd56]:4240(cilium-health) from-host FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.490 [k8s2]: [fd00::b]:41642 -> [f00d::a11:0:0:cd56]:4240(cilium-health) to-endpoint FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.490 [k8s2]: 192.168.34.11:49364 -> 10.17.142.26:4240(cilium-health) from-host FORWARDED (TCP Flags: ACK)
Apr 27 14:29:45.370 [k8s1]: [fd00::b]:51018 -> [f00d::a10:0:0:4bf6]:4240(cilium-health) to-endpoint FORWARDED (TCP Flags: ACK)
Apr 27 14:29:46.899 [k8s1]: 192.168.33.11:6443(sun-sr-https) -> kube-system/coredns-7db7b4f79f-mxzrm:52518 from-host FORWARDED (TCP Flags: ACK, PSH)
Apr 27 14:29:46.899 [k8s1]: 192.168.33.11:443(https) -> kube-system/coredns-7db7b4f79f-mxzrm:52518 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Apr 27 14:29:46.899 [k8s1]: kube-system/coredns-7db7b4f79f-mxzrm:52518 -> default/kubernetes:443(https) from-endpoint FORWARDED (TCP Flags: ACK)
...

Ref https://github.com/cilium/hubble/issues/89
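The debug output above shows hubble-relay learning about peers from a stream of change notifications. As a rough illustration only (the type and constant names below are hypothetical stand-ins, not the actual Hubble peer service API), the relay can maintain a peer set by applying each `PEER_ADDED`/`PEER_DELETED` notification to a map:

```go
package main

import "fmt"

// Hypothetical stand-ins for the Hubble peer service messages, which
// carry a peer name, an address, and a change type.
type changeType int

const (
	peerAdded changeType = iota
	peerDeleted
)

type changeNotification struct {
	Name    string
	Address string
	Type    changeType
}

// peerSet tracks the currently known Hubble peers (name -> address).
type peerSet map[string]string

// apply updates the peer set according to one change notification.
func (ps peerSet) apply(n changeNotification) {
	switch n.Type {
	case peerAdded:
		ps[n.Name] = n.Address
	case peerDeleted:
		delete(ps, n.Name)
	}
}

func main() {
	peers := peerSet{}
	// The two notifications from the debug log above.
	for _, n := range []changeNotification{
		{Name: "k8s1", Address: "192.168.34.11", Type: peerAdded},
		{Name: "k8s2", Address: "192.168.34.12", Type: peerAdded},
	} {
		peers.apply(n)
	}
	fmt.Println(len(peers), peers["k8s2"])
}
```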

@rolinh rolinh added kind/feature This introduces new functionality. release-note/major This PR introduces major new functionality to Cilium. area/hubble labels Apr 27, 2020
@rolinh rolinh requested a review from a team as a code owner April 27, 2020 14:32
@rolinh rolinh requested a review from a team April 27, 2020 14:32
@coveralls commented on Apr 27, 2020

Coverage Status

Coverage increased (+0.001%) to 44.626% when pulling 94f9cb4 on pr/rolinh/hubble-relay-multi-node into dbdf127 on master.

@gandro gandro (Member) left a comment


Awesome 🎉 There's a leftover piece of commented-out code, otherwise looks good!

Comment thread pkg/hubble/relay/observer.go Outdated

(no action required)

I think this is the option we want to pursue in the long run. gRPC client connections seem thread-safe, so reusing the connections to the selected peers seems to be the way to go.
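Since gRPC client connections are safe for concurrent use, reusing one connection per peer could look roughly like the sketch below. This is not the relay's actual implementation; `conn` stands in for a `*grpc.ClientConn`, and the dial function is injected so the caching logic is independent of gRPC itself:

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a stand-in for a *grpc.ClientConn; gRPC client connections
// are thread-safe, so one connection per peer can be shared by all
// in-flight requests instead of dialing per request.
type conn struct{ target string }

// connPool caches one connection per peer address.
type connPool struct {
	mu    sync.Mutex
	dial  func(target string) (*conn, error)
	conns map[string]*conn
}

func newConnPool(dial func(string) (*conn, error)) *connPool {
	return &connPool{dial: dial, conns: map[string]*conn{}}
}

// get returns the cached connection for target, dialing only on the
// first request for that target.
func (p *connPool) get(target string) (*conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[target]; ok {
		return c, nil
	}
	c, err := p.dial(target)
	if err != nil {
		return nil, err
	}
	p.conns[target] = c
	return c, nil
}

func main() {
	dials := 0
	pool := newConnPool(func(t string) (*conn, error) {
		dials++
		return &conn{target: t}, nil
	})
	for i := 0; i < 3; i++ {
		pool.get("192.168.34.11:4245") // dialed once, then reused
	}
	fmt.Println(dials)
}
```

A real version would also have to evict and re-dial connections for peers that leave or become unreachable.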

Comment thread pkg/hubble/relay/observer.go Outdated
Comment thread pkg/hubble/relay/peer.go Outdated
@rolinh rolinh force-pushed the pr/rolinh/hubble-relay-multi-node branch from 14e2352 to 97ac084 Compare April 27, 2020 15:50
@rolinh rolinh (Member, Author) commented on Apr 28, 2020

test-me-please

@glibsm glibsm (Member) left a comment


Good idea having all the flags on the same struct; I don't quite understand why it's global, though.

Comment thread hubble-relay/cmd/serve/serve.go Outdated
Comment thread hubble-relay/cmd/serve/serve.go Outdated
@rolinh rolinh force-pushed the pr/rolinh/hubble-relay-multi-node branch from 97ac084 to cc9e813 Compare April 28, 2020 19:46
@rolinh rolinh requested a review from glibsm April 28, 2020 19:47
@rolinh rolinh (Member, Author) commented on Apr 28, 2020

test-me-please

@michi-covalent (Contributor) commented

@rolinh can you try running hubble observe without -f? For me it doesn't show anything, and only prints flows when I Ctrl-C it. For example, when I run:

hubble observe --server hubble-relay.kube-system:80

it doesn't print anything until I kill the process, and then it prints 20 flows on Ctrl-C.

The service is defined like this:

kind: Service
apiVersion: v1
metadata:
  name: hubble-relay
  namespace: kube-system
  labels:
    k8s-app: hubble-relay
spec:
  type: ClusterIP
  selector:
    k8s-app: hubble-relay
  ports:
  - protocol: TCP
    port: 80
    targetPort: 4245

rolinh added 8 commits April 29, 2020 10:59
This commit fixes signal handling so that hubble-relay shuts down
properly upon receiving a terminating signal.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
This option sets the time to wait before attempting to reconnect when a
connection to a Hubble peer is lost.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
This commit adds initial support for multi-node to hubble-relay. This
feature allows hubble-relay to connect to all hubble instances in a
cluster to answer gRPC requests. This means that when a GetFlows or
ServerStatus request is issued, the reply contains information from all
hubble peers rather than just the local instance.

Note that this is only a first step for proper multi-node support. The
current implementation has the following (major) limitations:

- No mutual TLS; this means that connections to other Hubble peers are
  insecure.
- No flow re-ordering; flows are sent as they are received from the
  various peers, so the chronological order of flows across peers is
  not guaranteed.
- Inefficient: a new connection is established to all peers for every
  gRPC request to process.
- When a peer joins the cluster while a request is being processed, it
  is not taken into account.

All of the above limitations need to be addressed before the multi-node
feature is deemed production ready.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
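The fan-in behaviour described in this commit message (answering one request from many peers, with no cross-peer ordering) can be sketched with plain channels and goroutines. This is a conceptual illustration, not the relay's implementation; `flow` is a stand-in for a Hubble flow event:

```go
package main

import (
	"fmt"
	"sync"
)

// flow is a stand-in for a Hubble flow event.
type flow struct {
	node string
	msg  string
}

// mergeFlows fans in flow streams from several peers into one channel.
// Flows are forwarded as they arrive, so no chronological ordering
// across peers is guaranteed -- one of the stated limitations.
func mergeFlows(peers ...<-chan flow) <-chan flow {
	out := make(chan flow)
	var wg sync.WaitGroup
	for _, p := range peers {
		wg.Add(1)
		go func(p <-chan flow) {
			defer wg.Done()
			for f := range p {
				out <- f
			}
		}(p)
	}
	// Close the merged stream once every peer stream is drained.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	k8s1 := make(chan flow, 2)
	k8s2 := make(chan flow, 1)
	k8s1 <- flow{"k8s1", "from-endpoint FORWARDED"}
	k8s1 <- flow{"k8s1", "to-stack FORWARDED"}
	k8s2 <- flow{"k8s2", "to-endpoint FORWARDED"}
	close(k8s1)
	close(k8s2)

	n := 0
	for range mergeFlows(k8s1, k8s2) {
		n++
	}
	fmt.Println(n)
}
```

Note how this mirrors the limitations listed above: peers are fixed at call time (a peer joining mid-request is not picked up), and output order depends only on arrival order.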
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
@rolinh rolinh force-pushed the pr/rolinh/hubble-relay-multi-node branch from cc9e813 to 94f9cb4 Compare April 29, 2020 08:59
@rolinh rolinh (Member, Author) commented on Apr 29, 2020

@michi-covalent I fixed this; please try again and confirm it now works as expected.

@rolinh rolinh (Member, Author) commented on Apr 29, 2020

test-me-please

@michi-covalent (Contributor) commented

@rolinh it's working now

@qmonnet qmonnet merged commit ad4c589 into master Apr 30, 2020
@qmonnet qmonnet deleted the pr/rolinh/hubble-relay-multi-node branch April 30, 2020 09:18
@tgraf tgraf mentioned this pull request May 6, 2020
7 tasks