hubble: Add --hubble-monitor-events flag #24828
Conversation
cce7da8 to
d85ad93
Compare
|
Commit 3d92136363dac97bac8e4a67607185225ccf699f does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
3d92136 to
d85ad93
Compare
d85ad93 to
2497df3
Compare
--hubble-monitor-events flag--hubble-monitor-events flag
2497df3 to
9d39387
Compare
|
Commit e4632624f7a096c7fbd1792e5bc7c5a81f8bbee4 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
dcfbb3c to
236c8e8
Compare
|
Commit e4632624f7a096c7fbd1792e5bc7c5a81f8bbee4 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
236c8e8 to
c5a41c8
Compare
joestringer
left a comment
There was a problem hiding this comment.
Just a minor usability/UX suggestion below.
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Aditya Sharma <aditya.sharma@shopify.com> Co-authored-by: Michi Mutsuzaki <michi@isovalent.com> Co-authored-by: Aditya Sharma <aditya.sharma@shopify.com>
38484ac to
f764b10
Compare
|
/test |
|
test-runtime: hit #22373 |
|
/test-runtime |
|
ci-multicluster: filed #25064 |
|
/ci-multicluster |
marqc
left a comment
There was a problem hiding this comment.
I believe you misinterpreted the meaning of return value from the OnMonitorEvent handler. Please check my comments within a PR.
| return &monitorFilter, nil | ||
| } | ||
|
|
||
| func (m *monitorFilter) OnMonitorEvent(ctx context.Context, event *observerTypes.MonitorEvent) (bool, error) { |
There was a problem hiding this comment.
the return bool flag is used as stopPropagation indicator https://github.com/cilium/cilium/blob/main/pkg/hubble/observer/local_observer.go#L119
So if this function returns true the event won't be propagated to hubble observers. If it returns false, the event will be propagated to hubble observers.
Either return value of this function needs to be reverted, or configuration flag should be changed to clearly indicate that this settings is excluding given event types from getting observed by hubble.
Currently when I use --hubble-monitor-events="drop debug capture trace policy-verdict recorder l7 agent" I expect to see everything, but actually there is nothing reaching hubble subsystem.
| flags.StringSlice(option.HubbleMonitorEvents, []string{}, | ||
| fmt.Sprintf( | ||
| "Cilium monitor events for Hubble to observe: [%s]. By default, Hubble observes all monitor events.", | ||
| strings.Join(monitorAPI.AllMessageTypeNames(), " "), |
There was a problem hiding this comment.
Current implementation is to "exclude", not to "observe"
| switch payload := event.Payload.(type) { | ||
| case *observerTypes.PerfEvent: | ||
| if len(payload.Data) == 0 { | ||
| return false, errors.ErrEmptyData |
There was a problem hiding this comment.
Related to my above comment: return false actually passes this event to the hubble subsystem. If you want to not pass empty events (which makes a lot of sense) this should return true
|
|
||
| type testEvent struct { | ||
| event *observerTypes.MonitorEvent | ||
| allowed bool |
There was a problem hiding this comment.
to be consistent with https://github.com/cilium/cilium/blob/main/pkg/hubble/observer/local_observer.go#L119 please rename allowed to stop or stopPropagation
| assert.NoError(t, err) | ||
|
|
||
| for _, event := range tc.events { | ||
| allowed, err := mf.OnMonitorEvent(context.Background(), event.event) |
There was a problem hiding this comment.
rename allowed to stop or stopPropagation
|
Can we back-port this to 1.12 / 1.13? that would really help with managing Hubble CPU usage. Cheers. |
In general, only bugfixes are backported, see the backporting policy here. This is considered a new feature. This feature will be made available in an upcoming v1.14 monthly snapshot release which you're welcome to try out and provide feedback. Furthermore, as we can see from recent posts on this PR, there's still active development on it, so it would be premature to release at this stage without further testing and feedback. |
|
@joestringer that make sense, thank you for the details. |
Hubble Monitor Filter: invert logic Cherry-picks: * cilium#24828 * cilium#25167 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Aditya Sharma <aditya.sharma@shopify.com> Co-authored-by: Michi Mutsuzaki <michi@isovalent.com> Co-authored-by: Aditya Sharma <aditya.sharma@shopify.com> Change-Id: Iaa292209dce920c19b70dd8c62db2334e3c8b514 Reviewed-on: https://gke-internal-review.googlesource.com/c/third_party/cilium/+/694178 Unit-Verified: Prow_Bot_V2 <425329972751-compute@developer.gserviceaccount.com> Reviewed-by: Alan Kutniewski <kutniewski@google.com> Upstream-PR: cilium#24828 Upstream-PR: cilium#25167 Reviewed-on: https://gke-internal-review.googlesource.com/c/third_party/cilium/+/782721 Reviewed-by: Akhil Velagapudi <avelagap@google.com>
This commit introduce a new Hubble cell. It is the first of a series of refactoring patches gradually transitioning Hubble configuration and startup code away from the Cilium daemon. Currently, all the Hubble configuration and initialization is implemented in the Cilium daemon. Hubble related flags (e.g. --enable-hubble) are member of the DaemonConfig struct, and the Hubble startup code is under daemon/cmd/hubble.go. Most notably the launchHubble() func which handle all the Hubble component startup is called from cmd.newDaemon(). Over time, launchHubble() has become increasingly complex, initializing more and more Hubble subsystems while bailing out if any of them encounter a configuration or startup error. This has lead to a situation where a misconfiguration of one part of Hubble (e.g. metrics) could effectively disable other Hubble components without crashing the Cilium daemon. The recent addition of many new Hubble feature / subsystems, for example monitor event filtering[1], L7 data redaction[2], and k8s Events integration on drop notifications[3] introducing new Hubble related flags and logic both increase the potential of startup failure and the surface of affected features. Furthermore, most of the new feature increase Hubble's reliance on the Cilium daemon for either flag setup/parsing and/or dependencies injection, making it harder to decouple Hubble from the Daemon. Fortunately, many part of Cilium already use the hive/cell framework for configuration, startup, and dependency injection. Most notably, all Hubble dependencies are now available as cell, which open the possibility to convert Hubble components as cell themselves. The goal of this patch series is to refactor Hubble into its own new cell and decouple it from the Cilium daemon. Concretely, all the code under daemon/cmd/hubble.go will be moved away into the new cell, along with the flags and Hubble startup, until the point where it would be possible to create a hive with the Hubble cell but without the Cilium daemon. Splitting the Hubble components into smaller cells is left for a later time. [1]: cilium#24828 [2]: cilium#23887 [3]: cilium#29565 Signed-off-by: Alexandre Perrin <alex@isovalent.com>
This commit introduce a new Hubble cell. It is the first of a series of refactoring patches gradually transitioning Hubble configuration and startup code away from the Cilium daemon. Currently, all the Hubble configuration and initialization is implemented in the Cilium daemon. Hubble related flags (e.g. --enable-hubble) are member of the DaemonConfig struct, and the Hubble startup code is under daemon/cmd/hubble.go. Most notably the launchHubble() func which handle all the Hubble component startup is called from cmd.newDaemon(). Over time, launchHubble() has become increasingly complex, initializing more and more Hubble subsystems while bailing out if any of them encounter a configuration or startup error. This has lead to a situation where a misconfiguration of one part of Hubble (e.g. metrics) could effectively disable other Hubble components without crashing the Cilium daemon. The recent addition of many new Hubble feature / subsystems, for example monitor event filtering[1], L7 data redaction[2], and k8s Events integration on drop notifications[3] introducing new Hubble related flags and logic both increase the potential of startup failure and the surface of affected features. Furthermore, most of the new feature increase Hubble's reliance on the Cilium daemon for either flag setup/parsing and/or dependencies injection, making it harder to decouple Hubble from the Daemon. Fortunately, many part of Cilium already use the hive/cell framework for configuration, startup, and dependency injection. Most notably, all Hubble dependencies are now available as cell, which open the possibility to convert Hubble components as cell themselves. The goal of this patch series is to refactor Hubble into its own new cell and decouple it from the Cilium daemon. Concretely, all the code under daemon/cmd/hubble.go will be moved away into the new cell, along with the flags and Hubble startup, until the point where it would be possible to create a hive with the Hubble cell but without the Cilium daemon. Splitting the Hubble components into smaller cells is left for a later time. [1]: cilium#24828 [2]: cilium#23887 [3]: cilium#29565 Signed-off-by: Alexandre Perrin <alex@isovalent.com>
This commit introduce a new Hubble cell. It is the first of a series of refactoring patches gradually transitioning Hubble configuration and startup code away from the Cilium daemon. Currently, all the Hubble configuration and initialization is implemented in the Cilium daemon. Hubble related flags (e.g. --enable-hubble) are member of the DaemonConfig struct, and the Hubble startup code is under daemon/cmd/hubble.go. Most notably the launchHubble() func which handle all the Hubble component startup is called from cmd.newDaemon(). Over time, launchHubble() has become increasingly complex, initializing more and more Hubble subsystems while bailing out if any of them encounter a configuration or startup error. This has lead to a situation where a misconfiguration of one part of Hubble (e.g. metrics) could effectively disable other Hubble components without crashing the Cilium daemon. The recent addition of many new Hubble feature / subsystems, for example monitor event filtering[1], L7 data redaction[2], and k8s Events integration on drop notifications[3] introducing new Hubble related flags and logic both increase the potential of startup failure and the surface of affected features. Furthermore, most of the new feature increase Hubble's reliance on the Cilium daemon for either flag setup/parsing and/or dependencies injection, making it harder to decouple Hubble from the Daemon. Fortunately, many part of Cilium already use the hive/cell framework for configuration, startup, and dependency injection. Most notably, all Hubble dependencies are now available as cell, which open the possibility to convert Hubble components as cell themselves. The goal of this patch series is to refactor Hubble into its own new cell and decouple it from the Cilium daemon. Concretely, all the code under daemon/cmd/hubble.go will be moved away into the new cell, along with the flags and Hubble startup, until the point where it would be possible to create a hive with the Hubble cell but without the Cilium daemon. Splitting the Hubble components into smaller cells is left for a later time. [1]: cilium#24828 [2]: cilium#23887 [3]: cilium#29565 Signed-off-by: Alexandre Perrin <alex@isovalent.com>
This commit introduce a new Hubble cell. It is the first of a series of refactoring patches gradually transitioning Hubble configuration and startup code away from the Cilium daemon. Currently, all the Hubble configuration and initialization is implemented in the Cilium daemon. Hubble related flags (e.g. --enable-hubble) are member of the DaemonConfig struct, and the Hubble startup code is under daemon/cmd/hubble.go. Most notably the launchHubble() func which handle all the Hubble component startup is called from cmd.newDaemon(). Over time, launchHubble() has become increasingly complex, initializing more and more Hubble subsystems while bailing out if any of them encounter a configuration or startup error. This has lead to a situation where a misconfiguration of one part of Hubble (e.g. metrics) could effectively disable other Hubble components without crashing the Cilium daemon. The recent addition of many new Hubble feature / subsystems, for example monitor event filtering[1], L7 data redaction[2], and k8s Events integration on drop notifications[3] introducing new Hubble related flags and logic both increase the potential of startup failure and the surface of affected features. Furthermore, most of the new feature increase Hubble's reliance on the Cilium daemon for either flag setup/parsing and/or dependencies injection, making it harder to decouple Hubble from the Daemon. Fortunately, many part of Cilium already use the hive/cell framework for configuration, startup, and dependency injection. Most notably, all Hubble dependencies are now available as cell, which open the possibility to convert Hubble components as cell themselves. The goal of this patch series is to refactor Hubble into its own new cell and decouple it from the Cilium daemon. Concretely, all the code under daemon/cmd/hubble.go will be moved away into the new cell, along with the flags and Hubble startup, until the point where it would be possible to create a hive with the Hubble cell but without the Cilium daemon. Splitting the Hubble components into smaller cells is left for a later time. [1]: cilium#24828 [2]: cilium#23887 [3]: cilium#29565 Signed-off-by: Alexandre Perrin <alex@isovalent.com>
This commit introduce a new Hubble cell. It is the first of a series of refactoring patches gradually transitioning Hubble configuration and startup code away from the Cilium daemon. Currently, all the Hubble configuration and initialization is implemented in the Cilium daemon. Hubble related flags (e.g. --enable-hubble) are member of the DaemonConfig struct, and the Hubble startup code is under daemon/cmd/hubble.go. Most notably the launchHubble() func which handle all the Hubble component startup is called from cmd.newDaemon(). Over time, launchHubble() has become increasingly complex, initializing more and more Hubble subsystems while bailing out if any of them encounter a configuration or startup error. This has lead to a situation where a misconfiguration of one part of Hubble (e.g. metrics) could effectively disable other Hubble components without crashing the Cilium daemon. The recent addition of many new Hubble feature / subsystems, for example monitor event filtering[1], L7 data redaction[2], and k8s Events integration on drop notifications[3] introducing new Hubble related flags and logic both increase the potential of startup failure and the surface of affected features. Furthermore, most of the new feature increase Hubble's reliance on the Cilium daemon for either flag setup/parsing and/or dependencies injection, making it harder to decouple Hubble from the Daemon. Fortunately, many part of Cilium already use the hive/cell framework for configuration, startup, and dependency injection. Most notably, all Hubble dependencies are now available as cell, which open the possibility to convert Hubble components as cell themselves. The goal of this patch series is to refactor Hubble into its own new cell and decouple it from the Cilium daemon. Concretely, all the code under daemon/cmd/hubble.go will be moved away into the new cell, along with the flags and Hubble startup, until the point where it would be possible to create a hive with the Hubble cell but without the Cilium daemon. Splitting the Hubble components into smaller cells is left for a later time. [1]: cilium#24828 [2]: cilium#23887 [3]: cilium#29565 Signed-off-by: Alexandre Perrin <alex@isovalent.com>
This commit introduce a new Hubble cell. It is the first of a series of refactoring patches gradually transitioning Hubble configuration and startup code away from the Cilium daemon. Currently, all the Hubble configuration and initialization is implemented in the Cilium daemon. Hubble related flags (e.g. --enable-hubble) are member of the DaemonConfig struct, and the Hubble startup code is under daemon/cmd/hubble.go. Most notably the launchHubble() func which handle all the Hubble component startup is called from cmd.newDaemon(). Over time, launchHubble() has become increasingly complex, initializing more and more Hubble subsystems while bailing out if any of them encounter a configuration or startup error. This has lead to a situation where a misconfiguration of one part of Hubble (e.g. metrics) could effectively disable other Hubble components without crashing the Cilium daemon. The recent addition of many new Hubble feature / subsystems, for example monitor event filtering[1], L7 data redaction[2], and k8s Events integration on drop notifications[3] introducing new Hubble related flags and logic both increase the potential of startup failure and the surface of affected features. Furthermore, most of the new feature increase Hubble's reliance on the Cilium daemon for either flag setup/parsing and/or dependencies injection, making it harder to decouple Hubble from the Daemon. Fortunately, many part of Cilium already use the hive/cell framework for configuration, startup, and dependency injection. Most notably, all Hubble dependencies are now available as cell, which open the possibility to convert Hubble components as cell themselves. The goal of this patch series is to refactor Hubble into its own new cell and decouple it from the Cilium daemon. Concretely, all the code under daemon/cmd/hubble.go will be moved away into the new cell, along with the flags and Hubble startup, until the point where it would be possible to create a hive with the Hubble cell but without the Cilium daemon. Splitting the Hubble components into smaller cells is left for a later time. [1]: cilium#24828 [2]: cilium#23887 [3]: cilium#29565 Signed-off-by: Alexandre Perrin <alex@isovalent.com>
This commit introduce a new Hubble cell. It is the first of a series of refactoring patches gradually transitioning Hubble configuration and startup code away from the Cilium daemon. Currently, all the Hubble configuration and initialization is implemented in the Cilium daemon. Hubble related flags (e.g. --enable-hubble) are member of the DaemonConfig struct, and the Hubble startup code is under daemon/cmd/hubble.go. Most notably the launchHubble() func which handle all the Hubble component startup is called from cmd.newDaemon(). Over time, launchHubble() has become increasingly complex, initializing more and more Hubble subsystems while bailing out if any of them encounter a configuration or startup error. This has lead to a situation where a misconfiguration of one part of Hubble (e.g. metrics) could effectively disable other Hubble components without crashing the Cilium daemon. The recent addition of many new Hubble feature / subsystems, for example monitor event filtering[1], L7 data redaction[2], and k8s Events integration on drop notifications[3] introducing new Hubble related flags and logic both increase the potential of startup failure and the surface of affected features. Furthermore, most of the new feature increase Hubble's reliance on the Cilium daemon for either flag setup/parsing and/or dependencies injection, making it harder to decouple Hubble from the Daemon. Fortunately, many part of Cilium already use the hive/cell framework for configuration, startup, and dependency injection. Most notably, all Hubble dependencies are now available as cell, which open the possibility to convert Hubble components as cell themselves. The goal of this patch series is to refactor Hubble into its own new cell and decouple it from the Cilium daemon. Concretely, all the code under daemon/cmd/hubble.go will be moved away into the new cell, along with the flags and Hubble startup, until the point where it would be possible to create a hive with the Hubble cell but without the Cilium daemon. Splitting the Hubble components into smaller cells is left for a later time. [1]: #24828 [2]: #23887 [3]: #29565 Signed-off-by: Alexandre Perrin <alex@isovalent.com>
Please ensure your pull request adheres to the following guidelines:
description and a
Fixes: #XXXline if the commit addresses a particularGitHub issue.
Fixes: <commit-id>tag, thenplease add the commit author[s] as reviewer[s] to this issue.
Related #19929
Adds a
--hubble-monitor-eventsflag that let's a user control the event types that get to the hubble subsystem.Why
Nodes with traffic heavy workloads are doing 30k+ flows per second. This causes very high CPU utilization for the cilium-agent container. We'd like to be able to only monitor drop and other events and ignore the normal flows.
Flame graph showing that most of the CPU time is spent in the hubble subsystem decoding the events:
cc @michi-covalent