Probe for socket diag & termination functionality#42867
Probe for socket diag & termination functionality#42867tommyp1ckles merged 3 commits intocilium:mainfrom
Conversation
|
/test |
905f65c to
5bf1620
Compare
|
/test |
5bf1620 to
4129631
Compare
aditighag
left a comment
There was a problem hiding this comment.
Maybe I'm missing something, but how is this different from - https://github.com/cilium/cilium/blob/main/pkg/loadbalancer/reconciler/termination.go#L240-L240?
|
@aditighag This will probe for Specifically, Also, this will now probe prior to starting the Job and just avoid running that if it fails, so that we don't continuously retry this if it's destined to fail. (we can maybe get rid of the messages you linked to, however) |
4129631 to
59487cd
Compare
|
To add to this a bit, this is also intended to provide a single reusable way to do this probing for any other uses/vendor usage of the functionality usage in |
There was a problem hiding this comment.
I wonder whether we really need granular configuration checks? Is it not sufficient to document the additional required configurations in the kube-proxy guide?
Edit: Synced up offline. I think it makes sense to probe the configurations, and log warnings if not found. 👍 👍
09df1f5 to
780ba48
Compare
|
/test |
aditighag
left a comment
There was a problem hiding this comment.
LGTM, except for the go linter issues.
780ba48 to
08e9efc
Compare
|
/test |
08e9efc to
5ea2839
Compare
|
/test |
Functionality provided by the sockets library, namely iterating and closing sockets, requires certain kernel config options being enabled. Because there is no standard way for distributions to store kernel config values; the most reliable way to probe for these being enabled is to simply try to use the functionality. Therefore, this adds a probing function that will test for functions provided by: * CONFIG_INET_DIAG_DESTROY * CONFIG_INET_UDP_DIAG * CONFIG_INET_TCP_DIAG Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Previously we would just provide an error message when the terminate failed. Instead, we can proactively probe for this behavior using the new probe that checks for all necessary functionality required for socket termination. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
5ea2839 to
fdc4b29
Compare
|
/test |
| lo := net.IP{127, 0, 0, 1} | ||
| if err := Iterate(uint8(probe.proto), unix.AF_INET, probe.filterMask, func(s *netlink.Socket, err error) error { |
There was a problem hiding this comment.
late drive-by - is there any chance of this falling apart on an IPv6-only kernel?
The loadbalancer control plane will attempt to use the functionality in pkg/datapath/sockets/ in order to perform termination of sockets to backends that are stale/no-longer-exist.
This requires certain kernel config options to be enabled for correct functioning. Currently, when we fail to terminate a socket we will report an error indicating that this may be caused by such issues. However, this is insufficient as some of the functionality (i.e.
CONFIG_INET_UDP_DIAG&CONFIG_INET_TCP_DIAG) will not actually return an error if the config isn't enabled - instead the netlink call will silently return nothing and the socket and no sockets will be terminated).These changes add a probe that detects this functionality explicitly, and we use that to either completely avoid the loadbalancer socket termination job if it is not supported.
Testing
Hard to unit test this to any extent as this probes kernel functionality, I wrote a test utility and tested this under various kernels built with the specified config: