Introduce support for automatic datapath mode selection (bpf.datapathMode=auto) #43062

Merged
julianwiedmann merged 16 commits into cilium:main from ajmmm:pr/datapath-mode-auto
Jan 22, 2026

@ajmmm ajmmm commented Dec 1, 2025

High Level Summary:

  • Abstracts datapath mode selection logic into a new ConnectorMode type that exposes methods such as IsVeth(), IsNetkit() and IsLayer2().

  • Splits the current single datapath mode into configured mode and operational mode, which are encapsulated in the datapath ConnectorConfig.

  • Migrates all manual checks of DaemonConfig.DatapathMode to ConnectorConfig via queries to the operational mode.

  • Introduces an automatic datapath mode, which probes the underlying host for netkit support at runtime. If netkit is supported, it is used; otherwise Cilium falls back to veth. The default mode is still veth.
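
As a rough illustration of the first bullet, a mode type with predicate methods might look like the sketch below. Only the type name ConnectorMode and the method names IsVeth(), IsNetkit() and IsLayer2() come from this description; the constant names and the assumption that veth and netkit-l2 are the Layer 2 modes are illustrative guesses, not the merged code.

```go
package main

import "fmt"

// ConnectorMode is a hypothetical stand-in for the type described above.
// The constant names and the IsLayer2 mapping are assumptions.
type ConnectorMode string

const (
	ModeAuto     ConnectorMode = "auto" // assumed valid as a configured mode only
	ModeVeth     ConnectorMode = "veth"
	ModeNetkit   ConnectorMode = "netkit"
	ModeNetkitL2 ConnectorMode = "netkit-l2"
)

// IsVeth reports whether endpoints are connected with veth pairs.
func (m ConnectorMode) IsVeth() bool { return m == ModeVeth }

// IsNetkit reports whether endpoints use netkit devices in either flavour.
func (m ConnectorMode) IsNetkit() bool { return m == ModeNetkit || m == ModeNetkitL2 }

// IsLayer2 reports whether workload-facing interfaces must process ARP.
// Assumption: this holds for veth and netkit-l2, but not for plain netkit.
func (m ConnectorMode) IsLayer2() bool { return m == ModeVeth || m == ModeNetkitL2 }

func main() {
	for _, m := range []ConnectorMode{ModeVeth, ModeNetkit, ModeNetkitL2} {
		fmt.Printf("%-9s veth=%t netkit=%t layer2=%t\n", m, m.IsVeth(), m.IsNetkit(), m.IsLayer2())
	}
}
```

Callers can then ask the operational mode a question instead of comparing strings against DaemonConfig.DatapathMode.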

Additional notes:

  • Updates to cilium-dbg status:

    • Device Mode unchanged where configured/operational modes are equal (e.g. veth/veth, netkit/netkit).
    • Device Mode exposes split only where the modes differ (e.g. auto/veth, auto/netkit).
  • Updates to cilium-dbg metrics:

    • datapath_config expresses both operational_mode and configured_mode even if they are equal.
  • Fake datapath connector logic is provided for testing. It can still express different modes, for cases where another component within Cilium wants to adapt its behaviour to a specific datapath mode.

  • Endpoint restore logic was previously updated to detect incompatible datapath modes of existing pods (ref: daemon: Fail agent startup on incompatible datapath mode #42482). This PR extends that logic to also distinguish between netkit and netkit-l2 modes, which the referenced change did not detect.


Example outputs where automatic mode is used

$ cilium-dbg status | grep Device
Device Mode:             netkit [Configured: auto]

$ cilium-dbg metrics list | grep datapath_config
cilium_feature_datapath_config                                           configured_mode=auto operational_mode=netkit                                 1.000000
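
The Device Mode rule from the notes above (show the split only when the two modes differ) can be sketched as follows. The function name deviceMode is hypothetical; only the output format is taken from the example output.

```go
package main

import "fmt"

// deviceMode renders the "Device Mode" value of cilium-dbg status.
// Hypothetical helper: the real formatting code may be structured differently.
func deviceMode(configured, operational string) string {
	if configured == operational {
		// Unchanged output when both modes agree, e.g. "veth".
		return operational
	}
	// Expose the split only when the modes differ, e.g. auto resolved to netkit.
	return fmt.Sprintf("%s [Configured: %s]", operational, configured)
}

func main() {
	fmt.Println("Device Mode:", deviceMode("veth", "veth"))
	fmt.Println("Device Mode:", deviceMode("auto", "netkit"))
}
```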

Log outputs

# mode=auto, netkit support present
$ kubectl -n kube-system logs cilium-wzf9p | grep connector
time=2025-12-01T16:06:30.825278627Z level=info source=/go/src/github.com/cilium/cilium/pkg/datapath/connector/config.go:156 msg="Datapath connector ready" module=agent.datapath.connector datapathMode=netkit
time=2025-12-01T16:06:36.000398588Z level=debug source=/go/src/github.com/cilium/cilium/pkg/datapath/connector/link.go:40 msg="Creating new linkpair" module=agent.datapath.connector linkConfig="{EndpointID: HostIfName:lxc_health PeerIfName:cilium PeerNamespace:0x40018ef340 GROIPv6MaxSize:65536 GSOIPv6MaxSize:65536 GROIPv4MaxSize:65536 GSOIPv4MaxSize:65536 DeviceMTU:1500 DeviceHeadroom:0 DeviceTailroom:0}" datapathMode=netkit
time=2025-12-01T16:06:36.002179755Z level=debug source=/go/src/github.com/cilium/cilium/pkg/datapath/connector/netkit.go:108 msg="Created netkit pair" module=agent.datapath.connector subsys=endpoint-connector netkitPair="[cilium lxc_health]" deviceHeadroom=0 deviceTailroom=0

# mode=auto, netkit support not present
$ kubectl -n kube-system logs cilium-5lvhn | grep connector
time=2025-12-01T16:12:30.986827195Z level=info source=/go/src/github.com/cilium/cilium/pkg/datapath/connector/config.go:123 msg="netkit probe failed, falling back to veth connector" module=agent.datapath.connector error="creating link: netkit not supported (requires >= v6.7)"
time=2025-12-01T16:12:30.986843237Z level=info source=/go/src/github.com/cilium/cilium/pkg/datapath/connector/config.go:156 msg="Datapath connector ready" module=agent.datapath.connector datapathMode=veth
time=2025-12-01T16:12:31.496641695Z level=debug source=/go/src/github.com/cilium/cilium/pkg/datapath/connector/link.go:40 msg="Creating new linkpair" module=agent.datapath.connector linkConfig="{EndpointID: HostIfName:lxc_health PeerIfName:cilium PeerNamespace:0x40007465b0 GROIPv6MaxSize:65536 GSOIPv6MaxSize:65536 GROIPv4MaxSize:65536 GSOIPv4MaxSize:65536 DeviceMTU:1500 DeviceHeadroom:0 DeviceTailroom:0}" datapathMode=veth
time=2025-12-01T16:12:31.497397737Z level=debug source=/go/src/github.com/cilium/cilium/pkg/datapath/connector/veth.go:82 msg="Created veth pair" module=agent.datapath.connector subsys=endpoint-connector vethPair="[cilium lxc_health]"

Release Notes

Introduces "auto" datapath-mode. If set, Cilium will probe the underlying host for netkit device support at startup. If supported, pods will be created with netkit devices; otherwise, veth pairs will continue to be used.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 1, 2025

ajmmm commented Dec 1, 2025

/test

@ajmmm ajmmm force-pushed the pr/datapath-mode-auto branch from 0a22e71 to 118623c Compare December 1, 2025 15:49

ajmmm commented Dec 1, 2025

/test

@ajmmm ajmmm changed the title Pr/datapath mode auto Introduce support for automatic datapath mode selection (bpf.datapathMode=auto) Dec 1, 2025
@ajmmm ajmmm force-pushed the pr/datapath-mode-auto branch from 118623c to 646d15d Compare December 2, 2025 13:45

ajmmm commented Dec 2, 2025

/test

@ajmmm ajmmm force-pushed the pr/datapath-mode-auto branch from 646d15d to 9eef474 Compare December 2, 2025 15:44

ajmmm commented Dec 2, 2025

/test

@ajmmm ajmmm marked this pull request as ready for review December 2, 2025 15:56
@ajmmm ajmmm requested review from a team as code owners December 2, 2025 15:56
@tklauser tklauser added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Dec 4, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 4, 2025
ajmmm added 15 commits January 20, 2026 10:52
This commit implements a refactor to the datapath ConnectorConfig
structure so it can be embedded into the Orchestrator like other
datapath configs (Wireguard, IPsec, Tunnel, etc.)

Previously, the ConnectorConfig would have a dependency on the
Orchestrator in that the Hive startup hook entry would wait for the
datapath initialized signal exported from the Orchestrator. This was
done to provide a guarantee that we only probe for buffer margins
when the Loader has completed at least one initialization pass.

In preparation for the ConnectorConfig to express an operational
datapath mode down into the Loader, we need to reverse this dependency
so that the Loader can access it when setting up the datapath.

Ultimately, the Orchestrator now just calls the ConnectorConfig
Reinitialize() routine every time, like other functions.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Remove the datapath mode switch in ciliumHealthManager cleanupEndpoint()
because all of its branches do the same thing anyway.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit relocates the datapathMode validation logic from the daemon
component into the datapath/connector package. This seems appropriate
given we are separating 'configured' mode from 'operational' mode to
facilitate auto-discovery.

This commit also includes a minor tweak to how the daemon pulls in the
datapath ConnectorConfig interface to simplify imports.

Finally, this commit includes updates to the basic connector config
tests so that they function better on test hosts that do not provide netkit.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Minor refactor to internal HeaderfileWriter.writeTemplateConfig()
function to just accept the local node configuration and endpoint
configuration as input parameters, rather than specific values from
these structures.

This simplifies the function signature and allows us to replace
datapath mode checking for netkit with local node configuration
in the next commit.

No functional change in this commit.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit introduces DatapathIsLayer2 into the local node configuration
structure. This carries true if the operational datapath mode requires
that workload-facing network interfaces process ARP, and false otherwise.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit migrates the /config API from DaemonConfig.DatapathMode to
the ConnectorConfig, which means this API now expresses configured mode
separately to operational mode.

The existing JSON properties ("datapathMode", "datapath-mode") now
carry the operational datapath mode. This has been done for backwards
compatibility - e.g. a newer client can interface with an older API.

The new JSON properties ("configuredDatapathMode", "configured-datapath-mode")
carry the configured datapath mode. At the time of writing, these will
carry identical values. However, in a future commit, an "auto" mode will
be added as a valid option for the configured datapath mode only.
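
A client reading these properties might decode them as in the sketch below. The JSON property names are taken from this commit message; the struct name, the helper name, and the fallback for responses that lack the new property are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// datapathStatus is a hypothetical client-side view of the two properties.
type datapathStatus struct {
	DatapathMode           string `json:"datapathMode"`           // operational mode
	ConfiguredDatapathMode string `json:"configuredDatapathMode"` // configured mode
}

// parseDatapathStatus decodes the properties and, as an assumed client
// policy, treats a missing configured mode as equal to the operational one.
func parseDatapathStatus(raw []byte) (datapathStatus, error) {
	var s datapathStatus
	if err := json.Unmarshal(raw, &s); err != nil {
		return datapathStatus{}, err
	}
	if s.ConfiguredDatapathMode == "" {
		s.ConfiguredDatapathMode = s.DatapathMode
	}
	return s, nil
}

func main() {
	s, err := parseDatapathStatus([]byte(`{"datapathMode":"netkit","configuredDatapathMode":"auto"}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("operational=%s configured=%s\n", s.DatapathMode, s.ConfiguredDatapathMode)
}
```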

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Expands the Cilium status output to correctly differentiate between
configured and operational datapath modes. This will alter the output
when a future commit introduces support for datapathMode=auto so the
modes are visible to administrators.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Introduces the start of a streamlined datapath link creation mechanism
for use when creating new endpoints within Cilium.

The mechanism is simply a NewLinkPair() method of ConnectorConfig, which
accepts a LinkConfig and associated sysctls. The aspiration is that other
packages can remain ignorant of the operational datapath mode, and over
time, support mixed-mode datapaths for migration purposes (e.g. veth
to netkit).

The LinkConfig structure is repositioned so it can be imported by other
components via the standard datapath types package. It is also extended
with other fields, such as EndpointID, HostIfName, PeerIfName, and
PeerNamespace. These values alter the behaviour of the underlying
implementation.

- If EndpointID is not specified, a HostIfName and PeerIfName must be.

- If EndpointID is specified, Cilium will auto-generate HostIfName and
  PeerIfName.

- The peer link will automatically be switched into the NetNS
  provided by PeerNamespace.
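
The naming rules above can be sketched as a small resolver. The field names EndpointID, HostIfName and PeerIfName come from this commit message; the reduced struct, the function resolveIfNames, and the hash-based naming scheme are hypothetical and are not claimed to match Cilium's actual name generation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// LinkConfig here is a reduced, hypothetical version of the structure
// described above; only the three naming-related fields are modelled.
type LinkConfig struct {
	EndpointID string
	HostIfName string
	PeerIfName string
}

// resolveIfNames applies the rules from the list above. The naming scheme
// (prefix plus 10 hex characters of a SHA-256 digest) is an illustrative
// assumption that keeps names within the 15-character interface name limit.
func resolveIfNames(cfg LinkConfig) (host, peer string, err error) {
	if cfg.EndpointID != "" {
		sum := sha256.Sum256([]byte(cfg.EndpointID))
		suffix := hex.EncodeToString(sum[:])[:10]
		return "lxc" + suffix, "tmp" + suffix, nil
	}
	if cfg.HostIfName == "" || cfg.PeerIfName == "" {
		return "", "", errors.New("either EndpointID or both HostIfName and PeerIfName must be set")
	}
	return cfg.HostIfName, cfg.PeerIfName, nil
}

func main() {
	host, peer, err := resolveIfNames(LinkConfig{HostIfName: "lxc_health", PeerIfName: "cilium"})
	fmt.Println(host, peer, err)
}
```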

This commit also adds a new LinkPair type, which is returned by the
NewLinkPair() method, and provides some ancillary helpers.

Finally, this also provides a DeleteLinkPair() function which operates
on the same LinkConfig structure. This is provided for consistency
with the creation routine, to abstract translation of EndpointID.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit migrates the Cilium Health Manager to utilise the datapath
connector.NewLinkPair() method, rather than calling specific driver
functions directly.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit migrates the Cilium CNI plugin to use the new datapath
connector NewLinkPair() function, rather than calling specific driver
functions directly.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
…ector.

This commit migrates the Cilium docker plugin to use the new datapath
connector NewLinkPair() function, rather than calling specific driver
functions directly.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
…fig.

This commit introduces a new DatapathIsNetkit field into the local node
config structure, which is set by the ConnectorConfig when instantiating
a new instance of the structure.

This commit also migrates the loader logic to carry the value of this
field into the loader bpf config structures, rather than using the
daemon config.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit migrates the endpoint restore logic to use the new datapath
connector logic to validate detected endpoints are compatible with the
current operational datapath mode of the agent.

Previous logic assumed that if the underlying link of an endpoint was
netkit, that we are compatible. However, it's not possible to derive
the mode of the driver from this.

This commit introduces logic in the connector that provides necessary
compatibility checks while also probing the netkit structure returned
by the kernel to verify mode.

This commit also tweaks the error log raised if incompatible endpoints
are detected, to include a list of "detected" incompatible modes. While
it's probably safe to assume the agent will never detect more than 1 type
of incompatible link, this was modified so failures in this assumption
(e.g. bugs) won't produce incorrect logs.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit introduces a new automatic datapath mode. If enabled, the
connector will probe the host kernel for netkit support when creating
a new instance of the ConnectorConfig.

If the configured mode is auto and netkit support is found, the
operational mode will be set to netkit. Otherwise, it will fall
back to veth.

The difference is visible through cilium-dbg:

  $ cilium-dbg status | grep Mode
  Attach Mode:             TCX
  Device Mode:             netkit [Configured: auto]

And REST API:

  $ curl -s --unix-socket /var/run/cilium/cilium.sock http://localhost/v1/config | jq '.status | {datapathMode, configuredDatapathMode}'
  {
    "datapathMode": "netkit",
    "configuredDatapathMode": "auto"
  }

This does not affect the behaviour of hard-coding the datapath-mode
to either netkit or netkit-l2. If either mode is manually configured,
and the netkit probe fails, cilium will fail to start.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit updates the feature metrics logic so we correctly report the
configured vs. operational datapath mode.

For example, when configured in automatic mode and the connector detects
netkit support, metrics will now show this as:

  $ cilium-dbg metrics list | grep datapath_config
  cilium_feature_datapath_config                                           configured_mode=auto operational_mode=netkit

This commit also updates all other feature metric tests to explicitly set
datapathMode=veth in the DaemonConfig structure, to avoid complications
with the first entry now being "auto".

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
@ajmmm ajmmm force-pushed the pr/datapath-mode-auto branch from 1451058 to e491eed Compare January 20, 2026 11:07

ajmmm commented Jan 20, 2026

/test

@borkmann borkmann left a comment

lgtm, great work! one small nit but could also be follow-up is to document this option in https://github.com/cilium/cilium/blob/main/Documentation/operations/performance/tuning.rst#netkit-device-mode


ajmmm commented Jan 21, 2026

/ci-ipsec-e2e

@julianwiedmann

> lgtm, great work! one small nit but could also be follow-up is to document this option in https://github.com/cilium/cilium/blob/main/Documentation/operations/performance/tuning.rst#netkit-device-mode

I'd say let's merge this big chunk, and address docs in a follow-up PR.

@julianwiedmann julianwiedmann added this pull request to the merge queue Jan 22, 2026
Merged via the queue into cilium:main with commit ffc6d5c Jan 22, 2026
75 of 76 checks passed
@ajmmm ajmmm deleted the pr/datapath-mode-auto branch January 22, 2026 09:54

Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • feature/netkit
  • release-note/minor: This PR changes functionality that users may find relevant to operating Cilium.
