tests: remove identity manager from ignored error messages #42982
pchaigno merged 1 commit into cilium:main
|
Commit f07293f does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
|
/test |
|
/ci-integration |
christarazi
left a comment
LGTM. I added to the PR desc. that this PR fixes #16419.
|
/test |
f07293f to 77582d4
|
/test |
|
Wow, this triggered in the downgrade to v1.18.4 test (here)! I'm pretty sure this is triggering the issue fixed in #42662 and #42661! I guess we then either wait for a new release to merge this PR, or we start backporting the fixes. cc @pchaigno for opening the original issue here. |
|
@odinuge Nice! I've marked the mentioned PRs for backport to v1.18 since that still falls under the backporting criteria. |
|
Both PRs are now backported. Once the new v1.18.5 release is built, I'll rebase this and mark ready-for-review. |
|
Why do we need to wait for a v1.18 release? AFAIK, the up/downgrade tests always test the latest branches (v1.18 <> main), not the releases. |
|
Ahh, interesting. I based it off the log lines stating: |
|
I believe that's just the CLI trying to detect the last version to know what tests can be executed. |
77582d4 to 03925db
|
Ahh, interesting! I've rebased and fixed the conflict now, so let's see! |
|
/test |
Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Odin Ugedal <ougedal@palantir.com>
03925db to 26a5df9
|
/test |
|
/ci-ipsec-e2e |
|
/ci-gateway-api |
|
/ci-ginkgo |
|
^ all looked like other flakes, so I'll try a rerun |
|
/ci-clustermesh |
|
Looks like @pchaigno is correct that the tests run the agent code from the latest commit on each release branch, so this looks g2g now. |
|
Thanks! |
…0 ) (#584)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [aqua:cilium/cilium-cli](https://redirect.github.com/cilium/cilium-cli) | minor | `0.18.9` → `0.19.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>cilium/cilium-cli (aqua:cilium/cilium-cli)</summary>

### [`v0.19.0`](https://redirect.github.com/cilium/cilium-cli/releases/tag/v0.19.0)

[Compare Source](https://redirect.github.com/cilium/cilium-cli/compare/v0.18.9...v0.19.0)

## Summary of Changes

**CI Changes:**

- Add L7 policy traffic disruption tests ([cilium/cilium#42150](https://redirect.github.com/cilium/cilium/issues/42150), [@fristonio](https://redirect.github.com/fristonio))
- Cilium-cli SNI connectivity tests now retry expected successful operations to recover from failures due to external upstream issues. ([cilium/cilium#42980](https://redirect.github.com/cilium/cilium/issues/42980), [@jrajahalme](https://redirect.github.com/jrajahalme))
- cli: connectivity: fix typo in L7 LB tests ([cilium/cilium#43610](https://redirect.github.com/cilium/cilium/issues/43610), [@julianwiedmann](https://redirect.github.com/julianwiedmann))
- Fix intermittent NodePort connectivity test timeouts in dual-stack clusters by validating NodePort readiness on all node IP addresses during test setup. ([cilium/cilium#40812](https://redirect.github.com/cilium/cilium/issues/40812), [@pillai-ashwin](https://redirect.github.com/pillai-ashwin))
- tests: remove identity manager from ignored error messages ([cilium/cilium#42982](https://redirect.github.com/cilium/cilium/issues/42982), [@odinuge](https://redirect.github.com/odinuge))

**Misc Changes:**

- chore(deps): update all-dependencies (main) ([cilium/cilium#43169](https://redirect.github.com/cilium/cilium/issues/43169), [@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update all-dependencies (main) ([cilium/cilium#43456](https://redirect.github.com/cilium/cilium/issues/43456), [@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update all-dependencies (main) ([cilium/cilium#43508](https://redirect.github.com/cilium/cilium/issues/43508), [@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update base-images (main) ([cilium/cilium#43457](https://redirect.github.com/cilium/cilium/issues/43457), [@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update base-images (main) ([cilium/cilium#43538](https://redirect.github.com/cilium/cilium/issues/43538), [@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to [`a22b2e6`](https://redirect.github.com/cilium/cilium-cli/commit/a22b2e6) (main) ([cilium/cilium#43303](https://redirect.github.com/cilium/cilium/issues/43303), [@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update go to v1.25.5 (main) ([cilium/cilium#43173](https://redirect.github.com/cilium/cilium/issues/43173), [@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- cilium-cli/connectivity: remove matcher for bpf/init.sh errors ([cilium/cilium#43109](https://redirect.github.com/cilium/cilium/issues/43109), [@tklauser](https://redirect.github.com/tklauser))
- cilium-cli: convert net.IP to netip.Addr ([cilium/cilium#42371](https://redirect.github.com/cilium/cilium/issues/42371), [@phuhung273](https://redirect.github.com/phuhung273))
- cli: Update `network-perf` image ref ([cilium/cilium#43297](https://redirect.github.com/cilium/cilium/issues/43297), [@HadrienPatte](https://redirect.github.com/HadrienPatte))
- chore(deps): update golangci/golangci-lint-action action to v9.2.0 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3148](https://redirect.github.com/cilium/cilium-cli/pull/3148)
- Update stable release to v0.18.9 by [@michi-covalent](https://redirect.github.com/michi-covalent) in [#3149](https://redirect.github.com/cilium/cilium-cli/pull/3149)
- chore(deps): update golangci/golangci-lint docker tag to v2.7.0 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3151](https://redirect.github.com/cilium/cilium-cli/pull/3151)
- chore(deps): update go to v1.25.5 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3153](https://redirect.github.com/cilium/cilium-cli/pull/3153)
- ci: clean up disk space in release workflow by [@tklauser](https://redirect.github.com/tklauser) in [#3154](https://redirect.github.com/cilium/cilium-cli/pull/3154)
- chore(deps): update actions/stale action to v10.1.1 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3150](https://redirect.github.com/cilium/cilium-cli/pull/3150)
- chore(deps): update gcr.io/distroless/static:latest docker digest to [`4b2a093`](https://redirect.github.com/cilium/cilium-cli/commit/4b2a093) by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3152](https://redirect.github.com/cilium/cilium-cli/pull/3152)
- chore(deps): update golangci/golangci-lint docker tag to v2.7.2 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3155](https://redirect.github.com/cilium/cilium-cli/pull/3155)
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to [`a22b2e6`](https://redirect.github.com/cilium/cilium-cli/commit/a22b2e6) by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3156](https://redirect.github.com/cilium/cilium-cli/pull/3156)
- chore(deps): update actions/upload-artifact action to v6 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3157](https://redirect.github.com/cilium/cilium-cli/pull/3157)
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to [`36b4f45`](https://redirect.github.com/cilium/cilium-cli/commit/36b4f45) by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3160](https://redirect.github.com/cilium/cilium-cli/pull/3160)
- chore(deps): update dependency cilium/cilium to v1.18.5 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3159](https://redirect.github.com/cilium/cilium-cli/pull/3159)
- chore(deps): update dependency kubernetes-sigs/kind to v0.31.0 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3158](https://redirect.github.com/cilium/cilium-cli/pull/3158)
- chore(deps): update docker/setup-buildx-action action to v3.12.0 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3162](https://redirect.github.com/cilium/cilium-cli/pull/3162)
- chore(deps): update golangci/golangci-lint docker tag to v2.8.0 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3163](https://redirect.github.com/cilium/cilium-cli/pull/3163)
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to [`6cc2338`](https://redirect.github.com/cilium/cilium-cli/commit/6cc2338) by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3164](https://redirect.github.com/cilium/cilium-cli/pull/3164)
- chore(deps): update gcr.io/distroless/static:latest docker digest to [`cd64bec`](https://redirect.github.com/cilium/cilium-cli/commit/cd64bec) by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3165](https://redirect.github.com/cilium/cilium-cli/pull/3165)
- chore(deps): update actions/setup-go action to v6.2.0 by [@renovate](https://redirect.github.com/renovate)\[bot] in [#3166](https://redirect.github.com/cilium/cilium-cli/pull/3166)
- Prepare for v0.19.0 release by [@tklauser](https://redirect.github.com/tklauser) in [#3167](https://redirect.github.com/cilium/cilium-cli/pull/3167)

**Full Changelog**: <cilium/cilium-cli@v0.18.9...v0.19.0>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://redirect.github.com/renovatebot/renovate).

Co-authored-by: zocimek-renovate[bot] <134739422+zocimek-renovate[bot]@users.noreply.github.com>
Due to cilium#42661 and cilium#42662 not being backported yet to v1.17, CI fails in the upgrade/downgrade test with this error. Therefore, we must add it to the ignore list until the PRs are at least backported to v1.17. The error was removed from the ignore list in cilium#42982. Suggested-by: Marco Iorio <marco.iorio@isovalent.com> Suggested-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
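A minimal sketch of how an ignore list for error-level log lines could behave, to illustrate what adding or removing an entry changes. The names `ignoredErrors` and `unexpectedErrors` and the sample messages are hypothetical and do not reproduce the cilium-cli implementation or its actual entries.

```go
package main

import (
	"fmt"
	"strings"
)

// ignoredErrors is a hypothetical ignore list. The real entries live in
// cilium-cli and are not reproduced here.
var ignoredErrors = []string{
	"example known-benign error", // placeholder entry
}

// unexpectedErrors returns the error-level lines that match no entry of the
// ignore list. Removing an entry makes matching lines fail the check again.
func unexpectedErrors(lines, ignored []string) []string {
	var out []string
	for _, line := range lines {
		if !strings.Contains(line, "level=error") {
			continue
		}
		skip := false
		for _, pattern := range ignored {
			if strings.Contains(line, pattern) {
				skip = true
				break
			}
		}
		if !skip {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	logs := []string{
		`level=info msg="all good"`,
		`level=error msg="example known-benign error"`,
		`level=error msg="something else broke"`,
	}
	for _, line := range unexpectedErrors(logs, ignoredErrors) {
		fmt.Println("unexpected:", line)
	}
}
```

With the placeholder entry present only the last line is reported; with an empty list both error lines are reported.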
* ci: Extend test timeout for ci-verifier
Verifier tests occasionally take a bit over 20m, so extend
the timeout to 25m.
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
* ci: e2e: enhance readability of workflow job name
This commit updates the name of the "Setup & Test" job in the
GitHub Actions workflow for e2e upgrade tests to include only the matrix
parameters "name" and "mode". This change improves the readability
of the workflow runs by providing more context about the specific
configuration being tested.
Prior to this, the name of each job contained the whole matrix combination,
which ended up cut off and unreadable in the UI. Given that we now
use the same workflow file for running both `minor` and `patch` upgrades,
let's make the displayed name simpler.
The result will be `Setup & Test (ipsec-1, minor)`.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* ci: e2e: log matrix configuration in each job
This commit adds, as the first step of the `Setup & Test` job for e2e-upgrade,
a simple step to dump the current matrix configuration being tested.
The previous commit modified the title to display only the matrix entry
name and mode (e.g., `Setup & Test (ipsec-1, minor)`) rather than the
whole configuration, which would be truncated in the UI anyway.
It is true that, given the matrix.name (e.g., ipsec-1), a user can open the
specific file and look up the required configuration, but I think that
having a step that dumps it would speed up and ease debugging in CI.
The output would be similar to:
```
> Log Matrix Configuration
Current matrix configuration:
{
  "name": "wireguard-1",
  "kernel": "5.10",
  "kube-proxy": "iptables",
  "kpr": "true",
  "devices": "{eth0,eth1}",
  "secondary-network": "true",
  "tunnel": "vxlan",
  "encryption": "wireguard",
  "encryption-node": "false",
  "lb-mode": "snat",
  "endpoint-routes": "true",
  "egress-gateway": "true",
  "ingress-controller": "true",
  "mode": "minor"
}
```
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* endpoint: Log labels as structured JSON objects
Standardize logging in pkg/endpoint so that identityLabels and related fields are logged as structured JSON objects instead of comma-separated strings by explicitly casting labels.Labels to map[string]labels.Label.
Signed-off-by: Jie WU <wujie@google.com>
```release-note
endpoint: Log labels as structured JSON objects
```
* feat(helm): hubble-ui containers set to pss-restricted
This sets the hubble-ui pods/containers to match k8s
pss-restricted profile along with the optional
`readOnlyRootFilesystem: true`
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
* helm: allow multicluster-services installCRDs to update CRDs
Previously, cilium-operator failed to start if MCS/installCRDs was enabled
because it did not have permission to update the CRD, logging this
message:
level=error msg="Unable to update CRD"
module=operator.operator-controlplane.leader-lifecycle.create-crds
name=serviceimports.multicluster.x-k8s.io
error="customresourcedefinitions.apiextensions.k8s.io
\"serviceimports.multicluster.x-k8s.io\" is forbidden: User
\"system:serviceaccount:kube-system:cilium-operator\" cannot update
resource \"customresourcedefinitions\" in API group
\"apiextensions.k8s.io\" at the cluster scope"
This patch adds the necessary permissions to cilium-operator when
mcs/installCRDs is enabled.
Fixes: #44210
Fixes: 3874013329d0 ("clustermesh: add config for auto installing
MCS-API CRDs")
Signed-off-by: Florian Ströger <stroeger@youniqx.com>
* policy: (mechanical) refactor out flow lookup types
A subsequent commit will include an alternate policy iteration system,
so it will be nice to move the types to policy/types.
This also removes the now-useless Decision type, as it's not used
anywhere in the codebase.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
* policy: add a simple iterative policy simulator
This is a simple userspace tool that executes rules step-by-step. Its
purpose will be to validate more complex policy scenarios, ideally by
fuzzing.
To ensure its output matches that of the existing policy engine, it
matches the LookupFlow method signature, and existing tests validate
that the simulation engine returns the same verdict.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
* Add fuzz-based policy testing
This generates random policy corpuses and compares MapState-based policy
calculation with the iterative simulator.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
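The comparison described above is a form of differential testing. A toy sketch, assuming a heavily simplified rule model (one identity per rule, deny wins, no match means deny); none of these types or functions are Cilium's, and a seeded `math/rand` generator stands in for the fuzzer:

```go
package main

import (
	"fmt"
	"math/rand"
)

// rule is a hypothetical, heavily simplified policy rule.
type rule struct {
	id   int  // peer identity the rule selects
	deny bool // deny rules win over allow rules
}

// simulate walks the rules one by one: any matching deny rejects,
// otherwise a matching allow accepts, and no match means reject.
func simulate(rules []rule, id int) bool {
	allowed := false
	for _, r := range rules {
		if r.id != id {
			continue
		}
		if r.deny {
			return false
		}
		allowed = true
	}
	return allowed
}

// precompute folds the rules into a per-identity verdict table, standing in
// for a precomputed map-based policy representation.
func precompute(rules []rule) map[int]bool {
	table := map[int]bool{}
	for _, r := range rules {
		if r.deny {
			table[r.id] = false
		} else if _, seen := table[r.id]; !seen {
			table[r.id] = true
		}
	}
	return table
}

// compareRandom generates random rule sets and counts verdict mismatches
// between the two implementations.
func compareRandom(seed int64, rounds int) int {
	rng := rand.New(rand.NewSource(seed))
	mismatches := 0
	for i := 0; i < rounds; i++ {
		rules := make([]rule, rng.Intn(8))
		for j := range rules {
			rules[j] = rule{id: rng.Intn(4), deny: rng.Intn(2) == 0}
		}
		table := precompute(rules)
		for id := 0; id < 4; id++ {
			if simulate(rules, id) != table[id] {
				mismatches++
			}
		}
	}
	return mismatches
}

func main() {
	fmt.Println("mismatches:", compareRandom(1, 1000))
}
```

Any non-zero mismatch count points at a rule set on which the two implementations disagree, which is the kind of case worth committing as a regression input.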
* policy: Fix fuzz testing
Avoid using *testing.F for the logger as then any log within the fuzz
test would fail.
Fix the order of expected and actual for require.Equal.
Add more debugging.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Hide precedence details better
Hide precedence details from the policymap package.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add optional indexing by identity
Add optional mapState indexing by identity to support incremental removal
of generated keys. This is only needed for deletion pass entries, so the
index is only used if the policy has pass verdicts.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add default deny rule when pass verdict is used
Proper processing of pass verdicts requires the default deny rule to be
explicitly added to the mapstate so that it can be seen by pass verdict
entries.
The default rule is added to the next tier if any non-default tiers or
priorities are in use, or if the traffic direction has any pass
rules. This way the pass rule can pass to the added default deny (or
allow) rule.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Fix L7Filter precedence handling for pass verdicts
Deny takes precedence over allow and pass, allow takes precedence over
pass. Define new HasPrecedenceOver() to handle this instead of using just
IsDeny() like before. Would be simpler if Allow was not the zero value,
but changing that would require changing all unit testing code that uses
it as the default.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
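The precedence order above can be sketched as follows. The `Verdict` type and its constants are hypothetical; only the name `HasPrecedenceOver` and the ordering (deny over allow over pass, with allow as the zero value) come from the commit message, and the signature is assumed.

```go
package main

import "fmt"

// Verdict is a hypothetical stand-in for the verdict carried by a filter.
type Verdict uint8

const (
	Allow Verdict = iota // zero value, as noted in the commit message
	Deny
	Pass
)

// rank maps a verdict to its precedence: deny > allow > pass.
// A separate rank is needed because Allow is the zero value, so the
// numeric order of the constants does not match the precedence order.
func (v Verdict) rank() int {
	switch v {
	case Deny:
		return 2
	case Allow:
		return 1
	default:
		return 0
	}
}

// HasPrecedenceOver reports whether v strictly wins over other.
func (v Verdict) HasPrecedenceOver(other Verdict) bool {
	return v.rank() > other.rank()
}

func main() {
	fmt.Println(Deny.HasPrecedenceOver(Allow))  // true
	fmt.Println(Allow.HasPrecedenceOver(Pass))  // true
	fmt.Println(Pass.HasPrecedenceOver(Allow))  // false
	fmt.Println(Allow.HasPrecedenceOver(Allow)) // false
}
```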
* policy: Fix per-tier priority range allocation
Fix tier base priority calculation. When figuring out the priority range
for each tier, the full range of the remaining tiers must be included to
add enough space for pass verdicts on higher tiers. Then, when setting
the base priority of each tier, this has to be reversed.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add Fuzz cases
Commit fuzzer cases found during development.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add generated L3/4 entries, multipass support
A pass of a specific identity to a lower tier rule with wildcard identity
should pass the given identity only and keep the wildcard entry at the
original precedence to take care of traffic with other identities. Since
the original entry needs to be kept, a new generated entry with the
identity from the pass entry and the L4 from the passed to entry must be
added.
We missed this case earlier due to BroaderOrEqualKeys only iterating
wildcard identity entries when the new key is a wildcard entry. Entries
that have a broader or equal L4 but more specific L3 are not as a whole
"broader or equal". To handle the need for generated entries for the pass
verdict processing "BroaderOrEqualKeys" is changed to also iterate all
specific L3 keys if the L4 is broader or equal and the given key has the
wildcard identity. The old behavior is retained with
CoveringBroaderOrEqualKeys(). Similarly, NarrowerOrEqualKeys() is renamed
as CoveringNarrowerOrEqualKeys() while NarrowerOrEqualKeys() now also
iterates keys with the wildcard identity when the given key has a
specific identity.
The addition of generated entries requires these entries to be deleted
when that identity is incrementally deleted. Since selector cache is
transactional we can delete all keys with the deleted identity, when the
first key with that identity is deleted. To make this efficient we use
the new id index.
To add support for pass verdicts at multiple tiers, the pass metadata is
now stored as a slice. Overhead to non-pass entries is reduced by storing
the slice via a pointer ('passes'), as most mapStateEntries would not
have any pass metadata.
If 'passes' is non-nil, then the pointed-to slice must have at least
one element, and all elements must have non-zero 'passPrecedence'.
When merging pass metadata we clone the slice to be mutated so that the
same slice can safely be used in multiple entries.
Split insertWithPasses() from insertWithChanges(); insertWithPasses() is
only called if the policy has any pass verdicts. This reduces the
chance of regressions for non-pass policies.
Log a warning if a policy with pass verdicts is also using auth
requirements, as this combination has not been implemented. Adjust a test
to not claim all features when that is not the case.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
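A sketch of the storage scheme described above: pass metadata kept behind a pointer to a slice, nil for entries without passes, and cloned before mutation so a shared slice is never modified in place. The field names `passes` and `passPrecedence` come from the commit message; the `entry` type and `addPass` helper are hypothetical.

```go
package main

import "fmt"

// passMeta is a hypothetical pass metadata record.
type passMeta struct {
	passPrecedence uint32 // must be non-zero for stored elements
}

// entry stands in for a map state entry. Non-pass entries pay only for a
// nil pointer instead of a full slice header.
type entry struct {
	passes *[]passMeta
}

// addPass merges one pass record into the entry. The existing slice may be
// shared with other entries, so it is cloned before being extended.
func (e *entry) addPass(p passMeta) {
	if p.passPrecedence == 0 {
		return // keep the invariant: stored elements have non-zero precedence
	}
	if e.passes == nil {
		s := []passMeta{p}
		e.passes = &s
		return
	}
	old := *e.passes
	s := make([]passMeta, len(old), len(old)+1)
	copy(s, old)
	s = append(s, p)
	e.passes = &s
}

func main() {
	var a entry
	a.addPass(passMeta{passPrecedence: 10})

	b := entry{passes: a.passes} // b shares a's slice
	b.addPass(passMeta{passPrecedence: 20})

	fmt.Println(len(*a.passes), len(*b.passes)) // 1 2
}
```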
* pkg/subnet: Fix tag in config subnets field
The Subnets field in the config declared a json tag, leading to a
failure of the agent `hive` command (see below). This is because
hive relies on a mapstructure decoder, not a JSON one, and
therefore requires a mapstructure tag when the config field name is not
equal to the flag name.
Fix the field by using a mapstructure tag instead.
```
make -C daemon/ && ./daemon/cilium-agent hive
...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xacd46e]
goroutine 1 [running]:
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000b540f0, 0xa?, 0xc000cba750)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x12e
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a9bdd0, 0x6?, 0xc000cba750)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a799b0, 0x2?, 0xc000cba750)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive.(*Hive).PrintObjects(0xc0009554a0, {0x51e1240, 0xc0000c0030}, 0x0?)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/hive.go:459 +0x18f
github.com/cilium/hive.(*Hive).Command.func1(0xc000e0fc00?, {0x4b05b96?, 0x4?, 0x4b05aca?})
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/command.go:21 +0x2d
github.com/spf13/cobra.(*Command).execute(0xc000953508, {0x86bac20, 0x0, 0x0})
/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1015 +0xb02
github.com/spf13/cobra.(*Command).ExecuteC(0xc000952f08)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1148 +0x465
github.com/spf13/cobra.(*Command).Execute(...)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1071
github.com/cilium/cilium/daemon/cmd.Execute(0x4d769d0?)
/home/ffalzoi/cilium/cilium-2/daemon/cmd/root.go:89 +0x13
main.main()
/home/ffalzoi/cilium/cilium-2/daemon/main.go:15 +0x1f
```
Fixes: d395d73ad3 ("pkg/subnet: Add subnet config watcher and manager")
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
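To illustrate why the tag name matters, here is a toy lookup built on `reflect` that picks a field's key the way a tag-driven decoder might: it honours only the tag it is asked for and otherwise falls back to the field name. This is not the mapstructure library, and the flag name `example-subnets` and both struct types are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// badConfig carries a json tag, which a decoder looking for
// "mapstructure" tags never sees.
type badConfig struct {
	Subnets []string `json:"example-subnets"`
}

// goodConfig carries the tag the decoder actually reads.
type goodConfig struct {
	Subnets []string `mapstructure:"example-subnets"`
}

// fieldKey returns the key a decoder reading the given tag would use for
// the named field: the tag value if present, else the Go field name.
func fieldKey(cfg any, field, tag string) string {
	f, ok := reflect.TypeOf(cfg).FieldByName(field)
	if !ok {
		return ""
	}
	if name := f.Tag.Get(tag); name != "" {
		return name
	}
	return f.Name
}

func main() {
	fmt.Println(fieldKey(badConfig{}, "Subnets", "mapstructure"))  // Subnets
	fmt.Println(fieldKey(goodConfig{}, "Subnets", "mapstructure")) // example-subnets
}
```

With the json tag, the decoder falls back to the field name, which does not match the flag, so the field is never associated with its flag.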
* linux-desired-device: introducing Cilium managed devices reconciler
Add a new reconciler in Cilium datapath/linux that manages the
life cycle of Linux links created by Cilium.
Created devices are persisted on disk using a write-ahead log. Upon
restart, the owners of the devices are expected to redo the
configuration before calling the finalizer. Stale devices are then pruned.
Implementation is inspired by linux/route/reconciler.
Signed-off-by: harsimran pabla <hpabla@isovalent.com>
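The restart flow described above can be modelled in memory as follows. This is only a sketch: the write-ahead log is replaced by a slice of recovered names, no netlink calls are made, and `reconciler`, `Upsert`, and `Finalize` are hypothetical names rather than the API added by the commit.

```go
package main

import (
	"fmt"
	"sort"
)

// reconciler is a hypothetical in-memory model of the desired-device state.
type reconciler struct {
	persisted map[string]bool // device names recovered from disk after restart
	desired   map[string]bool // device names re-declared by their owners
}

// newReconciler seeds the model with the names recovered from the log.
func newReconciler(recovered []string) *reconciler {
	r := &reconciler{persisted: map[string]bool{}, desired: map[string]bool{}}
	for _, name := range recovered {
		r.persisted[name] = true
	}
	return r
}

// Upsert records that an owner still wants the device.
func (r *reconciler) Upsert(name string) {
	r.desired[name] = true
	r.persisted[name] = true
}

// Finalize prunes every persisted device that no owner re-declared and
// returns the pruned names in sorted order.
func (r *reconciler) Finalize() []string {
	var pruned []string
	for name := range r.persisted {
		if !r.desired[name] {
			pruned = append(pruned, name)
			delete(r.persisted, name)
		}
	}
	sort.Strings(pruned)
	return pruned
}

func main() {
	r := newReconciler([]string{"dev-a", "dev-b"})
	r.Upsert("dev-a") // the owner of dev-a redoes its configuration
	fmt.Println(r.Finalize())
}
```

Running it reports `dev-b` as stale, since no owner re-declared it before finalization.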
* linux-desired-device: script tests for desired-devices
Adding script tests to validate device creation and persistence.
Signed-off-by: harsimran pabla <hpabla@isovalent.com>
* linux-desired-device: hook desired-devices cell into main infra
Signed-off-by: harsimran pabla <hpabla@isovalent.com>
* fix: helm intervalSeconds value render bug
intervalSeconds is always of an integral type, no need to check kindIs float64
Fixes: #44206
Signed-off-by: jayl1e <jayl1e@outlook.com>
* bpf: source tuple hash seeds from node config
Move IPv4/IPv6 hash init seeds into node config and wire them from
Maglev config. BPF tuple hashing now reads CONFIG(hash_init{4,6}_seed)
instead of compile-time defines, and the legacy HASH_INIT* defines are
removed from the header writer and node_config.h.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* feat(install) Allow hubble to run with hostUsers: false
The hubble components do not require direct mapping of container
users to system users.
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
* docs: fix typos in comments
Signed-off-by: Yohei Yamamoto <yhymmt123@gmail.com>
* bpf: correct comments in cil_from_netdev function
Removed conditions from the comment block describing the cil_from_netdev function, since the logic has changed.
Signed-off-by: Liyi Huang <liyi.huang@isovalent.com>
* chore(deps): update all lvh-images main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all-dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* bpf: lb: Decouple DNAT operation from LB key
Instead of passing the lb4_key/lb6_key to lb4_xlate/lb6_xlate for
checksum calculation and port translation, pass the original destination
address and port directly from the CT tuple.
This change:
1. Removes the key parameter from lb4_xlate/lb6_xlate functions
2. Removes the key parameter from lb4_dnat_request/lb6_dnat_request
The CT tuple already contains the same values that were being read from
the key structure:
- tuple->daddr == key->address (original destination)
- tuple->sport == key->dport (reversed port order in CT tuple)
By removing the xlate path's dependency on key, we can now directly
modify key->address = 0 for wildcard lookups without creating a copy,
simplifying the backend selection logic.
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* bpf: lxc: Handle DSR for remote NodePort services on source node
When DSR and PER_PACKET_LB are enabled, connections fail if a client pod
sends a request to a remote node's NodePort service while the server
pod is located on the same node as the client.
The root cause is that the remote node only performs DNAT, setting the
packet's source address to client's node address, leading to a hairpin
problem. Consequently, the originating node cannot perform the necessary
REV NAT for the reply packets.
To resolve this, remote NodePort service requests are now handled on the
source node when DSR is enabled, similar to the behavior of socket-level
load balancing.
Implementation details:
- Add lb4/6_lookup_wildcard_nodeport_service(): When ENABLE_DSR is
defined and the regular service lookup fails, check if the destination
is a remote node's IP with a NodePort port range. If so, perform a
wildcard lookup (address=0) to find the NodePort service.
- Use wildcard key for backend selection: When dsr_internal flag is set,
set key->address to 0 before calling lb4/6_select_backend_id(). This
applies to both CT_NEW (new connections) and CT_REPLY (backend
re-selection for existing connections). This is needed for backend
selection algorithms that use slot lookup (e.g., Random), which look
up backend slots via lb4/6_lookup_backend_slot() using the service
key. Without a wildcard key, the lookup would fail because backend
slot entries are stored with the wildcard service key, not with the
remote node's IP.
- Store original destination in CT entry: The original destination
address and port (remote node IP and NodePort) are stored in
ct_state_new.nat_addr/nat_port, which will be written to the CT entry
for use in reply path RevNAT processing.
- Use cilium_dsr_nat_buffer per-CPU map: The NAT info is detected in
__per_packet_lb_svc_xlate_4/6(), but the CT entry is created after DNAT
when the original destination info is no longer available in the packet.
The per-CPU buffer preserves this info across the DNAT operation.
Existing connection handling:
- This change only affects DSR traffic destined to remote node's
NodePort. The wildcard lookup is triggered when lb4/6_lookup_service()
fails, but it only processes packets where the destination is a remote
node IP with a port in the NodePort range. Other traffic that fails
the regular lookup is unaffected.
Fixes: #41962
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* bpf: lxc: Add RevNAT support for DSR remote NodePort connections
Add reverse NAT support for reply packets of DSR remote NodePort
connections. The forward path stores the original destination address
and port in the CT entry's nat_addr/nat_port fields, which are now
used during reply processing.
Implementation details:
- Extend lb4/6_rev_nat() signature: Add nat_addr and nat_port parameters.
When nat_port is non-zero, use these values directly for RevNAT.
When nat_port is zero, fall back to the existing rev_nat_index lookup
for backward compatibility with existing connections.
- Modify ct_lookup_fill_state(): Copy nat_addr and nat_port from the
CT entry to ct_state, making them available for reply processing.
- Update ipv4/6_policy(): Check for nat_port in addition to
rev_nat_index when deciding whether to perform RevNAT. Pass the
CT entry's NAT information to lb4/6_rev_nat().
- Update nodeport_rev_dnat_ipv4/6(): Adapt to the new lb4/6_rev_nat()
signature by passing NULL/0 for nat_addr/nat_port (these paths use
the traditional rev_nat_index lookup).
Existing connection handling:
1. New connections (created after this patch):
- Forward path stores nat_addr/nat_port in CT entry
- Reply path uses CT entry's nat_addr/nat_port for RevNAT
2. Existing connections (regular NodePort/DSR traffic):
- CT entry has nat_addr=0, nat_port=0
- lb4/6_rev_nat checks nat_port first:
- If nat_port != 0: use nat_addr/nat_port directly
- If nat_port == 0: fall back to rev_nat_index lookup
- This ensures existing connections continue to work
3. Upgrade scenario:
- Existing connections keep working via rev_nat_index fallback
- New DSR remote nodeport connections use nat_addr/nat_port
- No connection disruption during rolling upgrade
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* bpf: tests: Add tests for DSR remote NodePort handling
Add BPF unit tests to verify the DSR remote NodePort functionality
for both IPv4 and IPv6.
Test scenarios for DSR mode (tc_lxc_lb4/6_dsr_nodeport.c):
1. Pod -> Remote NodePort -> Local backend (forward path)
- Client pod sends packet to remote node's NodePort
- LB selects a backend on the local node (same node as client)
- Verifies packet is DNATed to local backend IP and port
- Verifies CT entry contains correct nat_addr (remote node IP)
and nat_port (NodePort)
2. Local backend -> Pod (reply path)
- Backend sends reply packet to client
- Verifies RevNAT is applied correctly
- Source IP/port changed to remote node IP and NodePort
3. Pod -> Remote NodePort -> Self (hairpin)
- Client pod sends packet to remote node's NodePort
- LB selects the client pod itself as the backend
- Verifies DNAT to client IP and backend port
- Verifies SNAT to loopback IP for hairpin flow
4. Hairpin reply
- Pod replies to loopback IP
- Verifies RevNAT restores remote node IP and NodePort
5. Existing connection handling (UDP)
- First packet establishes CT entry via legacy path
- Second packet should use existing CT entry
- Verifies wildcard lookup is skipped for existing connections
Test scenarios for Hybrid mode (tc_lxc_lb4/6_hybrid_dsr_nodeport.c):
1. DSR service handling
- Verifies DSR-enabled service triggers wildcard lookup
- Packet is DNATed to local backend
2. SNAT service handling
- Verifies SNAT service does NOT trigger wildcard lookup
- Packet passes through without DNAT (handled by remote node)
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* chore(deps): update quay.io/cilium/cilium-envoy docker tag to v1.36.5-1770440937-14da6b9c8a54244f0a67cd90a0deb83e5f110a4a
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update module sigs.k8s.io/kube-api-linter to v0.0.0-20260206102632-39e3d06a2850
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update base-images
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* images: update cilium-{runtime,builder}
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* loadbalancer/maps: let SourceRangeKey.GetCIDR return netip.Prefix
Its only caller (apart from use in log strings) directly converts the
result to a netip.Prefix. Rather than first constructing a *cidr.CIDR
only to then convert it to netip.Prefix, construct a netip.Prefix right
away. Also rename the method accordingly.
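As a rough sketch of the idea, a key can build the `netip.Prefix` directly from its raw fields. The `sourceRangeKey4` type and its `Prefix` method are hypothetical simplifications, not the actual Cilium types, and masking the host bits is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/netip"
)

// sourceRangeKey4 is a hypothetical, simplified IPv4 source-range key:
// a prefix length plus the raw address bytes.
type sourceRangeKey4 struct {
	PrefixLen uint8
	Address   [4]byte
}

// Prefix builds the netip.Prefix straight from the key fields, without an
// intermediate CIDR object. Masked() zeroes any host bits.
func (k sourceRangeKey4) Prefix() netip.Prefix {
	return netip.PrefixFrom(netip.AddrFrom4(k.Address), int(k.PrefixLen)).Masked()
}

func main() {
	k := sourceRangeKey4{PrefixLen: 24, Address: [4]byte{10, 1, 2, 3}}
	fmt.Println(k.Prefix())
}
```

Because `netip.Prefix` is a small value type, returning it directly avoids allocating the intermediate pointer-based object.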
This slightly improves pkg/loadbalancer/benchmark results:
Before:
Memory statistics from N=10 iterations:
Min: Allocated 490378kB in total, 2471963 objects / 140742kB still reachable (per service: 49 objs, 10042B alloc, 2882B in-use)
Avg: Allocated 501584kB in total, 2957195 objects / 177049kB still reachable (per service: 59 objs, 10272B alloc, 3625B in-use)
Max: Allocated 543366kB in total, 3722287 objects / 227653kB still reachable (per service: 74 objs, 11128B alloc, 4662B in-use)
After:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service: 45 objs, 10018B alloc, 2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service: 57 objs, 10256B alloc, 3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service: 74 objs, 11121B alloc, 4660B in-use)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* loadbalancer/maps: let ServiceKey.GetAddress return netip.Addr
This simplifies call sites and avoids an unnecessary conversion in
the reconciler's (*BPFOps).pruneServiceMaps method.
pkg/loadbalancer/benchmark results:
Before:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service: 45 objs, 10018B alloc, 2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service: 57 objs, 10256B alloc, 3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service: 74 objs, 11121B alloc, 4660B in-use)
After:
Memory statistics from N=10 iterations:
Min: Allocated 487023kB in total, 1944085 objects / 99675kB still reachable (per service: 38 objs, 9974B alloc, 2041B in-use)
Avg: Allocated 498657kB in total, 2813561 objects / 166829kB still reachable (per service: 56 objs, 10212B alloc, 3416B in-use)
Max: Allocated 542794kB in total, 3722246 objects / 227646kB still reachable (per service: 74 objs, 11116B alloc, 4662B in-use)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* loadbalancer/maps: remove unused SkipLBMap delete methods
The DeleteLB{4,6}By* methods don't have callers anymore since
commit 6fa7f8129d1a ("loadbalancer/legacy: Remove the old
control-plane"). Remove them.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* fix(deps): update all go dependencies main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* docs: Update docsearch to v4.5.4
Pull in the latest theme with newer docsearch plugin version.
Signed-off-by: Joe Stringer <joe@cilium.io>
* ci: update docs-builder
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* Use binary.NativeEndian instead of nl.NativeEndian
Use the NativeEndian native-endian var provided by the Go standard
library encoding/binary package instead of the version from the
netlink/nl package.
While at it, also check the length of the handle returned by
unix.NameToHandleAt before accessing it.
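A minimal sketch of the pattern, assuming the handle payload starts with an 8-byte native-endian ID (that size, and the helper names `decodeHandleID` and `encodeHandleID`, are assumptions made for illustration). `binary.NativeEndian` is available in the standard library since Go 1.21.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// handleIDSize is the assumed number of bytes holding the ID.
const handleIDSize = 8

// decodeHandleID reads a native-endian uint64 from the start of a handle
// payload. It checks the length first instead of indexing blindly.
func decodeHandleID(b []byte) (uint64, error) {
	if len(b) < handleIDSize {
		return 0, fmt.Errorf("handle too short: got %d bytes, want at least %d", len(b), handleIDSize)
	}
	return binary.NativeEndian.Uint64(b), nil
}

// encodeHandleID is the inverse, used here only to build sample input.
func encodeHandleID(id uint64) []byte {
	b := make([]byte, handleIDSize)
	binary.NativeEndian.PutUint64(b, id)
	return b
}

func main() {
	id, err := decodeHandleID(encodeHandleID(42))
	fmt.Println(id, err)

	_, err = decodeHandleID([]byte{1, 2, 3})
	fmt.Println(err)
}
```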
Follow-up to commit 2c1b49cac70b ("byteorder: use binary.NativeEndian")
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* datapath: fix panic during datapath reinitialization
This commit fixes a cilium-agent panic during datapath reinitialization
when a DirectRouting device is required but not configured.
This can happen when the direct routing device drops, for example during
networkd restart.
```
time=2026-01-14T07:39:46.444386888Z level=info msg="Devices changed" module=agent.datapath.devices-controller devices=[]
time=2026-01-14T07:39:46.444654289Z level=info msg="Fallback node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.44474159Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.444833191Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="" device=eth0
panic: runtime error: index out of range [3] with length 0
goroutine 415 [running]:
github.com/cilium/cilium/pkg/byteorder.NetIPv4ToHost32({0x0?, 0xc000e9e5d0?, 0x49e07bb?})
/go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:15 +0x65
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0xc0004280e0, {0x7ff1517a6ba8, 0xc0023ec400}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:150 +0xa4b
github.com/cilium/cilium/pkg/datapath/loader.hashDatapath({0x50fbfb0, 0xc0004280e0}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/hash.go:20 +0x9e
github.com/cilium/cilium/pkg/datapath/loader.(*objectCache).UpdateDatapathHash(0xc001d027d0, 0xc001422870?)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/cache.go:62 +0x4d
github.com/cilium/cilium/pkg/datapath/loader.(*loader).Reinitialize(0xc002573580, {0x50f9c98, 0xc0008474d0}, 0xc001d00508, {{0x49d15b6, 0x4}, {0x0, 0x0}, 0x0, 0x0, ...}, ...)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:377 +0x3c8
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reinitialize(0xc001d36288, {0x50f9c98?, 0xc0008474d0?}, {{0x0?, 0x0?}, 0x0?}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:275 +0x110
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reconciler(0xc001d36288, {0x50f9c98, 0xc0008474d0}, {0x5104260, 0xc002feafc0})
/go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:219 +0x6fd
github.com/cilium/hive/job.(*jobOneShot).start(0xc002082e40, {0x50f9c98, 0xc0008474d0}, 0xc00143dce4?, {0x5104260, 0xc002082de0}, {{{0x0, 0x0, 0x0}}, 0xc001791770, ...})
/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*queuedJob).Start.func1 in goroutine 1
/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:126 +0x16f
```
With the change in this commit, when a direct routing device is not found, the
datapath orchestrator logs a warning and waits for device updates in
the reconciliation loop, skipping reinitialization.
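The skip-and-wait behaviour can be modelled roughly as follows. `reconcileDevices` is a hypothetical function, not the orchestrator's real API; an empty device name stands in for "direct routing device not found".

```go
package main

import "fmt"

// reconcileDevices is a hypothetical model of the reconciliation loop. Each
// element of updates is the direct routing device seen after a device change;
// an empty string means no such device was found. It returns how many updates
// were skipped.
func reconcileDevices(updates []string, required bool, reinit func(dev string)) (skipped int) {
	for _, dev := range updates {
		if required && dev == "" {
			// Warn and wait for the next update instead of reinitializing
			// with an empty address.
			fmt.Println("warning: direct routing device not found, skipping reinitialization")
			skipped++
			continue
		}
		reinit(dev)
	}
	return skipped
}

func main() {
	var done []string
	skipped := reconcileDevices([]string{"eth0", "", "eth0"}, true, func(dev string) {
		done = append(done, dev)
	})
	fmt.Println(done, skipped)
}
```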
Fixes: 8fae439710fd4b426beae9190957d2380a96bed6 ("datapath: move DirectRoutingDevice validation to orchestrator")
Signed-off-by: Deepesh Pathak <deepeshpathak09@gmail.com>
* datapath/loader: Add netkit to BPF load tests
Commit 854473726b ("bpf: Workaround for netkit + L7 policy redirect failure")
introduced the enable_netkit load-time config variable.
This commit introduces that variable to BPF loader permutation testing
for bpf_lxc.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* docs: add netkit requirement to kernel version list
Add Linux kernel requirement for netkit to the System Requirements.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* style(bpf/test): fix indentation
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): move `icmp_wsum_accumulate` helper
This helper is shared by ICMPv6 and ICMPv4 and will be imported by both
in future refactors.
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): move ICMPv6 packet generation to a separate file
The goal is to make these functions reusable for the ICMPv6 policy
denial feature, so move them to a shared icmp6.h file.
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): reduce ifdef number
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* gateway-api: Update conformance test Make target
This updates the Gateway API conformance test Make target
to make it not require any extra setup when run as part
of local development with Kind, and adds the ability
to set which tests to run, to allow for focussed
conformance test runs.
Signed-off-by: Nick Young <nick@isovalent.com>
* bpf: introduce DECLARE_CONFIG_KIND
DECLARE_CONFIG and NODE_CONFIG only differ in the value of their
respective kind: tag. Avoid code duplication by moving the common parts
to a separate DECLARE_CONFIG_KIND macro and use it to define
DECLARE_CONFIG and NODE_CONFIG. This also allows easier downstream use
of these macros.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* bpf: wire events map rate limits through node config
Move events map rate/burst limits into node config and read them via
CONFIG(events_map_{rate,burst}_limit) in BPF helpers. This drops the
compile-time EVENTS_MAP* defines from the header writer and cleans up
the legacy defaults in bpf/node_config.h.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* sysdump: Use label selectors for Hubble UI/Relay deployment collection
Previously, sysdump collected Hubble UI and Hubble Relay deployments
using hardcoded deployment names ('hubble-ui', 'hubble-relay'). This
caused the collection to fail when users deployed Hubble components
with custom names (e.g., 'hubble-ui-blue'), even when the correct
labels were provided via --hubble-ui-labels or --hubble-relay-labels.
This change updates the deployment collection to use ListDeployment
with the configured label selectors instead of GetDeployment with
hardcoded names. This makes deployment collection consistent with
pod log collection, which already uses label selectors.
Fixes the issue where:
cilium sysdump --hubble-ui-labels k8s-app=hubble-ui-blue
would still fail with 'Deployment hubble-ui not found'.
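The difference between name-based and label-based collection can be illustrated with a small self-contained sketch. The `deployment` type, `parseSelector`, and `selectDeployments` are hypothetical stand-ins; the real code lists deployments through the Kubernetes client, and this parser handles only comma-separated `key=value` terms.

```go
package main

import (
	"fmt"
	"strings"
)

// deployment is a hypothetical, minimal view of a Kubernetes Deployment.
type deployment struct {
	Name   string
	Labels map[string]string
}

// parseSelector handles only comma-separated key=value terms.
func parseSelector(s string) (map[string]string, error) {
	sel := make(map[string]string)
	for _, part := range strings.Split(s, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid selector term %q", part)
		}
		sel[k] = v
	}
	return sel, nil
}

// selectDeployments returns every deployment whose labels satisfy all terms,
// regardless of the deployment's name.
func selectDeployments(all []deployment, sel map[string]string) []deployment {
	var out []deployment
	for _, d := range all {
		match := true
		for k, v := range sel {
			if got, ok := d.Labels[k]; !ok || got != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	all := []deployment{
		{Name: "hubble-ui-blue", Labels: map[string]string{"k8s-app": "hubble-ui-blue"}},
		{Name: "coredns", Labels: map[string]string{"k8s-app": "kube-dns"}},
	}
	sel, err := parseSelector("k8s-app=hubble-ui-blue")
	if err != nil {
		panic(err)
	}
	for _, d := range selectDeployments(all, sel) {
		fmt.Println(d.Name)
	}
}
```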
Signed-off-by: darox <maderdario@gmail.com>
* bpf: lxc: remove unnecessary L3 validation
There's no code that uses the IPv4 header afterwards.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf: lxc: fine-tune BPF Host Routing path
Using fib_ok() to evaluate the result of fib_redirect_v*() is a bit
awkward. We're in TC context, so we know that CTX_ACT_TX doesn't need to
be handled (and would most likely lead to a packet loop).
By structuring the code as a switch() statement we can also clean up one
of the goto paths.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf: xdp: prefer CTX_ACT_TX over XDP_TX
Return the generic value, so that readers understand what macro they should
be using when handling the result.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf, nat46x64: move RFC6052 prefix into node config
This commit moves NAT 46x64 RFC6052 prefix bytes into the node
configuration so BPF programs consume CONFIG(nat_46x64_prefix) value
instead of C header defines.
The Go datapath now populates this field from the NAT46x64 config, and
the headerfile writer no longer emits NAT_46X64_PREFIX_ defines.
Updates included:
- BPF nat_46x64 helpers switched to CONFIG(nat_46x64_prefix).
- Node config population moved to runtime config and legacy defines
dropped.
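For readers unfamiliar with RFC 6052, the sketch below shows how an IPv4 address is embedded into a /96 prefix: the IPv4 bytes occupy the last 32 bits. It is written in Go for illustration only; `embedIPv4` is a hypothetical helper, and the /96 prefix length is an assumption of this sketch rather than something the commit message states.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// embedIPv4 places an IPv4 address in the low 32 bits of an IPv6 /96 prefix,
// as RFC 6052 describes for 96-bit prefixes.
func embedIPv4(prefix, v4 string) (string, error) {
	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return "", err
	}
	a, err := netip.ParseAddr(v4)
	if err != nil {
		return "", err
	}
	if !p.Addr().Is6() || p.Bits() != 96 || !a.Is4() {
		return "", errors.New("need an IPv6 /96 prefix and an IPv4 address")
	}
	out := p.Masked().Addr().As16()
	v := a.As4()
	copy(out[12:], v[:]) // bytes 12..15 carry the IPv4 address
	return netip.AddrFrom16(out).String(), nil
}

func main() {
	addr, err := embedIPv4("64:ff9b::/96", "192.0.2.33")
	fmt.Println(addr, err)
}
```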
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* neighbor: Fix description for L2 neighbor discovery
The flag is not used by IPsec. L2 neighbor discovery is enabled whenever
XDP is enabled. This flag allows users to enable it even if XDP is
disabled, so let's state that instead of mentioning IPsec.
Co-authored-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
* CODEOWNERS: add more specific owners for operator subsystems
Some of the operator's subsystems are currently only covered by the
catch-all rule assigning @cilium/operator. Instead, assign more specific
teams for certain subsystems which require more in-depth knowledge in
these particular areas.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* .github/workflows: eks-cluster-pool-manager: fix race condition and cleanup leaked IAM roles
When multiple parallel jobs generate cluster names within the same
second, they can produce identical names since the timestamp has only
1-second precision. This causes CloudFormation stack creation to fail
with "AlreadyExistsException", leaving orphaned IAM roles behind.
This commit adds a random suffix to cluster names to prevent this race
and enhances the failure cleanup step to delete CloudFormation stacks and orphaned
IAM roles when cluster creation fails.
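The naming collision and its fix can be sketched as follows. The workflow itself is not written in Go; `clusterName`, the name layout, and the 4-byte suffix length are assumptions chosen for illustration.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// clusterName combines a prefix, a 1-second-resolution timestamp, and a
// random suffix. Without the suffix, two jobs started within the same second
// would produce identical names.
func clusterName(prefix string, unixSeconds int64) (string, error) {
	buf := make([]byte, 4) // 4 random bytes, rendered as 8 hex characters
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s-%d-%s", prefix, unixSeconds, hex.EncodeToString(buf)), nil
}

func main() {
	// Two "parallel jobs" sharing the same timestamp.
	a, _ := clusterName("pool", 1700000000)
	b, _ := clusterName("pool", 1700000000)
	fmt.Println(a)
	fmt.Println(b)
}
```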
Signed-off-by: André Martins <andre@cilium.io>
* hubble: Fix typos in config/set.go
Signed-off-by: harshitghagre <harshitghagre183@gmail.com>
* test/helpers: ignore error creating lease lock message
This message was modified in k8s 1.35.0, therefore we should update the
list of messages that can be ignored in our CI.
Signed-off-by: André Martins <andre@cilium.io>
* Fix backend slot index mismatch in LB reconciler
Use slotID instead of loop index when setting backend slots to avoid
gaps when maintenance backends are skipped.
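The bug class is easy to reproduce in a few lines. In this sketch, `backend` and `assignSlots` are hypothetical, and slots are assumed to be numbered from 1; the point is only that the slot counter advances when a backend is actually written, not on every loop iteration.

```go
package main

import "fmt"

// backend is a hypothetical, minimal backend record.
type backend struct {
	ID          int
	Maintenance bool
}

// assignSlots maps slot numbers to backend IDs. Using a separate slotID
// counter instead of the loop index keeps the slots contiguous when
// maintenance backends are skipped.
func assignSlots(backends []backend) map[int]int {
	slots := make(map[int]int)
	slotID := 1
	for _, b := range backends {
		if b.Maintenance {
			continue // skipped backends must not consume a slot
		}
		slots[slotID] = b.ID
		slotID++
	}
	return slots
}

func main() {
	slots := assignSlots([]backend{
		{ID: 10},
		{ID: 11, Maintenance: true},
		{ID: 12},
	})
	fmt.Println(slots)
}
```

With the loop index used as the slot number, backend 12 would land in slot 3 and slot 2 would stay empty.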
Signed-off-by: Aman-Cool <aman017102007@gmail.com>
* vendor: Bump to StateDB v0.6.3
This restores the ability to use Changes() against a WriteTxn that is
targeting the same table as Changes().
Signed-off-by: Jussi Maki <jussi@isovalent.com>
* docs: Fix upgrade note category for tproxy
There's no new option here, it's a change in behaviour. Fix the category
for the upgrade note.
CC: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Fixes: c257b3d0aca8 ("datapath/connector: do not support netkit and bpf.tproxy=true")
Signed-off-by: Joe Stringer <joe@cilium.io>
* policy: Fix PASS verdict for non-consecutive tiers
Signed-off-by: Blaz Zupan <blaz@google.com>
* loadbalancer/healthserver: refresh ProxyRedirect per request
This commit fixes stale ProxyRedirect reads in the health server by reloading Service
state from the services table on each request. This prevents incorrect
local endpoint counts when Envoy redirect state changes after the
listener is created, which does happen in practice.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* localnodestore: provide WaitForNodeInformation
This commit moves global function `k8s.WaitForNodeInformation`
into the `localNodeSynchronizer`. To prevent dependencies to the
synchronizer, the `LocalNodeStore` re-exposes the same method and
delegates the call to the synchronizer.
This way, the legacy daemon initialization logic no longer needs to
depend on the (k8s/cilium)-node resources.
This step isn't final. Eventually, the wait logic should watch
the state db "local node" for the changes.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* localnodestore: pass localnodestore to synchronizer
With this commit, the `LocalNodeStore` passes a pointer
to itself to the synchronizer when calling `WaitForNodeInformation`.
This way, the synchronizer can update the ip allocation ranges without
using the global functions.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* ci: e2e: add `kernel` to workflow job names
As a follow-up of changes introduced in https://github.com/cilium/cilium/pull/44126
to improve the readability of workflow job names in the GitHub UI, this
commit adds the `kernel` parameter to the name too. In OpenSearch,
it would be good to differentiate these jobs by the used kernel.
Given the kernel is a parameter that may vary frequently, it is good to be
able to differentiate runs that used the same configuration but different
kernels. That's also good in case of regressions when changing kernel.
The result will be `Setup & Test (ipsec-1, minor, 5.10)`.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* pkg/datapath/bandwidth: optimize host endpoint QoS setup
The bandwidth manager was calling node.GetEndpointID() and performing
a table lookup on every UpdateBandwidthLimit() call, even though the
host endpoint QoS only needs to be set up once.
This commit introduces an atomic boolean flag that tracks whether the
host endpoint QoS has been configured. After the initial setup:
- Skip the GetEndpointID() call (fast path)
- Skip the Table.Get() call entirely
- Skip the redundant Insert() call for host endpoint
Additionally, based on review feedback:
- Change node.GetEndpointID() to return (uint64, bool) to indicate
whether the host endpoint ID has been set, avoiding duplicate
constants across packages
- Change node.endpointID to atomic.Uint64 for thread-safe access
during initialization
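A minimal sketch of the fast path, assuming an ID of 0 means "not set yet". The `node` and `manager` types and their methods are hypothetical and only model the two ideas above: an atomically stored endpoint ID returned as `(uint64, bool)`, and an atomic flag that lets later calls skip the host setup.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node is a hypothetical holder for the host endpoint ID.
type node struct {
	endpointID atomic.Uint64 // 0 is assumed to mean "not set yet"
}

// getEndpointID reports the ID and whether it has been set.
func (n *node) getEndpointID() (uint64, bool) {
	id := n.endpointID.Load()
	return id, id != 0
}

// manager is a hypothetical bandwidth manager.
type manager struct {
	node         *node
	hostQoSReady atomic.Bool
	hostSetups   atomic.Int64 // counts slow-path executions, for demonstration
}

// updateBandwidthLimit performs the host endpoint setup at most once per
// sequential caller; afterwards only the flag is read.
func (m *manager) updateBandwidthLimit(endpointID uint64) {
	if !m.hostQoSReady.Load() {
		if _, ok := m.node.getEndpointID(); ok {
			m.hostSetups.Add(1) // stands in for the lookup and insert
			m.hostQoSReady.Store(true)
		}
	}
	_ = endpointID // the per-endpoint update is omitted in this sketch
}

func main() {
	n := &node{}
	m := &manager{node: n}
	m.updateBandwidthLimit(100) // host ID not known yet: nothing to set up
	n.endpointID.Store(1)
	m.updateBandwidthLimit(100) // slow path runs once
	m.updateBandwidthLimit(101) // fast path
	fmt.Println(m.hostSetups.Load())
}
```

Concurrent first calls could both run the setup in this sketch; that is acceptable only if the setup is idempotent, which is an assumption here.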
Signed-off-by: Anand Kumar Shaw <anandkrshawheritage@gmail.com>
* clustermesh: fix a few misc issues with MCS-API doc
This commit fixes two issues with the MCS-API doc:
- a simple typo in the enabled keyword
- change the code-block to a parsed-literal, as the |SCM_WEB| "variable"
was not evaluated/replaced in the final doc with a code-block
Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr>
* Docs: improve docs around ipsec upgrade in 1.18
Signed-off-by: darox <maderdario@gmail.com>
* docs(ztunnel): fix duplicate word (a set)
Signed-off-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
* docs(ztunnel): add missing backslash
add missing backslash for install with Cilium CLI
Signed-off-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
* clustermesh: helm: remove clustermesh.enableMCSAPISupport
This option was deprecated and its removal was announced for Cilium
1.20, so let's drop it now that we are in the 1.20 cycle!
Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr>
* daemon: enforce iptables rules are present when node-port is enabled
Moving forward the node-port code path will rely on iptables to
identify locally generated traffic and create NAT reservations for this
traffic in order to avoid reservation conflicts.
From now on, it is a fatal error to disable iptables rule
installation while node-port or eBPF masquerading is enabled.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* bpf,nodeport: generalize SNAT conflict detection
Prior to this commit the nodeport SNAT conflict detection assumed host
traffic was sourced from the direct routing interface's IP and always
egressed on this device.
Remove this assumption by detecting locally generated traffic from any
egress interface 'bpf_host' is attached to.
This removes the dependency on the direct routing interface in the
node-port path.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* ztunnel: introduce end to end connectivity tests
The test infrastructure deploys three dedicated namespaces with client
and echo-server pods in each: two enrolled namespaces for testing
cross-namespace mTLS scenarios and one unenrolled namespace for baseline
verification. Pod affinity rules ensure echo-same-node pods co-locate
with clients while echo-other-node pods schedule on different nodes,
enabling both intra-node and inter-node traffic validation.
The test scenarios verify ztunnel mTLS behavior by dynamically labeling
namespaces with the io.cilium/mtls-enabled label during test execution.
For enrolled pod pairs, the tests assert that traffic flows through port
15008 (the ztunnel HBONE proxy) and that no unencrypted traffic appears
on port 8080. For unenrolled pods, the assertions are inverted to confirm
traffic bypasses ztunnel entirely. Cross-namespace scenarios confirm that
mTLS works correctly when client and server reside in separately enrolled
namespaces.
Packet capture validation runs from host network pods using tcpdump,
which is required since ztunnel operates in the host network namespace.
The tests query the ztunnel admin API to verify workload registration
before generating traffic, ensuring that any state the test depends on
has converged before making assertions.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
Signed-off-by: Quang Nguyen <nguyenquang@microsoft.com>
Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
* ci,ztunnel: add workflows for ztunnel encryption tests
Add a new GitHub Actions workflow to run end-to-end tests for ztunnel
encryption in Cilium.
The new /ci-ztunnel-e2e trigger is added to the Ariane configuration,
pointing to the newly created conformance-ztunnel-e2e.yaml workflow
file.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* ci,ztunnel: add ztunnel cert script to actions
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* datapath: remove GetRoutePostEncryptMTU()
The last user was removed by
f81d8964ce4a ("ipsec: don't reduce post-encrypt MTU for tunnel overhead").
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* datapath: ipsec: remove clean up code for encrypt IP rule
https://github.com/cilium/cilium/pull/41699 stopped installing this IP
rule in the v1.19 release. With v1.20 we can now stop cleaning it up.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* loadbalancer/api: include proxy-redirect as backend
Currently, listing services via `cilium-dbg service list` displays
no backends for the services that use L7 proxy redirection (e.g.
Gateway API / Ingress loadbalancer service).
```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID Frontend Service Type Backend
1 10.96.62.28:80/TCP ClusterIP 1 => 10.244.1.149:4245/TCP (active)
2 10.96.0.10:53/TCP ClusterIP 1 => 10.244.1.69:53/TCP (active)
2 => 10.244.1.109:53/TCP (active)
3 10.96.0.10:53/UDP ClusterIP 1 => 10.244.1.69:53/UDP (active)
2 => 10.244.1.109:53/UDP (active)
4 10.96.0.10:9153/TCP ClusterIP 1 => 10.244.1.69:9153/TCP (active)
2 => 10.244.1.109:9153/TCP (active)
5 10.96.0.1:443/TCP ClusterIP 1 => 172.19.0.2:6443/TCP (active)
12 10.96.162.7:443/TCP ClusterIP 1 => 172.19.0.2:4244/TCP (active)
15 10.96.106.135:80/TCP ClusterIP 1 => 10.244.1.115:80/TCP (active)
16 10.96.55.103:80/TCP ClusterIP 1 => 10.244.1.67:80/TCP (active)
17 [::]:30965/TCP NodePort
19 [::]:30965/TCP/i NodePort
21 0.0.0.0:30965/TCP NodePort
23 0.0.0.0:30965/TCP/i NodePort
25 10.96.245.249:80/TCP ClusterIP
26 172.19.255.1:80/TCP LoadBalancer
27 172.19.255.1:80/TCP/i LoadBalancer
28 [fd00:10:96::d99f]:80/TCP ClusterIP
```
Even though the actual redirection happens via BPF/iptables TPROXY, it would be
nice to display the information about the local redirection to the node-local
L7 proxy (Envoy) in some form.
Therefore, this commit changes the frontend model generation to treat the
proxy redirection as backend of the service frontend in the form
`<localhost-addr>:<proxy-port>/<proto> (active)`
Result
```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID Frontend Service Type Backend
1 10.96.62.28:80/TCP ClusterIP 1 => 10.244.1.149:4245/TCP (active)
2 10.96.0.10:53/TCP ClusterIP 1 => 10.244.1.69:53/TCP (active)
2 => 10.244.1.109:53/TCP (active)
3 10.96.0.10:53/UDP ClusterIP 1 => 10.244.1.69:53/UDP (active)
2 => 10.244.1.109:53/UDP (active)
4 10.96.0.10:9153/TCP ClusterIP 1 => 10.244.1.69:9153/TCP (active)
2 => 10.244.1.109:9153/TCP (active)
5 10.96.0.1:443/TCP ClusterIP 1 => 172.19.0.2:6443/TCP (active)
12 10.96.162.7:443/TCP ClusterIP 1 => 172.19.0.2:4244/TCP (active)
15 10.96.106.135:80/TCP ClusterIP 1 => 10.244.1.115:80/TCP (active)
16 10.96.55.103:80/TCP ClusterIP 1 => 10.244.1.67:80/TCP (active)
17 [::]:30965/TCP NodePort 1 => [::1]:14543/TCP (active)
19 [::]:30965/TCP/i NodePort 1 => [::1]:14543/TCP (active)
21 0.0.0.0:30965/TCP NodePort 1 => 127.0.0.1:14543/TCP (active)
23 0.0.0.0:30965/TCP/i NodePort 1 => 127.0.0.1:14543/TCP (active)
25 10.96.245.249:80/TCP ClusterIP 1 => 127.0.0.1:14543/TCP (active)
26 172.19.255.1:80/TCP LoadBalancer 1 => 127.0.0.1:14543/TCP (active)
27 172.19.255.1:80/TCP/i LoadBalancer 1 => 127.0.0.1:14543/TCP (active)
28 [fd00:10:96::d99f]:80/TCP ClusterIP 1 => [::1]:14543/TCP (active)
```
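The `<localhost-addr>:<proxy-port>/<proto> (active)` form shown in the output can be produced with the standard `net/netip` package. `proxyBackend` is a hypothetical helper written for illustration, not the function the commit adds.

```go
package main

import (
	"fmt"
	"net/netip"
)

// proxyBackend renders the node-local proxy redirect as a backend string.
// The loopback address family follows the frontend's family.
func proxyBackend(ipv6 bool, proxyPort uint16, proto string) string {
	addr := netip.MustParseAddr("127.0.0.1")
	if ipv6 {
		addr = netip.IPv6Loopback()
	}
	// AddrPort's String method brackets IPv6 addresses.
	return fmt.Sprintf("%s/%s (active)", netip.AddrPortFrom(addr, proxyPort), proto)
}

func main() {
	fmt.Println(proxyBackend(false, 14543, "TCP"))
	fmt.Println(proxyBackend(true, 14543, "TCP"))
}
```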
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* datapath/ipcachelistener: use injected localnodestore
This commit refactors the datapath ipcache BPF listener to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.
Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* datapath/linuxnodehandler: retrieve node ips from localnodestore
This commit refactors the `linuxNodeHandler` to fetch the node ips
from the injected `LocalNodeStore` instead of using the global
functions `node.GetIP[v4/v6]`.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* identity/cache: use injected localnodestore
This commit refactors the identity cache to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.
Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* node/address: remove global functions `GetIP[v4/v6]`
This commit removes the unused global functions `GetIPv4` & `GetIPv6`.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* test: remove K8sDatapathBandwidthTest
The Ginkgo bandwidth test was already not running in CI. It only ran on
net-next kernels, but the focus group that included it
(f10-agent-hubble-bandwidth) was excluded from the net-next job.
Bandwidth enforcement is independently tested in
cilium-cli/connectivity/builder/network_bandwidth_limit.go.
Fixes: #44165
Related: #37837
Signed-off-by: Pavan More <pavansmore05@gmail.com>
* wireguard: remove cleanup code for old userspace devices
2578e8ff4dfa ("wireguard: remove deprecated userspace fallback") removed
the userspace mode in v1.17, we can now stop cleaning up any network device
created for that mode.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* loadbalancer: Check for equality and skip insert when not changed
This adds DeepEqual() implementations for Service and FrontendParams and uses
them to skip inserting the service and updating the frontend if nothing has changed.
As we no longer forcefully update the frontend, the orphan detection has to actually
build a set to check if a frontend has been removed.
Benchmarks before:
Benchmark_UpsertServiceAndFrontends_100-6 3549 317691 ns/op 314771 objects/sec
BenchmarkInsertBackend-6 2818 423975 ns/op 235863 objects/sec
BenchmarkReplaceBackend-6 326682 3793 ns/op 263669 objects/sec
BenchmarkReplaceService-6 2327074 509.4 ns/op 1963230 objects/sec
After:
Benchmark_UpsertServiceAndFrontends_100-6 3464 331791 ns/op 301395 objects/sec
Benchmark_UpsertServiceAndFrontends_100_Unchanged-6 14652 81250 ns/op 1230766 objects/sec
BenchmarkInsertBackend-6 2956 401100 ns/op 249315 objects/sec
BenchmarkReplaceBackend-6 3402430 360.9 ns/op 2771038 objects/sec
BenchmarkReplaceService-6 2068555 556.6 ns/op 1796743 objects/sec
Signed-off-by: Jussi Maki <jussi@isovalent.com>
* loadbalancer: Remove dummy ingress endpoint workaround
Remove the skipping of the 192.192.192.192:9999 dummy ingress
endpoint as the next commit will remove the creation of it.
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* operator/helm: Remove creation of dummy ingress endpoint
With the new agent load-balancer implementation the dummy endpoint
is no longer necessary. Now that v1.18 has been released with that
implementation we can remove the creation of the dummy endpoint
without breaking upgrade.
Note: This commit removes the dummy endpoint in the Operator-
(Gateway API & dedicated Ingress) & Helm- (shared Ingress) managed
`EndpointSlice`s. The creation of the `EndpointSlice` itself will
be done in a separate PR.
Fixes: #19262
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* monitor: report 3rd argument in DBG_GENERIC debug monitor messages
Currently, in case cilium_dbg3(ctx, DBG_GENERIC, arg1, arg2, arg3) is used in
datapath code, the 3rd argument will not be reported in the monitor
message, e.g.:
bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac)
Also report the 3rd argument, so the monitor message will look as
follows:
bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac) arg3=170 (0xaa)
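The message layout with the third argument can be reproduced with a format string. `genericDebugMessage` is a hypothetical helper that mirrors only the text after `DEBUG:` in the example above; it is not the monitor's actual formatting code.

```go
package main

import "fmt"

// genericDebugMessage renders three arguments in decimal and hex, matching
// the layout of the example monitor line.
func genericDebugMessage(arg1, arg2, arg3 uint32) string {
	return fmt.Sprintf("No message, arg1=%d (%#x) arg2=%d (%#x) arg3=%d (%#x)",
		arg1, arg1, arg2, arg2, arg3, arg3)
}

func main() {
	fmt.Println(genericDebugMessage(238, 34214060, 170))
}
```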
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* policy: cleanup label selector parsing and validation
This commit cleans up the parsing and validation of policy label
selectors by removing the dependency on k8s labels library for
validation.
With this change we no longer convert cilium representation
of label keys with source prefixed using ':' delimiter to k8s specific
representation with source prefixed using '.' for validation.
This avoids the additional complexity in policy code to convert between
the two representations. It also simplifies marshalling of selectors as
the objects now have a unified format.
This is not a functional change and does not have any associated user
impact.
Signed-off-by: Deepesh Pathak <deepesh.pathak@isovalent.com>
* helm/ztunnel: bind health check to localhost
Security hardening for ztunnel running with hostNetwork: true:
Add host field to readiness probe to bind the health check port 15021
to 127.0.0.1 instead of 0.0.0.0. This reduces attack surface by ensuring
the health check endpoint is only accessible from localhost (kubelet
runs on same node).
Signed-off-by: Quang Nguyen <nguyenquang@microsoft.com>
* ci:wireguard: enable Host Firewall in native routing e2e tests
This enables Host Firewall in the `wireguard-3` config.
This helps us validate that host-related packets go through the host
firewall hook in `cilium_host` when WireGuard is enabled in native routing.
In overlay, even if we'd miss a redirect we'd see the packet in bpf_overlay,
which will then redirect the packet to bpf_host for HostFW validation.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* mcsapi: Add namespace filtering conditions to ServiceImport controller
Add namespace global status checking to the ServiceImport controller:
- Set Invalid condition on ServiceExport when namespace is not global
- Set NamespaceNotGlobal condition on ServiceImport when namespace is not global
- Signal service controller to skip/delete derived Service for non-global namespaces
by setting SupportedIPFamilies annotation to empty
This implements the MCS-API portion of the ClusterMesh global namespace
filtering feature as described in CFP-39876.
Signed-off-by: Jacques Massa <jac.massa0908@gmail.com>
* docs: split up network policy language page
Up until now, the page title had been "Layer 3 Examples", which is
a section headline and confusing, since it is among other examples.
Splitting up into several pages, similar to `network/kubernetes/`,
keeps the ToC as it is, and makes it easier to navigate compared to
the lengthy page it was, while also giving each page a suitable
headline.
Since examples are mixed with the language specification, change
headings from "Layer 3 Examples" to "Layer 3 Policies", etc.
Drop the old page and redirect to the overview to keep links working.
Signed-off-by: Daniel Maslowski <info@orangecms.org>
* golangci-lint: fix and simplify golangci-lint.sh
golangci-lint.sh would always rebuild the binary. The script was over-engineered
and didn't use the `golangci-lint version` subcommand, which doesn't need the
version string parsing logic. Simplify the script and fix the rebuild trigger.
Prepare the directory structure for splitting the main .golangci-lint.yaml in a
subsequent commit.
Signed-off-by: Timo Beckers <timo@isovalent.com>
* golangci-lint: split kubeapi configuration into separate file
The addition of kubeapilinter forced all users of the main .golangci.yaml to be
running a custom-built version of the tool and make sure it's kept in sync.
VSCode runs `golangci-lint run` in the repo by default and requires explicit
configuration to use another tool, like a script in the Cilium repository.
Before this PR, the script was broken and would always rebuild custom-gcl.
Breaking the default tooling is not great. This commit moves kubeapi-specific
configuration to a dedicated config file in tools/golangci-lint-kubeapi and
specifies it explicitly when starting the kubeapi linter.
Signed-off-by: Timo Beckers <timo@isovalent.com>
* policy: Import the ClusterNetworkPolicy v1alpha2 API go modules
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add parser function for Cluster Network Policy
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add flags to enable or disable Cluster Network Policy
Disabled by default. A new Makefile target is added that enables it in kind clusters.
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add watcher for Cluster Network Policy
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Move NetworkPolicy and CNP/CCNP to the 'Normal' tier
Signed-off-by: Blaz Zupan <blaz@google.com>
* CODEOWNERS: re-assign operator/identitygc, adjust endpoint team description
Assign operator/identitygc to @cilium/sig-policy instead of
@cilium/endpoint based on Joe's feedback. This seems like the more
appropriate team based on the description.
Also update the description for the @cilium/endpoint team to cover
responsibilities for cluster-wide/k8s considerations and endpoint
lifecycle.
Follow-up to commit 11376d5c715b ("CODEOWNERS: add more specific owners
for operator subsystems").
Ref. https://github.com/cilium/cilium/pull/44279#pullrequestreview-3780398303
Suggested-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* nodemap: converted net.IP to netip.Addr, Part of #24246
- Converted net.IP to netip.Addr in node map functions
- Added validation for zero netip.Addr values in newNodeKey
- Wrapped parse errors with %w for better error context
Signed-off-by: Sanjeevliv <sanjeevsethilive@gmail.com>
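The three bullets in the nodemap commit above can be illustrated with a minimal Go sketch. The helper names `parseNodeIP` and `validateNodeIP` and the sentinel `errZeroAddr` are hypothetical and are not taken from the Cilium tree; only the `net/netip`, `errors`, and `fmt` calls are real standard-library APIs.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// errZeroAddr is a hypothetical sentinel error used only in this sketch.
var errZeroAddr = errors.New("zero-value netip.Addr")

// parseNodeIP parses s and wraps any failure with %w, so callers can still
// inspect the underlying error via errors.Is / errors.As.
func parseNodeIP(s string) (netip.Addr, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return netip.Addr{}, fmt.Errorf("parsing node IP %q: %w", s, err)
	}
	return addr, nil
}

// validateNodeIP rejects the zero netip.Addr. Unlike a nil net.IP, the zero
// Addr is an ordinary value, so it has to be checked explicitly.
func validateNodeIP(addr netip.Addr) error {
	if !addr.IsValid() {
		return errZeroAddr
	}
	return nil
}

func main() {
	addr, err := parseNodeIP("10.0.0.1")
	fmt.Println(addr, err, validateNodeIP(addr))
	fmt.Println(validateNodeIP(netip.Addr{}))
}
```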
* bpf/tests: fix byte ordering for TCP seq/win values
The pktgen logic for BPF tests sets default values on TCP packets in
host byte order only. This feeds into the TCP checksum calculation
and while these inputs are technically incorrect, tests that assert
the TCP checksum are correctly testing the invalid output values.
With the introduction of scapy packet definitions, matching these values
to existing tests is difficult because scapy does convert seq and win
values into network byte order. The effect is that L4 sequence numbers
between pktgen and scapy are wrong.
This commit updates pktgen and existing tests that assert these values
to use the appropriate byte conversion routines so that both pktgen
and scapy definitions produce the same results.
This causes all affected BPF tests to fail. This will be addressed
in the next commit.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
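To make the byte-order mismatch described above concrete, here is a small Go sketch (not the pktgen or scapy code itself) that encodes the same sequence number in network byte order and in a little-endian host layout. The value `0x01020304` is arbitrary, and the little-endian host is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// networkOrder encodes seq as it must appear on the wire (big-endian).
func networkOrder(seq uint32) [4]byte {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], seq)
	return b
}

// littleEndianHostOrder encodes seq the way a little-endian host stores it
// in memory when no byte-order conversion is applied.
func littleEndianHostOrder(seq uint32) [4]byte {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], seq)
	return b
}

func main() {
	const seq uint32 = 0x01020304 // arbitrary example value
	wire := networkOrder(seq)
	host := littleEndianHostOrder(seq)
	fmt.Printf("network order: % x\n", wire[:])
	fmt.Printf("host order:    % x\n", host[:])
}
```

Because the checksum is computed over the bytes actually placed in the packet, the two encodings generally lead to different TCP checksums, which is why the assertions had to change together with the conversion.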
* bpf/tests: fix TCP checksum assertions in all tests
This commit updates every test that asserts TCP checksums to reflect new
values following the change to the default TCP seq/win values, such that
the assertions for this should succeed with a scapy packet definition.
As part of this change, test failure logs have been updated to correctly
show value observed in a packet, and the expected value, both in network
byte order. This is to simplify future debugging. This has also been
applied to UDP checksum assertions for consistency.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* bpf/tests: fix default_data definition for scapy tests
The pktgen code defines default_data macro as a string literal which
includes a NUL-terminating character. The size of this string is 20
bytes.
The scapy code defines the same value without the NUL-terminating
character. The size of this string is 19 bytes.
As a result, when converting a test from pktgen to scapy, the packet
length changes, which causes assertion failures because the change leads
to different L3 checksum values.
This commit modifies the scapy default_data variable to include a NUL
terminating character to match pktgen definitions. It is anticipated
this can be removed once all BPF tests are migrated to scapy.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
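The effect of the one-byte length difference can be sketched in Go. The payload below is a 19-byte placeholder (the real `default_data` content is not reproduced here), and the header fields (addresses, TTL) are made-up values; only the total-length field differs between the two cases.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// placeholder stands in for default_data; its content is an assumption,
// only its length (19 bytes without the NUL terminator) matters.
const placeholder = "0123456789abcdefghi"

// ipv4HeaderChecksum builds a minimal 20-byte IPv4 header for a TCP packet
// carrying payloadLen bytes and returns its header checksum (RFC 1071).
func ipv4HeaderChecksum(payloadLen int) uint16 {
	hdr := make([]byte, 20)
	hdr[0] = 0x45 // version 4, header length 5 words
	// Total length: IPv4 header + TCP header (no options) + payload.
	binary.BigEndian.PutUint16(hdr[2:], uint16(20+20+payloadLen))
	hdr[8] = 64 // TTL (arbitrary)
	hdr[9] = 6  // protocol: TCP
	copy(hdr[12:16], []byte{10, 0, 0, 1}) // source (arbitrary)
	copy(hdr[16:20], []byte{10, 0, 0, 2}) // destination (arbitrary)

	// One's-complement sum over 16-bit words; checksum field is still zero.
	var sum uint32
	for i := 0; i < len(hdr); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(hdr[i:]))
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

func main() {
	withNUL := placeholder + "\x00"
	fmt.Println(len(placeholder), len(withNUL))
	fmt.Printf("%#04x %#04x\n",
		ipv4HeaderChecksum(len(placeholder)), ipv4HeaderChecksum(len(withNUL)))
}
```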
* endpoint, fqdn: remove restoration of deprecated V1 DNSRules
Now that 1.16 is EOL, the code restoring PortProto V1 (without protocol)
can be removed.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* endpoint: rename DNS rules field
The V2 suffix is unnecessary now that we stop restoring V1 rules and
there is only a single supported version/field.
Also add a test that verifies that V2 rules are restored and V1 rules
are ignored while retaining backwards compatibility.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* tests: Ignore identity manager related error in versions < 1.18
Due to https://github.com/cilium/cilium/pull/42661 and
https://github.com/cilium/cilium/pull/42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.
The error was removed from the ignore list in
https://github.com/cilium/cilium/pull/42982.
Suggested-by: Marco Iorio <marco.iorio@isovalent.com>
Suggested-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
* metrics: remove agent bootstrap metrics
This commit removes the deprecated agent bootstrap metrics.
Hive job metrics can be used to inspect the duration of the legacy
daemon initialization job. In special cases it might be worth
introducing context-specific metrics, similar to what has been
introduced for the endpoint restoration logic.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* policy: fix policy tests
This fixes a policy break due to how label source is handled that
recently changed.
Fixes: 270daa4067 ("policy: Add parser function for Cluster Network Policy")
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
* resource/test: let TestResource_WithFakeClient set resource version
Update the TestResource_WithFakeClient test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* cid/test: let TestUpdatePodLabels set resource version
Update the TestUpdatePodLabels test to correctly specify the expected
resource version during updates, in preparation for extending the fake
client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* ipam/test: let Test_LocalNodeCIDRsSyncer set resource version
Update the Test_LocalNodeCIDRsSyncer test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* bgp/test: correctly set resource version when updating test resources
Update the bgp tests to correctly specify the expected resource version
during updates, in preparation for extending the fake client to actually
enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* test/controlplane: adaptation for optimistic concurrency control
Update the UpdateObjects helper to use the [ObjectTracker.Patch],
instead of [ObjectTracker.Update], in preparation for the subsequent
commit that will make the latter implement optimistic concurrency
control, and validate resource version mismatches, which is not
required in this context.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* k8s/client/fake: fix resource version configuration in tracker
Currently, the object tracker is affected by a bug that causes the
resource version to not be set on creation or update if the object
does not have the [metav1.TypeMeta] set. Indeed, in that case, the
function updating the TypeMeta creates a deep copy of the object,
causing operations performed via [meta.Accessor] to act on the old
copy, and not have effect. Let's get this fixed by changing the
[fillTypeMetaIfNeeded] function to not create a deep copy, given
that it already operates on a copy of the original object.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* k8s/client/fake: let update operations respect resource versioning
Currently, the statedb object tracker backing the fake kubernetes client
used for testing purposes does not respect resource versioning, and allows
update operations to succeed regardless of the provided resource version.
While convenient for the `k8s/update` command itself, this approach is
problematic in case of controllers acting on the same resources, as it
can lead to objects being unexpectedly reverted to incorrect versions,
due to the missing optimistic concurrency control.
Let's get this fixed by extending the update implementation to additionally
compare the resource version of the stored and provided objects, and reject
the update in case they do not match, as the real Kubernetes API Server
would do. By default, the k8s/update command still ignores the provided
resource version, letting the update succeed regardless: this matches the
desired behavior in the vast majority of the tests, and avoids the need
for complex operations to set the expected resource version. Still, if
necessary, the stricter behavior can be enabled via the dedicated flag.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
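The update semantics described above can be modelled with a small in-memory store. This is only a sketch under stated assumptions: the `store` and `object` types and the `ignoreRV` switch are hypothetical and do not reflect the statedb-backed tracker or the actual flag name.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

var errConflict = errors.New("conflict: resource version mismatch")

type object struct {
	Name            string
	ResourceVersion string
	Data            string
}

// store is a hypothetical in-memory tracker used only for illustration.
type store struct {
	objects  map[string]object
	nextRV   int
	ignoreRV bool // models the lenient default described for k8s/update
}

func newStore(ignoreRV bool) *store {
	return &store{objects: map[string]object{}, ignoreRV: ignoreRV}
}

// save assigns a fresh resource version and stores the object.
func (s *store) save(o object) object {
	s.nextRV++
	o.ResourceVersion = strconv.Itoa(s.nextRV)
	s.objects[o.Name] = o
	return o
}

func (s *store) Create(o object) object { return s.save(o) }

// Update rejects the write when the caller's resource version is stale,
// unless the store was configured to ignore resource versions.
func (s *store) Update(o object) (object, error) {
	cur, ok := s.objects[o.Name]
	if !ok {
		return object{}, errors.New("not found")
	}
	if !s.ignoreRV && o.ResourceVersion != cur.ResourceVersion {
		return object{}, errConflict
	}
	return s.save(o), nil
}

func main() {
	s := newStore(false)
	v1 := s.Create(object{Name: "pod-a", Data: "x"})
	stale := v1
	_, err := s.Update(v1) // succeeds and bumps the version
	fmt.Println("first update:", err)
	_, err = s.Update(stale) // carries the old version
	fmt.Println("stale update:", err)
}
```

With two controllers acting on the same object, the second writer in this model receives a conflict instead of silently reverting the first writer's change.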
* chore(deps): update base-images to v1.26.0
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* images: update cilium-{runtime,builder}
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* chore(deps): update cilium/cilium-cli action to v0.19.1
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.6.8
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all lvh-images main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[b…
* ci: e2e: enhance readability of workflow job name
This commit updates the name of the "Setup & Test" job in the
GitHub Actions workflow for e2e upgrade tests to include only the matrix
parameters "name" and "mode". This change improves the readability
of the workflow runs by providing more context about the specific
configuration being tested.
Prior to this, the name of each job contained the whole matrix combination,
which ended up cut off and unreadable in the UI. Given that now
we use the same workflow file for running both `minor` and `patch` upgrades,
let's make the displayed name simpler.
The result will be `Setup & Test (ipsec-1, minor)`.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* ci: e2e: log matrix configuration in each job
This commit adds, as a first step of the `Setup & Test` job for e2e-upgrade,
a simple step to dump the current matrix configuration being tested.
The previous commit modified the title to simply display the matrix entry
name and mode (e.g., `Setup & Test (ipsec-1, minor)`) rather than the
whole configuration. In the UI, that would be truncated anyway.
It is true that, given the matrix.name (e.g., ipsec-1), a user can open the
specific file and look up the configuration required, but I think that
having a step where we dump it would speed up and ease debugging in CI.
The output would be similar to:
```
> Log Matrix Configuration
Current matrix configuration:
{
"name": "wireguard-1",
"kernel": "5.10",
"kube-proxy": "iptables",
"kpr": "true",
"devices": "{eth0,eth1}",
"secondary-network": "true",
"tunnel": "vxlan",
"encryption": "wireguard",
"encryption-node": "false",
"lb-mode": "snat",
"endpoint-routes": "true",
"egress-gateway": "true",
"ingress-controller": "true",
"mode": "minor"
}
```
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* endpoint: Log labels as structured JSON objects
Standardize logging in pkg/endpoint so that identityLabels and related fields are logged as structured JSON objects instead of comma-separated strings by explicitly casting labels.Labels to map[string]labels.Label.
Signed-off-by: Jie WU <wujie@google.com>
```release-note
endpoint: Log labels as structured JSON objects
```
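As a rough illustration of the difference between a flattened string and a structured value in a JSON log line, the following Go sketch uses the standard `log/slog` JSON handler. The `label` type is a simplified stand-in for `labels.Label`, and the helper names are hypothetical; this does not reproduce the actual pkg/endpoint logging code.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// label is a simplified stand-in for Cilium's labels.Label.
type label struct {
	Key    string `json:"key"`
	Value  string `json:"value"`
	Source string `json:"source"`
}

// logLabels emits one JSON log line with the given value attached under
// the "identityLabels" attribute and returns the raw line.
func logLabels(value any) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info("identity labels", "identityLabels", value)
	return buf.String()
}

// isStructured reports whether identityLabels was encoded as a JSON object.
func isStructured(line string) bool {
	return strings.Contains(line, `"identityLabels":{`)
}

func main() {
	lbls := map[string]label{"app": {Key: "app", Value: "web", Source: "k8s"}}
	fmt.Print(logLabels("k8s:app=web")) // flat string attribute
	fmt.Print(logLabels(lbls))          // nested JSON object
}
```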
* feat(helm): hubble-ui containers set to pss-restricted
This sets the hubble-ui pods/containers to match k8s
pss-restricted profile along with the optional
`readOnlyRootFilesystem: true`
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
* helm: allow multicluster-services installCRDs to update CRDs
Previously, cilium-operator failed to start if MCS/installCRDs was enabled
because it does not have permissions to update the CRD with this log
message:
level=error msg="Unable to update CRD"
module=operator.operator-controlplane.leader-lifecycle.create-crds
name=serviceimports.multicluster.x-k8s.io
error="customresourcedefinitions.apiextensions.k8s.io
\"serviceimports.multicluster.x-k8s.io\" is forbidden: User
\"system:serviceaccount:kube-system:cilium-operator\" cannot update
resource \"customresourcedefinitions\" in API group
\"apiextensions.k8s.io\" at the cluster scope"
This patch adds the necessary permissions to cilium-operator if you have
mcs/installCRDs enabled
Fixes: #44210
Fixes: 3874013329d0 ("clustermesh: add config for auto installing
MCS-API CRDs")
Signed-off-by: Florian Ströger <stroeger@youniqx.com>
* policy: (mechanical) refactor out flow lookup types
A subsequent commit will include an alternate policy iteration system,
so it will be nice to move the types to policy/types.
This also removes the now-useless Decision type, as it's not used
anywhere in the codebase.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
* policy: add a simple iterative policy simulator
This is a simple userspace tool that executes rules step-by-step. Its
purpose will be to validate more complex policy scenarios, ideally by
fuzzing.
To ensure its output matches that of the existing policy engine, it
matches the LookupFlow method signature, and existing tests validate
that the simulation engine returns the same verdict.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
* Add fuzz-based policy testing
This generates random policy corpuses and compares MapState-based policy
calculation with the iterative simulator.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
* policy: Fix fuzz testing
Avoid using *testing.F for the logger as then any log within the fuzz
test would fail.
Fix the order of expected and actual for require.Equal.
Add more debugging.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Hide precedence details better
Hide precedence details from the policymap package.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add optional indexing by identity
Add optional mapState indexing by identity to support incremental removal
of generated keys. This is only needed for deletion pass entries, so the
index is only used if the policy has pass verdicts.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add default deny rule when pass verdict is used
Proper processing of pass verdicts requires the default deny rule to be
explicitly added to the mapstate so that it can be seen by pass verdict
entries.
The default rule is added to the next tier if any non-default tiers or
priorities are in use, or if the traffic direction has any pass
rules. This way the pass rule can pass to the added default deny (or
allow) rule.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Fix L7Filter precedence handling for pass verdicts
Deny takes precedence over allow and pass, allow takes precedence over
pass. Define new HasPrecedenceOver() to handle this instead of using just
IsDeny() like before. Would be simpler if Allow was not the zero value,
but changing that would require changing all unit testing code that uses
it as the default.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
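The ordering stated above (deny over allow and pass, allow over pass) can be sketched as follows. The `verdict` type and its `rank` helper are hypothetical; the real `HasPrecedenceOver()` belongs to the policy code and may be shaped differently. `Allow` is kept as the zero value here to mirror the constraint mentioned in the commit message.

```go
package main

import "fmt"

// verdict is a hypothetical type for this sketch; Allow is the zero value.
type verdict uint8

const (
	Allow verdict = iota
	Deny
	Pass
)

// rank maps a verdict to its precedence; a higher rank wins. An explicit
// mapping is needed because the numeric values do not follow the ordering.
func (v verdict) rank() int {
	switch v {
	case Deny:
		return 2
	case Allow:
		return 1
	default: // Pass
		return 0
	}
}

// HasPrecedenceOver reports whether v strictly outranks other.
func (v verdict) HasPrecedenceOver(other verdict) bool {
	return v.rank() > other.rank()
}

func main() {
	fmt.Println(Deny.HasPrecedenceOver(Allow)) // deny beats allow
	fmt.Println(Allow.HasPrecedenceOver(Pass)) // allow beats pass
	fmt.Println(Pass.HasPrecedenceOver(Allow)) // pass never beats allow
}
```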
* policy: Fix per-tier priority range allocation
Fix tier base priority calculation. When figuring out the priority range
for each tier, the full range of the remaining tiers must be included to
add enough space for pass verdicts on higher tiers. Then, when setting
the base priority of each tier, this has to be reversed.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add Fuzz cases
Commit fuzzer cases found during development.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add generated L3/4 entries, multipass support
A pass of a specific identity to a lower tier rule with wildcard identity
should pass the given identity only and keep the wildcard entry at the
original precedence to take care of traffic with other identities. Since
the original entry needs to be kept, a new generated entry with the
identity from the pass entry and the L4 from the passed to entry must be
added.
We missed this case earlier due to BroaderOrEqualKeys only iterating
wildcard identity entries when the new key is a wildcard entry. Entries
that have a broader or equal L4 but more specific L3 are not as a whole
"broader or equal". To handle the need for generated entries for the pass
verdict processing "BroaderOrEqualKeys" is changed to also iterate all
specific L3 keys if the L4 is broader or equal and the given key has the
wildcard identity. The old behavior is retained with
CoveringBroaderOrEqualKeys(). Similarly, NarrowerOrEqualKeys() is renamed
as CoveringNarrowerOrEqualKeys() while NarrowerOrEqualKeys() now also
iterates keys with the wildcard identity when the given key has a
specific identity.
The addition of generated entries requires these entries to be deleted
when that identity is incrementally deleted. Since selector cache is
transactional we can delete all keys with the deleted identity, when the
first key with that identity is deleted. To make this efficient we use
the new id index.
To add support for pass verdicts at multiple tiers, the pass metadata is
now stored as a slice. Overhead to non-pass entries is reduced by storing
the slice via a pointer ('passes'), as most mapStateEntries would not
have any pass metadata.
If 'passes' is non-nil, then the pointed-to slice must have at least
one element, and all elements must have non-zero 'passPrecedence'.
When merging pass metadata we clone the slice to be mutated so that the
same slice can safely be used in multiple entries.
Split insertWithPasses() from insertWithChanges(); insertWithPasses() is
only called if the policy has any pass verdicts. This reduces the
chance of regressions for non-pass policies.
Log a warning if a policy with pass verdicts is also using auth
requirements, as this combination has not been implemented. Adjust a test
to not claim all features when that is not the case.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* pkg/subnet: Fix tag in config subnets field
The Subnets field in the config was declaring a json tag, leading to a
failure of the agent `hive` command (see below). This is due to the fact
that the hive relies on a mapstructure Decoder, not a JSON one, and
therefore requires a mapstructure tag when the config field name is not
equal to the flag name.
Fix the tag on the field using a mapstructure one.
```
make -C daemon/ && ./daemon/cilium-agent hive
...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xacd46e]
goroutine 1 [running]:
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000b540f0, 0xa?, 0xc000cba750)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x12e
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a9bdd0, 0x6?, 0xc000cba750)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a799b0, 0x2?, 0xc000cba750)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive.(*Hive).PrintObjects(0xc0009554a0, {0x51e1240, 0xc0000c0030}, 0x0?)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/hive.go:459 +0x18f
github.com/cilium/hive.(*Hive).Command.func1(0xc000e0fc00?, {0x4b05b96?, 0x4?, 0x4b05aca?})
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/command.go:21 +0x2d
github.com/spf13/cobra.(*Command).execute(0xc000953508, {0x86bac20, 0x0, 0x0})
/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1015 +0xb02
github.com/spf13/cobra.(*Command).ExecuteC(0xc000952f08)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1148 +0x465
github.com/spf13/cobra.(*Command).Execute(...)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1071
github.com/cilium/cilium/daemon/cmd.Execute(0x4d769d0?)
/home/ffalzoi/cilium/cilium-2/daemon/cmd/root.go:89 +0x13
main.main()
/home/ffalzoi/cilium/cilium-2/daemon/main.go:15 +0x1f
```
Fixes: d395d73ad3 ("pkg/subnet: Add subnet config watcher and manager")
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
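Why a `json` tag is invisible to a decoder keyed on `mapstructure` tags can be shown with the standard `reflect` package alone. This is a simplified model of tag-driven key resolution, not mapstructure's actual implementation, and the struct and tag value `subnet-topology` are made up for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// Both structs and the tag value are illustrative, not Cilium's config.
type withJSONTag struct {
	Subnets string `json:"subnet-topology"`
}

type withMapstructureTag struct {
	Subnets string `mapstructure:"subnet-topology"`
}

// keyFor returns the key a decoder that honours tagName would use for the
// first field of cfg: the tag value if present, else the Go field name.
func keyFor(cfg any, tagName string) string {
	f := reflect.TypeOf(cfg).Field(0)
	if v, ok := f.Tag.Lookup(tagName); ok {
		return v
	}
	return f.Name
}

func main() {
	// A decoder looking for "mapstructure" tags ignores the json tag and
	// falls back to the field name, which does not match the flag name.
	fmt.Println(keyFor(withJSONTag{}, "mapstructure"))
	fmt.Println(keyFor(withMapstructureTag{}, "mapstructure"))
}
```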
* linux-desired-device: introducing Cilium managed devices reconciler
Add a new reconciler in Cilium datapath/linux that manages the
lifecycle of Linux links created by Cilium.
Created devices are persisted on disk using a write-ahead log; upon
restart, the owners of the devices are expected to redo the
configuration before calling the finalizer. Stale devices will be pruned.
Implementation is inspired by linux/route/reconciler.
Signed-off-by: harsimran pabla <hpabla@isovalent.com>
* linux-desired-device: script tests for desired-devices
Adding script tests to validate device creation and persistence.
Signed-off-by: harsimran pabla <hpabla@isovalent.com>
* linux-desired-device: hook desired-devices cell into main infra
Signed-off-by: harsimran pabla <hpabla@isovalent.com>
* fix: helm intervalSeconds value render bug
intervalSeconds is always of an integral type, no need to check kindIs float64
Fixes: #44206
Signed-off-by: jayl1e <jayl1e@outlook.com>
* bpf: source tuple hash seeds from node config
Move IPv4/IPv6 hash init seeds into node config and wire them from
Maglev config. BPF tuple hashing now reads CONFIG(hash_init{4,6}_seed)
instead of compile-time defines, and the legacy HASH_INIT* defines are
removed from the header writer and node_config.h.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* feat(install) Allow hubble to run with hostUsers: false
The hubble components do not require direct mapping of container
users to system users.
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
* docs: fix typos in comments
Signed-off-by: Yohei Yamamoto <yhymmt123@gmail.com>
* bpf: correct comments in cil_from_netdev function
Removed conditions from the comment block describing the cil_from_netdev function since the logic has been changed
Signed-off-by: Liyi Huang <liyi.huang@isovalent.com>
* chore(deps): update all lvh-images main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all-dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* bpf: lb: Decouple DNAT operation from LB key
Instead of passing the lb4_key/lb6_key to lb4_xlate/lb6_xlate for
checksum calculation and port translation, pass the original destination
address and port directly from the CT tuple.
This change:
1. Removes the key parameter from lb4_xlate/lb6_xlate functions
2. Removes the key parameter from lb4_dnat_request/lb6_dnat_request
The CT tuple already contains the same values that were being read from
the key structure:
- tuple->daddr == key->address (original destination)
- tuple->sport == key->dport (reversed port order in CT tuple)
By removing the xlate path's dependency on key, we can now directly
modify key->address = 0 for wildcard lookups without creating a copy,
simplifying the backend selection logic.
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* bpf: lxc: Handle DSR for remote NodePort services on source node
When DSR and PER_PACKET_LB are enabled, connections fail if a client pod
sends a request to a remote node's NodePort service while the server
pod is located on the same node as the client.
The root cause is that the remote node only performs DNAT, setting the
packet's source address to client's node address, leading to a hairpin
problem. Consequently, the originating node cannot perform the necessary
REV NAT for the reply packets.
To resolve this, remote NodePort service requests are now handled on the
source node when DSR is enabled, similar to the behavior of socket-level
load balancing.
Implementation details:
- Add lb4/6_lookup_wildcard_nodeport_service(): When ENABLE_DSR is
defined and the regular service lookup fails, check if the destination
is a remote node's IP with a NodePort port range. If so, perform a
wildcard lookup (address=0) to find the NodePort service.
- Use wildcard key for backend selection: When dsr_internal flag is set,
set key->address to 0 before calling lb4/6_select_backend_id(). This
applies to both CT_NEW (new connections) and CT_REPLY (backend
re-selection for existing connections). This is needed for backend
selection algorithms that use slot lookup (e.g., Random), which look
up backend slots via lb4/6_lookup_backend_slot() using the service
key. Without a wildcard key, the lookup would fail because backend
slot entries are stored with the wildcard service key, not with the
remote node's IP.
- Store original destination in CT entry: The original destination
address and port (remote node IP and NodePort) are stored in
ct_state_new.nat_addr/nat_port, which will be written to the CT entry
for use in reply path RevNAT processing.
- Use cilium_dsr_nat_buffer per-CPU map: The NAT info is detected in
__per_packet_lb_svc_xlate_4/6(), but the CT entry is created after DNAT
when the original destination info is no longer available in the packet.
The per-CPU buffer preserves this info across the DNAT operation.
Existing connection handling:
- This change only affects DSR traffic destined to remote node's
NodePort. The wildcard lookup is triggered when lb4/6_lookup_service()
fails, but it only processes packets where the destination is a remote
node IP with a port in the NodePort range. Other traffic that fails
the regular lookup is unaffected.
Fixes: #41962
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
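The lookup order described in the commit above (regular lookup first, wildcard fallback only for a remote node IP with a port in the NodePort range) can be modelled in Go. This is a userspace sketch of the decision logic, not the BPF C code; all type and field names are hypothetical, and the 30000-32767 range is just the usual Kubernetes default used as an example.

```go
package main

import "fmt"

// svcKey models a service map key; an empty addr stands for address=0.
type svcKey struct {
	addr string
	port uint16
}

type lb struct {
	services     map[svcKey]string // key -> service name
	remoteNodes  map[string]bool   // known remote node IPs
	npMin, npMax uint16            // NodePort range
}

// lookup tries the exact key first. Only if that fails, and only for a
// remote node IP with a port inside the NodePort range, does it retry with
// the wildcard address.
func (l *lb) lookup(addr string, port uint16) (string, bool) {
	if svc, ok := l.services[svcKey{addr, port}]; ok {
		return svc, true
	}
	if !l.remoteNodes[addr] || port < l.npMin || port > l.npMax {
		return "", false
	}
	svc, ok := l.services[svcKey{"", port}]
	return svc, ok
}

func newExampleLB() *lb {
	return &lb{
		services: map[svcKey]string{
			{"10.96.0.10", 80}: "clusterip-svc",
			{"", 30080}:        "nodeport-svc", // wildcard NodePort entry
		},
		remoteNodes: map[string]bool{"192.168.1.2": true},
		npMin:       30000,
		npMax:       32767,
	}
}

func main() {
	l := newExampleLB()
	fmt.Println(l.lookup("10.96.0.10", 80))     // regular lookup hits
	fmt.Println(l.lookup("192.168.1.2", 30080)) // wildcard fallback
	fmt.Println(l.lookup("8.8.8.8", 30080))     // not a remote node: miss
}
```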
* bpf: lxc: Add RevNAT support for DSR remote NodePort connections
Add reverse NAT support for reply packets of DSR remote NodePort
connections. The forward path stores the original destination address
and port in the CT entry's nat_addr/nat_port fields, which are now
used during reply processing.
Implementation details:
- Extend lb4/6_rev_nat() signature: Add nat_addr and nat_port parameters.
When nat_port is non-zero, use these values directly for RevNAT.
When nat_port is zero, fall back to the existing rev_nat_index lookup
for backward compatibility with existing connections.
- Modify ct_lookup_fill_state(): Copy nat_addr and nat_port from the
CT entry to ct_state, making them available for reply processing.
- Update ipv4/6_policy(): Check for nat_port in addition to
rev_nat_index when deciding whether to perform RevNAT. Pass the
CT entry's NAT information to lb4/6_rev_nat().
- Update nodeport_rev_dnat_ipv4/6(): Adapt to the new lb4/6_rev_nat()
signature by passing NULL/0 for nat_addr/nat_port (these paths use
the traditional rev_nat_index lookup).
Existing connection handling:
1. New connections (created after this patch):
- Forward path stores nat_addr/nat_port in CT entry
- Reply path uses CT entry's nat_addr/nat_port for RevNAT
2. Existing connections (regular NodePort/DSR traffic):
- CT entry has nat_addr=0, nat_port=0
- lb4/6_rev_nat checks nat_port first:
- If nat_port != 0: use nat_addr/nat_port directly
- If nat_port == 0: fall back to rev_nat_index lookup
- This ensures existing connections continue to work
3. Upgrade scenario:
- Existing connections keep working via rev_nat_index fallback
- New DSR remote nodeport connections use nat_addr/nat_port
- No connection disruption during rolling upgrade
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
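The fallback rule for reply packets (use `nat_addr`/`nat_port` when `nat_port` is non-zero, otherwise look up `rev_nat_index`) is sketched below in Go. It models the decision only; the struct layouts and the `revNATTarget` helper are hypothetical and do not mirror the BPF definitions.

```go
package main

import "fmt"

// ctState models the subset of conntrack state relevant to RevNAT.
type ctState struct {
	natAddr     string
	natPort     uint16
	revNATIndex uint16
}

// revNATEntry is the address/port restored on the reply path.
type revNATEntry struct {
	addr string
	port uint16
}

// revNATTarget prefers the NAT info stored in the CT entry; a zero natPort
// means the entry predates that info (or is regular NodePort traffic), so
// the lookup falls back to the rev_nat_index table.
func revNATTarget(st ctState, table map[uint16]revNATEntry) (revNATEntry, bool) {
	if st.natPort != 0 {
		return revNATEntry{st.natAddr, st.natPort}, true
	}
	e, ok := table[st.revNATIndex]
	return e, ok
}

func main() {
	table := map[uint16]revNATEntry{7: {"10.96.0.10", 80}}
	fmt.Println(revNATTarget(ctState{natAddr: "192.168.1.2", natPort: 30080}, table))
	fmt.Println(revNATTarget(ctState{revNATIndex: 7}, table))
}
```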
* bpf: tests: Add tests for DSR remote NodePort handling
Add BPF unit tests to verify the DSR remote NodePort functionality
for both IPv4 and IPv6.
Test scenarios for DSR mode (tc_lxc_lb4/6_dsr_nodeport.c):
1. Pod -> Remote NodePort -> Local backend (forward path)
- Client pod sends packet to remote node's NodePort
- LB selects a backend on the local node (same node as client)
- Verifies packet is DNATed to local backend IP and port
- Verifies CT entry contains correct nat_addr (remote node IP)
and nat_port (NodePort)
2. Local backend -> Pod (reply path)
- Backend sends reply packet to client
- Verifies RevNAT is applied correctly
- Source IP/port changed to remote node IP and NodePort
3. Pod -> Remote NodePort -> Self (hairpin)
- Client pod sends packet to remote node's NodePort
- LB selects the client pod itself as the backend
- Verifies DNAT to client IP and backend port
- Verifies SNAT to loopback IP for hairpin flow
4. Hairpin reply
- Pod replies to loopback IP
- Verifies RevNAT restores remote node IP and NodePort
5. Existing connection handling (UDP)
- First packet establishes CT entry via legacy path
- Second packet should use existing CT entry
- Verifies wildcard lookup is skipped for existing connections
Test scenarios for Hybrid mode (tc_lxc_lb4/6_hybrid_dsr_nodeport.c):
1. DSR service handling
- Verifies DSR-enabled service triggers wildcard lookup
- Packet is DNATed to local backend
2. SNAT service handling
- Verifies SNAT service does NOT trigger wildcard lookup
- Packet passes through without DNAT (handled by remote node)
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* chore(deps): update quay.io/cilium/cilium-envoy docker tag to v1.36.5-1770440937-14da6b9c8a54244f0a67cd90a0deb83e5f110a4a
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update module sigs.k8s.io/kube-api-linter to v0.0.0-20260206102632-39e3d06a2850
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update base-images
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* images: update cilium-{runtime,builder}
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* loadbalancer/maps: let SourceRangeKey.GetCIDR return netip.Prefix
Its only caller (apart from use in log strings) directly converts the
result to a netip.Prefix. Rather than first constructing a *cidr.CIDR
only to then convert it to netip.Prefix, construct a netip.Prefix right
away. Also rename the method accordingly.
This slightly improves pkg/loadbalancer/benchmark results:
Before:
Memory statistics from N=10 iterations:
Min: Allocated 490378kB in total, 2471963 objects / 140742kB still reachable (per service: 49 objs, 10042B alloc, 2882B in-use)
Avg: Allocated 501584kB in total, 2957195 objects / 177049kB still reachable (per service: 59 objs, 10272B alloc, 3625B in-use)
Max: Allocated 543366kB in total, 3722287 objects / 227653kB still reachable (per service: 74 objs, 11128B alloc, 4662B in-use)
After:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service: 45 objs, 10018B alloc, 2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service: 57 objs, 10256B alloc, 3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service: 74 objs, 11121B alloc, 4660B in-use)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
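The idea of building the `netip.Prefix` directly can be sketched as follows. This is a minimal illustration, not the code from the commit: the `sourceRangeKey4` type, its fields, and the `Prefix` method name are hypothetical stand-ins for a key that stores a prefix length and raw address bytes.

```go
package main

import (
	"fmt"
	"net/netip"
)

// sourceRangeKey4 is a hypothetical stand-in for an IPv4 source-range map
// key: a prefix length plus the four raw address bytes.
type sourceRangeKey4 struct {
	PrefixLen uint32
	Address   [4]byte
}

// Prefix builds the netip.Prefix straight from the key's fields. netip.Addr
// and netip.Prefix are plain value types, so no intermediate pointer-based
// CIDR object has to be allocated and converted afterwards.
func (k sourceRangeKey4) Prefix() netip.Prefix {
	return netip.PrefixFrom(netip.AddrFrom4(k.Address), int(k.PrefixLen))
}

func main() {
	k := sourceRangeKey4{PrefixLen: 24, Address: [4]byte{10, 0, 1, 0}}
	fmt.Println(k.Prefix()) // 10.0.1.0/24
}
```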
* loadbalancer/maps: let ServiceKey.GetAddress return netip.Addr
This simplifies call sites and avoids an unnecessary conversion in
the reconciler's (*BPFOps).pruneServiceMaps method.
pkg/loadbalancer/benchmark results:
Before:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service: 45 objs, 10018B alloc, 2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service: 57 objs, 10256B alloc, 3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service: 74 objs, 11121B alloc, 4660B in-use)
After:
Memory statistics from N=10 iterations:
Min: Allocated 487023kB in total, 1944085 objects / 99675kB still reachable (per service: 38 objs, 9974B alloc, 2041B in-use)
Avg: Allocated 498657kB in total, 2813561 objects / 166829kB still reachable (per service: 56 objs, 10212B alloc, 3416B in-use)
Max: Allocated 542794kB in total, 3722246 objects / 227646kB still reachable (per service: 74 objs, 11116B alloc, 4662B in-use)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* loadbalancer/maps: remove unused SkipLBMap delete methods
The DeleteLB{4,6}By* methods don't have callers anymore since
commit 6fa7f8129d1a ("loadbalancer/legacy: Remove the old
control-plane"). Remove them.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* fix(deps): update all go dependencies main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* docs: Update docsearch to v4.5.4
Pull in the latest theme with newer docsearch plugin version.
Signed-off-by: Joe Stringer <joe@cilium.io>
* ci: update docs-builder
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* Use binary.NativeEndian instead of nl.NativeEndian
Use the NativeEndian native-endian var provided by the Go standard
library encoding/binary package instead of the version from the
netlink/nl package.
While at it, also check the length of the handle returned by
unix.NameToHandleAt before accessing it.
Follow-up to commit 2c1b49cac70b ("byteorder: use binary.NativeEndian")
Signed-off-by: Tobias Klauser <tobias@cilium.io>
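A minimal sketch of the pattern described above: decode with the standard library's `binary.NativeEndian` (available since Go 1.21) and check the buffer length first. The `decodeID` helper and the 8-byte payload size are assumptions for illustration; the commit itself operates on the handle returned by `unix.NameToHandleAt`.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeID is a hypothetical helper: it reads a native-endian uint64 from
// the start of an opaque handle payload. The length check turns a would-be
// index-out-of-range panic into an ordinary error.
func decodeID(handle []byte) (uint64, error) {
	if len(handle) < 8 {
		return 0, errors.New("handle too short")
	}
	return binary.NativeEndian.Uint64(handle), nil
}

func main() {
	buf := make([]byte, 8)
	binary.NativeEndian.PutUint64(buf, 42)

	fmt.Println(decodeID(buf))     // round-trips the value
	fmt.Println(decodeID(buf[:4])) // short payload yields an error
}
```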
* datapath: fix panic during datapath reinitialization
This commit fixes a cilium-agent panic during datapath reinitialization
when a DirectRouting device is required but not configured.
This can happen when the direct routing device drops, for example during
networkd restart.
```
time=2026-01-14T07:39:46.444386888Z level=info msg="Devices changed" module=agent.datapath.devices-controller devices=[]
time=2026-01-14T07:39:46.444654289Z level=info msg="Fallback node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.44474159Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.444833191Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="" device=eth0
panic: runtime error: index out of range [3] with length 0
goroutine 415 [running]:
github.com/cilium/cilium/pkg/byteorder.NetIPv4ToHost32({0x0?, 0xc000e9e5d0?, 0x49e07bb?})
/go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:15 +0x65
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0xc0004280e0, {0x7ff1517a6ba8, 0xc0023ec400}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:150 +0xa4b
github.com/cilium/cilium/pkg/datapath/loader.hashDatapath({0x50fbfb0, 0xc0004280e0}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/hash.go:20 +0x9e
github.com/cilium/cilium/pkg/datapath/loader.(*objectCache).UpdateDatapathHash(0xc001d027d0, 0xc001422870?)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/cache.go:62 +0x4d
github.com/cilium/cilium/pkg/datapath/loader.(*loader).Reinitialize(0xc002573580, {0x50f9c98, 0xc0008474d0}, 0xc001d00508, {{0x49d15b6, 0x4}, {0x0, 0x0}, 0x0, 0x0, ...}, ...)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:377 +0x3c8
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reinitialize(0xc001d36288, {0x50f9c98?, 0xc0008474d0?}, {{0x0?, 0x0?}, 0x0?}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:275 +0x110
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reconciler(0xc001d36288, {0x50f9c98, 0xc0008474d0}, {0x5104260, 0xc002feafc0})
/go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:219 +0x6fd
github.com/cilium/hive/job.(*jobOneShot).start(0xc002082e40, {0x50f9c98, 0xc0008474d0}, 0xc00143dce4?, {0x5104260, 0xc002082de0}, {{{0x0, 0x0, 0x0}}, 0xc001791770, ...})
/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*queuedJob).Start.func1 in goroutine 1
/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:126 +0x16f
```
With the change in this commit, when a direct routing device is not found,
the datapath orchestrator logs a warning and waits for device updates in
the reconciliation loop, skipping reinitialization.
Fixes 8fae439710fd4b426beae9190957d2380a96bed6 ("datapath: move DirectRoutingDevice validation to orchestrator")
Signed-off-by: Deepesh Pathak <deepeshpathak09@gmail.com>
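The failure mode in the trace above (reading four bytes from an empty address) can be reproduced and guarded against in a few lines. This is only an illustration of why the panic occurs; the commit fixes it in the orchestrator by skipping reinitialization, not in the conversion helper. `ipv4ToHost32` is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// ipv4ToHost32 converts an IPv4 address to a native-endian uint32.
// ip.To4() returns nil for a nil or non-IPv4 address, and decoding a
// uint32 from a nil slice would panic with "index out of range [3] with
// length 0", so that case is reported through ok instead.
func ipv4ToHost32(ip net.IP) (v uint32, ok bool) {
	ip4 := ip.To4()
	if ip4 == nil {
		return 0, false
	}
	return binary.NativeEndian.Uint32(ip4), true
}

func main() {
	// No direct routing device means no address to convert.
	if _, ok := ipv4ToHost32(nil); !ok {
		fmt.Println("no IPv4 address available; skipping")
	}
	v, ok := ipv4ToHost32(net.ParseIP("192.0.2.1"))
	fmt.Println(v != 0, ok)
}
```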
* datapath/loader: Add netkit to BPF load tests
Commit 854473726b ("bpf: Workaround for netkit + L7 policy redirect failure")
introduced the enable_netkit load-time config variable.
This commit introduces that variable to BPF loader permutation testing
for bpf_lxc.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* docs: add netkit requirement to kernel version list
Add Linux kernel requirement for netkit to the System Requirements.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* style(bpf/test): fix indentation
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): move `icmp_wsum_accumulate` helper
This helper is shared by ICMPv6 and ICMPv4 and will be imported by both
in future refactors.
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): move ICMPv6 packet generation to a separate file
The goal is to make these functions reusable for the ICMPv6 policy
denial feature, so this moves them to a shared icmp6.h file.
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): reduce ifdef number
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* gateway-api: Update conformance test Make target
This updates the Gateway API conformance test Make target
to make it not require any extra setup when run as part
of local development with Kind, and adds the ability
to set which tests to run, to allow for focussed
conformance test runs.
Signed-off-by: Nick Young <nick@isovalent.com>
* bpf: introduce DECLARE_CONFIG_KIND
DECLARE_CONFIG and NODE_CONFIG only differ in the value of their
respective kind: tag. Avoid code duplication by moving the common parts
to a separate DECLARE_CONFIG_KIND macro and use it to define
DECLARE_CONFIG and NODE_CONFIG. This also allows easier downstream use
of these macros.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* bpf: wire events map rate limits through node config
Move events map rate/burst limits into node config and read them via
CONFIG(events_map_{rate,burst}_limit) in BPF helpers. This drops the
compile-time EVENTS_MAP* defines from the header writer and cleans up
the legacy defaults in bpf/node_config.h.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* sysdump: Use label selectors for Hubble UI/Relay deployment collection
Previously, sysdump collected Hubble UI and Hubble Relay deployments
using hardcoded deployment names ('hubble-ui', 'hubble-relay'). This
caused the collection to fail when users deployed Hubble components
with custom names (e.g., 'hubble-ui-blue'), even when the correct
labels were provided via --hubble-ui-labels or --hubble-relay-labels.
This change updates the deployment collection to use ListDeployment
with the configured label selectors instead of GetDeployment with
hardcoded names. This makes deployment collection consistent with
pod log collection, which already uses label selectors.
Fixes the issue where:
cilium sysdump --hubble-ui-labels k8s-app=hubble-ui-blue
would still fail with 'Deployment hubble-ui not found'.
Signed-off-by: darox <maderdario@gmail.com>
* bpf: lxc: remove unnecessary L3 validation
There's no code that uses the IPv4 header afterwards.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf: lxc: fine-tune BPF Host Routing path
Using fib_ok() to evaluate the result of fib_redirect_v*() is a bit
awkward. We're in TC context, so we know that CTX_ACT_TX doesn't need to
be handled (and would most likely lead to a packet loop).
By structuring the code as a switch() statement we can also clean up one
of the goto paths.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf: xdp: prefer CTX_ACT_TX over XDP_TX
Return the generic value, so that readers understand what macro they should
be using when handling the result.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf, nat46x64: move RFC6052 prefix into node config
This commit moves NAT 46x64 RFC6052 prefix bytes into the node
configuration so BPF programs consume CONFIG(nat_46x64_prefix) value
instead of C header defines.
The Go datapath now populates this field from the NAT46x64 config, and
the headerfile writer no longer emits NAT_46X64_PREFIX_ defines.
Updates included:
- BPF nat_46x64 helpers switched to CONFIG(nat_46x64_prefix).
- Node config population moved to runtime config and legacy defines
dropped.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* neighbor: Fix description for L2 neighbor discovery
The flag is not used by IPsec. L2 neighbor discovery is enabled whenever
XDP is enabled. This flag allows users to enable it even if XDP is
disabled, so let's state that instead of mentioning IPsec.
Co-authored-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
* CODEOWNERS: add more specific owners for operator subsystems
Some of the operator's subsystems are currently only covered by the
catch-all rule assigning @cilium/operator. Instead, assign more specific
teams for certain subsystems which require more in-depth knowledge in
these particular areas.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* .github/workflows: eks-cluster-pool-manager: fix race condition and cleanup leaked IAM roles
When multiple parallel jobs generate cluster names within the same
second, they can produce identical names since the timestamp has only
1-second precision. This causes CloudFormation stack creation to fail
with "AlreadyExistsException", leaving orphaned IAM roles behind.
This commit adds a random suffix to cluster names to prevent race conditions
and enhances the failure cleanup step to delete CloudFormation stacks and orphaned
IAM roles when cluster creation fails.
Signed-off-by: André Martins <andre@cilium.io>
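The naming fix can be illustrated with a short sketch. The workflow itself is not written in Go; the `clusterName` function, the name layout, and the 3-byte suffix length are assumptions chosen only to show why a random suffix removes the collision between jobs that start within the same second.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// clusterName is a hypothetical generator. A Unix timestamp alone has
// 1-second precision, so two parallel jobs can produce the same name;
// appending a few random bytes makes a collision very unlikely.
func clusterName(prefix string, unixSeconds int64) (string, error) {
	suffix := make([]byte, 3) // 3 random bytes -> 6 hex characters
	if _, err := rand.Read(suffix); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s-%d-%s", prefix, unixSeconds, hex.EncodeToString(suffix)), nil
}

func main() {
	now := time.Now().Unix()
	a, _ := clusterName("ci", now)
	b, _ := clusterName("ci", now)
	fmt.Println(a)
	fmt.Println(b)
}
```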
* hubble: Fix typos in config/set.go
Signed-off-by: harshitghagre <harshitghagre183@gmail.com>
* test/helpers: ignore error creating lease lock message
This message was modified in k8s 1.35.0, therefore we should update the
list of messages that can be ignored in our CI.
Signed-off-by: André Martins <andre@cilium.io>
* Fix backend slot index mismatch in LB reconciler
Use slotID instead of loop index when setting backend slots to avoid
gaps when maintenance backends are skipped.
Signed-off-by: Aman-Cool <aman017102007@gmail.com>
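A minimal sketch of the bug class: when some entries are skipped, the loop index and the slot number drift apart, so a separate counter is needed. The `backend` type, the `assignSlots` function, and 1-based slot numbering are assumptions for illustration, not the reconciler's actual code.

```go
package main

import "fmt"

// backend is a hypothetical, simplified backend record.
type backend struct {
	ID          uint32
	Maintenance bool
}

// assignSlots maps slot numbers to backend IDs. Slots are assumed to be
// 1-based and must be contiguous. Using the loop index as the slot number
// would leave a gap for every skipped maintenance backend; a dedicated
// slotID counter that only advances on assignment avoids that.
func assignSlots(backends []backend) map[int]uint32 {
	slots := make(map[int]uint32)
	slotID := 1
	for _, b := range backends {
		if b.Maintenance {
			continue // skipped backends must not consume a slot
		}
		slots[slotID] = b.ID
		slotID++
	}
	return slots
}

func main() {
	bes := []backend{{ID: 10}, {ID: 11, Maintenance: true}, {ID: 12}}
	fmt.Println(assignSlots(bes)) // map[1:10 2:12]
}
```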
* vendor: Bump to StateDB v0.6.3
This restores the ability to use Changes() against a WriteTxn that is
targeting the same table as Changes().
Signed-off-by: Jussi Maki <jussi@isovalent.com>
* docs: Fix upgrade note category for tproxy
There's no new option here, it's a change in behaviour. Fix the category
for the upgrade note.
CC: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Fixes: c257b3d0aca8 ("datapath/connector: do not support netkit and bpf.tproxy=true")
Signed-off-by: Joe Stringer <joe@cilium.io>
* policy: Fix PASS verdict for non-consecutive tiers
Signed-off-by: Blaz Zupan <blaz@google.com>
* loadbalancer/healthserver: refresh ProxyRedirect per request
This commit fixes stale ProxyRedirect reads in the health server by reloading Service
state from the services table on each request. This prevents incorrect
local endpoint counts when Envoy redirect state changes after the
listener is created (which is the case).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* localnodestore: provide WaitForNodeInformation
This commit moves global function `k8s.WaitForNodeInformation`
into the `localNodeSynchronizer`. To prevent dependencies to the
synchronizer, the `LocalNodeStore` re-exposes the same method and
delegates the call to the synchronizer.
This way, the legacy daemon initialization logic no longer needs to
depend on the (k8s/cilium)-node resources.
This step isn't final. Eventually, the wait logic should watch
the state db "local node" for the changes.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* localnodestore: pass localnodestore to synchronizer
With this commit, the `LocalNodeStore` passes a pointer
to itself to the synchronizer when calling `WaitForNodeInformation`.
This way, the synchronizer can update the ip allocation ranges without
using the global functions.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* ci: e2e: add `kernel` to workflow job names
As a follow-up of changes introduced in https://github.com/cilium/cilium/pull/44126
to improve the readability of workflow job names in the GitHub UI, this
commit adds the `kernel` parameter to the name too. In OpenSearch,
it would be good to differentiate these jobs by the used kernel.
Given the kernel is a parameter that may vary frequently, it is good to be
able to differentiate runs that used the same configuration but different
kernels. That's also good in case of regressions when changing kernel.
The result will be `Setup & Test (ipsec-1, minor, 5.10)`.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* pkg/datapath/bandwidth: optimize host endpoint QoS setup
The bandwidth manager was calling node.GetEndpointID() and performing
a table lookup on every UpdateBandwidthLimit() call, even though the
host endpoint QoS only needs to be set up once.
This commit introduces an atomic boolean flag that tracks whether the
host endpoint QoS has been configured. After the initial setup:
- Skip the GetEndpointID() call (fast path)
- Skip the Table.Get() call entirely
- Skip the redundant Insert() call for host endpoint
Additionally, based on review feedback:
- Change node.GetEndpointID() to return (uint64, bool) to indicate
whether the host endpoint ID has been set, avoiding duplicate
constants across packages
- Change node.endpointID to atomic.Uint64 for thread-safe access
during initialization
Signed-off-by: Anand Kumar Shaw <anandkrshawheritage@gmail.com>
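The fast path described above can be sketched with `sync/atomic`. The `manager` type, its field names, and the convention that an endpoint ID of 0 means "not set" are assumptions for illustration; only the `(uint64, bool)` return shape and the atomic flag come from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// manager is a hypothetical stand-in for the bandwidth manager.
type manager struct {
	hostEndpointID atomic.Uint64 // assumption: 0 means "not set yet"
	hostQoSDone    atomic.Bool   // set once host endpoint QoS is configured
	slowPathRuns   atomic.Int64  // demo counter for how often setup ran
}

// getEndpointID reports the host endpoint ID and whether it has been set.
func (m *manager) getEndpointID() (uint64, bool) {
	id := m.hostEndpointID.Load()
	return id, id != 0
}

// updateBandwidthLimit performs the one-time host QoS setup at most once.
func (m *manager) updateBandwidthLimit() {
	if m.hostQoSDone.Load() {
		return // fast path: no ID lookup, no table access
	}
	if _, ok := m.getEndpointID(); !ok {
		return // host endpoint not known yet; retry on a later call
	}
	if m.hostQoSDone.CompareAndSwap(false, true) {
		m.slowPathRuns.Add(1) // the one-time setup would happen here
	}
}

func main() {
	m := &manager{}
	m.updateBandwidthLimit() // ID not set: nothing happens
	m.hostEndpointID.Store(7)
	for i := 0; i < 3; i++ {
		m.updateBandwidthLimit()
	}
	fmt.Println(m.slowPathRuns.Load()) // 1
}
```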
* clustermesh: fix a few misc issues with MCS-API doc
This commit fixes two issues with the mcs-api doc:
- a simple typo on the enabled keyword
- change the code-block in parsed-literal as the |SCM_WEB| "variable"
was not evaluated/replaced in the final doc with a code-block
Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr>
* Docs: improve docs around ipsec upgrade in 1.18
Signed-off-by: darox <maderdario@gmail.com>
* docs(ztunnel): fix duplicate word (a set)
Signed-off-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
* docs(ztunnel): add missing backslash
add missing backslash for install with Cilium CLI
Signed-off-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
* clustermesh: helm: remove clustermesh.enableMCSAPISupport
This option was deprecated and its removal was announced for Cilium
1.20, so let's drop it now that we are in the 1.20 cycle!
Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr>
* daemon: enforce iptables rules are present when node-port is enabled
Moving forward the node-port code path will rely on iptables to
identify locally generated traffic and create NAT reservations for this
traffic in order to avoid reservation conflicts.
We will consider it a fatal error moving forward to disable iptable rule
installation while node-port or eBPF masquerading is enabled.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* bpf,nodeport: generalize SNAT conflict detection
Prior to this commit the nodeport SNAT conflict detection assumed host
traffic was sourced from the direct routing interface's IP and always
egressed on this device.
Remove this assumption by detecting locally generated traffic from any
egress interface 'bpf_host' is attached to.
This removes the dependency on the direct routing interface in the
node-port path.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* ztunnel: introduce end to end connectivity tests
The test infrastructure deploys three dedicated namespaces with client
and echo-server pods in each: two enrolled namespaces for testing
cross-namespace mTLS scenarios and one unenrolled namespace for baseline
verification. Pod affinity rules ensure echo-same-node pods co-locate
with clients while echo-other-node pods schedule on different nodes,
enabling both intra-node and inter-node traffic validation.
The test scenarios verify ztunnel mTLS behavior by dynamically labeling
namespaces with the io.cilium/mtls-enabled label during test execution.
For enrolled pod pairs, the tests assert that traffic flows through port
15008 (the ztunnel HBONE proxy) and that no unencrypted traffic appears
on port 8080. For unenrolled pods, the assertions are inverted to confirm
traffic bypasses ztunnel entirely. Cross-namespace scenarios confirm that
mTLS works correctly when client and server reside in separately enrolled
namespaces.
Packet capture validation runs from host network pods using tcpdump,
which is required since ztunnel operates in the host network namespace.
The tests query the ztunnel admin API to verify workload registration
before generating traffic, ensuring any state the test depends on
has converged before making assertions.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
Signed-off-by: Quang Nguyen <nguyenquang@microsoft.com>
Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
* ci,ztunnel: add workflows for ztunnel encryption tests
Add a new GitHub Actions workflow to run end-to-end tests for ztunnel
encryption in Cilium.
The new /ci-ztunnel-e2e trigger is added to the Ariane configuration,
pointing to the newly created conformance-ztunnel-e2e.yaml workflow
file.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* ci,ztunnel: add ztunnel cert script to actions
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* datapath: remove GetRoutePostEncryptMTU()
The last user was removed by
f81d8964ce4a ("ipsec: don't reduce post-encrypt MTU for tunnel overhead").
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* datapath: ipsec: remove clean up code for encrypt IP rule
https://github.com/cilium/cilium/pull/41699 stopped installing this IP
rule in the v1.19 release. With v1.20 we can now stop cleaning it up.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* loadbalancer/api: include proxy-redirect as backend
Currently, listing services via `cilium-dbg service list` displays
no backends for the services that use L7 proxy redirection (e.g.
Gateway API / Ingress loadbalancer service).
```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID Frontend Service Type Backend
1 10.96.62.28:80/TCP ClusterIP 1 => 10.244.1.149:4245/TCP (active)
2 10.96.0.10:53/TCP ClusterIP 1 => 10.244.1.69:53/TCP (active)
2 => 10.244.1.109:53/TCP (active)
3 10.96.0.10:53/UDP ClusterIP 1 => 10.244.1.69:53/UDP (active)
2 => 10.244.1.109:53/UDP (active)
4 10.96.0.10:9153/TCP ClusterIP 1 => 10.244.1.69:9153/TCP (active)
2 => 10.244.1.109:9153/TCP (active)
5 10.96.0.1:443/TCP ClusterIP 1 => 172.19.0.2:6443/TCP (active)
12 10.96.162.7:443/TCP ClusterIP 1 => 172.19.0.2:4244/TCP (active)
15 10.96.106.135:80/TCP ClusterIP 1 => 10.244.1.115:80/TCP (active)
16 10.96.55.103:80/TCP ClusterIP 1 => 10.244.1.67:80/TCP (active)
17 [::]:30965/TCP NodePort
19 [::]:30965/TCP/i NodePort
21 0.0.0.0:30965/TCP NodePort
23 0.0.0.0:30965/TCP/i NodePort
25 10.96.245.249:80/TCP ClusterIP
26 172.19.255.1:80/TCP LoadBalancer
27 172.19.255.1:80/TCP/i LoadBalancer
28 [fd00:10:96::d99f]:80/TCP ClusterIP
```
Even though the actual redirection happens via BPF/iptables TPROXY, it would be
nice to display the information about the local redirection to the node-local
L7 proxy (Envoy) in some form.
Therefore, this commit changes the frontend model generation to treat the
proxy redirection as backend of the service frontend in the form
`<localhost-addr>:<proxy-port>/<proto> (active)`
Result
```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID Frontend Service Type Backend
1 10.96.62.28:80/TCP ClusterIP 1 => 10.244.1.149:4245/TCP (active)
2 10.96.0.10:53/TCP ClusterIP 1 => 10.244.1.69:53/TCP (active)
2 => 10.244.1.109:53/TCP (active)
3 10.96.0.10:53/UDP ClusterIP 1 => 10.244.1.69:53/UDP (active)
2 => 10.244.1.109:53/UDP (active)
4 10.96.0.10:9153/TCP ClusterIP 1 => 10.244.1.69:9153/TCP (active)
2 => 10.244.1.109:9153/TCP (active)
5 10.96.0.1:443/TCP ClusterIP 1 => 172.19.0.2:6443/TCP (active)
12 10.96.162.7:443/TCP ClusterIP 1 => 172.19.0.2:4244/TCP (active)
15 10.96.106.135:80/TCP ClusterIP 1 => 10.244.1.115:80/TCP (active)
16 10.96.55.103:80/TCP ClusterIP 1 => 10.244.1.67:80/TCP (active)
17 [::]:30965/TCP NodePort 1 => [::1]:14543/TCP (active)
19 [::]:30965/TCP/i NodePort 1 => [::1]:14543/TCP (active)
21 0.0.0.0:30965/TCP NodePort 1 => 127.0.0.1:14543/TCP (active)
23 0.0.0.0:30965/TCP/i NodePort 1 => 127.0.0.1:14543/TCP (active)
25 10.96.245.249:80/TCP ClusterIP 1 => 127.0.0.1:14543/TCP (active)
26 172.19.255.1:80/TCP LoadBalancer 1 => 127.0.0.1:14543/TCP (active)
27 172.19.255.1:80/TCP/i LoadBalancer 1 => 127.0.0.1:14543/TCP (active)
28 [fd00:10:96::d99f]:80/TCP ClusterIP 1 => [::1]:14543/TCP (active)
```
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
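The `<localhost-addr>:<proxy-port>/<proto> (active)` form shown in the result listing can be produced as sketched below. The `proxyBackend` function and its parameters are hypothetical; only the output format is taken from the commit message.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// proxyBackend renders a proxy redirection as a pseudo-backend string.
// net.JoinHostPort adds the square brackets that IPv6 literals need.
func proxyBackend(ipv6 bool, proxyPort uint16, proto string) string {
	host := "127.0.0.1"
	if ipv6 {
		host = "::1"
	}
	return net.JoinHostPort(host, strconv.Itoa(int(proxyPort))) + "/" + proto + " (active)"
}

func main() {
	fmt.Println(proxyBackend(false, 14543, "TCP")) // 127.0.0.1:14543/TCP (active)
	fmt.Println(proxyBackend(true, 14543, "TCP"))  // [::1]:14543/TCP (active)
}
```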
* datapath/ipcachelistener: use injected localnodestore
This commit refactors the datapath ipcache BPF listener to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.
Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* datapath/linuxnodehandler: retrieve node ips from localnodestore
This commit refactors the `linuxNodeHandler` to fetch the node ips
from the injected `LocalNodeStore` instead of using the global
functions `node.GetIP[v4/v6]`.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* identity/cache: use injected localnodestore
This commit refactors the identity cache to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.
Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* node/address: remove global functions `GetIP[v4/v6]`
This commit removes the unused global functions `GetIPv4` & `GetIPv6`.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* test: remove K8sDatapathBandwidthTest
The Ginkgo bandwidth test was already not running in CI. It only ran on
net-next kernels, but the focus group that included it
(f10-agent-hubble-bandwidth) was excluded from the net-next job.
Bandwidth enforcement is independently tested in
cilium-cli/connectivity/builder/network_bandwidth_limit.go.
Fixes: #44165
Related: #37837
Signed-off-by: Pavan More <pavansmore05@gmail.com>
* wireguard: remove cleanup code for old userspace devices
2578e8ff4dfa ("wireguard: remove deprecated userspace fallback") removed
the userspace mode in v1.17, so we can now stop cleaning up any network device
created for that mode.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* loadbalancer: Check for equality and skip insert when not changed
This adds DeepEqual() implementations for Service and FrontendParams and uses
it to skip inserting the service and updating the frontend if nothing has changed.
As we now don't forcefully update the frontend the orphan detection has to actually
build a set to check if a frontend has been removed.
Benchmarks before:
Benchmark_UpsertServiceAndFrontends_100-6 3549 317691 ns/op 314771 objects/sec
BenchmarkInsertBackend-6 2818 423975 ns/op 235863 objects/sec
BenchmarkReplaceBackend-6 326682 3793 ns/op 263669 objects/sec
BenchmarkReplaceService-6 2327074 509.4 ns/op 1963230 objects/sec
After:
Benchmark_UpsertServiceAndFrontends_100-6 3464 331791 ns/op 301395 objects/sec
Benchmark_UpsertServiceAndFrontends_100_Unchanged-6 14652 81250 ns/op 1230766 objects/sec
BenchmarkInsertBackend-6 2956 401100 ns/op 249315 objects/sec
BenchmarkReplaceBackend-6 3402430 360.9 ns/op 2771038 objects/sec
BenchmarkReplaceService-6 2068555 556.6 ns/op 1796743 objects/sec
Signed-off-by: Jussi Maki <jussi@isovalent.com>
* loadbalancer: Remove dummy ingress endpoint workaround
Remove the skipping of the 192.192.192.192:9999 dummy ingress
endpoint as the next commit will remove the creation of it.
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* operator/helm: Remove creation of dummy ingress endpoint
With the new agent load-balancer implementation the dummy endpoint
is no longer necessary. Now that v1.18 has been released with that
implementation we can remove the creation of the dummy endpoint
without breaking upgrade.
Note: This commit removes the dummy endpoint in the Operator-
(Gateway API & dedicated Ingress) & Helm- (shared Ingress) managed
`EndpointSlice`'s. The creation of the `EndpointSlice` itself
will be done in a separate PR.
Fixes: #19262
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* monitor: report 3rd argument in DBG_GENERIC debug monitor messages
Currently, in case cilium_dbg3(ctx, DBG_GENERIC, arg1, arg2, arg3) is used in
datapath code, the 3rd argument will not be reported in the monitor
message, e.g.:
bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac)
Also report the 3rd argument, so the monitor message will look as
follows:
bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac) arg3=170 (0xaa)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
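The message tail shown above, with all three arguments in decimal and hex, can be reproduced with a small formatter. `genericDebugMessage` is a hypothetical name and this is not the monitor's actual code; only the output layout is taken from the example line.

```go
package main

import "fmt"

// genericDebugMessage formats the three DBG_GENERIC arguments in the same
// "argN=<decimal> (<hex>)" layout as the example monitor output.
func genericDebugMessage(arg1, arg2, arg3 uint32) string {
	return fmt.Sprintf("No message, arg1=%d (%#x) arg2=%d (%#x) arg3=%d (%#x)",
		arg1, arg1, arg2, arg2, arg3, arg3)
}

func main() {
	fmt.Println(genericDebugMessage(238, 34214060, 170))
	// No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac) arg3=170 (0xaa)
}
```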
* policy: cleanup label selector parsing and validation
This commit cleans up the parsing and validation of policy label
selectors by removing the dependency on k8s labels library for
validation.
With this change we no longer convert cilium representation
of label keys with source prefixed using ':' delimiter to k8s specific
representation with source prefixed using '.' for validation.
This avoids the additional complexity in policy code to convert between
the two representations. It also simplifies marshalling of selectors as
the objects now have a unified format.
This is not a functional change and does not have any associated user
impact.
Signed-off-by: Deepesh Pathak <deepesh.pathak@isovalent.com>
* helm/ztunnel: bind health check to localhost
Security hardening for ztunnel running with hostNetwork: true:
Add host field to readiness probe to bind the health check port 15021
to 127.0.0.1 instead of 0.0.0.0. This reduces attack surface by ensuring
the health check endpoint is only accessible from localhost (kubelet
runs on same node).
Signed-off-by: Quang Nguyen <nguyenquang@microsoft.com>
* ci:wireguard: enable Host Firewall in native routing e2e tests
This enables Host Firewall in the `wireguard-3` config.
This helps us validate that host-related packets go through the host
firewall hook in `cilium_host` when WireGuard is enabled in native routing.
In overlay, even if we'd miss a redirect we'd see the packet in bpf_overlay,
which will then redirect the packet to bpf_host for HostFW validation.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* mcsapi: Add namespace filtering conditions to ServiceImport controller
Add namespace global status checking to the ServiceImport controller:
- Set Invalid condition on ServiceExport when namespace is not global
- Set NamespaceNotGlobal condition on ServiceImport when namespace is not global
- Signal service controller to skip/delete derived Service for non-global namespaces
by setting SupportedIPFamilies annotation to empty
This implements the MCS-API portion of the ClusterMesh global namespace
filtering feature as described in CFP-39876.
Signed-off-by: Jacques Massa <jac.massa0908@gmail.com>
* docs: split up network policy language page
Up until now, the page title had been "Layer 3 Examples", which is
a section headline and confusing, since it is among other examples.
Splitting up into several pages, similar to `network/kubernetes/`,
keeps the ToC as it is, and makes it easier to navigate compared to
the lengthy page it was, while also giving each page a suitable
headline.
Since examples are mixed with the language specification, change
headings from "Layer 3 Examples" to "Layer 3 Policies", etc.
Drop the old page and redirect to the overview to keep links working.
Signed-off-by: Daniel Maslowski <info@orangecms.org>
* golangci-lint: fix and simplify golangci-lint.sh
golangci-lint.sh would always rebuild the binary. The script was over-engineered
and didn't use the `golangci-lint version` subcommand, which doesn't need the
version string parsing logic. Simplify the script and fix the rebuild trigger.
Prepare the directory structure for splitting the main .golangci-lint.yaml in a
subsequent commit.
Signed-off-by: Timo Beckers <timo@isovalent.com>
* golangci-lint: split kubeapi configuration into separate file
The addition of kubeapilinter forced all users of the main .golangci.yaml to be
running a custom-built version of the tool and make sure it's kept in sync.
VSCode runs `golangci-lint run` in the repo by default and requires explicit
configuration to use another tool, like a script in the Cilium repository.
Before this PR, the script was broken and would always rebuild custom-gcl.
Breaking the default tooling is not great. This commit moves kubeapi-specific
configuration to a dedicated config file in tools/golangci-lint-kubeapi and
specifies it explicitly when starting the kubeapi linter.
Signed-off-by: Timo Beckers <timo@isovalent.com>
* policy: Import the ClusterNetworkPolicy v1alpha2 API go modules
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add parser function for Cluster Network Policy
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add flags to enable or disable Cluster Network Policy
Disabled by default. A new Makefile target is added that enables it in kind clusters.
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add watcher for Cluster Network Policy
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Move NetworkPolicy and CNP/CCNP to the 'Normal' tier
Signed-off-by: Blaz Zupan <blaz@google.com>
* CODEOWNERS: re-assign operator/identitygc, adjust endpoint team description
Assign operator/identitygc to @cilium/sig-policy instead of
@cilium/endpoint based on Joe's feedback. This seems like the more
appropriate team based on the description.
Also update the description for the @cilium/endpoint team to cover
responsibilities for cluster-wide/k8s considerations and endpoint
lifecycle.
Follow-up to commit 11376d5c715b ("CODEOWNERS: add more specific owners
for operator subsystems").
Ref. https://github.com/cilium/cilium/pull/44279#pullrequestreview-3780398303
Suggested-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* nodemap: converted net.IP to netip.Addr, Part of #24246
- Converted net.IP to netip.Addr in node map functions
- Added validation for zero netip.Addr values in newNodeKey
- Wrapped parse errors with %w for better error context
Signed-off-by: Sanjeevliv <sanjeevsethilive@gmail.com>
* bpf/tests: fix byte ordering for TCP seq/win values
The pktgen logic for BPF tests sets default values on TCP packets in
host byte order only. This feeds into the TCP checksum calculation
and while these inputs are technically incorrect, tests that assert
the TCP checksum are correctly testing the invalid output values.
With the introduction of scapy packet definitions, matching these values
to existing tests is difficult because scapy does convert seq and win
values into network byte order. The effect is that L4 sequence numbers
between pktgen and scapy are wrong.
This commit updates pktgen and existing tests that assert these values
to use the appropriate byte conversion routines so that both pktgen
and scapy definitions produce the same results.
This causes all affected BPF tests to fail. This will be addressed
in the next commit.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* bpf/tests: fix TCP checksum assertions in all tests
This commit updates every test that asserts TCP checksums to reflect new
values following the change to the default TCP seq/win values, such that
the assertions for this should succeed with a scapy packet definition.
As part of this change, test failure logs have been updated to correctly
show the value observed in a packet and the expected value, both in network
byte order. This is to simplify future debugging. This has also been
applied to UDP checksum assertions for consistency.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* bpf/tests: fix default_data definition for scapy tests
The pktgen code defines default_data macro as a string literal which
includes a NUL-terminating character. The size of this string is 20
bytes.
The scapy code defines the same value without the NUL-terminating
character. The size of this string is 19 bytes.
As a result, when converting a test from pktgen to scapy, the packet
length changes, which causes assertion failures because the change leads
to different L3 checksum values.
This commit modifies the scapy default_data variable to include a NUL
terminating character to match pktgen definitions. It is anticipated
this can be removed once all BPF tests are migrated to scapy.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
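The one-byte difference described in the default_data commit above can be reproduced in a few lines. This is a sketch in Go rather than C or Python; the payload string is a stand-in that happens to be 19 bytes long, and the header sizes assume IPv4 and TCP headers without options.

```go
package main

import "fmt"

// payload is a hypothetical 19-byte stand-in for default_data.
const payload = "Should not change!!"

// cStringLiteralSize mimics C's sizeof on a string literal, which
// counts the implicit NUL terminator.
func cStringLiteralSize(s string) int { return len(s) + 1 }

// withNUL returns the payload bytes with an explicit trailing NUL,
// matching what the C string literal occupies in memory.
func withNUL(s string) []byte { return append([]byte(s), 0) }

// ipv4TotalLen returns the IPv4 total length for a TCP packet, assuming
// a 20-byte IPv4 header and a 20-byte TCP header (no options).
func ipv4TotalLen(payloadLen int) int { return 20 + 20 + payloadLen }

func main() {
	fmt.Println("without NUL:", len(payload), "total length", ipv4TotalLen(len(payload)))
	fmt.Println("with NUL:   ", len(withNUL(payload)), "total length", ipv4TotalLen(len(withNUL(payload))))
	fmt.Println("C sizeof:   ", cStringLiteralSize(payload))
}
```

Because the total length is part of the IPv4 header, a one-byte change in the payload is enough to change the header checksum.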
* endpoint, fqdn: remove restoration of deprecated V1 DNSRules
Now that 1.16 is EOL, the code restoring PortProto V1 (without protocol)
can be removed.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* endpoint: rename DNS rules field
The V2 suffix is unnecessary now that we stop restoring V1 rules and
there is only a single supported version/field.
Also add a test that verifies that V2 rules are restored and V1 rules
are ignored while retaining backwards compatibility.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* tests: Ignore identity manager related error in versions < 1.18
Due to https://github.com/cilium/cilium/pull/42661 and
https://github.com/cilium/cilium/pull/42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.
The error was removed from the ignore list in
https://github.com/cilium/cilium/pull/42982.
Suggested-by: Marco Iorio <marco.iorio@isovalent.com>
Suggested-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
* metrics: remove agent bootstrap metrics
This commit removes the deprecated agent bootstrap metrics.
Hive job metrics can be used to inspect the duration of the legacy
daemon initialization job. In special cases it might be worth
introducing context-specific metrics, similar to what has been
introduced for the endpoint restoration logic.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* policy: fix policy tests
This fixes a policy test breakage caused by a recent change in how the
label source is handled.
Fixes: 270daa4067 ("policy: Add parser function for Cluster Network Policy")
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
* resource/test: let TestResource_WithFakeClient set resource version
Update the TestResource_WithFakeClient test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* cid/test: let TestUpdatePodLabels set resource version
Update the TestUpdatePodLabels test to correctly specify the expected
resource version during updates, in preparation for extending the fake
client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* ipam/test: let Test_LocalNodeCIDRsSyncer set resource version
Update the Test_LocalNodeCIDRsSyncer test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* bgp/test: correctly set resource version when updating test resources
Update the bgp tests to correctly specify the expected resource version
during updates, in preparation for extending the fake client to actually
enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* test/controlplane: adaptation for optimistic concurrency control
Update the UpdateObjects helper to use the [ObjectTracker.Patch],
instead of [ObjectTracker.Update], in preparation for the subsequent
commit that will make the latter implement optimistic concurrency
control, and validate resource version mismatches, which is not
required in this context.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* k8s/client/fake: fix resource version configuration in tracker
Currently, the object tracker is affected by a bug that causes the
resource version to not be set on creation or update if the object
does not have the [metav1.TypeMeta] set. Indeed, in that case, the
function updating the TypeMeta creates a deep copy of the object,
causing operations performed via [meta.Accessor] to act on the old
copy, and not have effect. Let's get this fixed by changing the
[fillTypeMetaIfNeeded] function to not create a deep copy, given
that it already operates on a copy of the original object.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
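The copy-related bug described in the tracker commit above follows a common pattern, sketched here with simplified, hypothetical types (object, fillKindWithCopy, fillKindInPlace and store are not the real tracker code). A reference is taken first, the helper then returns a fresh copy, and the later write through the old reference never reaches the stored object.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Kubernetes object.
type object struct {
	Kind            string
	ResourceVersion string
}

// fillKindWithCopy mirrors the buggy pattern: when Kind is unset it
// returns a new copy, leaving earlier references pointing at the old one.
func fillKindWithCopy(obj *object, kind string) *object {
	if obj.Kind != "" {
		return obj
	}
	cp := *obj
	cp.Kind = kind
	return &cp
}

// fillKindInPlace mirrors the fix: the object passed in is mutated directly.
func fillKindInPlace(obj *object, kind string) *object {
	if obj.Kind == "" {
		obj.Kind = kind
	}
	return obj
}

// store sets the resource version through a reference obtained before
// the kind is filled in, then returns what would be persisted.
func store(obj *object, fill func(*object, string) *object) *object {
	accessor := obj // reference taken before the fill step
	stored := fill(obj, "Pod")
	accessor.ResourceVersion = "1"
	return stored
}

func main() {
	fmt.Printf("with copy: %q\n", store(&object{}, fillKindWithCopy).ResourceVersion)
	fmt.Printf("in place:  %q\n", store(&object{}, fillKindInPlace).ResourceVersion)
}
```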
* k8s/client/fake: let update operations respect resource versioning
Currently, the statedb object tracker backing the fake kubernetes client
used for testing purposes does not respect resource versioning, and allows
update operations to succeed regardless of the provided resource version.
While convenient for the `k8s/update` command itself, this approach is
problematic in case of controllers acting on the same resources, as it
can lead to objects being unexpectedly reverted to incorrect versions,
due to the missing optimistic concurrency control.
Let's get this fixed by extending the update implementation to additionally
compare the resource version of the stored and provided objects, and reject
the update in case they do not match, as the real Kubernetes API Server
would do. By default, the k8s/update command still ignores the provided
resource version, letting the update succeed regardless: this matches the
desired behavior in the vast majority of the tests, and avoids the need
for complex operations to set the expected resource version. Still, if
necessary, the stricter behavior can be enabled via the dedicated flag.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* chore(deps): update base-images to v1.26.0
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* images: update cilium-{runtime,builder}
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* chore(deps): update cilium/cilium-cli action to v0.19.1
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.6.8
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all lvh-images main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all-dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.gi…
* ci: e2e: enhance readability of workflow job name
This commit updates the name of the "Setup & Test" job in the
GitHub Actions workflow for e2e upgrade tests to include only the matrix
parameters "name" and "mode". This change improves the readability
of the workflow runs by providing more context about the specific
configuration being tested.
Prior to this, the name of each job contained the whole matrix combination,
which was cut off in the UI and therefore unreadable. Given that now
we use the same workflow file for running both `minor` and `patch` upgrades,
let's make the displayed name simpler.
The result will be `Setup & Test (ipsec-1, minor)`.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* ci: e2e: log matrix configuration in each job
This commit adds, as the first step of the `Setup & Test` job for e2e-upgrade,
a simple step to dump the current matrix configuration being tested.
The previous commit modified the title to display only the matrix entry
name and mode (e.g., `Setup & Test (ipsec-1, minor)`) rather than the
whole configuration. In the UI, that would be truncated anyway.
It is true that, given the matrix.name (e.g., ipsec-1), a user can open the
specific file and look up the required configuration, but I think that
having a step that dumps it would speed up and ease debugging in CI.
The output would be similar to:
```
> Log Matrix Configuration
Current matrix configuration:
{
"name": "wireguard-1",
"kernel": "5.10",
"kube-proxy": "iptables",
"kpr": "true",
"devices": "{eth0,eth1}",
"secondary-network": "true",
"tunnel": "vxlan",
"encryption": "wireguard",
"encryption-node": "false",
"lb-mode": "snat",
"endpoint-routes": "true",
"egress-gateway": "true",
"ingress-controller": "true",
"mode": "minor"
}
```
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* endpoint: Log labels as structured JSON objects
Standardize logging in pkg/endpoint so that identityLabels and related fields are logged as structured JSON objects instead of comma-separated strings by explicitly casting labels.Labels to map[string]labels.Label.
Signed-off-by: Jie WU <wujie@google.com>
```release-note
endpoint: Log labels as structured JSON objects
```
* feat(helm): hubble-ui containers set to pss-restricted
This sets the hubble-ui pods/containers to match k8s
pss-restricted profile along with the optional
`readOnlyRootFilesystem: true`
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
* helm: allow multicluster-services installCRDs to update CRDs
Previously, cilium-operator failed to start if MCS/installCRDs was enabled
because it does not have permissions to update the CRD with this log
message:
level=error msg="Unable to update CRD"
module=operator.operator-controlplane.leader-lifecycle.create-crds
name=serviceimports.multicluster.x-k8s.io
error="customresourcedefinitions.apiextensions.k8s.io
\"serviceimports.multicluster.x-k8s.io\" is forbidden: User
\"system:serviceaccount:kube-system:cilium-operator\" cannot update
resource \"customresourcedefinitions\" in API group
\"apiextensions.k8s.io\" at the cluster scope"
This patch adds the necessary permissions to cilium-operator if you have
mcs/installCRDs enabled.
Fixes: #44210
Fixes: 3874013329d0 ("clustermesh: add config for auto installing
MCS-API CRDs")
Signed-off-by: Florian Ströger <stroeger@youniqx.com>
* policy: (mechanical) refactor out flow lookup types
A subsequent commit will include an alternate policy iteration system,
so it will be nice to move the types to policy/types.
This also removes the now-useless Decision type, as it's not used
anywhere in the codebase.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
* policy: add a simple iterative policy simulator
This is a simple userspace tool that executes rules step-by-step. Its
purpose will be to validate more complex policy scenarios, ideally by
fuzzing.
To ensure its output matches that of the existing policy engine, it
matches the LookupFlow method signature, and existing tests validate
that the simulation engine returns the same verdict.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
* Add fuzz-based policy testing
This generates random policy corpuses and compares MapState-based policy
calculation with the iterative simulator.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
* policy: Fix fuzz testing
Avoid using *testing.F for the logger as then any log within the fuzz
test would fail.
Fix the order of expected and actual for require.Equal.
Add more debugging.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Hide precedence details better
Hide precedence details from the policymap package.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add optional indexing by identity
Add optional mapState indexing by identity to support incremental removal
of generated keys. This is only needed for deletion pass entries, so the
index is only used if the policy has pass verdicts.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add default deny rule when pass verdict is used
Proper processing of pass verdicts requires the default deny rule to be
explicitly added to the mapstate so that it can be seen by pass verdict
entries.
The default rule is added to the next tier if any non-default tiers or
priorities are in use, or if the traffic direction has any pass
rules. This way the pass rule can pass to the added default deny (or
allow) rule.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Fix L7Filter precedence handling for pass verdicts
Deny takes precedence over allow and pass, allow takes precedence over
pass. Define new HasPrecedenceOver() to handle this instead of using just
IsDeny() like before. Would be simpler if Allow was not the zero value,
but changing that would require changing all unit testing code that uses
it as the default.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
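The ordering described above (deny over allow and pass, allow over pass) can be sketched as a rank comparison. The verdict type and its constants below are hypothetical; only the facts that Allow is the zero value and that the new method is called HasPrecedenceOver() come from the commit message.

```go
package main

import "fmt"

// verdict is a hypothetical three-valued verdict. Allow stays the zero
// value, as the commit message notes, so the numeric value cannot be
// used directly as the precedence.
type verdict uint8

const (
	allow verdict = iota // zero value
	deny
	pass
)

// rank maps a verdict to its precedence: deny > allow > pass.
func (v verdict) rank() int {
	switch v {
	case deny:
		return 2
	case allow:
		return 1
	default: // pass
		return 0
	}
}

// hasPrecedenceOver reports whether v strictly wins over other.
func (v verdict) hasPrecedenceOver(other verdict) bool {
	return v.rank() > other.rank()
}

func main() {
	fmt.Println(deny.hasPrecedenceOver(allow))  // deny beats allow
	fmt.Println(allow.hasPrecedenceOver(pass))  // allow beats pass
	fmt.Println(pass.hasPrecedenceOver(allow))  // pass never beats allow
	fmt.Println(allow.hasPrecedenceOver(allow)) // equal verdicts do not win
}
```

A plain IsDeny() check only distinguishes deny from everything else, which is why it cannot order allow against pass.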
* policy: Fix per-tier priority range allocation
Fix tier base priority calculation. When figuring out the priority range
for each tier, the full range of the remaining tiers must be included to
add enough space for pass verdicts on higher tiers. Then, when setting
the base priority of each tier, this has to be reversed.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add Fuzz cases
Commit fuzzer cases found during development.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* policy: Add generated L3/4 entries, multipass support
A pass of a specific identity to a lower tier rule with wildcard identity
should pass the given identity only and keep the wildcard entry at the
original precedence to take care of traffic with other identities. Since
the original entry needs to be kept, a new generated entry with the
identity from the pass entry and the L4 from the passed to entry must be
added.
We missed this case earlier due to BroaderOrEqualKeys only iterating
wildcard identity entries when the new key is a wildcard entry. Entries
that have a broader or equal L4 but more specific L3 are not as a whole
"broader or equal". To handle the need for generated entries for the pass
verdict processing "BroaderOrEqualKeys" is changed to also iterate all
specific L3 keys if the L4 is broader or equal and the given key has the
wildcard identity. The old behavior is retained with
CoveringBroaderOrEqualKeys(). Similarly, NarrowerOrEqualKeys() is renamed
as CoveringNarrowerOrEqualKeys() while NarrowerOrEqualKeys() now also
iterates keys with the wildcard identity when the given key has a
specific identity.
The addition of generated entries requires these entries to be deleted
when that identity is incrementally deleted. Since selector cache is
transactional we can delete all keys with the deleted identity, when the
first key with that identity is deleted. To make this efficient we use
the new id index.
To add support for pass verdicts at multiple tiers, the pass metadata is
now stored as a slice. Overhead to non-pass entries is reduced by storing
the slice via a pointer ('passes'), as most mapStateEntries would not
have any pass metadata.
If 'passes' is non-nil, then the pointed-to slice must have at least
one element, and all elements must have non-zero 'passPrecedence'.
When merging pass metadata we clone the slice to be mutated so that the
same slice can safely be used in multiple entries.
Split insertWithPasses() from insertWithChanges(); insertWithPasses() is
only called if the policy has any pass verdicts. This reduces the
chance of regressions for non-pass policies.
Log a warning if a policy with pass verdicts is also using auth
requirements, as this combination has not been implemented. Adjust a test
to not claim all features when that is not the case.
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* pkg/subnet: Fix tag in config subnets field
The Subnets field in the config was declaring a json tag, leading to a
failure of the agent `hive` command (see below). This is due to the fact
that hive relies on a mapstructure decoder, not a JSON one, and
therefore requires a mapstructure tag when the config field name is not
equal to the flag name.
Fix the tag on the field using a mapstructure one.
```
make -C daemon/ && ./daemon/cilium-agent hive
...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xacd46e]
goroutine 1 [running]:
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000b540f0, 0xa?, 0xc000cba750)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x12e
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a9bdd0, 0x6?, 0xc000cba750)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a799b0, 0x2?, 0xc000cba750)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive.(*Hive).PrintObjects(0xc0009554a0, {0x51e1240, 0xc0000c0030}, 0x0?)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/hive.go:459 +0x18f
github.com/cilium/hive.(*Hive).Command.func1(0xc000e0fc00?, {0x4b05b96?, 0x4?, 0x4b05aca?})
/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/command.go:21 +0x2d
github.com/spf13/cobra.(*Command).execute(0xc000953508, {0x86bac20, 0x0, 0x0})
/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1015 +0xb02
github.com/spf13/cobra.(*Command).ExecuteC(0xc000952f08)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1148 +0x465
github.com/spf13/cobra.(*Command).Execute(...)
/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1071
github.com/cilium/cilium/daemon/cmd.Execute(0x4d769d0?)
/home/ffalzoi/cilium/cilium-2/daemon/cmd/root.go:89 +0x13
main.main()
/home/ffalzoi/cilium/cilium-2/daemon/main.go:15 +0x1f
```
Fixes: d395d73ad3 ("pkg/subnet: Add subnet config watcher and manager")
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
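Why a json tag is invisible to a mapstructure-based decoder can be shown with the standard reflect package alone. In this sketch the config types, the decoderKey helper and the flag name "subnet-topology" are all hypothetical; it only demonstrates that a decoder looking up the "mapstructure" tag falls back to the field name when just a json tag is present.

```go
package main

import (
	"fmt"
	"reflect"
)

// configWithJSONTag mirrors the broken shape: only a json tag is declared.
type configWithJSONTag struct {
	Subnets []string `json:"subnet-topology"`
}

// configWithMapstructureTag mirrors the fixed shape.
type configWithMapstructureTag struct {
	Subnets []string `mapstructure:"subnet-topology"`
}

// decoderKey returns the key a tag-driven decoder would use for a field:
// the value of tagName if present, otherwise the Go field name.
func decoderKey(cfg any, field, tagName string) string {
	f, ok := reflect.TypeOf(cfg).FieldByName(field)
	if !ok {
		return ""
	}
	if v, ok := f.Tag.Lookup(tagName); ok {
		return v
	}
	return f.Name
}

func main() {
	fmt.Println(decoderKey(configWithJSONTag{}, "Subnets", "mapstructure"))
	fmt.Println(decoderKey(configWithMapstructureTag{}, "Subnets", "mapstructure"))
}
```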
* linux-desired-device: introducing Cilium managed devices reconciler
Add a new reconciler in Cilium datapath/linux which can manage the
life cycle of Linux links created by Cilium.
Created devices are persisted on disk using a write-ahead log. Upon
restart, the owners of the devices are expected to redo the
configuration before calling the finalizer. Stale devices will be pruned.
Implementation is inspired by linux/route/reconciler.
Signed-off-by: harsimran pabla <hpabla@isovalent.com>
* linux-desired-device: script tests for desired-devices
Adding script tests to validate device creation and persistence.
Signed-off-by: harsimran pabla <hpabla@isovalent.com>
* linux-desired-device: hook desired-devices cell into main infra
Signed-off-by: harsimran pabla <hpabla@isovalent.com>
* fix: helm intervalSeconds value render bug
intervalSeconds is always of an integral type, no need to check kindIs float64
Fixes: #44206
Signed-off-by: jayl1e <jayl1e@outlook.com>
* bpf: source tuple hash seeds from node config
Move IPv4/IPv6 hash init seeds into node config and wire them from
Maglev config. BPF tuple hashing now reads CONFIG(hash_init{4,6}_seed)
instead of compile-time defines, and the legacy HASH_INIT* defines are
removed from the header writer and node_config.h.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* feat(install) Allow hubble to run with hostUsers: false
The hubble components do not require direct mapping of container
users to system users.
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
* docs: fix typos in comments
Signed-off-by: Yohei Yamamoto <yhymmt123@gmail.com>
* bpf: correct comments in cil_from_netdev function
Removed conditions from the comment block describing the cil_from_netdev function since the logic has been changed
Signed-off-by: Liyi Huang <liyi.huang@isovalent.com>
* chore(deps): update all lvh-images main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all-dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* bpf: lb: Decouple DNAT operation from LB key
Instead of passing the lb4_key/lb6_key to lb4_xlate/lb6_xlate for
checksum calculation and port translation, pass the original destination
address and port directly from the CT tuple.
This change:
1. Removes the key parameter from lb4_xlate/lb6_xlate functions
2. Removes the key parameter from lb4_dnat_request/lb6_dnat_request
The CT tuple already contains the same values that were being read from
the key structure:
- tuple->daddr == key->address (original destination)
- tuple->sport == key->dport (reversed port order in CT tuple)
By removing the xlate path's dependency on key, we can now directly
modify key->address = 0 for wildcard lookups without creating a copy,
simplifying the backend selection logic.
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* bpf: lxc: Handle DSR for remote NodePort services on source node
When DSR and PER_PACKET_LB are enabled, connections fail if a client pod
sends a request to a remote node's NodePort service while the server
pod is located on the same node as the client.
The root cause is that the remote node only performs DNAT, setting the
packet's source address to client's node address, leading to a hairpin
problem. Consequently, the originating node cannot perform the necessary
REV NAT for the reply packets.
To resolve this, remote NodePort service requests are now handled on the
source node when DSR is enabled, similar to the behavior of socket-level
load balancing.
Implementation details:
- Add lb4/6_lookup_wildcard_nodeport_service(): When ENABLE_DSR is
defined and the regular service lookup fails, check if the destination
is a remote node's IP with a NodePort port range. If so, perform a
wildcard lookup (address=0) to find the NodePort service.
- Use wildcard key for backend selection: When dsr_internal flag is set,
set key->address to 0 before calling lb4/6_select_backend_id(). This
applies to both CT_NEW (new connections) and CT_REPLY (backend
re-selection for existing connections). This is needed for backend
selection algorithms that use slot lookup (e.g., Random), which look
up backend slots via lb4/6_lookup_backend_slot() using the service
key. Without a wildcard key, the lookup would fail because backend
slot entries are stored with the wildcard service key, not with the
remote node's IP.
- Store original destination in CT entry: The original destination
address and port (remote node IP and NodePort) are stored in
ct_state_new.nat_addr/nat_port, which will be written to the CT entry
for use in reply path RevNAT processing.
- Use cilium_dsr_nat_buffer per-CPU map: The NAT info is detected in
__per_packet_lb_svc_xlate_4/6(), but the CT entry is created after DNAT
when the original destination info is no longer available in the packet.
The per-CPU buffer preserves this info across the DNAT operation.
Existing connection handling:
- This change only affects DSR traffic destined to remote node's
NodePort. The wildcard lookup is triggered when lb4/6_lookup_service()
fails, but it only processes packets where the destination is a remote
node IP with a port in the NodePort range. Other traffic that fails
the regular lookup is unaffected.
Fixes: #41962
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* bpf: lxc: Add RevNAT support for DSR remote NodePort connections
Add reverse NAT support for reply packets of DSR remote NodePort
connections. The forward path stores the original destination address
and port in the CT entry's nat_addr/nat_port fields, which are now
used during reply processing.
Implementation details:
- Extend lb4/6_rev_nat() signature: Add nat_addr and nat_port parameters.
When nat_port is non-zero, use these values directly for RevNAT.
When nat_port is zero, fall back to the existing rev_nat_index lookup
for backward compatibility with existing connections.
- Modify ct_lookup_fill_state(): Copy nat_addr and nat_port from the
CT entry to ct_state, making them available for reply processing.
- Update ipv4/6_policy(): Check for nat_port in addition to
rev_nat_index when deciding whether to perform RevNAT. Pass the
CT entry's NAT information to lb4/6_rev_nat().
- Update nodeport_rev_dnat_ipv4/6(): Adapt to the new lb4/6_rev_nat()
signature by passing NULL/0 for nat_addr/nat_port (these paths use
the traditional rev_nat_index lookup).
Existing connection handling:
1. New connections (created after this patch):
- Forward path stores nat_addr/nat_port in CT entry
- Reply path uses CT entry's nat_addr/nat_port for RevNAT
2. Existing connections (regular NodePort/DSR traffic):
- CT entry has nat_addr=0, nat_port=0
- lb4/6_rev_nat checks nat_port first:
- If nat_port != 0: use nat_addr/nat_port directly
- If nat_port == 0: fall back to rev_nat_index lookup
- This ensures existing connections continue to work
3. Upgrade scenario:
- Existing connections keep working via rev_nat_index fallback
- New DSR remote nodeport connections use nat_addr/nat_port
- No connection disruption during rolling upgrade
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* bpf: tests: Add tests for DSR remote NodePort handling
Add BPF unit tests to verify the DSR remote NodePort functionality
for both IPv4 and IPv6.
Test scenarios for DSR mode (tc_lxc_lb4/6_dsr_nodeport.c):
1. Pod -> Remote NodePort -> Local backend (forward path)
- Client pod sends packet to remote node's NodePort
- LB selects a backend on the local node (same node as client)
- Verifies packet is DNATed to local backend IP and port
- Verifies CT entry contains correct nat_addr (remote node IP)
and nat_port (NodePort)
2. Local backend -> Pod (reply path)
- Backend sends reply packet to client
- Verifies RevNAT is applied correctly
- Source IP/port changed to remote node IP and NodePort
3. Pod -> Remote NodePort -> Self (hairpin)
- Client pod sends packet to remote node's NodePort
- LB selects the client pod itself as the backend
- Verifies DNAT to client IP and backend port
- Verifies SNAT to loopback IP for hairpin flow
4. Hairpin reply
- Pod replies to loopback IP
- Verifies RevNAT restores remote node IP and NodePort
5. Existing connection handling (UDP)
- First packet establishes CT entry via legacy path
- Second packet should use existing CT entry
- Verifies wildcard lookup is skipped for existing connections
Test scenarios for Hybrid mode (tc_lxc_lb4/6_hybrid_dsr_nodeport.c):
1. DSR service handling
- Verifies DSR-enabled service triggers wildcard lookup
- Packet is DNATed to local backend
2. SNAT service handling
- Verifies SNAT service does NOT trigger wildcard lookup
- Packet passes through without DNAT (handled by remote node)
Co-authored-by: Siwan Kim <siwan.kim@navercorp.com>
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
* chore(deps): update quay.io/cilium/cilium-envoy docker tag to v1.36.5-1770440937-14da6b9c8a54244f0a67cd90a0deb83e5f110a4a
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update module sigs.k8s.io/kube-api-linter to v0.0.0-20260206102632-39e3d06a2850
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update base-images
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* images: update cilium-{runtime,builder}
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* loadbalancer/maps: let SourceRangeKey.GetCIDR return netip.Prefix
Its only caller (apart from use in log strings) directly converts the
result to a netip.Prefix. Rather than first constructing a *cidr.CIDR
only to then convert it to netip.Prefix, construct a netip.Prefix right
away. Also rename the method accordingly.
This slightly improves pkg/loadbalancer/benchmark results:
Before:
Memory statistics from N=10 iterations:
Min: Allocated 490378kB in total, 2471963 objects / 140742kB still reachable (per service: 49 objs, 10042B alloc, 2882B in-use)
Avg: Allocated 501584kB in total, 2957195 objects / 177049kB still reachable (per service: 59 objs, 10272B alloc, 3625B in-use)
Max: Allocated 543366kB in total, 3722287 objects / 227653kB still reachable (per service: 74 objs, 11128B alloc, 4662B in-use)
After:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service: 45 objs, 10018B alloc, 2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service: 57 objs, 10256B alloc, 3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service: 74 objs, 11121B alloc, 4660B in-use)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
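As a rough illustration of the SourceRangeKey change above, the sketch below builds a netip.Prefix straight from raw key fields instead of going through an intermediate CIDR object. The `sourceRangeKey4` type and its field layout are assumptions made for this example and are not the actual map key definition.

```go
package main

import (
	"fmt"
	"net/netip"
)

// sourceRangeKey4 is a hypothetical stand-in for an IPv4 source-range map
// key: a prefix length followed by the raw address bytes.
type sourceRangeKey4 struct {
	PrefixLen uint32
	Address   [4]byte
}

// prefix builds the netip.Prefix directly from the key fields, without
// allocating an intermediate CIDR object first.
func (k sourceRangeKey4) prefix() netip.Prefix {
	return netip.PrefixFrom(netip.AddrFrom4(k.Address), int(k.PrefixLen))
}

// prefixString returns the textual form of the key's prefix.
func prefixString(k sourceRangeKey4) string {
	return k.prefix().String()
}

func main() {
	k := sourceRangeKey4{PrefixLen: 24, Address: [4]byte{10, 0, 0, 0}}
	fmt.Println(prefixString(k))
}
```

Since netip.Prefix is a plain value type, returning it this way avoids the extra allocation, which would be consistent with the lower object counts in the benchmark output above.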
* loadbalancer/maps: let ServiceKey.GetAddress return netip.Addr
This simplifies call sites and avoids an unnecessary conversion in
the reconciler's (*BPFOps).pruneServiceMaps method.
pkg/loadbalancer/benchmark results:
Before:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service: 45 objs, 10018B alloc, 2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service: 57 objs, 10256B alloc, 3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service: 74 objs, 11121B alloc, 4660B in-use)
After:
Memory statistics from N=10 iterations:
Min: Allocated 487023kB in total, 1944085 objects / 99675kB still reachable (per service: 38 objs, 9974B alloc, 2041B in-use)
Avg: Allocated 498657kB in total, 2813561 objects / 166829kB still reachable (per service: 56 objs, 10212B alloc, 3416B in-use)
Max: Allocated 542794kB in total, 3722246 objects / 227646kB still reachable (per service: 74 objs, 11116B alloc, 4662B in-use)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* loadbalancer/maps: remove unused SkipLBMap delete methods
The DeleteLB{4,6}By* methods don't have callers anymore since
commit 6fa7f8129d1a ("loadbalancer/legacy: Remove the old
control-plane"). Remove them.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* fix(deps): update all go dependencies main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* docs: Update docsearch to v4.5.4
Pull in the latest theme with newer docsearch plugin version.
Signed-off-by: Joe Stringer <joe@cilium.io>
* ci: update docs-builder
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* Use binary.NativeEndian instead of nl.NativeEndian
Use the NativeEndian native-endian var provided by the Go standard
library encoding/binary package instead of the version from the
netlink/nl package.
While at it, also check the length of the handle returned by
unix.NameToHandleAt before accessing it.
Follow-up to commit 2c1b49cac70b ("byteorder: use binary.NativeEndian")
Signed-off-by: Tobias Klauser <tobias@cilium.io>
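The length check mentioned in the commit above can be sketched as follows. This is a minimal example that assumes the handle carries an 8-byte native-endian ID. `decodeHandleID` and `roundTrip` are hypothetical helpers, and the real handle layout may differ.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShortHandle = errors.New("handle shorter than 8 bytes")

// decodeHandleID reads a native-endian uint64 from raw handle bytes.
// The length is checked first so a short handle yields an error
// instead of an out-of-range panic.
func decodeHandleID(handle []byte) (uint64, error) {
	if len(handle) < 8 {
		return 0, errShortHandle
	}
	return binary.NativeEndian.Uint64(handle[:8]), nil
}

// roundTrip encodes v in native byte order and decodes it again.
func roundTrip(v uint64) (uint64, error) {
	buf := make([]byte, 8)
	binary.NativeEndian.PutUint64(buf, v)
	return decodeHandleID(buf)
}

func main() {
	id, err := roundTrip(42)
	fmt.Println(id, err)

	_, err = decodeHandleID([]byte{1, 2, 3})
	fmt.Println(err)
}
```

binary.NativeEndian is available in the standard library from Go 1.21 onwards.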
* datapath: fix panic during datapath reinitialization
This commit fixes a cilium-agent panic during datapath reinitialization
when a DirectRouting device is required but not configured.
This can happen when the direct routing device drops, for example during
networkd restart.
```
time=2026-01-14T07:39:46.444386888Z level=info msg="Devices changed" module=agent.datapath.devices-controller devices=[]
time=2026-01-14T07:39:46.444654289Z level=info msg="Fallback node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.44474159Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.444833191Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="" device=eth0
panic: runtime error: index out of range [3] with length 0
goroutine 415 [running]:
github.com/cilium/cilium/pkg/byteorder.NetIPv4ToHost32({0x0?, 0xc000e9e5d0?, 0x49e07bb?})
/go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:15 +0x65
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0xc0004280e0, {0x7ff1517a6ba8, 0xc0023ec400}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:150 +0xa4b
github.com/cilium/cilium/pkg/datapath/loader.hashDatapath({0x50fbfb0, 0xc0004280e0}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/hash.go:20 +0x9e
github.com/cilium/cilium/pkg/datapath/loader.(*objectCache).UpdateDatapathHash(0xc001d027d0, 0xc001422870?)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/cache.go:62 +0x4d
github.com/cilium/cilium/pkg/datapath/loader.(*loader).Reinitialize(0xc002573580, {0x50f9c98, 0xc0008474d0}, 0xc001d00508, {{0x49d15b6, 0x4}, {0x0, 0x0}, 0x0, 0x0, ...}, ...)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:377 +0x3c8
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reinitialize(0xc001d36288, {0x50f9c98?, 0xc0008474d0?}, {{0x0?, 0x0?}, 0x0?}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:275 +0x110
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reconciler(0xc001d36288, {0x50f9c98, 0xc0008474d0}, {0x5104260, 0xc002feafc0})
/go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:219 +0x6fd
github.com/cilium/hive/job.(*jobOneShot).start(0xc002082e40, {0x50f9c98, 0xc0008474d0}, 0xc00143dce4?, {0x5104260, 0xc002082de0}, {{{0x0, 0x0, 0x0}}, 0xc001791770, ...})
/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*queuedJob).Start.func1 in goroutine 1
/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:126 +0x16f
```
With the change in this commit, when a direct routing device is not found,
the datapath orchestrator logs a warning and waits for device updates in
the reconciliation loop, skipping reinitialization.
Fixes 8fae439710fd4b426beae9190957d2380a96bed6 ("datapath: move DirectRoutingDevice validation to orchestrator")
Signed-off-by: Deepesh Pathak <deepeshpathak09@gmail.com>
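The behaviour described in the commit above can be sketched roughly as follows. This is a simplified model, not the orchestrator code: the `device` type, `reinitialize`, and `reconcile` are hypothetical names, and the loop processes a fixed slice of updates instead of watching a table.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoDirectRoutingDevice = errors.New("direct routing device required but not found")

// device is a hypothetical, simplified view of a network device.
type device struct {
	name string
	ipv4 []byte // empty when the device has no IPv4 address
}

// reinitialize returns an error instead of indexing into an empty
// address when a required direct routing device is missing.
func reinitialize(requireDRD bool, drd *device) error {
	if requireDRD && (drd == nil || len(drd.ipv4) != 4) {
		return errNoDirectRoutingDevice
	}
	return nil
}

// reconcile walks a sequence of device updates and returns how many
// times the datapath was reinitialized. Updates without a usable
// device are logged and skipped; the loop waits for the next update.
func reconcile(updates []*device) int {
	reinits := 0
	for _, d := range updates {
		if err := reinitialize(true, d); err != nil {
			fmt.Println("warning:", err, "(waiting for device updates)")
			continue
		}
		reinits++
	}
	return reinits
}

func main() {
	updates := []*device{
		nil,            // device dropped, e.g. during a networkd restart
		{name: "eth0"}, // device is back but has no address yet
		{name: "eth0", ipv4: []byte{10, 0, 0, 1}},
	}
	fmt.Println("reinitializations:", reconcile(updates))
}
```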
* datapath/loader: Add netkit to BPF load tests
Commit 854473726b ("bpf: Workaround for netkit + L7 policy redirect failure")
introduced the enable_netkit load-time config variable.
This commit introduces that variable to BPF loader permutation testing
for bpf_lxc.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* docs: add netkit requirement to kernel version list
Add Linux kernel requirement for netkit to the System Requirements.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* style(bpf/test): fix indentation
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): move `icmp_wsum_accumulate` helper
This helper is shared by ICMPv6 and ICMPv4 and will be imported by both
in future refactors.
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): move ICMPv6 packet generation to a separate file
The goal is to make these functions reusable for the ICMPv6 policy
denial feature, so move them to a shared icmp6.h file.
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): reduce ifdef number
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* gateway-api: Update conformance test Make target
This updates the Gateway API conformance test Make target
to make it not require any extra setup when run as part
of local development with Kind, and adds the ability
to set which tests to run, to allow for focussed
conformance test runs.
Signed-off-by: Nick Young <nick@isovalent.com>
* bpf: introduce DECLARE_CONFIG_KIND
DECLARE_CONFIG and NODE_CONFIG only differ in the value of their
respective kind: tag. Avoid code duplication by moving the common parts
to a separate DECLARE_CONFIG_KIND macro and use it to define
DECLARE_CONFIG and NODE_CONFIG. This also allows easier downstream use
of these macros.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* bpf: wire events map rate limits through node config
Move events map rate/burst limits into node config and read them via
CONFIG(events_map_{rate,burst}_limit) in BPF helpers. This drops the
compile-time EVENTS_MAP* defines from the header writer and cleans up
the legacy defaults in bpf/node_config.h.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* sysdump: Use label selectors for Hubble UI/Relay deployment collection
Previously, sysdump collected Hubble UI and Hubble Relay deployments
using hardcoded deployment names ('hubble-ui', 'hubble-relay'). This
caused the collection to fail when users deployed Hubble components
with custom names (e.g., 'hubble-ui-blue'), even when the correct
labels were provided via --hubble-ui-labels or --hubble-relay-labels.
This change updates the deployment collection to use ListDeployment
with the configured label selectors instead of GetDeployment with
hardcoded names. This makes deployment collection consistent with
pod log collection, which already uses label selectors.
Fixes the issue where:
cilium sysdump --hubble-ui-labels k8s-app=hubble-ui-blue
would still fail with 'Deployment hubble-ui not found'.
Signed-off-by: darox <maderdario@gmail.com>
* bpf: lxc: remove unnecessary L3 validation
There's no code that uses the IPv4 header afterwards.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf: lxc: fine-tune BPF Host Routing path
Using fib_ok() to evaluate the result of fib_redirect_v*() is a bit
awkward. We're in TC context, so we know that CTX_ACT_TX doesn't need to
be handled (and would most likely lead to a packet loop).
By structuring the code as a switch() statement we can also clean up one
of the goto paths.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf: xdp: prefer CTX_ACT_TX over XDP_TX
Return the generic value, so that readers understand what macro they should
be using when handling the result.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf, nat46x64: move RFC6052 prefix into node config
This commit moves NAT 46x64 RFC6052 prefix bytes into the node
configuration so BPF programs consume CONFIG(nat_46x64_prefix) value
instead of C header defines.
The Go datapath now populates this field from the NAT46x64 config, and
the headerfile writer no longer emits NAT_46X64_PREFIX_ defines.
Updates included:
- BPF nat_46x64 helpers switched to CONFIG(nat_46x64_prefix).
- Node config population moved to runtime config and legacy defines
dropped.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
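For readers unfamiliar with what an RFC 6052 prefix is used for, the sketch below embeds an IPv4 address into a /96 IPv6 prefix, using the well-known prefix 64:ff9b::/96 from the RFC as input. It only illustrates the address mapping. The function names are hypothetical, and whether the datapath restricts itself to /96 prefixes is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// embedIPv4 places the four IPv4 bytes into the last 32 bits of a /96
// IPv6 prefix, as RFC 6052 describes for 96-bit prefixes.
func embedIPv4(prefix netip.Prefix, v4 netip.Addr) (netip.Addr, error) {
	if !prefix.Addr().Is6() || prefix.Bits() != 96 {
		return netip.Addr{}, errors.New("expected an IPv6 /96 prefix")
	}
	if !v4.Is4() {
		return netip.Addr{}, errors.New("expected an IPv4 address")
	}
	out := prefix.Masked().Addr().As16()
	a4 := v4.As4()
	copy(out[12:], a4[:])
	return netip.AddrFrom16(out), nil
}

// embedString is a convenience wrapper working on textual input.
func embedString(prefix, v4 string) (string, error) {
	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return "", fmt.Errorf("parsing prefix: %w", err)
	}
	a, err := netip.ParseAddr(v4)
	if err != nil {
		return "", fmt.Errorf("parsing address: %w", err)
	}
	addr, err := embedIPv4(p, a)
	if err != nil {
		return "", err
	}
	return addr.String(), nil
}

func main() {
	s, err := embedString("64:ff9b::/96", "192.0.2.33")
	fmt.Println(s, err)
}
```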
* neighbor: Fix description for L2 neighbor discovery
The flag is not used by IPsec. L2 neighbor discovery is enabled whenever
XDP is enabled. This flag allows users to enable it even if XDP is
disabled, so let's state that instead of mentioning IPsec.
Co-authored-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
* CODEOWNERS: add more specific owners for operator subsystems
Some of the operator's subsystems are currently only covered by the
catch-all rule assigning @cilium/operator. Instead, assign more specific
teams for certain subsystems which require more in-depth knowledge in
these particular areas.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* .github/workflows: eks-cluster-pool-manager: fix race condition and cleanup leaked IAM roles
When multiple parallel jobs generate cluster names within the same
second, they can produce identical names since the timestamp has only
1-second precision. This causes CloudFormation stack creation to fail
with "AlreadyExistsException", leaving orphaned IAM roles behind.
This commit adds a random suffix to cluster names to prevent race conditions
and enhances the failure cleanup step to delete CloudFormation stacks and orphaned
IAM roles when cluster creation fails.
Signed-off-by: André Martins <andre@cilium.io>
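The naming collision described in the commit above, and the random-suffix remedy, can be illustrated with a short sketch. The actual workflow is a GitHub Actions script. This Go version is only an analogy, and the `clusterName` helper and name format are assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// clusterName combines a prefix, a 1-second-resolution timestamp and a
// random 4-byte suffix, so two jobs starting within the same second
// are very unlikely to produce the same name.
func clusterName(prefix string, unixSeconds int64) (string, error) {
	var suffix [4]byte
	if _, err := rand.Read(suffix[:]); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s-%d-%s", prefix, unixSeconds, hex.EncodeToString(suffix[:])), nil
}

func main() {
	now := time.Now().Unix()
	a, _ := clusterName("pool", now)
	b, _ := clusterName("pool", now)
	fmt.Println(a)
	fmt.Println(b)
}
```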
* hubble: Fix typos in config/set.go
Signed-off-by: harshitghagre <harshitghagre183@gmail.com>
* test/helpers: ignore error creating lease lock message
This message was modified in k8s 1.35.0, therefore we should update the
list of messages that can be ignored in our CI.
Signed-off-by: André Martins <andre@cilium.io>
* Fix backend slot index mismatch in LB reconciler
Use slotID instead of loop index when setting backend slots to avoid
gaps when maintenance backends are skipped.
Signed-off-by: Aman-Cool <aman017102007@gmail.com>
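A minimal sketch of the slot-numbering issue fixed above, assuming slots are numbered from 1 and that maintenance backends are skipped. The `backend` type and `assignSlots` function are hypothetical and much simpler than the reconciler.

```go
package main

import "fmt"

// backend is a hypothetical, reduced view of a load-balancer backend.
type backend struct {
	id          uint32
	maintenance bool
}

// assignSlots maps slot numbers to backend IDs. A separate slotID
// counter is advanced only for backends that are actually written,
// so skipped backends do not leave gaps. Using the loop index instead
// would leave slot 2 empty in the example in main.
func assignSlots(backends []backend) map[int]uint32 {
	slots := make(map[int]uint32)
	slotID := 1 // assumption: slots are numbered from 1
	for _, b := range backends {
		if b.maintenance {
			continue
		}
		slots[slotID] = b.id
		slotID++
	}
	return slots
}

func main() {
	slots := assignSlots([]backend{
		{id: 1},
		{id: 2, maintenance: true},
		{id: 3},
	})
	fmt.Println(slots)
}
```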
* vendor: Bump to StateDB v0.6.3
This restores the ability to use Changes() against a WriteTxn that is
targeting the same table as Changes().
Signed-off-by: Jussi Maki <jussi@isovalent.com>
* docs: Fix upgrade note category for tproxy
There's no new option here, it's a change in behaviour. Fix the category
for the upgrade note.
CC: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Fixes: c257b3d0aca8 ("datapath/connector: do not support netkit and bpf.tproxy=true")
Signed-off-by: Joe Stringer <joe@cilium.io>
* policy: Fix PASS verdict for non-consecutive tiers
Signed-off-by: Blaz Zupan <blaz@google.com>
* loadbalancer/healthserver: refresh ProxyRedirect per request
This commit fixes stale ProxyRedirect reads in the health server by reloading Service
state from the services table on each request. This prevents incorrect
local endpoint counts when Envoy redirect state changes after the
listener is created (which is the case).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* localnodestore: provide WaitForNodeInformation
This commit moves global function `k8s.WaitForNodeInformation`
into the `localNodeSynchronizer`. To prevent dependencies to the
synchronizer, the `LocalNodeStore` re-exposes the same method and
delegates the call to the synchronizer.
This way, the legacy daemon initialization logic no longer needs to
depend on the (k8s/cilium)-node resources.
This step isn't final. Eventually, the wait logic should watch
the state db "local node" for the changes.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* localnodestore: pass localnodestore to synchronizer
With this commit, the `LocalNodeStore` passes a pointer
to itself to the synchronizer when calling `WaitForNodeInformation`.
This way, the synchronizer can update the ip allocation ranges without
using the global functions.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* ci: e2e: add `kernel` to workflow job names
As a follow-up of changes introduced in https://github.com/cilium/cilium/pull/44126
to improve the readability of workflow job names in the GitHub UI, this
commit adds the `kernel` parameter to the name too. In OpenSearch,
it would be good to differentiate these jobs by the used kernel.
Given the kernel is a parameter that may vary frequently, it is good to be
able to differentiate runs that used the same configuration but different
kernels. That's also good in case of regressions when changing kernel.
The result will be `Setup & Test (ipsec-1, minor, 5.10)`.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* pkg/datapath/bandwidth: optimize host endpoint QoS setup
The bandwidth manager was calling node.GetEndpointID() and performing
a table lookup on every UpdateBandwidthLimit() call, even though the
host endpoint QoS only needs to be set up once.
This commit introduces an atomic boolean flag that tracks whether the
host endpoint QoS has been configured. After the initial setup:
- Skip the GetEndpointID() call (fast path)
- Skip the Table.Get() call entirely
- Skip the redundant Insert() call for host endpoint
Additionally, based on review feedback:
- Change node.GetEndpointID() to return (uint64, bool) to indicate
whether the host endpoint ID has been set, avoiding duplicate
constants across packages
- Change node.endpointID to atomic.Uint64 for thread-safe access
during initialization
Signed-off-by: Anand Kumar Shaw <anandkrshawheritage@gmail.com>
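The fast path described in the commit above might look roughly like this. It is a sketch under stated assumptions: `manager`, `setupCalls`, and the zero-means-unset convention for the endpoint ID are invented for illustration, and the counter stands in for the table lookup and insert.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// hostEndpointID holds the host endpoint ID. Zero is treated as
// "not set yet" in this sketch (an assumption).
var hostEndpointID atomic.Uint64

// getEndpointID reports the ID and whether it has been set.
func getEndpointID() (uint64, bool) {
	id := hostEndpointID.Load()
	return id, id != 0
}

// manager is a hypothetical, reduced bandwidth manager.
type manager struct {
	hostQoSDone atomic.Bool  // true once host endpoint QoS is configured
	setupCalls  atomic.Int64 // counts how often the one-time setup ran
}

// updateBandwidthLimit performs the host endpoint setup at most once.
// After that, the flag check is the only extra work on each call.
func (m *manager) updateBandwidthLimit() {
	if !m.hostQoSDone.Load() {
		if _, ok := getEndpointID(); ok {
			if m.hostQoSDone.CompareAndSwap(false, true) {
				m.setupCalls.Add(1) // stand-in for lookup and insert
			}
		}
	}
	// The per-endpoint bandwidth update would follow here.
}

func main() {
	var m manager
	m.updateBandwidthLimit() // ID not set yet, nothing happens
	hostEndpointID.Store(7)
	for i := 0; i < 3; i++ {
		m.updateBandwidthLimit()
	}
	fmt.Println("setup calls:", m.setupCalls.Load())
}
```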
* clustermesh: fix a few misc issues with MCS-API doc
This commit fixes two issues with the MCS-API doc:
- a simple typo on the enabled keyword
- change the code-block in parsed-literal as the |SCM_WEB| "variable"
was not evaluated/replaced in the final doc with a code-block
Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr>
* Docs: improve docs around ipsec upgrade in 1.18
Signed-off-by: darox <maderdario@gmail.com>
* docs(ztunnel): fix duplicate word (a set)
Signed-off-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
* docs(ztunnel): add missing backslash
add missing backslash for install with Cilium CLI
Signed-off-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
* clustermesh: helm: remove clustermesh.enableMCSAPISupport
This option was deprecated and its removal was announced for Cilium
1.20, so let's drop it now that we are in the 1.20 cycle!
Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr>
* daemon: enforce iptable rules are present when node-port is enabled
Moving forward the node-port code path will rely on iptables to
identify locally generated traffic and create NAT reservations for this
traffic in order to avoid reservation conflicts.
We will consider it a fatal error moving forward to disable iptable rule
installation while node-port or eBPF masquerading is enabled.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* bpf,nodeport: generalize SNAT conflict detection
Prior to this commit the nodeport SNAT conflict detection assumed host
traffic was sourced from the direct routing interface's IP and always
egressed on this device.
Remove this assumption by detecting locally generated traffic from any
egress interface 'bpf_host' is attached to.
This removes the dependency on the direct routing interface in the
node-port path.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* ztunnel: introduce end to end connectivity tests
The test infrastructure deploys three dedicated namespaces with client
and echo-server pods in each: two enrolled namespaces for testing
cross-namespace mTLS scenarios and one unenrolled namespace for baseline
verification. Pod affinity rules ensure echo-same-node pods co-locate
with clients while echo-other-node pods schedule on different nodes,
enabling both intra-node and inter-node traffic validation.
The test scenarios verify ztunnel mTLS behavior by dynamically labeling
namespaces with the io.cilium/mtls-enabled label during test execution.
For enrolled pod pairs, the tests assert that traffic flows through port
15008 (the ztunnel HBONE proxy) and that no unencrypted traffic appears
on port 8080. For unenrolled pods, the assertions are inverted to confirm
traffic bypasses ztunnel entirely. Cross-namespace scenarios confirm that
mTLS works correctly when client and server reside in separately enrolled
namespaces.
Packet capture validation runs from host network pods using tcpdump,
which is required since ztunnel operates in the host network namespace.
The tests query the ztunnel admin API to verify workload registration
before generating traffic, ensuring that any state the test depends on
has converged before making assertions.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
Signed-off-by: Quang Nguyen <nguyenquang@microsoft.com>
Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
* ci,ztunnel: add workflows for ztunnel encryption tests
Add a new GitHub Actions workflow to run end-to-end tests for ztunnel
encryption in Cilium.
The new /ci-ztunnel-e2e trigger is added to the Ariane configuration,
pointing to the newly created conformance-ztunnel-e2e.yaml workflow
file.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* ci,ztunnel: add ztunnel cert script to actions
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* datapath: remove GetRoutePostEncryptMTU()
The last user was removed by
f81d8964ce4a ("ipsec: don't reduce post-encrypt MTU for tunnel overhead").
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* datapath: ipsec: remove clean up code for encrypt IP rule
https://github.com/cilium/cilium/pull/41699 stopped installing this IP
rule in the v1.19 release. With v1.20 we can now stop cleaning it up.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* loadbalancer/api: include proxy-redirect as backend
Currently, listing services via `cilium-dbg service list` displays
no backends for the services that use L7 proxy redirection (e.g.
Gateway API / Ingress loadbalancer service).
```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID Frontend Service Type Backend
1 10.96.62.28:80/TCP ClusterIP 1 => 10.244.1.149:4245/TCP (active)
2 10.96.0.10:53/TCP ClusterIP 1 => 10.244.1.69:53/TCP (active)
2 => 10.244.1.109:53/TCP (active)
3 10.96.0.10:53/UDP ClusterIP 1 => 10.244.1.69:53/UDP (active)
2 => 10.244.1.109:53/UDP (active)
4 10.96.0.10:9153/TCP ClusterIP 1 => 10.244.1.69:9153/TCP (active)
2 => 10.244.1.109:9153/TCP (active)
5 10.96.0.1:443/TCP ClusterIP 1 => 172.19.0.2:6443/TCP (active)
12 10.96.162.7:443/TCP ClusterIP 1 => 172.19.0.2:4244/TCP (active)
15 10.96.106.135:80/TCP ClusterIP 1 => 10.244.1.115:80/TCP (active)
16 10.96.55.103:80/TCP ClusterIP 1 => 10.244.1.67:80/TCP (active)
17 [::]:30965/TCP NodePort
19 [::]:30965/TCP/i NodePort
21 0.0.0.0:30965/TCP NodePort
23 0.0.0.0:30965/TCP/i NodePort
25 10.96.245.249:80/TCP ClusterIP
26 172.19.255.1:80/TCP LoadBalancer
27 172.19.255.1:80/TCP/i LoadBalancer
28 [fd00:10:96::d99f]:80/TCP ClusterIP
```
Even though the actual redirection happens via BPF/iptables TPROXY, it would be
nice to display the information about the local redirection to the node-local
L7 proxy (Envoy) in some form.
Therefore, this commit changes the frontend model generation to treat the
proxy redirection as backend of the service frontend in the form
`<localhost-addr>:<proxy-port>/<proto> (active)`
Result
```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID Frontend Service Type Backend
1 10.96.62.28:80/TCP ClusterIP 1 => 10.244.1.149:4245/TCP (active)
2 10.96.0.10:53/TCP ClusterIP 1 => 10.244.1.69:53/TCP (active)
2 => 10.244.1.109:53/TCP (active)
3 10.96.0.10:53/UDP ClusterIP 1 => 10.244.1.69:53/UDP (active)
2 => 10.244.1.109:53/UDP (active)
4 10.96.0.10:9153/TCP ClusterIP 1 => 10.244.1.69:9153/TCP (active)
2 => 10.244.1.109:9153/TCP (active)
5 10.96.0.1:443/TCP ClusterIP 1 => 172.19.0.2:6443/TCP (active)
12 10.96.162.7:443/TCP ClusterIP 1 => 172.19.0.2:4244/TCP (active)
15 10.96.106.135:80/TCP ClusterIP 1 => 10.244.1.115:80/TCP (active)
16 10.96.55.103:80/TCP ClusterIP 1 => 10.244.1.67:80/TCP (active)
17 [::]:30965/TCP NodePort 1 => [::1]:14543/TCP (active)
19 [::]:30965/TCP/i NodePort 1 => [::1]:14543/TCP (active)
21 0.0.0.0:30965/TCP NodePort 1 => 127.0.0.1:14543/TCP (active)
23 0.0.0.0:30965/TCP/i NodePort 1 => 127.0.0.1:14543/TCP (active)
25 10.96.245.249:80/TCP ClusterIP 1 => 127.0.0.1:14543/TCP (active)
26 172.19.255.1:80/TCP LoadBalancer 1 => 127.0.0.1:14543/TCP (active)
27 172.19.255.1:80/TCP/i LoadBalancer 1 => 127.0.0.1:14543/TCP (active)
28 [fd00:10:96::d99f]:80/TCP ClusterIP 1 => [::1]:14543/TCP (active)
```
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
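The `<localhost-addr>:<proxy-port>/<proto> (active)` form used in the output above can be reproduced with a small formatting helper. This is only a sketch of the string format. `proxyBackend` is a hypothetical function and not the model generation code.

```go
package main

import (
	"fmt"
	"net/netip"
)

// proxyBackend renders a local L7 proxy redirect as a pseudo backend,
// using the loopback address of the frontend's address family.
func proxyBackend(ipv6 bool, proxyPort uint16, proto string) string {
	addr := netip.MustParseAddr("127.0.0.1")
	if ipv6 {
		addr = netip.IPv6Loopback()
	}
	return fmt.Sprintf("%s/%s (active)", netip.AddrPortFrom(addr, proxyPort), proto)
}

func main() {
	fmt.Println(proxyBackend(false, 14543, "TCP"))
	fmt.Println(proxyBackend(true, 14543, "TCP"))
}
```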
* datapath/ipcachelistener: use injected localnodestore
This commit refactors the datapath ipcache BPF listener to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.
Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* datapath/linuxnodehandler: retrieve node ips from localnodestore
This commit refactors the `linuxNodeHandler` to fetch the node ips
from the injected `LocalNodeStore` instead of using the global
functions `node.GetIP[v4/v6]`.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* identity/cache: use injected localnodestore
This commit refactors the identity cache to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.
Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* node/address: remove global functions `GetIP[v4/v6]`
This commit removes the unused global functions `GetIPv4` & `GetIPv6`.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* test: remove K8sDatapathBandwidthTest
The Ginkgo bandwidth test was already not running in CI. It only ran on
net-next kernels, but the focus group that included it
(f10-agent-hubble-bandwidth) was excluded from the net-next job.
Bandwidth enforcement is independently tested in
cilium-cli/connectivity/builder/network_bandwidth_limit.go.
Fixes: #44165
Related: #37837
Signed-off-by: Pavan More <pavansmore05@gmail.com>
* wireguard: remove cleanup code for old userspace devices
2578e8ff4dfa ("wireguard: remove deprecated userspace fallback") removed
the userspace mode in v1.17, we can now stop cleaning up any network device
created for that mode.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* loadbalancer: Check for equality and skip insert when not changed
This adds DeepEqual() implementations for Service and FrontendParams and uses
it to skip inserting the service and updating the frontend if nothing has changed.
As we now don't forcefully update the frontend the orphan detection has to actually
build a set to check if a frontend has been removed.
Benchmarks before:
Benchmark_UpsertServiceAndFrontends_100-6 3549 317691 ns/op 314771 objects/sec
BenchmarkInsertBackend-6 2818 423975 ns/op 235863 objects/sec
BenchmarkReplaceBackend-6 326682 3793 ns/op 263669 objects/sec
BenchmarkReplaceService-6 2327074 509.4 ns/op 1963230 objects/sec
After:
Benchmark_UpsertServiceAndFrontends_100-6 3464 331791 ns/op 301395 objects/sec
Benchmark_UpsertServiceAndFrontends_100_Unchanged-6 14652 81250 ns/op 1230766 objects/sec
BenchmarkInsertBackend-6 2956 401100 ns/op 249315 objects/sec
BenchmarkReplaceBackend-6 3402430 360.9 ns/op 2771038 objects/sec
BenchmarkReplaceService-6 2068555 556.6 ns/op 1796743 objects/sec
Signed-off-by: Jussi Maki <jussi@isovalent.com>
* loadbalancer: Remove dummy ingress endpoint workaround
Remove the skipping of the 192.192.192.192:9999 dummy ingress
endpoint as the next commit will remove the creation of it.
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* operator/helm: Remove creation of dummy ingress endpoint
With the new agent load-balancer implementation the dummy endpoint
is no longer necessary. Now that v1.18 has been released with that
implementation we can remove the creation of the dummy endpoint
without breaking upgrade.
Note: This commit removes the dummy endpoint in the Operator-
(Gateway API & dedicated Ingress) & Helm- (shared Ingress) managed
`EndpointSlice`s. The creation of the `EndpointSlice` itself will
be done in a separate PR.
Fixes: #19262
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* monitor: report 3rd argument in DBG_GENERIC debug monitor messages
Currently, in case cilium_dbg3(ctx, DBG_GENERIC, arg1, arg2, arg3) is used in
datapath code, the 3rd argument will not be reported in the monitor
message, e.g.:
bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac)
Also report the 3rd argument, so the monitor message will look as
follows:
bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac) arg3=170 (0xaa)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
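The message format shown in the commit above, with the third argument included, can be reproduced with a short formatting sketch. `genericDebugMessage` is a hypothetical helper and not the monitor code itself.

```go
package main

import "fmt"

// genericDebugMessage formats all three debug arguments, each in
// decimal followed by hexadecimal in parentheses.
func genericDebugMessage(arg1, arg2, arg3 uint32) string {
	return fmt.Sprintf("No message, arg1=%d (%#x) arg2=%d (%#x) arg3=%d (%#x)",
		arg1, arg1, arg2, arg2, arg3, arg3)
}

func main() {
	fmt.Println(genericDebugMessage(238, 34214060, 170))
}
```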
* policy: cleanup label selector parsing and validation
This commit cleans up the parsing and validation of policy label
selectors by removing the dependency on k8s labels library for
validation.
With this change we no longer convert cilium representation
of label keys with source prefixed using ':' delimiter to k8s specific
representation with source prefixed using '.' for validation.
This avoids the additional complexity in policy code to convert between
the two representations. It also simplifies marshalling of selectors as
the objects now have a unified format.
This is not a functional change and does not have any associated user
impact.
Signed-off-by: Deepesh Pathak <deepesh.pathak@isovalent.com>
* helm/ztunnel: bind health check to localhost
Security hardening for ztunnel running with hostNetwork: true:
Add host field to readiness probe to bind the health check port 15021
to 127.0.0.1 instead of 0.0.0.0. This reduces attack surface by ensuring
the health check endpoint is only accessible from localhost (kubelet
runs on same node).
Signed-off-by: Quang Nguyen <nguyenquang@microsoft.com>
* ci:wireguard: enable Host Firewall in native routing e2e tests
This enables Host Firewall in the `wireguard-3` config.
This helps us validate that host-related packets go through the host
firewall hook in `cilium_host` when WireGuard is enabled in native routing.
In overlay, even if we'd miss a redirect we'd see the packet in bpf_overlay,
which will then redirect the packet to bpf_host for HostFW validation.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* mcsapi: Add namespace filtering conditions to ServiceImport controller
Add namespace global status checking to the ServiceImport controller:
- Set Invalid condition on ServiceExport when namespace is not global
- Set NamespaceNotGlobal condition on ServiceImport when namespace is not global
- Signal service controller to skip/delete derived Service for non-global namespaces
by setting SupportedIPFamilies annotation to empty
This implements the MCS-API portion of the ClusterMesh global namespace
filtering feature as described in CFP-39876.
Signed-off-by: Jacques Massa <jac.massa0908@gmail.com>
* docs: split up network policy language page
Up until now, the page title had been "Layer 3 Examples", which is
a section headline and confusing, since it is among other examples.
Splitting up into several pages, similar to `network/kubernetes/`,
keeps the ToC as it is, and makes it easier to navigate compared to
the lengthy page it was, while also giving each page a suitable
headline.
Since examples are mixed with the language specification, change
headings from "Layer 3 Examples" to "Layer 3 Policies", etc.
Drop the old page and redirect to the overview to keep links working.
Signed-off-by: Daniel Maslowski <info@orangecms.org>
* golangci-lint: fix and simplify golangci-lint.sh
golangci-lint.sh would always rebuild the binary. The script was over-engineered
and didn't use the `golangci-lint version` subcommand, which doesn't need the
version string parsing logic. Simplify the script and fix the rebuild trigger.
Prepare the directory structure for splitting the main .golangci-lint.yaml in a
subsequent commit.
Signed-off-by: Timo Beckers <timo@isovalent.com>
* golangci-lint: split kubeapi configuration into separate file
The addition of kubeapilinter forced all users of the main .golangci.yaml to be
running a custom-built version of the tool and make sure it's kept in sync.
VSCode runs `golangci-lint run` in the repo by default and requires explicit
configuration to use another tool, like a script in the Cilium repository.
Before this PR, the script was broken and would always rebuild custom-gcl.
Breaking the default tooling is not great. This commit moves kubeapi-specific
configuration to a dedicated config file in tools/golangci-lint-kubeapi and
specifies it explicitly when starting the kubeapi linter.
Signed-off-by: Timo Beckers <timo@isovalent.com>
* policy: Import the ClusterNetworkPolicy v1alpha2 API go modules
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add parser function for Cluster Network Policy
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add flags to enable or disable Cluster Network Policy
Disabled by default. A new Makefile target is added that enables it in kind clusters.
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add watcher for Cluster Network Policy
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Move NetworkPolicy and CNP/CCNP to the 'Normal' tier
Signed-off-by: Blaz Zupan <blaz@google.com>
* CODEOWNERS: re-assign operator/identitygc, adjust endpoint team description
Assign operator/identitygc to @cilium/sig-policy instead of
@cilium/endpoint based on Joe's feedback. This seems like the more
appropriate team based on the description.
Also update the description for the @cilium/endpoint team to cover
responsibilities for cluster-wide/k8s considerations and endpoint
lifecycle.
Follow-up to commit 11376d5c715b ("CODEOWNERS: add more specific owners
for operator subsystems").
Ref. https://github.com/cilium/cilium/pull/44279#pullrequestreview-3780398303
Suggested-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* nodemap: converted net.IP to netip.Addr, Part of #24246
- Converted net.IP to netip.Addr in node map functions
- Added validation for zero netip.Addr values in newNodeKey
- Wrapped parse errors with %w for better error context
Signed-off-by: Sanjeevliv <sanjeevsethilive@gmail.com>
* bpf/tests: fix byte ordering for TCP seq/win values
The pktgen logic for BPF tests sets default values on TCP packets in
host byte order only. This feeds into the TCP checksum calculation
and while these inputs are technically incorrect, tests that assert
the TCP checksum are correctly testing the invalid output values.
With the introduction of scapy packet definitions, matching these values
to existing tests is difficult because scapy does convert seq and win
values into network byte order. The effect is that L4 sequence numbers
between pktgen and scapy are wrong.
This commit updates pktgen and existing tests that assert these values
to use the appropriate byte conversion routines so that both pktgen
and scapy definitions produce the same results.
This causes all affected BPF tests to fail. This will be addressed
in the next commit.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
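The byte-order mismatch described in the commit above can be illustrated with a short Python sketch. It is not taken from pktgen or scapy: the field values and the helper names `pack_seq_win` and `internet_checksum` are assumptions made for illustration, and a little-endian host is assumed.

```python
import struct


def pack_seq_win(seq: int, win: int, network_order: bool) -> bytes:
    """Pack a 32-bit sequence number and a 16-bit window.

    "<" stands in for a little-endian host; "!" is network (big-endian) order.
    """
    fmt = "!IH" if network_order else "<IH"
    return struct.pack(fmt, seq, win)


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


SEQ, WIN = 123456, 4096  # illustrative values only
host_bytes = pack_seq_win(SEQ, WIN, network_order=False)
net_bytes = pack_seq_win(SEQ, WIN, network_order=True)

# The same logical fields produce different wire bytes, and therefore
# different checksum inputs, depending on the byte order used.
print(host_bytes.hex(), net_bytes.hex())
print(hex(internet_checksum(host_bytes)), hex(internet_checksum(net_bytes)))
```

Because the checksum is computed over the wire bytes, a test that hardcodes a checksum derived from host-order fields will not match a packet built with network-order fields, which is why the assertions had to be updated.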
* bpf/tests: fix TCP checksum assertions in all tests
This commit updates every test that asserts TCP checksums to reflect new
values following the change to the default TCP seq/win values, such that
the assertions for this should succeed with a scapy packet definition.
As part of this change, test failure logs have been updated to correctly
show the value observed in a packet, and the expected value, both in network
byte order. This is to simplify future debugging. This has also been
applied to UDP checksum assertions for consistency.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* bpf/tests: fix default_data definition for scapy tests
The pktgen code defines default_data macro as a string literal which
includes a NUL-terminating character. The size of this string is 20
bytes.
The scapy code defines the same value without the NUL-terminating
character. The size of this string is 19 bytes.
As a result, when converting a test from pktgen to scapy, the packet
length changes, which causes assertion failures because the change leads
to different L3 checksum values.
This commit modifies the scapy default_data variable to include a NUL
terminating character to match pktgen definitions. It is anticipated
this can be removed once all BPF tests are migrated to scapy.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
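The one-byte difference described in the commit above can be reproduced in a few lines of Python. This is only a sketch: `PAYLOAD` is a placeholder 19-byte string chosen for illustration, not a quote of the real `default_data` value.

```python
# Placeholder payload, 19 bytes long (an assumption for illustration).
PAYLOAD = b"placeholder-payload"

# A C string literal carries an implicit trailing NUL, so sizeof() on it
# reports one byte more than the visible characters.
c_literal_size = len(PAYLOAD + b"\x00")

# A Python bytes object (as used in a scapy definition) has no terminator.
scapy_size = len(PAYLOAD)

print(c_literal_size, scapy_size)
```

Appending `b"\x00"` on the Python side is the equivalent of the adjustment the commit describes: both definitions then yield the same payload length and therefore the same checksum inputs.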
* endpoint, fqdn: remove restoration of deprecated V1 DNSRules
Now that 1.16 is EOL, the code restoring PortProto V1 (without protocol)
can be removed.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* endpoint: rename DNS rules field
The V2 suffix is unnecessary now that we stop restoring V1 rules and
there is only a single supported version/field.
Also add a test that verifies that V2 rules are restored and V1 rules
are ignored while retaining backwards compatibility.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* tests: Ignore identity manager related error in versions < 1.18
Due to https://github.com/cilium/cilium/pull/42661 and
https://github.com/cilium/cilium/pull/42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.
The error was removed from the ignore list in
https://github.com/cilium/cilium/pull/42982.
Suggested-by: Marco Iorio <marco.iorio@isovalent.com>
Suggested-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
* metrics: remove agent bootstrap metrics
This commit removes the deprecated agent bootstrap metrics.
Hive job metrics can be used to inspect the duration of the legacy
daemon initialization job. In special cases it might be worth
introducing context-specific metrics, similar to what has been
introduced for the endpoint restoration logic.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* policy: fix policy tests
This fixes a policy break due to how label source is handled that
recently changed.
Fixes: 270daa4067 ("policy: Add parser function for Cluster Network Policy")
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
* resource/test: let TestResource_WithFakeClient set resource version
Update the TestResource_WithFakeClient test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* cid/test: let TestUpdatePodLabels set resource version
Update the TestUpdatePodLabels test to correctly specify the expected
resource version during updates, in preparation for extending the fake
client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* ipam/test: let Test_LocalNodeCIDRsSyncer set resource version
Update the Test_LocalNodeCIDRsSyncer test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* bgp/test: correctly set resource version when updating test resources
Update the bgp tests to correctly specify the expected resource version
during updates, in preparation for extending the fake client to actually
enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* test/controlplane: adaptation for optimistic concurrency control
Update the UpdateObjects helper to use the [ObjectTracker.Patch],
instead of [ObjectTracker.Update], in preparation for the subsequent
commit that will make the latter implement optimistic concurrency
control, and validate resource version mismatches, which is not
required in this context.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* k8s/client/fake: fix resource version configuration in tracker
Currently, the object tracker is affected by a bug that causes the
resource version to not be set on creation or update if the object
does not have the [metav1.TypeMeta] set. Indeed, in that case, the
function updating the TypeMeta creates a deep copy of the object,
causing operations performed via [meta.Accessor] to act on the old
copy, and not have effect. Let's get this fixed by changing the
[fillTypeMetaIfNeeded] function to not create a deep copy, given
that it already operates on a copy of the original object.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
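The class of bug described in the commit above (a helper that silently returns a copy while the caller keeps mutating the original) can be sketched in Python. This is a hypothetical model, not the Go tracker code: the dictionaries and the functions `fill_type_meta_buggy`, `fill_type_meta_fixed` and `store` are assumptions made for illustration.

```python
import copy


def fill_type_meta_buggy(obj: dict) -> dict:
    """Returns a deep copy when the kind is missing (the problematic path)."""
    if "kind" not in obj:
        obj = copy.deepcopy(obj)
        obj["kind"] = "Pod"
    return obj


def fill_type_meta_fixed(obj: dict) -> dict:
    """Mutates the object in place; the caller already passed in a copy."""
    if "kind" not in obj:
        obj["kind"] = "Pod"
    return obj


def store(obj: dict, fill) -> dict:
    accessor = obj  # handle taken before filling, akin to an accessor
    stored = fill(obj)
    accessor["resourceVersion"] = "1"  # lost if `stored` is a different object
    return stored


print(store({"name": "a"}, fill_type_meta_buggy))
print(store({"name": "a"}, fill_type_meta_fixed))
```

In the buggy variant the resource version is written to the object that is no longer stored, which mirrors the "operations act on the old copy" symptom.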
* k8s/client/fake: let update operations respect resource versioning
Currently, the statedb object tracker backing the fake kubernetes client
used for testing purposes does not respect resource versioning, and allows
update operations to succeed regardless of the provided resource version.
While convenient for the `k8s/update` command itself, this approach is
problematic in case of controllers acting on the same resources, as it
can lead to objects being unexpectedly reverted to incorrect versions,
due to the missing optimistic concurrency control.
Let's get this fixed by extending the update implementation to additionally
compare the resource version of the stored and provided objects, and reject
the update in case they do not match, as the real Kubernetes API Server
would do. By default, the k8s/update command still ignores the provided
resource version, letting the update succeed regardless: this matches the
desired behavior in the vast majority of the tests, and avoids the need
for complex operations to set the expected resource version. Still, if
necessary, the stricter behavior can be enabled via the dedicated flag.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
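The optimistic concurrency check described in the commit above can be sketched as follows. This is a simplified Python model, not the statedb-backed Go tracker: `FakeTracker`, `ConflictError` and the `enforce_resource_version` parameter are hypothetical names, and integer resource versions are an assumption.

```python
class ConflictError(Exception):
    """Raised when an update carries a stale resource version."""


class FakeTracker:
    def __init__(self, enforce_resource_version: bool = False):
        self._objects = {}
        self._enforce = enforce_resource_version

    def create(self, name: str, spec: str) -> dict:
        obj = {"name": name, "spec": spec, "resourceVersion": 1}
        self._objects[name] = obj
        return dict(obj)

    def update(self, obj: dict) -> dict:
        stored = self._objects[obj["name"]]
        # Reject the write if the caller did not observe the latest version.
        if self._enforce and obj["resourceVersion"] != stored["resourceVersion"]:
            raise ConflictError(
                f"{obj['name']}: have {obj['resourceVersion']}, "
                f"stored {stored['resourceVersion']}"
            )
        new = dict(obj, resourceVersion=stored["resourceVersion"] + 1)
        self._objects[obj["name"]] = new
        return dict(new)


tracker = FakeTracker(enforce_resource_version=True)
first = tracker.create("pod-a", "v1")
tracker.update(dict(first, spec="v2"))  # succeeds, version becomes 2
try:
    tracker.update(dict(first, spec="stale"))  # still carries version 1
except ConflictError as err:
    print("rejected:", err)
```

With the default `enforce_resource_version=False`, the second update would succeed, which corresponds to the lenient default the commit keeps for most tests.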
* chore(deps): update base-images to v1.26.0
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* images: update cilium-{runtime,builder}
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* chore(deps): update cilium/cilium-cli action to v0.19.1
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.6.8
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all lvh-images main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all-dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.gi…
* chore(deps): update module sigs.k8s.io/kube-api-linter to v0.0.0-20260206102632-39e3d06a2850
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update base-images
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* images: update cilium-{runtime,builder}
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* loadbalancer/maps: let SourceRangeKey.GetCIDR return netip.Prefix
Its only caller (apart from use in log strings) directly converts the
result to a netip.Prefix. Rather than first constructing a *cidr.CIDR
only to then convert it to netip.Prefix, construct a netip.Prefix right
away. Also rename the method accordingly.
This slightly improves pkg/loadbalancer/benchmark results:
Before:
Memory statistics from N=10 iterations:
Min: Allocated 490378kB in total, 2471963 objects / 140742kB still reachable (per service: 49 objs, 10042B alloc, 2882B in-use)
Avg: Allocated 501584kB in total, 2957195 objects / 177049kB still reachable (per service: 59 objs, 10272B alloc, 3625B in-use)
Max: Allocated 543366kB in total, 3722287 objects / 227653kB still reachable (per service: 74 objs, 11128B alloc, 4662B in-use)
After:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service: 45 objs, 10018B alloc, 2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service: 57 objs, 10256B alloc, 3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service: 74 objs, 11121B alloc, 4660B in-use)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* loadbalancer/maps: let ServiceKey.GetAddress return netip.Addr
This simplifies call sites and avoids an unnecessary conversion in
the reconciler's (*BPFOps).pruneServiceMaps method.
pkg/loadbalancer/benchmark results:
Before:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service: 45 objs, 10018B alloc, 2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service: 57 objs, 10256B alloc, 3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service: 74 objs, 11121B alloc, 4660B in-use)
After:
Memory statistics from N=10 iterations:
Min: Allocated 487023kB in total, 1944085 objects / 99675kB still reachable (per service: 38 objs, 9974B alloc, 2041B in-use)
Avg: Allocated 498657kB in total, 2813561 objects / 166829kB still reachable (per service: 56 objs, 10212B alloc, 3416B in-use)
Max: Allocated 542794kB in total, 3722246 objects / 227646kB still reachable (per service: 74 objs, 11116B alloc, 4662B in-use)
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* loadbalancer/maps: remove unused SkipLBMap delete methods
The DeleteLB{4,6}By* methods don't have callers anymore since
commit 6fa7f8129d1a ("loadbalancer/legacy: Remove the old
control-plane"). Remove them.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* fix(deps): update all go dependencies main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* docs: Update docsearch to v4.5.4
Pull in the latest theme with newer docsearch plugin version.
Signed-off-by: Joe Stringer <joe@cilium.io>
* ci: update docs-builder
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* Use binary.NativeEndian instead of nl.NativeEndian
Use the NativeEndian native-endian var provided by the Go standard
library encoding/binary package instead of the version from the
netlink/nl package.
While at it, also check the length of the handle returned by
unix.NameToHandleAt before accessing it.
Follow-up to commit 2c1b49cac70b ("byteorder: use binary.NativeEndian")
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* datapath: fix panic during datapath reinitialization
This commit fixes a cilium-agent panic during datapath reinitialization
when a DirectRouting device is required but not configured.
This can happen when the direct routing device drops, for example during
networkd restart.
```
time=2026-01-14T07:39:46.444386888Z level=info msg="Devices changed" module=agent.datapath.devices-controller devices=[]
time=2026-01-14T07:39:46.444654289Z level=info msg="Fallback node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.44474159Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.444833191Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="" device=eth0
panic: runtime error: index out of range [3] with length 0
goroutine 415 [running]:
github.com/cilium/cilium/pkg/byteorder.NetIPv4ToHost32({0x0?, 0xc000e9e5d0?, 0x49e07bb?})
/go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:15 +0x65
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0xc0004280e0, {0x7ff1517a6ba8, 0xc0023ec400}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:150 +0xa4b
github.com/cilium/cilium/pkg/datapath/loader.hashDatapath({0x50fbfb0, 0xc0004280e0}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/hash.go:20 +0x9e
github.com/cilium/cilium/pkg/datapath/loader.(*objectCache).UpdateDatapathHash(0xc001d027d0, 0xc001422870?)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/cache.go:62 +0x4d
github.com/cilium/cilium/pkg/datapath/loader.(*loader).Reinitialize(0xc002573580, {0x50f9c98, 0xc0008474d0}, 0xc001d00508, {{0x49d15b6, 0x4}, {0x0, 0x0}, 0x0, 0x0, ...}, ...)
/go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:377 +0x3c8
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reinitialize(0xc001d36288, {0x50f9c98?, 0xc0008474d0?}, {{0x0?, 0x0?}, 0x0?}, 0xc001d00508)
/go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:275 +0x110
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reconciler(0xc001d36288, {0x50f9c98, 0xc0008474d0}, {0x5104260, 0xc002feafc0})
/go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:219 +0x6fd
github.com/cilium/hive/job.(*jobOneShot).start(0xc002082e40, {0x50f9c98, 0xc0008474d0}, 0xc00143dce4?, {0x5104260, 0xc002082de0}, {{{0x0, 0x0, 0x0}}, 0xc001791770, ...})
/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*queuedJob).Start.func1 in goroutine 1
/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:126 +0x16f
```
With the change in this commit, when a direct routing device is not found,
the datapath orchestrator logs a warning and waits for device updates in
the reconciliation loop, skipping reinitialization.
Fixes 8fae439710fd4b426beae9190957d2380a96bed6 ("datapath: move DirectRoutingDevice validation to orchestrator")
Signed-off-by: Deepesh Pathak <deepeshpathak09@gmail.com>
* datapath/loader: Add netkit to BPF load tests
Commit 854473726b ("bpf: Workaround for netkit + L7 policy redirect failure")
introduced the enable_netkit load-time config variable.
This commit introduces that variable to BPF loader permutation testing
for bpf_lxc.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* docs: add netkit requirement to kernel version list
Add Linux kernel requirement for netkit to the System Requirements.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* style(bpf/test): fix indentation
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): move `icmp_wsum_accumulate` helper
This helper is shared by ICMPv6 and ICMPv4 and will be imported by both
in future refactors.
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): move ICMPv6 packet generation to a separate file
The goal is to make these functions reusable for the ICMPv6 policy
denial feature, so move them to a shared icmp6.h file.
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* refactor(bpf): reduce ifdef number
Signed-off-by: Andrea Terzolo <andreaterzolo3@gmail.com>
* gateway-api: Update conformance test Make target
This updates the Gateway API conformance test Make target
to make it not require any extra setup when run as part
of local development with Kind, and adds the ability
to set which tests to run, to allow for focussed
conformance test runs.
Signed-off-by: Nick Young <nick@isovalent.com>
* bpf: introduce DECLARE_CONFIG_KIND
DECLARE_CONFIG and NODE_CONFIG only differ in the value of their
respective kind: tag. Avoid code duplication by moving the common parts
to a separate DECLARE_CONFIG_KIND macro and use it to define
DECLARE_CONFIG and NODE_CONFIG. This also allows easier downstream use
of these macros.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* bpf: wire events map rate limits through node config
Move events map rate/burst limits into node config and read them via
CONFIG(events_map_{rate,burst}_limit) in BPF helpers. This drops the
compile-time EVENTS_MAP* defines from the header writer and cleans up
the legacy defaults in bpf/node_config.h.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* sysdump: Use label selectors for Hubble UI/Relay deployment collection
Previously, sysdump collected Hubble UI and Hubble Relay deployments
using hardcoded deployment names ('hubble-ui', 'hubble-relay'). This
caused the collection to fail when users deployed Hubble components
with custom names (e.g., 'hubble-ui-blue'), even when the correct
labels were provided via --hubble-ui-labels or --hubble-relay-labels.
This change updates the deployment collection to use ListDeployment
with the configured label selectors instead of GetDeployment with
hardcoded names. This makes deployment collection consistent with
pod log collection, which already uses label selectors.
Fixes the issue where:
cilium sysdump --hubble-ui-labels k8s-app=hubble-ui-blue
would still fail with 'Deployment hubble-ui not found'.
Signed-off-by: darox <maderdario@gmail.com>
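The difference between looking up a deployment by a fixed name and listing by label selector, as described in the commit above, can be sketched in Python. This is a hypothetical in-memory model, not the sysdump or Kubernetes client code: `parse_selector`, `list_deployments` and the sample data are assumptions, and only equality-based selectors are handled.

```python
def parse_selector(selector: str) -> dict:
    """Parse "k=v,k2=v2" into a dict (equality-based selectors only)."""
    return dict(pair.split("=", 1) for pair in selector.split(",") if pair)


def list_deployments(deployments: list, selector: str) -> list:
    """Return the names of deployments whose labels match the selector."""
    wanted = parse_selector(selector)
    return [
        d["name"]
        for d in deployments
        if all(d["labels"].get(k) == v for k, v in wanted.items())
    ]


DEPLOYMENTS = [
    {"name": "hubble-ui-blue", "labels": {"k8s-app": "hubble-ui-blue"}},
    {"name": "hubble-relay", "labels": {"k8s-app": "hubble-relay"}},
]

# A lookup by the hardcoded name "hubble-ui" would find nothing here,
# while the label selector finds the custom-named deployment.
print(list_deployments(DEPLOYMENTS, "k8s-app=hubble-ui-blue"))
```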
* bpf: lxc: remove unnecessary L3 validation
There's no code that uses the IPv4 header afterwards.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf: lxc: fine-tune BPF Host Routing path
Using fib_ok() to evaluate the result of fib_redirect_v*() is a bit
awkward. We're in TC context, so we know that CTX_ACT_TX doesn't need to
be handled (and would most likely lead to a packet loop).
By structuring the code as a switch() statement we can also clean up one
of the goto paths.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf: xdp: prefer CTX_ACT_TX over XDP_TX
Return the generic value, so that readers understand what macro they should
be using when handling the result.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf, nat46x64: move RFC6052 prefix into node config
This commit moves NAT 46x64 RFC6052 prefix bytes into the node
configuration so BPF programs consume CONFIG(nat_46x64_prefix) value
instead of C header defines.
The Go datapath now populates this field from the NAT46x64 config, and
the headerfile writer no longer emits NAT_46X64_PREFIX_ defines.
Updates included:
- BPF nat_46x64 helpers switched to CONFIG(nat_46x64_prefix).
- Node config population moved to runtime config and legacy defines
dropped.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* neighbor: Fix description for L2 neighbor discovery
The flag is not used by IPsec. L2 neighbor discovery is enabled whenever
XDP is enabled. This flag allows users to enable it even if XDP is
disabled, so let's state that instead of mentioning IPsec.
Co-authored-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
* CODEOWNERS: add more specific owners for operator subsystems
Some of the operator's subsystems are currently only covered by the
catch-all rule assigning @cilium/operator. Instead, assign more specific
teams for certain subsystems which require more in-depth knowledge in
these particular areas.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* .github/workflows: eks-cluster-pool-manager: fix race condition and cleanup leaked IAM roles
When multiple parallel jobs generate cluster names within the same
second, they can produce identical names since the timestamp has only
1-second precision. This causes CloudFormation stack creation to fail
with "AlreadyExistsException", leaving orphaned IAM roles behind.
This commit adds a random suffix to cluster names to prevent race conditions
and enhances the failure cleanup step to delete CloudFormation stacks and orphaned
IAM roles when cluster creation fails.
Signed-off-by: André Martins <andre@cilium.io>
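The naming collision described in the commit above, and the random-suffix remedy, can be sketched in Python. The real workflow is a GitHub Actions job, so this is only an illustration: `cluster_name`, the `pool` prefix and the suffix length are assumptions.

```python
import secrets
import time


def cluster_name(prefix: str, now: float = None, suffix_bytes: int = 4) -> str:
    """Build "<prefix>-<unix-seconds>-<random-hex>".

    The timestamp alone has 1-second precision, so two jobs starting in the
    same second would collide; the random suffix makes that very unlikely.
    """
    ts = int(time.time() if now is None else now)
    return f"{prefix}-{ts}-{secrets.token_hex(suffix_bytes)}"


# Two jobs starting within the same second share a timestamp and differ
# only in their random suffix.
print(cluster_name("pool", now=1_700_000_000))
print(cluster_name("pool", now=1_700_000_000))
```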
* hubble: Fix typos in config/set.go
Signed-off-by: harshitghagre <harshitghagre183@gmail.com>
* test/helpers: ignore error creating lease lock message
This message was modified in k8s 1.35.0, therefore we should update the
list of messages that can be ignored in our CI.
Signed-off-by: André Martins <andre@cilium.io>
* Fix backend slot index mismatch in LB reconciler
Use slotID instead of loop index when setting backend slots to avoid
gaps when maintenance backends are skipped.
Signed-off-by: Aman-Cool <aman017102007@gmail.com>
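The slot gap described in the commit above can be reproduced with a small Python sketch. This models the idea only, not the reconciler code: the backend dictionaries, the function names, and the choice to start slot numbering at 1 are assumptions.

```python
def assign_slots_buggy(backends: list) -> dict:
    """Uses the loop index as the slot number, leaving gaps on skips."""
    slots = {}
    for i, backend in enumerate(backends, start=1):
        if backend["maintenance"]:
            continue
        slots[i] = backend["addr"]
    return slots


def assign_slots_fixed(backends: list) -> dict:
    """Advances a separate slot counter only when a backend is written."""
    slots = {}
    slot_id = 1
    for backend in backends:
        if backend["maintenance"]:
            continue
        slots[slot_id] = backend["addr"]
        slot_id += 1
    return slots


BACKENDS = [
    {"addr": "10.0.0.1", "maintenance": False},
    {"addr": "10.0.0.2", "maintenance": True},
    {"addr": "10.0.0.3", "maintenance": False},
]

print(assign_slots_buggy(BACKENDS))  # slot 2 is never written
print(assign_slots_fixed(BACKENDS))  # slots are contiguous
```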
* vendor: Bump to StateDB v0.6.3
This restores the ability to use Changes() against a WriteTxn that is
targeting the same table as Changes().
Signed-off-by: Jussi Maki <jussi@isovalent.com>
* docs: Fix upgrade note category for tproxy
There's no new option here, it's a change in behaviour. Fix the category
for the upgrade note.
CC: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Fixes: c257b3d0aca8 ("datapath/connector: do not support netkit and bpf.tproxy=true")
Signed-off-by: Joe Stringer <joe@cilium.io>
* policy: Fix PASS verdict for non-consecutive tiers
Signed-off-by: Blaz Zupan <blaz@google.com>
* loadbalancer/healthserver: refresh ProxyRedirect per request
This commit fixes stale ProxyRedirect reads in the health server by reloading Service
state from the services table on each request. This prevents incorrect
local endpoint counts when Envoy redirect state changes after the
listener is created (which is the case).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* localnodestore: provide WaitForNodeInformation
This commit moves global function `k8s.WaitForNodeInformation`
into the `localNodeSynchronizer`. To prevent dependencies on the
synchronizer, the `LocalNodeStore` re-exposes the same method and
delegates the call to the synchronizer.
This way, the legacy daemon initialization logic no longer needs to
depend on the (k8s/cilium)-node resources.
This step isn't final. Eventually, the wait logic should watch
the state db "local node" for the changes.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* localnodestore: pass localnodestore to synchronizer
With this commit, the `LocalNodeStore` passes a pointer
to itself to the synchronizer when calling `WaitForNodeInformation`.
This way, the synchronizer can update the ip allocation ranges without
using the global functions.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* ci: e2e: add `kernel` to workflow job names
As a follow-up of changes introduced in https://github.com/cilium/cilium/pull/44126
to improve the readability of workflow job names in the GitHub UI, this
commit adds the `kernel` parameter to the name too. In OpenSearch,
it would be good to differentiate these jobs by the used kernel.
Given the kernel is a parameter that may vary frequently, it is good to be
able to differentiate runs that used the same configuration but different
kernels. That's also good in case of regressions when changing kernel.
The result will be `Setup & Test (ipsec-1, minor, 5.10)`.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* pkg/datapath/bandwidth: optimize host endpoint QoS setup
The bandwidth manager was calling node.GetEndpointID() and performing
a table lookup on every UpdateBandwidthLimit() call, even though the
host endpoint QoS only needs to be set up once.
This commit introduces an atomic boolean flag that tracks whether the
host endpoint QoS has been configured. After the initial setup:
- Skip the GetEndpointID() call (fast path)
- Skip the Table.Get() call entirely
- Skip the redundant Insert() call for host endpoint
Additionally, based on review feedback:
- Change node.GetEndpointID() to return (uint64, bool) to indicate
whether the host endpoint ID has been set, avoiding duplicate
constants across packages
- Change node.endpointID to atomic.Uint64 for thread-safe access
during initialization
Signed-off-by: Anand Kumar Shaw <anandkrshawheritage@gmail.com>
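The "set up once, then take the fast path" pattern described in the commit above can be sketched in Python. This is not the Go bandwidth manager: the `BandwidthManager` class, its fields, and the use of `threading.Event` as a stand-in for an atomic boolean are assumptions made for illustration.

```python
import threading


class BandwidthManager:
    def __init__(self, get_host_endpoint_id):
        # Callable returning (endpoint_id, ok); ok is False until the ID is set.
        self._get_host_endpoint_id = get_host_endpoint_id
        self._host_qos_done = threading.Event()  # thread-safe one-way flag
        self.lookups = 0
        self.table = {}

    def update_bandwidth_limit(self, endpoint_id: int, limit: int) -> None:
        self.table[endpoint_id] = limit
        if self._host_qos_done.is_set():
            return  # fast path: host endpoint QoS already configured
        self.lookups += 1
        host_id, ok = self._get_host_endpoint_id()
        if not ok:
            return  # host endpoint ID not known yet, retry on the next call
        self.table.setdefault(host_id, 0)
        self._host_qos_done.set()


manager = BandwidthManager(lambda: (1, True))
for i in range(5):
    manager.update_bandwidth_limit(100 + i, 10)
print(manager.lookups)
```

Returning an `(id, ok)` pair instead of a sentinel value mirrors the review feedback in the commit: callers do not need to share a "not set" constant.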
* clustermesh: fix a few misc issues with MCS-API doc
This commit fixes two issues with the MCS-API doc:
- a simple typo in the enabled keyword
- change the code-block to a parsed-literal, as the |SCM_WEB| "variable"
was not evaluated/replaced in the final doc within a code-block
Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr>
* Docs: improve docs around ipsec upgrade in 1.18
Signed-off-by: darox <maderdario@gmail.com>
* docs(ztunnel): fix duplicate word (a set)
Signed-off-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
* docs(ztunnel): add missing backslash
add missing backslash for install with Cilium CLI
Signed-off-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
* clustermesh: helm: remove clustermesh.enableMCSAPISupport
This option was deprecated and its removal was announced for Cilium
1.20, so let's drop it now that we are in the 1.20 cycle!
Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr>
* daemon: enforce iptable rules are present when node-port is enabled
Moving forward the node-port code path will rely on iptables to
identify locally generated traffic and create NAT reservations for this
traffic in order to avoid reservation conflicts.
We will consider it a fatal error moving forward to disable iptable rule
installation while node-port or eBPF masquerading is enabled.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* bpf,nodeport: generalize SNAT conflict detection
Prior to this commit the nodeport SNAT conflict detection assumed host
traffic was sourced from the direct routing interface's IP and always
egressed on this device.
Remove this assumption by detecting locally generated traffic from any
egress interface 'bpf_host' is attached to.
This removes the dependency on the direct routing interface in the
node-port path.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* ztunnel: introduce end to end connectivity tests
The test infrastructure deploys three dedicated namespaces with client
and echo-server pods in each: two enrolled namespaces for testing
cross-namespace mTLS scenarios and one unenrolled namespace for baseline
verification. Pod affinity rules ensure echo-same-node pods co-locate
with clients while echo-other-node pods schedule on different nodes,
enabling both intra-node and inter-node traffic validation.
The test scenarios verify ztunnel mTLS behavior by dynamically labeling
namespaces with the io.cilium/mtls-enabled label during test execution.
For enrolled pod pairs, the tests assert that traffic flows through port
15008 (the ztunnel HBONE proxy) and that no unencrypted traffic appears
on port 8080. For unenrolled pods, the assertions are inverted to confirm
traffic bypasses ztunnel entirely. Cross-namespace scenarios confirm that
mTLS works correctly when client and server reside in separately enrolled
namespaces.
Packet capture validation runs from host network pods using tcpdump,
which is required since ztunnel operates in the host network namespace.
The tests query the ztunnel admin API to verify workload registration
before generating traffic, ensuring that any state the test depends on
has converged before making assertions.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
Signed-off-by: Quang Nguyen <nguyenquang@microsoft.com>
Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
* ci,ztunnel: add workflows for ztunnel encryption tests
Add a new GitHub Actions workflow to run end-to-end tests for ztunnel
encryption in Cilium.
The new /ci-ztunnel-e2e trigger is added to the Ariane configuration,
pointing to the newly created conformance-ztunnel-e2e.yaml workflow
file.
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* ci,ztunnel: add ztunnel cert script to actions
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
* datapath: remove GetRoutePostEncryptMTU()
The last user was removed by
f81d8964ce4a ("ipsec: don't reduce post-encrypt MTU for tunnel overhead").
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* datapath: ipsec: remove clean up code for encrypt IP rule
https://github.com/cilium/cilium/pull/41699 stopped installing this IP
rule in the v1.19 release. With v1.20 we can now stop cleaning it up.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* loadbalancer/api: include proxy-redirect as backend
Currently, listing services via `cilium-dbg service list` displays
no backends for the services that use L7 proxy redirection (e.g.
Gateway API / Ingress loadbalancer service).
```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID Frontend Service Type Backend
1 10.96.62.28:80/TCP ClusterIP 1 => 10.244.1.149:4245/TCP (active)
2 10.96.0.10:53/TCP ClusterIP 1 => 10.244.1.69:53/TCP (active)
2 => 10.244.1.109:53/TCP (active)
3 10.96.0.10:53/UDP ClusterIP 1 => 10.244.1.69:53/UDP (active)
2 => 10.244.1.109:53/UDP (active)
4 10.96.0.10:9153/TCP ClusterIP 1 => 10.244.1.69:9153/TCP (active)
2 => 10.244.1.109:9153/TCP (active)
5 10.96.0.1:443/TCP ClusterIP 1 => 172.19.0.2:6443/TCP (active)
12 10.96.162.7:443/TCP ClusterIP 1 => 172.19.0.2:4244/TCP (active)
15 10.96.106.135:80/TCP ClusterIP 1 => 10.244.1.115:80/TCP (active)
16 10.96.55.103:80/TCP ClusterIP 1 => 10.244.1.67:80/TCP (active)
17 [::]:30965/TCP NodePort
19 [::]:30965/TCP/i NodePort
21 0.0.0.0:30965/TCP NodePort
23 0.0.0.0:30965/TCP/i NodePort
25 10.96.245.249:80/TCP ClusterIP
26 172.19.255.1:80/TCP LoadBalancer
27 172.19.255.1:80/TCP/i LoadBalancer
28 [fd00:10:96::d99f]:80/TCP ClusterIP
```
Even though the actual redirection happens via BPF/iptables TPROXY, it would be
nice to display the information about the local redirection to the node-local
L7 proxy (Envoy) in some form.
Therefore, this commit changes the frontend model generation to treat the
proxy redirection as backend of the service frontend in the form
`<localhost-addr>:<proxy-port>/<proto> (active)`
Result
```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID Frontend Service Type Backend
1 10.96.62.28:80/TCP ClusterIP 1 => 10.244.1.149:4245/TCP (active)
2 10.96.0.10:53/TCP ClusterIP 1 => 10.244.1.69:53/TCP (active)
2 => 10.244.1.109:53/TCP (active)
3 10.96.0.10:53/UDP ClusterIP 1 => 10.244.1.69:53/UDP (active)
2 => 10.244.1.109:53/UDP (active)
4 10.96.0.10:9153/TCP ClusterIP 1 => 10.244.1.69:9153/TCP (active)
2 => 10.244.1.109:9153/TCP (active)
5 10.96.0.1:443/TCP ClusterIP 1 => 172.19.0.2:6443/TCP (active)
12 10.96.162.7:443/TCP ClusterIP 1 => 172.19.0.2:4244/TCP (active)
15 10.96.106.135:80/TCP ClusterIP 1 => 10.244.1.115:80/TCP (active)
16 10.96.55.103:80/TCP ClusterIP 1 => 10.244.1.67:80/TCP (active)
17 [::]:30965/TCP NodePort 1 => [::1]:14543/TCP (active)
19 [::]:30965/TCP/i NodePort 1 => [::1]:14543/TCP (active)
21 0.0.0.0:30965/TCP NodePort 1 => 127.0.0.1:14543/TCP (active)
23 0.0.0.0:30965/TCP/i NodePort 1 => 127.0.0.1:14543/TCP (active)
25 10.96.245.249:80/TCP ClusterIP 1 => 127.0.0.1:14543/TCP (active)
26 172.19.255.1:80/TCP LoadBalancer 1 => 127.0.0.1:14543/TCP (active)
27 172.19.255.1:80/TCP/i LoadBalancer 1 => 127.0.0.1:14543/TCP (active)
28 [fd00:10:96::d99f]:80/TCP ClusterIP 1 => [::1]:14543/TCP (active)
```
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* datapath/ipcachelistener: use injected localnodestore
This commit refactors the datapath ipcache BPF listener to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.
Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* datapath/linuxnodehandler: retrieve node ips from localnodestore
This commit refactors the `linuxNodeHandler` to fetch the node ips
from the injected `LocalNodeStore` instead of using the global
functions `node.GetIP[v4/v6]`.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* identity/cache: use injected localnodestore
This commit refactors the identity cache to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.
Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* node/address: remove global functions `GetIP[v4/v6]`
This commit removes the unused global functions `GetIPv4` & `GetIPv6`.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* test: remove K8sDatapathBandwidthTest
The Ginkgo bandwidth test was already not running in CI. It only ran on
net-next kernels, but the focus group that included it
(f10-agent-hubble-bandwidth) was excluded from the net-next job.
Bandwidth enforcement is independently tested in
cilium-cli/connectivity/builder/network_bandwidth_limit.go.
Fixes: #44165
Related: #37837
Signed-off-by: Pavan More <pavansmore05@gmail.com>
* wireguard: remove cleanup code for old userspace devices
2578e8ff4dfa ("wireguard: remove deprecated userspace fallback") removed
the userspace mode in v1.17, so we can now stop cleaning up any network device
created for that mode.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* loadbalancer: Check for equality and skip insert when not changed
This adds DeepEqual() implementations for Service and FrontendParams and uses
it to skip inserting the service and updating the frontend if nothing has changed.
As we now don't forcefully update the frontend, the orphan detection has to actually
build a set to check if a frontend has been removed.
Benchmarks before:
Benchmark_UpsertServiceAndFrontends_100-6 3549 317691 ns/op 314771 objects/sec
BenchmarkInsertBackend-6 2818 423975 ns/op 235863 objects/sec
BenchmarkReplaceBackend-6 326682 3793 ns/op 263669 objects/sec
BenchmarkReplaceService-6 2327074 509.4 ns/op 1963230 objects/sec
After:
Benchmark_UpsertServiceAndFrontends_100-6 3464 331791 ns/op 301395 objects/sec
Benchmark_UpsertServiceAndFrontends_100_Unchanged-6 14652 81250 ns/op 1230766 objects/sec
BenchmarkInsertBackend-6 2956 401100 ns/op 249315 objects/sec
BenchmarkReplaceBackend-6 3402430 360.9 ns/op 2771038 objects/sec
BenchmarkReplaceService-6 2068555 556.6 ns/op 1796743 objects/sec
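The skip-when-unchanged idea can be sketched in Go roughly as follows. The `Service` type, the `table` map and the `upsert` helper are hypothetical stand-ins chosen for illustration; the real implementation operates on the load-balancer tables and is not reproduced here.

```go
package main

import (
	"fmt"
	"maps"
)

// Service is a hypothetical stand-in for the load-balancer service object.
type Service struct {
	Name        string
	Annotations map[string]string
}

// DeepEqual reports whether two services carry the same data.
func (s *Service) DeepEqual(o *Service) bool {
	return s.Name == o.Name && maps.Equal(s.Annotations, o.Annotations)
}

// table is a hypothetical in-memory table keyed by service name.
type table map[string]*Service

// upsert stores svc unless an equal object is already present.
// It returns true only when the table was actually modified.
func (t table) upsert(svc *Service) bool {
	if old, ok := t[svc.Name]; ok && old.DeepEqual(svc) {
		return false // unchanged: skip the write
	}
	t[svc.Name] = svc
	return true
}

func main() {
	t := table{}
	svc := &Service{Name: "default/echo", Annotations: map[string]string{"k": "v"}}
	fmt.Println(t.upsert(svc)) // first insert writes
	fmt.Println(t.upsert(&Service{Name: "default/echo", Annotations: map[string]string{"k": "v"}}))
}
```

Skipping the write is what the `_Unchanged` benchmark above exercises: the comparison is cheaper than inserting an identical object again.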
Signed-off-by: Jussi Maki <jussi@isovalent.com>
* loadbalancer: Remove dummy ingress endpoint workaround
Remove the skipping of the 192.192.192.192:9999 dummy ingress
endpoint as the next commit will remove the creation of it.
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* operator/helm: Remove creation of dummy ingress endpoint
With the new agent load-balancer implementation the dummy endpoint
is no longer necessary. Now that v1.18 has been released with that
implementation we can remove the creation of the dummy endpoint
without breaking upgrade.
Note: This commit removes the dummy endpoint in the Operator-
(Gateway API & dedicated Ingress) & Helm- (shared Ingress) managed
`EndpointSlice`'s. The creation of the `EndpointSlice` itself will
be done in a separate PR.
Fixes: #19262
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* monitor: report 3rd argument in DBG_GENERIC debug monitor messages
Currently, in case cilium_dbg3(ctx, DBG_GENERIC, arg1, arg2, arg3) is used in
datapath code, the 3rd argument will not be reported in the monitor
message, e.g.:
bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac)
Also report the 3rd argument, so the monitor message will look as
follows:
bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac) arg3=170 (0xaa)
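For reference, the message layout with the third argument can be reproduced with a small Go sketch. The `formatGeneric` helper is hypothetical; only the output layout is taken from the log line above.

```go
package main

import "fmt"

// formatGeneric renders a DBG_GENERIC-style message including all three
// arguments, each in decimal followed by hexadecimal.
// Hypothetical helper for illustration, not the monitor's actual code.
func formatGeneric(arg1, arg2, arg3 uint32) string {
	return fmt.Sprintf("No message, arg1=%d (%#x) arg2=%d (%#x) arg3=%d (%#x)",
		arg1, arg1, arg2, arg2, arg3, arg3)
}

func main() {
	fmt.Println(formatGeneric(238, 34214060, 170))
}
```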
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* policy: cleanup label selector parsing and validation
This commit cleans up the parsing and validation of policy label
selectors by removing the dependency on k8s labels library for
validation.
With this change we no longer convert cilium representation
of label keys with source prefixed using ':' delimiter to k8s specific
representation with source prefixed using '.' for validation.
This avoids the additional complexity in policy code to convert between
the two representations. It also simplifies marshalling of selectors as
the objects now have a unified format.
This is not a functional change and does not have any associated user
impact.
Signed-off-by: Deepesh Pathak <deepesh.pathak@isovalent.com>
* helm/ztunnel: bind health check to localhost
Security hardening for ztunnel running with hostNetwork: true:
Add host field to readiness probe to bind the health check port 15021
to 127.0.0.1 instead of 0.0.0.0. This reduces attack surface by ensuring
the health check endpoint is only accessible from localhost (kubelet
runs on same node).
Signed-off-by: Quang Nguyen <nguyenquang@microsoft.com>
* ci:wireguard: enable Host Firewall in native routing e2e tests
This enables Host Firewall in the `wireguard-3` config.
This helps us validate that host-related packets go through the host
firewall hook in `cilium_host` when WireGuard is enabled in native routing.
In overlay, even if we'd miss a redirect we'd see the packet in bpf_overlay,
which will then redirect the packet to bpf_host for HostFW validation.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* mcsapi: Add namespace filtering conditions to ServiceImport controller
Add namespace global status checking to the ServiceImport controller:
- Set Invalid condition on ServiceExport when namespace is not global
- Set NamespaceNotGlobal condition on ServiceImport when namespace is not global
- Signal service controller to skip/delete derived Service for non-global namespaces
by setting SupportedIPFamilies annotation to empty
This implements the MCS-API portion of the ClusterMesh global namespace
filtering feature as described in CFP-39876.
Signed-off-by: Jacques Massa <jac.massa0908@gmail.com>
* docs: split up network policy language page
Up until now, the page title had been "Layer 3 Examples", which is
a section headline and confusing, since it is among other examples.
Splitting up into several pages, similar to `network/kubernetes/`,
keeps the ToC as it is, and makes it easier to navigate compared to
the lengthy page it was, while also giving each page a suitable
headline.
Since examples are mixed with the language specification, change
headings from "Layer 3 Examples" to "Layer 3 Policies", etc.
Drop the old page and redirect to the overview to keep links working.
Signed-off-by: Daniel Maslowski <info@orangecms.org>
* golangci-lint: fix and simplify golangci-lint.sh
golangci-lint.sh would always rebuild the binary. The script was over-engineered
and didn't use the `golangci-lint version` subcommand, which doesn't need the
version string parsing logic. Simplify the script and fix the rebuild trigger.
Prepare the directory structure for splitting the main .golangci-lint.yaml in a
subsequent commit.
Signed-off-by: Timo Beckers <timo@isovalent.com>
* golangci-lint: split kubeapi configuration into separate file
The addition of kubeapilinter forced all users of the main .golangci.yaml to be
running a custom-built version of the tool and make sure it's kept in sync.
VSCode runs `golangci-lint run` in the repo by default and requires explicit
configuration to use another tool, like a script in the Cilium repository.
Before this PR, the script was broken and would always rebuild custom-gcl.
Breaking the default tooling is not great. This commit moves kubeapi-specific
configuration to a dedicated config file in tools/golangci-lint-kubeapi and
specifies it explicitly when starting the kubeapi linter.
Signed-off-by: Timo Beckers <timo@isovalent.com>
* policy: Import the ClusterNetworkPolicy v1alpha2 API go modules
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add parser function for Cluster Network Policy
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add flags to enable or disable Cluster Network Policy
Disabled by default. A new Makefile target is added that enables it in kind clusters.
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Add watcher for Cluster Network Policy
Signed-off-by: Blaz Zupan <blaz@google.com>
* policy: Move NetworkPolicy and CNP/CCNP to the 'Normal' tier
Signed-off-by: Blaz Zupan <blaz@google.com>
* CODEOWNERS: re-assign operator/identitygc, adjust endpoint team description
Assign operator/identitygc to @cilium/sig-policy instead of
@cilium/endpoint based on Joe's feedback. This seems like the more
appropriate team based on the description.
Also update the description for the @cilium/endpoint team to cover
responsibilities for cluster-wide/k8s considerations and endpoint
lifecycle.
Follow-up to commit 11376d5c715b ("CODEOWNERS: add more specific owners
for operator subsystems").
Ref. https://github.com/cilium/cilium/pull/44279#pullrequestreview-3780398303
Suggested-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* nodemap: converted net.IP to netip.Addr, Part of #24246
- Converted net.IP to netip.Addr in node map functions
- Added validation for zero netip.Addr values in newNodeKey
- Wrapped parse errors with %w for better error context
Signed-off-by: Sanjeevliv <sanjeevsethilive@gmail.com>
* bpf/tests: fix byte ordering for TCP seq/win values
The pktgen logic for BPF tests sets default values on TCP packets in
host byte order only. This feeds into the TCP checksum calculation
and while these inputs are technically incorrect, tests that assert
the TCP checksum are correctly testing the invalid output values.
With the introduction of scapy packet definitions, matching these values
to existing tests is difficult because scapy does convert seq and win
values into network byte order. The effect is that L4 sequence numbers
between pktgen and scapy are wrong.
This commit updates pktgen and existing tests that assert these values
to use the appropriate byte conversion routines so that both pktgen
and scapy definitions produce the same results.
This causes all affected BPF tests to fail. This will be addressed
in the next commit.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* bpf/tests: fix TCP checksum assertions in all tests
This commit updates every test that asserts TCP checksums to reflect new
values following the change to the default TCP seq/win values, such that
the assertions for this should succeed with a scapy packet definition.
As part of this change, test failure logs have been updated to correctly
show the value observed in a packet, and the expected value, both in network
byte order. This is to simplify future debugging. This has also been
applied to UDP checksum assertions for consistency.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* bpf/tests: fix default_data definition for scapy tests
The pktgen code defines default_data macro as a string literal which
includes a NUL-terminating character. The size of this string is 20
bytes.
The scapy code defines the same value without the NUL-terminating
character. The size of this string is 19 bytes.
As a result, when converting a test from pktgen to scapy, the packet
length changes, which causes assertion failures because the change leads
to different L3 checksum values.
This commit modifies the scapy default_data variable to include a NUL
terminating character to match pktgen definitions. It is anticipated
this can be removed once all BPF tests are migrated to scapy.
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
* endpoint, fqdn: remove restoration of deprecated V1 DNSRules
Now that 1.16 is EOL, the code restoring PortProto V1 (without protocol)
can be removed.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* endpoint: rename DNS rules field
The V2 suffix is unnecessary now that we stop restoring V1 rules and
there is only a single supported version/field.
Also add a test that verifies that V2 rules are restored and V1 rules
are ignored while retaining backwards compatibility.
Signed-off-by: Tobias Klauser <tobias@cilium.io>
* tests: Ignore identity manager related error in versions < 1.18
Due to https://github.com/cilium/cilium/pull/42661 and
https://github.com/cilium/cilium/pull/42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.
The error was removed from the ignore list in
https://github.com/cilium/cilium/pull/42982.
Suggested-by: Marco Iorio <marco.iorio@isovalent.com>
Suggested-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
* metrics: remove agent bootstrap metrics
This commit removes the deprecated agent bootstrap metrics.
Hive job metrics can be used to inspect the duration of the legacy
daemon initialization job. In special cases it might be worth
introducing context-specific metrics, similar to what has been
introduced for the endpoint restoration logic.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* policy: fix policy tests
This fixes a policy test breakage caused by a recent change in how the
label source is handled.
Fixes: 270daa4067 ("policy: Add parser function for Cluster Network Policy")
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
* resource/test: let TestResource_WithFakeClient set resource version
Update the TestResource_WithFakeClient test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* cid/test: let TestUpdatePodLabels set resource version
Update the TestUpdatePodLabels test to correctly specify the expected
resource version during updates, in preparation for extending the fake
client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* ipam/test: let Test_LocalNodeCIDRsSyncer set resource version
Update the Test_LocalNodeCIDRsSyncer test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* bgp/test: correctly set resource version when updating test resources
Update the bgp tests to correctly specify the expected resource version
during updates, in preparation for extending the fake client to actually
enforce optimistic concurrency control.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* test/controlplane: adaptation for optimistic concurrency control
Update the UpdateObjects helper to use the [ObjectTracker.Patch],
instead of [ObjectTracker.Update], in preparation for the subsequent
commit that will make the latter implement optimistic concurrency
control, and validate resource version mismatches, which is not
required in this context.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* k8s/client/fake: fix resource version configuration in tracker
Currently, the object tracker is affected by a bug that causes the
resource version to not be set on creation or update if the object
does not have the [metav1.TypeMeta] set. Indeed, in that case, the
function updating the TypeMeta creates a deep copy of the object,
causing operations performed via [meta.Accessor] to act on the old
copy, and not have effect. Let's get this fixed by changing the
[fillTypeMetaIfNeeded] function to not create a deep copy, given
that it already operates on a copy of the original object.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* k8s/client/fake: let update operations respect resource versioning
Currently, the statedb object tracker backing the fake kubernetes client
used for testing purposes does not respect resource versioning, and allows
update operations to succeed regardless of the provided resource version.
While convenient for the `k8s/update` command itself, this approach is
problematic in case of controllers acting on the same resources, as it
can lead to objects being unexpectedly reverted to incorrect versions,
due to the missing optimistic concurrency control.
Let's get this fixed by extending the update implementation to additionally
compare the resource version of the stored and provided objects, and reject
the update in case they do not match, as the real Kubernetes API Server
would do. By default, the k8s/update command still ignores the provided
resource version, letting the update succeed regardless: this matches the
desired behavior in the vast majority of the tests, and avoids the need
for complex operations to set the expected resource version. Still, if
necessary, the stricter behavior can be enabled via the dedicated flag.
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* chore(deps): update base-images to v1.26.0
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* images: update cilium-{runtime,builder}
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* chore(deps): update cilium/cilium-cli action to v0.19.1
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.6.8
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all lvh-images main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all github action dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update all-dependencies
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* images: update cilium-{runtime,builder}
Signed-off-by: Cilium Imagebot <noreply@cilium.io>
* test: fix goleak check in combination with script tests
Currently, multiple script tests are intended to validate that no
goroutines are leaked once the tests end, deferring the invocation
of the dedicated [testutils.GoleakVerifyNone] function. However,
the underlying [goleak.VerifyNone] utility is incompatible with
t.Parallel [1], which is set by default by script tests, and no
check is actually performed.
Let's get this fixed by using [goleak.VerifyTestMain] instead, as
also suggested by goleak documentation itself. This commit fixes all
occurrences spotted via:
$ git grep -l GoleakVerifyNone | xargs grep -l testdata
It is worth additionally mentioning that:
* GoleakVerifyTestMain was already invoked in the redirectpolicy
package, and is thus not added;
* The functions previously ignored in the devices_controller tests
do not appear to be necessary anymore, and have been omitted; yet,
we need to additionally ignore one metrics related goroutine that
is otherwise flagged when IPSec is enabled;
* One of the script tests in the route/reconciler package did not
correctly stop the hive, causing a few goroutines to be leaked.
Ideally we should have a linter to catch this problem directly
in CI, but that's deferred for the future.
[1]: https://pkg.go.dev/go.uber.org/goleak#VerifyNone
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
* fix(deps): update all go dependencies main
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* chore(deps): update quay.io/cilium/cilium-envoy docker tag to v1.36.5-1770892622-f97ae52c05a1edbbdaa6393f8595431259cf2ca1
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
* README: Update releases
Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
* docs: fix duplicate --version in Helm OCI install/upgrade examples
|CHART_VERSION| already expands to '--version <release>'.
Removing the extra literal --version before |CHART_VERSION| so the
rendered CLI is correct (e.g. single '--version 1.19.0').
Signed-off-by: Ghassan Malke <ghassan+github@malke.nl>
* gh: e2e-upgrade: skip disk cleanup when workflow is skipped
Most parts of this workflow are skipped when testing patch-level upgrades
in the `main` branch. Also skip the initial disk cleanup, which takes
around 1 minute.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* bpf:refactor: move ipv{4,6}_host_delivery to local_delivery.h
The two bpf programs `bpf_overlay` and `bpf_wireguard` share the same
logic to redirect a packet to cilium_host@ingress. The mere difference is
that in WireGuard the packet needs to be adjusted to add the Ethernet
layer before doing that, and using __ETH_HLEN rather than ETH_HLEN for
the L3 header offset computation.
Let's move the common logic into `local_delivery.h`, and add comments
to the functions to clarify their purpose.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* bpf:refactor:wireguard: remove resolve_srcid_ipv{4,6}
We copied it over from bpf_host, with apposite simplifications for WireGuard.
Having this helper with the same name in both programs is a bit
confusing IMHO. Moving this into `identity.h` would be good, but I checked
most of the codebase and we actually do an inline lookup to retrieve the
`info->sec_identity`. Adapting all LOCs would not be worth it, as in most
cases the `info` pointer is needed for other matters and not only for
retrieving the identity. I have not found a common denominator yet.
For this reason, I opted for simplifying this code in `bpf_wireguard` only.
While doing that, let's also start re-using UNKNOWN_ID as default rather
than WORLD_IPV{4,6}_ID, as we were doing in `bpf_host` before porting over the code.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* bpf, datapath: move CIDR identity range macros to bpf/lib/identity.h
This change moves the CIDR identity range macros into bpf/lib/identity.h
and stops emitting CIDR_IDENTITY_RANGE_* defines from the datapath
header writer.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* renovate: allow update of k8s libraries in stable branches
The intent was always to update the k8s libraries patch versions into
the stable branches but due to a misconfiguration in renovate config
that never happened.
Signed-off-by: André Martins <andre@cilium.io>
* renovate: skip updating sigs.k8s.io/network-policy-api v0.1.8
This tag does not contain a package that exists in a later commit, thus
we should skip it until it gets fixed.
Signed-off-by: André Martins <andre@cilium.io>
* gh: e2e-upgrade: don't hardcode IPsec encryption algorithm
Some e2e configs specify a different encryption algorithm (cbc-aes-sha256).
Have the e2e-upgrade workflow respect this.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
* policy: Pull labels from the CachedSelectionUser
Pull labels from the CachedSelectionUser instead of caching them in the
identitySelector. Caching the rule labels in the identitySelector only
works when the selector is only ever used in a single policy. Pulling the
labels from the registered users can pull the labels from all rules that
are currently using the selector.
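A minimal Go sketch of the idea, assuming a selector that tracks its registered users. The `selector` and `user` types are hypothetical stand-ins for the identitySelector and CachedSelectionUser and do not mirror their real definitions.

```go
package main

import (
	"fmt"
	"sort"
)

// user is a hypothetical stand-in for a rule that registered the selector.
type user struct{ labels string }

// selector is a hypothetical stand-in for a cached selector that may be
// shared by several rules.
type selector struct {
	users map[*user]struct{}
}

// ruleLabels gathers the labels of every currently registered user instead
// of returning a single copy cached on the selector.
func (s *selector) ruleLabels() []string {
	out := make([]string, 0, len(s.users))
	for u := range s.users {
		out = append(out, u.labels)
	}
	sort.Strings(out) // stable output regardless of map iteration order
	return out
}

func main() {
	a := &user{labels: "default/allow-80-8080"}
	b := &user{labels: "default/l7-rule"}
	s := &selector{users: map[*user]struct{}{a: {}, b: {}}}
	fmt.Println(s.ruleLabels())
}
```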
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* api: Return a label array list for policy selectors
Return a list of labels arrays in the labels field of 'cilium-dbg policy
selectors' response, so that the labels from all the "user" rules can be
returned. With this change the labels field shows the labels from all
"users" rather than just one of them.
Example output of 'cilium-dbg policy selectors' before and after:
Before:
SELECTOR LABELS USERS IDENTITIES
&LabelSelector{MatchLabels:map[string]string{reserved.host: ,},MatchExpressions:[]LabelSelectorRequirement{},} 1 1
&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},} default/allow-80-8080 2 1
2
...
After:
SELECTOR LABELS USERS IDENTITIES
&LabelSelector{MatchLabels:map[string]string{reserved.host: ,},MatchExpressions:[]LabelSelectorRequirement{},} 1 1
&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},} default/allow-80-8080,default/l7-rule 2 1
2
...
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
* Revert "bpf: wire events map rate limits through node config"
This reverts commit b4fed1ddd0b25333eb3831537b9c450a86aa5d02.
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
* bgp: Extend Router.GetPeers to return address
We need this to query adj-rib in the new GetRoutes API.
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
* bgp: Mark existing BGPRouterManager.GetRoutes as legacy
The current RouterManager.GetPeer API returns the API model directly. In
the new CLI, we will rely on the new internal model. Since we still
need to keep the legacy output around for a while, mark it as legacy and
keep it as is. We'll introduce a new implementation in the subsequent
commits.
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
* bgp: Minor fixes for bgp/peers command
- Remove unnecessary extra P from the function name
- Fix the bug in the Instance and Peer deduplication logic
- Don't sort slice including header
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
* bgp: Introduce a new BGPRouterManager.GetRoutes
Introduce a new GetRoutes API on BGPRouterManager which returns
BGPv2-native result. The result contains the Instance name that the
route is retrieved from, and the Neighbor name for adj-rib-in and
adj-rib-out.
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
* bgp: add routes hive command
Introduce bgp/routes to query BGPRouterManager.GetRoutes and render a
structured table for loc-rib and adj-rib tables.
Example output:
loc-rib
```
Instance Prefix NextHop Best Age
instance0 10.0.0.0/32 0.0.0.0 true 10s
10.0.0.1/32 0.0.0.0 true 10s
instance1 10.0.0.2/32 0.0.0.0 true 10s
```
adj-rib
```
Instance Peer Prefix NextHop Age
instance0 peer0 10.244.0.0/24 10.99.0.110 10s
10.96.50.104/32 10.99.0.110 10s
peer1 10.244.0.0/24 10.99.0.110 10s
10.96.50.104/32 10.99.0.110 10s
instance1 peer2 10.244.0.0/24 10.99.0.110 10s
10.96.50.104/32 10.99.0.110 10s
```
It has optional flags -o (output to file) and --no-age (disable Age
output to make the command output predictable).
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
* bgp: Add a simple test command to advertise route
The adj-rib-in test requires the peer GoBGP to advertise the route. Add
a simple command to advertise route from test GoBGP instance.
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
* bgp: Add command output script test scenario
Add a script test scenario that tests the output of the bgp/peers and
bgp/routes commands.
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
* options: Migrate `VtepCidrMask` from `net.IP` to `netip.Addr`
Related: #24246
Signed-off-by: Hadrien Patte <hadrien.patte@datadoghq.com>
* policy: add support for wildcard specifier anywhere in sni pattern
This commit relaxes k8s api validation pattern for server names in
policy api to allow wildcard specifiers anywhere in SNI pattern.
This allows users to write more compressed network policies and is
in line with the syntax supported in FQDN match patterns.
With this change users can now specify allowed server names with
wildcard as:
- '**.cilium.io': Existing behavior which matches any number of
subdomain levels in the prefix. "test.cilium.io" and
"test.app.cilium.io" match but "cilium.io" does not.
- '*.cilium.io': Existing behavior which matches all subdomains of
cilium.io on a single level. "test.cilium.io" matches but
"test.app.cilium.io" and "cilium.io" do not.
- 'sub*.cilium.io': Matches subdomains of cilium.io where the subdomain
component begins with "sub" (only one level). "sub.cilium.io" and
"subdomain.cilium.io" match, while "www.cilium.io", "cilium.io" and
"test.subdomain.cilium.io" do not.
Additionally this commit introduces a new helper function used to
sanitize server names pattern when converting to envoy protobuf. This
is required because cilium-envoy doesn't support the same semantics
for match pattern syntax as DNS match pattern in cilium-agent.
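The three wildcard forms listed above can be sketched as a translation to an anchored regular expression. This is a minimal illustration under stated assumptions, not the helper introduced by the commit: the function names and the label character class `[-a-zA-Z0-9_]` are hypothetical, and the conversion to Envoy's pattern semantics is not modeled.

```go
package main

import (
	"regexp"
	"strings"
)

// labelChars is an assumed character class for one DNS label.
const labelChars = `[-a-zA-Z0-9_]`

// sniPatternToRegexp translates an SNI pattern into an anchored regexp.
// Hypothetical sketch; not Cilium's actual implementation.
func sniPatternToRegexp(pattern string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString("^")
	rest := pattern
	if strings.HasPrefix(rest, "**.") {
		// "**." matches one or more complete labels, each followed by a dot.
		b.WriteString("(" + labelChars + "+[.])+")
		rest = rest[len("**."):]
	}
	for i, part := range strings.Split(rest, "*") {
		if i > 0 {
			// A single "*" matches zero or more characters inside one
			// label, so it never crosses a dot.
			b.WriteString(labelChars + "*")
		}
		b.WriteString(regexp.QuoteMeta(part))
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchesSNI reports whether serverName matches pattern, ignoring case.
func matchesSNI(pattern, serverName string) bool {
	re, err := sniPatternToRegexp(strings.ToLower(pattern))
	if err != nil {
		return false
	}
	return re.MatchString(strings.ToLower(serverName))
}
```

With this sketch, `matchesSNI("sub*.cilium.io", "subdomain.cilium.io")` is true, while `matchesSNI("sub*.cilium.io", "test.subdomain.cilium.io")` is false because the expression is anchored at both ends.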
Signed-off-by: Deepesh Pathak <deepesh.pathak@isovalent.com>
* cli: add connectivity test for tls sni pattern with random wildcard
Signed-off-by: Deepesh Pathak <deepesh.pathak@isovalent.com>
* docs: Fix formatting for install command for GKE Clustermesh
The snippet contains a |CHART_VERSION| directive that is not substituted
when generating the docs, because it's under a "code-block" directive
instead of a "parsed-literal".
Fix the directive, adjust backslashes accordingly, and remove the
redundant "--version" argument (already generated when expanding
|CHART_VERSION|).
Trim trailing white spaces in the file.
Fixes: 63bfe7d8f943 ("Added GKE-to-GKE Clustermesh Preparation guide")
Signed-off-by: Quentin Monnet <qmo@qmon.net>
* docs: Fix formatting for command to use Prometheus metrics:
The snippet contains a |CHART_VERSION| directive that is not substituted
when generating the docs, because it's under a "code-block" directive
instead of a "parsed-literal".
Fix the directive, adjust backslashes accordingly, and remove the
redundant "--version" argument (already generated when expanding
|CHART_VERSION|).
Trim trailing white spaces in the file.
Fixes: b76f9285bb94 ("docs: add Helm configuration instructions for metrics")
Signed-off-by: Quentin Monnet <qmo@qmon.net>
* docs: Fix commands split in parsed-literal blocks throughout docs
For "parsed-literal" blocks, we need a double-backslash at the end of
the line to make a single "\" appear in the generated HTML docs. With a
single one in the source, Sphinx removes the line breaks and leaves
multiple spaces (from the indentation on the next line) instead.
Go through multiple locations (all that I could find) in the docs where
we have parsed-literal blocks with single backslashes to mark a command
split, and adjust with double backslashes instead.
For .../sbom.rst, also add the missing indentation marking the
continuation of the command line.
Trim trailing white spaces, if any, in all edited files.
Signed-off-by: Quentin Monnet <qmo@qmon.net>
* bpf/tests/scapy: fix CTX len on pkt len mismatch
Fix the error trace showing an incorrect CTX len when the packet and
buffer lengths mismatch.
Signed-off-by: Marc Suñé <marc.sune@isovalent.com>
* bpf/tests/scapy: throw build bug if pkts > 1518b
Throw a build bug if packets exceed __SCAPY_MAX_BUF (1518 bytes) in
BUF_DECL().
Signed-off-by: Marc Suñé <marc.sune@isovalent.com>
* bpf/tests/scapy: cleanup __ASSERT_TRACE_FAIL_BUF()
Remove unnecessary args to __ASSERT_TRACE_FAIL_BUF().
Signed-off-by: Marc Suñé <marc.sune@isovalent.com>
* bpf/tests/scapy: support pkts 1036-1518 bytes
Commit e80be9ebff worked around the 128 byte limitation of the
cilium builtins implementation by reimplementing (as a hack, and inefficiently,
as noted in the commit msg) a simple version of memcpy/memcmp to be
used by scapy assert checks (only).
Unfortunately, when buffers exceed ~1036 bytes, the clang/LLVM optimizer
removes the memcpy code and attempts to use the built-in instead (even at O1).
Since the builtin has a hard limit of 128 8-byte words [1] (thanks
Daniel Borkmann for the pointer), this leads to:
In file included from _scapy_selftest.c:12:
./scapy.h:64:23: error: A call to built-in function 'memcpy' is not supported.
64 | *(__u64 *)(dst + i) = *(__u64 *)(src + i);
| ^
./scapy.h:64:23: error: A call to built-in function 'memcpy' is not supported.
./scapy.h:64:23: error: A call to built-in function 'memcpy' is not supported.
./scapy.h:64:23: error: A call to built-in function 'memcpy' is not supported.
or if directly using `__bpf_memcpy_builtin()` (which calls
`__builtin_memcpy()`):
In file included from _scapy_selftest.c:7:
In file included from ./pktgen.h:7:
/home/msuneclo/dev/cilium/bpf/include/bpf/builtins.h:165:2: error: A call to built-in function 'memcpy' is not supported.
165 | __builtin_memcpy(d, s, len);
This commit works around this issue too by wrapping the original
_scapy_memcpy() to perform chunked memcpys. The packet size is currently
limited to 1518 bytes (__SCAPY_MAX_BUF), but could be extended to 2K.
[1] https://github.com/llvm/llvm-project/blob/a6929f7937696bb07788be6428fdcf1bf36775b5/llvm/lib/Target/BPF/BPFSelectionDAGInfo.h#L34
Signed-off-by: Marc Suñé <marc.sune@isovalent.com>
Reported-by: Simone Magnani <simone.magnani@isovalent.com>
* script:test:lb: explicit deny ClusterIP access when not enabled
This commit has no functional changes, but it explicitly sets the
`--bpf-lb-external-clusterip=false` flag in the test data, to make it
clear and explicit that we don't expect ClusterIP to be routable.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* bpf:test: add IPv4/6 coverage for nodeport non-routable cluster IP
This commit adds the `tc_lb{4,6}_nonroutable_clusterip` tests, which ensure
that packets sent from an external node to a non-routable ClusterIP service
are dropped with the correct reason code DROP_IS_CLUSTER_IP and
that the metrics are updated correctly.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* ginkgo: remove `ClusterIP cannot be accessed externally when access is disabled`
The Ginkgo test verifies that ClusterIP services are not reachable from
an external node (i.e., nodeWithoutCilium) when the `bpf-lb-external-clusterip`
flag is disabled. However, this behavior is already covered by:
1. `pkg/loadbalancer/tests/testdata/clusterip.txtar`, where in the LB map
we expect `FLAGS=ClusterIP+sessionAffinity+non-routable` for the service.
2. `tc_lb{4,6}_nonroutable_clusterip.c`, where we verify that an
incoming packet destined to a service w/o the SVC_FLAG_ROUTABLE is
being dropped.
Thus, this Ginkgo test can be simply removed as is.
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
* loader: Reduce number of permutations for load-time configs
The runtime is growing exponentially with the number of load-time
permutations we are covering in verifier tests. This is already causing
timeouts in some cases, so let's try to reconsider permutations.
This commit removes coverage for some load-time configs being disabled.
All of these configs are enabled by default and unlikely to be disabled
by users. Even if they were disabled, it's unlikely that action would
increase complexity (on the contrary).
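To make the growth concrete: with n independent on/off configs there are 2^n combinations to verify, and pinning k of them to "enabled" leaves 2^(n-k). The sketch below enumerates the combinations; the function name and config names are hypothetical and do not reflect how the Cilium loader tests are actually structured.

```go
package main

// permutations enumerates every on/off combination of the given configs,
// except that configs marked true in pinned are always enabled.
// Hypothetical sketch; not the Cilium verifier test code.
func permutations(configs []string, pinned map[string]bool) []map[string]bool {
	// Collect the configs that are still free to vary.
	var free []string
	for _, c := range configs {
		if !pinned[c] {
			free = append(free, c)
		}
	}
	total := 1 << len(free) // 2^(number of free configs)
	out := make([]map[string]bool, 0, total)
	for bits := 0; bits < total; bits++ {
		p := make(map[string]bool, len(configs))
		for _, c := range configs {
			if pinned[c] {
				p[c] = true
			}
		}
		// Bit i of the counter decides whether free config i is enabled.
		for i, c := range free {
			p[c] = bits&(1<<i) != 0
		}
		out = append(out, p)
	}
	return out
}
```

For four configs this yields 16 combinations with nothing pinned and 4 once two of them are pinned to enabled.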
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
* node/address: refactor GetCiliumEndpointNodeIP
Currently, the global function `node.GetCiliumEndpointNodeIP` uses
the global `LocalNodeStore` instance to retrieve the local node.
In preparation to eventually get rid of the global field that holds the
local node store, this commit refactors the function `GetCiliumEndpointNodeIP`
to expect the local node store to be passed as an argument.
This also allows us to get rid of some test related helper functions and
makes dependencies explicit.
Note: Not all refactored places properly support context propagation
and error handling. For these places, we currently use `context.Background()`
and `logging.Fatal`. This is similar to what already happened under the hood.
Note 2: In a later step we might want to refactor the function into a
method of the `LocalNode`. I hesitate to do it in this commit because
IMO it's more endpoint than node related :/
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* README: Update releases
Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
* clustermesh: Enable global namespace default configuration option
Expose the `--clustermesh-default-global-namespace` flag that was
previously hidden, allowing users to configure whether namespaces
are treated as global by default in Clustermesh.
Changes:
- Add Helm value `clustermesh.defaultGlobalNamespace`
- Pass flag to clustermesh-apiserver deployment and cilium-config
- Update documentation and schema
- Remove MarkHidden() for the flag in config.go
Signed-off-by: Anubhab Majumdar <anmajumdar@microsoft.com>
* ci: Add global namespace support for connectivity tests
Add support for testing clustermesh with defaultGlobalNamespace=false:
- Add defaultGlobalNamespace to test non-global namespace behavior in
conformance-clustermesh.yaml
- Annotate namespaces in deployment.yaml when
defaultGlobalNamespace=false
Signed-off-by: Anubhab Majumdar <anmajumdar@microsoft.com>
* ingress/gateway-api: remove EndpointSlice creation
With the removal of the ingress dummy endpoint from the `EndpointSlice`s
that are used for Cilium Ingress & Gateway API, it's possible to get
rid of the need to create and manage these `EndpointSlice`s completely.
This commit removes the respective Helm files, Operator reconciliation logic/watches.
* Gateway API: remove per-Gateway EndpointSlice creation
* Ingress (dedicated): remove per Ingress EndpointSlice creation
* Ingress (shared): remove shared EndpointSlice from Helm
Helm keeps track of installed resources in a k8s secret and will
remove these when updating from a previous version.
The removal of any potential leftovers of Operator-managed `EndpointSlices`
from previous versions on upgrade is handled in the following commit.
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
* ingress/gateway-api: cleanup old EndpointSlice
This commit ensures that any `EndpointSlice`s
that have been …
Due to cilium#42661 and cilium#42662 not being backported yet to v1.17, CI fails in the upgrade/downgrade test with this error. Therefore, we must add it to the ignore list until the PRs are at least backported to v1.17. The error was removed from the ignore list in cilium#42982.
Suggested-by: Marco Iorio <marco.iorio@isovalent.com>
Suggested-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This removes a log line from the list of ignored errors. We want to catch this error since it points directly to correctness issues. When it occurs, it's an issue with the reference counting of identity use: either there is a leak, or some references are released prematurely, which can cause policy issues.
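The class of bug described here can be illustrated with a minimal reference counter. This is a hypothetical sketch, not Cilium's identity manager: the type, method names, and error text are assumptions, and the sketch is not safe for concurrent use.

```go
package main

import "errors"

// errNotReferenced signals a release without a matching acquire, which
// is the kind of accounting bug a test should fail on, not ignore.
var errNotReferenced = errors.New("identity released without a matching reference")

// identityRefs counts how many users hold each identity.
// Hypothetical sketch; not Cilium's actual implementation.
type identityRefs struct {
	counts map[uint32]int
}

func newIdentityRefs() *identityRefs {
	return &identityRefs{counts: make(map[uint32]int)}
}

// Add records one more user of the identity.
func (r *identityRefs) Add(id uint32) {
	r.counts[id]++
}

// Remove drops one reference. It returns an error when the identity has
// no outstanding references, which means an earlier release was
// premature or an acquire was never recorded.
func (r *identityRefs) Remove(id uint32) error {
	if r.counts[id] == 0 {
		return errNotReferenced
	}
	r.counts[id]--
	if r.counts[id] == 0 {
		delete(r.counts, id)
	}
	return nil
}

// InUse reports whether any reference to the identity remains. An
// identity that stays in use after all its users are gone is a leak.
func (r *identityRefs) InUse(id uint32) bool {
	return r.counts[id] > 0
}
```

In this model, an unbalanced `Remove` surfaces immediately as an error, which is the signal that an ignore list would otherwise hide.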
Fixes: #16419