Skip to content

Update netlink library to not set XFRMA_IF_ID = 0 by default#18506

Merged
gandro merged 2 commits intocilium:masterfrom
tklauser:pr/encrypt-status-test-fix
Jan 18, 2022
Merged

Update netlink library to not set XFRMA_IF_ID = 0 by default#18506
gandro merged 2 commits intocilium:masterfrom
tklauser:pr/encrypt-status-test-fix

Conversation

@tklauser
Copy link
Copy Markdown
Member

@tklauser tklauser commented Jan 17, 2022

(previous PR title: Fix EncryptStatusSuite.TestCountUniqueIPsecKeys)

As of Linux kernel commit torvalds/linux@68ac0f3810e7 ("xfrm: state and
policy should fail if XFRMA_IF_ID 0"), specifying xfrm if_id = 0 leads
to EINVAL being returned. So far, the netlink library always specified
the XFRMA_IF_ID netlink attribute, regardless of its value. Upstream PR
vishvananda/netlink#727 changed this behavior to
only set XFRMA_IF_ID in case XfrmState.Ifid or XfrmPolicy.Ifid are != 0
to fix the issue.

Updated using:

go get github.com/vishvananda/netlink@main
go mod tidy
go mod vendor

Fixes: #18500

@tklauser tklauser added area/CI Continuous Integration testing issue or flake release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.10 labels Jan 17, 2022
@tklauser tklauser requested a review from a team as a code owner January 17, 2022 17:16
@tklauser tklauser requested a review from twpayne January 17, 2022 17:16
@tklauser
Copy link
Copy Markdown
Member Author

tklauser commented Jan 17, 2022

/test-runtime

Job 'Cilium-PR-Runtime-net-next' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

RuntimePrivilegedUnitTests Run Tests

Failure Output

FAIL: Failed to run privileged unit tests

If it is a flake, comment /mlh new-flake Cilium-PR-Runtime-net-next so I can create a new GitHub issue to track it.

@pchaigno
Copy link
Copy Markdown
Member

Regarding backport labels, the test doesn't exist in v1.10.

@pchaigno pchaigno added release-note/ci This PR makes changes to the CI. and removed needs-backport/1.10 release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jan 17, 2022
Copy link
Copy Markdown
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! How did you find this so quickly?

@tklauser
Copy link
Copy Markdown
Member Author

Nice catch! How did you find this so quickly?

git bisect run 😉

@tklauser
Copy link
Copy Markdown
Member Author

There is another legitimate failure in pkg/datapath/linux/ipsec: https://jenkins.cilium.io/job/Cilium-PR-Runtime-net-next/1043/

I think this needs to be fixed in the netlink library to only set the XFRMA_IF_ID attribute if it is not 0. I've opened vishvananda/netlink#727 doing that.

Restore the original netns on test exit or failure and also check errors
on these operations.

Also make sure to unlock the OS thread and close the test netns on test
failure (i.e. using defer, not only at test exit).

Fixes: 70e5c58 ("test: add unit tests for `cilium encrypt` and subcommands")
Signed-off-by: Tobias Klauser <tobias@cilium.io>
@tklauser tklauser force-pushed the pr/encrypt-status-test-fix branch from 3215287 to 11bf1df Compare January 18, 2022 07:24
@tklauser tklauser requested a review from a team as a code owner January 18, 2022 07:24
@tklauser tklauser requested a review from rolinh January 18, 2022 07:24
@tklauser tklauser force-pushed the pr/encrypt-status-test-fix branch from 11bf1df to facba9e Compare January 18, 2022 07:25
@tklauser
Copy link
Copy Markdown
Member Author

Dropped the second commit explicitly setting XfrmState.IfId to 1 and instead added a commit updating the github.com/vishvananda/netlink library to pull in vishvananda/netlink#727.

…tlink

As of Linux kernel commit torvalds/linux@68ac0f3810e7 ("xfrm: state and
policy should fail if XFRMA_IF_ID 0"), specifying xfrm if_id = 0 leads
to EINVAL being returned. So far, the netlink library always specified
the XFRMA_IF_ID netlink attribute, regardless of its value. Upstream PR
vishvananda/netlink#727 changed this behavior to
only set XFRMA_IF_ID in case XfrmState.Ifid or XfrmPolicy.Ifid are != 0
to fix the issue.

Updated using:

    go get github.com/vishvananda/netlink@main
    go mod tidy
    go mod vendor

Signed-off-by: Tobias Klauser <tobias@cilium.io>
@tklauser tklauser force-pushed the pr/encrypt-status-test-fix branch from facba9e to dd7334b Compare January 18, 2022 07:28
@tklauser
Copy link
Copy Markdown
Member Author

/test-runtime

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 18, 2022
@pchaigno
Copy link
Copy Markdown
Member

Runtime tests are passing. Reviews are in. Marking ready to merge.

@pchaigno
Copy link
Copy Markdown
Member

pchaigno commented Mar 3, 2022

We also need to backport this to v1.10 to fix the EKS workflow there (cf. https://github.com/cilium/cilium/runs/5405780998?check_suite_focus=true hit in #18835).

@pchaigno
Copy link
Copy Markdown
Member

The title is a bit misleading here. IIRC, this pull request doesn't only fix the test.

@tklauser
Copy link
Copy Markdown
Member Author

tklauser commented May 2, 2022

The title is a bit misleading here. IIRC, this pull request doesn't only fix the test.

Good point. The title still stems from the initial fix which was made before the actual cause of the issue was known. I'll change the PR title/description accordingly.

@tklauser tklauser changed the title Fix EncryptStatusSuite.TestCountUniqueIPsecKeys Update netlink library to not set XFRMA_IF_ID = 0 by default May 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/CI Continuous Integration testing issue or flake backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: RuntimePrivilegedUnitTests Run Tests EncryptStatusSuite.TestCountUniqueIPsecKeys value syscall.Errno = 0x16 ("invalid argument")

7 participants