Skip to content

allocator: fix out-of-valid-range identities being allocated#18151

Merged
jibi merged 1 commit intocilium:masterfrom
ctripcloud:avoid_allocating_out_of_range_identity
Feb 14, 2022
Merged

allocator: fix out-of-valid-range identities being allocated#18151
jibi merged 1 commit intocilium:masterfrom
ctripcloud:avoid_allocating_out_of_range_identity

Conversation

@ArthurChiao
Copy link
Copy Markdown
Contributor

When multiple cilium clusters share the same KVStore ("" prefix
prevents them from being overlapped), the identity allocator inside each
cilium agent may allocate out identities that are out of the cluster's
valid range.

This patch fixed the problem by validating the identity at two places:

  1. the place where an identity is just allocated out
  2. the place before an identity is enqueued into available idpool

Adapted from Jaff Cheng's internal patch at trip.com.

Signed-off-by: ArthurChiao arthurchiao@hotmail.com

@ArthurChiao ArthurChiao requested a review from a team as a code owner December 7, 2021 08:07
@ArthurChiao ArthurChiao requested review from a team and twpayne December 7, 2021 08:07
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 7, 2021
@ArthurChiao ArthurChiao changed the title allocator: fix out-of-valid-range identities being allocated WIP: allocator: fix out-of-valid-range identities being allocated Dec 7, 2021
@ArthurChiao ArthurChiao force-pushed the avoid_allocating_out_of_range_identity branch 3 times, most recently from 8c436a1 to 7bd54d7 Compare December 8, 2021 06:57
@nbusseneau nbusseneau marked this pull request as draft December 8, 2021 17:59
@nbusseneau
Copy link
Copy Markdown
Member

@ArthurChiao FYI I'm converting this to draft since it's still WIP :)

@ArthurChiao ArthurChiao force-pushed the avoid_allocating_out_of_range_identity branch from 7bd54d7 to d141907 Compare December 9, 2021 01:29
@ArthurChiao ArthurChiao marked this pull request as ready for review December 9, 2021 02:07
@ArthurChiao
Copy link
Copy Markdown
Contributor Author

ArthurChiao commented Dec 9, 2021

Hi @nbusseneau just rebased to the latest master, and ready for reviewing now.

Noticed that the "Code scanning results / CodeQL" failed, but the failed code is not introduced in this patch. Should I fix them in this PR as well?

Thanks!

@ArthurChiao ArthurChiao changed the title WIP: allocator: fix out-of-valid-range identities being allocated allocator: fix out-of-valid-range identities being allocated Dec 9, 2021
@ArthurChiao ArthurChiao force-pushed the avoid_allocating_out_of_range_identity branch from d141907 to c943f9b Compare December 9, 2021 09:13
@ArthurChiao ArthurChiao requested a review from a team as a code owner December 9, 2021 09:13
@ArthurChiao ArthurChiao requested a review from aanm December 9, 2021 09:13
@twpayne
Copy link
Copy Markdown
Contributor

twpayne commented Dec 9, 2021

Noticed that the "Code scanning results / CodeQL" failed, but the failed code is not introduced in this patch. Should I fix them in this PR as well?

No, please ignore this. It's actually a false positive from CodeQL.

@twpayne twpayne added needs-backport/1.11 release-note/bug This PR fixes an issue in a previous release of Cilium. labels Dec 9, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Dec 9, 2021
Copy link
Copy Markdown
Member

@nbusseneau nbusseneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes LGTM, thanks.

@nbusseneau
Copy link
Copy Markdown
Member

nbusseneau commented Dec 10, 2021

/test

Job 'Cilium-PR-K8s-1.23-kernel-net-next' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sPolicyTest Basic Test checks all kind of Kubernetes policies

Failure Output

FAIL: failed to ensure kubectl version: failed to run kubectl version

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.23-kernel-net-next so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-GKE' hit: #17270 (94.25% similarity)

@nbusseneau
Copy link
Copy Markdown
Member

Closing and re-opening PR since GH Actions seem stuck (this happens sometimes).

@nbusseneau nbusseneau closed this Dec 10, 2021
@aanm
Copy link
Copy Markdown
Member

aanm commented Jan 5, 2022

/test

@aanm aanm closed this Jan 6, 2022
@aanm aanm reopened this Jan 6, 2022
@aanm
Copy link
Copy Markdown
Member

aanm commented Jan 6, 2022

/test

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Test pods are not ready after timeout

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

@pchaigno
Copy link
Copy Markdown
Member

pchaigno commented Jan 11, 2022

The GKE CI job failed with known flake #17307. Other required tests are passing. Reviews are in. Merging.

Oh, wait! There are merge conflicts :-(

@pchaigno pchaigno added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Jan 11, 2022
@ArthurChiao ArthurChiao force-pushed the avoid_allocating_out_of_range_identity branch from 384f94b to e89dcff Compare January 11, 2022 14:14
@pchaigno
Copy link
Copy Markdown
Member

@ArthurChiao You will need to rebase again to catch #18443. It fixes a CI infrastructure issue that "appeared" yesterday. Sorry for the trouble.

When multiple cilium clusters share the same KVStore ("<clustername>" prefix
prevents them from being overlapped), the identity allocator inside each
cilium agent may allocate out identities that are out of the cluster's
valid range.

This patch fixed the problem by validating the identity at two places:

1. the place where an identity is just allocated out
2. the place before an identity is enqueued into available idpool

Adapted from Jaff Cheng's internal patch at trip.com.

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
@ArthurChiao ArthurChiao force-pushed the avoid_allocating_out_of_range_identity branch from e89dcff to c9b5f5f Compare January 12, 2022 14:44
@ArthurChiao
Copy link
Copy Markdown
Contributor Author

@pchaigno it's ok, just rebased, thanks :)

@pchaigno
Copy link
Copy Markdown
Member

/test

@pchaigno pchaigno removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jan 12, 2022
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 12, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 12, 2022
@jibi jibi merged commit 5224f69 into cilium:master Feb 14, 2022
@ArthurChiao ArthurChiao deleted the avoid_allocating_out_of_range_identity branch February 14, 2022 13:53
@pchaigno
Copy link
Copy Markdown
Member

pchaigno commented Feb 15, 2022

This pull request broke the multicluster workflow. I'll send a revert PR.

EDIT: PR is at #18808.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants