occm: implement a support for atomic routes update#2134

Merged
k8s-ci-robot merged 2 commits into kubernetes:master from kayrus:atomic-routes
Mar 9, 2023

@kayrus
Contributor

@kayrus kayrus commented Feb 27, 2023

What this PR does / why we need it:

An extra improvement to the route reconciliation logic. This PR makes it possible to add or remove routes with a single API call, instead of fetching the existing routes, merging in the new ones or dropping the old ones, and then updating the router. This allows concurrent calls without consistency issues.
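
As an illustration only, here is a minimal Go sketch of how a reconciler might split its work into one "add" batch and one "remove" batch, each of which could then be sent in a single call. The `Route` type, the `diffRoutes` helper and the next-hop addresses are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// Route is a hypothetical stand-in for a static route on a router.
type Route struct {
	DestinationCIDR string
	NextHop         string
}

// diffRoutes compares the routes currently on the router with the desired
// ones and returns the batch to add and the batch to remove.
func diffRoutes(existing, desired []Route) (toAdd, toRemove []Route) {
	have := make(map[Route]bool, len(existing))
	for _, r := range existing {
		have[r] = true
	}
	want := make(map[Route]bool, len(desired))
	for _, r := range desired {
		if want[r] {
			continue // ignore duplicates in the desired list
		}
		want[r] = true
		if !have[r] {
			toAdd = append(toAdd, r)
		}
	}
	for _, r := range existing {
		if !want[r] {
			toRemove = append(toRemove, r)
		}
	}
	return toAdd, toRemove
}

func main() {
	existing := []Route{
		{"10.16.1.0/24", "192.168.0.11"},
		{"10.16.9.0/24", "192.168.0.19"}, // stale: no node serves it any more
	}
	desired := []Route{
		{"10.16.1.0/24", "192.168.0.11"},
		{"10.16.2.0/24", "192.168.0.12"},
	}
	toAdd, toRemove := diffRoutes(existing, desired)
	fmt.Println("add:", toAdd)
	fmt.Println("remove:", toRemove)
}
```

Each batch touches only the routes it names, which is what lets several such calls run side by side.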

Which issue this PR fixes (if applicable):
fixes #2089
based on #2090

Special notes for reviewers:

Release note:

Implement support for concurrent atomic route updates.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 27, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 27, 2023
@k8s-ci-robot
Contributor

Hi @kayrus. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 27, 2023
@jichenjc
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 28, 2023
@k8s-ci-robot
Contributor

k8s-ci-robot commented Feb 28, 2023

@kayrus: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| openstack-cloud-csi-manila-e2e-test | 362b0cc | link | true | /test openstack-cloud-csi-manila-e2e-test |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 28, 2023
// RouterOpts is used for Neutron routes
type RouterOpts struct {
RouterID string `gcfg:"router-id"` // required
Member

In my opinion this "required" should be removed because it's not really required. As mentioned in another PR, we have been using OCCM for 7 years without using this at all.

@zetaab
Member

zetaab commented Mar 2, 2023

Please rebase this after the other PR is merged.

@kayrus
Contributor Author

kayrus commented Mar 2, 2023

This is still WIP. I'm in contact with our Neutron team. I'd like to remove the manual config toggle and rely on the extensions API instead, so there should be no need for an extra option.
I'll also remove the "required" mark from the struct member.

@zetaab
Member

zetaab commented Mar 2, 2023

@kayrus if you have time, could you explain why this routes feature is needed? Our architecture has always been a router in the (project) network with DHCP; the router handles the traffic automatically, with no need to add routes. What benefits would we get from using it?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 2, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2023
@kayrus
Contributor Author

kayrus commented Mar 3, 2023

@zetaab if you have time could you explain why this routes feature is needed?

k8s has a global pod CIDR, and each node serves its own segment of it, e.g. 10.16.0.0/16 is served by multiple nodes with 10.16.1.0/24, 10.16.2.0/24, 10.16.3.0/24, etc. The Routes interface tells the cloud provider's router which pod CIDR segment is served by which node, and adds the corresponding static routes to the router.

Therefore VMs or load balancers on the same private network can reach the pod network through static routes:

  • 10.16.1.0/24 is routed to node1
  • 10.16.2.0/24 is routed to node2
  • etc...
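
The mapping above can be sketched in Go as follows. This is only an illustration: the `Node` and `StaticRoute` types, the `routesFor` helper and the 192.168.0.x node addresses are assumptions, not code from OCCM.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Node is a hypothetical view of a cluster node.
type Node struct {
	Name    string
	Address string // node IP on the private network (made-up values below)
	PodCIDR string // pod segment served by this node
}

// StaticRoute is a hypothetical static route entry for the router.
type StaticRoute struct {
	Destination string
	NextHop     string
}

// routesFor builds one static route per node: the node's pod segment is the
// destination and the node's own address is the next hop. Segments that fall
// outside the cluster-wide pod CIDR are rejected.
func routesFor(clusterCIDR string, nodes []Node) ([]StaticRoute, error) {
	cluster, err := netip.ParsePrefix(clusterCIDR)
	if err != nil {
		return nil, fmt.Errorf("cluster CIDR: %w", err)
	}
	routes := make([]StaticRoute, 0, len(nodes))
	for _, n := range nodes {
		seg, err := netip.ParsePrefix(n.PodCIDR)
		if err != nil {
			return nil, fmt.Errorf("node %s: pod CIDR: %w", n.Name, err)
		}
		hop, err := netip.ParseAddr(n.Address)
		if err != nil {
			return nil, fmt.Errorf("node %s: address: %w", n.Name, err)
		}
		if seg.Bits() < cluster.Bits() || !cluster.Contains(seg.Addr()) {
			return nil, fmt.Errorf("node %s: %s is outside %s", n.Name, seg, cluster)
		}
		routes = append(routes, StaticRoute{
			Destination: seg.Masked().String(),
			NextHop:     hop.String(),
		})
	}
	return routes, nil
}

func main() {
	nodes := []Node{
		{"node1", "192.168.0.11", "10.16.1.0/24"},
		{"node2", "192.168.0.12", "10.16.2.0/24"},
	}
	routes, err := routesFor("10.16.0.0/16", nodes)
	if err != nil {
		panic(err)
	}
	for _, r := range routes {
		fmt.Printf("%s via %s\n", r.Destination, r.NextHop)
	}
}
```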

This approach is used in GCP, AWS, Azure and other providers as well. Basically, it lets load balancers reach the pod network directly, avoiding NodePorts.

Initially there was no atomic route update, and routes were updated as follows: get all routes from the router, append the new route to the list of existing routes, update the router. This logic ran concurrently, so there were many inconsistent updates in which one thread overwrote routes submitted by parallel threads.
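
The overwrite problem can be reproduced with a small deterministic Go sketch. It uses no real concurrency: it simply replays one unlucky interleaving in which two workers both read the route list before either writes it back. The `fakeRouter` type and its methods are hypothetical and stand in for a router whose route list can only be replaced as a whole.

```go
package main

import "fmt"

// fakeRouter mimics a router whose routes can only be replaced as a whole list.
type fakeRouter struct{ routes []string }

// Get returns a copy of the current route list.
func (r *fakeRouter) Get() []string { return append([]string(nil), r.routes...) }

// Update replaces the whole route list with the given one.
func (r *fakeRouter) Update(routes []string) { r.routes = append([]string(nil), routes...) }

// lostUpdate replays the interleaving where workers A and B both read the
// list before either of them writes its own result back.
func lostUpdate() []string {
	router := &fakeRouter{routes: []string{"10.16.1.0/24"}}

	seenByA := router.Get() // A reads [10.16.1.0/24]
	seenByB := router.Get() // B reads the same list

	router.Update(append(seenByA, "10.16.2.0/24")) // A writes its route
	router.Update(append(seenByB, "10.16.3.0/24")) // B overwrites A's write

	return router.Get()
}

func main() {
	// 10.16.2.0/24 is missing from the result: A's update was lost.
	fmt.Println(lostUpdate()) // prints [10.16.1.0/24 10.16.3.0/24]
}
```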

With an atomic update, routes can be updated concurrently and with fewer API calls. This should significantly reduce the latency of the already improved route update logic.
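
For contrast, here is a minimal Go sketch of the atomic style, under the assumption that the server merges each add or remove request into its own state. The in-memory `atomicRouter` type and its mutex are hypothetical: the mutex only stands in for whatever makes the operation atomic on the server side.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// atomicRouter is a hypothetical in-memory router that merges changes itself,
// so callers never need to read the full route list first.
type atomicRouter struct {
	mu     sync.Mutex
	routes map[string]string // destination CIDR -> next hop
}

func newAtomicRouter() *atomicRouter {
	return &atomicRouter{routes: make(map[string]string)}
}

// AddRoutes merges the given routes into the router in one step.
func (r *atomicRouter) AddRoutes(routes map[string]string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for dest, hop := range routes {
		r.routes[dest] = hop
	}
}

// RemoveRoutes deletes the given destinations in one step.
func (r *atomicRouter) RemoveRoutes(dests ...string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, dest := range dests {
		delete(r.routes, dest)
	}
}

// Destinations returns the current destinations in sorted order.
func (r *atomicRouter) Destinations() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make([]string, 0, len(r.routes))
	for dest := range r.routes {
		out = append(out, dest)
	}
	sort.Strings(out)
	return out
}

// addConcurrently lets n workers add one route each, in parallel.
func addConcurrently(n int) []string {
	router := newAtomicRouter()
	var wg sync.WaitGroup
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			router.AddRoutes(map[string]string{
				fmt.Sprintf("10.16.%d.0/24", i): fmt.Sprintf("192.168.0.%d", 10+i),
			})
		}(i)
	}
	wg.Wait()
	return router.Destinations()
}

func main() {
	fmt.Println(len(addConcurrently(16)), "routes present") // prints 16 routes present
}
```

Because each worker sends only its own change, no worker can overwrite what another one added.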

Currently I'm trying to figure out whether the extraroute-atomic extension info can be obtained from the Neutron API, to avoid a config option toggle; that is why the PR is still WIP.
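
A minimal Go sketch of such a check, assuming the Neutron extension list comes back as JSON with an `extensions` array whose entries carry an `alias` field. The `hasExtension` helper and the sample body are hypothetical; a real implementation would fetch the list through the OpenStack client rather than use a hard-coded payload.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extensionList models only the part of the (assumed) response we need.
type extensionList struct {
	Extensions []struct {
		Alias string `json:"alias"`
	} `json:"extensions"`
}

// hasExtension reports whether the extension list in body contains alias.
func hasExtension(body []byte, alias string) (bool, error) {
	var list extensionList
	if err := json.Unmarshal(body, &list); err != nil {
		return false, fmt.Errorf("decoding extension list: %w", err)
	}
	for _, ext := range list.Extensions {
		if ext.Alias == alias {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Hard-coded sample payload, for illustration only.
	sample := []byte(`{"extensions":[{"alias":"router"},{"alias":"extraroute"},{"alias":"extraroute-atomic"}]}`)

	atomic, err := hasExtension(sample, "extraroute-atomic")
	if err != nil {
		panic(err)
	}
	if atomic {
		fmt.Println("use atomic add/remove calls")
	} else {
		fmt.Println("fall back to get, merge, update")
	}
}
```

Detecting the extension at runtime keeps the fallback path available on clouds that do not offer it, without asking the operator to set an option.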

@kayrus
Contributor Author

kayrus commented Mar 3, 2023

@zetaab I updated my comment above. See the bold text.


* `router-id`
Required. Specifies the Neutron router ID to manage Kubernetes cluster routes.
* `extraroute-atomic`
Contributor

Maybe in the future we can check through neutron ext-list to detect whether it's enabled or not?

Contributor Author

that's what I'm investigating right now.

Contributor Author

Done. My tests show an overall route reconciliation latency improvement from 2-5 minutes down to 10 seconds for a cluster of 16 nodes.

ready for review.

@kayrus kayrus changed the title from "occm: implement an optional atomic routes update" to "occm: implement a support for atomic routes update" Mar 7, 2023
@kayrus kayrus marked this pull request as ready for review March 7, 2023 14:06
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 7, 2023
@kayrus
Contributor Author

kayrus commented Mar 7, 2023

@zetaab @jichenjc ready for review

@kayrus
Contributor Author

kayrus commented Mar 9, 2023

@zetaab @jichenjc kindly ping
see also #2150 and #2152.

k8s-ci-robot pushed a commit that referenced this pull request Mar 9, 2023
#2134) (#2150)

* occm: implement an optional atomic routes update

* Fix IPv6 detection logic in routes
k8s-ci-robot pushed a commit that referenced this pull request Mar 9, 2023
#2134) (#2152)

* occm: implement an optional atomic routes update

* Fix IPv6 detection logic in routes
Member

@zetaab zetaab left a comment

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zetaab

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[oocm] generates an excessive number of os-interface api requests

4 participants