feat: parallelize nodes update#1133
Conversation
✅ Deploy Preview for kubernetes-sigs-nfd ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
marquiz
left a comment
There was a problem hiding this comment.
Thanks @AhmedGrati for the PR.
I think we need to throttle the max parallelism (i.e. how many concurrent go routines are generated) somehow. Say we have a cluster with 2000 nodes we don't probably want the fire 2000 goroutines at the same time
/milestone v0.14.0
|
Can we encorporate workqueue here? Have a flag than configures the number of consumers(which will update specific node) and we will just We can use https://pkg.go.dev/k8s.io/client-go/util/workqueue for example. |
Yeah, for example something like this |
|
Hey @PiotrProkop, I've just come up with this design to include workqueue, and to have a goroutines pool responsible for labeling nodes.
|
|
ping @marquiz @PiotrProkop |
|
I think decoupling labeling into separate On the other hand I know that implementing it is non trivial and I think @marquiz WDYT? |
I agree with @PiotrProkop here. Using workqueues would be more future proof and architecturally cleaner for the long run. However, as @PiotrProkop said it's not trivial and I think lot more work (including tests) than the simple-and-stupid SizedWaitGroup. Using SizedWaitGroup should be safe and simple (currently) as there is only one place (and goroutine) from where we do labeling. Thus, I'm fine with either approach. You can choose 😉 |
|
Thanks for your input @marquiz @PiotrProkop, I'll implement it using |
de0c0cf to
d908a70
Compare
dca45f6 to
64a284e
Compare
bdea8c1 to
2906059
Compare
5f4dc3f to
9bf7037
Compare
9bf7037 to
3a1953d
Compare
|
@AhmedGrati you might want to rebase once more to run the logcheck (for structured logging) |
3a1953d to
b62a63c
Compare
| var lock = &sync.Mutex{} | ||
| var instance *nodeUpdaterPool | ||
|
|
||
| func getNodeUpdaterPool(nfdMaster nfdMaster) *nodeUpdaterPool { |
There was a problem hiding this comment.
This singleton approach looks a bit weird as you give nfdMaster as an argument but that argument is totally omitted in subsequent calls.
How about changing this to
| func getNodeUpdaterPool(nfdMaster nfdMaster) *nodeUpdaterPool { | |
| func newNodeUpdaterPool(nfdMaster nfdMaster) *nodeUpdaterPool |
and then calling this in NewNfdMaster and storing the instance in the nfdMaster struct?
There was a problem hiding this comment.
I agree, It would be much more readable, since the nodeUpdaterPool is only used from within nfdMaster.
There was a problem hiding this comment.
Wouldn't that result in a circular dependency?
There was a problem hiding this comment.
nfdMaster contains nodeUpdaterPool and nodeUpdaterPool contains nfdMaster.
There was a problem hiding this comment.
Think about it like doubly-linked list
|
|
||
| defer queue.Done(nodeName) | ||
|
|
||
| if err := u.nfdMaster.nfdAPIUpdateOneNode(nodeName.(string)); err != nil { |
There was a problem hiding this comment.
and if this is the only place where nfdAPIUpdateOneNode is called you can just remove this func from nfdMaster and make it nodeUpdaterPoll func, this way you don't need to pass nfdMaster to nodeUpdaterPool but prolly just Kubernetes client.
There was a problem hiding this comment.
That might not be that easy as nfdAPIUpdateOneNode() is very tightly coupled with nfdMaster, dependent on many fields/parameters of nfdMaster down in the call chain. The architecture could be refactored but I think that will probably be a bigger effort and better be a separate PR
There was a problem hiding this comment.
Ok, we should create issue to track this work.
|
/unhold |
|
@marquiz any idea why is this failing? |
|
/retest-required |
|
@AhmedGrati also one with: could you clean up the commit message 😇 |
This PR aims to optimize the process of updating nodes with corresponding features. In fact, previously, we were updating nodes sequentially even though they are independent from each other. Therefore, we integrated new components: LabelersNodePool which is responsible for spininng a goroutine whenever there's a request for updating nodes, and a Workqueue which is responsible for holding nodes names that should be updated. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
marquiz
left a comment
There was a problem hiding this comment.
Good job and thanks for the persistence on this @AhmedGrati. I don't think I have anything to add for now -> approved
@PiotrProkop could you take look, with fresh eyes?
marquiz
left a comment
There was a problem hiding this comment.
Good job and thanks for the persistence on this @AhmedGrati. I don't think I have anything to add for now -> approved
@PiotrProkop could you take look, with fresh eyes?
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: AhmedGrati, marquiz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1 similar comment
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: AhmedGrati, marquiz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #1133 +/- ##
==========================================
+ Coverage 28.72% 29.22% +0.49%
==========================================
Files 51 52 +1
Lines 7139 7219 +80
==========================================
+ Hits 2051 2110 +59
- Misses 4871 4883 +12
- Partials 217 226 +9
|
|
|
||
| defer queue.Done(nodeName) | ||
|
|
||
| if err := u.nfdMaster.nfdAPIUpdateOneNode(nodeName.(string)); err != nil { |
There was a problem hiding this comment.
Ok, we should create issue to track this work.
|
LGTM label has been added. DetailsGit tree hash: 046a11ad55b9f94015cfdd23dff74551e04792d3 |

Resolves #1096.