Conversation
The allocator starts a watch on events and never shuts it down. If this node loses leadership, events will continue to pile up in that queue forever, leading to memory exhaustion. Fix this by giving the watch a well-defined lifecycle. Return it along with a cancel function from an init() method to make it clear it must be cancelled, instead of sticking the cancel function value in a struct where there is no clear responsibilty for calling it. In the future, we might consider changing Watch to run a callback function, so shutting down the watch is mandatory (similar to what we do with store transactions). Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Codecov Report
@@ Coverage Diff @@
## master #2215 +/- ##
==========================================
+ Coverage 60.03% 60.05% +0.01%
==========================================
Files 124 124
Lines 20115 20122 +7
==========================================
+ Hits 12077 12085 +8
+ Misses 6680 6674 -6
- Partials 1358 1363 +5 |
|
👍 great catch @briantd! Thanks for working on a patch for it @aaronlehmann! |
|
LGTM. |
|
LGTM, with the caveat that I don't know the allocator code very well :) So please excuse the dumb question: this looks like it's creating a watch per |
|
It didn't make sense to me that the watch used to be shared between all actors. That wouldn't have worked once we had multiple actors, because they would have competed for events, so I moved the watch call to be per-actor.
|
|
Ah ok, so it's not like a pool of workers who grabs the first event, it's more of a fanout to all actors? |
|
Correct. Other actors would be handling things other than network allocation. It's a bit of a weird design and we would probably change it anyway if we introduced other kinds of allocators.
|
|
Ok, that makes sense, thanks for explaining! 👍 |
|
LGTM |
This was something I tripped over in one of my PoC attempts to refactor some of this stuff for CNI networking on top of #1965, I can confirm that it does not work well at all... |
- moby/swarmkit#2218 - moby/swarmkit#2215 - moby/swarmkit#2233 Signed-off-by: Ying <ying.li@docker.com>
- moby/swarmkit#2218 - moby/swarmkit#2215 - moby/swarmkit#2233 Signed-off-by: Ying <ying.li@docker.com>
- moby/swarmkit#2218 - moby/swarmkit#2215 - moby/swarmkit#2233 Signed-off-by: Ying <ying.li@docker.com>
- moby/swarmkit#2218 - moby/swarmkit#2215 - moby/swarmkit#2233 Signed-off-by: Ying <ying.li@docker.com>
The allocator starts a watch on events and never shuts it down. If this node loses leadership, events will continue to pile up in that queue forever, leading to memory exhaustion.
Fix this by giving the watch a well-defined lifecycle. Return it along with a cancel function from an init() method to make it clear it must be cancelled, instead of sticking the cancel function value in a struct where there is no clear responsibilty for calling it.
In the future, we might consider changing
Watchto run a callback function, so shutting down the watch is mandatory (similar to what we do with store transactions).cc @briantd @jakegdocker