Commit b855b25
node/manager: synthesize node deletion events

When the cilium agent is down (due to a crash or an upgrade), it can
miss node events. Upon startup, live nodes are upserted, but when
deletions are missed, the agent fails to clean up node-related system
state. Examples of such state include bpf map entries, xfrm states and
routes. In particular, the agent fails to clean up node IP to nodeID
mappings in the nodeid bpf map. Since K8s will happily recycle such IPs,
this can lead to breakage, as the agent then associates the wrong nodeID
with those IPs.

To avoid leaking this state, the node manager now dumps its view of the
current set of nodes to a file in the runtime state directory, which is
read back when the agent restarts. This is similar to how we restore
other state upon restart.

When reading this file, it's important to avoid resurrecting long-gone
nodes, as we don't know how long the agent was down. Instead, we
merely take note of which nodes we knew of in the past, compare that to
the nodes we consider live (once synced to k8s), and delete the ones
which seem to have disappeared.
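
The pruning step can be sketched in Go as a set difference between restored and live nodes. The sketch makes three assumptions. Nodes are keyed by name alone. The live set is a map filled in by the initial k8s sync. All downstream cleanup is reachable through a single hypothetical `onDelete` callback that receives the full node object. None of these names come from the actual node manager.

```go
package main

import "sort"

// Node is a hypothetical stand-in for the full node object that downstream
// deletion handlers expect.
type Node struct {
	Name string
	IPs  []string
}

// pruneStaleNodes synthesizes deletion events for nodes that were known
// before the restart (restored from disk) but are absent from the set of
// live nodes observed after the initial k8s sync. Restored nodes are never
// upserted: they only serve as deletion candidates, so a node that
// disappeared while the agent was down cannot be resurrected.
// It must only be called once the live set is complete (after sync);
// otherwise live nodes that simply have not been observed yet would be
// deleted.
func pruneStaleNodes(restored []Node, live map[string]Node, onDelete func(Node)) []string {
	var pruned []string
	for _, n := range restored {
		if _, ok := live[n.Name]; ok {
			// Still present: the regular upsert path owns this node.
			continue
		}
		// Hand the full restored object to the deletion handler so that
		// downstream code can clean up state derived from it, such as the
		// node IP to nodeID entries in the nodeid bpf map.
		onDelete(n)
		pruned = append(pruned, n.Name)
	}
	sort.Strings(pruned)
	return pruned
}
```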

The motivation to build this reconciliation based on full state dumps to
disk is that downstream code generally assumes it has access to a full
node object in the deletion callbacks. This makes it infeasible to base
the pruning on just the information available in bpf maps. In an
alternative design, downstream subsystems would be responsible for
cleaning up their own state based on just a node identifier, but current
code doesn't allow for this.

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

Parent: 545fbc8
7 files changed (405 additions, 51 deletions) under:
- pkg/clustermesh
- pkg/datapath/linux
- pkg/node/manager
One of the changed files was deleted.