Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.7", GitCommit:"6f482974b76db3f1e0f5d24605a9d1d38fad9a2b", GitTreeState:"clean", BuildDate:"2019-03-25T02:52:13Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.8", GitCommit:"4e209c9383fa00631d124c8adcc011d617339b3c", GitTreeState:"clean", BuildDate:"2019-02-28T18:40:05Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Bug report
General Information
Linux ip-10-71-33-200 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1 (2019-02-07) x86_64 x86_64 x86_64 GNU/Linux
How to reproduce the issue
We see here that all agents show a large spike in CPU consumption, but depending on the agent the spike is either resolved quickly or lasts much longer. In the end we killed the agents to return to a normal state more quickly, and it worked. While the high CPU consumption lasted, the affected nodes dropped egress traffic to new identities (because endpoint regeneration had not completed).
One suggestion from @aanm was to patch Cilium so that it avoids garbage-collecting so many unused identities in a single loop.
See the conversation on Slack for some details: https://cilium.slack.com/archives/C1MATJ5U5/p1561039462124100
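To illustrate the suggestion above, here is a minimal sketch of garbage-collecting unused identities in bounded batches with a pause between batches, instead of releasing them all in one loop. This is not Cilium's actual code: `gcInBatches`, its parameters, and the `release` callback are hypothetical names used only for illustration, and the right batch size and pause would need to be measured.

```go
package main

import "time"

// gcInBatches is a hypothetical helper (not part of Cilium). It releases
// the given unused identities in groups of at most batchSize, sleeping for
// pause between groups so that agents reacting to the deletions are not
// hit by every change at once. It returns the number of batches processed.
func gcInBatches(unused []string, batchSize int, pause time.Duration, release func(id string)) int {
	if batchSize <= 0 {
		batchSize = 1 // guard against a zero or negative batch size
	}
	batches := 0
	for start := 0; start < len(unused); start += batchSize {
		end := start + batchSize
		if end > len(unused) {
			end = len(unused)
		}
		// Release one bounded group of identities.
		for _, id := range unused[start:end] {
			release(id)
		}
		batches++
		// Pause only if more identities remain to be processed.
		if end < len(unused) {
			time.Sleep(pause)
		}
	}
	return batches
}
```

A caller could invoke it as, for example, `gcInBatches(ids, 50, 2*time.Second, deleteIdentity)`, where `ids` and `deleteIdentity` are again placeholders. The trade-off is that a full GC pass takes longer, in exchange for a smaller burst of regeneration work on each agent.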