Skip to content

Host Policies can break kube if applied incorrectly. #12289

Merged
aanm merged 1 commit intocilium:masterfrom
jedsalazar:pr/caution-with-host-policies
Jul 9, 2020
Merged

Host Policies can break kube if applied incorrectly. #12289
aanm merged 1 commit intocilium:masterfrom
jedsalazar:pr/caution-with-host-policies

Conversation

@jedsalazar
Copy link
Copy Markdown
Contributor

@jedsalazar jedsalazar commented Jun 25, 2020

If a user unintentionally applies an overly aggressive host-level netpolicy it
will break access to their nodes and break Kube. Given the blast radius of a
mistake and the undocumented fix, I discussed an initial solution in the docs
with Paul Chaignon and we agreed that a good start is preventing an example
policy from loading until the user changes example values.

This has the benefit of explicitly requiring some thought from users before
applying a policy that could lead to a k8s outage.

Happy to discuss alternative ideas here but the goal is to protect operators
from breaking their environments while testing this feature.

Signed-off-by: Jed Salazar jed@isovalent.com

@jedsalazar jedsalazar requested a review from a team as a code owner June 25, 2020 15:30
@maintainer-s-little-helper

This comment has been minimized.

@jedsalazar jedsalazar added the area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. label Jun 25, 2020
Copy link
Copy Markdown
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a decent option to me.

@coveralls
Copy link
Copy Markdown

coveralls commented Jun 26, 2020

Coverage Status

Coverage increased (+0.005%) to 36.941% when pulling a8a84b0 on jedsalazar:pr/caution-with-host-policies into 7afc2a9 on cilium:master.

Comment thread examples/policies/host/lock-down-ingress.yaml Outdated
@aanm aanm added the release-note/misc This PR makes changes that have no direct user impact. label Jun 29, 2020
@aanm aanm requested a review from pchaigno June 29, 2020 12:00
Comment thread Documentation/policy/language.rst Outdated
@pchaigno
Copy link
Copy Markdown
Member

@jedsalazar By the way, could you remove my GitHub handle from the commit message if you get a chance? Not that I have anything against it, but I will then receive notifications from GitHub anytime this commit is pulled somewhere. Quentin did that once and I ended up having notifications for random Cilium forks, folks rebasing their pull requests, etc. 😄

@maintainer-s-little-helper

This comment has been minimized.

@maintainer-s-little-helper maintainer-s-little-helper Bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 30, 2020
@maintainer-s-little-helper

This comment has been minimized.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this intending to replace the 22 port instead of 8472?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to drop port 22? I thought that was one of the few pieces that didn't completely brick the nodes in these cases where incorrect policy was applied; at least someone can still SSH into the node & unload BPF filters if necessary.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's a good idea not to remove any of the existing ports, but instead add one extra that can be removed? I'm not sure what the implications are for if we drop 8472 whether this makes it more likely that users will deploy policies that might block tunnel traffic.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might have been accidentally removed?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we replace this policy with the following?

apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
description: "Allow a minimum set of required ports on ingress of worker nodes"
metadata:
  name: "lock-down-ingress-worker-node"
spec:
  nodeSelector:
    matchLabels:
      type: worker
  ingress:
  - fromEntities:
    - world
  - toPorts:
    - ports:
      - port: "6443"
        protocol: TCP
      - port: "2379"
        protocol: TCP
      - port: "4240"
        protocol: TCP
      - port: "8472"
        protocol: UDP
      - port: "PLEASE_ADD_PORTS_RELEVANT_TO_YOUR_ENV"
        protocol: ANY

That would ensure external access is never broken (unless bug) while forcing the user to take a look and add relevant ports.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what I've been trying to say since the beginning 😢. However, the fromEntities does not seem correct in this case. The nodes won't be able to connect to kube-apiserver since the policy is only allowing world and nodes are not world

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for my misunderstanding @aanm. I think that's a great approach. for the fromEntities, do you think we should replace that with remote-node or cluster?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for my misunderstanding @aanm. I think that's a great approach. for the fromEntities, do you think we should replace that with remote-node or cluster?

@jedsalazar I think it shouldn't be either. mostly because new nodes won't have an identity assigned, and therefore none of the entity will work (well world will probably work but I believe that's not our goal)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might have been accidentally removed?

@maintainer-s-little-helper

This comment has been minimized.

4 similar comments
@maintainer-s-little-helper

This comment has been minimized.

@maintainer-s-little-helper

This comment has been minimized.

@maintainer-s-little-helper

This comment has been minimized.

@maintainer-s-little-helper

This comment has been minimized.

Comment thread Documentation/policy/language.rst Outdated
@maintainer-s-little-helper

This comment has been minimized.

Comment thread Documentation/policy/language.rst Outdated
@maintainer-s-little-helper

This comment has been minimized.

@jedsalazar jedsalazar force-pushed the pr/caution-with-host-policies branch from b7f1331 to cb2d41e Compare July 6, 2020 19:34
@maintainer-s-little-helper maintainer-s-little-helper Bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 6, 2020
@jedsalazar jedsalazar force-pushed the pr/caution-with-host-policies branch from cb2d41e to badfb26 Compare July 6, 2020 19:39
Comment thread Documentation/policy/language.rst Outdated
Host Policies can break kube if applied incorrectly. If a user unintentionally
applies an overly aggressive host-level netpolicy it will break access to their
nodes and break kube. Given the blast radius of a mistake and the undocumented
fix, I discussed an initial solution in the docs with @pchaigno and we agreed
that a good start is preventing an exmaple policy from loading until the user
changes example values.

This has the benefit of explicitly requiring some thought from users before
applying a policy that could lead to a k8s outage.

Happy to discuss alternative ideas here but the goal is to protect operators
from breaking their environments while testing this feature.

Signed-off-by: Jed Salazar <jed@isovalent.com>
@jedsalazar jedsalazar force-pushed the pr/caution-with-host-policies branch from fc31435 to a8a84b0 Compare July 7, 2020 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants