Fix GC uid races and handling of conflicting ownerReferences #92743
/assign @jpbetz
Docs update for 1.20 at kubernetes/website#25091
Curious: why did we make the cross-namespace reference an invalid case?
Namespaces are intended to be independent of each other, so cross-namespace references have not been permitted in things like ownerReferences, secret/configmap volume references, etc. Additionally, granting permissions to namespace
Thanks.
Could this fix be merged to 1.19.x?
No, this is a significant enough rewrite of the garbage collector controller that it is limited to 1.20+. For clusters on previous versions, I would recommend running the https://github.com/kubernetes-sigs/kubectl-check-ownerreferences tool to identify invalid ownerReferences that could trigger the bug this PR is fixing, so they can be eliminated.
This PR fixes #474. Also refer to kubernetes/kubernetes#92743 (a child that is cluster-scoped with an owner reference to a namespaced type in namespace B): starting with K8s 1.20, Kubernetes will log an event in namespace kube-system with an involvedObject of bad-child indicating the error.
Today, the TridentProvisioner CR (namespaced) is set as the parent of cluster-scoped dependent objects such as clusterrole, clusterrolebinding, podsecuritypolicy, the CSIDriver CR, and OCP's SCC. This confuses OCP 4.5 and above, so OCP removes the ownerRef from the cluster-scoped resources.
As part of the fix:
1. The ownerReference field is no longer set on the cluster-scoped objects that the TridentProvisioner CR creates.
2. There is no change in how the TridentProvisioner CR interacts with Trident objects and autoheals them.
3. There is no change in Trident uninstallation either, because the Operator does not look at ownerReferences when uninstalling Trident.
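The first point of the fix above can be illustrated with a minimal sketch. It is not the Trident operator's code: the helper name `set_owner_if_valid` is hypothetical, manifests are modeled as plain dicts, and it assumes that a missing `metadata.namespace` means the object is cluster-scoped.

```python
import copy


def set_owner_if_valid(dependent: dict, owner: dict) -> dict:
    """Return a copy of `dependent`, adding an ownerReference to `owner`
    only when the reference would be valid.

    Hypothetical helper. Assumption: an object without metadata.namespace
    is cluster-scoped.
    """
    result = copy.deepcopy(dependent)
    dep_ns = result.get("metadata", {}).get("namespace")
    owner_ns = owner["metadata"].get("namespace")

    # A namespaced owner may only own dependents in its own namespace.
    # This also skips cluster-scoped dependents (dep_ns is None).
    if owner_ns is not None and dep_ns != owner_ns:
        return result

    refs = result.setdefault("metadata", {}).setdefault("ownerReferences", [])
    refs.append(
        {
            "apiVersion": owner["apiVersion"],
            "kind": owner["kind"],
            "name": owner["metadata"]["name"],
            "uid": owner["metadata"]["uid"],
        }
    )
    return result
```

With a namespaced owner, a cluster-scoped dependent such as a ClusterRole comes back without any ownerReferences, while a dependent in the owner's namespace gets one.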
Addresses issues with race conditions encountered when observing invalid ownerReferences
Fixes #65200
child in namespace A with owner reference to namespaced type in namespace B
child that is cluster-scoped with owner reference to namespaced type in namespace B
child pointing at non-preferred still-served apiVersion of parent object (e.g. rbac/v1beta1)
child pointing at no-longer-served apiVersion of still-existing parent object (e.g. extensions/v1beta1 deployment)
child pointing at no-longer-served apiVersion of no-longer-existing parent object (e.g. extensions/v1beta1 deployment)
child pointing at incorrect apiVersion/kind of still-existing parent object (e.g. core/v1 Secret with uid=123, where an apps/v1 Deployment with uid=123 exists)
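Several of the scenarios above can be told apart by comparing an ownerReference against the object that actually holds the referenced uid. The sketch below is illustrative only and is not the garbage collector's implementation: the `Obj`, `OwnerRef`, and `classify` names and the result labels are hypothetical, and it ignores whether an apiVersion is still served.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Obj:
    api_version: str
    kind: str
    name: str
    uid: str
    namespace: Optional[str] = None  # None models a cluster-scoped object


@dataclass(frozen=True)
class OwnerRef:
    api_version: str
    kind: str
    name: str
    uid: str


def api_group(api_version: str) -> str:
    # "apps/v1" -> "apps"; "v1" (the core group) -> ""
    return api_version.split("/")[0] if "/" in api_version else ""


def classify(child: Obj, ref: OwnerRef, objects_by_uid: Dict[str, Obj]) -> str:
    """Label an ownerReference (hypothetical labels, for illustration)."""
    owner = objects_by_uid.get(ref.uid)
    if owner is None:
        return "owner-not-found"
    # Same uid but a different group or kind, e.g. a core/v1 Secret reference
    # whose uid belongs to an apps/v1 Deployment.
    if api_group(owner.api_version) != api_group(ref.api_version) or owner.kind != ref.kind:
        return "kind-mismatch"
    if owner.namespace is not None:
        if child.namespace is None:
            return "cluster-scoped-child-namespaced-owner"
        if child.namespace != owner.namespace:
            return "cross-namespace"
    return "ok"
```

Comparing only the API group, not the full apiVersion, means a reference through a different version of the same group is still treated as matching the owner.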
Follow-ups:
add a controller/mechanism to migrate apiVersions of ownerReferences for known kubernetes types that have graduated or moved API groups (this is a pre-existing problem); tracked in "Add migration of ownerReferences that refer to deprecated/no-longer-served Kubernetes API versions" #96650
/kind bug
/cc @jpbetz @deads2k
/sig api-machinery