[GarbageCollector] Adding a proposal for server-side cascading deletion#23656
Conversation
|
ref #1535 |
docs/proposals/cascading-deletion.md
Outdated
|
Delete is a verb, so it's ok to use a verb here: OrphanChildren
|
Do we have a clear parent-child relationship defined somewhere?
Do we need something like controllerRef to materialize the same basic thing we have today for ObjectMeta.Namespace so the relationship is explicit?
|
I'm in favor of finalizers. I find tombstones scary for a number of reasons (e.g., privacy, extensibility, scalability, failure modes due to lack of atomicity, change in semantics vs. what kubectl delete --cascade currently provides), they likely would conflict with my plan in the future to use all resources as their own tombstones, and I want finalizers for a number of other use cases. |
docs/proposals/cascading-deletion.md
Outdated
|
I would like to find a way to use one of the existing deletion fields (e.g., DeletionTimestamp)
|
In the draft I wrote "we can reuse ObjectMeta.DeletionTimestamp if the DeletionTimestamp does not require the resource to be deleted after the time stamp is reached".
On second thought, this shouldn't prevent us from reusing the DeletionTimestamp, because no matter whether we reuse it, finalizers will break this on-time deletion promise made by the DeletionTimestamp.
cc @smarterclayton.
|
DeletionTimestamp changes when you call delete multiple times for a resource that is undergoing graceful termination... so I think you want a different field/concept.
|
Technically DeletionTimestamp is not a promise. Because we don't assume global time in the cluster, DeletionTimestamp is a best effort record of the anticipated deletion of the resource. No component in the cluster should be using DeletionTimestamp as a clock comparison (they should be using their own clock and incrementing it by GracefulDeletionPeriodSeconds in their own timeline).
Once DeletionTimestamp is set, it can never be unset (by API contract).
|
We only need to check whether DeletionTimestamp is non-nil; its value doesn't matter.
|
Its value does matter for pods, right?
|
@smarterclayton thanks for the clarification. The comment in types.go (kubernetes/pkg/api/v1/types.go, line 144 in b60ef6f)
@derekwaynecarr, I mean its value doesn't matter to the finalizers, because finalizers only check if DeletionTimestamp is nil. And finalizers should only update objects, which won't modify the DeletionGracePeriod or DeletionTimestamp (see https://github.com/kubernetes/kubernetes/blob/master/pkg/api/validation/validation.go#L353), so introducing finalizers won't affect the graceful termination. Did I miss something?
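The "finalizers only check if DeletionTimestamp is nil" claim can be sketched as follows; the types and helpers here are simplified stand-ins for illustration, not the real Kubernetes API:

```go
package main

import "fmt"

// Simplified stand-in for ObjectMeta: a finalizer controller only cares
// whether DeletionTimestamp is set, never what value it holds.
type objectMeta struct {
	DeletionTimestamp *string
	Finalizers        []string
}

// needsCleanup reports whether this controller should run its cleanup:
// the object is being deleted and our finalizer is still present.
func needsCleanup(m objectMeta, finalizer string) bool {
	if m.DeletionTimestamp == nil {
		return false // not being deleted; the timestamp's value is irrelevant
	}
	for _, f := range m.Finalizers {
		if f == finalizer {
			return true
		}
	}
	return false
}

// removeFinalizer returns the finalizer list with ours removed; the caller
// would send this as an Update, which (per validation) cannot modify
// DeletionTimestamp or DeletionGracePeriod.
func removeFinalizer(finalizers []string, finalizer string) []string {
	out := finalizers[:0:0]
	for _, f := range finalizers {
		if f != finalizer {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	ts := "2016-04-12T20:30:00Z"
	m := objectMeta{DeletionTimestamp: &ts, Finalizers: []string{"orphan", "other"}}
	fmt.Println(needsCleanup(m, "orphan"))
	fmt.Println(removeFinalizer(m.Finalizers, "orphan"))
}
```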
|
@caesarxuchao - it matters for who does the final delete call to the API server. In the current namespace pattern, the finalizer controller will send a delete when it observes all finalizers have emptied. In the case of pods, the kubelet will send a delete ?gracePeriod=0, which would have expected the pod to be removed. I guess I need to think a little more about what it would mean to attach a finalizer to a pod and the associated kubelet interaction.
|
The following is how I imagine it will work, feel free to point out the loopholes:
In the API server's deletion handler: if gracePeriod=0 and the finalizers field is empty, delete the object immediately; otherwise just do an update, which sets the DeletionGracePeriod.
When the last finalizer has done its job and sent an Update request to the API server, the API server will notice the finalizers are empty and check whether gracePeriod=0; if so, the API server deletes the object, otherwise it just updates the finalizers field to empty and some other party will send a delete with gracePeriod=0 later.
A pod may exist for a while after the kubelet sends a deletion request with gracePeriod=0, and the kubelet will continue to receive update events about the pod as its finalizers get removed, so it will send more deletion requests with gracePeriod=0, but I think that's harmless.
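The decision flow imagined above can be sketched roughly as follows; this is a simplified model with hypothetical names, not the actual API server implementation:

```go
package main

import "fmt"

// Simplified model of an object in storage for the deletion flow discussed
// above; fields are illustrative stand-ins.
type object struct {
	Deleting          bool // DeletionTimestamp has been set
	GracePeriodIsZero bool
	Finalizers        []string
}

// handleDelete models the deletion handler: delete from storage immediately
// only when gracePeriod=0 and no finalizers remain; otherwise just record
// the deletion intent via an update.
func handleDelete(o *object, gracePeriodZero bool) (deleted bool) {
	if gracePeriodZero && len(o.Finalizers) == 0 {
		return true // remove from storage now
	}
	o.Deleting = true
	o.GracePeriodIsZero = gracePeriodZero
	return false
}

// handleUpdate models the update path: when the last finalizer empties the
// list on an object already marked for deletion, delete it only if the
// grace period is already zero; otherwise some other party (e.g. the
// kubelet) sends a delete with gracePeriod=0 later.
func handleUpdate(o *object, newFinalizers []string) (deleted bool) {
	o.Finalizers = newFinalizers
	return o.Deleting && o.GracePeriodIsZero && len(o.Finalizers) == 0
}

func main() {
	o := &object{Finalizers: []string{"orphan"}}
	fmt.Println(handleDelete(o, true)) // finalizer pending: just an update
	fmt.Println(handleUpdate(o, nil))  // finalizers emptied: delete now
}
```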
|
I have a feeling this would have a significant impact on the speed of our e2e runs, and the number of flakes we may or may not encounter unless we restructure them with the idea that orphans are OK. The namespace deletion flake is a flake that never dies. /cc @kubernetes/rh-cluster-infra - this potentially extends the namespace finalizer concept to all of the other resource types. |
|
I need more time to reason about this, and would like to understand whether the gc or finalizer controller is bundled with the controller-manager. If so, it feels like it would give a strong argument for having shared informers, i.e. #23575 |
|
If we extend the finalizer pattern, I think it's imperative that we have a |
docs/proposals/cascading-deletion.md
Outdated
|
If we reuse the DeletionTimestamp field, here the API server also needs to check whether DeletionGracePeriod is 0; if so, it immediately deletes the object, otherwise it just carries out the update.
docs/proposals/cascading-deletion.md
Outdated
|
"Parent" doesn't have any obvious, standard meaning. OwnerReferences may be more familiar, similar to the usual concept of object/memory ownership.
We'd need to think of another term for "Children", however.
Also, we can't change the default behavior for existing APIs/versions, so cascading deletion needs to not happen by default through the API, or, equivalently, orphaning needs to occur by default.
|
We'd need to think of another term for "Children", however.
How about "Dependent"? It sounds like a good match to "owner".
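The owner/dependent relationship settled on here can be illustrated minimally; the fields below are a simplified sketch of the OwnerReferences idea that #23928 introduces, not the exact API:

```go
package main

import "fmt"

// ownerReference is a simplified sketch of a reference from a dependent
// object back to one of its owners.
type ownerReference struct {
	Kind string
	Name string
	UID  string
}

// objectMeta holds the dependent side of the relationship: a dependent
// lists its owners, mirroring how ObjectMeta.Namespace makes the
// namespace relationship explicit.
type objectMeta struct {
	Name            string
	OwnerReferences []ownerReference
}

// isDependentOf reports whether meta lists the given owner UID.
func isDependentOf(meta objectMeta, ownerUID string) bool {
	for _, ref := range meta.OwnerReferences {
		if ref.UID == ownerUID {
			return true
		}
	}
	return false
}

func main() {
	// A replica set created by a deployment would carry a reference to it.
	rs := objectMeta{
		Name:            "nginx-rs",
		OwnerReferences: []ownerReference{{Kind: "Deployment", Name: "nginx", UID: "d-123"}},
	}
	fmt.Println(isDependentOf(rs, "d-123"))
}
```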
|
I'll update the terminology after we decide the winner of the two designs. The new terminology is already used in the PR that adds the API #23928.
|
The current Namespace finalizer is similar to the dependent object, but in this case the dependency is a cluster provider of some kind (Kubernetes, OpenShift). If we go this route for dependencies as an object reference, we need to let non-namespaced resources depend on something akin to an API provider that must be registered with the cluster. I also could see the same value in Node(s)
|
I guess the general question is if cluster-scoped resources can have dependents and to what things.
|
I'm not sure I follow, can you give an example?
On Tue, Apr 12, 2016 at 8:30 PM, Derek Carr notifications@github.com wrote:
In docs/proposals/cascading-deletion.md #23656 (comment):
```
type DeleteOptions struct {
  ...
  OrphanChildren bool
}
```
DeleteOptions.OrphanChildren: allows a user to express whether the child objects should be orphaned.
```
type ObjectMeta struct {
  ...
  ParentReferences []ObjectReference
}
```
ObjectMeta.ParentReferences: links the resource to the parent resources. For example, a replica set R created by a deployment D should have an entry in ObjectMeta.ParentReferences pointing to D. The link should be set when the child object is created. It can be updated after the creation.
docs/proposals/garbage-collection.md
Outdated
|
Explicitly listed "propagating the grace period" as a non-goal of this proposal.
5064696 to 4044e33
docs/proposals/garbage-collection.md
Outdated
|
Moved "propagating grace period" to Open Questions and noted down the tentative solution.
|
LGTM! @smarterclayton @liggitt any last comments before I apply the label? |
|
GCE e2e build/test passed for commit 999677a76c85b75642683b26ea6eab490034af3a. |
|
GCE e2e build/test passed for commit 4044e334dbc38e897092feef132f763a840fb4ef. |
|
None from me.
|
4044e33 to 91cb08f
|
GCE e2e build/test passed for commit 91cb08f9dc76724d1256b5ba9160658a9f911f13. |
91cb08f to b3d7297
|
GCE e2e build/test failed for commit b3d7297. Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake. |
|
Automatic merge from submit-queue |
| API changes for Cascading deletion. This PR includes the necessary API changes to implement cascading deletion with finalizers as proposed in #23656. Comments are welcome. @lavalamp @derekwaynecarr @bgrant0607 @rata @hongchaodeng
|
Is there a PR for this anywhere I can follow? |
|
@yuvipanda here's a list of all the PRs so far: https://github.com/kubernetes/kubernetes/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Amerged+garbage+collector+author%3Acaesarxuchao |
|
@caesarxuchao awesome! Do you think all of this will land for 1.3? |
|
All the listed PRs are merged. GC will be alpha (disabled by default) for 1.3. We'll push for beta in 1.4. |
|
Awesome! \o/ (I'll be happy to turn it on in our install once 1.3 releases) |
|
Would you provide an update on the status of the documentation for this feature, and add any PRs as they are created? Not Started / In Progress / In Review / Done. Thanks |
|
@pwittrock I sent a PR to add a user doc. |
@lavalamp @bgrant0607
cc @kubernetes/sig-api-machinery @derekwaynecarr
ref #12143 #19054 #17305 (initializer proposal)