Gang Termination#114
Merged
Merged
Conversation
* Introduced `PodCliqueConditionType` and ConditionTypeMinAvailableBreached condition type. * Refactored PodClique reconcile status and added code to add PodClique condition. * PCSG reconciler now watches for PCLQ delete and update events. * Corrected the validation for TerminationDelay. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
* Removed PodCliqueConditionType and instead just defined constants in constants.go * Removed unused constants from pod component. * Removed unnecessary PGS lookup in pod component. * Added code in PCLQ component in PCSG to handle gang termination. * Removed utils/pcsg.go as the function defined was never used. * Refactored PCLQ status reconciliation, conditionally updating MinAvailableBreached condition. * Introduced a helper function `HasConditionChanged` Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
…odGangSetName`. Signed-off-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
Signed-off-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
This commit starts to change the deletion of PCLQs by making atomic delete calls for all PCLQs belonging to a PodGang. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
Signed-off-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
…orks. Signed-off-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
* Refactored PodCliqueScalingGroup reconcileStatus adding functions to mutate selector and minAvailableBreached condition. * Added a bunch of helper functions. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
Signed-off-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
* Added WIP code to handle PGS replica pod gang termination. * Added utility functions. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
Signed-off-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
…a` in podgangset/podclique component. * Extracted PCSG replica deletion due to MinAvailable breached into `triggerDeletionOfMinAvailableBreachedPodGangs` function. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
from this list can be deleted. * Changed the order of method invocations in Sync. All deletes are called first and then the createOrUpdate is called. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
minAvailable breached. * Every PCLQ that is part of a PCSG now also has LabelPodCliqueScalingGroupReplicaIndex label. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
* It might be the case that `PodClique`s might be in the `Unknown` status for the `MinAvailableBreached` condition, in which case the `PodCliqueScalingGroup` must also inherit this status for this condition. It was also observed that `PodClique`s might have empty conditions, which indicate that they have not been reconciled by the `PodClique` controller. In these cases, the `Unknown` status is set. * `triggerDeletionOfMinAvailableBreachedPCSGReplicas` returns a `bool` to indicate if a re-queue must occur to handle `PodClique`s that have breached `minAvailable`, but have not crossed termination delay. Signed-off-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
comparing it with MinAvailableBreached. * For PCLQs that are marked for termination, status update is now skipped. * Refactored PCSG computeMinAvailableBreachedCondition method. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
true for all PCSG replicas. This responsibility is now delegated to PGS reconciler. * For PCLQs that are marked for termination their MinAvailableBreached condition is set to Unknown. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
create. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
* Added a log Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
…ermination. Signed-off-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR introduces the multi-level Gang termination functionality. Gang termination can happen at the
PodCliqueScalingGrouplevel or at thePodGangSetlevel.To trigger Gang termination two configuration fields are important to understand:
PodGangSet.spec.template.cliques[<index>].spec.minAvailable- this is the minimum number ofreadypods that must exist at any given time. If this number is breached then the PodClique is a candidate for termination.PodCliqueScalingGroupminAvailableis currently hard coded to 1 and is not configurable. In future we will consider making this configurable via the API.PodGangSet.Spec.Template.TerminationDelay- A higher grouping resource likePodCliqueScalingGrouporPodGangSetobserves theMinAvailablebreached conditions on its constituents and waits forTerminationDelayduration. If the duration is crossed andMinAvailableis still breached then responsible reconciler will trigger termination of a gang represented either by a replica ofPodCliqueScalingGrouporPodGangSet.This PR introduces a new condition (
MinAvailableBreached) that will be set on bothPodCliqueandPodCliqueScalingGroup.PodCliqueScalingGroupfor every replica (except the 1st replica) will monitor the constituentPodClique'sMinAvailableBreachedcondition. If it reports true then afterTerminationDelayit will delete + create all thePodClique's for thatPodCliqueScalingGroupreplica. This will in turn recreate all the pods across all constituentPodCliques.PodGangSetfor every replica monitors the following:MinAvailableBreachedcondition atPodCliqueScalingGrouplevel. If any one ofPodCliqueScalingGroupreports breach ofMinAvailablehard coded to 1, then it will delete + recreate allPodCliquesfor thePodGangSetreplica after waiting forTerminationDelayduration.MinAvailabeBreachedforPodCliquesthat do not belong to anyPodCliqueScalingGroup. If any one of thesePodCliquesreport that itsMinAvailablehas been breached, then after waiting forTerminationDelayduration.Current limitation:
controller-runtimeuses cached clients. The issue with these clients are that if there are a burst of events that are enqueued for a reconciler, then it can result in incorrect computation of num resources to create or delete. We see this forPodCliquereconciler. As a consequence for a very short duration postGang terminationwe see some additional Pods created which are subsequently removed after a few seconds as well. K8s controllers likereplica-setcontroller also suffers from this problem and they solve it via controller expectations. However there are issues in that approach as well. We plan to improve on that design and introduce our own version of controller expectations in a later PR.