Skip to content

Conversation

@hussein-awala
Copy link
Member

@hussein-awala hussein-awala commented Apr 18, 2023

Add a new argument on_finish_action for KubernetesPodOperator to determine how to deal with the finished pods. There are 3 possible values:

  • delete_pod: delete the pod regardless its state
  • delete_succeeded_pod: delete the pod if it terminate with Succeeded state
  • keep_pod: keep the pod and don't delete it

@boring-cyborg boring-cyborg bot added provider:cncf-kubernetes Kubernetes (k8s) provider related issues area:providers labels Apr 18, 2023
@hussein-awala hussein-awala added the type:new-feature Changelog: New Features label Apr 18, 2023
:param is_delete_operator_pod: What to do when the pod reaches its final
state, or the execution is interrupted. If True (default), delete the
pod; if False, leave the pod.
:param delete_when_fails: If False, and is_delete_operator_pod is set to True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit complicated.
I think we should consider replace is_delete_operator with:
action_on_finish that supports several modes. For example: delete_pod, delete_pod_success

That way you dont need to work hard to understand the combinationa of flags, the mode is self explanatory.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was doing something similar to #29394, but I hesitated to deprecate is_delete_operator_pod, but yes, these flags may confuse users. I'll implement the argument you propose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jedcunningham WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Agree with @eladkal simple flag is better. I made similar "keep old flag" approaches in the past and it led to really bad complexity/

@eladkal eladkal removed the type:new-feature Changelog: New Features label Apr 19, 2023
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. Needs conflict resolving though. @jedcunningham @dstandish ?

Copy link
Member

@jedcunningham jedcunningham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System test examples need updating, particularly as they get sucked into the docs.

Are you planning to change the operators based on KPO in a followup PR? Just want to make sure it isn't missed. For example, the eks pod operator.

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
@hussein-awala
Copy link
Member Author

Are you planning to change the operators based on KPO in a followup PR? Just want to make sure it isn't missed. For example, the eks pod operator.

@jedcunningham yes, in this PR, I have ensured that it is fully backward compatible. This means that users can update the cncf.kubernetes provider version and still use it with older versions of Amazon or Google providers:

self.on_finish_action = (
OnFinishAction.DELETE_POD if is_delete_operator_pod else OnFinishAction.KEEP_POD
)
self.is_delete_operator_pod = is_delete_operator_pod
else:
self.on_finish_action = OnFinishAction(on_finish_action)
self.is_delete_operator_pod = self.on_finish_action == OnFinishAction.DELETE_POD

Once this PR is merged, I'll create a separate PR for each provider to deprecate the argument is_delete_operator_pod and use the new one.

@eladkal
Copy link
Contributor

eladkal commented Apr 23, 2023

Once this PR is merged, I'll create a separate PR for each provider to deprecate the argument is_delete_operator_pod and use the new one.

We can do it in this PR if you prefer.. no reason to keep it in seperated PRs unless that is how you prefer.

@hussein-awala
Copy link
Member Author

@eladkal we have this message added 15 months ago in the version 3.0.0 of Amazon provider, and we are releasing the version 8.0.0 without doing anything

:param is_delete_operator_pod: What to do when the pod reaches its final
state, or the execution is interrupted. If True, delete the
pod; if False, leave the pod. Current default is False, but this will be
changed in the next major release of this provider.

Also we have a warning on the None is_delete_operator_pod, what is the policy for the providers deprecation?

@hussein-awala
Copy link
Member Author

hussein-awala commented Apr 23, 2023

Same warning added to Google provider in 6.4.0, and we are releasing 10.0.0

@potiuk
Copy link
Member

potiuk commented Apr 23, 2023

Also we have a warning on the None is_delete_operator_pod, what is the policy for the providers deprecation?

We don't have provider deprecation policy - we remove deprecated things when we see that they woudl be helpful in cleaning up the code.

@potiuk
Copy link
Member

potiuk commented Apr 23, 2023

BTW. @eladkal is big proponent of removal of all deprecations when we release a breaking version. I am a bit more cautiouss, but I have no problem with that.

@hussein-awala
Copy link
Member Author

IMHO we can merge this PR without updating the operators GKEStartPodOperator and EksPodOperator, and I can create two PRs which do the same thing for these two operators and change the default value for pod_deletion to be consistent with KPO, and we can add them to the next major release of each provider with a breaking change note. WDYT?

@eladkal
Copy link
Contributor

eladkal commented Apr 23, 2023

IMHO we can merge this PR without updating the operators GKEStartPodOperator and EksPodOperator, and I can create two PRs which do the same thing for these two operators and change the default value for pod_deletion to be consistent with KPO, and we can add them to the next major release of each provider with a breaking change note. WDYT?

It's OK to update multiple providers in one PR.

I see no reason to break backward compatibility in any provider.
This PR add new parameter and deprecate the old one.
We should do the same for the other providers that inherit from KPO.

@hussein-awala
Copy link
Member Author

Okay, I'll add them.

@hussein-awala hussein-awala requested a review from o-nikolas as a code owner May 21, 2023 14:13
@hussein-awala
Copy link
Member Author

The issue with these operators is that the default is_delete_operator_pod is not aligned with the KPO. Therefore, we need to handle this parameter individually for each subclass operator.

Using on_finish_action as an argument for the super function won't work with the older version of the KPO. Additionally, providing is_delete_operator_pod will result in a deprecation warning being logged, even if the user provides on_finish_action for the GKE/EKS operator.
IMO, the best solution is to check the KPO version until modifying the default value of is_delete_operator_pod for the GKE/EKS operator, which, btw, is already deprecated.

@kand617
Copy link

kand617 commented May 31, 2023

Any update on this?
Our usecase is to keep pods that have failed and delete ones that completed successfully.
Can we opt for a simple solution? A new param perhaps?
is_delete_operator_pod_failed is_delete_operator_pod_success

if is_delete_operator_pod=True without extra params => delete pod always.
if either new params are defined, then ignore is_delete_operator_pod?

In some cases, Its unclear why pod has been removed. Cant tell from the output

@dstandish
Copy link
Contributor

i am late to the party here. that said, what about the following...

keep is_delete_pod_operator and let it be bool | OnFinishAction (or some other better name such as DeletePodCondition)

then it maintains a little more continuity and the name stays more tailored to what is actually being controlled, and we don't have to add any more deprecations. then it could be is_delete_operator_pod=DeletePodCondition.TaskFailure

one thing that pops out for me re the name on_finish_action... it kind of sounds like it's a more elaborate "callback" feature.

anyway, just a suggestion

@eladkal
Copy link
Contributor

eladkal commented Jun 2, 2023

Using on_finish_action as an argument for the super function won't work with the older version of the KPO. Additionally, providing is_delete_operator_pod will result in a deprecation warning being logged, even if the user provides on_finish_action for the GKE/EKS operator.

Is this a concern only for google provider?
currently google provider depended on

- name: cncf.kubernetes
dependencies:
- apache-airflow-providers-cncf-kubernetes>=6.2.0

Should we consider bumping the dependency apache-airflow-providers-cncf-kubernetes>7.0.0 ?

@hussein-awala
Copy link
Member Author

Should we consider bumping the dependency apache-airflow-providers-cncf-kubernetes>7.0.0 ?

@eladkal +1, bu we should add the same dependency to AWS provider, is that possible?

- aiobotocore[boto3]>=2.2.0
- name: cncf.kubernetes
dependencies:
- apache-airflow-providers-cncf-kubernetes>=8.0.0
Copy link
Contributor

@eladkal eladkal Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think 8.0.0 is right?
There is no such version for cncf.kubernetes

I believe it should be:
apache-airflow-providers-cncf-kubernetes>7.0.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think >=8.0.0 is fine, and we have to release those three packages together. There should be 8.0.0 release of cncf.kubernetes coming together with releasing of those two providers (the changes in cncf.kubernetes it brings are breaking, so that's expected to be bumped to 8.0.0.

Comment: The "additional" dependencies are pretty special because they are mostly "informatory". Almost no-one will install apache-airflow-providers-amazon[cncf.kubernetes]==8.3.0 - what is mainly the purpose of those is to be document the dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w8.. I'm missing something here.
In this PR we add a feature to apache-airflow-providers-cncf-kubernetes that feature is what needed for amazon and google provider so what we are after is setting dependency for the next version of apache-airflow-providers-cncf-kubernetes thus the condition should be:

apache-airflow-providers-cncf-kubernetes>=7.2.0

Currently stable version of cncf.kubernetes is 7.1.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR we add a feature to apache-airflow-providers-cncf-kubernetes that feature is what needed for amazon and google provider so what we are after is setting dependency for the next version of apache-airflow-providers-cncf-kubernetes thus the condition should be:

Ah ok. If that's not a breaking change, then yes. I have not looked in detail enough and badly classified upcoming cncf.kubernetes version as MAJOR upgrade.

This is really the result of our release process and why in the past I was leaving all the version upgrades to the release manager "documentation preparation" - because literally we do not know if the next version will be MINOR or MAJOR upgrade until we merge last change. I think it's next to impossible to change it unless we would have a much more active RM role upfront (RM would have to make versioning decisions upfront and for example hold breaking changes from being merged if the next release is non-breaking, but I feel that is quite impossible with our volunteer-driven organisation).

Ideally we should have in this case >= NEXT_CNCF_KUBERNETES_PROVIDER_VERSION that could be automatically changed when the release documentation is prepared - because this is what we have here, really. But it's yet another automation complexity and feels like a bit of an edge-case that can be handled manually.

And in edge-cases like that it's hard to have a process to follow for every single case like that, so it's case-by-case basis.

Side note. Assuming that we do not know if the next release is MINOR or MAJOR we could add >=MINOR_UPGRADE always. It's a bit limiting and slightly confusing for the users if we do MAJOR release instead, but technically it would work.

For example in this case we could agree to have >=7.2.0 here independently if the next version is MAJOR or MINOR. Even if we release 8.0.0, that would still technicaly work (8.0.0 >= 7.2.0), but there would be no 7.2.0 release at all., However (except confusing the users) there is also a side effect that it would not alow us (we don't do it anyway but techniclaly there is such a possiblity with mixed-governance model) to release 7.2.0 after the 8.0.0. Technically (and purely theoretically because we've never done so) we could have it and then the >=7.2.0 would be wrong if the features needed will not be cherry-picked to the v7 branch of the provider.

But I think it's better to get it right even if it requires a bit of discussion and "humans-in-the-loop" agreeing on details of such versioning question (as we do now).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note. Assuming that we do not know if the next release is MINOR or MAJOR we could add >=MINOR_UPGRADE always. It's a bit limiting and slightly confusing for the users if we do MAJOR release instead, but technically it would work.

I like this idea, I added 8.0.0 because I didn't know if there is any breaking changes on the other PRs, but yes 7.2.0 will work for both minor and major releases 👍

@potiuk
Copy link
Member

potiuk commented Jun 29, 2023

>=7.2.0 @hussein-awala ?

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
@eladkal eladkal changed the title Add a new arg for KPO to only delete the pod when it doesn't fail Add on_finish_action to KubernetesPodOperator Jun 30, 2023
@eladkal eladkal merged commit dd937e5 into apache:main Jun 30, 2023
@eladkal
Copy link
Contributor

eladkal commented Jun 30, 2023

Great work @hussein-awala !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants