Description
When wait_for_completion=True in EmrCreateJobFlowOperator, the operator code does not actually wait for the cluster to complete successfully before returning a success state. Instead, a success state is returned as soon as the cluster starts running. This can result in the task succeeding even if the cluster is terminated with errors after it begins running.
I believe this is due to this line of code, which assigns the "WAIT_FOR_COMPLETION" WaitPolicy to the waiter. That policy corresponds to the "job_flow_waiting" waiter, which only waits for the cluster to start running before returning a success state.
If the user wants the task to wait until the cluster completes, WAIT_FOR_STEPS_COMPLETION, which corresponds to the "job_flow_terminated" waiter, must be used. The operator hard-codes the "job_flow_waiting" waiter, so the user cannot configure the wait policy.
I propose adding a wait_policy parameter to the operator that lets the user choose which wait policy to use.
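To make the proposal concrete, here is a minimal sketch of how a wait_policy value could select the waiter. It does not import Airflow: the WaitPolicy enum and the mapping below are stand-ins written for illustration, the enum's string values and the waiter name strings are assumptions that should be checked against the provider source, and resolve_waiter_name is a hypothetical helper, not an existing function.

```python
from __future__ import annotations

from enum import Enum


class WaitPolicy(str, Enum):
    """Stand-in for the provider's WaitPolicy enum (string values are assumed)."""

    WAIT_FOR_COMPLETION = "wait_for_completion"
    WAIT_FOR_STEPS_COMPLETION = "wait_for_steps_completion"


# Assumed mapping from wait policy to waiter name; verify against the provider.
WAITER_POLICY_NAME_MAPPING = {
    WaitPolicy.WAIT_FOR_COMPLETION: "job_flow_waiting",
    WaitPolicy.WAIT_FOR_STEPS_COMPLETION: "job_flow_terminated",
}


def resolve_waiter_name(wait_policy: WaitPolicy | str | None) -> str | None:
    """Hypothetical helper: pick the waiter for a user-supplied wait_policy.

    None means "do not wait"; a plain string is converted to a WaitPolicy
    first, so an unknown value raises ValueError instead of being ignored.
    """
    if wait_policy is None:
        return None
    return WAITER_POLICY_NAME_MAPPING[WaitPolicy(wait_policy)]


# The policy a user would pass to wait until the cluster has terminated.
terminated_waiter = resolve_waiter_name(WaitPolicy.WAIT_FOR_STEPS_COMPLETION)
# The behaviour that wait_for_completion=True currently hard-codes.
current_waiter = resolve_waiter_name("wait_for_completion")
```

One possible design choice would be to keep wait_for_completion=True working by treating it as WAIT_FOR_COMPLETION, so that existing DAGs behave as they do today and only users who pass wait_policy explicitly get the new behaviour.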
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct