fix: resume_glue_job_on_retry with xcom_push in GlueJobOperator#62560
cc @wilsonhooi86, I think you may be interested in this. :)
potiuk
left a comment
Looks good but I would love @wilsonhooi86 to confirm it works :)
Hi @henry3260, thanks for providing the fix. I'm testing in an MWAA environment. Usually I just run the command in the MWAA requirements file: Is there a guide for me to follow? Appreciate your support.
Hi @wilsonhooi86, can you pull my PR version and test your use case? I actually ran Airflow and tested it, and it looks good to me.
See https://docs.aws.amazon.com/mwaa/latest/userguide/working-dags-dependencies.html#configuring-dag-dependencies-upload. Alternatively, since this is a code change in only one file, you can also just copy the new operator code into your test DAG file, give it a new name such as MyGlueOperator, and then use it in your DAG to see if it solves the problem. Sometimes quick and dirty is a good choice :)
resume_glue_job_on_retry with xcom_push in GlueJobOperator
…apache#62560) * GlueJobOperator: Recover job run via task UUID when XCom is missing * add resume_glue_job_on_retry back
Why
When a Glue job run fails before xcom_push, the task retry cannot read glue_job_run_id. This causes a new Glue job run to be started on retry, leading to duplicate runs.
What
On retry with resume_glue_job_on_retry=True, we add a stable task UUID to the Glue Arguments and, if XCom is missing, search recent runs via get_job_runs to recover the existing run.
If the recovered run is still RUNNING/STARTING, we reuse it and push its job_run_id; otherwise we start a new run as usual.
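The recovery step described above could look roughly like the sketch below. This is not the code from this PR: the argument key `--airflow_task_uuid` and the helper name `find_resumable_run` are hypothetical, chosen only for illustration. The input is assumed to be the `JobRuns` list from a Glue `get_job_runs` response, where each entry carries `Id`, `JobRunState`, and `Arguments`.

```python
from typing import Optional

# Hypothetical argument key used to tag a run with the task's stable UUID.
# The key actually used by the operator is not shown in this description.
TASK_UUID_ARG = "--airflow_task_uuid"

# States in which an existing run is reused instead of starting a new one.
RESUMABLE_STATES = {"RUNNING", "STARTING"}


def find_resumable_run(job_runs: list, task_uuid: str) -> Optional[str]:
    """Return the Id of a run tagged with task_uuid that is still in progress.

    job_runs is assumed to be the "JobRuns" list of a get_job_runs response.
    Returns None when no matching run is RUNNING or STARTING, in which case
    the caller would start a new run as usual.
    """
    for run in job_runs:
        arguments = run.get("Arguments") or {}
        if arguments.get(TASK_UUID_ARG) != task_uuid:
            continue  # run belongs to a different task instance
        if run.get("JobRunState") in RESUMABLE_STATES:
            return run["Id"]
    return None


# Example data shaped like the "JobRuns" entries of a get_job_runs response.
recent_runs = [
    {"Id": "jr_1", "JobRunState": "FAILED", "Arguments": {TASK_UUID_ARG: "abc"}},
    {"Id": "jr_2", "JobRunState": "RUNNING", "Arguments": {TASK_UUID_ARG: "abc"}},
    {"Id": "jr_3", "JobRunState": "RUNNING", "Arguments": {TASK_UUID_ARG: "xyz"}},
]
recovered_run_id = find_resumable_run(recent_runs, "abc")
```

With the sample data, the helper skips the failed run and the run tagged for another task, so the retry would reuse `jr_2` and push it as the job run ID rather than starting a duplicate.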
closes: #62353
Was generative AI tooling used to co-author this PR?