fix: resume_glue_job_on_retry with xcom_push in GlueJobOperator#62560

Merged
potiuk merged 4 commits into apache:main from henry3260:fix-resume-glue-job-on-retry
Mar 11, 2026

@henry3260 henry3260 commented Feb 27, 2026

Contributor

Why

When a Glue job run fails before xcom_push, the task retry cannot read glue_job_run_id.
This causes a new Glue job run to be started on retry, leading to duplicate runs.

What

On retry with resume_glue_job_on_retry=True, we add a stable task UUID to Glue Arguments
and, if XCom is missing, search recent runs via get_job_runs to recover the existing run.
If the recovered run is still RUNNING/STARTING, we reuse it and push its job_run_id;
otherwise we start a new run as usual.
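
The recovery logic described above can be sketched roughly as follows. This is an illustrative sketch, not the code merged in this PR: the argument key `--airflow_task_uuid`, the helper names, and the way the UUID is derived are all assumptions made for the example. It operates on plain dicts shaped like the `JobRuns` entries returned by the boto3 Glue client's `get_job_runs` call (`Id`, `JobRunState`, `Arguments`), so it runs without AWS access.

```python
from __future__ import annotations

import uuid
from typing import Any

# Hypothetical argument key; the key actually used by the provider may differ.
TASK_UUID_ARG = "--airflow_task_uuid"
# Glue run states treated as still in flight, per the PR description.
RESUMABLE_STATES = {"STARTING", "RUNNING"}


def stable_task_uuid(dag_id: str, task_id: str, run_id: str, map_index: int = -1) -> str:
    """Derive a UUID that is identical across retries of one task instance.

    The try number is deliberately left out so every attempt gets the same value.
    (Assumed derivation, for illustration only.)
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{dag_id}/{task_id}/{run_id}/{map_index}"))


def find_resumable_run(job_runs: list[dict[str, Any]], task_uuid: str) -> str | None:
    """Return the Id of an in-flight run tagged with task_uuid, or None."""
    for run in job_runs:
        # Skip runs that were started by a different task instance.
        if run.get("Arguments", {}).get(TASK_UUID_ARG) != task_uuid:
            continue
        # Only reuse runs that are still starting or running.
        if run.get("JobRunState") in RESUMABLE_STATES:
            return run["Id"]
    return None


# Example: the XCom value is missing on retry, so we search recent runs instead.
task_uuid = stable_task_uuid("my_dag", "run_glue", "scheduled__2026-02-27")
recent_runs = [
    {"Id": "jr_1", "JobRunState": "FAILED", "Arguments": {TASK_UUID_ARG: task_uuid}},
    {"Id": "jr_2", "JobRunState": "RUNNING", "Arguments": {TASK_UUID_ARG: "other"}},
    {"Id": "jr_3", "JobRunState": "RUNNING", "Arguments": {TASK_UUID_ARG: task_uuid}},
]
resumed = find_resumable_run(recent_runs, task_uuid)
```

If `resumed` is `None`, the operator would start a new run as usual; otherwise it would reuse the returned run ID and push it to XCom. In a real environment the `recent_runs` list would presumably come from something like `glue_client.get_job_runs(JobName=...)["JobRuns"]`.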

closes: #62353

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

@boring-cyborg boring-cyborg Bot added area:providers provider:amazon AWS/Amazon - related issues labels Feb 27, 2026
@henry3260 henry3260 force-pushed the fix-resume-glue-job-on-retry branch 3 times, most recently from d5a522c to d714235 Compare February 28, 2026 02:15
@henry3260 henry3260 force-pushed the fix-resume-glue-job-on-retry branch from d714235 to 75180c8 Compare February 28, 2026 02:32
@henry3260

Contributor Author

cc @wilsonhooi86, I think you may be interested in this. :)

@potiuk potiuk left a comment

Member

Looks good but I would love @wilsonhooi86 to confirm it works :)

@wilsonhooi86

Hi @henry3260, thanks for providing the fix.
May I know how I can test this patch?

I'm testing in an MWAA environment. Usually I just add the command
pip install apache-airflow-providers-amazon==9.22.0rc2 to the MWAA requirements file, but I wasn't sure how to apply this patch to the pip-installed provider.

Is there a guide for me to follow? Appreciate your support.

@henry3260

Contributor Author

> Hi @henry3260, thanks for providing the fix. May I know how I can test this patch?
>
> I'm testing in an MWAA environment. Usually I just add the command pip install apache-airflow-providers-amazon==9.22.0rc2 to the MWAA requirements file, but I wasn't sure how to apply this patch to the pip-installed provider.
>
> Is there a guide for me to follow? Appreciate your support.

Hi @wilsonhooi86, can you pull my PR version and test your use case? I ran Airflow with it and tested it myself, and it looks good to me.

Copilot AI review requested due to automatic review settings March 9, 2026 15:16

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@eladkal

eladkal commented Mar 11, 2026

Contributor

> Hi @henry3260, thanks for providing the fix. May I know how I can test this patch?
>
> I'm testing in an MWAA environment. Usually I just add the command pip install apache-airflow-providers-amazon==9.22.0rc2 to the MWAA requirements file, but I wasn't sure how to apply this patch to the pip-installed provider.
>
> Is there a guide for me to follow? Appreciate your support.

See https://docs.aws.amazon.com/mwaa/latest/userguide/working-dags-dependencies.html#configuring-dag-dependencies-upload
You can specify your desired Python libraries in requirements.txt.

Alternatively, since this is only a code change in one file, you can also just copy the new operator code into your test DAG file, give it a new name such as MyGlueOperator, and then use it in your DAG to see if it solves the problem. Sometimes quick and dirty is a good choice :)

@eladkal eladkal requested a review from vincbeck March 11, 2026 08:24
@eladkal eladkal changed the title fix resume_glue_job_on_retry fix: resume_glue_job_on_retry with xcom_push in GlueJobOperator Mar 11, 2026
@potiuk potiuk merged commit 08aee44 into apache:main Mar 11, 2026
92 checks passed
dominikhei pushed a commit to dominikhei/airflow that referenced this pull request Mar 11, 2026
…apache#62560)

* GlueJobOperator: Recover job run via task UUID when XCom is missing

* add resume_glue_job_on_retry back
PascalEgn pushed a commit to PascalEgn/airflow that referenced this pull request Mar 12, 2026
…apache#62560)

* GlueJobOperator: Recover job run via task UUID when XCom is missing

* add resume_glue_job_on_retry back
Pyasma pushed a commit to Pyasma/airflow that referenced this pull request Mar 13, 2026
…apache#62560)

* GlueJobOperator: Recover job run via task UUID when XCom is missing

* add resume_glue_job_on_retry back

Development

Successfully merging this pull request may close these issues.

GlueJobOperator - resume_glue_job_on_retry doesn't seem to work on MWAA 2.11.0

6 participants