Revert "Apply AIR201-style replacements" by VladaZakharova · Pull Request #66712 · apache/airflow

VladaZakharova · 2026-05-11T13:13:20Z

This PR is being reverted because task.output["key"] is not always equivalent to {{ task_instance.xcom_pull("task_id")["key"] }}.

The old form pulls the task’s default return_value XCom and indexes the returned dictionary. The new form may instead look for an XCom stored under the separate key key. For operators that return a dict but do not push each field as an individual XCom key.

This reverts commit 1f84a56.

VladaZakharova · 2026-05-11T13:14:27Z

hi @shahar1 !
Follow up the comment from the release chat: this PR should be reverted since it breaks system tests run.
Can you please check? thanks!

shahar1 · 2026-05-11T14:32:35Z

hi @shahar1 !
Follow up the comment from the release chat: this PR should be reverted since it breaks system tests run.
Can you please check? thanks!

Thanks for reporting Vlada!
I'll try to review it a bit later today.

Until then:
@Dev-iL - I'll appreciate your feedback here what went it wrong, and how we could prevent it in the next iteration.

Dev-iL · 2026-05-11T17:23:15Z

@Dev-iL - I'll appreciate your feedback here what went it wrong, and how we could prevent it in the next iteration.

The answer seems quite obvious - either we had no tests for this, or the right tests didnt run.

Would it be possible to revert only the changes that actually break something and not everything?

VladaZakharova · 2026-05-11T18:48:06Z

@Dev-iL - I'll appreciate your feedback here what went it wrong, and how we could prevent it in the next iteration.

The answer seems quite obvious - either we had no tests for this, or the right tests didnt run.

Would it be possible to revert only the changes that actually break something and not everything?

I don't understand what do you mean, all these changes break google system tests, that's why we need to revert everything

Dev-iL · 2026-05-11T19:06:24Z

I don't understand what do you mean

Looking at the PR you wish to revert shows it did get tagged with the google provider. I know that certain tags cause additional tests to be triggered on CI, so I find it surprising that the google system tests didn't run despite that. If we revert the PR without updating the CI to really trigger these tests when the provider changes - it's a matter of time until something like this happens again.

all these changes break google system tests, that's why we need to revert everything

In addition to google, you wish to revert changes to the amazon provider and to multiple examples. Do the google system tests depend on all of those?

shahar1 · 2026-05-11T20:04:44Z

@Dev-iL - I'll appreciate your feedback here what went it wrong, and how we could prevent it in the next iteration.

The answer seems quite obvious - either we had no tests for this, or the right tests didnt run.

Would it be possible to revert only the changes that actually break something and not everything?

I checked, and there's indeed an issue in the rewrite: task.output["key"] isn't always equivalent to {{ xcom_pull('task')['key'] }}.

See for example the following snippet:

@task
def make_dict():
    return {"key": "value"} # Pushes one XCom row:  key="return_value", value={"key": "value"}

t = make_dict()

In the old form, Jinja is rendered at runtime:

"{{ ti.xcom_pull('make_dict')['key'] }}"
# 1. ti.xcom_pull('make_dict')                      -> reads key="return_value" → {"key": "value"}
# 2. ['key']                                        -> Python dict-index             → "value"

However, in the new form, XComArg is resolved at runtime:

t.output["key"]

# 1. t.output                                      -> PlainXComArg(op=make_dict, key="return_value")
# 2. ["key"]                                       -> PlainXComArg(op=make_dict, key="key")   <- key swapped, NOT indexed
# 3. resolve()                                     -> ti.xcom_pull(task_ids="make_dict", key="key")
# 4. no row with key="key" exists    -> None

See following for reference:

airflow/task-sdk/src/airflow/sdk/definitions/xcom_arg.py

Line 220 in 15d297a

return PlainXComArg(operator=self.operator, key=item)

The reasons that the system tests didn't fail is that we currently don't run them in the CI, as they require integration with the Cloud providers (specifically in GCP we don't have it). For that reason, GCP system tests currently run on Vlada's team's instance. The monitoring for these tests is publicly available:
https://storage.googleapis.com/providers-dashboard-html/dashboard.html
However, we currently don't have a mechanism to detect issues in PRs except for running the system tests locally (which wasn't done during merging - I should have checked it when reviewing last PR).

For the reasons above, I'll revert the changes as suggested (including in all providers) - if you manage to solve the issue described above, including unit tests and working example dags (could be simplified versions, no need for GCP logic), I'll happily revert the revert (only someone has to test at least one Dag).

Dev-iL · 2026-05-12T07:23:19Z

I learned from this that:

the replacement is only safe when a dict-returning task specifies multiple_outputs = True.
we need to have some guardrails to help deal with the somewhat unintuitive multiple_outputs autodetection logic.

VladaZakharova · 2026-05-12T08:30:04Z

@shahar1
Yes, you are completely right, thank you for good explanation!

@task

…02`) (#25152) > [!IMPORTANT] > Disclosure: this PR description was AI-drafted under my direction and reviewed by me before posting. ## Summary Implements rule `AIR202` (`airflow-task-implicit-multiple-outputs`) that flags `@task`-decorated functions whose `multiple_outputs` behavior is determined by Airflow's runtime inference rather than being set explicitly on the decorator. At runtime, Airflow's [`_infer_multiple_outputs`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/decorator.py) inspects the function's return annotation: if it resolves to a subclass of `collections.abc.Mapping`, the return value is split into one `XCom` per key; otherwise the entire return value is stored as a single `XCom`. This couples typing to `XCom` layout in a non-obvious way — renaming, removing, or refining the return annotation silently changes a DAG's `XCom` behavior. Passing `multiple_outputs=` explicitly makes the author's intent clear and insulates the DAG from future changes to inference. ### What the rule flags ```python from airflow.sdk import task # or equivalent Airflow 2.x import path # AIR202 — annotation in the Mapping family @task def extract() -> dict: return {"x": 1, "y": 2} # AIR202 — body returns a dict literal @task() def transform(): return {"a": 1} # AIR202 — call-form decorator with unrelated kwargs @task(retries=3) def with_retries() -> dict[str, int]: return {"n": 1} # AIR202 — variant decorators with parens @task.short_circuit(trigger_rule="all_done") def gate() -> dict: return {"go": True} ``` ### Suggested fix The fix is **always available** but marked **unsafe**: it inserts the value Airflow would have inferred, preserving runtime behavior at the moment of the rewrite, but the author may have intended a different `XCom` layout and a multi-return function may not always return a dict. ```python @task(multiple_outputs=True) def extract() -> dict: return {"x": 1, "y": 2} @task(multiple_outputs=True) def transform(): return {"a": 1} @task(retries=3, multiple_outputs=True) def with_retries() -> dict[str, int]: return {"n": 1} ``` For the call-form path (`@task(...)` with existing parentheses), the fix uses `add_argument` so existing kwargs and formatting are preserved, including on multiline decorators. For the bare form (`@task`, `@task.branch`), the fix appends `(multiple_outputs=...)` after the decorator expression. ### Detection signals The rule fires when **either** of two independent signals holds and `multiple_outputs` is absent from the decorator: 1. **Annotation signal** — the return annotation resolves to a member of the Mapping family: `dict`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` (and the `collections` / `typing` variants thereof). 2. **Body signal** — the function body contains at least one `return` statement whose value is a dict literal, a dict comprehension, or a `dict(...)` call. `ReturnStatementVisitor` is used so returns inside nested `if` / `for` / `while` / `try` blocks are considered, while returns inside nested function definitions or class bodies are correctly **not** considered. ### What it does NOT flag - Functions where `multiple_outputs=` is already set explicitly (either `True` or `False`). - `@task.sensor`-decorated functions — the sensor decorator hardcodes `multiple_outputs=False`. - Functions whose only dict-returning code lives in a nested function or class method (the outer task is not affected). - Functions whose return annotation is not in the Mapping family and whose body does not return a dict-shaped value. ### Implementation details - Resolves `@task` / `@task.<variant>` via the semantic model, handling both bare (`@task`) and call (`@task(...)`) forms via `map_callable`. The supported variants list is local to the rule: `python`, `virtualenv`, `external_python`, `branch`, `branch_virtualenv`, `branch_external_python`, `short_circuit`, `docker`, `kubernetes`, `pyspark`. - Annotation matching walks subscripts (e.g., `dict[str, int]`, `Mapping[str, Any]`) down to the head and uses `match_typing_expr` / known-module resolution. - Body matching uses `ReturnStatementVisitor` from `ruff_python_ast::helpers`, which already skips nested function and class definitions — so nested-scope dict returns are correctly ignored without a custom traversal. - The inserted value (`True` / `False`) mirrors Airflow's inference: `True` when the annotation is in the Mapping family, `False` otherwise. When both signals fire but the annotation is non-Mapping (e.g., body returns a dict but the annotation is `list`), the inserted value is still `False` to preserve runtime behavior. ## Test Plan Added snapshot tests in `AIR202.py` covering: | Group | Cases | |-------|-------| | Annotation positives | `dict`, `dict[str, int]`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` | | Body positives | dict literal, dict comprehension, `dict(...)` call | | Decorator forms | bare `@task`, `@task()`, `@task(retries=3)`, multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`, `@task.short_circuit(trigger_rule=...)`, `@task.branch` | | Explicit-kwarg negatives | `multiple_outputs=True`, `multiple_outputs=False` — never flagged | | Sensor negative | `@task.sensor` returning a dict — never flagged | | Nested-scope negatives | outer task with inner function returning a dict; outer task with inner class whose method returns a dict — never flagged | A sweep against the local Apache Airflow checkout (`cargo run -p ruff -- check ~/airflow --no-cache --preview --select AIR202`) produced 19 diagnostics on the Airflow codebase itself, all of which were verified to be genuine cases of implicit `multiple_outputs` inference (no false positives). Related: - apache/airflow#43176 - apache/airflow#66712 Community approval: https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

@task

…02`) (astral-sh#25152) > [!IMPORTANT] > Disclosure: this PR description was AI-drafted under my direction and reviewed by me before posting. ## Summary Implements rule `AIR202` (`airflow-task-implicit-multiple-outputs`) that flags `@task`-decorated functions whose `multiple_outputs` behavior is determined by Airflow's runtime inference rather than being set explicitly on the decorator. At runtime, Airflow's [`_infer_multiple_outputs`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/decorator.py) inspects the function's return annotation: if it resolves to a subclass of `collections.abc.Mapping`, the return value is split into one `XCom` per key; otherwise the entire return value is stored as a single `XCom`. This couples typing to `XCom` layout in a non-obvious way — renaming, removing, or refining the return annotation silently changes a DAG's `XCom` behavior. Passing `multiple_outputs=` explicitly makes the author's intent clear and insulates the DAG from future changes to inference. ### What the rule flags ```python from airflow.sdk import task # or equivalent Airflow 2.x import path # AIR202 — annotation in the Mapping family @task def extract() -> dict: return {"x": 1, "y": 2} # AIR202 — body returns a dict literal @task() def transform(): return {"a": 1} # AIR202 — call-form decorator with unrelated kwargs @task(retries=3) def with_retries() -> dict[str, int]: return {"n": 1} # AIR202 — variant decorators with parens @task.short_circuit(trigger_rule="all_done") def gate() -> dict: return {"go": True} ``` ### Suggested fix The fix is **always available** but marked **unsafe**: it inserts the value Airflow would have inferred, preserving runtime behavior at the moment of the rewrite, but the author may have intended a different `XCom` layout and a multi-return function may not always return a dict. ```python @task(multiple_outputs=True) def extract() -> dict: return {"x": 1, "y": 2} @task(multiple_outputs=True) def transform(): return {"a": 1} @task(retries=3, multiple_outputs=True) def with_retries() -> dict[str, int]: return {"n": 1} ``` For the call-form path (`@task(...)` with existing parentheses), the fix uses `add_argument` so existing kwargs and formatting are preserved, including on multiline decorators. For the bare form (`@task`, `@task.branch`), the fix appends `(multiple_outputs=...)` after the decorator expression. ### Detection signals The rule fires when **either** of two independent signals holds and `multiple_outputs` is absent from the decorator: 1. **Annotation signal** — the return annotation resolves to a member of the Mapping family: `dict`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` (and the `collections` / `typing` variants thereof). 2. **Body signal** — the function body contains at least one `return` statement whose value is a dict literal, a dict comprehension, or a `dict(...)` call. `ReturnStatementVisitor` is used so returns inside nested `if` / `for` / `while` / `try` blocks are considered, while returns inside nested function definitions or class bodies are correctly **not** considered. ### What it does NOT flag - Functions where `multiple_outputs=` is already set explicitly (either `True` or `False`). - `@task.sensor`-decorated functions — the sensor decorator hardcodes `multiple_outputs=False`. - Functions whose only dict-returning code lives in a nested function or class method (the outer task is not affected). - Functions whose return annotation is not in the Mapping family and whose body does not return a dict-shaped value. ### Implementation details - Resolves `@task` / `@task.<variant>` via the semantic model, handling both bare (`@task`) and call (`@task(...)`) forms via `map_callable`. The supported variants list is local to the rule: `python`, `virtualenv`, `external_python`, `branch`, `branch_virtualenv`, `branch_external_python`, `short_circuit`, `docker`, `kubernetes`, `pyspark`. - Annotation matching walks subscripts (e.g., `dict[str, int]`, `Mapping[str, Any]`) down to the head and uses `match_typing_expr` / known-module resolution. - Body matching uses `ReturnStatementVisitor` from `ruff_python_ast::helpers`, which already skips nested function and class definitions — so nested-scope dict returns are correctly ignored without a custom traversal. - The inserted value (`True` / `False`) mirrors Airflow's inference: `True` when the annotation is in the Mapping family, `False` otherwise. When both signals fire but the annotation is non-Mapping (e.g., body returns a dict but the annotation is `list`), the inserted value is still `False` to preserve runtime behavior. ## Test Plan Added snapshot tests in `AIR202.py` covering: | Group | Cases | |-------|-------| | Annotation positives | `dict`, `dict[str, int]`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` | | Body positives | dict literal, dict comprehension, `dict(...)` call | | Decorator forms | bare `@task`, `@task()`, `@task(retries=3)`, multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`, `@task.short_circuit(trigger_rule=...)`, `@task.branch` | | Explicit-kwarg negatives | `multiple_outputs=True`, `multiple_outputs=False` — never flagged | | Sensor negative | `@task.sensor` returning a dict — never flagged | | Nested-scope negatives | outer task with inner function returning a dict; outer task with inner class whose method returns a dict — never flagged | A sweep against the local Apache Airflow checkout (`cargo run -p ruff -- check ~/airflow --no-cache --preview --select AIR202`) produced 19 diagnostics on the Airflow codebase itself, all of which were verified to be genuine cases of implicit `multiple_outputs` inference (no false positives). Related: - apache/airflow#43176 - apache/airflow#66712 Community approval: https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

@task

…02`) (astral-sh#25152) > [!IMPORTANT] > Disclosure: this PR description was AI-drafted under my direction and reviewed by me before posting. ## Summary Implements rule `AIR202` (`airflow-task-implicit-multiple-outputs`) that flags `@task`-decorated functions whose `multiple_outputs` behavior is determined by Airflow's runtime inference rather than being set explicitly on the decorator. At runtime, Airflow's [`_infer_multiple_outputs`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/decorator.py) inspects the function's return annotation: if it resolves to a subclass of `collections.abc.Mapping`, the return value is split into one `XCom` per key; otherwise the entire return value is stored as a single `XCom`. This couples typing to `XCom` layout in a non-obvious way — renaming, removing, or refining the return annotation silently changes a DAG's `XCom` behavior. Passing `multiple_outputs=` explicitly makes the author's intent clear and insulates the DAG from future changes to inference. ### What the rule flags ```python from airflow.sdk import task # or equivalent Airflow 2.x import path # AIR202 — annotation in the Mapping family @task def extract() -> dict: return {"x": 1, "y": 2} # AIR202 — body returns a dict literal @task() def transform(): return {"a": 1} # AIR202 — call-form decorator with unrelated kwargs @task(retries=3) def with_retries() -> dict[str, int]: return {"n": 1} # AIR202 — variant decorators with parens @task.short_circuit(trigger_rule="all_done") def gate() -> dict: return {"go": True} ``` ### Suggested fix The fix is **always available** but marked **unsafe**: it inserts the value Airflow would have inferred, preserving runtime behavior at the moment of the rewrite, but the author may have intended a different `XCom` layout and a multi-return function may not always return a dict. ```python @task(multiple_outputs=True) def extract() -> dict: return {"x": 1, "y": 2} @task(multiple_outputs=True) def transform(): return {"a": 1} @task(retries=3, multiple_outputs=True) def with_retries() -> dict[str, int]: return {"n": 1} ``` For the call-form path (`@task(...)` with existing parentheses), the fix uses `add_argument` so existing kwargs and formatting are preserved, including on multiline decorators. For the bare form (`@task`, `@task.branch`), the fix appends `(multiple_outputs=...)` after the decorator expression. ### Detection signals The rule fires when **either** of two independent signals holds and `multiple_outputs` is absent from the decorator: 1. **Annotation signal** — the return annotation resolves to a member of the Mapping family: `dict`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` (and the `collections` / `typing` variants thereof). 2. **Body signal** — the function body contains at least one `return` statement whose value is a dict literal, a dict comprehension, or a `dict(...)` call. `ReturnStatementVisitor` is used so returns inside nested `if` / `for` / `while` / `try` blocks are considered, while returns inside nested function definitions or class bodies are correctly **not** considered. ### What it does NOT flag - Functions where `multiple_outputs=` is already set explicitly (either `True` or `False`). - `@task.sensor`-decorated functions — the sensor decorator hardcodes `multiple_outputs=False`. - Functions whose only dict-returning code lives in a nested function or class method (the outer task is not affected). - Functions whose return annotation is not in the Mapping family and whose body does not return a dict-shaped value. ### Implementation details - Resolves `@task` / `@task.<variant>` via the semantic model, handling both bare (`@task`) and call (`@task(...)`) forms via `map_callable`. The supported variants list is local to the rule: `python`, `virtualenv`, `external_python`, `branch`, `branch_virtualenv`, `branch_external_python`, `short_circuit`, `docker`, `kubernetes`, `pyspark`. - Annotation matching walks subscripts (e.g., `dict[str, int]`, `Mapping[str, Any]`) down to the head and uses `match_typing_expr` / known-module resolution. - Body matching uses `ReturnStatementVisitor` from `ruff_python_ast::helpers`, which already skips nested function and class definitions — so nested-scope dict returns are correctly ignored without a custom traversal. - The inserted value (`True` / `False`) mirrors Airflow's inference: `True` when the annotation is in the Mapping family, `False` otherwise. When both signals fire but the annotation is non-Mapping (e.g., body returns a dict but the annotation is `list`), the inserted value is still `False` to preserve runtime behavior. ## Test Plan Added snapshot tests in `AIR202.py` covering: | Group | Cases | |-------|-------| | Annotation positives | `dict`, `dict[str, int]`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` | | Body positives | dict literal, dict comprehension, `dict(...)` call | | Decorator forms | bare `@task`, `@task()`, `@task(retries=3)`, multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`, `@task.short_circuit(trigger_rule=...)`, `@task.branch` | | Explicit-kwarg negatives | `multiple_outputs=True`, `multiple_outputs=False` — never flagged | | Sensor negative | `@task.sensor` returning a dict — never flagged | | Nested-scope negatives | outer task with inner function returning a dict; outer task with inner class whose method returns a dict — never flagged | A sweep against the local Apache Airflow checkout (`cargo run -p ruff -- check ~/airflow --no-cache --preview --select AIR202`) produced 19 diagnostics on the Airflow codebase itself, all of which were verified to be genuine cases of implicit `multiple_outputs` inference (no false positives). Related: - apache/airflow#43176 - apache/airflow#66712 Community approval: https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Revert "Apply AIR201-style replacements (apache#65197)"

9a35bf1

This reverts commit 1f84a56.

VladaZakharova requested review from o-nikolas and shahar1 as code owners May 11, 2026 13:13

boring-cyborg Bot added area:providers provider:amazon AWS/Amazon - related issues provider:apache-beam provider:google Google (including GCP) related issues labels May 11, 2026

shahar1 approved these changes May 11, 2026

View reviewed changes

shahar1 merged commit 7ce44e9 into apache:main May 11, 2026
94 checks passed

Dev-iL mentioned this pull request May 14, 2026

[airflow] Implement airflow-task-implicit-multiple-outputs (AIR202) astral-sh/ruff#25152

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Revert "Apply AIR201-style replacements"#66712

Revert "Apply AIR201-style replacements"#66712
shahar1 merged 1 commit into
apache:mainfrom
VladaZakharova:revert-65197-2604/AIR201

VladaZakharova commented May 11, 2026

Uh oh!

VladaZakharova commented May 11, 2026

Uh oh!

shahar1 commented May 11, 2026

Uh oh!

Dev-iL commented May 11, 2026

Uh oh!

VladaZakharova commented May 11, 2026

Uh oh!

Dev-iL commented May 11, 2026

Uh oh!

shahar1 commented May 11, 2026

Uh oh!

Uh oh!

Dev-iL commented May 12, 2026

Uh oh!

VladaZakharova commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

VladaZakharova commented May 11, 2026

Uh oh!

VladaZakharova commented May 11, 2026

Uh oh!

shahar1 commented May 11, 2026

Uh oh!

Dev-iL commented May 11, 2026

Uh oh!

VladaZakharova commented May 11, 2026

Uh oh!

Dev-iL commented May 11, 2026

Uh oh!

shahar1 commented May 11, 2026

Uh oh!

Uh oh!

Dev-iL commented May 12, 2026

Uh oh!

VladaZakharova commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants