Skip to content

Revert "Apply AIR201-style replacements"#66712

Merged
shahar1 merged 1 commit into
apache:mainfrom
VladaZakharova:revert-65197-2604/AIR201
May 11, 2026
Merged

Revert "Apply AIR201-style replacements"#66712
shahar1 merged 1 commit into
apache:mainfrom
VladaZakharova:revert-65197-2604/AIR201

Conversation

@VladaZakharova

Copy link
Copy Markdown
Contributor

Reverts #65197

This PR is being reverted because task.output["key"] is not always equivalent to {{ task_instance.xcom_pull("task_id")["key"] }}.

The old form pulls the task’s default return_value XCom and indexes the returned dictionary. The new form may instead look for an XCom stored under the separate key key. For operators that return a dict but do not push each field as an individual XCom key.

@boring-cyborg boring-cyborg Bot added area:providers provider:amazon AWS/Amazon - related issues provider:apache-beam provider:google Google (including GCP) related issues labels May 11, 2026
@VladaZakharova

Copy link
Copy Markdown
Contributor Author

hi @shahar1 !
Follow up the comment from the release chat: this PR should be reverted since it breaks system tests run.
Can you please check? thanks!

@shahar1

shahar1 commented May 11, 2026

Copy link
Copy Markdown
Contributor

hi @shahar1 !
Follow up the comment from the release chat: this PR should be reverted since it breaks system tests run.
Can you please check? thanks!

Thanks for reporting Vlada!
I'll try to review it a bit later today.

Until then:
@Dev-iL - I'll appreciate your feedback here what went it wrong, and how we could prevent it in the next iteration.

@Dev-iL

Dev-iL commented May 11, 2026

Copy link
Copy Markdown
Collaborator

@Dev-iL - I'll appreciate your feedback here what went it wrong, and how we could prevent it in the next iteration.

The answer seems quite obvious - either we had no tests for this, or the right tests didnt run.

Would it be possible to revert only the changes that actually break something and not everything?

@VladaZakharova

Copy link
Copy Markdown
Contributor Author

@Dev-iL - I'll appreciate your feedback here what went it wrong, and how we could prevent it in the next iteration.

The answer seems quite obvious - either we had no tests for this, or the right tests didnt run.

Would it be possible to revert only the changes that actually break something and not everything?

I don't understand what do you mean, all these changes break google system tests, that's why we need to revert everything

@Dev-iL

Dev-iL commented May 11, 2026

Copy link
Copy Markdown
Collaborator

I don't understand what do you mean

Looking at the PR you wish to revert shows it did get tagged with the google provider. I know that certain tags cause additional tests to be triggered on CI, so I find it surprising that the google system tests didn't run despite that. If we revert the PR without updating the CI to really trigger these tests when the provider changes - it's a matter of time until something like this happens again.

all these changes break google system tests, that's why we need to revert everything

In addition to google, you wish to revert changes to the amazon provider and to multiple examples. Do the google system tests depend on all of those?

@shahar1

shahar1 commented May 11, 2026

Copy link
Copy Markdown
Contributor

@Dev-iL - I'll appreciate your feedback here what went it wrong, and how we could prevent it in the next iteration.

The answer seems quite obvious - either we had no tests for this, or the right tests didnt run.

Would it be possible to revert only the changes that actually break something and not everything?

I checked, and there's indeed an issue in the rewrite: task.output["key"] isn't always equivalent to {{ xcom_pull('task')['key'] }}.

See for example the following snippet:

@task
def make_dict():
    return {"key": "value"} # Pushes one XCom row:  key="return_value", value={"key": "value"}

t = make_dict()

In the old form, Jinja is rendered at runtime:

"{{ ti.xcom_pull('make_dict')['key'] }}"
# 1. ti.xcom_pull('make_dict')                      -> reads key="return_value" → {"key": "value"}
# 2. ['key']                                        -> Python dict-index             → "value" 

However, in the new form, XComArg is resolved at runtime:

t.output["key"]

# 1. t.output                                      -> PlainXComArg(op=make_dict, key="return_value")
# 2. ["key"]                                       -> PlainXComArg(op=make_dict, key="key")   <- key swapped, NOT indexed
# 3. resolve()                                     -> ti.xcom_pull(task_ids="make_dict", key="key")
# 4. no row with key="key" exists    -> None  

See following for reference:

return PlainXComArg(operator=self.operator, key=item)

The reasons that the system tests didn't fail is that we currently don't run them in the CI, as they require integration with the Cloud providers (specifically in GCP we don't have it). For that reason, GCP system tests currently run on Vlada's team's instance. The monitoring for these tests is publicly available:
https://storage.googleapis.com/providers-dashboard-html/dashboard.html
However, we currently don't have a mechanism to detect issues in PRs except for running the system tests locally (which wasn't done during merging - I should have checked it when reviewing last PR).

For the reasons above, I'll revert the changes as suggested (including in all providers) - if you manage to solve the issue described above, including unit tests and working example dags (could be simplified versions, no need for GCP logic), I'll happily revert the revert (only someone has to test at least one Dag).

@shahar1 shahar1 merged commit 7ce44e9 into apache:main May 11, 2026
94 checks passed
@Dev-iL

Dev-iL commented May 12, 2026

Copy link
Copy Markdown
Collaborator

I learned from this that:

  1. the replacement is only safe when a dict-returning task specifies multiple_outputs = True.
  2. we need to have some guardrails to help deal with the somewhat unintuitive multiple_outputs autodetection logic.

@VladaZakharova

Copy link
Copy Markdown
Contributor Author

@shahar1
Yes, you are completely right, thank you for good explanation!

ntBre pushed a commit to astral-sh/ruff that referenced this pull request May 19, 2026
…02`) (#25152)

> [!IMPORTANT]
> Disclosure: this PR description was AI-drafted under my direction and
reviewed by me before posting.

## Summary

Implements rule `AIR202` (`airflow-task-implicit-multiple-outputs`) that
flags `@task`-decorated functions whose `multiple_outputs` behavior is
determined by Airflow's runtime inference rather than being set
explicitly on the decorator.

At runtime, Airflow's
[`_infer_multiple_outputs`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/decorator.py)
inspects the function's return annotation: if it resolves to a subclass
of `collections.abc.Mapping`, the return value is split into one `XCom`
per key; otherwise the entire return value is stored as a single `XCom`.
This couples typing to `XCom` layout in a non-obvious way — renaming,
removing, or refining the return annotation silently changes a DAG's
`XCom` behavior. Passing `multiple_outputs=` explicitly makes the
author's intent clear and insulates the DAG from future changes to
inference.

### What the rule flags

```python
from airflow.sdk import task  # or equivalent Airflow 2.x import path


# AIR202 — annotation in the Mapping family
@task
def extract() -> dict:
    return {"x": 1, "y": 2}


# AIR202 — body returns a dict literal
@task()
def transform():
    return {"a": 1}


# AIR202 — call-form decorator with unrelated kwargs
@task(retries=3)
def with_retries() -> dict[str, int]:
    return {"n": 1}


# AIR202 — variant decorators with parens
@task.short_circuit(trigger_rule="all_done")
def gate() -> dict:
    return {"go": True}
```

### Suggested fix

The fix is **always available** but marked **unsafe**: it inserts the
value Airflow would have inferred, preserving runtime behavior at the
moment of the rewrite, but the author may have intended a different
`XCom` layout and a multi-return function may not always return a dict.

```python
@task(multiple_outputs=True)
def extract() -> dict:
    return {"x": 1, "y": 2}


@task(multiple_outputs=True)
def transform():
    return {"a": 1}


@task(retries=3, multiple_outputs=True)
def with_retries() -> dict[str, int]:
    return {"n": 1}
```

For the call-form path (`@task(...)` with existing parentheses), the fix
uses `add_argument` so existing kwargs and formatting are preserved,
including on multiline decorators. For the bare form (`@task`,
`@task.branch`), the fix appends `(multiple_outputs=...)` after the
decorator expression.

### Detection signals

The rule fires when **either** of two independent signals holds and
`multiple_outputs` is absent from the decorator:

1. **Annotation signal** — the return annotation resolves to a member of
the Mapping family: `dict`, `Dict`, `Mapping`, `MutableMapping`,
`OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` (and
the `collections` / `typing` variants thereof).
2. **Body signal** — the function body contains at least one `return`
statement whose value is a dict literal, a dict comprehension, or a
`dict(...)` call. `ReturnStatementVisitor` is used so returns inside
nested `if` / `for` / `while` / `try` blocks are considered, while
returns inside nested function definitions or class bodies are correctly
**not** considered.

### What it does NOT flag

- Functions where `multiple_outputs=` is already set explicitly (either
`True` or `False`).
- `@task.sensor`-decorated functions — the sensor decorator hardcodes
`multiple_outputs=False`.
- Functions whose only dict-returning code lives in a nested function or
class method (the outer task is not affected).
- Functions whose return annotation is not in the Mapping family and
whose body does not return a dict-shaped value.

### Implementation details

- Resolves `@task` / `@task.<variant>` via the semantic model, handling
both bare (`@task`) and call (`@task(...)`) forms via `map_callable`.
The supported variants list is local to the rule: `python`,
`virtualenv`, `external_python`, `branch`, `branch_virtualenv`,
`branch_external_python`, `short_circuit`, `docker`, `kubernetes`,
`pyspark`.
- Annotation matching walks subscripts (e.g., `dict[str, int]`,
`Mapping[str, Any]`) down to the head and uses `match_typing_expr` /
known-module resolution.
- Body matching uses `ReturnStatementVisitor` from
`ruff_python_ast::helpers`, which already skips nested function and
class definitions — so nested-scope dict returns are correctly ignored
without a custom traversal.
- The inserted value (`True` / `False`) mirrors Airflow's inference:
`True` when the annotation is in the Mapping family, `False` otherwise.
When both signals fire but the annotation is non-Mapping (e.g., body
returns a dict but the annotation is `list`), the inserted value is
still `False` to preserve runtime behavior.

## Test Plan

Added snapshot tests in `AIR202.py` covering:

| Group | Cases |
|-------|-------|
| Annotation positives | `dict`, `dict[str, int]`, `Dict`, `Mapping`,
`MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`,
`TypedDict` |
| Body positives | dict literal, dict comprehension, `dict(...)` call |
| Decorator forms | bare `@task`, `@task()`, `@task(retries=3)`,
multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`,
`@task.short_circuit(trigger_rule=...)`, `@task.branch` |
| Explicit-kwarg negatives | `multiple_outputs=True`,
`multiple_outputs=False` — never flagged |
| Sensor negative | `@task.sensor` returning a dict — never flagged |
| Nested-scope negatives | outer task with inner function returning a
dict; outer task with inner class whose method returns a dict — never
flagged |

A sweep against the local Apache Airflow checkout (`cargo run -p ruff --
check ~/airflow --no-cache --preview --select AIR202`) produced 19
diagnostics on the Airflow codebase itself, all of which were verified
to be genuine cases of implicit `multiple_outputs` inference (no false
positives).

Related: 
- apache/airflow#43176
- apache/airflow#66712

Community approval:
https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
thejchap pushed a commit to thejchap/ruff that referenced this pull request May 23, 2026
…02`) (astral-sh#25152)

> [!IMPORTANT]
> Disclosure: this PR description was AI-drafted under my direction and
reviewed by me before posting.

## Summary

Implements rule `AIR202` (`airflow-task-implicit-multiple-outputs`) that
flags `@task`-decorated functions whose `multiple_outputs` behavior is
determined by Airflow's runtime inference rather than being set
explicitly on the decorator.

At runtime, Airflow's
[`_infer_multiple_outputs`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/decorator.py)
inspects the function's return annotation: if it resolves to a subclass
of `collections.abc.Mapping`, the return value is split into one `XCom`
per key; otherwise the entire return value is stored as a single `XCom`.
This couples typing to `XCom` layout in a non-obvious way — renaming,
removing, or refining the return annotation silently changes a DAG's
`XCom` behavior. Passing `multiple_outputs=` explicitly makes the
author's intent clear and insulates the DAG from future changes to
inference.

### What the rule flags

```python
from airflow.sdk import task  # or equivalent Airflow 2.x import path


# AIR202 — annotation in the Mapping family
@task
def extract() -> dict:
    return {"x": 1, "y": 2}


# AIR202 — body returns a dict literal
@task()
def transform():
    return {"a": 1}


# AIR202 — call-form decorator with unrelated kwargs
@task(retries=3)
def with_retries() -> dict[str, int]:
    return {"n": 1}


# AIR202 — variant decorators with parens
@task.short_circuit(trigger_rule="all_done")
def gate() -> dict:
    return {"go": True}
```

### Suggested fix

The fix is **always available** but marked **unsafe**: it inserts the
value Airflow would have inferred, preserving runtime behavior at the
moment of the rewrite, but the author may have intended a different
`XCom` layout and a multi-return function may not always return a dict.

```python
@task(multiple_outputs=True)
def extract() -> dict:
    return {"x": 1, "y": 2}


@task(multiple_outputs=True)
def transform():
    return {"a": 1}


@task(retries=3, multiple_outputs=True)
def with_retries() -> dict[str, int]:
    return {"n": 1}
```

For the call-form path (`@task(...)` with existing parentheses), the fix
uses `add_argument` so existing kwargs and formatting are preserved,
including on multiline decorators. For the bare form (`@task`,
`@task.branch`), the fix appends `(multiple_outputs=...)` after the
decorator expression.

### Detection signals

The rule fires when **either** of two independent signals holds and
`multiple_outputs` is absent from the decorator:

1. **Annotation signal** — the return annotation resolves to a member of
the Mapping family: `dict`, `Dict`, `Mapping`, `MutableMapping`,
`OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` (and
the `collections` / `typing` variants thereof).
2. **Body signal** — the function body contains at least one `return`
statement whose value is a dict literal, a dict comprehension, or a
`dict(...)` call. `ReturnStatementVisitor` is used so returns inside
nested `if` / `for` / `while` / `try` blocks are considered, while
returns inside nested function definitions or class bodies are correctly
**not** considered.

### What it does NOT flag

- Functions where `multiple_outputs=` is already set explicitly (either
`True` or `False`).
- `@task.sensor`-decorated functions — the sensor decorator hardcodes
`multiple_outputs=False`.
- Functions whose only dict-returning code lives in a nested function or
class method (the outer task is not affected).
- Functions whose return annotation is not in the Mapping family and
whose body does not return a dict-shaped value.

### Implementation details

- Resolves `@task` / `@task.<variant>` via the semantic model, handling
both bare (`@task`) and call (`@task(...)`) forms via `map_callable`.
The supported variants list is local to the rule: `python`,
`virtualenv`, `external_python`, `branch`, `branch_virtualenv`,
`branch_external_python`, `short_circuit`, `docker`, `kubernetes`,
`pyspark`.
- Annotation matching walks subscripts (e.g., `dict[str, int]`,
`Mapping[str, Any]`) down to the head and uses `match_typing_expr` /
known-module resolution.
- Body matching uses `ReturnStatementVisitor` from
`ruff_python_ast::helpers`, which already skips nested function and
class definitions — so nested-scope dict returns are correctly ignored
without a custom traversal.
- The inserted value (`True` / `False`) mirrors Airflow's inference:
`True` when the annotation is in the Mapping family, `False` otherwise.
When both signals fire but the annotation is non-Mapping (e.g., body
returns a dict but the annotation is `list`), the inserted value is
still `False` to preserve runtime behavior.

## Test Plan

Added snapshot tests in `AIR202.py` covering:

| Group | Cases |
|-------|-------|
| Annotation positives | `dict`, `dict[str, int]`, `Dict`, `Mapping`,
`MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`,
`TypedDict` |
| Body positives | dict literal, dict comprehension, `dict(...)` call |
| Decorator forms | bare `@task`, `@task()`, `@task(retries=3)`,
multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`,
`@task.short_circuit(trigger_rule=...)`, `@task.branch` |
| Explicit-kwarg negatives | `multiple_outputs=True`,
`multiple_outputs=False` — never flagged |
| Sensor negative | `@task.sensor` returning a dict — never flagged |
| Nested-scope negatives | outer task with inner function returning a
dict; outer task with inner class whose method returns a dict — never
flagged |

A sweep against the local Apache Airflow checkout (`cargo run -p ruff --
check ~/airflow --no-cache --preview --select AIR202`) produced 19
diagnostics on the Airflow codebase itself, all of which were verified
to be genuine cases of implicit `multiple_outputs` inference (no false
positives).

Related: 
- apache/airflow#43176
- apache/airflow#66712

Community approval:
https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
anishgirianish pushed a commit to anishgirianish/ruff that referenced this pull request May 28, 2026
…02`) (astral-sh#25152)

> [!IMPORTANT]
> Disclosure: this PR description was AI-drafted under my direction and
reviewed by me before posting.

## Summary

Implements rule `AIR202` (`airflow-task-implicit-multiple-outputs`) that
flags `@task`-decorated functions whose `multiple_outputs` behavior is
determined by Airflow's runtime inference rather than being set
explicitly on the decorator.

At runtime, Airflow's
[`_infer_multiple_outputs`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/decorator.py)
inspects the function's return annotation: if it resolves to a subclass
of `collections.abc.Mapping`, the return value is split into one `XCom`
per key; otherwise the entire return value is stored as a single `XCom`.
This couples typing to `XCom` layout in a non-obvious way — renaming,
removing, or refining the return annotation silently changes a DAG's
`XCom` behavior. Passing `multiple_outputs=` explicitly makes the
author's intent clear and insulates the DAG from future changes to
inference.

### What the rule flags

```python
from airflow.sdk import task  # or equivalent Airflow 2.x import path


# AIR202 — annotation in the Mapping family
@task
def extract() -> dict:
    return {"x": 1, "y": 2}


# AIR202 — body returns a dict literal
@task()
def transform():
    return {"a": 1}


# AIR202 — call-form decorator with unrelated kwargs
@task(retries=3)
def with_retries() -> dict[str, int]:
    return {"n": 1}


# AIR202 — variant decorators with parens
@task.short_circuit(trigger_rule="all_done")
def gate() -> dict:
    return {"go": True}
```

### Suggested fix

The fix is **always available** but marked **unsafe**: it inserts the
value Airflow would have inferred, preserving runtime behavior at the
moment of the rewrite, but the author may have intended a different
`XCom` layout and a multi-return function may not always return a dict.

```python
@task(multiple_outputs=True)
def extract() -> dict:
    return {"x": 1, "y": 2}


@task(multiple_outputs=True)
def transform():
    return {"a": 1}


@task(retries=3, multiple_outputs=True)
def with_retries() -> dict[str, int]:
    return {"n": 1}
```

For the call-form path (`@task(...)` with existing parentheses), the fix
uses `add_argument` so existing kwargs and formatting are preserved,
including on multiline decorators. For the bare form (`@task`,
`@task.branch`), the fix appends `(multiple_outputs=...)` after the
decorator expression.

### Detection signals

The rule fires when **either** of two independent signals holds and
`multiple_outputs` is absent from the decorator:

1. **Annotation signal** — the return annotation resolves to a member of
the Mapping family: `dict`, `Dict`, `Mapping`, `MutableMapping`,
`OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` (and
the `collections` / `typing` variants thereof).
2. **Body signal** — the function body contains at least one `return`
statement whose value is a dict literal, a dict comprehension, or a
`dict(...)` call. `ReturnStatementVisitor` is used so returns inside
nested `if` / `for` / `while` / `try` blocks are considered, while
returns inside nested function definitions or class bodies are correctly
**not** considered.

### What it does NOT flag

- Functions where `multiple_outputs=` is already set explicitly (either
`True` or `False`).
- `@task.sensor`-decorated functions — the sensor decorator hardcodes
`multiple_outputs=False`.
- Functions whose only dict-returning code lives in a nested function or
class method (the outer task is not affected).
- Functions whose return annotation is not in the Mapping family and
whose body does not return a dict-shaped value.

### Implementation details

- Resolves `@task` / `@task.<variant>` via the semantic model, handling
both bare (`@task`) and call (`@task(...)`) forms via `map_callable`.
The supported variants list is local to the rule: `python`,
`virtualenv`, `external_python`, `branch`, `branch_virtualenv`,
`branch_external_python`, `short_circuit`, `docker`, `kubernetes`,
`pyspark`.
- Annotation matching walks subscripts (e.g., `dict[str, int]`,
`Mapping[str, Any]`) down to the head and uses `match_typing_expr` /
known-module resolution.
- Body matching uses `ReturnStatementVisitor` from
`ruff_python_ast::helpers`, which already skips nested function and
class definitions — so nested-scope dict returns are correctly ignored
without a custom traversal.
- The inserted value (`True` / `False`) mirrors Airflow's inference:
`True` when the annotation is in the Mapping family, `False` otherwise.
When both signals fire but the annotation is non-Mapping (e.g., body
returns a dict but the annotation is `list`), the inserted value is
still `False` to preserve runtime behavior.

## Test Plan

Added snapshot tests in `AIR202.py` covering:

| Group | Cases |
|-------|-------|
| Annotation positives | `dict`, `dict[str, int]`, `Dict`, `Mapping`,
`MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`,
`TypedDict` |
| Body positives | dict literal, dict comprehension, `dict(...)` call |
| Decorator forms | bare `@task`, `@task()`, `@task(retries=3)`,
multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`,
`@task.short_circuit(trigger_rule=...)`, `@task.branch` |
| Explicit-kwarg negatives | `multiple_outputs=True`,
`multiple_outputs=False` — never flagged |
| Sensor negative | `@task.sensor` returning a dict — never flagged |
| Nested-scope negatives | outer task with inner function returning a
dict; outer task with inner class whose method returns a dict — never
flagged |

A sweep against the local Apache Airflow checkout (`cargo run -p ruff --
check ~/airflow --no-cache --preview --select AIR202`) produced 19
diagnostics on the Airflow codebase itself, all of which were verified
to be genuine cases of implicit `multiple_outputs` inference (no false
positives).

Related: 
- apache/airflow#43176
- apache/airflow#66712

Community approval:
https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:amazon AWS/Amazon - related issues provider:apache-beam provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants