Revert "Apply AIR201-style replacements"#66712
Conversation
This reverts commit 1f84a56.
|
hi @shahar1 ! |
Thanks for reporting Vlada! Until then: |
The answer seems quite obvious - either we had no tests for this, or the right tests didnt run. Would it be possible to revert only the changes that actually break something and not everything? |
I don't understand what do you mean, all these changes break google system tests, that's why we need to revert everything |
Looking at the PR you wish to revert shows it did get tagged with the google provider. I know that certain tags cause additional tests to be triggered on CI, so I find it surprising that the google system tests didn't run despite that. If we revert the PR without updating the CI to really trigger these tests when the provider changes - it's a matter of time until something like this happens again.
In addition to google, you wish to revert changes to the amazon provider and to multiple examples. Do the google system tests depend on all of those? |
I checked, and there's indeed an issue in the rewrite: See for example the following snippet: @task
def make_dict():
return {"key": "value"} # Pushes one XCom row: key="return_value", value={"key": "value"}
t = make_dict()In the old form, Jinja is rendered at runtime: However, in the new form, See following for reference: The reasons that the system tests didn't fail is that we currently don't run them in the CI, as they require integration with the Cloud providers (specifically in GCP we don't have it). For that reason, GCP system tests currently run on Vlada's team's instance. The monitoring for these tests is publicly available: For the reasons above, I'll revert the changes as suggested (including in all providers) - if you manage to solve the issue described above, including unit tests and working example dags (could be simplified versions, no need for GCP logic), I'll happily revert the revert (only someone has to test at least one Dag). |
|
I learned from this that:
|
|
@shahar1 |
…02`) (#25152) > [!IMPORTANT] > Disclosure: this PR description was AI-drafted under my direction and reviewed by me before posting. ## Summary Implements rule `AIR202` (`airflow-task-implicit-multiple-outputs`) that flags `@task`-decorated functions whose `multiple_outputs` behavior is determined by Airflow's runtime inference rather than being set explicitly on the decorator. At runtime, Airflow's [`_infer_multiple_outputs`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/decorator.py) inspects the function's return annotation: if it resolves to a subclass of `collections.abc.Mapping`, the return value is split into one `XCom` per key; otherwise the entire return value is stored as a single `XCom`. This couples typing to `XCom` layout in a non-obvious way — renaming, removing, or refining the return annotation silently changes a DAG's `XCom` behavior. Passing `multiple_outputs=` explicitly makes the author's intent clear and insulates the DAG from future changes to inference. ### What the rule flags ```python from airflow.sdk import task # or equivalent Airflow 2.x import path # AIR202 — annotation in the Mapping family @task def extract() -> dict: return {"x": 1, "y": 2} # AIR202 — body returns a dict literal @task() def transform(): return {"a": 1} # AIR202 — call-form decorator with unrelated kwargs @task(retries=3) def with_retries() -> dict[str, int]: return {"n": 1} # AIR202 — variant decorators with parens @task.short_circuit(trigger_rule="all_done") def gate() -> dict: return {"go": True} ``` ### Suggested fix The fix is **always available** but marked **unsafe**: it inserts the value Airflow would have inferred, preserving runtime behavior at the moment of the rewrite, but the author may have intended a different `XCom` layout and a multi-return function may not always return a dict. ```python @task(multiple_outputs=True) def extract() -> dict: return {"x": 1, "y": 2} @task(multiple_outputs=True) def transform(): return {"a": 1} @task(retries=3, multiple_outputs=True) def with_retries() -> dict[str, int]: return {"n": 1} ``` For the call-form path (`@task(...)` with existing parentheses), the fix uses `add_argument` so existing kwargs and formatting are preserved, including on multiline decorators. For the bare form (`@task`, `@task.branch`), the fix appends `(multiple_outputs=...)` after the decorator expression. ### Detection signals The rule fires when **either** of two independent signals holds and `multiple_outputs` is absent from the decorator: 1. **Annotation signal** — the return annotation resolves to a member of the Mapping family: `dict`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` (and the `collections` / `typing` variants thereof). 2. **Body signal** — the function body contains at least one `return` statement whose value is a dict literal, a dict comprehension, or a `dict(...)` call. `ReturnStatementVisitor` is used so returns inside nested `if` / `for` / `while` / `try` blocks are considered, while returns inside nested function definitions or class bodies are correctly **not** considered. ### What it does NOT flag - Functions where `multiple_outputs=` is already set explicitly (either `True` or `False`). - `@task.sensor`-decorated functions — the sensor decorator hardcodes `multiple_outputs=False`. - Functions whose only dict-returning code lives in a nested function or class method (the outer task is not affected). - Functions whose return annotation is not in the Mapping family and whose body does not return a dict-shaped value. ### Implementation details - Resolves `@task` / `@task.<variant>` via the semantic model, handling both bare (`@task`) and call (`@task(...)`) forms via `map_callable`. The supported variants list is local to the rule: `python`, `virtualenv`, `external_python`, `branch`, `branch_virtualenv`, `branch_external_python`, `short_circuit`, `docker`, `kubernetes`, `pyspark`. - Annotation matching walks subscripts (e.g., `dict[str, int]`, `Mapping[str, Any]`) down to the head and uses `match_typing_expr` / known-module resolution. - Body matching uses `ReturnStatementVisitor` from `ruff_python_ast::helpers`, which already skips nested function and class definitions — so nested-scope dict returns are correctly ignored without a custom traversal. - The inserted value (`True` / `False`) mirrors Airflow's inference: `True` when the annotation is in the Mapping family, `False` otherwise. When both signals fire but the annotation is non-Mapping (e.g., body returns a dict but the annotation is `list`), the inserted value is still `False` to preserve runtime behavior. ## Test Plan Added snapshot tests in `AIR202.py` covering: | Group | Cases | |-------|-------| | Annotation positives | `dict`, `dict[str, int]`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` | | Body positives | dict literal, dict comprehension, `dict(...)` call | | Decorator forms | bare `@task`, `@task()`, `@task(retries=3)`, multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`, `@task.short_circuit(trigger_rule=...)`, `@task.branch` | | Explicit-kwarg negatives | `multiple_outputs=True`, `multiple_outputs=False` — never flagged | | Sensor negative | `@task.sensor` returning a dict — never flagged | | Nested-scope negatives | outer task with inner function returning a dict; outer task with inner class whose method returns a dict — never flagged | A sweep against the local Apache Airflow checkout (`cargo run -p ruff -- check ~/airflow --no-cache --preview --select AIR202`) produced 19 diagnostics on the Airflow codebase itself, all of which were verified to be genuine cases of implicit `multiple_outputs` inference (no false positives). Related: - apache/airflow#43176 - apache/airflow#66712 Community approval: https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…02`) (astral-sh#25152) > [!IMPORTANT] > Disclosure: this PR description was AI-drafted under my direction and reviewed by me before posting. ## Summary Implements rule `AIR202` (`airflow-task-implicit-multiple-outputs`) that flags `@task`-decorated functions whose `multiple_outputs` behavior is determined by Airflow's runtime inference rather than being set explicitly on the decorator. At runtime, Airflow's [`_infer_multiple_outputs`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/decorator.py) inspects the function's return annotation: if it resolves to a subclass of `collections.abc.Mapping`, the return value is split into one `XCom` per key; otherwise the entire return value is stored as a single `XCom`. This couples typing to `XCom` layout in a non-obvious way — renaming, removing, or refining the return annotation silently changes a DAG's `XCom` behavior. Passing `multiple_outputs=` explicitly makes the author's intent clear and insulates the DAG from future changes to inference. ### What the rule flags ```python from airflow.sdk import task # or equivalent Airflow 2.x import path # AIR202 — annotation in the Mapping family @task def extract() -> dict: return {"x": 1, "y": 2} # AIR202 — body returns a dict literal @task() def transform(): return {"a": 1} # AIR202 — call-form decorator with unrelated kwargs @task(retries=3) def with_retries() -> dict[str, int]: return {"n": 1} # AIR202 — variant decorators with parens @task.short_circuit(trigger_rule="all_done") def gate() -> dict: return {"go": True} ``` ### Suggested fix The fix is **always available** but marked **unsafe**: it inserts the value Airflow would have inferred, preserving runtime behavior at the moment of the rewrite, but the author may have intended a different `XCom` layout and a multi-return function may not always return a dict. ```python @task(multiple_outputs=True) def extract() -> dict: return {"x": 1, "y": 2} @task(multiple_outputs=True) def transform(): return {"a": 1} @task(retries=3, multiple_outputs=True) def with_retries() -> dict[str, int]: return {"n": 1} ``` For the call-form path (`@task(...)` with existing parentheses), the fix uses `add_argument` so existing kwargs and formatting are preserved, including on multiline decorators. For the bare form (`@task`, `@task.branch`), the fix appends `(multiple_outputs=...)` after the decorator expression. ### Detection signals The rule fires when **either** of two independent signals holds and `multiple_outputs` is absent from the decorator: 1. **Annotation signal** — the return annotation resolves to a member of the Mapping family: `dict`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` (and the `collections` / `typing` variants thereof). 2. **Body signal** — the function body contains at least one `return` statement whose value is a dict literal, a dict comprehension, or a `dict(...)` call. `ReturnStatementVisitor` is used so returns inside nested `if` / `for` / `while` / `try` blocks are considered, while returns inside nested function definitions or class bodies are correctly **not** considered. ### What it does NOT flag - Functions where `multiple_outputs=` is already set explicitly (either `True` or `False`). - `@task.sensor`-decorated functions — the sensor decorator hardcodes `multiple_outputs=False`. - Functions whose only dict-returning code lives in a nested function or class method (the outer task is not affected). - Functions whose return annotation is not in the Mapping family and whose body does not return a dict-shaped value. ### Implementation details - Resolves `@task` / `@task.<variant>` via the semantic model, handling both bare (`@task`) and call (`@task(...)`) forms via `map_callable`. The supported variants list is local to the rule: `python`, `virtualenv`, `external_python`, `branch`, `branch_virtualenv`, `branch_external_python`, `short_circuit`, `docker`, `kubernetes`, `pyspark`. - Annotation matching walks subscripts (e.g., `dict[str, int]`, `Mapping[str, Any]`) down to the head and uses `match_typing_expr` / known-module resolution. - Body matching uses `ReturnStatementVisitor` from `ruff_python_ast::helpers`, which already skips nested function and class definitions — so nested-scope dict returns are correctly ignored without a custom traversal. - The inserted value (`True` / `False`) mirrors Airflow's inference: `True` when the annotation is in the Mapping family, `False` otherwise. When both signals fire but the annotation is non-Mapping (e.g., body returns a dict but the annotation is `list`), the inserted value is still `False` to preserve runtime behavior. ## Test Plan Added snapshot tests in `AIR202.py` covering: | Group | Cases | |-------|-------| | Annotation positives | `dict`, `dict[str, int]`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` | | Body positives | dict literal, dict comprehension, `dict(...)` call | | Decorator forms | bare `@task`, `@task()`, `@task(retries=3)`, multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`, `@task.short_circuit(trigger_rule=...)`, `@task.branch` | | Explicit-kwarg negatives | `multiple_outputs=True`, `multiple_outputs=False` — never flagged | | Sensor negative | `@task.sensor` returning a dict — never flagged | | Nested-scope negatives | outer task with inner function returning a dict; outer task with inner class whose method returns a dict — never flagged | A sweep against the local Apache Airflow checkout (`cargo run -p ruff -- check ~/airflow --no-cache --preview --select AIR202`) produced 19 diagnostics on the Airflow codebase itself, all of which were verified to be genuine cases of implicit `multiple_outputs` inference (no false positives). Related: - apache/airflow#43176 - apache/airflow#66712 Community approval: https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…02`) (astral-sh#25152) > [!IMPORTANT] > Disclosure: this PR description was AI-drafted under my direction and reviewed by me before posting. ## Summary Implements rule `AIR202` (`airflow-task-implicit-multiple-outputs`) that flags `@task`-decorated functions whose `multiple_outputs` behavior is determined by Airflow's runtime inference rather than being set explicitly on the decorator. At runtime, Airflow's [`_infer_multiple_outputs`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/bases/decorator.py) inspects the function's return annotation: if it resolves to a subclass of `collections.abc.Mapping`, the return value is split into one `XCom` per key; otherwise the entire return value is stored as a single `XCom`. This couples typing to `XCom` layout in a non-obvious way — renaming, removing, or refining the return annotation silently changes a DAG's `XCom` behavior. Passing `multiple_outputs=` explicitly makes the author's intent clear and insulates the DAG from future changes to inference. ### What the rule flags ```python from airflow.sdk import task # or equivalent Airflow 2.x import path # AIR202 — annotation in the Mapping family @task def extract() -> dict: return {"x": 1, "y": 2} # AIR202 — body returns a dict literal @task() def transform(): return {"a": 1} # AIR202 — call-form decorator with unrelated kwargs @task(retries=3) def with_retries() -> dict[str, int]: return {"n": 1} # AIR202 — variant decorators with parens @task.short_circuit(trigger_rule="all_done") def gate() -> dict: return {"go": True} ``` ### Suggested fix The fix is **always available** but marked **unsafe**: it inserts the value Airflow would have inferred, preserving runtime behavior at the moment of the rewrite, but the author may have intended a different `XCom` layout and a multi-return function may not always return a dict. ```python @task(multiple_outputs=True) def extract() -> dict: return {"x": 1, "y": 2} @task(multiple_outputs=True) def transform(): return {"a": 1} @task(retries=3, multiple_outputs=True) def with_retries() -> dict[str, int]: return {"n": 1} ``` For the call-form path (`@task(...)` with existing parentheses), the fix uses `add_argument` so existing kwargs and formatting are preserved, including on multiline decorators. For the bare form (`@task`, `@task.branch`), the fix appends `(multiple_outputs=...)` after the decorator expression. ### Detection signals The rule fires when **either** of two independent signals holds and `multiple_outputs` is absent from the decorator: 1. **Annotation signal** — the return annotation resolves to a member of the Mapping family: `dict`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` (and the `collections` / `typing` variants thereof). 2. **Body signal** — the function body contains at least one `return` statement whose value is a dict literal, a dict comprehension, or a `dict(...)` call. `ReturnStatementVisitor` is used so returns inside nested `if` / `for` / `while` / `try` blocks are considered, while returns inside nested function definitions or class bodies are correctly **not** considered. ### What it does NOT flag - Functions where `multiple_outputs=` is already set explicitly (either `True` or `False`). - `@task.sensor`-decorated functions — the sensor decorator hardcodes `multiple_outputs=False`. - Functions whose only dict-returning code lives in a nested function or class method (the outer task is not affected). - Functions whose return annotation is not in the Mapping family and whose body does not return a dict-shaped value. ### Implementation details - Resolves `@task` / `@task.<variant>` via the semantic model, handling both bare (`@task`) and call (`@task(...)`) forms via `map_callable`. The supported variants list is local to the rule: `python`, `virtualenv`, `external_python`, `branch`, `branch_virtualenv`, `branch_external_python`, `short_circuit`, `docker`, `kubernetes`, `pyspark`. - Annotation matching walks subscripts (e.g., `dict[str, int]`, `Mapping[str, Any]`) down to the head and uses `match_typing_expr` / known-module resolution. - Body matching uses `ReturnStatementVisitor` from `ruff_python_ast::helpers`, which already skips nested function and class definitions — so nested-scope dict returns are correctly ignored without a custom traversal. - The inserted value (`True` / `False`) mirrors Airflow's inference: `True` when the annotation is in the Mapping family, `False` otherwise. When both signals fire but the annotation is non-Mapping (e.g., body returns a dict but the annotation is `list`), the inserted value is still `False` to preserve runtime behavior. ## Test Plan Added snapshot tests in `AIR202.py` covering: | Group | Cases | |-------|-------| | Annotation positives | `dict`, `dict[str, int]`, `Dict`, `Mapping`, `MutableMapping`, `OrderedDict`, `DefaultDict`, `Counter`, `ChainMap`, `TypedDict` | | Body positives | dict literal, dict comprehension, `dict(...)` call | | Decorator forms | bare `@task`, `@task()`, `@task(retries=3)`, multiline `@task(...)`, `@task.docker(image=...)`, `@task.virtualenv()`, `@task.short_circuit(trigger_rule=...)`, `@task.branch` | | Explicit-kwarg negatives | `multiple_outputs=True`, `multiple_outputs=False` — never flagged | | Sensor negative | `@task.sensor` returning a dict — never flagged | | Nested-scope negatives | outer task with inner function returning a dict; outer task with inner class whose method returns a dict — never flagged | A sweep against the local Apache Airflow checkout (`cargo run -p ruff -- check ~/airflow --no-cache --preview --select AIR202`) produced 19 diagnostics on the Airflow codebase itself, all of which were verified to be genuine cases of implicit `multiple_outputs` inference (no false positives). Related: - apache/airflow#43176 - apache/airflow#66712 Community approval: https://lists.apache.org/thread/99s321s2p5xqhzlm0x5wjgy044nggtpm --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Reverts #65197
This PR is being reverted because
task.output["key"]is not always equivalent to{{ task_instance.xcom_pull("task_id")["key"] }}.The old form pulls the task’s default
return_valueXCom and indexes the returned dictionary. The new form may instead look for an XCom stored under the separate keykey. For operators that return a dict but do not push each field as an individual XCom key.