add static checker for preventing to increase dag version #59430

Merged
potiuk merged 12 commits into apache:main from wjddn279:add-static-checker-for-dag-parsing
Jan 20, 2026

@wjddn279
Contributor

Motivation

We observed that when runtime-varying values are used as arguments to DAG or Task constructors in Airflow, the DAG version increases on every parse, without bound. See slack, #55768 (comment).

Detecting DAG version increments at runtime is difficult. The most accurate method would be to parse the DAG object twice and compare the results. However, this would nearly double the DAG parsing time.
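To make the cost of the "parse twice and compare" approach concrete, here is a minimal sketch. It does not use Airflow at all: `build_varying_spec`, `build_stable_spec`, `fingerprint`, and `differs_between_builds` are hypothetical names, and the dictionaries only stand in for a serialized DAG.

```python
import hashlib
import json
import uuid


def build_varying_spec() -> dict:
    """Hypothetical stand-in for parsing a DAG file that uses a runtime-varying value."""
    return {
        "dag_id": "example",
        # A fresh value on every call, like datetime.now() or random.randint().
        "default_args": {"token": uuid.uuid4().hex},
    }


def build_stable_spec() -> dict:
    """Hypothetical stand-in for a DAG file whose arguments are all constants."""
    return {"dag_id": "example", "default_args": {"retries": 1}}


def fingerprint(spec: dict) -> str:
    """Hash a canonical JSON rendering of the spec."""
    payload = json.dumps(spec, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def differs_between_builds(build) -> bool:
    """Build the spec twice and compare fingerprints.

    Two full builds are required, which is why this approach
    roughly doubles the parsing cost.
    """
    return fingerprint(build()) != fingerprint(build())
```

Calling `differs_between_builds(build_varying_spec)` reports a difference, while `differs_between_builds(build_stable_spec)` does not. The second build is pure overhead for every well-behaved DAG file, which is the motivation for a static check instead.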

Therefore, I added a feature that raises DAG warnings for these cases through AST-based static analysis, run before parsing in the dag-processor. While it cannot cover 100% of DAG usage patterns, it covers most cases and has minimal performance impact (since ast.parse already runs on every DAG parse).

Logic of static check

The logic for detecting problematic situations through static check is as follows (I named this issue "runtime-varying"):

  1. Statically analyze a single DAG file through ast.parse.

  2. Traverse each node and check the following:

  • Has a variable been assigned a runtime-varying value? → Such variables are tracked so the checker can tell when one is later passed as an argument to a DAG or Task instance.
from datetime import datetime
import random as rd

start_date = datetime.now() # checked as tainted value
random_value = f"random_{rd.randint(1, 1000)}" # checked as tainted value
default_args = {'start_date': start_date} # checked as tainted value
  3. Check whether the statement is a DAG or Task declaration, and verify whether runtime-varying variables or function calls are passed as arguments.
  • Check if it's a DAG declaration statement → We categorized DAG object definitions into 3 cases:
from airflow import DAG
from airflow.decorators import dag

# Each case below is shown independently.
dag = DAG(dag_id='dag_id', default_args=default_args) # DAG object definition imported from airflow module

with DAG(dag_id='dag_id', default_args=default_args) as dag: # Defined as context manager in with statement
    ...

@dag(dag_id='dag_id', default_args=default_args) # Defined via dag decorator
def my_dag():
    ...
  • Check if it's a Task declaration statement → This case can be categorized into 2 types:
task1 = PythonOperator(task_id='task_id', dag=dag) # When the DAG object checked above is passed as an argument

with DAG(dag_id='dag_id', default_args=default_args) as dag:
    task2 = PythonOperator(task_id='task_id') # Function calls inside the with block where DAG context manager is declared

The cases covered by static checks are described in detail in the unit test code.
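The steps above can be sketched with the standard `ast` module. This is a simplified illustration, not the checker implemented in this PR: the class name `RuntimeVaryingChecker`, the helper `check_source`, and the contents of `VARYING_CALLS` and `DAG_CALLABLES` are assumptions chosen for the example. It only inspects keyword arguments of DAG constructors and omits Task detection.

```python
import ast

# Assumption: a small illustrative set of calls treated as runtime-varying.
VARYING_CALLS = {
    "datetime.datetime.now",
    "datetime.datetime.utcnow",
    "random.randint",
    "random.random",
    "uuid.uuid4",
    "time.time",
}
# Assumption: fully qualified names recognized as DAG definitions.
DAG_CALLABLES = {"airflow.DAG", "airflow.decorators.dag"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


class RuntimeVaryingChecker(ast.NodeVisitor):
    def __init__(self):
        self.aliases = {}     # local name -> fully qualified imported name
        self.tainted = set()  # variables assigned a runtime-varying value
        self.warnings = []    # (line number, message) pairs

    def visit_Import(self, node):
        for alias in node.names:
            self.aliases[alias.asname or alias.name] = alias.name

    def visit_ImportFrom(self, node):
        for alias in node.names:
            self.aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    def resolve(self, node):
        """Expand the leading import alias of a dotted name."""
        name = dotted_name(node)
        if name is None:
            return None
        head, _, rest = name.partition(".")
        head = self.aliases.get(head, head)
        return f"{head}.{rest}" if rest else head

    def is_varying(self, expr):
        """True if expr contains a varying call or reads a tainted variable."""
        for sub in ast.walk(expr):
            if isinstance(sub, ast.Call) and self.resolve(sub.func) in VARYING_CALLS:
                return True
            if isinstance(sub, ast.Name) and sub.id in self.tainted:
                return True
        return False

    def visit_Assign(self, node):
        # Propagate taint: start_date = datetime.now(), then {'start_date': start_date}.
        if self.is_varying(node.value):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    self.tainted.add(target.id)
        self.generic_visit(node)

    def visit_Call(self, node):
        # Covers DAG(...), `with DAG(...) as dag:` and `@dag(...)` alike,
        # because all three contain a Call node in the tree.
        if self.resolve(node.func) in DAG_CALLABLES:
            for kw in node.keywords:
                if self.is_varying(kw.value):
                    arg = kw.arg or "**kwargs"
                    self.warnings.append(
                        (node.lineno, f"runtime-varying value passed as '{arg}'")
                    )
        self.generic_visit(node)


def check_source(source: str):
    """Parse source text (without importing it) and return the collected warnings."""
    checker = RuntimeVaryingChecker()
    checker.visit(ast.parse(source))
    return checker.warnings
```

Because the file is only parsed and never executed, `check_source` can be run on DAG source text without Airflow installed. The sketch relies on module-level statements being visited in order, so a variable must be assigned before the DAG call that uses it for the taint to be seen.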

User Notification for Static Check Errors

I considered that static check failures are not severe enough to cause DAG parsing to fail, so I added them to DAG warnings. Warnings are added to DAGs generated from the DAG file and displayed in the UI as shown below. There seems to be an issue where \n characters in messages are ignored when displayed in the UI, which we plan to fix in a future PR.

[image: DAG warning displayed in the UI]

Future work

If this PR is merged, the following items are planned for future work:

  • Merge the existing ast.parse with the ast.parse executed in this subprocess.
  • Fix the UI that displays DAG warnings.
  • Make DAG warnings more visible by displaying them in the DAG list as well.
  • Document the cases where DAG version increases infinitely and the coverage scope of this static check.



Labels: area:API, area:DAG-processing


8 participants