Allow Operators to specify SKIPPED status internally#1292
Conversation
I really like this idea. In fact I would like to use this to eliminate the short circuit operator (because even though it was created to make a confusing situation simpler, it's still pretty confusing). With a I still need to look at the implementation but here are a couple thoughts:
Coverage decreased (-0.009%) to 67.05% when pulling f7cb59302cb840b2ae83cc9e527ebf5ab27b6460 on withnale:skip_exception into 0bae60f on airbnb:master.
self.assertEqual(ti.state, State.SUCCESS)

@parameterized.expand([
    #
Could you add a header comment here listing all the fields, maybe abbreviated? I know they're in the function signature but at first glance it's hard to tell which of the numbers (5, 0, 0, 0, 0, 0) goes with which field. A header would help.
Thank you!
👏 As an aside, when using the Branch operator, the second assumption above is not exactly accurate, specifically I agree it is a stumbling point and hence welcome the work in this PR. In the above example, I branch at

NB: For discussion at present. Not expecting this PR to be merged as-is
Allow Operators to specify SKIPPED status internally
At present there is no clean way to specify that downstream tasks should be skipped. The current options are:
Rather than have operators manipulate the session directly, this should be provided centrally.
PR: Specify SKIPPED via Exception
This pull request implements an exception (AirflowSkipException; since it does not flag an error state, a different name may be better) which, when raised by an Operator, indicates that the job completed without errors but should not be considered successful.
During its normal DAG traversal looking for task instances to schedule, the scheduler will then see the state 'SKIPPED' and use it in its decision making.
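To make the idea concrete, here is a minimal standalone sketch of how raising the exception could map to a SKIPPED state. It does not import Airflow: the `State` constants, `run_task` and `dummy_skip` below are simplified stand-ins written for illustration, not the code in this PR.

```python
class AirflowSkipException(Exception):
    """Raised by a task to say: finished without error, but treat as skipped."""


class State:
    # Simplified stand-in for the real State class.
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


def run_task(execute):
    """Run a task callable and translate the outcome into a state string.

    Hypothetical helper: the skip exception is caught before the generic
    handler so that it is not reported as a failure.
    """
    try:
        execute()
    except AirflowSkipException:
        return State.SKIPPED
    except Exception:
        return State.FAILED
    return State.SUCCESS


def dummy_skip():
    # Mirrors the described DummySkipOperator: raise immediately.
    raise AirflowSkipException("nothing to do for this run")


if __name__ == "__main__":
    print(run_task(dummy_skip))
    print(run_task(lambda: None))
```

The point of routing this through an exception is that the operator never touches the session; whatever runs the task decides how to record the state.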
An example DAG is provided which shows two distinct trees using this approach. One of them includes a join using trigger_rule='all_success' and the other uses trigger_rule='one_success', which probably explains things better than I can. In this example, the DummySkipOperator just raises AirflowSkipException immediately. The example DAG graph from the webserver can be seen below:
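As a rough illustration of why the two joins behave differently, the sketch below evaluates the two trigger rules against a set of upstream states. The function `join_can_run` is hypothetical and assumes the plain reading of the rule names (all upstream tasks succeeded versus at least one succeeded); it is not the scheduler's actual logic.

```python
SUCCESS, FAILED, SKIPPED = "success", "failed", "skipped"


def join_can_run(trigger_rule, upstream_states):
    """Return True if a join task with this rule would be allowed to run.

    Assumption: a SKIPPED upstream does not count as a success.
    """
    if trigger_rule == "all_success":
        return all(state == SUCCESS for state in upstream_states)
    if trigger_rule == "one_success":
        return any(state == SUCCESS for state in upstream_states)
    raise ValueError("rule not covered by this sketch: %s" % trigger_rule)


if __name__ == "__main__":
    upstream = [SUCCESS, SKIPPED]  # one branch ran, one raised the skip exception
    print(join_can_run("all_success", upstream))
    print(join_can_run("one_success", upstream))
```

Under that assumption, a join with one skipped and one successful upstream runs only when it uses 'one_success'.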
I have also included an additional parameter to the BaseSensorOperator called soft_fail (defaults to False). If this is set to True, the sensor will finish in the SKIPPED state rather than failing with an error. This should allow general sensors to encapsulate optional logic without generating errors on the DAG webserver frontpage.
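A minimal sketch of the soft_fail idea follows. It is standalone and deliberately simplified: `SketchSensor`, `poke_fn` and `max_pokes` are hypothetical names, and a fixed number of pokes stands in for a real timeout so the example stays deterministic.

```python
class AirflowSkipException(Exception):
    """Stand-in: the task should end up SKIPPED, not FAILED."""


class AirflowSensorTimeout(Exception):
    """Stand-in: the sensor gave up and the task should fail."""


class SketchSensor:
    """Hypothetical sensor; not the real BaseSensorOperator."""

    def __init__(self, poke_fn, max_pokes=3, soft_fail=False):
        self.poke_fn = poke_fn      # callable returning True once the condition holds
        self.max_pokes = max_pokes  # assumption: poke count replaces a time-based timeout
        self.soft_fail = soft_fail

    def execute(self):
        for _ in range(self.max_pokes):
            if self.poke_fn():
                return True
        # Condition never met: choose between skipping and failing.
        if self.soft_fail:
            raise AirflowSkipException("condition not met; skipping")
        raise AirflowSensorTimeout("condition not met; failing")
```

With soft_fail=True, a sensor guarding an optional input would lead to a skipped branch instead of a red task on the front page.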