-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Fix the logic of checking dataflow job state #34785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
| if current_state == self._expected_terminal_state: | ||
| if self._expected_terminal_state == DataflowJobStatus.JOB_STATE_RUNNING: | ||
| return not self._wait_until_finished |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never used this operator, but I wonder what is _expected_terminal_state and why we may return False if _wait_until_finished is set the True and the state is equals to the provided expected terminal state 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_expected_terminal_state is optional parameter indicating which dataflow job state is considered as a success (discussed in #8553 (comment)).
I think the coexistence of two similar parameters, _wait_until_finished and _expected_terminal_state, causes this weird implementation (What _wait_until_finished is True has the same meaning as what _expected_terminal_state is DataflowJobStatus.JOB_STATE_DONE). So, it would be better to remove _wait_until_finished if we adopt _expected_terminal_state 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we may return False if _wait_until_finished is set the True and the state is equals to the provided expected terminal state
Based on the above, this is because _wait_until_finished was applied preferentially compared to _expected_terminal_state in the original implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any solution regarding this problem? I am facing the same problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asafeunico As the temporary solution, you can use the version 10.8.0 of apache-airflow-providers-google without this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm the author of #34217 and sincerely apologize for the mess!
I ensured that the existing tests passed, but I didn't consider adding the JOB_STATE_DONE - which is problematic for that if case.
As @champon1020 stated, _wait_until_finished should have been applied preferentially compared to expected_terminal_state to avoid breaking default behavior.
I agree that having both params is somewhat confusing, and we should revise deprecating _wait_until_finished in the future.
The suggested solution LGTM, thank you for fixing it up!
| if current_state == self._expected_terminal_state: | ||
| if self._expected_terminal_state == DataflowJobStatus.JOB_STATE_RUNNING: | ||
| return not self._wait_until_finished |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm the author of #34217 and sincerely apologize for the mess!
I ensured that the existing tests passed, but I didn't consider adding the JOB_STATE_DONE - which is problematic for that if case.
As @champon1020 stated, _wait_until_finished should have been applied preferentially compared to expected_terminal_state to avoid breaking default behavior.
I agree that having both params is somewhat confusing, and we should revise deprecating _wait_until_finished in the future.
The suggested solution LGTM, thank you for fixing it up!
|
@eladkal Could you please take a look? Thanks! |
|
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
Fix the the logic of checking dataflow job state in _check_dataflow_job_state method.
Closes: #34767
The
expected_terminal_stateparameter was introduced to Dataflow operators in #34217.However, it raises the exception when
wait_until_finished = Truealthough the dataflow job state isJOB_STATE_DONE.Then, I divides the logic of checking terminal state into the case that
expected_terminal_stateisJOB_STATE_RUNNINGand others.^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.