-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Revert "KPO Maintain backward compatibility for execute_complete and … #37446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…trigger run method (apache#37363)" This reverts commit 0640e6d.
|
Is this because some tests are failing on the main? CI was green on the PR branch. I'll investigate it. |
|
Hey @pankajastro - I believe (and have a demostrable proof) that #37363 broke some of the behaviour of KPO tests. The tests started to fail yesterday in main after the change has been merged and while the tests work in isolation, it seems that they introduce a some heavly fleaky behaviour for The async KPO behaviour when the system is busy - and I think it can have impact in production, so I think better is to revert it now and try to find out the root cause and fix. I was not able to pinpoint the root cause, but I believe it's caused by this change: It seems that on a busy system, the loop finishes much faster than it should and errors yout with The problem is that it does not happen even if you run a complete The issue is relatively easy (but a bit time consuming) to reproduce. On my local env, I can rather repeatably (same on CI) reproduce it with: The error is the same as in case of last few When I revert this one, it works nicely without errors. It's rather fishy and difficult to guess what's the problem. |
|
Thanks @potiuk for looking into it I'll check it. |
|
Yeah. It's rather strange - because generally the timeout should not happen until 120 seconds passes, and the longest test in the whole run in CI is 30s - so there is no way those failed tests took > 120 seconds. Really strange (but at least we have a recipe for reproduction). |
|
This means we are not ready for RC2 of |
|
Yeah, I think we need to wait for @pankajastro 's investigation :) |
|
For some tests mocking for trigger |
|
Merged! That explains it indeed, It did not fail in your PR, because it would generally work without mocking (and worked for me as well locally) if run in isolation / on not so busy system. Generally PRs will only run a subset of tests that selective checks find relevant. So in the original PR: #37363 only a small subsert of tests was run: In the new PR #37454 - I saw you added "full tests needed" and the set of tests was the same as in That's why I could quickly merge it, beibg more certain that the flakiness is fixed (because this time PR had the same set of tests). That's why original change was "green" because the set of tests run did not make the machine "busy enough" to trigger the flaky condition. In a way, our |
|
Thanks for super-quick diagnosis and fix @pankajastro BTW ! |
…trigger run method (#37363)"
This reverts commit 0640e6d.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.