KAFKA-17479 Fix ignoreFailures logic in CI workflow [3/n]#17106
Conversation
|
Here is how I verified this fix. Starting with empty build cache (rm -r ~/.gradle/caches/build-cache-1), I introduced a test that always fails and ran the following gradlew command. The tests all run, Gradle exits with return code 1, and the following output is produced: Notably, we see the JUnit XML files moved to the right place and the task ending with failure. If run again with no changes, the outcome is the same. After removing the failing test and re-running the gradlew command, I see the following: The tests are all run, Gradle exits with 0, and we see the cache get populated. Now re-running the same gradlew command results in No XML files moved, exit code 0. |
|
|
||
| // If there were any test failures, we want to fail the task to prevent the failures | ||
| // from being cached. | ||
| if (ext.hadFailure) { |
There was a problem hiding this comment.
Should we honor the "ignoreFailures=true" in this condition?
There was a problem hiding this comment.
😄 I was thinking the same. I just pushed an update to preserve the existing ignoreFailures behavior
|
Locally, I added a test that will always be flaky (fail, then pass) in the Before each run, I emptied the build cache. Here is the base command I used: Results
i) ignoreFailures is overwritten by GITHUB_ACTIONS |
|
|
||
| // If there were any test failures, we want to fail the task to prevent the failures | ||
| // from being cached. | ||
| if (ext.hadFailure) { |
There was a problem hiding this comment.
It seems the retry issue is still existent in normal PR. We can't use the "retry" to convert the result from "failed" to "successful". Maybe we can remove the -PmaxTestRetries=1 -PmaxTestRetryFailures=10 directly? Personally, I'd like to make tests stable instead of retry.
There was a problem hiding this comment.
Personally, I'd like to make tests stable instead of retry.
Yes, agreed. 😄 Until we get the tests in better shape, I think the retry thing is okay.
It seems the retry issue is still existent in normal PR
Normal PR will look like:
- Any failure (flaky or not) will cause Gradle to exit with 1
continue-on-error: truewill mark that step green (I think... need to verify this)- "Parse JUnit tests" will process the results correctly and exit 0 if only flaky, exit 1 if any true failures
So the overall CI workflow will be green if there are only flaky failures. Red if there are any real failures.
There was a problem hiding this comment.
So the overall CI workflow will be green if there are only flaky failures. Red if there are any real failures.
Got it. Thanks for detailed explanation
|
@chia7712 I realized I still did not address the timeout case. I'll fix that in a follow-up shortly. |
The ignoreFailures property was removed in apache#17066 to prevent test failures from being cached. However, this breaks the JUnit report and makes the github workflow less user friendly. The problem is that we are copying the junit test report files into a new directory (added in apache#17098) in a Gradle doLast closure. If we don't run with ignoreFailures=true, then this closure will not run and the test failures won't be processed by junit.py. This patch adds logic to ensure the doLast closure of :test is always run. The user provided -PignoreFailures is still honored for the test tasks so local developer workflows should not be disturbed. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This property was removed in #17066 to prevent test failures from being cached. However, this breaks the JUnit report and makes the github workflow less user friendly.
The problem is that we are copying the junit test report files into a new directory (added in #17098) in a Gradle
doLastclosure. If we don't run withignoreFailures=true, then this closure will not run and the test failures won't be processed by junit.py.This patch adds logic to ensure the doLast closure of
:testis always run. The user provided-PignoreFailuresis still honored for the test tasks.