KAFKA-17479 Fix ignoreFailures logic in CI workflow [3/n] by mumrah · Pull Request #17106 · apache/kafka

mumrah · 2024-09-05T22:35:12Z

This property was removed in #17066 to prevent test failures from being cached. However, this breaks the JUnit report and makes the github workflow less user friendly.

The problem is that we are copying the junit test report files into a new directory (added in #17098) in a Gradle doLast closure. If we don't run with ignoreFailures=true, then this closure will not run and the test failures won't be processed by junit.py.

This patch adds logic to ensure the doLast closure of :test is always run. The user provided -PignoreFailures is still honored for the test tasks.

mumrah · 2024-09-05T22:42:06Z

Here is how I verified this fix.

Starting with empty build cache (rm -r ~/.gradle/caches/build-cache-1), I introduced a test that always fails and ran the following gradlew command.

./gradlew --info --build-cache --no-scan --continue -PignoreFailures=true :metadata:test

The tests all run, Gradle exits with return code 1, and the following output is produced:

> Task :metadata:test FAILED

933 tests completed, 1 failed, 1 skipped
Finished generating test XML results (0.036 secs) into: /Users/davidarthur/Code/Apache/kafka/metadata/build/test-results/test
Generating HTML test report...
Finished generating test html results (0.043 secs) into: /Users/davidarthur/Code/Apache/kafka/metadata/build/reports/tests/test
There were failing tests. See the report at: file:///Users/davidarthur/Code/Apache/kafka/metadata/build/reports/tests/test/index.html
Copy JUnit XML for metadata to /Users/davidarthur/Code/Apache/kafka/build/junit-xml/metadata
[ant:copy] Copying 105 files to /Users/davidarthur/Code/Apache/kafka/build/junit-xml/metadata

FAILURE: Build failed with an exception.

* Where:
Build file '/Users/davidarthur/Code/Apache/kafka/build.gradle' line: 528

* What went wrong:
Execution failed for task ':metadata:test'.
> Failing this task since 'metadata:test' had test failures.

BUILD FAILED in 14s
31 actionable tasks: 7 executed, 24 up-to-date

Notably, we see the JUnit XML files moved to the right place and the task ending with failure.

If run again with no changes, the outcome is the same. After removing the failing test and re-running the gradlew command, I see the following:

> Task :metadata:test
Finished generating test XML results (0.046 secs) into: /Users/davidarthur/Code/Apache/kafka/metadata/build/test-results/test
Generating HTML test report...
Finished generating test html results (0.03 secs) into: /Users/davidarthur/Code/Apache/kafka/metadata/build/reports/tests/test
Copy JUnit XML for metadata to /Users/davidarthur/Code/Apache/kafka/build/junit-xml/metadata
[ant:copy] Copying 105 files to /Users/davidarthur/Code/Apache/kafka/build/junit-xml/metadata
Stored cache entry for task ':metadata:test' with cache key b2da6d0b8f4fbf99a090c0287dbf2f2c

BUILD SUCCESSFUL in 18s
31 actionable tasks: 6 executed, 25 up-to-date

The tests are all run, Gradle exits with 0, and we see the cache get populated. Now re-running the same gradlew command results in

> Task :metadata:test UP-TO-DATE
Custom actions are attached to task ':metadata:test'.
Build cache key for task ':metadata:test' is b2da6d0b8f4fbf99a090c0287dbf2f2c
Skipping task ':metadata:test' as it is up-to-date.

BUILD SUCCESSFUL in 810ms
31 actionable tasks: 31 up-to-date

No XML files moved, exit code 0.

chia7712

@mumrah thanks for this patch!

chia7712 · 2024-09-06T01:59:09Z

+
+      // If there were any test failures, we want to fail the task to prevent the failures
+      // from being cached.
+      if (ext.hadFailure) {


Should we honor the "ignoreFailures=true" in this condition?

oh, you have fixed it 😄

😄 I was thinking the same. I just pushed an update to preserve the existing ignoreFailures behavior

chia7712

@mumrah thanks for your patch. please take a look at two minor comments

mumrah · 2024-09-06T15:17:10Z

Locally, I added a test that will always be flaky (fail, then pass) in the :metadata module. Here are the results with the latest code 111fcf7.

Before each run, I emptied the build cache. Here is the base command I used:

./gradlew --build-cache --info --no-scan --continue :metadata:clean :metadata:test

Results

env	maxTestRetries	ignoreFailures	gradle exit	xml moved	cache saved	junit.py	comment
-	0	false	1	no	no	-	default invocation
-	1	false	0	no	yes	-	-
-	0	true	0	no	yes	-	-
-	1	true	0	no	yes	-	-
GITHUB_ACTIONS	0	false (i)	1	yes	no	1	-
GITHUB_ACTIONS	1	false (i)	1	yes	no	0	suggested CI invocation (ii)

i) ignoreFailures is overwritten by GITHUB_ACTIONS
ii) relies on "Parse JUnit tests" to fail the overall pipeline

chia7712 · 2024-09-06T16:27:28Z

+
+        // If there were any test failures, we want to fail the task to prevent the failures
+        // from being cached.
+        if (ext.hadFailure) {


It seems the retry issue is still existent in normal PR. We can't use the "retry" to convert the result from "failed" to "successful". Maybe we can remove the -PmaxTestRetries=1 -PmaxTestRetryFailures=10 directly? Personally, I'd like to make tests stable instead of retry.

Personally, I'd like to make tests stable instead of retry.

Yes, agreed. 😄 Until we get the tests in better shape, I think the retry thing is okay.

It seems the retry issue is still existent in normal PR

Normal PR will look like:

Any failure (flaky or not) will cause Gradle to exit with 1

continue-on-error: true will mark that step green (I think... need to verify this)

"Parse JUnit tests" will process the results correctly and exit 0 if only flaky, exit 1 if any true failures

So the overall CI workflow will be green if there are only flaky failures. Red if there are any real failures.

So the overall CI workflow will be green if there are only flaky failures. Red if there are any real failures.

Got it. Thanks for detailed explanation

chia7712

LGTM

mumrah · 2024-09-06T18:47:53Z

@chia7712 I realized I still did not address the timeout case. I'll fix that in a follow-up shortly.

The ignoreFailures property was removed in apache#17066 to prevent test failures from being cached. However, this breaks the JUnit report and makes the github workflow less user friendly. The problem is that we are copying the junit test report files into a new directory (added in apache#17098) in a Gradle doLast closure. If we don't run with ignoreFailures=true, then this closure will not run and the test failures won't be processed by junit.py. This patch adds logic to ensure the doLast closure of :test is always run. The user provided -PignoreFailures is still honored for the test tasks so local developer workflows should not be disturbed. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

mumrah added 2 commits September 5, 2024 16:42

add back ignoreFailures

5d039f7

fix doLast closure

44a4118

mumrah added the build Gradle build or GitHub Actions label Sep 5, 2024

mumrah changed the title ~~MINOR Fix ignoreFailures logic in CI workflow~~ KAFKA-17479 Fix ignoreFailures logic in CI workflow [3/n] Sep 5, 2024

mumrah mentioned this pull request Sep 5, 2024

MINOR Enable the GitHub build by default #17105

Merged

mumrah requested a review from chia7712 September 6, 2024 01:02

keep existing ignoreFailures behavior

d6f821d

chia7712 reviewed Sep 6, 2024

View reviewed changes

mumrah requested a review from chia7712 September 6, 2024 13:40

chia7712 reviewed Sep 6, 2024

View reviewed changes

Comment thread build.gradle Outdated

Comment thread build.gradle

only force failure if GITHUB_ACTIONS set

111fcf7

mumrah requested a review from chia7712 September 6, 2024 15:25

chia7712 reviewed Sep 6, 2024

View reviewed changes

chia7712 approved these changes Sep 6, 2024

View reviewed changes

mumrah merged commit 62379d7 into trunk Sep 6, 2024

mumrah deleted the gh-ignoreFailures branch September 6, 2024 18:47

mumrah mentioned this pull request Sep 6, 2024

KAFKA-17479 Fail the whole pipeline if junit step times out [4/n] #17121

Merged

Conversation

mumrah commented Sep 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mumrah commented Sep 5, 2024

Uh oh!

chia7712 left a comment

Choose a reason for hiding this comment

Uh oh!

chia7712 Sep 6, 2024

Choose a reason for hiding this comment

Uh oh!

chia7712 Sep 6, 2024

Choose a reason for hiding this comment

Uh oh!

mumrah Sep 6, 2024

Choose a reason for hiding this comment

Uh oh!

chia7712 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mumrah commented Sep 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chia7712 Sep 6, 2024

Choose a reason for hiding this comment

Uh oh!

mumrah Sep 6, 2024

Choose a reason for hiding this comment

Uh oh!

chia7712 Sep 6, 2024

Choose a reason for hiding this comment

Uh oh!

chia7712 left a comment

Choose a reason for hiding this comment

Uh oh!

mumrah commented Sep 6, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mumrah commented Sep 5, 2024 •

edited

Loading

mumrah commented Sep 6, 2024 •

edited

Loading