Skip to content

KAFKA-17479 Fix ignoreFailures logic in CI workflow [3/n]#17106

Merged
mumrah merged 4 commits into
trunkfrom
gh-ignoreFailures
Sep 6, 2024
Merged

KAFKA-17479 Fix ignoreFailures logic in CI workflow [3/n]#17106
mumrah merged 4 commits into
trunkfrom
gh-ignoreFailures

Conversation

@mumrah

@mumrah mumrah commented Sep 5, 2024

Copy link
Copy Markdown
Member

This property was removed in #17066 to prevent test failures from being cached. However, this breaks the JUnit report and makes the github workflow less user friendly.

The problem is that we are copying the junit test report files into a new directory (added in #17098) in a Gradle doLast closure. If we don't run with ignoreFailures=true, then this closure will not run and the test failures won't be processed by junit.py.

This patch adds logic to ensure the doLast closure of :test is always run. The user provided -PignoreFailures is still honored for the test tasks.

@mumrah mumrah added the build Gradle build or GitHub Actions label Sep 5, 2024
@mumrah mumrah changed the title MINOR Fix ignoreFailures logic in CI workflow KAFKA-17479 Fix ignoreFailures logic in CI workflow [3/n] Sep 5, 2024
@mumrah

mumrah commented Sep 5, 2024

Copy link
Copy Markdown
Member Author

Here is how I verified this fix.

Starting with empty build cache (rm -r ~/.gradle/caches/build-cache-1), I introduced a test that always fails and ran the following gradlew command.

./gradlew --info --build-cache --no-scan --continue -PignoreFailures=true :metadata:test

The tests all run, Gradle exits with return code 1, and the following output is produced:

> Task :metadata:test FAILED

933 tests completed, 1 failed, 1 skipped
Finished generating test XML results (0.036 secs) into: /Users/davidarthur/Code/Apache/kafka/metadata/build/test-results/test
Generating HTML test report...
Finished generating test html results (0.043 secs) into: /Users/davidarthur/Code/Apache/kafka/metadata/build/reports/tests/test
There were failing tests. See the report at: file:///Users/davidarthur/Code/Apache/kafka/metadata/build/reports/tests/test/index.html
Copy JUnit XML for metadata to /Users/davidarthur/Code/Apache/kafka/build/junit-xml/metadata
[ant:copy] Copying 105 files to /Users/davidarthur/Code/Apache/kafka/build/junit-xml/metadata

FAILURE: Build failed with an exception.

* Where:
Build file '/Users/davidarthur/Code/Apache/kafka/build.gradle' line: 528

* What went wrong:
Execution failed for task ':metadata:test'.
> Failing this task since 'metadata:test' had test failures.

BUILD FAILED in 14s
31 actionable tasks: 7 executed, 24 up-to-date

Notably, we see the JUnit XML files moved to the right place and the task ending with failure.

If run again with no changes, the outcome is the same. After removing the failing test and re-running the gradlew command, I see the following:

> Task :metadata:test
Finished generating test XML results (0.046 secs) into: /Users/davidarthur/Code/Apache/kafka/metadata/build/test-results/test
Generating HTML test report...
Finished generating test html results (0.03 secs) into: /Users/davidarthur/Code/Apache/kafka/metadata/build/reports/tests/test
Copy JUnit XML for metadata to /Users/davidarthur/Code/Apache/kafka/build/junit-xml/metadata
[ant:copy] Copying 105 files to /Users/davidarthur/Code/Apache/kafka/build/junit-xml/metadata
Stored cache entry for task ':metadata:test' with cache key b2da6d0b8f4fbf99a090c0287dbf2f2c

BUILD SUCCESSFUL in 18s
31 actionable tasks: 6 executed, 25 up-to-date

The tests are all run, Gradle exits with 0, and we see the cache get populated. Now re-running the same gradlew command results in

> Task :metadata:test UP-TO-DATE
Custom actions are attached to task ':metadata:test'.
Build cache key for task ':metadata:test' is b2da6d0b8f4fbf99a090c0287dbf2f2c
Skipping task ':metadata:test' as it is up-to-date.

BUILD SUCCESSFUL in 810ms
31 actionable tasks: 31 up-to-date

No XML files moved, exit code 0.

@chia7712 chia7712 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mumrah thanks for this patch!

Comment thread build.gradle Outdated

// If there were any test failures, we want to fail the task to prevent the failures
// from being cached.
if (ext.hadFailure) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we honor the "ignoreFailures=true" in this condition?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you have fixed it 😄

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 I was thinking the same. I just pushed an update to preserve the existing ignoreFailures behavior

@mumrah mumrah requested a review from chia7712 September 6, 2024 13:40

@chia7712 chia7712 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mumrah thanks for your patch. please take a look at two minor comments

Comment thread build.gradle Outdated
Comment thread build.gradle
@mumrah

mumrah commented Sep 6, 2024

Copy link
Copy Markdown
Member Author

Locally, I added a test that will always be flaky (fail, then pass) in the :metadata module. Here are the results with the latest code 111fcf7.

Before each run, I emptied the build cache. Here is the base command I used:

./gradlew --build-cache --info --no-scan --continue :metadata:clean :metadata:test

Results

env maxTestRetries ignoreFailures gradle exit xml moved cache saved junit.py comment
- 0 false 1 no no - default invocation
- 1 false 0 no yes - -
- 0 true 0 no yes - -
- 1 true 0 no yes - -
GITHUB_ACTIONS 0 false (i) 1 yes no 1 -
GITHUB_ACTIONS 1 false (i) 1 yes no 0 suggested CI invocation (ii)

i) ignoreFailures is overwritten by GITHUB_ACTIONS
ii) relies on "Parse JUnit tests" to fail the overall pipeline

@mumrah mumrah requested a review from chia7712 September 6, 2024 15:25
Comment thread build.gradle

// If there were any test failures, we want to fail the task to prevent the failures
// from being cached.
if (ext.hadFailure) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the retry issue is still existent in normal PR. We can't use the "retry" to convert the result from "failed" to "successful". Maybe we can remove the -PmaxTestRetries=1 -PmaxTestRetryFailures=10 directly? Personally, I'd like to make tests stable instead of retry.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd like to make tests stable instead of retry.

Yes, agreed. 😄 Until we get the tests in better shape, I think the retry thing is okay.

It seems the retry issue is still existent in normal PR

Normal PR will look like:

  • Any failure (flaky or not) will cause Gradle to exit with 1
  • continue-on-error: true will mark that step green (I think... need to verify this)
  • "Parse JUnit tests" will process the results correctly and exit 0 if only flaky, exit 1 if any true failures

So the overall CI workflow will be green if there are only flaky failures. Red if there are any real failures.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the overall CI workflow will be green if there are only flaky failures. Red if there are any real failures.

Got it. Thanks for detailed explanation

@chia7712 chia7712 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mumrah mumrah merged commit 62379d7 into trunk Sep 6, 2024
@mumrah

mumrah commented Sep 6, 2024

Copy link
Copy Markdown
Member Author

@chia7712 I realized I still did not address the timeout case. I'll fix that in a follow-up shortly.

@mumrah mumrah deleted the gh-ignoreFailures branch September 6, 2024 18:47
tedyu pushed a commit to tedyu/kafka that referenced this pull request Jan 6, 2025
The ignoreFailures property was removed in apache#17066 to prevent test failures from being cached. However, this breaks the JUnit report and makes the github workflow less user friendly.

The problem is that we are copying the junit test report files into a new directory (added in apache#17098) in a Gradle doLast closure. If we don't run with ignoreFailures=true, then this closure will not run and the test failures won't be processed by junit.py.

This patch adds logic to ensure the doLast closure of :test is always run. The user provided -PignoreFailures is still honored for the test tasks so local developer workflows should not be disturbed.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build Gradle build or GitHub Actions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants