KAFKA-17479 Allow ":jar" tasks to be cached [1/n]#17066
Conversation
|
I am going to verify this fix over at mumrah/kafka |
|
In this build we can see that all the tasks, including tests, are FROM-CACHE https://github.com/mumrah/kafka/actions/runs/10648128235/job/29516852943 However, the test results were included in the cache, which is kind of questionable. |
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This will prevent these reports from being cached.
| def __repr__(self): | ||
| return f"{self.class_name} {self.test_name}" |
There was a problem hiding this comment.
Unrelated to this PR, but was causing issues in some cases where a test stacktrace was really long
|
This has been fixed and verified over at https://github.com/mumrah/kafka/actions/runs/10693770259. We'll merge it in two parts. The first part is this PR which fixes the ":jar" task and makes trunk use the |
| run: | | ||
| ./gradlew --build-cache --scan --continue \ | ||
| -PtestLoggingEvents=started,passed,skipped,failed \ | ||
| -PignoreFailures=true -PmaxParallelForks=2 \ |
There was a problem hiding this comment.
-PignoreFailures=true causes Gradle to consider the build successful, even if a test task fails. This means that failing tests would be cached, which we don't want.
| inputs.property "commitId", commitId | ||
| inputs.property "version", version | ||
| outputs.file receiptFile | ||
| outputs.cacheIf { true } |
There was a problem hiding this comment.
That means the jars could have old kafka-version.properties when its module has no change. This change is for CI, right? we want to cache the jar if no source files change. Maybe we can enable the cacheIf when the env GITHUB_ACTIONS exists. Otherwise, we keep origin behavior. WDYT?
There was a problem hiding this comment.
outputs.cacheIf is one way to tell Gradle that a task is cacheable https://docs.gradle.org/current/userguide/build_cache.html#using_the_runtime_api.
Even with this declaration, the task will only be up-to-date / cacheable if the inputs are unchanged.
Taking the previous task definition:
task createVersionFile() {
inputs.property "commitId", commitId
inputs.property "version", version
outputs.file receiptFile
For incremental builds, this task will be considered UP-TO-DATE as long as the two inputs don't change. In order to actually allow this output to be cached, we need to tell gradle that it's cacheable. That's done by adding the @CacheableTask for custom task classes, or by adding outputs.cacheIf { true }
Edit: sorry, I misinterpreted your question I think. I'll leave the original comment here anyways.
That means the jars could have old kafka-version.properties when its module has no change
The contents of kafka-version.properties will be the same because we're overriding the commit ID with -PcommitId=xxxxxxxxxxxxxxxx.
| -PignoreFailures=true -PmaxParallelForks=2 \ | ||
| -PmaxParallelForks=2 \ | ||
| -PmaxTestRetries=1 -PmaxTestRetryFailures=10 \ | ||
| -PcommitId=xxxxxxxxxxxxxxxx \ |
There was a problem hiding this comment.
No, this is needed for the CI to avoid breaking the cache for the :jar and other downstream tasks. Otherwise, we could never cache things like kafka-clients.jar because its contents would change on every commit, regardless of changes to the client source files.
We really only need this kafka-version.properties file to include an actual Git SHA for release artifacts.
The ignoreFailures property was removed in #17066 to prevent test failures from being cached. However, this breaks the JUnit report and makes the github workflow less user friendly. The problem is that we are copying the junit test report files into a new directory (added in #17098) in a Gradle doLast closure. If we don't run with ignoreFailures=true, then this closure will not run and the test failures won't be processed by junit.py. This patch adds logic to ensure the doLast closure of :test is always run. The user provided -PignoreFailures is still honored for the test tasks so local developer workflows should not be disturbed. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
For several modules, we include a kafka-version.properties in the Jar file. This file includes the Git SHA of the project at the time of the build. This means that even if no source files change, the :jar task will never be UP-TO-DATE between two git commits. Ultimately, this breaks Gradle caching. This patch marks all of the createVersionFile tasks as cacheable and also changes our Gradle invocation to override the commit ID to a dummy static value. This will allow the :jar task to be cacheable and reusable between builds. This patch also configures the trunk build to only write to the build cache and not read from it. This will prevent any cache pollution/corruption from propagating from build to build. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The ignoreFailures property was removed in apache#17066 to prevent test failures from being cached. However, this breaks the JUnit report and makes the github workflow less user friendly. The problem is that we are copying the junit test report files into a new directory (added in apache#17098) in a Gradle doLast closure. If we don't run with ignoreFailures=true, then this closure will not run and the test failures won't be processed by junit.py. This patch adds logic to ensure the doLast closure of :test is always run. The user provided -PignoreFailures is still honored for the test tasks so local developer workflows should not be disturbed. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
For several modules, we include a
kafka-version.propertiesin the Jar file. This file includes the Git SHA of the project at the time of the build. This means that even if no source files change, the:jartask will never be UP-TO-DATE between two git commits. Ultimately, this breaks Gradle caching.This patch marks all of the
createVersionFiletasks as cacheable and also changes our Gradle invocation to override the commit ID to a dummy static value. This will allow the:jartask to be cacheable and reusable between builds.This patch also configures the trunk build to only write to the build cache and not read from it. This will prevent any cache pollution/corruption from propagating from build to build.