bazci: enable --flaky_test_attempts=3 for Unit tests on release branches#86891
Conversation
|
The potential slow down may not be as bad as it seems because (a) we run a lot of sharded tests, (b) many tests are IO-bound. Thus, some of the retry attempts would be concurrent. E.g., a test emulated to flake with p=0.5 results in a very modest slow down, |
ebc557b to
9d2b3a5
Compare
Not sure I'm missing what you were trying to say here, but I would believe this change does change the status of CI, in the sense that now they'll start "passing" a lot more frequently, giving the impression that things suddenly got a lot more stable (I believe that majority of our flakes have a flake rate < 33%, so chances are pretty high that at least one attempt will succeed). |
Good point! and poor phrasing on my part; amending. |
9d2b3a5 to
93a206d
Compare
rickystewart
left a comment
There was a problem hiding this comment.
After c8d66a9c334dd87acd62190f314a4f3ec236ce82 the changes to bazci are unnecessary. Also I'll be deleting the bazci testdata in #87116. So you can rebase and just keep the change to build/teamcity/cockroach/ci/tests/unit_tests_impl.sh.
93a206d to
e234e32
Compare
|
@rickystewart @renatolabs PTAL. Ricky's PR to leverage BEP did the heavy lifting. So, the only change we're making at this time is adding Testing this locally I can see the retry attempt while the test passed. |
renatolabs
left a comment
There was a problem hiding this comment.
LGTM.
After this is merged, is there any automated way to find out how often tests are retried?
|
I'd also mention in the commit message "on master and release branches". |
We can use data from beaver hub to create something in Snowflake. This should do it: |
rickystewart
left a comment
There was a problem hiding this comment.
I have not validated that bazci works the way you describe, but I assume that your description is correct. Assuming that in each test.xml a failed test attempt followed by a success is not labeled as a failure, then this should be fine. AFAIC we can check this in and then keep an eye on CI results to make sure nothing looks wrong.
One question: with this change, flaky tests may become "invisible", i.e. we'll stop seeing failures for them, no GitHub issues will be posted, etc. Is this desirable?
When a test fails, --flaky_test_attempts=3 will cause up to two retry attempts by executing the corresponding test binary (shard). If a test passes on either attempt, it is reported 'FLAKY' in the test result summary. In addition, each attempt yields an (XML) log under 'test_attempts' subdirectory. The attempts are now copied into the artifactsDir. Note, while the test result summary may say the test is 'FLAKY', it's reported as 'PASS' as per main test.xml. Thus, a 'FLAKY' test will not fail the CI; i.e., a test succeeding after up to two retries is assumed to have _passed_ whereas three consecutive failures result in a failed test. Consequently, CI now has a higher probability of passing than before this change. However, the two additional attempts per test binary (shard) could slow down the build. Hence, initially the plan is to enable only on release branches and not staging (i.e., bors). This change supports [1]. In a follow-up PR, a test reporter could be augmented to parse attempt logs thereby obtaining a set of FLAKY tests which should be skipped on subsequent builds. [1] cockroachdb#81293 Release justification: CI improvement Release note: None
e234e32 to
c85a03f
Compare
|
TFTR! bors r=rickystewart,healthy-pod,renatolabs |
|
Build failed (retrying...): |
|
Build succeeded: |
When a test fails, --flaky_test_attempts=3 will cause up to two retry
attempts by executing the corresponding test binary (shard).
If a test passes on either attempt, it is reported 'FLAKY' in the
test result summary. In addition, each attempt yields an (XML) log
under 'test_attempts' subdirectory. The attempts are now copied
into the artifactsDir.
Note, while the test result summary may say the test is 'FLAKY', it's
reported as 'PASS' as per main test.xml. Thus, a 'FLAKY' test will
not fail the CI; i.e., a test succeeding after up to two
retries is assumed to have passed whereas three consecutive
failures result in a failed test. Consequently, CI now has a higher
probability of passing than before this change. However, the two additional
attempts per test binary (shard) could slow down the build. Hence, initially
the plan is to enable only on release branches and not staging (i.e., bors).
This change supports [1]. In a follow-up PR, a test reporter could be
augmented to parse attempt logs thereby obtaining a set of FLAKY tests
which should be skipped on subsequent builds.
[1] #81293
Release justification: CI improvement
Release note: None
Epic: None