Skip to content

[wip] skip tests based on central registry#81293

Closed
tbg wants to merge 1 commit intocockroachdb:masterfrom
tbg:easy-skip
Closed

[wip] skip tests based on central registry#81293
tbg wants to merge 1 commit intocockroachdb:masterfrom
tbg:easy-skip

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented May 16, 2022

This has a big problem: we need each subtest to invoke leaktest.
So this isn't as useful as I thought though it's still a lot better
than skipping manually. But with this as is there's a good chance
someone will use the registry to skip a test but it doesn't actually
skip, and that surely sucks

Release note: None

This has a big problem: we need each subtest to invoke leaktest.
So this isn't as useful as I thought though it's still a lot better
than skipping manually. But with this as is there's a good chance
someone will use the registry to skip a test but it doesn't actually
skip, and that surely sucks

Release note: None
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@srosenberg
Copy link
Copy Markdown
Member

But with this as is there's a good chance someone will use the registry to skip a test but it doesn't actually skip, and that surely sucks.

With a bit of hack we could pass our own predicate used for matching all tests, including subtests,

func TestMain(m *testing.M) {
	testing.RunTests(func(pat, str string) (bool, error) {
		return false, nil
	}, reflect.ValueOf(m).Elem().FieldByName("tests").Interface().([]testing.InternalTest))

	os.Exit(m.Run())
}

The actual predicate could just do exact matches against the registry. The same predicate is propagated into subtests [1].
The underlying issue [2] may get improved via -skip flag, but it's still rather inflexible when compared to a custom predicate.

[1] https://github.com/golang/go/blob/master/src/testing/testing.go#L1831
[2] golang/go#41583

@tbg
Copy link
Copy Markdown
Member Author

tbg commented May 18, 2022

Good sleuthing! Let's investigate this approach.

I also wanted to mention yet another approach - we could rally around TeamCity's mute feature.

Pro

  • it's already there, so there is probably an API endpoint, no poking into Go internals, etc
  • no need to go through CI (and no extra work to make it happen)
  • probably a decent UX right out of the box
  • runs the tests but without affecting build status, so in particular will continue posting issues (maybe this is actually a con)

Con

  • runs the tests... so if they crash the process, other tests won't run and the build will still fail, so it won't get the job done for all kinds of failures

@srosenberg
Copy link
Copy Markdown
Member

srosenberg commented May 19, 2022

I also wanted to mention yet another approach - we could rally around TeamCity's mute feature.

Indeed, I considered that option too but ruled against it for the same reasons; plus, we don't want tight coupling with TeamCity because some day we just might end up using something else.

I think we can make this work via testing.Main [1]. Note the comment in [1],

It is no longer used by "go test" but preserved, as much as possible, for other systems that simulate "go test" using Main

It invokes M->Run() directly, so it follows the same path unlike my initial attempt via testing.RunTests which skips some initialization steps, e.g., parseCpuList [2], before the test runner loop. I tested the following (rough) snippet in pkg/sql/randgen, and it works as expected,

func TestMain(m *testing.M) {
	rf := reflect.ValueOf(m).Elem().FieldByName("tests")
	tests := reflect.NewAt(rf.Type(), unsafe.Pointer(rf.UnsafeAddr())).Elem().Interface().([]testing.InternalTest)
	rf = reflect.ValueOf(m).Elem().FieldByName("benchmarks")
	benchmarks := reflect.NewAt(rf.Type(), unsafe.Pointer(rf.UnsafeAddr())).Elem().Interface().([]testing.InternalBenchmark)
	rf = reflect.ValueOf(m).Elem().FieldByName("examples")
	examples := reflect.NewAt(rf.Type(), unsafe.Pointer(rf.UnsafeAddr())).Elem().Interface().([]testing.InternalExample)

	testing.Main(func(_, testName string) (bool, error) {
                 // Insert registry lookup to skip the test if it's known to be flaky.
		return true, nil
	}, tests, benchmarks, examples)

I will do a more exhaustive test tomorrow. Note that we can auto-generate TestMain for all packages, so this approach seems more than promising.

[1] https://github.com/golang/go/blob/master/src/testing/testing.go#L1603
[2] https://github.com/golang/go/blob/master/src/testing/testing.go#L2048

@tbg
Copy link
Copy Markdown
Member Author

tbg commented May 19, 2022

Nice! This should really do it. We could even call MainStart directly (by cribbing the testDeps shim in the first arg which does the matching), though Main is unlikely to go away and if it does, we can still switch over.

Have you thought where to source the list of skipped tests from? Avoiding the CI round-trip is helpful, and yet this stuff needs to be maintained by branch so it's difficult to do it with an env var in TC. Besides, we ultimately want an API endpoint to interact with the skipped tests. And, also, we do want the tests to be skipped in local development! And probably we also want to avoid building too much of "our own".

Probably at first env vars are still the way to go to get off the ground ASAP, with one per branch. For example, on the Cockroach project, we'd have an env var

image

We would probably pivot on the format a few times (as we find a need to maybe only skip tests for purpose of CI, but not for purpose of nightlies, etc, and maybe we'd get in the habit of putting an expiration date for the "mute" on each skip to prevent rot, who knows).

The next question is what processes we need around this new way to skip tests, to make sure that it is discoverable and that folks across the org have a good view of which tests are skipped. Maybe rather than the env var, we should right away put the skiplist into something like a gist (but with better access control...), something that folks can easily access and that can be edited (while being versioned). (If that external source is down, we'd just not skip tests, this is the best we can do). Then we can have tools in the main repo that pull the current list to a file in the repo (which isn't checked in) for local development in case it's needed, and we can have tools like skip-test perform the edit automatically.

plus, we don't want tight coupling with TeamCity because some day we just might end up using something else.

I'm not sure I'm too worried about that. If TC gets the job done, and we are principled about interacting with it using clean APIs, it wouldn't be too hard to switch it out. But I think it will be hard to use anyway.

@srosenberg
Copy link
Copy Markdown
Member

Have you thought where to source the list of skipped tests from? Avoiding the CI round-trip is helpful, and yet this stuff needs to be maintained by branch so it's difficult to do it with an env var in TC. Besides, we ultimately want an API endpoint to interact with the skipped tests. And, also, we do want the tests to be skipped in local development! And probably we also want to avoid building too much of "our own".

What about a github repo? We already have the tooling to pull/push. We could either inline the branch into the file format or use an actual branch with the same name; I'd prefer the latter. This way we have the full revision history of a skiplist per branch.

I'm not sure I'm too worried about that. If TC gets the job done, and we are principled about interacting with it using clean APIs, it wouldn't be too hard to switch it out. But I think it will be hard to use anyway.

Right, the coupling with TC is not a deal breaker. But the alternative seems more flexible and portable.

@tbg
Copy link
Copy Markdown
Member Author

tbg commented May 23, 2022

What about a github repo? We already have the tooling to pull/push. We could either inline the branch into the file format or use an actual branch with the same name; I'd prefer the latter. This way we have the full revision history of a skiplist per branch.

Yeah, that seems nice. Agree re: using branch names for the skiplists. We will need to create release branches on that repo as well then, so it does create a little bit of extra work. We could entertain using the main repo for this instead (i.e. go back to putting this stuff in the actual release branches, next to code) and suitably speed up CI for them (bypass bors and don't run general tests, just test the skip file in ~seconds). One interface could be blathers skip foo/bar:TestBaz which does it all for you, including the merge (and/or an improved skip-test that does the same).

srosenberg added a commit to srosenberg/cockroach that referenced this pull request Aug 25, 2022
When a test fails, --flaky_test_attempts=3 will cause up to two retry
attempts by executing the corresponding test binary (shard).
If a test passes on either attempt, it is reported 'FLAKY' in the
test result summary. In addition, each attempt yields an (XML) log
under 'test_attempts' subdirectory. The attempts are now copied
into the artifactsDir.

Note, while the test result summary may say the test is 'FLAKY', it's
reported as 'PASS' as per main test.xml. Thus, this change doesn't
change the status of CI, semantically; i.e., a test succeeding after up to two
retry attempts is assumed to have _passed_ whereas three consecutive
failures result in a failed test. However, the two additional attempts
per test binary (shard) could slow down the build. Hence, initially the
plan is to enable only on release branches and not staging (i.e., bors).

This change supports [1]. In a follow-up PR, a test reporter could be
augmented to parse attempt logs thereby obtaining a set of FLAKY tests
which should be skipped on subsequent builds.

[1] cockroachdb#81293

Release justification: CI improvement
Release note: None
@tbg tbg closed this Sep 20, 2022
srosenberg added a commit to srosenberg/cockroach that referenced this pull request Dec 16, 2022
When a test fails, --flaky_test_attempts=3 will cause up to two retry
attempts by executing the corresponding test binary (shard).
If a test passes on either attempt, it is reported 'FLAKY' in the
test result summary. In addition, each attempt yields an (XML) log
under 'test_attempts' subdirectory. The attempts are now copied
into the artifactsDir.

Note, while the test result summary may say the test is 'FLAKY', it's
reported as 'PASS' as per main test.xml. Thus, a 'FLAKY' test will
not fail the CI; i.e., a test succeeding after up to two
retries is assumed to have _passed_ whereas three consecutive
failures result in a failed test. Consequently, CI now has a higher
probability of passing than before this change. However, the two additional
attempts per test binary (shard) could slow down the build. Hence, initially
the plan is to enable only on release branches and not staging (i.e., bors).

This change supports [1]. In a follow-up PR, a test reporter could be
augmented to parse attempt logs thereby obtaining a set of FLAKY tests
which should be skipped on subsequent builds.

[1] cockroachdb#81293

Release justification: CI improvement
Release note: None
srosenberg added a commit to srosenberg/cockroach that referenced this pull request Jan 4, 2023
When a test fails, --flaky_test_attempts=3 will cause up to two retry
attempts by executing the corresponding test binary (shard).
If a test passes on either attempt, it is reported 'FLAKY' in the
test result summary. In addition, each attempt yields an (XML) log
under 'test_attempts' subdirectory. The attempts are now copied
into the artifactsDir.

Note, while the test result summary may say the test is 'FLAKY', it's
reported as 'PASS' as per main test.xml. Thus, a 'FLAKY' test will
not fail the CI; i.e., a test succeeding after up to two
retries is assumed to have _passed_ whereas three consecutive
failures result in a failed test. Consequently, CI now has a higher
probability of passing than before this change. However, the two additional
attempts per test binary (shard) could slow down the build. Hence, initially
the plan is to enable only on release branches and not staging (i.e., bors).

This change supports [1]. In a follow-up PR, a test reporter could be
augmented to parse attempt logs thereby obtaining a set of FLAKY tests
which should be skipped on subsequent builds.

[1] cockroachdb#81293

Release justification: CI improvement
Release note: None
craig bot pushed a commit that referenced this pull request Jan 4, 2023
86891: bazci: enable --flaky_test_attempts=3 for Unit tests on release branches r=rickystewart,healthy-pod,renatolabs a=srosenberg

When a test fails, --flaky_test_attempts=3 will cause up to two retry
attempts by executing the corresponding test binary (shard).
If a test passes on either attempt, it is reported 'FLAKY' in the
test result summary. In addition, each attempt yields an (XML) log
under 'test_attempts' subdirectory. The attempts are now copied
into the artifactsDir.

Note, while the test result summary may say the test is 'FLAKY', it's
reported as 'PASS' as per main test.xml. Thus, a 'FLAKY' test will
not fail the CI; i.e., a test succeeding after up to two
retries is assumed to have _passed_ whereas three consecutive
failures result in a failed test. Consequently, CI now has a higher
probability of passing than before this change. However, the two additional
attempts per test binary (shard) could slow down the build. Hence, initially
the plan is to enable only on release branches and not staging (i.e., bors).

This change supports [1]. In a follow-up PR, a test reporter could be
augmented to parse attempt logs thereby obtaining a set of FLAKY tests
which should be skipped on subsequent builds.

[1] #81293

Release justification: CI improvement
Release note: None
Epic: None

92955: opt: inline UDFs as subqueries r=mgartner a=mgartner

UDFs are now inlined as subqueries by a normalization rule when possible, speeding up their evaluation.

Epic: CRDB-20370

Release note (performance improvement): Some types of user-defined functions are now inlined in query plans as relation expressions, which speeds up their evaluation. UDFs must be non-volatile and have a single statement in the function body to be inlined.

94660: colserde: perform memory accounting for scratch slices in the converter r=yuzefovich a=yuzefovich

This commit introduces the memory accounting for values and offsets
scratch slices that are being reused across `BatchToArrow` calls since
recently. This required a lot of plumbing to propagate the memory
account from all places. The converter is careful to only grow and
shrink the account by its own usage, so the same memory account can be
reused across multiple converters (as long as there is no concurrency
going on, and we only can have concurrency for hash router outputs, so
there we give each output a separate account).

An additional minor change is that now in `diskQueue.Enqueue` we
properly `Close` the `FileSerializer` before `nil`-ing it out. This
isn't a problem per se since it is the caller's responsibility to close
the account used by the serializer, but it's nice to properly release
the accounted for bytes.

Epic: None

Release note: None

94702: workload: fix non-determinism in TPC-H data generation r=rytaft a=rytaft

This commit fixes the non-determinism in the TPC-H data generation by using a local slice for the random permutation of indexes into `randPartNames`. It also fixes the permutation algorithm to use a modified Fisher–Yates shuffle.

Fixes #93958

Release note: None

94713: sql: add user-facing error for invalid job type in job_payload_type builtin r=jayshrivastava a=jayshrivastava

### sql: add user-facing error for invalid job type in job_payload_type builtin

Previously, the `job_payload_type` builtin would return an internal error
if the payload could be unmarshalled, but the type could not be determined.
This changes ensures the builtin returns an appropriate user-facing error
in this case.

Epic: none
Fixes: #94680

Release note: None

94723: kvserver: check lease rebalance target exists in cached storelist r=erikgrinaker a=kvoli

From #91593 it was possible for a new lease target to have its store descriptor dereferenced despite not existing in a list of candidate stores earlier.

This patch resolves the dereference.

fixes: #94688

Release note: None

Co-authored-by: Stan Rosenberg <stan@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants