roachtest: exit with failure on github post errors#149479
Conversation
pkg/cmd/roachtest/test_runner.go
Outdated
| shout(ctx, l, lopt.stdout, passFailLine) | ||
|
|
||
| if r.numClusterErrs > 0 { | ||
| // Prioritize returning GitHub POST errors because we want those to fail the |
There was a problem hiding this comment.
nit: could you explain what you mean by "prioritize returning GitHub POST errors"? It seems like this code is taking both into account equally.
There was a problem hiding this comment.
i was trying to capture the idea that we could technically have any combination of 3 errors now, errSomeClusterProvisioningFailed, errGithubPostFailed, errTestsFailed
and the comment was supposed to help explain why the if else clauses are set up in the order that they are, BUT I just realized that I should probably just return the joined error up to main and let main handle what to do (and it already does)
will make that change rn
There was a problem hiding this comment.
oh i just realized this is what you were getting at in your other comment, but let me know if the way it is now makes more sense @DarrylWong
There was a problem hiding this comment.
Yep, the prioritization comment makes a lot more sense now. Thanks!
pkg/cmd/roachtest/test_runner.go
Outdated
| if r.numClusterErrs > 0 { | ||
| // Prioritize returning GitHub POST errors because we want those to fail the | ||
| // run. | ||
| if r.numClusterErrs > 0 && r.numGithubPostErrs > 0 { |
There was a problem hiding this comment.
nit: seems fine to just log to teamcity twice. e.g.
var err error
if r.numGithubPostErrs > 0 {
shout(ctx, l, lopt.stdout, "%d errors occurred while posting to github", r.numGithubPostErrs)
err = errGithubPostFailed
}
if r.numClusterErrs > 0 {
shout(ctx, l, lopt.stdout, "%d clusters could not be created", r.numClusterErrs)
err = errors.CombineErrors(err, errSomeClusterProvisioningFailed)
}
if err != nil {
return err
}
There was a problem hiding this comment.
i can take out the combined log msg and just do the same one twice in the first if clause i.e. i think that makes it more simple like
if r.numClusterErrs > 0 && r.numGithubPostErrs > 0 {
shout(ctx, l, lopt.stdout, "%d errors occurred while posting to github", r.numGithubPostErrs)
shout(ctx, l, lopt.stdout, "%d clusters could not be created", r.numClusterErrs)
err = ...
} else if ...
but i don't like potentially rewriting err, i think the verbosity is worth for the readability, but i don't feel that strongly about this. also if that's how the switch case logic is done in the rest of the package (like in main) / go in general then down to switch
There was a problem hiding this comment.
ended up rewriting this logic, see this comment #149479 (comment)
65a4385 to
e09507a
Compare
|
As tempting as it is to use the issue this addresses in the title, it should be moved to the body of the PR and on the commit message in the following form: You can update the PR title to be the same as your commit -> |
herkolategan
left a comment
There was a problem hiding this comment.
Nice work! Left a few minor comments, but overall the high level implementation looks good.
e09507a to
3ead89f
Compare
| Suites: registry.Suites(registry.Nightly), | ||
| CockroachBinary: registry.StandardCockroach, | ||
| Run: func(ctx context.Context, t test.Test, c cluster.Cluster) { | ||
| panic("this test should never run") |
There was a problem hiding this comment.
nit: The test runner will recover this panic as a test failure. So you'd have to also add a require.NotErrorIs(t, err, errTestsFailed) check to assert that the test itself didn't fail.
There was a problem hiding this comment.
oops got it will add ty
…b issue creation fails Previously, pipeline would not fail when encountering an issue posting to github, which would cause thsee failures to be "silent". Now pipelines will fail when there's an error posting to github increasing visability. New mechanism is very similar to how provisioning errors are tracked and returned to main. Does not trigger test short circuiting mechanism because testRunner.runWorkers() doesn't return an error Introduces dependency injection for GitHub in Run so GitHub failures can be mocked in testing Resolves cockroachdb#147116 Epic: None Release note: None
3ead89f to
3ac7c44
Compare
|
The TeamCity build has unfortunately been very inconsistent for me and just me, i've just been assuming it's intermittent failures but i'm the only branch that's hitting it. I've been able to get it to pass by just rerunning but gonna take a look to see what's going on, noticing heap memory issues when publishing artifacts but i'm not adding any new artifacts so i dont know why i'd be getting OOMs |
some extra context (https://cockroachlabs.slack.com/archives/CJ0H8Q97C/p1752259095090199) |
|
TFTR! bors r=herkolategan,DarrylWong |
|
Nice work! As a precaution we usually fire up a smoke test using [1] https://cockroachlabs.atlassian.net/wiki/spaces/TE/pages/146538717/Roachtest#Roachtest-OverridingSELECT_PROBABILITY |
During the migration of the github poster initialization in cockroachdb#149479, the default team loader configuration was inadvertently omitted. This change restores the default team loader setting to resolve the issue. Informs: cockroachdb#149479 Epic: None Release note: None
150223: roachtest: fix nil team loader r=williamchoe3,srosenberg a=herkolategan During the migration of the github poster initialization in #149479, the default team loader configuration was inadvertently omitted. This change restores the default team loader setting to resolve the issue. Informs: #149479 Epic: None Release note: None Co-authored-by: Herko Lategan <herko@cockroachlabs.com>
Fixes #147116
Changes
Highlevel Changes
Added a new failure path first by
testRunnerstruct which get's incremented whengithub.MaybePost()(called intestRunner.runWorkers()andtestRunner.runTests())returns an error. When this count > 0,testRunner.Run()will return a new errorerrGithubPostFailedand whenmain()sees that error, it will return a new exit code12which will fail the pipeline (unlike exit code 10, 11)main()testRunner.runWorkers()doesn't return an errorDesign
In order to do verification via unit tests, i'm used to using something like Python's magic mock, but that's not available in GoLang so i opted for a Dependency Injection approach. (This was the best I could come up with, I wanted to avoid "if unit test, do this" logic. If anyone has any other approaches / suggestions let me know!)
I made a new interface
GithubPosterin such a way that the originalgithubIssuesimplements that new interface. I then pass this interface in function signatures all the way fromRun()torunTests(). Then in the unit tests, I could pass a different implementation ofGithubPosterthat has aMaybePost()that always fails.github.goAnother issue with this approach is the original
githubIssueshas information that is cluster specific, but because of dependency injection, it's now a shared struct among all the workers, so it doesn't make sense to store certain fields that are worker dependent.For the fields that are worker specific, I created a new struct
githubIssueInfothat is created inrunWorkers(), similar to howgithubIssuesused to be created there.Note: I don't love the name
githubIssueInfo, but i wanted to stick with a similar naming convention togithubIssues, open to name suggestionsAll this was very verbose and didn't love that i had to change all the function signatures to do this, open to other ways to do verification.
Misc
Also first time writing in Go in like ~3 years very open to general go semantic feedback / best practices / design patterns
Verification
Diff of binary I used to manually confirm if you wanna see where I hardcoded to return errors: 611adcc
Manual Test Logs