github-post: add timeout handling to the stress issue poster#29987
github-post: add timeout handling to the stress issue poster#29987craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
petermattis
left a comment
There was a problem hiding this comment.
Thanks for tackling this! I gave the changes in listFailures a quick review. Tons of logic in that method now. I wonder if it should be broken into smaller pieces. Regardless, the test case additions give me some confidence in this PR.
Reviewable status:
complete! 0 of 0 LGTMs obtained
build/teamcity-stress.sh, line 38 at r1 (raw file):
make stress PKG="$PKG" TESTTIMEOUT=40m GOFLAGS="$GOFLAGS" TAGS="$TAGS" STRESSFLAGS="-maxruns 100 -maxfails 1 -stderr $STRESSFLAGS" \ | tee artifacts/stress.log \ | go tool test2json -t | github-post
@benesch or someone else who is knowledgable about shell pipelines will have to review this.
pkg/cmd/github-post/main.go, line 97 at r1 (raw file):
// tests. Those tests will remain "outstanding" and will be ignored for the // purpose of issue reporting. outstendingOutput := make(map[string][]testEvent)
s/outstending/outstanding/g
pkg/cmd/github-post/main.go, line 100 at r1 (raw file):
failures := make(map[string][]testEvent) slowPassingTests := make([]testEvent, 0, 1000) slowFailingTests := make([]testEvent, 0, 1000)
I doubt there is a measurable effect for pre-allocating these slices. In non-performance sensitive code, I'd go with something more like:
var slowPassingTests []testEvent
var slowFailingTests []testEvent
pkg/cmd/github-post/main.go, line 304 at r1 (raw file):
} // Sort slow test descendingly by duration.
s/test/tests/g
pkg/cmd/github-post/main.go, line 393 at r1 (raw file):
if _, err := f.WriteString(report); err != nil { return err }
The above can be written as:
if err := ioutil.WriteFile("artifacts/slow-tests-report.txt", []byte(report), 0644); err != nil {
return err
}
benesch
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained
build/teamcity-stress.sh, line 38 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
@benesch or someone else who is knowledgable about shell pipelines will have to review this.
LGTM but can you stick a 2>&1 onto the make stress invocation? We should have had that there all along, I think.
Add explicit support to the issue poster for timeouts: - on all runs, publish an artifacts file with a list of slow tests - when timeouts happen, distinguish between the case where the test currently running at the timeout point is the culprit (i.e. if it is the longest running test) versus situations where that test is just an innocent bystender This patch also spruces up the github-post script in various ways. Among them there's now better support for running it on an input that comes directly from a test run, and not from a stress wrapper. Release note: None
e73e5c9 to
da4fd85
Compare
andreimatei
left a comment
There was a problem hiding this comment.
well, I have tried to leave the place cleaner than I found it... I'm not keen to attempt more refactoring now cause I'm not entirely sure what I would do :)
Reviewable status:
complete! 0 of 0 LGTMs obtained
build/teamcity-stress.sh, line 38 at r1 (raw file):
Previously, benesch (Nikhil Benesch) wrote…
LGTM but can you stick a
2>&1onto themake stressinvocation? We should have had that there all along, I think.
done
pkg/cmd/github-post/main.go, line 97 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
s/outstending/outstanding/g
Done.
pkg/cmd/github-post/main.go, line 100 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I doubt there is a measurable effect for pre-allocating these slices. In non-performance sensitive code, I'd go with something more like:
var slowPassingTests []testEvent var slowFailingTests []testEvent
Done.
pkg/cmd/github-post/main.go, line 304 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
s/test/tests/g
Done.
pkg/cmd/github-post/main.go, line 393 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
The above can be written as:
if err := ioutil.WriteFile("artifacts/slow-tests-report.txt", []byte(report), 0644); err != nil { return err }
Done.
petermattis
left a comment
There was a problem hiding this comment.
well, I have tried to leave the place cleaner than I found it... I'm not keen to attempt more refactoring now cause I'm not entirely sure what I would do :)
Ack. You should probably prepare yourself for a bit of inevitable fallout. I find that occurs whenever these helper programs are changed.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
benesch
left a comment
There was a problem hiding this comment.
The changes to the shell script . Deferring to Peter's LGTM on the rest.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale)
andreimatei
left a comment
There was a problem hiding this comment.
TFTRs
Btw, the real thing to do, in my opinion, is not so much refactoring the listFailures method more, but considering options for not relying on a pipeline of less-than-ideal tools: stress -> go test runner -> test2son -> github-post. The combination of the first 3 produces input that's really hard to operate on, so whatever you do on top is probably going to ugly.
bors r+
Reviewable status:
complete! 2 of 0 LGTMs obtained
16354: storage: Delay initial metrics until system config is ready r=bdarnell a=bdarnell This was previously done too soon at startup and would clutter the logs (mainly in tests) with "failed initial metrics computation". Fixes #13560 29817: client, roachpb: eliminate log spam when loadgens are killed r=andreimatei a=andreimatei When the TPCC loadgen is CTRL-C'ed, the logs are spammed with: I180906 22:02:41.239771 27177127 internal/client/txn.go:625 [n1] async rollback failed: TransactionStatusError: already committed (REASON_UNKNOWN): "sql txn" id=57c518aa key=/Table/61/1/1224/0 rw=true pri=0.03781170 iso=SERIALIZABLE stat=COMMITTED epo=0 ts=1536271361.068457287,0 orig=1536271361.068457287,0 max=1536271361.073978228,0 wto=false rop=false seq=11 int=5 The "async rollback" part refers to the rollback being done with a canceled ctx (presumably a dropped connection's ctx). I believe the error happens because there's a commit in flight when the ctx is canceled. This patch lowers the message's level for this case. Release note: None 29987: github-post: add timeout handling to the stress issue poster r=andreimatei a=andreimatei Add explicit support to the issue poster for timeouts: - on all runs, publish an artifacts file with a list of slow tests - when timeouts happen, distinguish between the case where the test currently running at the timeout point is the culprit (i.e. if it is the longest running test) versus situations where that test is just an innocent bystender This patch also spruces up the github-post script in various ways. Among them there's now better support for running it on an input that comes directly from a test run, and not from a stress wrapper. Release note: None 30006: opt: fix panic during SELECT MIN(NULL) r=rytaft a=arjunravinarayan Fixes #29695. Release note (bug fix): fix a crash when SELECT MIN(NULL) was run with the optimizer enabled. 30008: gossip: allow receipt of "loopback infos" r=bdarnell a=petermattis Receipt of loopback infos was disabled in #29398, but doing so had the unfortunate effect of allowing gossip state to temporarily diverge between nodes. Rather than disallowing loopback infos, we now ratchet the gossip monotonic clock in order to avoid the assertion around the gossip highwater timestamps. Fixes #29992 Fixes #20986 Release note: None Co-authored-by: Ben Darnell <ben@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Arjun Narayan <arjun@cockroachlabs.com> Co-authored-by: Peter Mattis <petermattis@gmail.com>
Build succeeded |
Add explicit support to the issue poster for timeouts:
currently running at the timeout point is the culprit (i.e. if it is the
longest running test) versus situations where that test is just an
innocent bystender
This patch also spruces up the github-post script in various ways. Among
them there's now better support for running it on an input that comes
directly from a test run, and not from a stress wrapper.
Release note: None