github-post: add timeout handling to the stress issue poster by andreimatei · Pull Request #29987 · cockroachdb/cockroach

andreimatei · 2018-09-09T16:58:13Z

Add explicit support to the issue poster for timeouts:

on all runs, publish an artifacts file with a list of slow tests
when timeouts happen, distinguish between the case where the test
currently running at the timeout point is the culprit (i.e. if it is the
longest running test) versus situations where that test is just an
innocent bystender

This patch also spruces up the github-post script in various ways. Among
them there's now better support for running it on an input that comes
directly from a test run, and not from a stress wrapper.

Release note: None

cockroach-teamcity · 2018-09-09T16:58:18Z

This change is

petermattis

Thanks for tackling this! I gave the changes in listFailures a quick review. Tons of logic in that method now. I wonder if it should be broken into smaller pieces. Regardless, the test case additions give me some confidence in this PR.

Reviewable status: complete! 0 of 0 LGTMs obtained

build/teamcity-stress.sh, line 38 at r1 (raw file):

make stress PKG="$PKG" TESTTIMEOUT=40m GOFLAGS="$GOFLAGS" TAGS="$TAGS" STRESSFLAGS="-maxruns 100 -maxfails 1 -stderr $STRESSFLAGS" \
  | tee artifacts/stress.log \
  | go tool test2json -t | github-post

@benesch or someone else who is knowledgable about shell pipelines will have to review this.

pkg/cmd/github-post/main.go, line 97 at r1 (raw file):

	// tests. Those tests will remain "outstanding" and will be ignored for the
	// purpose of issue reporting.
	outstendingOutput := make(map[string][]testEvent)

s/outstending/outstanding/g

pkg/cmd/github-post/main.go, line 100 at r1 (raw file):

	failures := make(map[string][]testEvent)
	slowPassingTests := make([]testEvent, 0, 1000)
	slowFailingTests := make([]testEvent, 0, 1000)

I doubt there is a measurable effect for pre-allocating these slices. In non-performance sensitive code, I'd go with something more like:

var slowPassingTests []testEvent
var slowFailingTests []testEvent

pkg/cmd/github-post/main.go, line 304 at r1 (raw file):

	}

	// Sort slow test descendingly by duration.

s/test/tests/g

pkg/cmd/github-post/main.go, line 393 at r1 (raw file):

	if _, err := f.WriteString(report); err != nil {
		return err
	}

The above can be written as:

if err := ioutil.WriteFile("artifacts/slow-tests-report.txt", []byte(report), 0644); err != nil {
  return err
}

benesch

Reviewable status: complete! 0 of 0 LGTMs obtained

build/teamcity-stress.sh, line 38 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

@benesch or someone else who is knowledgable about shell pipelines will have to review this.

LGTM but can you stick a 2>&1 onto the make stress invocation? We should have had that there all along, I think.

Add explicit support to the issue poster for timeouts: - on all runs, publish an artifacts file with a list of slow tests - when timeouts happen, distinguish between the case where the test currently running at the timeout point is the culprit (i.e. if it is the longest running test) versus situations where that test is just an innocent bystender This patch also spruces up the github-post script in various ways. Among them there's now better support for running it on an input that comes directly from a test run, and not from a stress wrapper. Release note: None

andreimatei

well, I have tried to leave the place cleaner than I found it... I'm not keen to attempt more refactoring now cause I'm not entirely sure what I would do :)

Reviewable status: complete! 0 of 0 LGTMs obtained

build/teamcity-stress.sh, line 38 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

LGTM but can you stick a 2>&1 onto the make stress invocation? We should have had that there all along, I think.

done

pkg/cmd/github-post/main.go, line 97 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/outstending/outstanding/g

Done.

pkg/cmd/github-post/main.go, line 100 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I doubt there is a measurable effect for pre-allocating these slices. In non-performance sensitive code, I'd go with something more like:
var slowPassingTests []testEvent
var slowFailingTests []testEvent

Done.

pkg/cmd/github-post/main.go, line 304 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/test/tests/g

Done.

pkg/cmd/github-post/main.go, line 393 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The above can be written as:

if err := ioutil.WriteFile("artifacts/slow-tests-report.txt", []byte(report), 0644); err != nil {
  return err
}

Done.

petermattis

well, I have tried to leave the place cleaner than I found it... I'm not keen to attempt more refactoring now cause I'm not entirely sure what I would do :)

Ack. You should probably prepare yourself for a bit of inevitable fallout. I find that occurs whenever these helper programs are changed.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

benesch

The changes to the shell script . Deferring to Peter's LGTM on the rest.

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale)

andreimatei

TFTRs
Btw, the real thing to do, in my opinion, is not so much refactoring the listFailures method more, but considering options for not relying on a pipeline of less-than-ideal tools: stress -> go test runner -> test2son -> github-post. The combination of the first 3 produces input that's really hard to operate on, so whatever you do on top is probably going to ugly.

bors r+

Reviewable status: complete! 2 of 0 LGTMs obtained

16354: storage: Delay initial metrics until system config is ready r=bdarnell a=bdarnell This was previously done too soon at startup and would clutter the logs (mainly in tests) with "failed initial metrics computation". Fixes #13560 29817: client, roachpb: eliminate log spam when loadgens are killed r=andreimatei a=andreimatei When the TPCC loadgen is CTRL-C'ed, the logs are spammed with: I180906 22:02:41.239771 27177127 internal/client/txn.go:625 [n1] async rollback failed: TransactionStatusError: already committed (REASON_UNKNOWN): "sql txn" id=57c518aa key=/Table/61/1/1224/0 rw=true pri=0.03781170 iso=SERIALIZABLE stat=COMMITTED epo=0 ts=1536271361.068457287,0 orig=1536271361.068457287,0 max=1536271361.073978228,0 wto=false rop=false seq=11 int=5 The "async rollback" part refers to the rollback being done with a canceled ctx (presumably a dropped connection's ctx). I believe the error happens because there's a commit in flight when the ctx is canceled. This patch lowers the message's level for this case. Release note: None 29987: github-post: add timeout handling to the stress issue poster r=andreimatei a=andreimatei Add explicit support to the issue poster for timeouts: - on all runs, publish an artifacts file with a list of slow tests - when timeouts happen, distinguish between the case where the test currently running at the timeout point is the culprit (i.e. if it is the longest running test) versus situations where that test is just an innocent bystender This patch also spruces up the github-post script in various ways. Among them there's now better support for running it on an input that comes directly from a test run, and not from a stress wrapper. Release note: None 30006: opt: fix panic during SELECT MIN(NULL) r=rytaft a=arjunravinarayan Fixes #29695. Release note (bug fix): fix a crash when SELECT MIN(NULL) was run with the optimizer enabled. 30008: gossip: allow receipt of "loopback infos" r=bdarnell a=petermattis Receipt of loopback infos was disabled in #29398, but doing so had the unfortunate effect of allowing gossip state to temporarily diverge between nodes. Rather than disallowing loopback infos, we now ratchet the gossip monotonic clock in order to avoid the assertion around the gossip highwater timestamps. Fixes #29992 Fixes #20986 Release note: None Co-authored-by: Ben Darnell <ben@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Arjun Narayan <arjun@cockroachlabs.com> Co-authored-by: Peter Mattis <petermattis@gmail.com>

craig · 2018-09-10T18:41:14Z

Build succeeded

GitHub CI (Cockroach)

andreimatei assigned benesch, tbg and petermattis Sep 9, 2018

andreimatei requested a review from a team September 9, 2018 16:58

petermattis reviewed Sep 9, 2018

View reviewed changes

benesch reviewed Sep 10, 2018

View reviewed changes

andreimatei force-pushed the slow-tests branch from e73e5c9 to da4fd85 Compare September 10, 2018 16:03

andreimatei commented Sep 10, 2018

View reviewed changes

petermattis approved these changes Sep 10, 2018

View reviewed changes

benesch approved these changes Sep 10, 2018

View reviewed changes

andreimatei commented Sep 10, 2018

View reviewed changes

craig bot merged commit da4fd85 into cockroachdb:master Sep 10, 2018

andreimatei deleted the slow-tests branch September 10, 2018 19:49

andreimatei mentioned this pull request Sep 10, 2018

release-2.1: teamcity stress improvements #30034

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

github-post: add timeout handling to the stress issue poster#29987

github-post: add timeout handling to the stress issue poster#29987
craig[bot] merged 1 commit intocockroachdb:masterfrom
andreimatei:slow-tests

andreimatei commented Sep 9, 2018

Uh oh!

cockroach-teamcity commented Sep 9, 2018

Uh oh!

petermattis left a comment

Uh oh!

benesch left a comment

Uh oh!

andreimatei left a comment

Uh oh!

petermattis left a comment

Uh oh!

benesch left a comment

Uh oh!

andreimatei left a comment

Uh oh!

craig bot commented Sep 10, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

andreimatei commented Sep 9, 2018

Uh oh!

cockroach-teamcity commented Sep 9, 2018

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

benesch left a comment

Choose a reason for hiding this comment

Uh oh!

andreimatei left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

benesch left a comment

Choose a reason for hiding this comment

Uh oh!

andreimatei left a comment

Choose a reason for hiding this comment

Uh oh!

craig bot commented Sep 10, 2018

Build succeeded

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants