roachtest: properly fail when uploading binaries fails#41083

Merged
craig[bot] merged 1 commit into cockroachdb:master from nvb:nvanbenschoten/putFailRoachtest
Sep 25, 2019

@nvb
Contributor

@nvb nvb commented Sep 25, 2019

Closes cockroachdb#41016.
Closes cockroachdb#40864.
Closes cockroachdb#40578.

In all of the referenced issues, uploading the cockroach binary failed
(an operation that should be idempotent). We could see this in the
log:
```
05:11:12 test.go:182: test status: uploading binary
05:11:12 cluster.go:315: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod put teamcity-1569301790-07-n4cpu4 /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 ./cockroach
teamcity-1569301790-07-n4cpu4: putting (dist) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 ./cockroach
................
   1: done
   2: ~ scp -r -C -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa -i /root/.ssh/google_compute_engine /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 root@35.222.255.152:./cockroach
Warning: Permanently added '35.222.255.152' (ECDSA) to the list of known hosts.
packet_write_wait: Connection to 35.222.255.152 port 22: Broken pipe
lost connection
: exit status 1
   3: done
   4: done
I190924 05:11:29.022222 1 cluster_synced.go:1088  put /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 failed
```

The test then ignored the failure and carried on, only to get tripped
up when starting cockroach:
```
05:11:34 cluster.go:315: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --racks=1 --args=--locality-advertise-addr=rack=0@35.222.255.152 teamcity-1569301790-07-n4cpu4:2
teamcity-1569301790-07-n4cpu4: starting
0: exit status 255
~ ./cockroach version

github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.getCockroachVersion
	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:88
github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:149
github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1535
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337:
```

The problem was that we were accidentally ignoring the Put error, so the
test carried on after a failed upload and only failed later, in a
confusing way, when starting cockroach. This commit fixes this by
properly handling the Put error. This doesn't fix the referenced issues
entirely, but it gets us a step closer to doing so, so I'm going to use
it as an opportunity to close them.

Release justification: Testing only.

Release note: None
@nvb nvb requested a review from andreimatei September 25, 2019 16:30
@cockroach-teamcity
Member

This change is Reviewable

@andreimatei
Contributor

andreimatei commented Sep 25, 2019 via email

@nvb
Contributor Author

nvb commented Sep 25, 2019

bors r+

craig bot pushed a commit that referenced this pull request Sep 25, 2019
41083: roachtest: properly fail when uploading binaries fails r=nvanbenschoten a=nvanbenschoten

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Contributor

craig bot commented Sep 25, 2019

Build succeeded

@craig craig bot merged commit 1ecaa0f into cockroachdb:master Sep 25, 2019
@nvb nvb deleted the nvanbenschoten/putFailRoachtest branch October 14, 2019 03:11

Development

Successfully merging this pull request may close these issues.

roachtest: acceptance/gossip/locality-address failed
teamcity: failed test: gossip/restart
teamcity: failed test: version-upgrade

3 participants