roachtest: acceptance/cli/node-status is flaky #107791
Description
It's failed 17 of the last ~100 runs: https://teamcity.cockroachdb.com/test/-4112874617452834710?currentProjectId=Cockroach_Ci_Tests&expandTestHistoryChartSection=true
Failure is:
```
acceptance/cli/node-status
14:02:57 test_runner.go:956: [w0] --- FAIL: acceptance/cli/node-status (3896.57s)
	(cli.go:91).func3: expected [is_available is_live false false false false false false], but found [] from:
	(test_runner.go:1122).func1: 2 dead node(s) detected
test artifacts and logs in: /artifacts/acceptance/cli/node-status/run_1
```
@srosenberg points out that an even greater problem for our CI pipeline may be that the cleanup logic after this failure takes up to an hour because of missing timeouts:
> looks like `acceptance/cli/node-status` took 1 hour longer than expected… test failed (dunno why yet), but on teardown we try to `FetchTimeseriesData`, which in turn calls `gosql.Open("postgres", dataSourceName)`. it’s unable to connect because nodes are down, so it retries until the timeout for `collectArtifacts` (1hr) expires. The leaked goroutines [1] confirm the story.
It looks like a flake. A few things to resolve:
- why is there no timeout on the `gosql.Open` connection path?
- `postTestAssertions` uses `/health?ready=1`, which times out immediately
- why should `FetchTimeseriesData` be allowed nearly a whole hour?
- why is the local roachtest job allowed to run for more than 1hr?
- lastly, why did the test fail?
Jira issue: CRDB-30197
Epic: CRDB-28893