Skip to content

Find and fix tests that use sleep for synchronization #6858

@creachadair

Description

@creachadair

A number of our tests call time.Sleep during concurrent operations. In some cases this is probably correct, e.g., to wait for an expected timeout to fire. In other cases, however, it is intended as a synchronization mechanism. These latter should be fixed, for a few reasons:

  1. Arbitrary sleeps slow down test runs.
  2. If a test needs to wait for something to happen, production code often will too.
  3. Sleep is not guaranteed to correctly synchronize, leading to flaky tests.

We should audit our use of explicit sleep across all our tests, to see whether there are points of synchronization that should be exposed in the API. Where practical, we should attempt to replace unreliable sleep-wait with explicit synchronization.

A few examples (from git grep):

// Probably wrong.
abci/example/example_test.go: time.Sleep(time.Second * 1) // Wait for a bit to allow counter overflow
internal/p2p/router_test.go: time.Sleep(100 * time.Millisecond) // yes yes, but Close() is async...

// Suspect, though possibly OK (timeouts are implicated).
internal/consensus/wal_test.go: time.Sleep(1 * time.Millisecond) // wait groupCheckDuration, make sure RotateFile run
rpc/client/local/local.go: time.Sleep((10 << uint(attempts)) * time.Millisecond) // 10ms -> 20ms -> 40ms
test/e2e/runner/perturb.go: time.Sleep(20 * time.Second) // give network some time to recover between each

Metadata

Metadata

Assignees

No one assigned

    Labels

    stalefor use by stalebot

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions