Flaky: services/blockassembly system tests (FSM-startup / gRPC connection races under CI load) #997

@oskarszoon

Description

Six services/blockassembly system tests fail intermittently with daemon-startup timing errors. Confirmed flaky: the same commit (33e17d537) failed, then passed on an immediate re-run with no code change.

Tests

All are in services/blockassembly/blockassembly_system_test.go and start the daemon via test/nodeHelpers/blockchainDaemon.go:

  • Test_CoinbaseSubsidyHeight
  • TestShouldAddSubtreesToLongerChain
  • TestShouldHandleReorg
  • TestShouldHandleReorgWithLongerChain
  • TestResetWithBlockchainAhead_Integration
  • TestHandleReorgWithInvalidBlock_Integration

Symptoms

Two signatures:

  • timeout waiting for FSM to transition to RUNNING state (current state: IDLE) at blockchainDaemon.go:156
  • failed to connect to blockchain service for 'localhost:<port>', retried 3 times: ... connection refused

Failing tests stop at roughly 45s (the client connect-retry budget) or roughly 10s (the FSM poll timeout), matching the two signatures above.

Mechanism

StartBlockchainService() starts the blockchain service in-process (a goroutine plus a gRPC server on a local port), then polls the FSM for RUNNING with a 10s timeout (blockchainDaemon.go:148). Under heavy concurrent CI load (these ran alongside 14 benchmark jobs and the e2e shard matrices), either the in-process gRPC server has not bound its port yet or the FSM has not reached RUNNING before the deadline.

Evidence

PR #996, "Teranode PR tests" run 26752608043:

  • attempt 1 — test job FAILED (these 6)
  • attempt 2 (re-run, identical SHA) — test job PASSED

Same commit, opposite outcomes ⇒ flaky.

Suggested directions

  • Raise the 10s FSM-RUNNING poll timeout (and the client connect-retry budget); 10s is tight on a loaded runner.
  • Or wrap these system tests in the existing retry mechanism (the e2e suites use gotestsum_with_retry.sh).
  • Longer term: have StartBlockchainService() wait on a readiness signal rather than a fixed timeout.

Not blocking a specific PR — surfaced while profiling the unit-test suite.

Metadata

Assignees

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
