asim: non-deterministic result when overloaded with stores and replicas #105904

@wenyihu6

Describe the problem

The allocator simulator (asim) produces non-deterministic results when it is
configured with a large number of stores and a large number of replicas per store.

To Reproduce

  1. In TestAllocatorSimulatorDeterministic, set the number of stores to 11 and
    replicasPerStore to 200.
  2. Run ./dev test pkg/kv/kvserver/asim -f TestAllocatorSimulatorDeterministic --ignore-cache --count=500

Additional context

Some observations from playing around:

  • Gossiping more aggressively seems to alleviate the issue. The default gossip
    interval is currently 10s:
    defaultStateExchangeInterval = 10 * time.Second
    Changing this value to 1s avoided the failure, but that does not address the
    root cause, since the result should be deterministic regardless of gossip
    frequency.
  • The issue seems to reproduce only when both replicasPerStore and the number of
    stores are raised. If only one of them is raised, the test seems to pass.
    Raising both parameters might amplify the non-deterministic behavior.

Jira issue: CRDB-29265

Metadata

Labels

  • A-kv-simulation: Relating to allocation simulation.
  • C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected
    to change code/behavior.
  • C-test-failure: Broken test (automatically or manually discovered).
  • T-kv: KV Team
  • skipped-test
