kvserver: gossip frequency is implicitly 10 seconds #81669
Description
Store gossip occurs more often than expected. With no changes to the store, it occurs roughly every 10 seconds, whereas it should currently occur once every minute.
These updates are triggered by capacity changes, specifically lease add events.
This function declares itself idempotent. However, that is not the case: each call may kick off a new gossip of the latest store descriptor.
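To make the idempotency point concrete, here is a minimal Go sketch. It is hypothetical and not the CockroachDB code: `descriptor`, `gossiper`, `gossipAlways`, and `gossipIfChanged` are invented names. It contrasts a trigger that gossips on every call with one that is idempotent because it skips the gossip when the descriptor matches the last one sent.

```go
package main

import "fmt"

// descriptor is a hypothetical, trimmed-down stand-in for a store descriptor.
type descriptor struct {
	LeaseCount int
}

// gossiper records the last descriptor it gossiped and how many gossips it sent.
type gossiper struct {
	last    *descriptor
	gossips int
}

// gossipAlways kicks off a gossip on every call. Repeating a call with the
// same descriptor is therefore NOT idempotent: N calls produce N gossips.
func (g *gossiper) gossipAlways(d descriptor) {
	g.gossips++
	g.last = &d
}

// gossipIfChanged only gossips when the descriptor differs from the last one
// sent, so repeating a call with the same input has no further effect.
func (g *gossiper) gossipIfChanged(d descriptor) {
	if g.last != nil && *g.last == d {
		return
	}
	g.gossips++
	g.last = &d
}

func main() {
	d := descriptor{LeaseCount: 10}
	a, b := &gossiper{}, &gossiper{}
	// Call each trigger three times with an unchanged descriptor.
	for i := 0; i < 3; i++ {
		a.gossipAlways(d)
		b.gossipIfChanged(d)
	}
	fmt.Println(a.gossips, b.gossips) // 3 1
}
```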
Reproduce
To reproduce, run https://github.com/cockroachdb/cockroach/compare/master...kvoli:220519.gossip-metrics?expand=1 and open the DB Console. The `kv.allocator.staleness` metric lets you examine the histogram of store descriptor gossip staleness used in allocation decisions.
What triggers gossip updates
We only update the storepool state with newer information here. We also update the storepool state following lease transfers and replica changes, using their estimated impact.
Gossip updates for the local store occur every minute. They are also triggered if any of the following has become true since the last gossip:
| Condition | Checked on | Description |
|---|---|---|
| (replica count delta) > 2 | Replica change | code |
| (replica count delta)/(last gossiped replica count) > 1% | Replica change | code |
| (lease count delta) > 0 and (lease count delta)/(last gossiped lease count) > 1% | Lease transfer | code |
| (QPS delta)/(last gossiped QPS) > 50% and (QPS delta) > 100 | Every 10 seconds (when updating metrics gauges) | code |
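The trigger conditions in the table can be sketched as a single Go predicate. This is a hypothetical illustration, not CockroachDB's implementation: `storeCapacity`, `shouldGossip`, `fracExceeds`, and the constant names are invented here. The thresholds are copied from the table, and the rows are combined with OR, as the table lists them. In the real store each row is checked at a different point (replica change, lease transfer, or the 10-second metrics update), whereas this sketch evaluates them all at once. Treating a zero last-gossiped value as "any nonzero delta exceeds the fraction" is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// storeCapacity is a hypothetical view of the descriptor fields that the
// trigger conditions look at.
type storeCapacity struct {
	RangeCount       int64
	LeaseCount       int64
	QueriesPerSecond float64
}

// Thresholds taken from the table (constant names are hypothetical).
const (
	replicaAbsDelta  = 2    // replica count delta must exceed 2
	replicaFracDelta = 0.01 // or exceed 1% of the last gossiped replica count
	leaseFracDelta   = 0.01 // lease count delta must exceed 1% of the last gossiped count
	qpsFracDelta     = 0.50 // QPS delta must exceed 50% of the last gossiped QPS...
	qpsAbsDelta      = 100  // ...and exceed 100 in absolute terms
)

// fracExceeds reports whether delta/base > frac. A zero base counts as
// exceeded for any positive delta (assumption: the issue does not say).
func fracExceeds(delta, base, frac float64) bool {
	if base == 0 {
		return delta > 0
	}
	return delta/base > frac
}

// shouldGossip reports whether any table row holds between the last gossiped
// capacity and the current one, plus the name of the first matching row.
func shouldGossip(last, cur storeCapacity) (bool, string) {
	replicaDelta := math.Abs(float64(cur.RangeCount - last.RangeCount))
	if replicaDelta > replicaAbsDelta {
		return true, "replica count delta > 2"
	}
	if fracExceeds(replicaDelta, float64(last.RangeCount), replicaFracDelta) {
		return true, "replica count delta > 1%"
	}
	leaseDelta := math.Abs(float64(cur.LeaseCount - last.LeaseCount))
	if leaseDelta > 0 && fracExceeds(leaseDelta, float64(last.LeaseCount), leaseFracDelta) {
		return true, "lease count delta > 1%"
	}
	qpsDelta := math.Abs(cur.QueriesPerSecond - last.QueriesPerSecond)
	if qpsDelta > qpsAbsDelta && fracExceeds(qpsDelta, last.QueriesPerSecond, qpsFracDelta) {
		return true, "QPS delta > 50% and > 100"
	}
	return false, ""
}

func main() {
	last := storeCapacity{RangeCount: 500, LeaseCount: 200, QueriesPerSecond: 1000}
	// One extra lease is 1/200 = 0.5%, below the 1% threshold: no gossip.
	fmt.Println(shouldGossip(last, storeCapacity{RangeCount: 500, LeaseCount: 201, QueriesPerSecond: 1000}))
	// Three extra replicas exceed the absolute delta of 2: gossip.
	fmt.Println(shouldGossip(last, storeCapacity{RangeCount: 503, LeaseCount: 200, QueriesPerSecond: 1000}))
}
```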
Expected behavior
Gossip should occur only when the above conditions are met. Additionally, we may wish to investigate lowering the timer from 1 minute, given that this bug has existed for some time without issue, and to make the interval explicit rather than implicit.
Jira issue: CRDB-16019