Add lightweight cluster PING/PONG heartbeats with full-refresh safeguards by hpatro · Pull Request #5 · hpatro/valkey

hpatro · 2026-03-20T21:31:49Z

Motivation

Reduce steady-state cluster-bus bandwidth by using a lightweight PING/PONG header for routine failure detection while preserving existing safety semantics.
Avoid regressions where topology/metadata and failover-sensitive flags (e.g. NOFAILOVER) become slow to propagate under light heartbeats.
Ensure recovered-node failure-report cleanup remains reliable by retransmitting clear signals until stale reports can no longer affect quorum decisions.

Description

Add light-heartbeat support guarded by a new capability bit CLUSTER_NODE_LIGHT_HDR_PING_SUPPORTED and per-node bookkeeping fields fail_report_clear_time and last_full_heartbeat_sent in clusterNode (see src/cluster_legacy.h).
Split gossip processing into clusterProcessGossipEntries() to allow both full clusterMsg and light clusterMsgLight handling, and add clusterMsgLight count helpers to validate/process light packets (src/cluster_legacy.c).
Implement clusterSendPingWithOptions() and clusterShouldSendFullHeartbeat() to choose between light and full heartbeats; light heartbeats carry compact failure gossip and retransmitted clear entries while full heartbeats still include topology, extensions and metadata (src/cluster_legacy.c).
Preserve fast propagation for header-only metadata by forcing immediate full broadcasts from clusterUpdateMyself*/update helpers (use CLUSTER_TODO_BROADCAST_ALL) and ensure periodic per-peer full heartbeats at the configured direct-ping interval (cluster_ping_interval or cluster_node_timeout/2).
Keep recovery-clear behavior deterministic by retransmitting recovered-node "clear" gossip until CLUSTER_FAIL_REPORT_VALIDITY_MULT * cluster_node_timeout expires, preventing stale external reports from causing false FAIL decisions (src/cluster_legacy.c).
Add unit test coverage and small test adjustments: extend tests/unit/cluster/failure-marking.tcl with a retransmit/regression scenario and adjust tests/unit/cluster/cross-version-cluster.tcl to reflect mixed-version expectations.

Testing

Built the server successfully with make -C src valkey-server -j4 (binary built).
Ran ./runtest --single unit/cluster/failure-marking --clients 1 and the suite completed with all tests passing (regressions for recovery-clear addressed by the implementation).
Ran a set of related cluster unit tests: unit/cluster/no-failover-option, unit/cluster/hostnames, unit/cluster/announce-client-ip, unit/cluster/announce-client-ports, and unit/cluster/announced-endpoints, and all executed tests passed.
Ran ./runtest --single unit/cluster/cross-version-cluster --clients 1; mixed-version scenarios that require an external older binary were skipped in this environment (these tests are preserved and exercise fallback/compatibility when run with --other-server-path).

Codex Task

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

hpatro added 2 commits March 16, 2026 23:41

Add AGENTS.md file for agentic coding assistant steering

b7cdeb5

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

Add lightweight cluster heartbeats

eeaecb2

hpatro added the codex label Mar 20, 2026 — with ChatGPT Codex Connector

github-actions Bot assigned hpatro Mar 20, 2026

github-advanced-security AI found potential problems Mar 20, 2026

View reviewed changes

Comment thread src/cluster_legacy.c Dismissed

hpatro closed this Mar 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add lightweight cluster PING/PONG heartbeats with full-refresh safeguards#5

Add lightweight cluster PING/PONG heartbeats with full-refresh safeguards#5
hpatro wants to merge 2 commits into
unstablefrom
codex/execute-plan-and-ensure-tests-pass

hpatro commented Mar 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hpatro commented Mar 20, 2026

Motivation

Description

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants