@varshith257
Contributor

@varshith257 varshith257 commented Jun 6, 2025

NOTE: This reproducer is taken from Redpanda's official examples repo: https://github.com/redpanda-data/redpanda-examples/tree/main/docker-compose

Details

Introduces a new rule that detects when a Redpanda node becomes isolated (heartbeats fail) and a Raft re-election is triggered, indicating quorum loss and leader election.

  • Updated category and tags to include high-availability and Raft/leader-election context.

Test Environment

Reproducible test setup (maintainers invited): Redpanda Cluster
Live CRE link: CRE Playground Link
Check here for Sample Logs

Sample Detected Patterns

[2025-06-06T05:50:30] cluster - Error occurred while sending node status request: rpc::errc::exponential_backoff
[2025-06-06T06:27:17] r/heartbeat - Received error when sending heartbeats to node 1 - rpc::errc::exponential_backoff
[2025-06-06T06:27:31] cluster - unable to get node health report from 1 - rpc::errc::disconnected_endpoint(node down), marking node as down
[2025-06-06T06:28:43] raft - vote_stm.cc:77 - vote reply from {…} - vote_granted: 1, log_ok:1
[2025-06-06T06:28:43] raft - vote_stm.cc:264 - becoming the leader term:2
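As a rough local check of the patterns above, the sketch below counts isolation errors and leader-election lines in log text read from stdin. It is not the CRE rule itself: the function name `classify_log` and its output format are made up for illustration, and it only checks that both kinds of line are present, not their order.

```bash
# classify_log (hypothetical helper): reads broker log lines on stdin and
# succeeds only if both isolation errors and a leader election are present.
classify_log() {
  local input isolation election
  input="$(cat)"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  isolation=$(printf '%s\n' "$input" | grep -cE 'rpc::errc::(exponential_backoff|disconnected_endpoint)' || true)
  election=$(printf '%s\n' "$input" | grep -c 'becoming the leader term' || true)
  echo "isolation=$isolation election=$election"
  [ "$isolation" -gt 0 ] && [ "$election" -gt 0 ]
}

# Usage with two of the sample lines shown above.
classify_log <<'EOF' && echo "pattern matched"
[2025-06-06T06:27:17] r/heartbeat - Received error when sending heartbeats to node 1 - rpc::errc::exponential_backoff
[2025-06-06T06:28:43] raft - vote_stm.cc:264 - becoming the leader term:2
EOF
```

Against a running cluster, the same function could be fed from `docker logs redpanda1 2>&1 | classify_log`.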

Reproduction Steps

# Bring up a 3-node Redpanda cluster
docker-compose -f docker-compose.yaml up -d

# Give the brokers time to start
sleep 20

# Simulate node isolation (stop the redpanda2 container)
docker stop redpanda2

# Watch for detected patterns in redpanda1's logs (2>&1 also captures lines written to stderr)
docker logs redpanda1 --tail 50 2>&1 | grep -E "rpc::errc::exponential_backoff|disconnected_endpoint|vote_stm"

# Observe the Raft re-election:
# you should see "vote reply … vote_granted" followed by "becoming the leader term…" lines
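The one-shot `grep` in the steps above can run before the election has happened. A minimal sketch of a polling alternative follows; `wait_until` is a hypothetical helper, not part of the reproducer, and the attempt count and delay in the usage comment are arbitrary.

```bash
# wait_until ATTEMPTS DELAY COMMAND...
# Hypothetical helper: re-runs COMMAND until it succeeds or ATTEMPTS is
# exhausted, sleeping DELAY seconds between tries.
wait_until() {
  local attempts delay i
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (after `docker stop redpanda2`): poll redpanda1's logs for the
# election line for up to about a minute.
# wait_until 30 2 sh -c 'docker logs redpanda1 2>&1 | grep -q "becoming the leader term"'
```

The function returns non-zero if the command never succeeds, so a script can fail fast instead of silently printing nothing.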

/claim #59
Closes #59

@varshith257
Contributor Author

@tonymeehan Can I have your eyes on this 😄?

@Ani-4x
Contributor

Ani-4x commented Jun 7, 2025

@varshith257 Hey, can I get your help with a conflict I am facing? I saw you had the same conflicts with the tags YAML files. Can you help me resolve it?

@varshith257
Contributor Author

You need to pull or rebase your local branch onto upstream main and resolve the merge conflicts in your code editor.
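For reference, a minimal sketch of that workflow. It assumes a remote named `upstream` already points at the main repository; the function name `sync_with_upstream` is just for illustration.

```bash
# sync_with_upstream (hypothetical helper): rebase the current branch onto
# the latest upstream main. Assumes a remote called "upstream" exists.
sync_with_upstream() {
  git fetch upstream &&
  git rebase upstream/main
}

# If the rebase stops on a conflict:
#   1. fix the conflicting files in your editor
#   2. git add <resolved files>
#   3. git rebase --continue
# Then update the PR branch on your fork:
#   git push --force-with-lease
```

`--force-with-lease` is used rather than a plain force push so the push is refused if the remote branch has changed since the last fetch.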

@Harsh9485
Contributor

I think @tonymeehan gives a chance to new contributors. 😁

@Ani-4x
Contributor

Ani-4x commented Jun 7, 2025

@varshith257 Thanks, I think it should work now.

@varshith257 varshith257 requested a review from tonymeehan June 13, 2025 08:34
@tonymeehan
Contributor

Looks good. Can you please rename the CRE id folder and id in the rule to CRE-2025-0092? Will avoid conflicts with rules we've merged since this PR was opened. Then should be good to go.

@varshith257
Contributor Author

@tonymeehan Renamed the CRE to CRE-2025-0092.

@Lyndon-prequel Lyndon-prequel merged commit 36f46a0 into prequel-dev:main Jun 15, 2025
2 checks passed
@varshith257 varshith257 deleted the cre-redpanda branch June 15, 2025 17:23

Development

Successfully merging this pull request may close these issues.

[New Rule] Redpanda: Reproduce A High-Severity Failure & Write a Detection Rule

5 participants