[New Rule] Redpanda: Add High-Severity Failures Detection Rule #75
|
Hey @asr2003, your PR is very good! However, it should include a working reproduction project and a demo video. If you add these, your PR will be strong enough to be merged, I hope. 😁 |
|
Thanks for the reminder! I have added a link to the reproducer repository in the PR description. Due to device memory constraints, I am unable to submit a video recording; attempting to capture all reproducers would cause system instability. However, the repo includes detailed logs and reproduction instructions. |
|
If you’re unable to upload a demo video, it might be problematic for the reviewers. I’m happy to help—if you send me an invite to your repository, I can run it on my machine and check whether the reproduction is valid. |
|
@asr2003 we do require a video for all submissions. Please submit. |
|
@Lyndon-prequel Okay. I will get help from a Prequel Community member. |
|
Okay, I will keep only the official high-severity failures stated by Redpanda in their docs and will clean up the rest.
…On Wed, 11 Jun, 2025, 8:55 pm Tony Meehan, ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In rules/cre-2025-0080/redpanda-high-severity-issues.yaml:
> + - "https://docs.redpanda.com/current/manage/raft-group-reconfiguration/"
+ - "https://docs.redpanda.com/current/manage/cluster-maintenance/node-property-configuration/"
+ - "https://docs.redpanda.com/current/manage/monitoring/"
+ - "https://vectorized.io"
+ reports: 0
+ version: "0.1.0"
+ applications:
+ - name: "redpanda"
+ processName: "redpanda"
+ version: "24.3+"
+ rule:
+ set:
+ event:
+ source: cre.log.redpanda
+ match:
+ - regex:
This regular expression is too big and has too many conditions. It also is
adding a \n between the ORs.
Please break this up into multiple regex: statements to make this a bit
more readable and avoid the \n problem. Also, do we really need 17
conditions to detect this problem? Can we use 3 or 4?
The regular expressions compile down and are fast, but it's hard to
understand and follow this problem with 17 conditions.
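To illustrate the `\n` problem described above, here is a minimal Python sketch. It assumes the newline ends up right after each `|` in the combined pattern, and the log lines and patterns are hypothetical placeholders, not real Redpanda output or the conditions used in this rule. It only shows why a multi-line alternation can silently stop matching and how separate short patterns avoid that.

```python
import re

# Hypothetical log lines for illustration only; not real Redpanda output.
SAMPLE_LINES = [
    "ERROR storage - disk error: No space left on device",
    "WARN  raft - leadership lost for group 12",
    "INFO  app - started",
]

# One alternation written across several lines. The line breaks become part
# of the pattern, so every alternative after the first must begin with "\n".
BROKEN = "disk error|\nleadership lost|\nsnapshot corrupt"

# The same conditions expressed as separate, short patterns.
SPLIT = [
    re.compile(r"disk error"),
    re.compile(r"leadership lost"),
    re.compile(r"snapshot corrupt"),
]


def matches_broken(line: str) -> bool:
    """Match against the single multi-line alternation."""
    return re.search(BROKEN, line) is not None


def matches_split(line: str) -> bool:
    """Match if any one of the short patterns is found."""
    return any(p.search(line) for p in SPLIT)


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(matches_broken(line), matches_split(line), line)
```

In this sketch only the first alternative of the combined pattern can match a single log line, because the other alternatives require a leading newline. The split patterns behave the same for each condition and are easier to review one at a time.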
|
|
Done! |
/claim #69
Closes #69
This PR adds a new Prequel rule CRE-2025-0080 to detect critical Redpanda startup/runtime issues that can lead to degraded performance, unavailability, or data loss. The rule targets system log patterns related to disk errors, RPC failures, Raft instability, and snapshot corruption.
Based on Redpanda Official HA Recommendations
The following failure scopes are directly drawn from Redpanda’s official documentation on high availability:
https://docs.redpanda.com/current/deploy/deployment-option/self-hosted/manual/high-availability/#multi-broker-deployment
Additional Failure Modes Covered by This Rule
In addition to the HA docs, this rule also detects:
kvstore or bad mounts
parser.cc logs
Why It Matters
These patterns are frequently encountered in real-world outages and cluster stalls. This CRE helps teams proactively detect:
Playground Link - Check here
Sample logs: test.log
REPRODUCER LINK (Maintainers-only invited): REDPANDA HA
DEMO VIDEO
Screencast from 2025-06-10 00-08-02.webm
2025-06-10.03-59-15-VEED.mp4