plugin/errors: add show_first option to consolidate (#7702)#7703
Conversation
Test Failure InvestigationSummaryThe EvidenceI tested the master branch (commit docker run --rm -v /path/to/coredns:/workspace -w /workspace/test golang:1.25 \
bash -c "go install github.com/fatih/faillint@c56e3ec6dbfc933bbeb884fd31f2bcd41f712657 && \
go test -race -count=3 ./..."Result: The same data race in TestReadme was reproduced on master branch.
Details
Test Status for This PR All tests in plugin/errors pass successfully: The errors plugin changes are isolated and thoroughly tested. The e2e failure is a known flaky test issue that should be addressed separately. |
| if cnt > 0 { | ||
| h.patterns[i].logCallback("%d errors like '%s' occurred in last %s", | ||
| cnt, h.patterns[i].pattern.String(), h.patterns[i].period) | ||
| } |
There was a problem hiding this comment.
Please avoid duplicated code in different branches. Consider a single condition like
if cnt > 1 || (cnt == 0 && !h.patterns[i].showFirst) {Parentheses are probably not required, but they make the code clearer.
Alternatively, you can consider the code like
if cnt == 0 {
return
}
if cnt > 1 || !h.patterns[i].showFirst {| for i := range h.patterns { | ||
| if h.patterns[i].pattern.MatchString(strErr) { | ||
| if h.inc(i) { | ||
| if h.recordError(i, rcode, state.Name(), state.Type(), strErr) { |
There was a problem hiding this comment.
consider just returning false from recordError when cnt == 1 and showFirst is configured. The error message will be printed by the log.Errorf below.
This way, you will not need to pass more parameters to recordError and to duplicate the error printing code.
The function recordError needs to have a clear description
There was a problem hiding this comment.
I make following changes.
- Rename inc to consolidateError
- Removed all parameters, keeping function not change
- Returns false when showFirst is enabled, letting caller handle logging
- Unified two log call sites using function pointer
- First error now uses pattern's configured log level (not always ERROR)
| expCnt := uint32(tt.errorCount) | ||
| actCnt := atomic.LoadUint32(&h.patterns[0].count) | ||
| if actCnt != expCnt { | ||
| t.Fatalf("Unexpected 'count', expected %d, actual %d", expCnt, actCnt) |
There was a problem hiding this comment.
consider following to continue with other test cases
t.Errorf(...)
continue| // Check for show_first option | ||
| showFirst := false | ||
| if len(args) > 2+logLevelArgCount { | ||
| if args[2+logLevelArgCount] == "show_first" { |
There was a problem hiding this comment.
instead, consider refactoring the parseLogLevel (rename to parseOptionalParams) to loop through remaining arguments and return logger and showFirst flag.
There was a problem hiding this comment.
I make following changes.
- Loop through remaining arguments and return logger and showFirst flag.
- Map-based dispatch replaces switch-case
- Added parameter order validation (log level must precede show_first)
- Added validation for multiple log levels
- Clearer error messages
5b4239f to
57fc718
Compare
rdrozhdzh
left a comment
There was a problem hiding this comment.
thanks, look good to me. See one minor note
| if len(args) != 3 { | ||
| return log.Errorf, nil | ||
| // parseOptionalParams parses optional parameters (log level and show_first flag) | ||
| // from the remaining arguments starting at index 2. |
There was a problem hiding this comment.
instead, you can pass args[2:] to this func and avoid checking/mentioning this magic number inside this func
There was a problem hiding this comment.
Fixed as suggested. Thanks for the review!
Add optional show_first flag to consolidate directive that logs the first error immediately and then consolidates subsequent errors. When show_first is enabled: - The first matching error is logged immediately with full details (rcode, domain, type, error message) using the configured log level - Subsequent matching errors are consolidated during the period - At period end: - If only one error occurred, no summary is printed (already logged) - If multiple errors occurred, summary shows the total count Syntax: consolidate DURATION REGEXP [LEVEL] [show_first] Example with 3 errors: [WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout [WARNING] 3 errors like '^read udp .* i/o timeout$' occurred in last 30s Example with 1 error: [WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout Implementation details: - Add showFirst bool to pattern struct - Rename inc() to consolidateError(), return false for showFirst case - Use function pointer in ServeDNS to unify log calls with proper level - Simplify logPattern() with single condition (cnt > 1 || !showFirst) - Refactor parseLogLevel() to parseOptionalParams() with map-based dispatch - Validate parameter order: log level must come before show_first - Update README.md with show_first documentation and examples - Add comprehensive test cases for show_first functionality Signed-off-by: cangming <cangming@cangming.app>
57fc718 to
b17240c
Compare

1. Why is this pull request needed and what does it do?
Add optional show_first flag to consolidate directive that logs
the first error immediately and then consolidates subsequent errors.
When show_first is enabled:
The first matching error is logged immediately with full details
(rcode, domain, type, error message) using the configured log level
Subsequent matching errors are consolidated during the period
At period end:
If only one error occurred, no summary is printed (already logged)
If multiple errors occurred, summary shows the total count
2. Which issues (if any) are related?
#7702
3. Which documentation changes (if any) need to be made?
Update errors plugin docs to include show_first settings and examples
4. Does this introduce a backward incompatible change or deprecation?
No breaking changes. The show_first option disable by default .