Skip to content

plugin/errors: add show_first option to consolidate (#7702)#7703

Merged
yongtang merged 1 commit into
coredns:masterfrom
cangming:errors-show-first
Dec 10, 2025
Merged

plugin/errors: add show_first option to consolidate (#7702)#7703
yongtang merged 1 commit into
coredns:masterfrom
cangming:errors-show-first

Conversation

@cangming

@cangming cangming commented Nov 22, 2025

Copy link
Copy Markdown
Contributor

1. Why is this pull request needed and what does it do?

Add optional show_first flag to consolidate directive that logs
the first error immediately and then consolidates subsequent errors.

When show_first is enabled:

The first matching error is logged immediately with full details
(rcode, domain, type, error message) using the configured log level
Subsequent matching errors are consolidated during the period
At period end:
If only one error occurred, no summary is printed (already logged)
If multiple errors occurred, summary shows the total count

2. Which issues (if any) are related?

#7702

3. Which documentation changes (if any) need to be made?

Update errors plugin docs to include show_first settings and examples

4. Does this introduce a backward incompatible change or deprecation?

No breaking changes. The show_first option disable by default .

@cangming

cangming commented Nov 23, 2025

Copy link
Copy Markdown
Contributor Author

Test Failure Investigation

Summary

The Test e2e failure is not related to this PR. This is a pre-existing flaky test in the master branch.

Evidence

I tested the master branch (commit de010910e) locally with the same test setup:

docker run --rm -v /path/to/coredns:/workspace -w /workspace/test golang:1.25 \
  bash -c "go install github.com/fatih/faillint@c56e3ec6dbfc933bbeb884fd31f2bcd41f712657 && \
  go test -race -count=3 ./..."

Result: The same data race in TestReadme was reproduced on master branch.

image

Details

Test Status for This PR

All tests in plugin/errors pass successfully:
✅ Unit tests: PASS (with -race)
✅ Integration tests: PASS
✅ golangci-lint: 0 issues

The errors plugin changes are isolated and thoroughly tested. The e2e failure is a known flaky test issue that should be addressed separately.

Comment thread plugin/errors/errors.go Outdated
if cnt > 0 {
h.patterns[i].logCallback("%d errors like '%s' occurred in last %s",
cnt, h.patterns[i].pattern.String(), h.patterns[i].period)
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid duplicated code in different branches. Consider a single condition like

	if cnt > 1 || (cnt == 0 && !h.patterns[i].showFirst) {

Parentheses are probably not required, but they make the code clearer.
Alternatively, you can consider the code like

	if cnt == 0 {
		return
	}
	if cnt > 1 || !h.patterns[i].showFirst {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Comment thread plugin/errors/errors.go Outdated
for i := range h.patterns {
if h.patterns[i].pattern.MatchString(strErr) {
if h.inc(i) {
if h.recordError(i, rcode, state.Name(), state.Type(), strErr) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider just returning false from recordError when cnt == 1 and showFirst is configured. The error message will be printed by the log.Errorf below.
This way, you will not need to pass more parameters to recordError and to duplicate the error printing code.
The function recordError needs to have a clear description

@cangming cangming Nov 26, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I make following changes.

  • Rename inc to consolidateError
  • Removed all parameters, keeping function not change
  • Returns false when showFirst is enabled, letting caller handle logging
  • Unified two log call sites using function pointer
  • First error now uses pattern's configured log level (not always ERROR)

Comment thread plugin/errors/errors_test.go Outdated
expCnt := uint32(tt.errorCount)
actCnt := atomic.LoadUint32(&h.patterns[0].count)
if actCnt != expCnt {
t.Fatalf("Unexpected 'count', expected %d, actual %d", expCnt, actCnt)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider following to continue with other test cases

  t.Errorf(...)
  continue

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Comment thread plugin/errors/setup.go Outdated
// Check for show_first option
showFirst := false
if len(args) > 2+logLevelArgCount {
if args[2+logLevelArgCount] == "show_first" {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead, consider refactoring the parseLogLevel (rename to parseOptionalParams) to loop through remaining arguments and return logger and showFirst flag.

@cangming cangming Nov 26, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I make following changes.

  • Loop through remaining arguments and return logger and showFirst flag.
  • Map-based dispatch replaces switch-case
  • Added parameter order validation (log level must precede show_first)
  • Added validation for multiple log levels
  • Clearer error messages

@cangming cangming force-pushed the errors-show-first branch 2 times, most recently from 5b4239f to 57fc718 Compare November 26, 2025 04:33

@rdrozhdzh rdrozhdzh left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, look good to me. See one minor note

Comment thread plugin/errors/setup.go Outdated
if len(args) != 3 {
return log.Errorf, nil
// parseOptionalParams parses optional parameters (log level and show_first flag)
// from the remaining arguments starting at index 2.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead, you can pass args[2:] to this func and avoid checking/mentioning this magic number inside this func

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed as suggested. Thanks for the review!

Add optional show_first flag to consolidate directive that logs
the first error immediately and then consolidates subsequent errors.

When show_first is enabled:
- The first matching error is logged immediately with full details
  (rcode, domain, type, error message) using the configured log level
- Subsequent matching errors are consolidated during the period
- At period end:
  - If only one error occurred, no summary is printed (already logged)
  - If multiple errors occurred, summary shows the total count

Syntax:
  consolidate DURATION REGEXP [LEVEL] [show_first]

Example with 3 errors:
  [WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout
  [WARNING] 3 errors like '^read udp .* i/o timeout$' occurred in last 30s

Example with 1 error:
  [WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout

Implementation details:
- Add showFirst bool to pattern struct
- Rename inc() to consolidateError(), return false for showFirst case
- Use function pointer in ServeDNS to unify log calls with proper level
- Simplify logPattern() with single condition (cnt > 1 || !showFirst)
- Refactor parseLogLevel() to parseOptionalParams() with map-based dispatch
- Validate parameter order: log level must come before show_first
- Update README.md with show_first documentation and examples
- Add comprehensive test cases for show_first functionality

Signed-off-by: cangming <cangming@cangming.app>
@yongtang yongtang merged commit cad961f into coredns:master Dec 10, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants