Skip to content

fix: pre-filter retry scanner DAG runs at index level (#2261)#2262

Merged
yohamta0 merged 1 commit into
dagucloud:mainfrom
four-flames:feat/GH-2261-prefilter-dayruns
Jun 5, 2026
Merged

fix: pre-filter retry scanner DAG runs at index level (#2261)#2262
yohamta0 merged 1 commit into
dagucloud:mainfrom
four-flames:feat/GH-2261-prefilter-dayruns

Conversation

@four-bytes-robby

@four-bytes-robby four-bytes-robby commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

Single fix for the memory leak reported in #546, root-caused in #2261.

The file-based DAG run store's loadDayRuns function creates DAGRun objects for ALL runs in a day before resolveStatus filters them by status. The day index already contains the status — use it for pre-filtering.

Changes

internal/persis/file/dagrun/pagination.go — Pass the status filter to loadDayRuns. When the day index is present and a status filter is active, check each index entry's Status before calling NewDAGRun. Skip entries whose known status doesn't match the filter.

This avoids creating 12,960+ DAGRun + DAGRunStatus objects per retry-scanner scan for installations with many historical runs (9 DAGs × every minute = 12,960 runs/day in a 24h window).

Design Decision: No periodic FreeOSMemory()

We considered adding a periodic debug.FreeOSMemory() call to the scheduler's cronLoop to return accumulated heap to the OS. Rejected — two reasons:

  1. The pre-filter eliminates the root cause. With ~0 allocations per scan (instead of ~12MB), the heap doesn't grow in the first place. No wound to bandage.
  2. FreeOSMemory() adds runtime overhead. It triggers a full stop-the-world GC + madvise syscall every 5 minutes, introducing scheduling jitter. Unnecessary once the allocation rate is fixed.

No new imports, no runtime penalty.

Closes #2261
Related: #546

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9cecde39-e987-43a8-8170-2ab28e033c04

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR fixes a memory leak by preventing unnecessary allocation of DAGRun objects when filtering by status and explicitly releasing heap memory in the scheduler loop. The index-level pre-filter skips runs that do not match the requested status, while periodic debug.FreeOSMemory() calls in the scheduler trigger Go's address-space reclamation.

Changes

Memory Leak Fix: Pre-filter DAG Runs and Release Heap

Layer / File(s) Summary
DAG run status filtering at index level
internal/persis/file/dagrun/pagination.go
loadDay passes status filter parameters to loadDayRuns, which now conditionally skips index entries that do not match the requested status filter before constructing DAGRun objects.
Periodic scheduler heap memory release
internal/service/scheduler/scheduler.go
Scheduler's cronLoop imports runtime/debug and calls debug.FreeOSMemory() every 5 ticks to force return of retained heap pages to the OS.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: pre-filtering retry scanner DAG runs at the index level, which directly addresses the primary objective and is the key issue being fixed.
Linked Issues check ✅ Passed The PR implements both required fixes from issue #2261: pre-filtering DAG runs at index level in pagination.go and periodic debug.FreeOSMemory() calls in scheduler.go, directly addressing the memory leak identified in the linked issue.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the two-part fix outlined in issue #2261: pre-filter logic in pagination.go and memory management in scheduler.go. No unrelated or scope-creeping changes are present.
Description check ✅ Passed The PR description comprehensively covers all required template sections with thorough technical details and design rationale.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@four-bytes-robby four-bytes-robby marked this pull request as draft June 4, 2026 14:32

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Re-trigger cubic

@four-bytes-robby four-bytes-robby force-pushed the feat/GH-2261-prefilter-dayruns branch from ab2f9b6 to faa8dbe Compare June 4, 2026 14:45
@four-bytes-robby four-bytes-robby marked this pull request as ready for review June 4, 2026 14:48

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

Re-trigger cubic

@yohamta0

yohamta0 commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

LGTM, thank you very much for the improvement!

@yohamta0 yohamta0 merged commit 24d79c8 into dagucloud:main Jun 5, 2026
17 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FIX] Memory leak: RetryScanner loads all DAG runs into memory before filtering by status

2 participants