fix: reduce API rate limit pressure in train-drain3-weights workflow #24392
…n, limit to 100 logs Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d3933290-6e8e-4ccb-a118-7bdc43be5ad0 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull request overview
Reduces GitHub API rate-limit pressure and CI log noise in the daily train-drain3-weights workflow by lowering run volume, adding caching, and adjusting CLI behavior for CI/non-interactive environments.
Changes:
- Throttle `gh-aw logs` pagination by adding a fixed cooldown between batch iterations.
- Suppress the `\r`-based progress bar in CI to avoid flooding workflow logs.
- Update the training workflow to process fewer runs and cache downloaded logs between daily executions.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/train-drain3-weights.yml | Lowers run count and adds an Actions cache for /tmp/drain3-logs to reduce repeated downloads. |
| pkg/cli/logs_orchestrator.go | Adds API-call cooldown between pagination iterations; disables progress bar output in CI. |
| pkg/cli/logs_models.go | Introduces APICallCooldown constant used to pace API pagination. |
| pkg/agentdrain/data/default_weights.json | Updates embedded default drain3 weights content/format from training output. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 3
// The first iteration (iteration == 0) runs immediately; subsequent iterations
// pause briefly to give the API rate limit window time to recover.
if iteration > 0 {
    time.Sleep(APICallCooldown)
time.Sleep(APICallCooldown) does not respect ctx cancellation. If the user cancels (Ctrl+C) during the cooldown, the loop will block until the sleep completes before returning. Consider making the cooldown context-aware (e.g., select on ctx.Done() vs a time.NewTimer(APICallCooldown)), so cancellation is immediate while still throttling API calls.
Suggested change:
-time.Sleep(APICallCooldown)
+timer := time.NewTimer(APICallCooldown)
+select {
+case <-ctx.Done():
+    if !timer.Stop() {
+        <-timer.C
+    }
+    fmt.Fprintln(os.Stderr, console.FormatWarningMessage("Operation cancelled"))
+    return ctx.Err()
+case <-timer.C:
+}
 // Create progress bar for tracking run processing (only in non-verbose, non-CI mode)
 // In CI environments \r is treated as a newline, producing excessive output for each update.
 var progressBar *console.ProgressBar
-if !verbose {
+if !verbose && !IsRunningInCI() {
     progressBar = console.NewProgressBar(int64(totalRuns))
     fmt.Fprintf(os.Stderr, "Processing runs: %s\r", progressBar.Update(0))
The comment mentions “non-TTY” behavior, but the guard only checks !IsRunningInCI(). In other non-TTY contexts (e.g., piping stderr to a file locally), the \r updates will still emit noisy output. Consider gating progress rendering on tty.IsStderrTerminal() (or equivalent) rather than CI detection, since the underlying issue is terminal capability, not CI specifically.
- name: Restore logs cache
  id: cache-logs
  uses: actions/cache@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5.0.4
  with:
    path: /tmp/drain3-logs
    key: drain3-logs-${{ runner.os }}-${{ github.run_id }}
    restore-keys: |
      drain3-logs-${{ runner.os }}-
With key: drain3-logs-${{ runner.os }}-${{ github.run_id }}, every workflow run will save a new cache entry (even when a previous cache is restored via restore-keys), which can churn the Actions cache and increase eviction pressure if /tmp/drain3-logs is large. If the goal is to reuse the same cache across daily runs, consider using a more stable key (e.g., per-OS + date/week) or switching to explicit actions/cache/restore + actions/cache/save so you only write a new cache when you intentionally rotate it.
The daily `train-drain3-weights` workflow hit GitHub API rate limits when processing 1000 runs, making ~400 API calls (downloads + job status fetches), and flooded CI logs with hundreds of `\r`-based progress bar lines that don't overwrite in non-TTY environments.

Changes

train-drain3-weights.yml
- `actions/cache` for `/tmp/drain3-logs`: restores previously downloaded run artifacts on subsequent daily runs; the in-code cache (`run_summary.json`) then skips re-downloading already-seen runs

pkg/cli/logs_orchestrator.go
- `\r` doesn't overwrite in non-TTY: added `IsRunningInCI()` guard so the progress bar only renders in interactive terminals
- `time.Sleep(APICallCooldown)` before each iteration after the first to reduce burst API pressure during paginated fetches

pkg/cli/logs_models.go
- `APICallCooldown = 500 * time.Millisecond` constant