feat(logs): query GitHub API rate limit before each batch iteration#24424
Conversation
Add checkAndWaitForRateLimit to logs_rate_limit.go, which queries `gh api rate_limit` before each batch iteration. When the core remaining budget falls at or below RateLimitThreshold (10), the helper sleeps until the reset window plus a small buffer; otherwise it applies the existing APICallCooldown. Replace the bare time.Sleep in the logs orchestrator with calls to this helper. Add RateLimitThreshold and rateLimitResetBuffer constants to logs_models.go, and unit tests in logs_rate_limit_test.go.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/b0580c30-d803-450d-a8e6-77a0aadce917
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Change spinner message from 'Checking rate limit...' to 'Verifying API quota...'
- Clarify warning message to 'X of Y requests remaining' instead of 'X/Y remaining'

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/b0580c30-d803-450d-a8e6-77a0aadce917
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull request overview
Adds adaptive GitHub API rate-limit awareness to the logs download batching loop to reduce the likelihood of 403/429 errors during long runs.
Changes:
- Introduces a new rate-limit helper that queries `gh api rate_limit` and waits until reset when the core quota is low.
- Adds rate-limit threshold and reset buffer constants alongside existing iteration/batch constants.
- Updates the logs orchestrator to use the helper instead of a fixed sleep; adds basic unit tests for rate-limit JSON parsing and constant sanity.
Summary per file:
| File | Description |
|---|---|
| pkg/cli/logs_rate_limit.go | New helper to query core rate limit and sleep adaptively between batch iterations |
| pkg/cli/logs_models.go | Adds RateLimitThreshold and rateLimitResetBuffer constants used by the helper |
| pkg/cli/logs_orchestrator.go | Replaces fixed time.Sleep(APICallCooldown) with checkAndWaitForRateLimit between iterations |
| pkg/cli/logs_rate_limit_test.go | Adds unit tests for JSON unmarshalling and simple threshold/constant checks |
Copilot's findings
Comments suppressed due to low confidence (1)
pkg/cli/logs_rate_limit.go:122
- The new `checkAndWaitForRateLimit` behavior (fallback on fetch error, threshold wait until reset, and verbose/non-verbose messaging) isn't exercised by unit tests — current tests only cover JSON unmarshalling and a boolean boundary expression. Consider making the gh-call and sleep injectable (e.g., package-level function vars) so tests can validate the wait/fallback paths without real network calls or long sleeps.
```go
func checkAndWaitForRateLimit(verbose bool) error {
	rl, err := fetchRateLimit()
	if err != nil {
		// Best-effort: fall back to static cooldown so the caller can continue.
		logsRateLimitLog.Printf("Could not fetch rate limit, using static cooldown: %v", err)
		time.Sleep(APICallCooldown)
		return err
	}

	if rl.Remaining <= RateLimitThreshold {
		resetAt := time.Unix(rl.Reset, 0)
		waitDur := time.Until(resetAt)
		if waitDur <= 0 {
			// Reset has already passed; apply minimal cooldown and carry on.
			logsRateLimitLog.Print("Rate limit reset has already passed, applying minimal cooldown")
			time.Sleep(APICallCooldown)
			return nil
		}

		// Add a small buffer so we don't resume right on the boundary.
		waitDur += rateLimitResetBuffer

		msg := fmt.Sprintf(
			"GitHub API rate limit nearly exhausted (%d of %d requests remaining). Waiting %.0f seconds until reset at %s",
			rl.Remaining, rl.Limit, waitDur.Seconds(), resetAt.UTC().Format(time.RFC3339),
		)
		fmt.Fprintln(os.Stderr, console.FormatWarningMessage(msg))
		logsRateLimitLog.Printf("Sleeping for rate limit reset: duration=%s", waitDur)
		time.Sleep(waitDur)
		return nil
	}

	if verbose {
		fmt.Fprintln(os.Stderr, console.FormatVerboseMessage(
			fmt.Sprintf("Rate limit OK: %d/%d requests remaining", rl.Remaining, rl.Limit),
		))
	}

	// Even when budget is healthy, apply the minimum inter-call cooldown.
	time.Sleep(APICallCooldown)
	return nil
}
```
- Files reviewed: 4/4 changed files
- Comments generated: 3
```go
	logsRateLimitLog.Print("Querying GitHub API rate limit")

	output, err := workflow.RunGHCombined("Verifying API quota...", "api", "rate_limit")
	if err != nil {
		return rateLimitResource{}, fmt.Errorf("failed to query rate limit: %w", err)
	}

	var resp rateLimitResponse
	if err := json.Unmarshal(output, &resp); err != nil {
		return rateLimitResource{}, fmt.Errorf("failed to parse rate limit response: %w", err)
	}
```
fetchRateLimit uses workflow.RunGHCombined, which returns combined stdout+stderr. If gh emits any warnings on stderr while still succeeding, json.Unmarshal can fail on otherwise valid JSON. Prefer workflow.RunGH (stdout-only) for JSON parsing and rely on enrichGHError for stderr details on failure.
This issue also appears on line 82 of the same file.
```go
// checkAndWaitForRateLimit queries the GitHub API rate limit and sleeps until
// the reset window when the remaining core request budget falls at or below
// RateLimitThreshold. It always waits at least APICallCooldown between
// successive calls so that even when requests are plentiful the orchestrator
// does not hammer the API.
//
// If the rate limit cannot be fetched (e.g. network error) the function falls
// back to the static APICallCooldown sleep and returns the error so callers
// can decide whether to surface it.
func checkAndWaitForRateLimit(verbose bool) error {
	rl, err := fetchRateLimit()
	if err != nil {
		// Best-effort: fall back to static cooldown so the caller can continue.
		logsRateLimitLog.Printf("Could not fetch rate limit, using static cooldown: %v", err)
		time.Sleep(APICallCooldown)
		return err
	}

	if rl.Remaining <= RateLimitThreshold {
		resetAt := time.Unix(rl.Reset, 0)
		waitDur := time.Until(resetAt)
		if waitDur <= 0 {
			// Reset has already passed; apply minimal cooldown and carry on.
			logsRateLimitLog.Print("Rate limit reset has already passed, applying minimal cooldown")
			time.Sleep(APICallCooldown)
			return nil
		}

		// Add a small buffer so we don't resume right on the boundary.
		waitDur += rateLimitResetBuffer

		msg := fmt.Sprintf(
			"GitHub API rate limit nearly exhausted (%d of %d requests remaining). Waiting %.0f seconds until reset at %s",
			rl.Remaining, rl.Limit, waitDur.Seconds(), resetAt.UTC().Format(time.RFC3339),
		)
		fmt.Fprintln(os.Stderr, console.FormatWarningMessage(msg))
		logsRateLimitLog.Printf("Sleeping for rate limit reset: duration=%s", waitDur)
		time.Sleep(waitDur)
		return nil
```
checkAndWaitForRateLimit can sleep until the GitHub rate-limit reset, which may be many minutes, but it does not observe caller cancellation (context) during time.Sleep. This can make Ctrl-C / context cancellation unresponsive; consider accepting a context.Context and using a timer + select on ctx.Done() while waiting.
```go
	if rlErr := checkAndWaitForRateLimit(verbose); rlErr != nil {
		logsOrchestratorLog.Printf("Rate limit check failed (using static cooldown): %v", rlErr)
	}
```
The loop’s rate-limit wait can now block for a long time and currently can’t be interrupted via ctx because checkAndWaitForRateLimit doesn’t take a context. Passing ctx through (and making the wait context-aware) would keep cancellation/timeout behavior consistent with the rest of DownloadWorkflowLogs.
Suggested change:

```diff
-	if rlErr := checkAndWaitForRateLimit(verbose); rlErr != nil {
-		logsOrchestratorLog.Printf("Rate limit check failed (using static cooldown): %v", rlErr)
-	}
+	rateLimitWaitCanceled := false
+	rlDone := make(chan error, 1)
+	go func() {
+		rlDone <- checkAndWaitForRateLimit(verbose)
+	}()
+	select {
+	case <-ctx.Done():
+		rateLimitWaitCanceled = true
+	case rlErr := <-rlDone:
+		if rlErr != nil {
+			logsOrchestratorLog.Printf("Rate limit check failed (using static cooldown): %v", rlErr)
+		}
+	}
+	if rateLimitWaitCanceled {
+		if verbose {
+			fmt.Fprintln(os.Stderr, console.FormatWarningMessage("Context canceled while waiting for rate limit, stopping download"))
+		}
+		break
+	}
```
Log downloads and workflow-run listing draw from the GitHub API core quota. The previous approach used a fixed 500 ms sleep between iterations with no awareness of the actual remaining budget, leaving runs at risk of hitting 403/429 errors mid-download.
Changes
- `pkg/cli/logs_rate_limit.go` (new) — rate-limit helper:
  - `fetchRateLimit()` — calls `gh api rate_limit` and returns the parsed `core` resource bucket
  - `checkAndWaitForRateLimit(verbose bool)` — checks remaining vs. `RateLimitThreshold`; if exhausted, sleeps until `reset + 2s` buffer and emits a warning; otherwise falls back to `APICallCooldown`; on API failure, falls back to `APICallCooldown` and surfaces the error
- `pkg/cli/logs_models.go` — two new constants:
  - `RateLimitThreshold = 10` — minimum remaining requests before we wait
  - `rateLimitResetBuffer = 2s` — cushion added on top of the reset timestamp
- `pkg/cli/logs_orchestrator.go` — replaces the bare `time.Sleep(APICallCooldown)` between iterations with `checkAndWaitForRateLimit`, preserving the minimum inter-call spacing while adding adaptive waiting when the budget is low
- `pkg/cli/logs_rate_limit_test.go` (new) — unit tests for JSON unmarshalling, constant sanity, and threshold boundary conditions