
feat(logs): query GitHub API rate limit before each batch iteration#24424

Merged
pelikhan merged 3 commits into main from copilot/add-rate-limit-helper-to-logs-command
Apr 4, 2026

Conversation

Contributor

Copilot AI commented Apr 4, 2026

Log downloads and workflow-run listing draw from the GitHub API core quota. The previous approach used a fixed 500 ms sleep between iterations with no awareness of the actual remaining budget, leaving runs at risk of hitting 403/429 errors mid-download.

Changes

  • pkg/cli/logs_rate_limit.go (new) — rate-limit helper

    • fetchRateLimit() — calls gh api rate_limit and returns the parsed core resource bucket
    • checkAndWaitForRateLimit(verbose bool) — checks remaining against RateLimitThreshold; if the budget is at or below the threshold, sleeps until the reset timestamp plus a 2 s buffer and emits a warning; otherwise applies the usual APICallCooldown; on API failure, falls back to APICallCooldown and surfaces the error
  • pkg/cli/logs_models.go — two new constants:

    • RateLimitThreshold = 10 — minimum remaining requests before we wait
    • rateLimitResetBuffer = 2s — cushion added on top of the reset timestamp
  • pkg/cli/logs_orchestrator.go — replaces the bare time.Sleep(APICallCooldown) between iterations with checkAndWaitForRateLimit, preserving the minimum inter-call spacing while adding adaptive waiting when the budget is low

  • pkg/cli/logs_rate_limit_test.go (new) — unit tests for JSON unmarshalling, constant sanity, and threshold boundary conditions

// Before (static sleep, no rate-limit awareness)
if iteration > 0 {
    time.Sleep(APICallCooldown)
}

// After (queries actual quota, waits for reset if needed)
if iteration > 0 {
    if rlErr := checkAndWaitForRateLimit(verbose); rlErr != nil {
        logsOrchestratorLog.Printf("Rate limit check failed (using static cooldown): %v", rlErr)
    }
}
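The helper's parsing side can be sketched as follows. The struct names rateLimitResponse and rateLimitResource come from the PR; the exact field layout below is an assumption that mirrors GitHub's `GET /rate_limit` response shape rather than a copy of the shipped code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rateLimitResource mirrors one bucket (e.g. "core") of the rate_limit payload.
type rateLimitResource struct {
	Limit     int   `json:"limit"`
	Remaining int   `json:"remaining"`
	Reset     int64 `json:"reset"` // Unix timestamp at which the quota refills
}

// rateLimitResponse wraps the per-resource buckets; only "core" matters here.
type rateLimitResponse struct {
	Resources struct {
		Core rateLimitResource `json:"core"`
	} `json:"resources"`
}

// parseCore extracts the core bucket from a raw rate_limit JSON payload.
func parseCore(raw []byte) (rateLimitResource, error) {
	var resp rateLimitResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return rateLimitResource{}, fmt.Errorf("failed to parse rate limit response: %w", err)
	}
	return resp.Resources.Core, nil
}

func main() {
	sample := []byte(`{"resources":{"core":{"limit":5000,"remaining":4987,"reset":1712198400}}}`)
	core, err := parseCore(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("core: %d of %d remaining\n", core.Remaining, core.Limit)
}
```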

Copilot AI and others added 2 commits April 4, 2026 02:38
Add checkAndWaitForRateLimit to logs_rate_limit.go which queries
`gh api rate_limit` before each batch iteration.  When the core
remaining budget falls at or below RateLimitThreshold (10) the helper
sleeps until the reset window plus a small buffer; otherwise it applies
the existing APICallCooldown.  Replace the bare time.Sleep in the logs
orchestrator with calls to this helper.

Add RateLimitThreshold and rateLimitResetBuffer constants to
logs_models.go, and unit tests in logs_rate_limit_test.go.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/b0580c30-d803-450d-a8e6-77a0aadce917

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Change spinner message from 'Checking rate limit...' to 'Verifying API quota...'
- Clarify warning message to 'X of Y requests remaining' instead of 'X/Y remaining'

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/b0580c30-d803-450d-a8e6-77a0aadce917

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan marked this pull request as ready for review April 4, 2026 02:50
Copilot AI review requested due to automatic review settings April 4, 2026 02:50
@pelikhan pelikhan merged commit dd0eadb into main Apr 4, 2026
51 checks passed
@pelikhan pelikhan deleted the copilot/add-rate-limit-helper-to-logs-command branch April 4, 2026 02:53
Contributor

Copilot AI left a comment


Pull request overview

Adds adaptive GitHub API rate-limit awareness to the logs download batching loop to reduce the likelihood of 403/429 errors during long runs.

Changes:

  • Introduces a new rate-limit helper that queries gh api rate_limit and waits until reset when the core quota is low.
  • Adds rate-limit threshold and reset buffer constants alongside existing iteration/batch constants.
  • Updates the logs orchestrator to use the helper instead of a fixed sleep; adds basic unit tests for rate-limit JSON parsing and constant sanity.
Summary per file

  • pkg/cli/logs_rate_limit.go — New helper to query the core rate limit and sleep adaptively between batch iterations
  • pkg/cli/logs_models.go — Adds the RateLimitThreshold and rateLimitResetBuffer constants used by the helper
  • pkg/cli/logs_orchestrator.go — Replaces the fixed time.Sleep(APICallCooldown) with checkAndWaitForRateLimit between iterations
  • pkg/cli/logs_rate_limit_test.go — Adds unit tests for JSON unmarshalling and simple threshold/constant checks

Copilot's findings


Comments suppressed due to low confidence (1)

pkg/cli/logs_rate_limit.go:122

  • The new checkAndWaitForRateLimit behavior (fallback on fetch error, threshold wait until reset, and verbose/non-verbose messaging) isn’t exercised by unit tests—current tests only cover JSON unmarshalling and a boolean boundary expression. Consider making the gh-call and sleep injectable (e.g., package-level function vars) so tests can validate the wait/fallback paths without real network calls or long sleeps.
func checkAndWaitForRateLimit(verbose bool) error {
	rl, err := fetchRateLimit()
	if err != nil {
		// Best-effort: fall back to static cooldown so the caller can continue.
		logsRateLimitLog.Printf("Could not fetch rate limit, using static cooldown: %v", err)
		time.Sleep(APICallCooldown)
		return err
	}

	if rl.Remaining <= RateLimitThreshold {
		resetAt := time.Unix(rl.Reset, 0)
		waitDur := time.Until(resetAt)
		if waitDur <= 0 {
			// Reset has already passed; apply minimal cooldown and carry on.
			logsRateLimitLog.Print("Rate limit reset has already passed, applying minimal cooldown")
			time.Sleep(APICallCooldown)
			return nil
		}

		// Add a small buffer so we don't resume right on the boundary.
		waitDur += rateLimitResetBuffer

		msg := fmt.Sprintf(
			"GitHub API rate limit nearly exhausted (%d of %d requests remaining). Waiting %.0f seconds until reset at %s",
			rl.Remaining, rl.Limit, waitDur.Seconds(), resetAt.UTC().Format(time.RFC3339),
		)
		fmt.Fprintln(os.Stderr, console.FormatWarningMessage(msg))
		logsRateLimitLog.Printf("Sleeping for rate limit reset: duration=%s", waitDur)
		time.Sleep(waitDur)
		return nil
	}

	if verbose {
		fmt.Fprintln(os.Stderr, console.FormatVerboseMessage(
			fmt.Sprintf("Rate limit OK: %d/%d requests remaining", rl.Remaining, rl.Limit),
		))
	}

	// Even when budget is healthy, apply the minimum inter-call cooldown.
	time.Sleep(APICallCooldown)
	return nil
  • Files reviewed: 4/4 changed files
  • Comments generated: 3

Comment on lines +51 to +61
logsRateLimitLog.Print("Querying GitHub API rate limit")

output, err := workflow.RunGHCombined("Verifying API quota...", "api", "rate_limit")
if err != nil {
	return rateLimitResource{}, fmt.Errorf("failed to query rate limit: %w", err)
}

var resp rateLimitResponse
if err := json.Unmarshal(output, &resp); err != nil {
	return rateLimitResource{}, fmt.Errorf("failed to parse rate limit response: %w", err)
}

Copilot AI Apr 4, 2026


fetchRateLimit uses workflow.RunGHCombined, which returns combined stdout+stderr. If gh emits any warnings on stderr while still succeeding, json.Unmarshal can fail on otherwise valid JSON. Prefer workflow.RunGH (stdout-only) for JSON parsing and rely on enrichGHError for stderr details on failure.

This issue also appears on line 82 of the same file.

Copilot uses AI. Check for mistakes.
Comment on lines +73 to +111
// checkAndWaitForRateLimit queries the GitHub API rate limit and sleeps until
// the reset window when the remaining core request budget falls at or below
// RateLimitThreshold. It always waits at least APICallCooldown between
// successive calls so that even when requests are plentiful the orchestrator
// does not hammer the API.
//
// If the rate limit cannot be fetched (e.g. network error) the function falls
// back to the static APICallCooldown sleep and returns the error so callers
// can decide whether to surface it.
func checkAndWaitForRateLimit(verbose bool) error {
	rl, err := fetchRateLimit()
	if err != nil {
		// Best-effort: fall back to static cooldown so the caller can continue.
		logsRateLimitLog.Printf("Could not fetch rate limit, using static cooldown: %v", err)
		time.Sleep(APICallCooldown)
		return err
	}

	if rl.Remaining <= RateLimitThreshold {
		resetAt := time.Unix(rl.Reset, 0)
		waitDur := time.Until(resetAt)
		if waitDur <= 0 {
			// Reset has already passed; apply minimal cooldown and carry on.
			logsRateLimitLog.Print("Rate limit reset has already passed, applying minimal cooldown")
			time.Sleep(APICallCooldown)
			return nil
		}

		// Add a small buffer so we don't resume right on the boundary.
		waitDur += rateLimitResetBuffer

		msg := fmt.Sprintf(
			"GitHub API rate limit nearly exhausted (%d of %d requests remaining). Waiting %.0f seconds until reset at %s",
			rl.Remaining, rl.Limit, waitDur.Seconds(), resetAt.UTC().Format(time.RFC3339),
		)
		fmt.Fprintln(os.Stderr, console.FormatWarningMessage(msg))
		logsRateLimitLog.Printf("Sleeping for rate limit reset: duration=%s", waitDur)
		time.Sleep(waitDur)
		return nil

Copilot AI Apr 4, 2026


checkAndWaitForRateLimit can sleep until the GitHub rate-limit reset, which may be many minutes, but it does not observe caller cancellation (context) during time.Sleep. This can make Ctrl-C / context cancellation unresponsive; consider accepting a context.Context and using a timer + select on ctx.Done() while waiting.

Comment on lines +120 to +122
if rlErr := checkAndWaitForRateLimit(verbose); rlErr != nil {
	logsOrchestratorLog.Printf("Rate limit check failed (using static cooldown): %v", rlErr)
}

Copilot AI Apr 4, 2026


The loop’s rate-limit wait can now block for a long time and currently can’t be interrupted via ctx because checkAndWaitForRateLimit doesn’t take a context. Passing ctx through (and making the wait context-aware) would keep cancellation/timeout behavior consistent with the rest of DownloadWorkflowLogs.

Suggested change
if rlErr := checkAndWaitForRateLimit(verbose); rlErr != nil {
	logsOrchestratorLog.Printf("Rate limit check failed (using static cooldown): %v", rlErr)
}
rateLimitWaitCanceled := false
rlDone := make(chan error, 1)
go func() {
	rlDone <- checkAndWaitForRateLimit(verbose)
}()
select {
case <-ctx.Done():
	rateLimitWaitCanceled = true
case rlErr := <-rlDone:
	if rlErr != nil {
		logsOrchestratorLog.Printf("Rate limit check failed (using static cooldown): %v", rlErr)
	}
}
if rateLimitWaitCanceled {
	if verbose {
		fmt.Fprintln(os.Stderr, console.FormatWarningMessage("Context canceled while waiting for rate limit, stopping download"))
	}
	break
}
