
Harden gallery-agent Hugging Face fetches against transient rate limiting #10187

Merged: mudler merged 3 commits into master from copilot/fix-gallery-agent-workflow on Jun 5, 2026

Copilot AI commented Jun 5, 2026

Contributor

Gallery Agent runs were failing in the Run gallery agent step whenever Hugging Face returned a transient 429: the agent exited on the first throttled response, before any gallery logic could run. This change makes model discovery resilient to upstream throttling while preserving fast failure for non-retryable client errors.

  • Hugging Face client retry policy (pkg/huggingface-api/client.go)

    • Added bounded retry loop for SearchModels on transient failures:
      • retries on 429 and 5xx
      • exponential backoff with cap
      • Retry-After support (both delta-seconds and HTTP-date)
    • Added typed sentinel error:
      • ErrRateLimited for persistent 429 exhaustion
    • Kept non-retryable 4xx behavior fail-fast (no retry noise on bad requests).
  • Gallery agent behavior on persistent throttling (.github/gallery-agent/main.go)

    • GetTrending failures classified as ErrRateLimited now result in a graceful skip:
      • write summary with ModelsAdded: 0
      • exit without failing the run
    • Other errors still fail the step normally.
  • Coverage for new behavior (pkg/huggingface-api/client_test.go)

    • Added tests for:
      • retry + Retry-After handling on 429
      • fast-fail for non-retryable 4xx
      • persistent 429 returning ErrRateLimited.
if errors.Is(err, hfapi.ErrRateLimited) {
    fmt.Printf("HuggingFace API is rate limited after retries, skipping this run: %v\n", err)
    writeSummary(AddedModelSummary{
        SearchTerm:   searchTerm,
        ModelsAdded:  0,
        Quantization: quantization,
    })
    return
}
Original prompt

Fix the failing GitHub Actions job for the Gallery Agent workflow in mudler/LocalAI.

Context:

  • Failing workflow run: 27027765104
  • Failing job: 79772009990
  • Workflow file: .github/workflows/gallery-agent.yaml
  • Commit where the failure was observed: e837921c2cd49ed91359e7b60dd277bc587148ec
  • The failure occurs in the Run gallery agent step, which runs go run ./.github/gallery-agent with GALLERY_INDEX_PATH=$PWD/gallery/index.yaml.

Observed logs:

  • The decisive failure is:
    • Loaded 1427 existing gallery entries
    • Searching for trending models on HuggingFace...
    • Error fetching models: failed to fetch models. Status code: 429
    • exit status 1
  • There is also an earlier cache restore warning from /usr/bin/tar reporting many Cannot open: File exists messages while restoring the Go toolchain cache, but the job continues past that, prints go version go1.26.0 linux/amd64, and only fails later in the gallery-agent execution. Treat the 429 from Hugging Face as the root cause of the job failure.

Goal:
Make the gallery agent resilient to transient Hugging Face rate limiting so scheduled runs do not fail just because the upstream API returns HTTP 429.

Requirements:

  1. Find the gallery agent code responsible for fetching trending models from Hugging Face.
  2. Implement retry logic for transient HTTP failures, especially 429 responses.
  3. Honor Retry-After when present.
  4. Use exponential backoff with a sensible cap for retries.
  5. Ensure non-retryable 4xx errors still fail fast.
  6. Prefer a solution that keeps the workflow green for temporary upstream throttling during scheduled runs. If appropriate based on the code structure, treat persistent 429/rate-limit conditions as a no-op/skip rather than a hard failure, but do not hide genuine logic bugs.
  7. Update or add tests if the repository has an appropriate test location/pattern for this code.
  8. Keep the change focused; avoid unrelated workflow churn.

Acceptance criteria:

  • The gallery agent no longer fails immediately on a single HTTP 429 from Hugging Face.
  • Retries/backoff are implemented in code rather than by blindly rerunning the whole workflow.
  • The resulting PR explains the root cause and the chosen mitigation.

Please open a pull request with the fix against master.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Please find a solution for this failing job. Use the logs, job definition, and any referenced files where the failure occurred. Keep your response focused on the solution and include code suggestions when appropriate.

Assistant: The failure is in the Run gallery agent step of .github/workflows/gallery-agent.yaml at gallery-agent.yaml#L126-L136, where the workflow executes:

- name: Run gallery agent
  env:
    SEARCH_TERM: ${{ github.event.inputs.search_term || 'GGUF' }}
    LIMIT: ${{ github.event.inputs.limit || '15' }}
    QUANTIZATION: ${{ github.event.inputs.quantization || 'Q4_K_M' }}
    MAX_MODELS: ${{ github.event.inputs.max_models || '1' }}
    EXTRA_SKIP_URLS: ${{ steps.open_prs.outputs.urls }}
  run: |
    export GALLERY_INDEX_PATH=$PWD/gallery/index.yaml
    go run ./.github/gallery-agent

Root cause

The actual error in the logs is:

Loaded 1427 existing gallery entries
Searching for trending models on HuggingFace...
Error fetching models: failed to fetch models. Status code: 429
exit status 1

A 429 means the Hugging Face API rate-limited the job.
So this is not a codegen/protobuf/setup-go issue, and not the tar cache warning either — that restore warning is noisy but non-fatal. The job fails because the gallery agent exits immediately on a temporary upstream rate limit.

Solution

Make the gallery agent resilient to transient 429s by adding:

  1. retry with exponential backoff
  2. special handling for Retry-After
  3. non-fatal behavior when no data can be fetched after retries for scheduled runs

Recommended code change

In the code that fetches Hugging Face models, wrap the HTTP call with retries. The exact file wasn’t included in the retrieved context, but the failing code is the Hugging Face model fetch path used by go run ./.github/gallery-agent.

Use a pattern like this:

package main

import (
	"errors"
	"fmt"
	"io"
	"math"
	"net/http"
	"strconv"
	"time"
)

func fetchWithRetry(url string) ([]byte, error) {
	client := &http.Client{
		Timeout: 30 * time.Second,
	}

	var lastErr error
	maxAttempts := 6

	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := client.Get(url)
		if err != nil {
			lastErr = err
		} else {
			defer resp.Body.Close()

			if resp.StatusCode >= 200 && resp.StatusCode < 300 {
				return io.ReadAll(resp.Body)
			}

			if resp.StatusCode == http.StatusTooManyRequests {
				wait := retryDelay(res...


This pull request was created from Copilot chat.


Copilot AI changed the title from "[WIP] Fix failing GitHub Actions job for Gallery Agent workflow" to "Harden gallery-agent Hugging Face fetches against transient rate limiting" Jun 5, 2026
Copilot finished work on behalf of mudler June 5, 2026 17:27
Copilot AI requested a review from mudler June 5, 2026 17:27
@mudler mudler marked this pull request as ready for review June 5, 2026 19:02
Comment thread pkg/huggingface-api/client.go Fixed
Comment thread pkg/huggingface-api/client.go Fixed

mudler commented Jun 5, 2026

Owner

@copilot fix linting issues

Copilot AI commented Jun 5, 2026

Contributor Author

@copilot fix linting issues

Fixed the lint failures in commit b098cc0a by handling resp.Body.Close() errors in pkg/huggingface-api/client.go and checking the w.Write error in pkg/huggingface-api/client_test.go. I also re-ran focused tests and golangci-lint for pkg/huggingface-api/....

Copilot finished work on behalf of mudler June 5, 2026 20:26
@mudler mudler merged commit 352b7ec into master Jun 5, 2026
58 checks passed
@mudler mudler deleted the copilot/fix-gallery-agent-workflow branch June 5, 2026 21:43
@localai-bot localai-bot added the enhancement New feature or request label Jun 10, 2026