Harden gallery-agent Hugging Face fetches against transient rate limiting by Copilot · Pull Request #10187 · mudler/LocalAI

Copilot · 2026-06-05T17:22:31Z

Gallery Agent runs were failing in the Run gallery agent step when Hugging Face returned a transient 429, which made the workflow exit before any gallery logic could run. This change makes model discovery resilient to upstream throttling while preserving fast failure for non-retryable client errors.

Hugging Face client retry policy (pkg/huggingface-api/client.go)
- Added bounded retry loop for SearchModels on transient failures:
  - retries on 429 and 5xx
  - exponential backoff with cap
  - Retry-After support (both delta-seconds and HTTP-date)
- Added typed sentinel error:
  - ErrRateLimited for persistent 429 exhaustion
- Kept non-retryable 4xx behavior fail-fast (no retry noise on bad requests).

Gallery agent behavior on persistent throttling (.github/gallery-agent/main.go)
- GetTrending failures classified as ErrRateLimited now result in a graceful skip:
  - write summary with ModelsAdded: 0
  - exit without failing the run
- Other errors still fail the step normally.

Coverage for new behavior (pkg/huggingface-api/client_test.go)
- Added tests for:
  - retry + Retry-After handling on 429
  - fast-fail for non-retryable 4xx
  - persistent 429 returning ErrRateLimited.

if errors.Is(err, hfapi.ErrRateLimited) {
    fmt.Printf("HuggingFace API is rate limited after retries, skipping this run: %v\n", err)
    writeSummary(AddedModelSummary{
        SearchTerm:   searchTerm,
        ModelsAdded:  0,
        Quantization: quantization,
    })
    return
}

Original prompt

Fix the failing GitHub Actions job for the Gallery Agent workflow in mudler/LocalAI.

Context:

- Failing workflow run: 27027765104
- Failing job: 79772009990
- Workflow file: .github/workflows/gallery-agent.yaml
- Commit where the failure was observed: e837921c2cd49ed91359e7b60dd277bc587148ec
- The failure occurs in the Run gallery agent step, which runs go run ./.github/gallery-agent with GALLERY_INDEX_PATH=$PWD/gallery/index.yaml.

Observed logs:

The decisive failure is:
- Loaded 1427 existing gallery entries
- Searching for trending models on HuggingFace...
- Error fetching models: failed to fetch models. Status code: 429
- exit status 1

There is also an earlier cache restore warning from /usr/bin/tar reporting many Cannot open: File exists messages while restoring the Go toolchain cache, but the job continues past that, prints go version go1.26.0 linux/amd64, and only fails later in the gallery-agent execution. Treat the 429 from Hugging Face as the root cause of the job failure.

Goal:
Make the gallery agent resilient to transient Hugging Face rate limiting so scheduled runs do not fail just because the upstream API returns HTTP 429.

Requirements:

- Find the gallery agent code responsible for fetching trending models from Hugging Face.
- Implement retry logic for transient HTTP failures, especially 429 responses.
- Honor Retry-After when present.
- Use exponential backoff with a sensible cap for retries.
- Ensure non-retryable 4xx errors still fail fast.
- Prefer a solution that keeps the workflow green for temporary upstream throttling during scheduled runs. If appropriate based on the code structure, treat persistent 429/rate-limit conditions as a no-op/skip rather than a hard failure, but do not hide genuine logic bugs.
- Update or add tests if the repository has an appropriate test location/pattern for this code.
- Keep the change focused; avoid unrelated workflow churn.

Acceptance criteria:

- The gallery agent no longer fails immediately on a single HTTP 429 from Hugging Face.
- Retries/backoff are implemented in code rather than by blindly rerunning the whole workflow.
- The resulting PR explains the root cause and the chosen mitigation.

Please open a pull request with the fix against master.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Please find a solution for this failing job. Use the logs, job definition, and any referenced files where the failure occurred. Keep your response focused on the solution and include code suggestions when appropriate.

Assistant: The failure is in the Run gallery agent step of .github/workflows/gallery-agent.yaml at gallery-agent.yaml#L126-L136, where the workflow executes:

- name: Run gallery agent
  env:
    SEARCH_TERM: ${{ github.event.inputs.search_term || 'GGUF' }}
    LIMIT: ${{ github.event.inputs.limit || '15' }}
    QUANTIZATION: ${{ github.event.inputs.quantization || 'Q4_K_M' }}
    MAX_MODELS: ${{ github.event.inputs.max_models || '1' }}
    EXTRA_SKIP_URLS: ${{ steps.open_prs.outputs.urls }}
  run: |
    export GALLERY_INDEX_PATH=$PWD/gallery/index.yaml
    go run ./.github/gallery-agent

Root cause

The actual error in the logs is:

Loaded 1427 existing gallery entries
Searching for trending models on HuggingFace...
Error fetching models: failed to fetch models. Status code: 429
exit status 1

A 429 means the Hugging Face API rate-limited the job.
So this is not a codegen/protobuf/setup-go issue, and it is not the tar cache warning either: that restore warning is noisy but non-fatal. The job fails because the gallery agent exits immediately on a temporary upstream rate limit.

Solution

Make the gallery agent resilient to transient 429s by adding:

- retry with exponential backoff
- special handling for Retry-After
- non-fatal behavior when no data can be fetched after retries for scheduled runs

Recommended code change

In the code that fetches Hugging Face models, wrap the HTTP call with retries. The exact file wasn’t included in the retrieved context, but the failing code is the Hugging Face model fetch path used by go run ./.github/gallery-agent.

Use a pattern like this:

package main

import (
	"errors"
	"fmt"
	"io"
	"math"
	"net/http"
	"strconv"
	"time"
)

func fetchWithRetry(url string) ([]byte, error) {
	client := &http.Client{
		Timeout: 30 * time.Second,
	}

	var lastErr error
	maxAttempts := 6

	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := client.Get(url)
		if err != nil {
			lastErr = err
		} else {
			defer resp.Body.Close()

			if resp.StatusCode >= 200 && resp.StatusCode < 300 {
				return io.ReadAll(resp.Body)
			}

			if resp.StatusCode == http.StatusTooManyRequests {
				wait := retryDelay(res...

</details>




*This pull request was created from Copilot chat.*

mudler · 2026-06-05T20:18:21Z

@copilot fix linting issues

Copilot · 2026-06-05T20:25:57Z

> @copilot fix linting issues

Fixed the lint failures in commit b098cc0a by handling resp.Body.Close() errors in pkg/huggingface-api/client.go and checking the w.Write error in pkg/huggingface-api/client_test.go. I also re-ran focused tests and golangci-lint for pkg/huggingface-api/....

Initial plan

376ff1b

Copilot AI assigned Copilot and mudler Jun 5, 2026

Copilot started work on behalf of mudler June 5, 2026 17:22 View session

fix: retry HuggingFace trending fetch on transient rate limits

b61a974

Copilot AI changed the title ~~[WIP] Fix failing GitHub Actions job for Gallery Agent workflow~~ Harden gallery-agent Hugging Face fetches against transient rate limiting Jun 5, 2026

Copilot finished work on behalf of mudler June 5, 2026 17:27

Copilot AI requested a review from mudler June 5, 2026 17:27

mudler marked this pull request as ready for review June 5, 2026 19:02

github-advanced-security AI found potential problems Jun 5, 2026


Comment thread pkg/huggingface-api/client.go Fixed

Comment thread pkg/huggingface-api/client.go Fixed

Copilot started work on behalf of mudler June 5, 2026 20:18 View session

fix: handle body close/write errors in huggingface retry paths

b098cc0

Copilot finished work on behalf of mudler June 5, 2026 20:26

mudler merged commit 352b7ec into master Jun 5, 2026
58 checks passed

mudler deleted the copilot/fix-gallery-agent-workflow branch June 5, 2026 21:43

localai-bot added the enhancement (New feature or request) label Jun 10, 2026

BrewTestBot mentioned this pull request Jun 10, 2026

localai 4.4.0 Homebrew/homebrew-core#287347

Merged

mudler merged 3 commits into master from copilot/fix-gallery-agent-workflow

4 participants
