Harden gallery-agent Hugging Face fetches against transient rate limiting #10187
Merged
Copilot AI changed the title from "[WIP] Fix failing GitHub Actions job for Gallery Agent workflow" to "Harden gallery-agent Hugging Face fetches against transient rate limiting" on Jun 5, 2026.
Owner: @copilot fix linting issues
Contributor (Author): Fixed the lint failures in commit
Gallery Agent runs were failing in `Run gallery agent` when Hugging Face returned a transient `429`, causing the workflow to exit before any gallery logic could continue. This change makes model discovery resilient to upstream throttling while preserving fast failure for non-retryable client errors.

Hugging Face client retry policy (`pkg/huggingface-api/client.go`)
- Retries `SearchModels` on transient failures: `429` and `5xx`
- `Retry-After` support (both delta-seconds and HTTP-date)
- `ErrRateLimited` for persistent 429 exhaustion
- `4xx` behavior stays fail-fast (no retry noise on bad requests)

Gallery agent behavior on persistent throttling (`.github/gallery-agent/main.go`)
- `GetTrending` failures classified as `ErrRateLimited` now result in a graceful skip:
  - `ModelsAdded: 0`

Coverage for new behavior (`pkg/huggingface-api/client_test.go`)
- `Retry-After` handling on `429`
- `4xx`
- `ErrRateLimited`

Original prompt
Fix the failing GitHub Actions job for the Gallery Agent workflow in `mudler/LocalAI`.

Context:
- 2702776510479772009990
- `.github/workflows/gallery-agent.yaml`
- `e837921c2cd49ed91359e7b60dd277bc587148ec`
- `Run gallery agent` step, which runs `go run ./.github/gallery-agent` with `GALLERY_INDEX_PATH=$PWD/gallery/index.yaml`.

Observed logs:
- `Loaded 1427 existing gallery entries`
- `Searching for trending models on HuggingFace...`
- `Error fetching models: failed to fetch models. Status code: 429`
- `exit status 1`

The logs also show `/usr/bin/tar` reporting many `Cannot open: File exists` messages while restoring the Go toolchain cache, but the job continues past that, prints `go version go1.26.0 linux/amd64`, and only fails later in the gallery-agent execution. Treat the 429 from Hugging Face as the root cause of the job failure.

Goal:
Make the gallery agent resilient to transient Hugging Face rate limiting so scheduled runs do not fail just because the upstream API returns HTTP 429.

Requirements:
- `Retry-After` when present.

Acceptance criteria:

Please open a pull request with the fix against `master`.

The following is the prior conversation context from the user's chat exploration (may be truncated):
User: Please find a solution for this failing job. Use the logs, job definition, and any referenced files where the failure occurred. Keep your response focused on the solution and include code suggestions when appropriate.
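Assistant: The failure is in the `Run gallery agent` step of `.github/workflows/gallery-agent.yaml` at `gallery-agent.yaml#L126-L136`, where the workflow executes `go run ./.github/gallery-agent`.

Root cause
The actual error in the logs is `Error fetching models: failed to fetch models. Status code: 429`.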
A 429 means the Hugging Face API rate-limited the job.
So this is not a codegen/protobuf/setup-go issue, and not the tar cache warning either — that restore warning is noisy but non-fatal. The job fails because the gallery agent exits immediately on a temporary upstream rate limit.
Solution
Make the gallery agent resilient to transient 429s by adding:
- `Retry-After`

Recommended code change
In the code that fetches Hugging Face models, wrap the HTTP call with retries. The exact file wasn’t included in the retrieved context, but the failing code is the Hugging Face model fetch path used by `go run ./.github/gallery-agent`.

Use a pattern like this: