Skip to content

[Agent] Dynamic agent harness registry#4325

Merged
hanouticelina merged 8 commits into
mainfrom
fetch-agent-harnesses-from-hub
Jun 11, 2026
Merged

[Agent] Dynamic agent harness registry#4325
hanouticelina merged 8 commits into
mainfrom
fetch-agent-harnesses-from-hub

Conversation

@Wauplin

@Wauplin Wauplin commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Follow-up to huggingface/huggingface.js#2209 (registry in @huggingface/tasks) and the new GET /api/agent-harnesses endpoint (see https://github.com/huggingface-internal/moon-landing/pull/18454).

Until now, agent harness detection relied on a hardcoded list in _detect_agent.py. This moves the source of truth to the Hub so the list can be updated without a client release.

With this PR:

  • detect_agent now fetches the registry from /api/agent-harnesses, at most once a day, cached on disk (new AGENT_HARNESSES_PATH, same spirit as CHECK_FOR_UPDATE_DONE_PATH). The result is also cached in-process so hot paths (is_agent() for ANSI coloring / CLI output mode) don't pay repeatedly.
  • Implements the envVars matching from the registry: "*" (set to any value) and exact value match.
  • No hardcoded fallback: detection is best-effort. If the registry can't be fetched (offline, HTTP error, …) and there's no cached copy, it just reports "no agent". Errors are ignored: detection never makes a process fail.

🤖 Generated with Claude Code


Note

Low Risk
Best-effort telemetry with short HTTP timeout, graceful degradation when offline or on errors, and no change to auth or data paths; main behavioral shift is no hardcoded fallback when the registry is unavailable.

Overview
Agent harness detection no longer uses a hardcoded list in _detect_agent.py. It loads harness definitions from the Hub GET /api/agent-harnesses, with a 24-hour on-disk cache at AGENT_HARNESSES_PATH and an in-process cache so repeated is_agent() / CLI output checks stay cheap.

Matching follows the registry: per-harness envVars with "*" (any non-empty value) or exact values, plus standard AI_AGENT / AGENT ids (case-insensitive for unknown vs known). First match wins by registry order. If fetch fails and there is no cache, detection reports no agent; errors are swallowed so detection never breaks the process. Offline mode skips the fetch.

Tests pin an empty registry in conftest, add test_utils_detect_agent.py for detection and cache behavior, and adjust CLI output tests to override the registry when simulating agents.

Reviewed by Cursor Bugbot for commit c087fb5. Bugbot is set up for automated code reviews on this repo. Configure here.

Move the source of truth for AI agent harness detection from a hardcoded
list to the Hub's `/api/agent-harnesses` endpoint, fetched at most once a
day and cached locally. No hardcoded fallback: detection is best-effort and
degrades to "no agent" when the registry can't be fetched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread tests/conftest.py
Comment thread src/huggingface_hub/utils/_detect_agent.py
@Wauplin Wauplin marked this pull request as draft June 8, 2026 13:27
@bot-ci-comment

bot-ci-comment Bot commented Jun 8, 2026

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

…x agent test

- detect_agent: use `or [...]` instead of `.get(default)` so explicit
  `null` values in the (cached/fetched) registry degrade gracefully
  instead of raising.
- test_auto_resolves_to_agent: pin a registry that recognizes AI_AGENT
  (conftest pins an empty registry by default).
- Add a test covering the malformed-registry (null values) case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# No harness matched but a standard var is set => unrecognized agent.
for var in standard_vars:
if os.environ.get(var, "").strip():
return "unknown"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought this could potentially be the name of the harness (name if name in _KNOWN_AGENTS else "unknown") but not sure if this is actually used in real life

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I'll keep the name if name in _KNOWN_AGENTS else "unknown" logic which preexisted this PR. I did not review yet^^

Wauplin and others added 2 commits June 8, 2026 16:24
Add `Registry` / `HarnessInfo` TypedDicts and type the registry
functions accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/huggingface_hub/utils/_detect_agent.py
Comment thread src/huggingface_hub/utils/_detect_agent.py
@Wauplin Wauplin changed the title [Detection] Fetch agent harness registry from the Hub [Agent] Fetch agent harness registry from the Hub Jun 8, 2026
@Wauplin Wauplin changed the title [Agent] Fetch agent harness registry from the Hub [Agent] Dynamic agent harness registry Jun 8, 2026
@Wauplin Wauplin marked this pull request as ready for review June 8, 2026 15:08
@Wauplin Wauplin requested a review from hanouticelina June 8, 2026 15:49

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit c087fb5. Configure here.

if value := os.environ.get(var, "").strip().lower():
if value in lowercased_harnesses:
return value
return "unknown"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Malformed registry crashes detection

Medium Severity

detect_agent assumes the cached or fetched registry is always a dict with a dict-shaped harnesses and dict-shaped envVars. JSON that is valid but wrongly typed (for example a list harnesses or non-mapping envVars) is stored and can make detect_agent raise, despite the module stating detection must never fail the process.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c087fb5. Configure here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a problem

@hanouticelina hanouticelina left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Comment thread src/huggingface_hub/utils/_detect_agent.py Outdated
Comment thread src/huggingface_hub/utils/_detect_agent.py Outdated
Comment thread src/huggingface_hub/utils/_detect_agent.py Outdated
Co-authored-by: célina <hanouticelina@gmail.com>
@hanouticelina hanouticelina merged commit 69ef7d7 into main Jun 11, 2026
25 checks passed
@hanouticelina hanouticelina deleted the fetch-agent-harnesses-from-hub branch June 11, 2026 09:10
@huggingface-hub-bot

Copy link
Copy Markdown
Contributor

This PR has been shipped as part of the v1.19.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants