
feat(copilot): dynamic model catalog from /models API #73216

Closed
MapleRecall wants to merge 1 commit into openclaw:main from MapleRecall:pr/copilot-dynamic-models

MapleRecall commented Apr 28, 2026

Summary

  • Problem: The Copilot extension uses a static model table with conservative defaults (128K context, 8K max output). Models like gpt-5.5 (400K context, 128K output) or claude-opus-4.7 (200K context, 32K output) are either capped incorrectly or missing entirely. Thinking levels (e.g. xhigh) are hardcoded to a fixed list of model IDs.
  • Why it matters: Users get wrong context window limits, missing models in /models list, and incorrect thinking level options — all of which degrade the Copilot experience without any visible error.
  • What changed: Fetch the model list from the Copilot /models API at runtime. Three hooks (catalog.run, augmentModelCatalog, prepareDynamicModel) cooperate to populate capability-aware model definitions. A shared fetchAndCacheModels helper with in-flight deduplication ensures the API is called at most once per gateway lifecycle.
  • What did NOT change (scope boundary): Auth flows, stream wrappers, replay policy, embedding provider, and the resolveDynamicModel catch-all fallback are untouched.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Related: N/A (no existing issue)

Root Cause (if applicable)

N/A — this is a new feature.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/github-copilot/models-mapping.test.ts, extensions/github-copilot/thinking.test.ts
  • Scenario the test should lock in: API response mapping (transport API resolution, vision detection, reasoning flag, deduplication, filtering) and thinking profile generation from API capabilities.
  • Why this is the smallest reliable guardrail: The mapping logic is pure functions with no I/O — unit tests cover all edge cases. The fetch + caching layer relies on existing Copilot token infrastructure already covered by contract tests.
  • If no new test is added, why not: 35 new tests added.

User-visible / Behavior Changes

  • /models list github-copilot now shows all models available on the user's Copilot plan (fetched from API), not just a hardcoded subset.
  • Context window and max output tokens are accurate per-model instead of a flat 128K/8K default.
  • Thinking levels (xhigh, adaptive) are determined from API capabilities instead of a hardcoded model ID list.
  • Vision support is set per-model based on API data.

Diagram (if applicable)

Before:
[startup] -> hardcoded DEFAULT_MODEL_IDS -> models: [] in catalog -> resolveDynamicModel fallback (128K/8K defaults)

After:
[startup] -> catalog.run / augmentModelCatalog -> fetchAndCacheModels -> Copilot /models API
           -> accurate contextWindow, maxTokens, vision, thinking per model
[first model use] -> prepareDynamicModel -> fetchAndCacheModels (if not already cached)

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No — reuses existing resolveCopilotApiToken infrastructure
  • New/changed network calls? Yes — one new GET request to {copilotBaseUrl}/models during startup
    • Risk: API token is sent in Authorization header to the same Copilot API endpoint already used for chat completions and embeddings. No new trust boundary.
    • Mitigation: 30s timeout, graceful fallback to empty model list on failure.
  • Command/tool execution surface changed? No
  • Data access scope changed? No
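A minimal sketch of the timeout-and-fallback behaviour listed above, written in TypeScript to match the extension. It assumes a Bearer `Authorization` header, a `{ data: [...] }` response envelope, and an injectable `fetchImpl` so it can be exercised offline; `fetchCopilotModelsSafe`, `CopilotApiModel`, and `FetchLike` are hypothetical names, not the PR's code.

```typescript
// Hypothetical shape of one entry in the Copilot /models response (assumption).
interface CopilotApiModel {
  id: string;
  capabilities?: Record<string, unknown>;
}

// Minimal fetch-like signature so the sketch can be tested without network access.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string>; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const MODELS_TIMEOUT_MS = 30_000; // the 30s timeout stated in the PR

async function fetchCopilotModelsSafe(
  baseUrl: string,
  token: string,
  fetchImpl: FetchLike = fetch,
): Promise<CopilotApiModel[]> {
  try {
    const res = await fetchImpl(`${baseUrl}/models`, {
      headers: { Authorization: `Bearer ${token}`, Accept: "application/json" },
      // Abort the request if the API does not answer within 30 seconds.
      signal: AbortSignal.timeout(MODELS_TIMEOUT_MS),
    });
    if (!res.ok) return []; // non-2xx status: fall back to an empty list
    const body = (await res.json()) as { data?: unknown };
    // Assumed envelope { data: [...] }; any other shape is treated as empty.
    return Array.isArray(body.data) ? (body.data as CopilotApiModel[]) : [];
  } catch {
    // Network error, timeout, or an unusable body: degrade gracefully.
    return [];
  }
}
```

Returning an empty array instead of throwing is what lets the resolveDynamicModel catch-all take over, as the Risks and Mitigations section describes.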

Repro + Verification

Environment

  • OS: Windows 11
  • Runtime/container: Node.js v24.14.0
  • Model/provider: GitHub Copilot (device login auth profile)
  • Integration/channel: Telegram
  • Relevant config: plugins.entries.github-copilot.enabled: true, auth via device login profile (no env vars)

Steps

  1. Configure GitHub Copilot via device login (openclaw doctor)
  2. Start gateway
  3. Run /models list github-copilot

Expected

All models from the user's Copilot plan are listed with accurate context windows and capabilities.

Actual

Before: Only hardcoded default models shown (10 models, all with 128K context).
After: All API-returned models shown (e.g. 20+ models with correct per-model limits).

Evidence

  • 35 new unit tests passing (models-mapping + thinking)
  • All 91 extension tests passing
  • Local gateway testing confirmed /models list github-copilot shows API-sourced models with correct context windows

Human Verification (required)

  • Verified scenarios: Gateway startup with device-login auth, /models list shows dynamic models, model switching works with correct context windows
  • Edge cases checked: API timeout (falls back to empty), no auth profile (returns null), duplicate model entries (deduplicated)
  • What you did not verify: CI/CD pipeline (relying on automated checks), non-Copilot provider interaction

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: Copilot /models API is unreachable or returns unexpected format
    • Mitigation: All three hooks have try/catch with graceful fallback. resolveDynamicModel catch-all still creates synthetic model definitions with conservative defaults. API call has 30s timeout.
  • Risk: API returns models the user's plan doesn't actually support
    • Mitigation: The Copilot API already scopes the model list to the user's plan. If a model is listed but not usable, the API will return an error at request time (same as before).

greptile-apps Bot (Contributor) commented Apr 28, 2026

Greptile Summary

This PR replaces the hardcoded Copilot model table with live data from the /models API, introducing models-api.ts, models-mapping.ts, and thinking.ts as new modules. Three plugin hooks (catalog.run, augmentModelCatalog, prepareDynamicModel) cooperate to populate capability-aware model definitions at runtime, with resolveDynamicModel retained as the conservative fallback.

Confidence Score: 4/5

Safe to merge with minor deduplication inconsistency and a misleading fallback branch; no data loss or incorrect model behavior observed.

All findings are P2. The most notable is that catalog.run bypasses fetchAndCacheModels, violating the stated at-most-one-call guarantee and risking a duplicate parallel API request during startup. The _copilotCapabilities always-object issue and the minimal level injection are style/correctness concerns that don't produce wrong user-visible behavior today.

extensions/github-copilot/index.ts (catalog.run deduplication gap), extensions/github-copilot/thinking.ts (minimal level injection)

Comments Outside Diff (1)

  1. extensions/github-copilot/thinking.ts, lines 919-921

    P2 minimal thinking level injected for all models regardless of API support

    When reasoningEffort is non-empty, minimal is unconditionally prepended to the levels list, even though the Copilot API never lists "minimal" in a model's reasoning_effort array. Meanwhile compat.supportedReasoningEfforts (set in mapCopilotApiModel) only contains the raw API values. This creates a mismatch: the UI presents minimal as a selectable level, but if OpenClaw passes "minimal" verbatim to the Copilot API as the reasoning_effort parameter, the request will be rejected or ignored. If minimal is intentionally mapped to a different API value (e.g. low) before sending, that mapping should be explicit and documented here.
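One way to make the mapping explicit, as the comment asks: keep the user-facing level list separate from the wire value, and translate `minimal` to the lowest effort the model advertises before sending. This is a sketch, not the PR's implementation; `buildThinkingLevels`, `toApiReasoningEffort`, and the `low < medium < high < xhigh` ordering are assumptions.

```typescript
// Reasoning-effort values in ascending order (assumed ordering for this sketch).
const EFFORT_ORDER = ["low", "medium", "high", "xhigh"] as const;
type ApiEffort = (typeof EFFORT_ORDER)[number];
type ThinkingLevel = "minimal" | ApiEffort;

// Efforts the model advertises, kept in ascending order.
function advertisedEfforts(apiEfforts: readonly string[]): ApiEffort[] {
  return EFFORT_ORDER.filter((e) => apiEfforts.includes(e));
}

// UI-facing levels: the OpenClaw-internal "minimal" plus whatever the API lists.
function buildThinkingLevels(apiEfforts: readonly string[]): ThinkingLevel[] | undefined {
  const supported = advertisedEfforts(apiEfforts);
  // No advertised efforts: let the caller fall back to its default profile.
  if (supported.length === 0) return undefined;
  return ["minimal", ...supported];
}

// Value actually sent as reasoning_effort; "minimal" is never sent verbatim.
function toApiReasoningEffort(
  level: ThinkingLevel,
  apiEfforts: readonly string[],
): ApiEffort | undefined {
  const supported = advertisedEfforts(apiEfforts);
  if (supported.length === 0) return undefined; // omit the parameter entirely
  if (level === "minimal") return supported[0]; // lowest advertised effort
  // Otherwise pick the highest advertised effort not above the request,
  // or the lowest advertised one if the request is below all of them.
  const rank = EFFORT_ORDER.indexOf(level);
  const atOrBelow = supported.filter((e) => EFFORT_ORDER.indexOf(e) <= rank);
  return atOrBelow.length > 0 ? atOrBelow[atOrBelow.length - 1] : supported[0];
}
```

Clamping toward the nearest advertised effort is a design choice of this sketch; the author's reply in this thread says an existing upstream layer (resolveOpenAIReasoningEffortForModel) already performs the real mapping.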

Inline comments (2)

  • extensions/github-copilot/index.ts, lines 399-413

    P2 catalog.run bypasses fetchAndCacheModels, breaking the deduplication guarantee

    catalog.run calls fetchCopilotModels directly and updates cachedModelCapabilities manually, while augmentModelCatalog uses fetchAndCacheModels, which relies on fetchModelsPromise for in-flight deduplication. The PR description states "all three share a single fetchAndCacheModels helper with in-flight deduplication so the API is called at most once," but the catalog.run path is outside that deduplication boundary. If catalog.run and augmentModelCatalog execute concurrently, which is plausible since both are registered hooks that may fire during startup, two parallel /models API requests will be made instead of one.

  • extensions/github-copilot/index.ts, lines 462-470

    P2 Fallback branch in resolveThinkingProfile is unreachable for cached models

    _copilotCapabilities is assigned as a plain object literal in mapCopilotApiModel regardless of whether supports is defined, so it is always a non-null object (e.g. { adaptiveThinking: undefined, maxThinkingBudget: undefined, ... }). For any model found in cachedModelCapabilities, cached._copilotCapabilities is therefore always truthy, so the "// Fallback: basic heuristic when API data is not available" comment and the "return resolveThinkingProfileFromCapabilities(undefined)" statement are dead code in this branch. As a side effect, models whose API capability fields are all empty go through resolveThinkingProfileFromCapabilities with an all-undefined object, which takes the else path (no reasoningEffort) and returns the 5-level default. The result is functionally correct, but the code structure misrepresents the intent.


Reviews (1): Last reviewed commit: "feat(copilot): dynamic model catalog fro..."

MapleRecall force-pushed the pr/copilot-dynamic-models branch from c05dda7 to c498e39 on April 28, 2026 03:40
MapleRecall (Author) commented:

Thanks for the review! Addressed the first two findings in the force-push:

  1. catalog.run deduplication: catalog.run now delegates to the shared fetchAndCacheModels helper, so all three hooks share the same in-flight deduplication boundary.

  2. _copilotCapabilities empty object: it is now attached only when the API returns a non-null supports block, so the fallback path in resolveThinkingProfile is reachable as intended.

  3. minimal thinking level: intentionally kept. minimal is an OpenClaw-internal thinking level that is mapped through the reasoning-effort resolution layer before reaching the provider API; it is never passed verbatim to the Copilot API as a reasoning_effort parameter. The existing upstream thinking infrastructure handles this mapping (see resolveOpenAIReasoningEffortForModel and the Anthropic budget resolver).
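A sketch of the second fix under assumed field names (`adaptive_thinking`, `max_thinking_budget`, `reasoning_effort` inside `supports`): `_copilotCapabilities` is set only when a `supports` block exists, so a truthiness check on it can actually reach the heuristic fallback. `attachCapabilities` and `thinkingProfileSource` are hypothetical stand-ins for mapCopilotApiModel and resolveThinkingProfile.

```typescript
// Assumed subset of the API's capabilities.supports block.
interface ApiSupports {
  adaptive_thinking?: boolean;
  max_thinking_budget?: number;
  reasoning_effort?: string[];
}

interface CopilotCapabilities {
  adaptiveThinking?: boolean;
  maxThinkingBudget?: number;
  reasoningEffort?: string[];
}

interface MappedModel {
  id: string;
  _copilotCapabilities?: CopilotCapabilities;
}

function attachCapabilities(id: string, supports: ApiSupports | null | undefined): MappedModel {
  const model: MappedModel = { id };
  // Attach only when the API returned a supports block; an all-undefined
  // object would be truthy and make the fallback branch unreachable.
  if (supports != null) {
    model._copilotCapabilities = {
      adaptiveThinking: supports.adaptive_thinking,
      maxThinkingBudget: supports.max_thinking_budget,
      reasoningEffort: supports.reasoning_effort,
    };
  }
  return model;
}

// Mirrors the branch structure described in the review of resolveThinkingProfile.
function thinkingProfileSource(cached: MappedModel | undefined): "api" | "heuristic" {
  if (cached?._copilotCapabilities) return "api";
  return "heuristic"; // now reachable for models without a supports block
}
```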

MapleRecall force-pushed the pr/copilot-dynamic-models branch from c498e39 to 3f83279 on April 28, 2026 04:04
openclaw-barnacle Bot added the triage: refactor-only label (Candidate: refactor/cleanup-only PR without maintainer context.) on Apr 28, 2026
MapleRecall force-pushed the pr/copilot-dynamic-models branch 2 times, most recently from 427c1b3 to 341e086 on April 28, 2026 05:06
clawsweeper Bot (Contributor) commented Apr 28, 2026

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Close as superseded: the live GitHub Copilot /models catalog work landed on current main through a later merged PR, while the remaining Claude 1m xhigh/fallback gaps are already tracked by narrower open follow-ups.

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Review details

Best possible solution:

Use the merged live-catalog implementation on current main, and keep the remaining Claude 1m thinking/static fallback work on the narrower canonical issue and follow-up PRs instead of rebasing this duplicate branch.

Do we have a high-confidence way to reproduce the issue?

Yes for the PR blockers: source inspection of PR head shows the unscoped cache, missing agentDir token lookup, and fallback xhigh regression. The original feature request is otherwise superseded by current-main code from the later merged PR.

Is this the best way to solve the issue?

No, this PR is no longer the best way to solve the issue. A later merged PR implements the live catalog more cleanly on main, and the remaining thinking/fallback behavior should be handled in the focused related issue and PRs.

Security review:

Security review cleared: The PR adds a token-bearing Copilot /models request through the existing SSRF-guarded fetch helper and does not add dependencies, workflows, lifecycle scripts, or broader secret access.

What I checked:

Likely related people:

  • efpiva: Authored the merged live Copilot /models catalog discovery PR that now covers the main catalog/context-window part of this PR. (role: introduced current superseding implementation; confidence: high; commits: 9c6c64c3d85d, 507c73e9b728, b4dfefc71b6d; files: extensions/github-copilot/index.ts, extensions/github-copilot/models.ts, extensions/github-copilot/models-defaults.ts)
  • Peter Steinberger: Landed the superseding PR via rebase, added the gpt-5.5 fallback metadata preservation commit, and dominates recent local history on the Copilot provider files. (role: recent area contributor and merger; confidence: high; commits: 058e1b4ee83d, d42b3c2b4a68, b39c7eece6ea; files: extensions/github-copilot/index.ts, extensions/github-copilot/models.ts, extensions/github-copilot/model-metadata.ts)
  • Vincent Koc: Recent history shows Copilot xhigh and provider hook work near the remaining thinking-profile surface. (role: adjacent Copilot thinking/catalog contributor; confidence: medium; commits: a70fdc88e083, b56517b0eede; files: extensions/github-copilot/index.ts, extensions/github-copilot/models.ts)
  • fuller-stack-dev: Merged the catch-all GitHub Copilot dynamic model resolver that this PR and the related issues build on. (role: introduced adjacent dynamic fallback behavior; confidence: high; commits: 5137a5130746; files: extensions/github-copilot/models.ts, extensions/github-copilot/models-defaults.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against ba625e2cff29.

MapleRecall force-pushed the pr/copilot-dynamic-models branch 5 times, most recently from 0fe7cfa to 5edb5f5 on April 29, 2026 03:49
Fetch the model list from the Copilot /models API at runtime instead of
relying on hardcoded defaults. This gives OpenClaw accurate context
windows, output limits, vision support, and thinking capabilities for
every model on the user's Copilot plan — including new models added by
GitHub without requiring a code change.

Three hooks work together:

- **catalog.run** populates models when the plugin discovery pipeline
  invokes it during ensureOpenClawModelsJson.
- **augmentModelCatalog** adds the same models to the model catalog
  so they appear in /models list. This covers environments where
  the discovery pipeline does not reach late-order provider plugins
  (e.g. auth-profile-only setups without env-var credentials).
- **prepareDynamicModel** lazily fetches capabilities on the first
  model request when neither of the above paths has run yet.

All three share a single fetchAndCacheModels helper with in-flight
deduplication so the API is called at most once per gateway lifecycle.
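The in-flight deduplication can be sketched as a memoized promise: concurrent callers await the same pending request, and later callers get the cached array. `createModelCache` and its `load` parameter are hypothetical, and clearing the slot after a rejection so a later hook can retry is an assumption the text does not state.

```typescript
// Hypothetical model record; the real one carries many more fields.
interface CachedModel {
  id: string;
  contextWindow: number;
}

// Returns a fetch function that runs `load` at most once, even under concurrency.
function createModelCache(load: () => Promise<CachedModel[]>) {
  let cached: CachedModel[] | undefined;
  let inFlight: Promise<CachedModel[]> | undefined;

  return async function fetchAndCacheModels(): Promise<CachedModel[]> {
    if (cached) return cached; // already populated by an earlier hook
    if (!inFlight) {
      inFlight = load()
        .then((models) => {
          cached = models;
          return models;
        })
        .finally(() => {
          inFlight = undefined; // lets a later call retry if `load` rejected
        });
    }
    return inFlight; // concurrent callers share the same pending promise
  };
}
```

As Greptile's first finding pointed out, the guarantee only holds if every hook goes through the returned function; a direct fetchCopilotModels call bypasses `inFlight` entirely.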

Capability mapping:
- contextWindow / maxTokens from capabilities.limits
- vision from capabilities.supports.vision
- transport API from supported_endpoints
- reasoning flag from reasoning_effort / model family
- thinking profile levels from reasoning_effort and adaptive_thinking
  (replaces the hardcoded xhigh model list)
- compat.supportedReasoningEfforts from API data
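A sketch of the capability mapping above. The field names (`capabilities.limits.max_context_window_tokens`, `capabilities.limits.max_output_tokens`, `capabilities.supports.vision`, `capabilities.supports.reasoning_effort`), the endpoint strings, the transport identifiers, and `mapModel` itself are assumptions for illustration rather than the PR's mapCopilotApiModel; the model-family fallback for the reasoning flag is omitted.

```typescript
// Assumed shape of a /models entry (only the fields this sketch reads).
interface ApiModel {
  id: string;
  supported_endpoints?: string[];
  capabilities?: {
    limits?: { max_context_window_tokens?: number; max_output_tokens?: number };
    supports?: { vision?: boolean; reasoning_effort?: string[] };
  };
}

// Hypothetical transport identifiers.
type TransportApi = "openai-responses" | "anthropic-messages" | "openai-completions";

interface ModelDefinition {
  id: string;
  api: TransportApi;
  contextWindow: number;
  maxTokens: number;
  input: Array<"text" | "image">;
  reasoning: boolean;
  compat: { supportedReasoningEfforts?: string[] };
}

const DEFAULT_CONTEXT = 128_000; // "128K" from the PR; exact count assumed
const DEFAULT_MAX_OUTPUT = 8_192; // "8K" from the PR; exact count assumed

// Choose the transport from the endpoints the model advertises (assumed strings).
function pickTransport(endpoints: string[] = []): TransportApi {
  if (endpoints.includes("/responses")) return "openai-responses";
  if (endpoints.includes("/v1/messages")) return "anthropic-messages";
  return "openai-completions";
}

function mapModel(m: ApiModel): ModelDefinition {
  const caps: NonNullable<ApiModel["capabilities"]> = m.capabilities ?? {};
  const efforts = caps.supports?.reasoning_effort ?? [];
  return {
    id: m.id,
    api: pickTransport(m.supported_endpoints),
    contextWindow: caps.limits?.max_context_window_tokens ?? DEFAULT_CONTEXT,
    maxTokens: caps.limits?.max_output_tokens ?? DEFAULT_MAX_OUTPUT,
    input: caps.supports?.vision ? ["text", "image"] : ["text"],
    reasoning: efforts.length > 0,
    compat: efforts.length > 0 ? { supportedReasoningEfforts: efforts } : {},
  };
}
```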

Filtering:
- Internal router models (accounts/msft/routers/*) excluded
- Embedding models excluded (they have their own provider path)
- Duplicate model entries deduplicated (keeps richer capabilities)
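The filtering rules can be sketched as a single pass keyed by model ID. Treating `capabilities.type === "embeddings"` as the embedding marker and scoring "richer capabilities" by the number of populated capability fields are both assumptions, as is the `filterAndDedupe` name.

```typescript
interface ListedModel {
  id: string;
  capabilities?: { type?: string } & Record<string, unknown>;
}

// Crude richness score: how many capability fields are actually set.
function richness(m: ListedModel): number {
  return Object.values(m.capabilities ?? {}).filter((v) => v !== undefined && v !== null).length;
}

function filterAndDedupe(models: ListedModel[]): ListedModel[] {
  const byId = new Map<string, ListedModel>();
  for (const m of models) {
    if (m.id.startsWith("accounts/msft/routers/")) continue; // internal router models
    if (m.capabilities?.type === "embeddings") continue; // served by the embedding provider
    const existing = byId.get(m.id);
    // Keep whichever duplicate carries more capability data.
    if (!existing || richness(m) > richness(existing)) byId.set(m.id, m);
  }
  return Array.from(byId.values());
}
```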

The existing resolveDynamicModel catch-all remains as a fallback for
models not returned by the API, using conservative defaults (128K
context, 8K max output).
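A sketch of the lookup order implied here: use the API-sourced definition when one exists, otherwise synthesize conservative limits. `resolveModel` and the record shape are hypothetical, and reading 128K/8K as 128,000 and 8,192 tokens is an assumption.

```typescript
interface ResolvedModel {
  id: string;
  contextWindow: number;
  maxTokens: number;
  source: "api" | "fallback";
}

const FALLBACK_CONTEXT_WINDOW = 128_000; // "128K"; exact count assumed
const FALLBACK_MAX_TOKENS = 8_192; // "8K"; exact count assumed

function resolveModel(
  id: string,
  apiModels: ReadonlyMap<string, Omit<ResolvedModel, "source">>,
): ResolvedModel {
  const fromApi = apiModels.get(id);
  if (fromApi) return { ...fromApi, source: "api" };
  // Not returned by /models (or the fetch failed): use conservative limits.
  return {
    id,
    contextWindow: FALLBACK_CONTEXT_WINDOW,
    maxTokens: FALLBACK_MAX_TOKENS,
    source: "fallback",
  };
}
```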

New modules:
- models-api.ts: Copilot /models API types and fetch
- models-mapping.ts: API response -> OpenClaw model mapping
- thinking.ts: API capabilities -> thinking profile resolution

Tests: 35 new tests (models-mapping + thinking).
wyhgoodjob commented:
Friendly ping from a downstream user — no pressure 🙂

This PR (along with #72829) would fix a real pain point for those of us on claude-opus-4.7-1m-internal who currently get silently capped at 128K context and lose xhigh thinking (issue #72824). The dynamic /models discovery approach here is the cleaner long-term answer.

I saw the clawsweeper feedback around agent-scoped auth, cache scoping, and preserving the known-xhigh fallback when discovery fails — happy to test a rebased branch against a live Copilot account that exposes the 1m variant if it helps validate the end-to-end path.

Thanks for the work — totally understand if it has to wait, just wanted to surface that there's a user waiting downstream whenever you have cycles.

clawsweeper Bot closed this May 12, 2026
MapleRecall (Author) commented:


@wyhgoodjob Thanks for your feedback! Actually, my original intention was to use the internal 1M-context model with xhigh effort, but I wasn't sure if I should mention that publicly.

I wasn't familiar with OpenClaw's contribution guidelines, and since I hadn't received any human review or discussion beyond the automated agent's comments, I assumed this PR might just get buried like most other PRs. Anyway, I'll find some time to rebase and reopen it.


Labels

extensions: codex; size: L; triage: refactor-only (Candidate: refactor/cleanup-only PR without maintainer context.)
