
Fix/copilot default models gpt 5 5 opus 1m#72829

Open
iot2edge wants to merge 2 commits into
openclaw:mainfrom
iot2edge:fix/copilot-default-models-gpt-5-5-opus-1m

Conversation

@iot2edge

Contributor

Fixes #72805

Summary

  • Problem: GitHub Copilot's /models endpoint advertises gpt-5.5 and claude-opus-4.7-1m-internal, but extensions/github-copilot/models-defaults.ts:DEFAULT_MODEL_IDS is missing both. Users have to hand-edit ~/.openclaw/openclaw.json and run openclaw config validate to use them.
  • Why it matters: Two newly-shipped Copilot models are unreachable out of the box. Additionally, even when a user adds the 1M Opus variant manually, buildCopilotModelDefinition resolves its contextWindow to the default 128_000, so the runtime would auto-compact long before the model actually runs out of context — silently capping a 1M-context session at ~13% of its real capacity.
  • What changed: Added gpt-5.5 and claude-opus-4.7-1m-internal to DEFAULT_MODEL_IDS. Added a small helper resolveCopilotContextWindow(id) that returns 1_000_000 when the id ends with -1m or -1m-internal and 128_000 otherwise; buildCopilotModelDefinition now uses it instead of the bare DEFAULT_CONTEXT_WINDOW constant. Added 5 unit tests in extensions/github-copilot/models.test.ts.
  • What did NOT change (scope boundary): Did not add the broader GPT-5.x family suggested in the issue (gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex); that is left for a follow-up so this PR matches the explicit "at minimum" ask in #72805. Also unchanged: cost, reasoning, input modalities, maxTokens, the resolveCopilotTransportApi routing, and the existing getDefaultCopilotModelIds() mutable-copy contract.
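
A minimal sketch of the suffix-based resolver described in the summary, for reviewers who want the logic at a glance. The helper name and the two window sizes come from this PR; `ONE_MILLION_CONTEXT_WINDOW`, `ONE_MILLION_SUFFIXES`, and the trimmed `CopilotModelDefinitionSketch` shape are assumptions for illustration and may not match the real code in extensions/github-copilot/.

```typescript
// Sketch only: the real definitions live in extensions/github-copilot/ and
// may differ in shape.
const DEFAULT_CONTEXT_WINDOW = 128_000;
const ONE_MILLION_CONTEXT_WINDOW = 1_000_000; // hypothetical constant name

// Suffixes that, per this PR, mark a 1M-context variant (hypothetical name).
const ONE_MILLION_SUFFIXES = ["-1m", "-1m-internal"] as const;

function resolveCopilotContextWindow(id: string): number {
  const isOneMillion = ONE_MILLION_SUFFIXES.some((suffix) => id.endsWith(suffix));
  return isOneMillion ? ONE_MILLION_CONTEXT_WINDOW : DEFAULT_CONTEXT_WINDOW;
}

// Trimmed, assumed shape: only fields this PR description talks about.
interface CopilotModelDefinitionSketch {
  id: string;
  contextWindow: number;
  input: string[];
}

function buildCopilotModelDefinitionSketch(id: string): CopilotModelDefinitionSketch {
  return {
    id,
    // Previously the bare DEFAULT_CONTEXT_WINDOW constant was used here.
    contextWindow: resolveCopilotContextWindow(id),
    input: ["text", "image"], // not changed by this PR
  };
}
```

The suffix list is kept as data so that a future variant only touches one array; whether the real helper is written this way is not stated in the PR.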

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause

  • Root cause: DEFAULT_MODEL_IDS is a hardcoded as const array in extensions/github-copilot/models-defaults.ts. It was last seeded against an older Copilot model availability snapshot and was never updated when Copilot rolled out gpt-5.5 and the 1M-context Opus variant. Separately, buildCopilotModelDefinition applies a single DEFAULT_CONTEXT_WINDOW = 128_000 to every model id, so even users who added the 1M model manually would get an incorrect context window.
  • Missing detection / guardrail: there is no automated reconciliation between Copilot's /models endpoint and DEFAULT_MODEL_IDS. The list is updated by hand whenever a maintainer notices a new model has shipped. There was also no per-id context-window resolver, so -1m variants silently inherited the 128k default.
  • Contributing context (if known): The file's existing comment already acknowledges the brittleness — "Copilot model ids vary by plan/org and can change. We keep this list intentionally broad; if a model isn't available Copilot will return an error and users can remove it from their config." — which made the static list acceptable for unavailable ids, but did not address the contextWindow correctness issue when a present-but-large-context id is added.
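
To make the missing guardrail concrete, here is a rough sketch of what a reconciliation check could look like. It is not part of this PR: `findMissingDefaultModelIds` is a hypothetical helper, and the input arrays are illustrative values, not a real /models response.

```typescript
// Hypothetical helper: reports ids that Copilot advertises but the static
// default list does not contain. Nothing like this exists in the PR.
function findMissingDefaultModelIds(
  advertisedIds: readonly string[],
  defaultIds: readonly string[],
): string[] {
  const known = new Set(defaultIds);
  // Deduplicate while preserving the order the endpoint returned.
  return Array.from(new Set(advertisedIds)).filter((id) => !known.has(id));
}

// Illustrative inputs only.
const missing = findMissingDefaultModelIds(
  ["gpt-4o", "gpt-5.5", "claude-opus-4.7-1m-internal"],
  ["gpt-4o", "claude-opus-4.7"],
);
console.log(missing); // ["gpt-5.5", "claude-opus-4.7-1m-internal"]
```

A check like this only covers list membership; it says nothing about context-window correctness, which is the second half of the root cause.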

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/github-copilot/models.test.ts
  • Scenario the test should lock in:
    1. getDefaultCopilotModelIds() includes "gpt-5.5".
    2. getDefaultCopilotModelIds() includes "claude-opus-4.7-1m-internal".
    3. buildCopilotModelDefinition("claude-opus-4.7-1m-internal").contextWindow === 1_000_000.
    4. buildCopilotModelDefinition("some-future-model-1m").contextWindow === 1_000_000 (covers the bare -1m suffix).
    5. buildCopilotModelDefinition("claude-opus-4.7").contextWindow === 128_000 (regression guard so non--1m ids stay on the default).
  • Why this is the smallest reliable guardrail: DEFAULT_MODEL_IDS and buildCopilotModelDefinition are both pure helpers with no I/O — unit tests are deterministic and pinpoint regressions to either the array or the resolver. The existing test file already uses the same .toContain(...) style for the other Anthropic-family Copilot ids, so the new tests slot in next to their cousins.
  • Existing test that already covers this (if any): None for gpt-5.5, claude-opus-4.7-1m-internal, or 1M context resolution. The closest existing coverage is the cluster of getDefaultCopilotModelIds().toContain(...) checks for the other Claude models, which doesn't constrain the new ids.
  • If no new test is added, why not: N/A — 5 new tests added.
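
The five scenarios above can be sketched as plain checks over injected functions. This is an illustration rather than the actual test code: the real cases are vitest tests in extensions/github-copilot/models.test.ts, and `checkCopilotDefaultScenarios` plus its two type aliases are hypothetical names.

```typescript
// Sketch: the two functions are passed in so this block does not depend on
// the real module or on a test runner.
type ModelIdsFn = () => string[];
type BuildFn = (id: string) => { contextWindow: number };

// Returns the labels of the scenarios that failed (empty when all pass).
function checkCopilotDefaultScenarios(getIds: ModelIdsFn, build: BuildFn): string[] {
  const failures: string[] = [];
  const expectTrue = (ok: boolean, label: string): void => {
    if (!ok) failures.push(label);
  };

  const ids = getIds();
  expectTrue(ids.includes("gpt-5.5"), "1: defaults include gpt-5.5");
  expectTrue(
    ids.includes("claude-opus-4.7-1m-internal"),
    "2: defaults include claude-opus-4.7-1m-internal",
  );
  expectTrue(
    build("claude-opus-4.7-1m-internal").contextWindow === 1_000_000,
    "3: -1m-internal resolves to 1_000_000",
  );
  expectTrue(
    build("some-future-model-1m").contextWindow === 1_000_000,
    "4: bare -1m resolves to 1_000_000",
  );
  expectTrue(
    build("claude-opus-4.7").contextWindow === 128_000,
    "5: non -1m id stays at 128_000",
  );
  return failures;
}
```

Calling it with `getDefaultCopilotModelIds` and `buildCopilotModelDefinition` should return an empty array after this PR, assuming `buildCopilotModelDefinition` can be called with just an id, as the scenario list suggests.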

User-visible / Behavior Changes

  • gpt-5.5 and claude-opus-4.7-1m-internal now appear in openclaw models list --provider github-copilot without manual config.
  • agents.defaults.models entries that target either model id (or any future Copilot id ending in -1m / -1m-internal) get a contextWindow of 1_000_000 instead of 128_000, so auto-compaction triggers at the model's actual limit rather than ~13% of it.
  • No defaults were removed or renamed. Existing user configs are untouched.
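
For reference, the "~13%" figure is just the ratio of the two windows. The snippet below works it out; it assumes auto-compaction is keyed to contextWindow, which is how the description above frames it, and `usableContextFraction` is a hypothetical name.

```typescript
// Fraction of the real context window that is usable when the runtime
// assumes a smaller window than the model actually offers.
function usableContextFraction(assumedWindow: number, actualWindow: number): number {
  return assumedWindow / actualWindow;
}

const fraction = usableContextFraction(128_000, 1_000_000);
console.log(`${(fraction * 100).toFixed(1)}%`); // prints "12.8%"
```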

Diagram

Copilot model id resolution

Before:
buildCopilotModelDefinition("claude-opus-4.7-1m-internal")
  -> contextWindow: 128_000        (bug: misconfigured 1M model)

buildCopilotModelDefinition("gpt-4o")
  -> contextWindow: 128_000        (correct)

After:
buildCopilotModelDefinition("claude-opus-4.7-1m-internal")
  -> resolveCopilotContextWindow("claude-opus-4.7-1m-internal")
       endsWith "-1m-internal" -> true
  -> contextWindow: 1_000_000      (correct)

buildCopilotModelDefinition("gpt-4o")
  -> resolveCopilotContextWindow("gpt-4o")
       endsWith "-1m" / "-1m-internal" -> false
  -> contextWindow: 128_000        (unchanged)

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux 6.8.0-110-generic (Ubuntu)
  • Runtime/container: Node 22+ via pnpm 10.33 (repo engines.node >=22.14.0)
  • Model/provider: GitHub Copilot via OAuth (Copilot-Integration-Id: vscode-chat)
  • Integration/channel (if any): N/A — provider plugin only
  • Relevant config (redacted):
    // Before this PR, users had to add manually:
    {
      "agents": {
        "defaults": {
          "models": {
            "github-copilot/gpt-5.5": {},
            "github-copilot/claude-opus-4.7-1m-internal": { "params": { "thinking": "medium" } }
          }
        }
      }
    }

Steps

  1. Authenticate the github-copilot provider with a Copilot subscription that exposes gpt-5.5 and claude-opus-4.7-1m-internal.
  2. Run openclaw models list --provider github-copilot.
  3. Inspect the resolved contextWindow for claude-opus-4.7-1m-internal (e.g. via openclaw models inspect github-copilot/claude-opus-4.7-1m-internal --json or by triggering a long session and observing where auto-compaction fires).

Expected

  • Both gpt-5.5 and claude-opus-4.7-1m-internal appear in the listing without any user-side config edits.
  • claude-opus-4.7-1m-internal reports contextWindow: 1000000.
  • gpt-5.5 reports contextWindow: 128000.
  • Models that aren't actually granted on the user's plan still surface the existing Copilot 404 Not Found / availability error from the backend, per the file's existing "intentionally broad" policy.

Actual (before fix)

  • Neither model appears in openclaw models list --provider github-copilot.
  • Adding claude-opus-4.7-1m-internal manually resolves it with contextWindow: 128000, causing premature auto-compaction at ~13% of the model's real capacity.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)
$ pnpm test extensions/github-copilot/models.test.ts
 Test Files  1 passed (1)
      Tests  29 passed (29)   # 24 existing + 5 new (2 inclusion, 3 contextWindow)

pnpm check:changed is green: conflict markers, typecheck extensions, typecheck extension tests, lint extensions, runtime import cycles all pass.

Human Verification (required)

  • Verified scenarios:
    • Targeted vitest run for extensions/github-copilot/models.test.ts (29/29 pass locally).
    • Full pnpm check:changed gate (extensions + extension-tests + lint + import-cycles, all green).
    • Re-read extensions/github-copilot/models.ts to confirm resolveCopilotTransportApi already classifies gpt-5.x ids correctly via existing test coverage at line 206 (["gpt-5.4-codex", "gpt-5.5-codex", "gpt-5.4-codex-mini", "gpt-5.3-codex"]), so adding gpt-5.5 to the default list does not need any transport-routing change.
  • Edge cases checked:
    • Bare -1m suffix (covered by some-future-model-1m test) — future-proofs the resolver if Copilot ships a non-internal *-1m variant.
    • Non--1m ids still resolve to 128_000 — explicit regression guard so the resolver doesn't accidentally widen for unrelated ids.
    • getDefaultCopilotModelIds() mutable-copy invariant unchanged — existing test still passes after the array grew by 2.
  • What you did not verify:
    • Live Copilot API call against https://api.githubcopilot.com/models with my own token. I do not have a Copilot subscription with claude-opus-4.7-1m-internal access, so I have not exercised the end-to-end path of "auth → list → spawn a session on the new model and watch a long context fill past 128k tokens." The issue reporter has the live evidence from https://api.githubcopilot.com/models, and the file's existing policy already documents that unavailable models surface the Copilot backend error.
    • Image-input handling for claude-opus-4.7-1m-internal. The default definition keeps input: ["text", "image"] since the resolver only changes contextWindow. If the 1M variant has different modality support, that is out of scope for this PR.
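
The mutable-copy invariant mentioned under "Edge cases checked" can be illustrated as follows. This is a sketch under assumptions: the three-entry list is a stand-in for the real DEFAULT_MODEL_IDS, and the `...Sketch` names are hypothetical.

```typescript
// Illustrative list only, not the full DEFAULT_MODEL_IDS from models-defaults.ts.
const DEFAULT_MODEL_IDS_SKETCH = [
  "gpt-4o",
  "gpt-5.5",
  "claude-opus-4.7-1m-internal",
] as const;

function getDefaultCopilotModelIdsSketch(): string[] {
  // Spread into a fresh array so callers can mutate the result freely
  // without affecting the shared constant.
  return [...DEFAULT_MODEL_IDS_SKETCH];
}

const copy = getDefaultCopilotModelIdsSketch();
copy.push("hypothetical-extra-model");
console.log(getDefaultCopilotModelIdsSketch().length); // prints 3
```

Growing the constant by two entries does not change this contract, which is why the existing test keeps passing.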

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

(Both will be checked once review activity lands. Currently no bot review conversations on this PR.)

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: claude-opus-4.7-1m-internal is listed by Copilot as "Internal only" and may not be granted on every Copilot plan/org.
    • Mitigation: the existing file comment already covers this — "if a model isn't available Copilot will return an error and users can remove it from their config." This PR follows the same pattern that ships every other plan-restricted entry already in DEFAULT_MODEL_IDS (e.g. o1, o3-mini).
  • Risk: a future Copilot model id could end with -1m without actually offering 1M context, which would set the wrong contextWindow.
    • Mitigation: the suffix is currently a strong signal (-1m / -1m-internal are explicit naming conventions for context-window variants). If Copilot ever ships a -1m id that means something different, the helper is one small place to update. Adding a generic naming-collision guardrail now would be speculative.
  • Risk: contributors adding more -1m variants in the future might forget that the resolver depends on the suffix.
    • Mitigation: the resolveCopilotContextWindow helper is colocated with DEFAULT_MODEL_IDS and the suffix-test test cases, so any future addition will see both at once. No separate documentation needed.

@openclaw-barnacle openclaw-barnacle Bot added channel: telegram Channel integration: telegram size: S labels Apr 27, 2026
@greptile-apps

greptile-apps Bot commented Apr 27, 2026

Contributor

Greptile Summary

Adds gpt-5.5 and claude-opus-4.7-1m-internal to the Copilot default model list and introduces resolveCopilotContextWindow to correctly assign a 1M context window to -1m/-1m-internal model ids instead of the global 128k default. A companion fix in model-buttons.ts improves the display label for nested provider-local ids (e.g. OpenRouter).

Confidence Score: 4/5

Safe to merge; the core fix is correct and all tests pass.

All changes are well-scoped and logically correct. The only finding is a P2: the dynamic forward-compat catch-all in models.ts still hardcodes 128k for manually-added -1m ids outside DEFAULT_MODEL_IDS, so the gap partially persists for user-configured models not covered by this PR.

extensions/github-copilot/models.ts — catch-all resolveCopilotForwardCompatModel at line 80 still applies DEFAULT_CONTEXT_WINDOW unconditionally.

Comments Outside Diff (1)

  1. extensions/github-copilot/models.ts, line 80 (link)

    P2 Catch-all path still hardcodes 128k for manual -1m models

    The resolveCopilotForwardCompatModel catch-all (line 80) still applies DEFAULT_CONTEXT_WINDOW = 128_000 unconditionally, so any future -1m model a user adds manually to openclaw.json — without it being in DEFAULT_MODEL_IDS — will silently get the same 128k context window that this PR is fixing for the registered ids. The fix in models-defaults.ts only covers the buildCopilotModelDefinition path; the dynamic forward-compat path is not updated.


Reviews (1): Last reviewed commit: "fix(github-copilot): add gpt-5.5 and cla..." | Re-trigger Greptile

@iot2edge

Contributor Author

CI failures look unrelated to this change. The failing test is src/tts/provider-registry.test.ts (3 cases in the unit-fast shard), which is the speech-provider capability registry — a different surface from extensions/github-copilot/. The mock returns 1 fake demo-speech provider but the runtime returns all 14 bundled speech providers (azure-speech, elevenlabs, google, ...), suggesting the test mock seam was bypassed.

Most likely caused by e792f96a84 ("fix: cache capability provider manifest ids", landed today on main), which changed how capability provider manifest ids are resolved. The mock now no longer intercepts the right seam.

checks-node-channels and checks-node-core failing in ~4s are the shard-aggregator gates picking up that same downstream failure.

Will push a small follow-up commit addressing @greptile-apps' P2 finding shortly.

@greptile-apps

greptile-apps Bot commented Apr 27, 2026

Contributor

That analysis tracks. The provider-registry.test.ts failure is clearly on a different surface — TTS capability registry vs. Copilot model defaults — and the timing with e792f96a84 landing same-day is a strong signal. The shard-aggregator gates failing in ~4s is consistent with them just forwarding that upstream failure rather than finding anything in this PR's code paths.

Your changes (models-defaults.ts + resolveCopilotContextWindow + the 5 new unit tests) have no overlap with capability provider manifest resolution, so this PR is clean from a CI perspective. Once the follow-up commit lands to fix the mock seam, the gates should clear.


@iot2edge

Contributor Author

Fair catch, addressed in a follow-up commit:

  • Moved resolveCopilotContextWindow to models.ts (alongside the catch-all that needed it; this also avoided a new import cycle since models-defaults.ts already imports from models.ts).
  • resolveCopilotForwardCompatModel now uses the helper, so manually-added -1m / -1m-internal ids that aren't in DEFAULT_MODEL_IDS get the correct 1M context window via the synthetic catch-all.
  • Added 3 tests covering the catch-all path: -1m-internal, plain -1m, and a non--1m regression guard.

The codex-template clone path was already correct — it inherits contextWindow from the registry template.
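
A rough sketch of the catch-all change described in this reply. The function signature, the `ModelSketch` shape, and the registry argument are assumptions made for illustration; the real resolveCopilotForwardCompatModel in models.ts may take different arguments, and the codex-template clone path is left out here.

```typescript
// Assumed, trimmed shape for illustration.
interface ModelSketch {
  id: string;
  contextWindow: number;
}

// Same suffix rule as resolveCopilotContextWindow, restated so the block is
// self-contained.
function resolveContextWindowSketch(id: string): number {
  return id.endsWith("-1m") || id.endsWith("-1m-internal") ? 1_000_000 : 128_000;
}

// Hypothetical signature, not the real one.
function resolveForwardCompatModelSketch(
  id: string,
  registry: ReadonlyMap<string, ModelSketch>,
): ModelSketch {
  const known = registry.get(id);
  if (known) return known; // registered ids keep their existing definition

  // Synthetic catch-all for ids the registry has never seen. Before the
  // follow-up commit this path used 128_000 unconditionally.
  return { id, contextWindow: resolveContextWindowSketch(id) };
}
```

With this shape, a manually added id such as "some-future-model-1m" gets the 1M window even though it is absent from the registry.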

@iot2edge iot2edge force-pushed the fix/copilot-default-models-gpt-5-5-opus-1m branch 2 times, most recently from 73aafb6 to 53b7e20 Compare April 28, 2026 01:28
@openclaw-barnacle openclaw-barnacle Bot removed the channel: telegram Channel integration: telegram label Apr 28, 2026
@iot2edge iot2edge force-pushed the fix/copilot-default-models-gpt-5-5-opus-1m branch from 53b7e20 to bfdee7c Compare April 29, 2026 02:45
@clawsweeper

clawsweeper Bot commented Apr 29, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed May 30, 2026, 12:54 AM ET / 04:54 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +8, Tests +41. Total +49 across 3 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Review did not complete, so no work-lane recommendation was made.
Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model gpt-5.5, reasoning high; reviewed against b9933b2ec119.

Label changes

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
Evidence reviewed

PR surface:

Source +8, Tests +41. Total +49 across 3 files.

View PR surface stats
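| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 2 | 12 | 4 | +8 |
| Tests | 1 | 41 | 0 | +41 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 3 | 53 | 4 | +49 |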

What I checked:

  • failure reason: codex execution failed.
  • codex failure detail: Codex review failed for this PR with exit 1.
  • codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

iot2edge added 2 commits May 7, 2026 13:54
…efault models

Both are advertised by the Copilot backend (api.githubcopilot.com/models)
but were missing from DEFAULT_MODEL_IDS, forcing users to add them by hand.

Also detect -1m / -1m-internal ids in buildCopilotModelDefinition so the
1M-context Opus variant gets contextWindow: 1_000_000 instead of the
default 128_000, which would otherwise trigger early compaction.

Fixes openclaw#72805
… catch-all

resolveCopilotForwardCompatModel previously hardcoded contextWindow to
128_000 in its synthetic catch-all, so manually-added -1m/-1m-internal
model ids that aren't in DEFAULT_MODEL_IDS would still be misconfigured.

Move resolveCopilotContextWindow from models-defaults.ts to models.ts
(avoids a new circular import since models-defaults.ts already imports
from models.ts), and use it in the catch-all. Add 3 regression tests
covering -1m-internal, plain -1m, and a non-1m guard.

Addresses Greptile P2 finding on #<your-PR-number>.
@iot2edge iot2edge force-pushed the fix/copilot-default-models-gpt-5-5-opus-1m branch from 1b538aa to a9fdcbc Compare May 7, 2026 11:54
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 7, 2026
@wyhgoodjob


Friendly nudge from a downstream user — no pressure 🙂

Just wanted to flag that this PR (and the sibling #73216) would unblock real-world use of claude-opus-4.7-1m-internal for folks like me on 2026.4.29, where the model still falls back to 128K context. Totally understand the needs-real-behavior-proof gate; if it helps, I'm happy to run the resulting build against my live Copilot account that has the 1m variant exposed and report back with /models inspect + a long-context session trace.

Thanks for the work here — no rush, just letting you know there's a user waiting downstream whenever you have cycles.

@iot2edge iot2edge marked this pull request as ready for review May 11, 2026 08:55
@openclaw-barnacle


This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label May 30, 2026
@clawsweeper clawsweeper Bot added the rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. label May 30, 2026
@barnacle-openclaw barnacle-openclaw Bot removed the stale Marked as stale due to inactivity label May 30, 2026

Labels

rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GitHub Copilot provider: add gpt-5.5 and claude-opus-4.7-1m-internal to default model list

2 participants