fix: pass --model via CLIArgs to override user settings and experiment flights#263
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to make eval runs reproducible by forcing the embedded Copilot CLI to use the eval’s configured model (instead of user ~/.copilot/settings.json preferences or experiment flight defaults) by passing --model through copilot.ClientOptions.CLIArgs.
Changes:
- Add
--model <defaultModelID>to Copilot SDKClientOptions.CLIArgsduring engine construction. - Change session permission handling to always auto-approve permissions via
copilot.PermissionHandler.ApproveAll. - Add a new embedded Windows amd64 CLI artifact + license (
zcopilot_1.0.21_windows_amd64).
Show a summary per file
| File | Description |
|---|---|
| internal/execution/copilot.go | Passes --model via ClientOptions.CLIArgs and changes permission handler wiring during session creation/resume. |
| internal/embedded/zcopilot_1.0.21_windows_amd64.exe.zst | Adds a new embedded Copilot CLI Windows amd64 binary (Git LFS pointer). |
| internal/embedded/zcopilot_1.0.21_windows_amd64.exe.license | Adds the license text for the newly added embedded binary. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 4
Head branch was pushed to by a user without write access
656ce52 to
861c10a
Compare
spboyer
left a comment
There was a problem hiding this comment.
This PR correctly routes the default model into the embedded CLI startup args, but the regression is not covered by an automated test.
Issues to address:
- internal/execution/copilot.go:74 - missing regression coverage for ClientOptions.CLIArgs model override
… comment Addresses review feedback on PR microsoft#263: - spboyer: add builder-level regression test asserting ClientOptions.CLIArgs carries [--model defaultModelID] when set, and is empty when no default model is configured. Captures the clientOptions via the existing NewCopilotClient hook so a future refactor cannot silently drop the startup override and still pass tests. - copilot-pull-request-reviewer: reword the inline comment so it does not hard-code the Unix '~/.copilot/settings.json' path; mention both Unix home and Windows %USERPROFILE% locations and clarify that SessionConfig.Model still wins for per-session overrides. Refs microsoft#262 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Addressed the outstanding review feedback in ab2870a: @spboyer (CHANGES_REQUESTED) — added builder-level regression coverage for the
Both use the existing @copilot-pull-request-reviewer (line 66) — reworded the inline comment so it no longer hard-codes the Unix Local verification (Windows, Go 1.26.3):
|
…t flights The embedded CLI (v1.0.46 via SDK v1.0.0-beta.4) reads ~/.copilot/settings.json at startup and uses the user's personal model preference as its default. The Copilot experiment flight system can also assign a default model per-account (e.g. copilot_cli_opus_1m_default_model). Both override SessionConfig.Model, causing evals to run on unintended models (e.g. Opus 4.6 1M with high reasoning instead of the configured Sonnet 4.5), leading to 15+ minute task times instead of 30 seconds. Fix: pass --model <defaultModelID> in ClientOptions.CLIArgs so the CLI arg-level override takes precedence over settings.json and experiment flights. Also adds --yolo to the prompt grader's CLIArgs so the judge session auto-approves set_waza_grade_pass/fail tool permissions without going through the RPC permission handshake. Fixes microsoft#262
… comment Addresses review feedback on PR microsoft#263: - spboyer: add builder-level regression test asserting ClientOptions.CLIArgs carries [--model defaultModelID] when set, and is empty when no default model is configured. Captures the clientOptions via the existing NewCopilotClient hook so a future refactor cannot silently drop the startup override and still pass tests. - copilot-pull-request-reviewer: reword the inline comment so it does not hard-code the Unix '~/.copilot/settings.json' path; mention both Unix home and Windows %USERPROFILE% locations and clarify that SessionConfig.Model still wins for per-session overrides. Refs microsoft#262 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ab2870a to
07b83fc
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer
left a comment
There was a problem hiding this comment.
Clean — no issues found. LGTM.
I re-reviewed the rebased diff, including the shared-client CLIArgs path and regression coverage. The previously raised comments are addressed and resolved.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #263 +/- ##
=======================================
Coverage ? 75.47%
=======================================
Files ? 158
Lines ? 18483
Branches ? 0
=======================================
Hits ? 13950
Misses ? 3555
Partials ? 978
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer
left a comment
There was a problem hiding this comment.
LGTM after rebase and follow-up fixes (CLIArgs-keyed shared client, shutdown race fix, slice equality in tests). All checks green.
) (#306) * fix: skip --model CLI startup arg when BYOK provider is configured (#305) PR #263 added --model <defaultModelID> to copilot.ClientOptions.CLIArgs so the embedded Copilot CLI honors the eval-configured model. However, the CLI validates that startup --model against the GitHub Copilot catalog BEFORE the per-session ProviderConfig from #240 is applied. For BYOK / custom provider model IDs that are not in the Copilot catalog (e.g. minimax-m2.7), startup fails with 'Model "..." is not available'. Suppress the startup --model arg whenever providerFromEnv() reports an enabled custom provider. SessionConfig.Model + SessionConfig.Provider are still passed per session via Execute, so the SDK selects the right model against the custom provider without pre-validating against the GitHub catalog. Normal GitHub Copilot models retain the #263 behavior. Adds a regression test covering the BYOK + non-empty defaultModelID case, and hardens the existing CLIArgs tests against environments that already have COPILOT_BASE_URL set. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor(test): properly unset COPILOT_* env vars in clearCustomProviderEnv Replace t.Setenv(name, "") with os.Unsetenv + t.Cleanup so the helper actually removes variables from the environment rather than setting them to empty strings. This matches the comment ("unsets") and avoids any potential confusion for future readers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(test): ignore os.Setenv/Unsetenv return values in clearCustomProviderEnv Satisfies errcheck in golangci-lint v2.10.1; these calls don't realistically fail in tests and t.Cleanup ensures restoration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Shayne Boyer <spboyer@users.noreply.github.com>
Summary
Pass
--model <defaultModelID>viaClientOptions.CLIArgsso the embedded CLI uses the eval's configured model instead of the user's personal preferences.Fixes #262
Problem
The embedded CLI (v1.0.46) reads
~/.copilot/settings.jsonat startup and the Copilot experiment flight system can assign a default model per-account (e.g.copilot_cli_opus_1m_default_model). Both overrideSessionConfig.Model, causing evals to run on unintended models.Impact: A task configured for
claude-sonnet-4.5(30s) silently runs onclaude-opus-4.6-1mwith high reasoning (15+ min) with no indication to the user.Fix
Pass
--modelas a CLI startup arg inClientOptions.CLIArgs, which takes precedence over settings.json and experiment flights:Testing
Verified on Windows 11:
go build ./cmd/wazapasses