Skip to content

fix: pass --model via CLIArgs to override user settings and experiment flights#263

Merged
spboyer merged 6 commits into
microsoft:mainfrom
sebastienlevert:fix/model-override-cli-args
May 26, 2026
Merged

fix: pass --model via CLIArgs to override user settings and experiment flights#263
spboyer merged 6 commits into
microsoft:mainfrom
sebastienlevert:fix/model-override-cli-args

Conversation

@sebastienlevert

Copy link
Copy Markdown
Contributor

Summary

Pass --model <defaultModelID> via ClientOptions.CLIArgs so the embedded CLI uses the eval's configured model instead of the user's personal preferences.

Fixes #262

Problem

The embedded CLI (v1.0.46) reads ~/.copilot/settings.json at startup and the Copilot experiment flight system can assign a default model per-account (e.g. copilot_cli_opus_1m_default_model). Both override SessionConfig.Model, causing evals to run on unintended models.

Impact: A task configured for claude-sonnet-4.5 (30s) silently runs on claude-opus-4.6-1m with high reasoning (15+ min) with no indication to the user.

Fix

Pass --model as a CLI startup arg in ClientOptions.CLIArgs, which takes precedence over settings.json and experiment flights:

cliArgs := []string{}
if defaultModelID != "" {
    cliArgs = append(cliArgs, "--model", defaultModelID)
}
copilotOptions := &copilot.ClientOptions{
    CLIArgs: cliArgs,
    // ...
}

Testing

Verified on Windows 11:

  • Without fix: tasks take 15+ min (CLI uses Opus from settings.json)
  • With fix: tasks complete in ~30s (CLI uses configured Sonnet)
  • go build ./cmd/waza passes

@sebastienlevert sebastienlevert requested a review from spboyer as a code owner May 21, 2026 00:12
Copilot AI review requested due to automatic review settings May 21, 2026 00:12
@github-actions github-actions Bot enabled auto-merge (squash) May 21, 2026 00:12

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to make eval runs reproducible by forcing the embedded Copilot CLI to use the eval’s configured model (instead of user ~/.copilot/settings.json preferences or experiment flight defaults) by passing --model through copilot.ClientOptions.CLIArgs.

Changes:

  • Add --model <defaultModelID> to Copilot SDK ClientOptions.CLIArgs during engine construction.
  • Change session permission handling to always auto-approve permissions via copilot.PermissionHandler.ApproveAll.
  • Add a new embedded Windows amd64 CLI artifact + license (zcopilot_1.0.21_windows_amd64).
Show a summary per file
File Description
internal/execution/copilot.go Passes --model via ClientOptions.CLIArgs and changes permission handler wiring during session creation/resume.
internal/embedded/zcopilot_1.0.21_windows_amd64.exe.zst Adds a new embedded Copilot CLI Windows amd64 binary (Git LFS pointer).
internal/embedded/zcopilot_1.0.21_windows_amd64.exe.license Adds the license text for the newly added embedded binary.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 4

Comment thread internal/execution/copilot.go
Comment thread internal/execution/copilot.go Outdated
Comment thread internal/embedded/zcopilot_1.0.21_windows_amd64.exe.zst Outdated
Comment thread internal/execution/copilot.go Outdated
auto-merge was automatically disabled May 21, 2026 00:36

Head branch was pushed to by a user without write access

@sebastienlevert sebastienlevert force-pushed the fix/model-override-cli-args branch from 656ce52 to 861c10a Compare May 21, 2026 00:36
@spboyer spboyer requested a review from Copilot May 21, 2026 20:36

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment thread internal/execution/copilot.go Outdated

@spboyer spboyer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR correctly routes the default model into the embedded CLI startup args, but the regression is not covered by an automated test.

Issues to address:

  • internal/execution/copilot.go:74 - missing regression coverage for ClientOptions.CLIArgs model override

Comment thread internal/execution/copilot.go Outdated
sebastienlevert pushed a commit to sebastienlevert/waza that referenced this pull request May 25, 2026
… comment

Addresses review feedback on PR microsoft#263:
- spboyer: add builder-level regression test asserting ClientOptions.CLIArgs
  carries [--model defaultModelID] when set, and is empty when no default
  model is configured. Captures the clientOptions via the existing
  NewCopilotClient hook so a future refactor cannot silently drop the
  startup override and still pass tests.
- copilot-pull-request-reviewer: reword the inline comment so it does not
  hard-code the Unix '~/.copilot/settings.json' path; mention both Unix
  home and Windows %USERPROFILE% locations and clarify that
  SessionConfig.Model still wins for per-session overrides.

Refs microsoft#262

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sebastienlevert

Copy link
Copy Markdown
Contributor Author

Addressed the outstanding review feedback in ab2870a:

@spboyer (CHANGES_REQUESTED) — added builder-level regression coverage for the --model startup override in internal/execution/copilot_engine_test.go:

  • TestCopilotEngineBuilder_CLIArgsCarriesModel captures the copilot.ClientOptions passed to NewCopilotClient and asserts CLIArgs == ["--model", defaultModelID] when defaultModelID is set.
  • TestCopilotEngineBuilder_CLIArgsEmptyWhenNoDefaultModel asserts CLIArgs is empty when no default model is configured, preserving the embedded CLI's fallback behavior.

Both use the existing NewCopilotClient hook, so a future refactor that silently drops the startup override will fail these tests instead of regressing #262.

@copilot-pull-request-reviewer (line 66) — reworded the inline comment so it no longer hard-codes the Unix ~/.copilot/settings.json path. It now references the user's local Copilot settings.json and notes both the Unix home and Windows %USERPROFILE% locations, and clarifies that SessionConfig.Model still wins for per-session overrides via Execute.

Local verification (Windows, Go 1.26.3):

  • go build ./... — passes
  • go vet ./internal/execution/... — passes
  • go test ./internal/execution/... -count=1 — passes (both new tests green)

sebastienlevert and others added 2 commits May 26, 2026 10:55
…t flights

The embedded CLI (v1.0.46 via SDK v1.0.0-beta.4) reads ~/.copilot/settings.json
at startup and uses the user's personal model preference as its default. The
Copilot experiment flight system can also assign a default model per-account
(e.g. copilot_cli_opus_1m_default_model). Both override SessionConfig.Model,
causing evals to run on unintended models (e.g. Opus 4.6 1M with high reasoning
instead of the configured Sonnet 4.5), leading to 15+ minute task times instead
of 30 seconds.

Fix: pass --model <defaultModelID> in ClientOptions.CLIArgs so the CLI arg-level
override takes precedence over settings.json and experiment flights.

Also adds --yolo to the prompt grader's CLIArgs so the judge session auto-approves
set_waza_grade_pass/fail tool permissions without going through the RPC permission
handshake.

Fixes microsoft#262
… comment

Addresses review feedback on PR microsoft#263:
- spboyer: add builder-level regression test asserting ClientOptions.CLIArgs
  carries [--model defaultModelID] when set, and is empty when no default
  model is configured. Captures the clientOptions via the existing
  NewCopilotClient hook so a future refactor cannot silently drop the
  startup override and still pass tests.
- copilot-pull-request-reviewer: reword the inline comment so it does not
  hard-code the Unix '~/.copilot/settings.json' path; mention both Unix
  home and Windows %USERPROFILE% locations and clarify that
  SessionConfig.Model still wins for per-session overrides.

Refs microsoft#262

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 26, 2026 14:58
@spboyer spboyer force-pushed the fix/model-override-cli-args branch from ab2870a to 07b83fc Compare May 26, 2026 14:58
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 1

Comment thread internal/execution/copilot.go

@spboyer spboyer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean — no issues found. LGTM.

I re-reviewed the rebased diff, including the shared-client CLIArgs path and regression coverage. The previously raised comments are addressed and resolved.

@codecov-commenter

codecov-commenter commented May 26, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.74468% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@84a5736). Learn more about missing BASE report.

Files with missing lines Patch % Lines
internal/execution/sdkclient.go 95.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #263   +/-   ##
=======================================
  Coverage        ?   75.47%           
=======================================
  Files           ?      158           
  Lines           ?    18483           
  Branches        ?        0           
=======================================
  Hits            ?    13950           
  Misses          ?     3555           
  Partials        ?      978           
Flag Coverage Δ
go-implementation 75.47% <95.74%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI added 2 commits May 26, 2026 11:08
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 26, 2026 15:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 2

Comment thread internal/execution/sdkclient_test.go Outdated
Comment thread internal/execution/sdkclient_test.go Outdated
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@spboyer spboyer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after rebase and follow-up fixes (CLIArgs-keyed shared client, shutdown race fix, slice equality in tests). All checks green.

@spboyer spboyer merged commit 3564037 into microsoft:main May 26, 2026
7 checks passed
github-actions Bot pushed a commit that referenced this pull request Jun 1, 2026
) (#306)

* fix: skip --model CLI startup arg when BYOK provider is configured (#305)

PR #263 added --model <defaultModelID> to copilot.ClientOptions.CLIArgs so
the embedded Copilot CLI honors the eval-configured model. However, the
CLI validates that startup --model against the GitHub Copilot catalog
BEFORE the per-session ProviderConfig from #240 is applied. For BYOK /
custom provider model IDs that are not in the Copilot catalog (e.g.
minimax-m2.7), startup fails with 'Model "..." is not available'.

Suppress the startup --model arg whenever providerFromEnv() reports an
enabled custom provider. SessionConfig.Model + SessionConfig.Provider are
still passed per session via Execute, so the SDK selects the right model
against the custom provider without pre-validating against the GitHub
catalog. Normal GitHub Copilot models retain the #263 behavior.

Adds a regression test covering the BYOK + non-empty defaultModelID
case, and hardens the existing CLIArgs tests against environments that
already have COPILOT_BASE_URL set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(test): properly unset COPILOT_* env vars in clearCustomProviderEnv

Replace t.Setenv(name, "") with os.Unsetenv + t.Cleanup so the helper
actually removes variables from the environment rather than setting them
to empty strings. This matches the comment ("unsets") and avoids any
potential confusion for future readers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(test): ignore os.Setenv/Unsetenv return values in clearCustomProviderEnv

Satisfies errcheck in golangci-lint v2.10.1; these calls don't realistically
fail in tests and t.Cleanup ensures restoration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Shayne Boyer <spboyer@users.noreply.github.com>
@spboyer spboyer mentioned this pull request Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: embedded CLI ignores SessionConfig.Model when user has model in settings.json or experiment flights

5 participants