Summary
The embedded Copilot CLI ignores SessionConfig.Model when the user has a model preference in ~/.copilot/settings.json or is assigned to a Copilot experiment flight (e.g. copilot_cli_opus_1m_default_model). This causes evals to run on unintended models.
Impact
When a user's settings.json contains:
{
  "model": "claude-opus-4.6-1m",
  "effortLevel": "high"
}
all waza eval tasks run on Opus 4.6 1M with high reasoning effort, regardless of the config.model: claude-sonnet-4.5 setting in eval.yaml. A task that completes in 30 seconds with Sonnet takes 15+ minutes with Opus, and the user gets no indication that the wrong model is in use.
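Until this is fixed, an eval harness could at least warn when a user-level preference is present. The following is a minimal sketch, not waza's actual code: `userSettings` and `settingsModel` are hypothetical names, the struct models only the two fields shown above, and the requested model is hard-coded for illustration. It cannot detect experiment flights, which are assigned server-side.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// userSettings mirrors only the two fields shown in this report;
// the real settings.json may contain others (assumption).
type userSettings struct {
	Model       string `json:"model"`
	EffortLevel string `json:"effortLevel"`
}

// settingsModel returns the model preference found in raw settings.json
// bytes, or "" if the model field is absent.
func settingsModel(data []byte) (string, error) {
	var s userSettings
	if err := json.Unmarshal(data, &s); err != nil {
		return "", err
	}
	return s.Model, nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate home directory:", err)
		return
	}
	data, err := os.ReadFile(filepath.Join(home, ".copilot", "settings.json"))
	if err != nil {
		// Missing or unreadable file: nothing to check.
		return
	}
	const requested = "claude-sonnet-4.5" // model the eval asks for
	got, err := settingsModel(data)
	if err == nil && got != "" && got != requested {
		fmt.Fprintf(os.Stderr,
			"warning: settings.json prefers %q but the eval requests %q\n",
			got, requested)
	}
}
```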
Root Cause
The embedded CLI (v1.0.46 from SDK v1.0.0-beta.4) resolves its default model at process startup from:
- Copilot experiment flights (server-assigned per-account)
- ~/.copilot/settings.json model field
The resolved value is recorded as config_model in the CLI's telemetry before any SDK session is created. When waza later creates a session via SessionConfig{Model: "claude-sonnet-4.5"}, the CLI's startup-level model still takes precedence over the session's model.
Workaround
Passing --model <model> via ClientOptions.CLIArgs forces the CLI arg-level override, which takes precedence over both settings.json and experiment flights:
copilotOptions := &copilot.ClientOptions{
	// Force the model at the CLI-argument level.
	CLIArgs: []string{"--model", defaultModelID},
	// ...
}
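If the caller already builds a list of CLI arguments, the flag can be appended conditionally so it is not passed twice. This is a sketch under assumptions: `withModelArg` is a hypothetical helper, not part of the SDK, and it only recognizes the space-separated `--model <model>` form used above.

```go
package main

import "fmt"

// withModelArg returns a copy of args with "--model <model>" appended,
// unless a "--model" flag is already present, in which case args is
// returned unchanged. Hypothetical helper for illustration.
func withModelArg(args []string, model string) []string {
	for _, a := range args {
		if a == "--model" {
			return args
		}
	}
	out := make([]string, 0, len(args)+2)
	out = append(out, args...)
	return append(out, "--model", model)
}

func main() {
	// The result would be assigned to ClientOptions.CLIArgs.
	fmt.Println(withModelArg(nil, "claude-sonnet-4.5"))
}
```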
Expected Behavior
SessionConfig.Model should be the authoritative model for that session, overriding any user-level defaults from settings.json or experiment flights. The current behavior makes eval results non-reproducible across users with different settings.
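The requested precedence can be stated as a small function. This is only an illustration of the expected rule, not the CLI's implementation: `expectedModel` is a hypothetical name, the startup default stands for whatever the CLI resolved from settings.json or experiment flights, and falling back to it when the session specifies no model is an assumption.

```go
package main

import "fmt"

// expectedModel sketches the precedence this report asks for:
// a non-empty session model wins; otherwise the startup default applies.
func expectedModel(sessionModel, startupDefault string) string {
	if sessionModel != "" {
		return sessionModel
	}
	return startupDefault
}

func main() {
	// Session model set: it should override the user-level default.
	fmt.Println(expectedModel("claude-sonnet-4.5", "claude-opus-4.6-1m"))
	// Session model unset: fall back to the startup default (assumption).
	fmt.Println(expectedModel("", "claude-opus-4.6-1m"))
}
```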
Environment
- copilot-sdk/go v1.0.0-beta.4
- Embedded CLI v1.0.46
- Windows 11, Copilot CLI 1.0.51-2