test(gateway): add small model live profile #88882
Codex review: needs maintainer review before merge.
Reviewed June 6, 2026, 10:10 AM ET / 14:10 UTC.
Summary
PR surface: Tests +165, Docs +4. Total +169 across 2 files.
Reproducibility: not applicable. This PR adds a live-test harness mode rather than fixing a reproduced product bug. Source inspection confirms current main lacks the gateway small selector and the PR supplies config-level verification for the new path.
Review metrics: 1 noteworthy metric.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Land after maintainer review accepts the current config-level proof or adds one credentialed small-model gateway profile run, while keeping the docs aligned with the harness behavior.
Do we have a high-confidence way to reproduce the issue? Not applicable; this PR adds a live-test harness mode rather than fixing a reproduced product bug. Source inspection confirms current main lacks the gateway small selector and the PR supplies config-level verification for the new path.
Is this the best way to solve the issue? Yes, this is a reasonable owner-boundary fix: it reuses the existing direct small-model live helpers instead of creating a second model matrix. The remaining question is proof depth, not a clearly better implementation location.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 5b84ebfc56c9.
Land-ready verification for head
Summary
- Add `OPENCLAW_LIVE_GATEWAY_MODELS=small` to the gateway live model profile harness

Related: #86599
Verification
- `node scripts/run-vitest.mjs run --config test/vitest/vitest.live.config.ts --configLoader runner src/gateway/gateway-models.profiles.live.test.ts --reporter=dot`: 82 passed, 2 skipped after rebase onto `28550c3847cc2525f42b4ef6354f76877f53274b`
- `node scripts/check-docs-mdx.mjs docs/help/testing-live.md`
- `node scripts/docs-list.js`
- `node_modules/.bin/oxfmt --check src/gateway/gateway-models.profiles.live.test.ts docs/help/testing-live.md`
- `node scripts/run-oxlint.mjs src/gateway/gateway-models.profiles.live.test.ts`
- `git diff --check origin/main...HEAD`
- `run_37f274101856`, lease `cbx_bc7ffedf1c7c`: `env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed` passed with lanes `coreTests, docs`

Real behavior proof
Behavior addressed: small-model live validation can now run through the full gateway+agent path instead of only the direct completion path.
Real environment tested: local macOS worktree for focused live-config tests; AWS Crabbox Linux remote changed gate for typecheck/lint/docs changed lanes.
Exact steps or command run after this patch:
- `node scripts/run-vitest.mjs run --config test/vitest/vitest.live.config.ts --configLoader runner src/gateway/gateway-models.profiles.live.test.ts --reporter=dot`
- `node scripts/crabbox-wrapper.mjs run --provider aws --label live-gateway-small-models-88882-final --shell -- "env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed"`

Evidence after fix: the targeted live-config run finished with 82 tests passed and 2 live-provider tests skipped; the remote changed gate passed lanes `coreTests, docs` with exit 0.
Observed result after fix: `OPENCLAW_LIVE_GATEWAY_MODELS=small` is parsed as a curated small-model gateway sweep, gets the small provider/model cap, filters provider-scoped dynamic refs correctly, loads prioritized dynamic small refs, and uses small-model selection ordering.
What was not tested: real live provider execution with small-model credentials; the local live-config run skips provider calls when live env is not enabled.
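For readers who want a concrete picture of the selector behavior described in the observed result, here is a minimal illustrative sketch. It is not the harness code from this PR. Every name in it (`parseGatewayModelSelection`, `selectSmallRefs`, the `provider/model` ref shape, the comma-separated explicit list, and the cap value) is a hypothetical assumption chosen for illustration only.

```typescript
// Hypothetical sketch only: names, ref format, and cap are assumptions,
// not the actual implementation in gateway-models.profiles.live.test.ts.

type GatewayModelSelection =
  | { kind: "default" }                    // env var unset or empty
  | { kind: "small" }                      // curated small-model sweep
  | { kind: "explicit"; refs: string[] };  // assumed comma-separated refs

// Interpret the raw value of an env var such as OPENCLAW_LIVE_GATEWAY_MODELS.
function parseGatewayModelSelection(raw: string | undefined): GatewayModelSelection {
  const value = (raw ?? "").trim();
  if (value === "") return { kind: "default" };
  if (value.toLowerCase() === "small") return { kind: "small" };
  const refs = value
    .split(",")
    .map((ref) => ref.trim())
    .filter((ref) => ref.length > 0);
  return { kind: "explicit", refs };
}

// Filter candidate refs to the allowed providers (null means "all"),
// order prioritized refs first, then apply the cap.
function selectSmallRefs(
  candidates: string[],
  priority: string[],
  providers: Set<string> | null,
  cap: number,
): string[] {
  const scoped = candidates.filter((ref) => {
    const provider = ref.split("/")[0] ?? ""; // assumes "provider/model" refs
    return providers === null || providers.has(provider);
  });
  // Refs missing from the priority list sort after all prioritized refs.
  const rank = (ref: string): number => {
    const index = priority.indexOf(ref);
    return index === -1 ? priority.length : index;
  };
  // Array.prototype.sort is stable, so ties keep their original order.
  return [...scoped].sort((a, b) => rank(a) - rank(b)).slice(0, cap);
}

const selection = parseGatewayModelSelection("small");
const example = selectSmallRefs(["a/x", "b/y", "a/z"], ["a/z"], new Set(["a"]), 2);
console.log(selection.kind, example); // small [ 'a/z', 'a/x' ]
```

Keeping parsing separate from selection means the config-level checks listed under Verification can exercise each step without provider credentials, which matches the proof depth this PR claims.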