test(gateway): add small model live profile #88882

Merged: vincentkoc merged 2 commits into main from live-gateway-small-models-20260601 on Jun 6, 2026
vincentkoc (Member) commented Jun 1, 2026

Summary

  • add OPENCLAW_LIVE_GATEWAY_MODELS=small to the gateway live model profile harness
  • reuse the direct-model small allowlist, provider scoping, dynamic prioritized refs, and small-model cap for full gateway+agent validation
  • keep provider-scoped small sweeps honest by filtering dynamic refs and counted candidates through the requested provider filter
  • document the small-model gateway recipe next to the existing direct-model recipe
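
As a rough illustration of the first bullet, the selector value could be parsed along these lines. This is a minimal sketch, not the harness code: the `GatewayModelSelector` type, the `parseGatewayModelSelector` name, the fallback to `modern` when the variable is unset, and the comma-separated explicit list are all assumptions made for the example.

```typescript
// Hypothetical selector shape; the real harness may model this differently.
type GatewayModelSelector =
  | { kind: "small" }
  | { kind: "modern" }
  | { kind: "all" }
  | { kind: "explicit"; refs: string[] };

// Parse a raw OPENCLAW_LIVE_GATEWAY_MODELS value into a selector.
// Assumptions: unset or blank falls back to "modern", and any value that is
// not a keyword is treated as a comma-separated list of model refs.
function parseGatewayModelSelector(raw: string | undefined): GatewayModelSelector {
  const value = (raw ?? "").trim();
  if (value === "") {
    return { kind: "modern" };
  }
  switch (value.toLowerCase()) {
    case "small":
      return { kind: "small" };
    case "modern":
      return { kind: "modern" };
    case "all":
      return { kind: "all" };
  }
  // Not a keyword: split into individual refs and drop empty entries.
  const refs = value
    .split(",")
    .map((ref) => ref.trim())
    .filter((ref) => ref.length > 0);
  return { kind: "explicit", refs };
}
```

Passing the raw string in, rather than reading the environment inside the function, keeps the parser easy to cover with config-level tests like the ones listed under Verification.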

Related: #86599

Verification

  • node scripts/run-vitest.mjs run --config test/vitest/vitest.live.config.ts --configLoader runner src/gateway/gateway-models.profiles.live.test.ts --reporter=dot — 82 passed, 2 skipped after rebase onto 28550c3847cc2525f42b4ef6354f76877f53274b
  • node scripts/check-docs-mdx.mjs docs/help/testing-live.md
  • node scripts/docs-list.js
  • node_modules/.bin/oxfmt --check src/gateway/gateway-models.profiles.live.test.ts docs/help/testing-live.md
  • node scripts/run-oxlint.mjs src/gateway/gateway-models.profiles.live.test.ts
  • git diff --check origin/main...HEAD
  • AWS Crabbox run_37f274101856, lease cbx_bc7ffedf1c7c: env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed passed with lanes coreTests, docs

Real behavior proof

Behavior addressed: small-model live validation can now run through the full gateway+agent path instead of only the direct completion path.
Real environment tested: local macOS worktree for focused live-config tests; AWS Crabbox Linux remote changed gate for typecheck/lint/docs changed lanes.
Exact steps or command run after this patch: node scripts/run-vitest.mjs run --config test/vitest/vitest.live.config.ts --configLoader runner src/gateway/gateway-models.profiles.live.test.ts --reporter=dot; node scripts/crabbox-wrapper.mjs run --provider aws --label live-gateway-small-models-88882-final --shell -- "env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed".
Evidence after fix: targeted live-config test passed with 82 tests passed and 2 live-provider tests skipped; remote changed gate passed coreTests, docs with exit 0.
Observed result after fix: OPENCLAW_LIVE_GATEWAY_MODELS=small is parsed as a curated small-model gateway sweep, gets the small provider/model cap, filters provider-scoped dynamic refs correctly, loads prioritized dynamic small refs, and uses small-model selection ordering.
What was not tested: real live provider execution with small-model credentials; the local live-config run skips provider calls when live env is not enabled.
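
The provider-scoping behavior described above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the code in this PR: the `provider/model` ref format, the function names, and the choice to list dynamic refs ahead of curated ones are all hypothetical.

```typescript
// Assumed ref format: "<provider>/<model>".
function providerOf(ref: string): string {
  const slash = ref.indexOf("/");
  return slash === -1 ? ref : ref.slice(0, slash);
}

// Keep only refs whose provider is in the requested filter.
// A missing or empty filter means "no scoping".
function scopeToProviders(refs: readonly string[], providers?: readonly string[]): string[] {
  if (!providers || providers.length === 0) {
    return refs.slice();
  }
  return refs.filter((ref) => providers.indexOf(providerOf(ref)) !== -1);
}

// Scope first, then count and cap, so small models outside the requested
// providers never inflate the candidate count or use up cap slots.
function selectScopedSmall(
  curated: readonly string[],
  dynamic: readonly string[],
  cap: number,
  providers?: readonly string[],
): { candidateCount: number; selected: string[] } {
  const merged = scopeToProviders(dynamic, providers)
    .concat(scopeToProviders(curated, providers))
    // Drop duplicates while keeping the first occurrence.
    .filter((ref, index, all) => all.indexOf(ref) === index);
  return { candidateCount: merged.length, selected: merged.slice(0, cap) };
}
```

For example, `selectScopedSmall(["a/s1", "b/s2"], ["b/d1", "a/d2"], 2, ["a"])` reports two candidates and selects `a/d2` and `a/s1`; the `b/...` refs are dropped before counting.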

openclaw-barnacle (bot) added the docs and gateway labels Jun 1, 2026
vincentkoc self-assigned this Jun 1, 2026
openclaw-barnacle (bot) added the size: S and maintainer labels Jun 1, 2026
clawsweeper (bot, Contributor) commented Jun 1, 2026

Codex review: needs maintainer review before merge. Reviewed June 6, 2026, 10:10 AM ET / 14:10 UTC.

Summary
The PR adds a documented OPENCLAW_LIVE_GATEWAY_MODELS=small mode to the gateway live-model profile harness, reusing the existing small-model selection helpers with provider-scoped dynamic refs.

PR surface: Tests +165, Docs +4. Total +169 across 2 files.

Reproducibility: not applicable. This PR adds a live-test harness mode rather than fixing a reproduced product bug. Source inspection confirms current main lacks the gateway small selector and the PR supplies config-level verification for the new path.

Review metrics: 1 noteworthy metric.

  • Live harness selector mode: 1 added. OPENCLAW_LIVE_GATEWAY_MODELS=small adds a documented operator-facing live-test selector value, so maintainers should notice the new test surface before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Run one credentialed OPENCLAW_LIVE_GATEWAY_MODELS=small gateway profile against a representative small provider if maintainers want end-to-end live-provider proof before merge.

Risk before merge

  • [P1] The PR body explicitly says no credentialed OPENCLAW_LIVE_GATEWAY_MODELS=small provider run was performed, so maintainers need to decide whether config/selection-level proof is enough for this live-profile addition.

Maintainer options:

  1. Decide the mitigation before merge
    Land after maintainer review accepts the current config-level proof or adds one credentialed small-model gateway profile run, while keeping the docs aligned with the harness behavior.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] The PR is maintainer-labeled and needs human acceptance of the current proof depth or a request for credentialed live-provider proof; there is no narrow automated repair to queue.

Security
Cleared: The diff is limited to docs and live-test harness selection logic; it does not change dependencies, workflows, package scripts, secrets handling, or production credential paths.

Review details

Best possible solution:

Land after maintainer review accepts the current config-level proof or adds one credentialed small-model gateway profile run, while keeping the docs aligned with the harness behavior.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this PR adds a live-test harness mode rather than fixing a reproduced product bug. Source inspection confirms current main lacks the gateway small selector and the PR supplies config-level verification for the new path.

Is this the best way to solve the issue?

Yes, this is a reasonable owner-boundary fix: it reuses the existing direct small-model live helpers instead of creating a second model matrix. The remaining question is proof depth, not a clearly better implementation location.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5b84ebfc56c9.

Label changes

Label justifications:

  • P3: This is a low-risk docs and live-test harness improvement with no production runtime behavior change.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.
  • Not applicable: The external-contributor proof gate does not apply to this MEMBER maintainer-labeled PR; the body still records terminal-style config/test proof and Crabbox changed-gate proof, with credentialed provider execution left untested.

Evidence reviewed

PR surface:

Tests +165, Docs +4. Total +169 across 2 files.

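| Area      | Files | Added | Removed | Net  |
| --------- | ----- | ----- | ------- | ---- |
| Source    | 0     | 0     | 0       | 0    |
| Tests     | 1     | 196   | 31      | +165 |
| Docs      | 1     | 5     | 1       | +4   |
| Config    | 0     | 0     | 0       | 0    |
| Generated | 0     | 0     | 0       | 0    |
| Other     | 0     | 0     | 0       | 0    |
| Total     | 2     | 201   | 32      | +169 |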

What I checked:

  • PR branch implements small gateway mode: The PR head parses OPENCLAW_LIVE_GATEWAY_MODELS=small, threads useSmall through provider scoping, dynamic small-ref loading, wanted-model filtering, capped selection, and not-found skip behavior. (src/gateway/gateway-models.profiles.live.test.ts:3540, 353a4444b695)
  • PR branch adds focused harness tests: The PR head covers the small default cap, curated small-provider list, provider-filter intersection, filtered dynamic refs, and avoiding counts for small models outside a provider-scoped sweep. (src/gateway/gateway-models.profiles.live.test.ts:983, 353a4444b695)
  • Docs describe the new recipe: The PR head documents the new gateway selector next to the existing direct small-model recipe and updates the cap wording for modern/all and small gateway sweeps. Public docs: docs/help/testing-live.md. (docs/help/testing-live.md:113, 353a4444b695)
  • Current main does not already include the gateway small selector: Current main has the gateway model selector and the direct small-model recipe, but no OPENCLAW_LIVE_GATEWAY_MODELS=small docs or gateway useSmall path, so the PR is not obsolete on main. Public docs: docs/help/testing-live.md. (docs/help/testing-live.md:113, 5b84ebfc56c9)
  • Existing small-model contract is already shared by direct live tests: Current main already defines the curated small priority refs, small cap, isSmallLiveModelRef, and selectSmallLiveItems; the direct live-model profile uses those helpers for OPENCLAW_LIVE_MODELS=small. (src/agents/live-model-filter.ts:39, 5b84ebfc56c9)
  • Real behavior proof remains config-level: The PR body reports targeted live-config tests and an AWS Crabbox changed gate, while explicitly saying no credentialed small-model provider execution was run. (353a4444b695)
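
The "curated priority refs plus cap" idea mentioned in the checks above can be approximated with a short sketch. It is not the implementation of `selectSmallLiveItems` or `isSmallLiveModelRef`, whose signatures are not shown in this thread; `selectByPriority` and its parameters are hypothetical names used only to illustrate priority-ordered, capped selection.

```typescript
// Hypothetical helper: order discovered refs by their position in a curated
// priority list, keep discovery order for refs that are not on the list,
// then apply the cap.
function selectByPriority(
  discovered: readonly string[],
  priority: readonly string[],
  cap: number,
): string[] {
  // Refs missing from the priority list all share the lowest rank.
  const rank = (ref: string): number => {
    const position = priority.indexOf(ref);
    return position === -1 ? priority.length : position;
  };
  return discovered
    .map((ref, index) => ({ ref, index }))
    // Sort by rank, using the original index as an explicit tie-break.
    .sort((a, b) => rank(a.ref) - rank(b.ref) || a.index - b.index)
    .slice(0, cap)
    .map((entry) => entry.ref);
}
```

With `discovered = ["x/c", "x/a", "x/d", "x/b"]`, `priority = ["x/a", "x/b"]`, and `cap = 3`, this yields `x/a`, `x/b`, `x/c`: prioritized refs come first, and the remaining slot goes to the earliest non-prioritized ref.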

Likely related people:

  • joshavant: Blame and git show --stat tie the current gateway live profile file, testing-live docs, and small-model filter helper surface to fbaa5a6f0a...; the checkout is grafted there, so this is strong for current-main routing but not full upstream provenance. (role: introduced current harness/helpers in local history; confidence: medium; commits: fbaa5a6f0a57; files: src/gateway/gateway-models.profiles.live.test.ts, src/agents/live-model-filter.ts, docs/help/testing-live.md)
  • Vincent Koc: Current main includes recent live gateway maintenance by Vincent in 98f52dcc..., and this PR also extends that same live-profile harness area. (role: recent area contributor; confidence: high; commits: 98f52dcc00d2; files: src/gateway/gateway-models.profiles.live.test.ts)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper (bot) added the rating: 🦐 gold shrimp and status: ⏳ waiting on author labels Jun 1, 2026
vincentkoc force-pushed the live-gateway-small-models-20260601 branch from 4ffa9e3 to 36894bf June 1, 2026 06:23
vincentkoc marked this pull request as ready for review June 1, 2026 06:23
clawsweeper (bot) added the rating: 🐚 platinum hermit, status: 👀 ready for maintainer look, and P3 labels and removed the rating: 🦐 gold shrimp and status: ⏳ waiting on author labels Jun 1, 2026
vincentkoc force-pushed the live-gateway-small-models-20260601 branch from 36894bf to cd2b54c June 6, 2026 13:53
vincentkoc force-pushed the live-gateway-small-models-20260601 branch from cd2b54c to 353a444 June 6, 2026 14:01
vincentkoc (Member, Author) commented

Land-ready verification for head 353a4444b69522ab0c889da60766043d30a3b546.

  • Local focused proof: node scripts/run-vitest.mjs run --config test/vitest/vitest.live.config.ts src/gateway/gateway-models.profiles.live.test.ts -> 1 file passed, 87 tests passed, 2 skipped.
  • CI proof: GitHub Actions run 27064309683 is green; CodeQL run 27064309687 is green; OpenGrep PR diff run 27064309689 is green.
  • Maintainer decision: accepting config-level harness proof for the new OPENCLAW_LIVE_GATEWAY_MODELS=small selector; this PR adds live-test coverage surface and docs, not production runtime behavior.
  • Proof gap: no credentialed small-provider gateway smoke was run in this landing pass.

vincentkoc merged commit 51488bf into main Jun 6, 2026
159 checks passed
vincentkoc deleted the live-gateway-small-models-20260601 branch June 6, 2026 14:10

Labels

  • docs: Improvements or additions to documentation
  • gateway: Gateway runtime
  • maintainer: Maintainer-authored PR
  • P3: Low-priority cleanup, docs, polish, ergonomics, or speculative work.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • size: M
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.
