Skip to content

Phase 3: Inference Provider, Routing, and Config-Shape Audit Coverage (E2E audit-coverage) #4349

@jyaunches

Description

@jyaunches

Phase 3: Inference Provider, Routing, and Config-Shape Audit Coverage

Parent epic: #3588

Goal

Cover audited assertions for inference provider integration, routing, configuration shape, and inference switching. Generic /v1/models health probes cannot satisfy provider-specific routing rows. Kimi compatibility requires trajectory tool-call splitting evidence. Inference switch requires evidence for route state, registry/session state, config hash/shape, and a live post-switch request.

Audit rows in scope

AQ Phase Legacy subject Required coverage Boundary Planned scenario/assertion Fixtures/actions
AQ-019 3 test-bedrock-runtime-compatible-anthropic.sh Bedrock-compatible path covers adapter config, runtime requests, route behavior, and secret-redaction checks. provider/integration bedrock compatible assertions fake Bedrock endpoint
AQ-020 3 test-cloud-inference-e2e.sh provider routing rows Provider-specific routing cannot be satisfied by generic /v1/models health alone. provider/integration provider route assertions fake provider fixture
AQ-021 3 test-inference-routing.sh Inference routing proves route health and routed chat/completion behavior through the expected route. provider/integration inference routing assertions route fixture
AQ-022 3 test-openclaw-inference-switch.sh and test-hermes-inference-switch.sh Inference switch proves state update, config hash/change, and live post-switch request. durable state / provider inference switch assertions provider switch action
AQ-023 3 test-kimi-inference-compat.sh Kimi compatibility covers plugin wiring and Kimi-compatible models route. provider/integration kimi compatibility assertions fake Kimi endpoint
AQ-024 3 test-model-router-provider-routed-inference.sh Model-router provider path proves healthy endpoint and routed completion. provider/integration model-router assertions fake router endpoint
AQ-025 3 test-messaging-compatible-endpoint.sh and test-brave-search-e2e.sh Compatible endpoint / provider integration checks cover route-specific config and runtime behavior. provider/integration compatible endpoint assertions fake endpoint fixture

Required manifests to add

  • openclaw-bedrock-compatible-anthropic.yaml
  • hermes-bedrock-compatible-anthropic.yaml
  • openai-openclaw-routing.yaml
  • anthropic-openclaw-routing.yaml
  • compatible-openclaw-routing.yaml
  • compatible-openclaw-kimi.yaml
  • routed-nvidia-openclaw-model-router.yaml
  • openclaw-nvidia-inference-switch.yaml
  • cloud-nvidia-hermes.yaml
  • cloud-nvidia-hermes-inference-switch.yaml
  • telegram-compatible-openclaw.yaml
  • openclaw-runtime-overrides.yaml (only as image-entrypoint setup if useful)

Required fixtures / runtime actions

  • Fake Bedrock Runtime endpoint and host mapping fixture
  • Bedrock adapter state/log/token fixture
  • Fake compatible OpenAI / Kimi endpoints
  • Model-router health endpoint fixture (or live setup contract)
  • inference.set runtime action for OpenClaw and Hermes
  • Runtime override container/image fixture
  • Provider-key leak scan fixture
  • Brave: API secret gate, policy/config, direct-curl fixtures
  • Trajectory/session artifact reader

Required assertions

  • Provider route identity and provider registry shape
  • OpenClaw and Hermes config shape after compatible provider setup
  • Adapter health including fake endpoint, region, token hash
  • Authenticated Converse/ConverseStream or compatible traffic observed by fake endpoint
  • Sandbox route chat returns expected content
  • OpenClaw/Hermes runtime path returns expected content
  • Kimi trajectory splits combined tool calls into discrete hostname, date, uptime exec calls
  • Model-router healthy_count > 0 and routed completion returns model: nvidia-routed* plus content
  • Inference switch updates route/session/registry/config without unwanted restart where legacy checked it
  • Runtime overrides update config and config hash; reject invalid values
  • Brave: secret gate, policy preset, OpenClaw web search config, credential hygiene, agent search, direct-curl search/skip behavior

Validation scenarios — all must pass in PR workflow artifacts

Scenario 3.1 — Bedrock-compatible audit coverage covers adapter, configs, runtime, traffic, and leaks (Happy Path)

  • Given Bedrock-compatible Anthropic fake endpoint, host mapping, adapter token, OpenClaw/Hermes scenario contracts available.
  • When Onboarding and runtime assertions execute.
  • Then Health, registry, config shape, sandbox route chat, agent runtime chat, authenticated Converse/ConverseStream traffic, safe logs, and leak scans all pass.
  • Steps
    1. Filter rows AQ-019…AQ-025; start fake Bedrock endpoint/host mapping fixture; collect PR workflow evidence for provider-specific assertions.
    2. Run onboarding and Bedrock/provider assertion modules.
    3. Verify each row passes with stable assertion IDs/artifact paths; Bedrock includes config/runtime/traffic/leak evidence.

Scenario 3.2 — Generic /v1/models health cannot satisfy provider-specific routing audit rows (Sad Path)

  • Given Kimi, model-router, inference-switch, or runtime-overrides audit row has only a generic models-health assertion.
  • When Audit-coverage validation runs.
  • Then The audit row remains unresolved and completion is rejected.
  • Steps
    1. Filter rows AQ-020…AQ-025; inject provider-specific contract with only generic health assertion.
    2. Run audit-coverage validation in workflow.
    3. Verify rejection, rows unresolved, and named missing provider/config/trajectory assertions.

Scenario 3.3 — Inference switch proves state, config hash, and live post-switch request (Happy Path)

  • Given OpenClaw or Hermes inference switch action is declared.
  • When The action runs; assertions inspect route state, registry/session state, config hash/shape, live post-switch request.
  • Then Switch coverage is complete only if all surfaces match expected values and no unwanted restart occurred where legacy checked it.
  • Steps
    1. Filter row AQ-022; provision switchable provider fixtures; collect PR workflow evidence.
    2. Run inference.set and assertion modules.
    3. Verify AQ-022 evidence-complete only when route/session/registry/config/live-request assertion IDs all pass.

Acceptance criteria — issue is NOT DONE until ALL are true

  1. PR landed in test/e2e-scenario/ with manifests, fixtures, runtime actions, assertion modules above.
  2. PR CI passing:
    • All 7 audit-row scenarios run in workflow and emit PASS: markers
    • Generic /v1/models health is rejected as coverage for provider-specific routing
    • Kimi requires trajectory/tool-call assertion to pass
    • Inference switch requires route + registry/session + config + live request assertions all to pass
  3. Validation Scenarios 3.1–3.3 all pass in PR workflow artifacts.
  4. Audit work queue updated: AQ-019 through AQ-025 flipped to evidence-complete.
  5. Phase-specific validation gate (from spec): generic /v1/models cannot satisfy provider-specific routing or config-shape; Kimi not complete unless trajectory/tool-call semantics asserted; inference switch not complete unless route state, registry/session state, config hash/shape, and live post-switch request all covered.
  6. No-cheat gate: generic health probes alone cannot satisfy any provider-specific row.
  7. Secret gate: provider keys never in manifests; leak scan asserted on Bedrock fixture.
  8. PR description references AQ rows covered, links this issue.

Dependencies

Out of scope

  • Local Ollama/GPU (Phase 4)
  • Messaging matrix (Phase 5)
  • Hermes Discord/Slack flow (Phase 6)

Cross-phase acceptance gates (apply to every phase)

  1. Setup gate — scenario contract declares environment, manifest or no-manifest reason, fixtures, runtime actions, assertions.
  2. No-cheat gate — preview/dry-run output cannot mark an audit row complete.
  3. Boundary gate — assertions touch the same SUT boundary as the legacy script.
  4. Evidence gate — every assertion emits an evidence path and stable assertion ID.
  5. Secret gate — no manifest, log, report, or fixture file contains raw secrets.
  6. Cleanup gate — fixtures that mutate host or repo state have restore/cleanup logic and tests.
  7. Audit completeness gate — every assigned audit row has owner, planned scenario/assertion, phase assignment, evidence status.
  8. Phase completion gate — phase complete only when every assigned row has executable evidence (or independent audit amendment).
  9. Executable assertion gate — completed scenarios point to concrete suite steps / assertion modules, not pendingStep(...), TODOs, generic probes, or prose.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area: e2eEnd-to-end tests, nightly failures, or validation infrastructurev0.0.63Release target
    No fields configured for Enhancement.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions