Architecture: organize model-specific sandbox setup #3120

@ericksoa

Context

PR #3046 fixed a real Kimi K2.6/OpenClaw harness incompatibility in NemoClaw's managed inference.local path. Kimi can emit "hostname; date; uptime" as one combined exec tool call, while OpenClaw needs those commands represented as separate tool calls, each with its own boundary, for persistence, replay, and tool-result correlation.
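To make the boundary problem concrete, here is a minimal sketch of the transformation in Python. The tool-call dictionary shape (`id`, `name`, `arguments.command`) and the function name `split_exec_call` are assumptions made for illustration; the actual plugin from PR #3046 may represent calls differently and may use other splitting rules. The naive split on `;` is deliberately not a shell parser and would mishandle quoted semicolons.

```python
def split_exec_call(call):
    """Split one combined exec call into one call per ';'-separated command.

    Hypothetical call shape: {"id": str, "name": str, "arguments": {"command": str}}.
    """
    parts = [part.strip() for part in call["arguments"]["command"].split(";")]
    commands = [part for part in parts if part]  # drop empty segments
    return [
        {
            # Derive a distinct id per call so each result can be correlated.
            "id": f"{call['id']}-{index}",
            "name": call["name"],
            "arguments": {"command": command},
        }
        for index, command in enumerate(commands)
    ]


combined = {
    "id": "call-1",
    "name": "exec",
    "arguments": {"command": "hostname; date; uptime"},
}
for separate_call in split_exec_call(combined):
    print(separate_call["id"], separate_call["arguments"]["command"])
```

Each resulting call carries its own id, which is the property that persistence, replay, and tool-result correlation depend on.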

That PR intentionally kept the production fix narrow: a Kimi-specific OpenClaw plugin, Kimi-specific config compat, sandbox staging, and nightly e2e coverage. The fix is correct for the immediate issue, but it also exposes a pattern we should make coherent before we accumulate more one-off model interventions.

Goal

Introduce a clear model-specific setup architecture for sandbox/OpenClaw compatibility behavior. Model-specific interventions should be explicit, reviewable, testable, and centrally organized instead of spread across config generation, Dockerfile staging, and e2e scripts.

Proposed direction

  1. Add a registry directory, for example nemoclaw-blueprint/model-specific-setup/, with one manifest per targeted intervention.
    • Example: kimi-k2.6-managed-inference.json
  2. Keep manifests declarative.
    • Match fields: modelIds, providerKey, inferenceApi, baseUrl
    • Config effects: openclawCompat fields such as requiresToolResultName
    • Plugin effects: plugins.load entries for staged OpenClaw plugin paths
  3. Keep executable behavior in OpenClaw plugins, not in manifests.
    • The Kimi splitter should stay in openclaw-plugins/kimi-inference-compat/.
    • The manifest should only define when that plugin and compat behavior are active.
  4. Make scripts/generate-openclaw-config.py consume the registry and apply matching setup records.
    • This should replace hardcoded per-model constants and predicate functions over time.
  5. Stage model-specific plugin assets generically in the sandbox image.
    • Avoid a new Dockerfile COPY and chmod stanza for every future model-specific plugin.
  6. Add validation for the registry.
    • Exact match predicates only.
    • Known compat keys only.
    • No shell/code in manifests.
    • Every staged plugin path must exist.
    • Plugin files/directories get deterministic permissions.
  7. Organize e2e coverage by intervention.
    • Example case name: kimi-k2.6-managed-inference-exec-split
    • Keep the current Kimi trajectory acceptance checks as the first instance of this suite.
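Steps 1 to 4 could be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: the manifest layout (`id`, `match`, `openclawCompat`, `plugins.load`), every concrete value in it, the shape of the generated config, and the helper names `matches` and `apply_setup` are all hypothetical. Only the field names listed in step 2 come from the proposal.

```python
import copy
import json

# Hypothetical manifest, e.g. the contents of kimi-k2.6-managed-inference.json.
# All values are placeholders; the real model id, provider key, API name,
# base URL, and staged plugin path are not specified in this issue.
EXAMPLE_MANIFEST = json.loads("""
{
  "id": "kimi-k2.6-managed-inference",
  "match": {
    "modelIds": ["example/kimi-k2.6"],
    "providerKey": "example-provider",
    "inferenceApi": "example-api",
    "baseUrl": "https://inference.local/v1"
  },
  "openclawCompat": {"requiresToolResultName": true},
  "plugins": {"load": ["openclaw-plugins/kimi-inference-compat"]}
}
""")


def matches(manifest, target):
    """Exact-match predicate: every match field must equal the target's value."""
    match = manifest["match"]
    return target.get("modelId") in match["modelIds"] and all(
        target.get(key) == match[key]
        for key in ("providerKey", "inferenceApi", "baseUrl")
    )


def apply_setup(config, manifests, target):
    """Return a copy of config with the effects of all matching manifests applied."""
    result = copy.deepcopy(config)
    for manifest in manifests:
        if not matches(manifest, target):
            continue
        compat = result.setdefault("openclawCompat", {})
        compat.update(manifest.get("openclawCompat", {}))
        load = result.setdefault("plugins", {}).setdefault("load", [])
        for path in manifest.get("plugins", {}).get("load", []):
            if path not in load:  # keep plugin entries unique
                load.append(path)
    return result
```

With this shape, the config generator holds no per-model constants: a target that matches a record receives its compat fields and plugin entries, and any other target leaves the config untouched.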

Acceptance criteria

  • Model-specific setup records live in one documented registry location.
  • generate-openclaw-config.py applies matching registry records without hardcoded Kimi-specific branching.
  • Sandbox image staging handles model-specific OpenClaw plugins generically.
  • The Kimi K2.6 behavior from PR #3046 ("fix: support reasoning models in the OpenClaw harness") remains unchanged and is covered by unit/config/build-context tests plus nightly e2e.
  • Adding a future model-specific intervention should require a manifest, optional plugin code, and focused tests, not edits scattered across unrelated build/config paths.
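The registry validation called for in the proposed direction might look something like the sketch below. The allowed key sets, the manifest layout, the wildcard check, and the function name `validate_manifest` are assumptions for illustration. Here "no shell/code in manifests" is approximated by allowlisting keys, so a field such as a command or script is rejected as unknown.

```python
from pathlib import Path

# Hypothetical allowlists; the real set of keys is not fixed by this issue.
ALLOWED_TOP_LEVEL = {"id", "match", "openclawCompat", "plugins"}
ALLOWED_MATCH_KEYS = {"modelIds", "providerKey", "inferenceApi", "baseUrl"}
KNOWN_COMPAT_KEYS = {"requiresToolResultName"}


def validate_manifest(manifest, plugin_root):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for key in sorted(set(manifest) - ALLOWED_TOP_LEVEL):
        errors.append(f"unknown top-level key: {key}")

    match = manifest.get("match", {})
    for key in sorted(set(match) - ALLOWED_MATCH_KEYS):
        errors.append(f"unknown match key: {key}")

    # Exact match predicates only: plain strings, no wildcards (assumed rule).
    model_ids = match.get("modelIds")
    if not isinstance(model_ids, list) or not model_ids:
        errors.append("match.modelIds must be a non-empty list")
        model_ids = []
    values = list(model_ids) + [
        match[key] for key in ("providerKey", "inferenceApi", "baseUrl") if key in match
    ]
    for value in values:
        if not isinstance(value, str) or "*" in value:
            errors.append(f"match value must be an exact string: {value!r}")

    for key in sorted(set(manifest.get("openclawCompat", {})) - KNOWN_COMPAT_KEYS):
        errors.append(f"unknown compat key: {key}")

    # Every staged plugin path must exist under the plugin root.
    for path in manifest.get("plugins", {}).get("load", []):
        if not (Path(plugin_root) / path).exists():
            errors.append(f"plugin path does not exist: {path}")
    return errors
```

Returning a list of errors rather than raising on the first one lets a CI check report every problem in a manifest in a single run.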

Non-goals

  • Do not broaden the Kimi splitter into a shell parser.
  • Do not make model-specific behavior global across providers or endpoints.
  • Do not move executable compatibility logic into JSON/YAML manifests.
  • Do not change the default model selection as part of this architecture work.

Labels

  • area: architecture (Architecture, design debt, major refactors, or maintainability)
  • area: inference (Inference routing, serving, model selection, or outputs)
  • area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
  • integration: openclaw (OpenClaw integration behavior)