Allow `nemoclaw onboard` to override model input capability for multimodal models #3850

### Problem Statement

During `nemoclaw onboard`, models discovered from local or self-hosted providers may be registered as text-only, even when the underlying model supports multimodal input such as text+image.

For example, `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning` can be selected successfully during onboarding, but the generated OpenClaw model configuration may register it as text-only:

<img width="1458" height="300" alt="Image" src="https://github.com/user-attachments/assets/6065b8f0-6041-4b8e-a567-34d5f5c0d29b" />

Screenshot after manual override showing the desired result:

![OpenClaw model listed as text+image](https://github.com/user-attachments/assets/24dbf601-d2ac-40a6-bee2-d2db61f55f7a)

### Proposed Design

`nemoclaw onboard` should provide a supported way to override the input capability of a selected model when provider-side modality detection is incomplete.

A possible design:

1. During model selection, after the user selects a model, show an input capability prompt:

```text
Selected model:
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning

Input capability:
  1. Text only
  2. Text + Image
```
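To make the prompt concrete, here is a minimal sketch of how a menu answer could be turned into a declared input capability. It is an illustration only: `INPUT_CHOICES`, `render_prompt`, `parse_choice`, and the `["text", "image"]` list representation are hypothetical names chosen for this example, not existing `nemoclaw` or OpenClaw APIs. Falling back to text-only on an empty answer is also an assumption.

```python
# Hypothetical mapping from the menu number to (label, declared input modalities).
INPUT_CHOICES = {
    "1": ("Text only", ["text"]),
    "2": ("Text + Image", ["text", "image"]),
}


def render_prompt(model_id: str) -> str:
    """Build the prompt text shown after the user selects a model."""
    lines = ["Selected model:", model_id, "", "Input capability:"]
    for key, (label, _modalities) in INPUT_CHOICES.items():
        lines.append(f"  {key}. {label}")
    return "\n".join(lines)


def parse_choice(answer: str, default: str = "1") -> list[str]:
    """Translate the user's answer into a list of input modalities.

    An empty answer falls back to `default` (text-only here, an assumption).
    """
    key = answer.strip() or default
    if key not in INPUT_CHOICES:
        raise ValueError(f"unknown input capability choice: {answer!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(INPUT_CHOICES[key][1])
```

With this shape, `parse_choice("2")` yields `["text", "image"]`, and the onboarding flow could write that value into the generated model entry instead of a hard-coded text-only default.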

### Alternatives Considered

1. Always default all discovered models to `text+image`.

This would simplify onboarding, but it may be unsafe or misleading for pure text models. Some models or providers may reject image input, so a manual override is safer than changing the default for all models.

2. Rely entirely on automatic provider metadata.

This is ideal when providers expose reliable modality metadata, but many OpenAI-compatible or self-hosted endpoints do not. In these cases, auto-discovery may return only the model ID and context length, not whether image input is supported.

3. Manually edit generated OpenClaw configuration files after onboarding.

This works as a workaround, but it is not a good user experience. Users need to know where the generated model catalog is located, which fields to modify, and when to restart the gateway. It is also easy to lose changes after recreating or re-onboarding a sandbox.

4. Configure the image model after onboarding only.

Setting an image model after onboarding is not enough if the model catalog still declares the selected model as text-only. The model input capability itself also needs to be configurable.
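Taken together, these alternatives suggest a precedence order: an explicit user override first, then provider metadata when it exists, then a conservative text-only default. The sketch below illustrates that order under stated assumptions. `resolve_input` and the `input` metadata field are hypothetical and are not taken from the `nemoclaw` or OpenClaw code.

```python
from typing import Optional, Sequence


def resolve_input(entry: dict, override: Optional[Sequence[str]] = None) -> list[str]:
    """Choose the input modalities to record for a discovered model.

    Precedence: user override, then provider-declared metadata, then text-only.
    """
    if override:
        return list(override)
    # Hypothetical field name; many self-hosted endpoints return no such key.
    declared = entry.get("input")
    if declared:
        return list(declared)
    # Conservative default: do not claim image support without evidence.
    return ["text"]


# A discovery result that carries only the model ID, as described above.
discovered = {"id": "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning"}
auto = resolve_input(discovered)
manual = resolve_input(discovered, override=["text", "image"])
```

Here `auto` stays text-only because the endpoint reported no modality, while `manual` reflects the user's choice. Keeping the default conservative avoids the risk noted in alternative 1 of advertising image input for models that would reject it.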

### Category

enhancement: feature

### Checklist

- [x] I searched existing issues and this is not a duplicate
- [x] This is a design proposal, not a "please build this" request
