[DGX Station][Inference] NIM-local onboard fails — docker pull nvcr.io/nim/nvidia/nemotron-3-nano:latest returns "error from registry: Incorrect Repository Format" #3885

@wangericnv

Description

When running NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard and selecting "Local NVIDIA NIM [experimental]" with model nvidia/nemotron-3-nano-30b-a3b, the wizard pulls many image layers successfully, then fails at the manifest stage with "error from registry: Incorrect Repository Format". Onboard aborts and no sandbox is registered.

The image URL NemoClaw constructs is:
nvcr.io/nim/nvidia/nemotron-3-nano:latest

Compared to the model name in the menu (nvidia/nemotron-3-nano-30b-a3b), this URL drops the -30b-a3b suffix and uses the floating :latest tag. The NGC catalog uses size- and version-specific image names and tags, as with other NIM images (e.g. nvcr.io/nim/nvidia/llama-3.1-nemotron-nano-8b-v1:1.0.0).
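The mismatch described above can be checked mechanically. The following is a minimal sketch, not NemoClaw's actual URL-construction logic: `parse_image_ref` and `describe_mismatch` are hypothetical helper names, and the parser handles only simple `registry/path/name:tag` references (no digests).

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/path/name:tag' into its parts (digests are not handled)."""
    registry, _, remainder = ref.partition("/")
    # Only a colon after the last slash separates the tag from the name.
    last_slash = remainder.rfind("/")
    colon = remainder.find(":", last_slash + 1)
    if colon == -1:
        # Docker defaults to the "latest" tag when none is given.
        return ImageRef(registry, remainder, "latest")
    return ImageRef(registry, remainder[:colon], remainder[colon + 1:])


def describe_mismatch(model: str, ref: str) -> list[str]:
    """List differences between a menu model name and the image reference."""
    image = parse_image_ref(ref)
    model_name = model.rsplit("/", 1)[-1]
    image_name = image.repository.rsplit("/", 1)[-1]
    findings = []
    if model_name != image_name and model_name.startswith(image_name):
        findings.append(f"dropped suffix: {model_name[len(image_name):]}")
    elif model_name != image_name:
        findings.append(f"name differs: {model_name} vs {image_name}")
    if image.tag == "latest":
        findings.append("floating tag: latest")
    return findings


print(describe_mismatch(
    "nvidia/nemotron-3-nano-30b-a3b",
    "nvcr.io/nim/nvidia/nemotron-3-nano:latest",
))
# ['dropped suffix: -30b-a3b', 'floating tag: latest']
```

A check of this kind only confirms that the name and tag differ from the menu entry; it does not establish which repository name the registry actually expects.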

The same image URL pulled successfully on DGX Spark (see the related issue for the T5937388 timeout), so this appears to be a Docker client/registry interaction quirk on Station with Docker 29.5.0 rather than a uniformly missing image. Likely cause: the URL constructed by NemoClaw maps to a multi-arch manifest list that one Docker client version can resolve and another cannot.
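One way to probe the manifest-list hypothesis would be to capture the output of docker manifest inspect for the image on both machines and compare the platforms listed. The sketch below assumes that output is a standard manifest list (OCI image index) JSON document; the helper names and the sample document are hypothetical, and the sample is NOT the real nvcr.io manifest. Note that uname reports aarch64 while Docker and OCI name the same architecture arm64.

```python
import json


def platforms_in_manifest_list(manifest_json: str) -> list[str]:
    """Return 'os/architecture' for each entry of a manifest list / image index."""
    doc = json.loads(manifest_json)
    platforms = []
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        platforms.append(
            f"{platform.get('os', '?')}/{platform.get('architecture', '?')}"
        )
    return platforms


def supports(manifest_json: str, os_name: str, arch: str) -> bool:
    """True if the manifest list advertises the given os/architecture pair."""
    return f"{os_name}/{arch}" in platforms_in_manifest_list(manifest_json)


# Hypothetical sample for illustration only; digests are elided on purpose.
SAMPLE = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"digest": "sha256:...", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:...", "platform": {"architecture": "arm64", "os": "linux"}},
    ],
})

print(platforms_in_manifest_list(SAMPLE))
print(supports(SAMPLE, "linux", "arm64"))
```

If the inspect command itself fails with the same registry error on Station, that would already narrow the problem to manifest resolution rather than layer download.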

Environment

Device:        DGX Station #1 (galaxy-sku2-018)
OS:            Ubuntu 24.04.4 LTS
Architecture:  aarch64
Node.js:       v22.22.3
npm:           10.9.8
Docker:        29.5.0
OpenShell CLI: 0.0.39
NemoClaw:      v0.0.46
OpenClaw:      N/A (onboard failed before sandbox creation)

Steps to Reproduce

  1. Fresh nemoclaw v0.0.46 install on DGX Station (aarch64).
  2. Run:
    NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard --name nim-sb --fresh -y
  3. At [3/8] inference menu, choose 8 (Local NVIDIA NIM [experimental]).
  4. At model menu, choose 2 (nvidia/nemotron-3-nano-30b-a3b).
  5. Paste NGC API key (extracted from existing ~/.docker/config.json nvcr.io auth).
  6. Wait for image pull.
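For step 5, the key can be recovered from the nvcr.io entry in ~/.docker/config.json when Docker stores credentials inline. The sketch below assumes the standard layout, where auths.<registry>.auth is base64 of "username:password"; it does not cover credential helpers (credsStore), and `registry_credentials` is a hypothetical helper name. The sample config is made up for illustration.

```python
import base64
import json


def registry_credentials(config_text: str, registry: str) -> tuple[str, str]:
    """Decode the inline 'auth' entry for a registry into (username, secret)."""
    doc = json.loads(config_text)
    entry = doc.get("auths", {}).get(registry)
    if not entry or "auth" not in entry:
        # With a credential helper configured, the auth field is typically absent.
        raise KeyError(f"no inline auth for {registry}")
    decoded = base64.b64decode(entry["auth"]).decode("utf-8")
    # Split on the first colon only; the secret may itself contain colons.
    user, _, secret = decoded.partition(":")
    return user, secret


# Made-up sample; NGC logins use the literal username "$oauthtoken".
sample_auth = base64.b64encode(b"$oauthtoken:example-key").decode("ascii")
SAMPLE_CONFIG = json.dumps({"auths": {"nvcr.io": {"auth": sample_auth}}})

user, secret = registry_credentials(SAMPLE_CONFIG, "nvcr.io")
print(user, len(secret))  # avoid printing the secret itself
```

Against a real machine, the text passed in would be the contents of ~/.docker/config.json, and the returned secret is the value pasted at the NGC API key prompt.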

Expected Result

  • Image pull completes successfully.
  • NIM container starts.
  • Sandbox nim-sb becomes Ready.

Actual Result

Many layers pull successfully, then at the manifest verification stage:

…
c4aa5e88c597: Pull complete
e729ba9d10b3: Pull complete
…
error from registry: Incorrect Repository Format
Command failed (exit 1): docker pull nvcr.io/nim/nvidia/nemotron-3-nano:latest

Shell prompt returns; onboard aborted. No sandbox registered. nemoclaw list is empty.
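When comparing runs across machines, it can help to reduce each pull log to a layer count and the registry error, if any. This is a minimal sketch that matches only the line shapes visible in the excerpt above; `summarize_pull_log` is a hypothetical helper, and other Docker versions may format progress lines differently.

```python
def summarize_pull_log(log: str) -> dict:
    """Count completed layers and capture the registry error message, if present."""
    layers_completed = 0
    registry_error = None
    for line in log.splitlines():
        line = line.strip()
        if line.endswith(": Pull complete"):
            layers_completed += 1
        elif line.startswith("error from registry:"):
            registry_error = line.split(":", 1)[1].strip()
    return {"layers_completed": layers_completed, "registry_error": registry_error}


# Excerpt copied from the report; the elided lines are left elided.
LOG = """\
…
c4aa5e88c597: Pull complete
e729ba9d10b3: Pull complete
…
error from registry: Incorrect Repository Format
Command failed (exit 1): docker pull nvcr.io/nim/nvidia/nemotron-3-nano:latest
"""

print(summarize_pull_log(LOG))
# {'layers_completed': 2, 'registry_error': 'Incorrect Repository Format'}
```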

Observed mismatch:

  • Menu lists model as nvidia/nemotron-3-nano-30b-a3b
  • NemoClaw constructs pull URL as nvcr.io/nim/nvidia/nemotron-3-nano:latest
  • The -30b-a3b suffix is dropped and tag is :latest rather than a specific version

On DGX Spark, with a different Docker version (29.2.1), the same image URL pulls successfully; the container then fails at the health-probe stage (separate bug). So the URL does resolve in some configurations, but it is fragile across Docker client versions and registry edge behavior.


NVB#6194903

Labels

  • NV QA (Bugs found by the NVIDIA QA Team)
  • area: inference (Inference routing, serving, model selection, or outputs)
  • area: packaging (Packages, images, registries, installers, or distribution)
  • platform: container (Affects Docker, containerd, Podman, or images)