
feat: add NVIDIA Nemotron (NemotronForCausalLM) loader to aprender::rosetta #1590

@noahgift


Context

The cookbook architecture-demos spec tracks Nemotron as status: blocked. Issue #228 ("QA: Nemotron-Nano-4B largely unsupported (12.3% pass rate)", closed 2026-02-13) characterized the failure surface; this issue tracks the prerequisite loader work.

Family

  • Name: nemotron
  • Vendor: NVIDIA
  • HF architectures: NemotronForCausalLM
  • HF pattern: nvidia/Nemotron-* (e.g., nvidia/Nemotron-Nano-4B, nvidia/Nemotron-4-340B-Instruct)
  • Reference checkpoints: nvidia/Nemotron-Nano-4B, nvidia/Nemotron-4-15B
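One way a loader could route a checkpoint to this family is by the `architectures` entry in `config.json` plus the repo-id pattern above. The sketch below is hypothetical: the `Family` enum and both functions are illustrative names, not existing aprender::rosetta APIs. It treats `nvidia/Nemotron-*` as a plain prefix match.

```rust
/// Hypothetical family tag; the real contract is meant to live in
/// contracts/model-families/nemotron.yaml, whose schema is not shown here.
#[derive(Debug, PartialEq)]
pub enum Family {
    Nemotron,
    Unknown,
}

/// Map the `architectures` entries of a Hugging Face config.json to a family.
pub fn family_from_architectures(archs: &[&str]) -> Family {
    if archs.iter().any(|a| *a == "NemotronForCausalLM") {
        Family::Nemotron
    } else {
        Family::Unknown
    }
}

/// Match a Hub repo id against the `nvidia/Nemotron-*` pattern (prefix check).
pub fn matches_hf_pattern(repo_id: &str) -> bool {
    repo_id.starts_with("nvidia/Nemotron-")
}
```

Keying on `architectures` rather than the repo name keeps detection working for local or renamed checkpoints; the repo pattern is only a convenience for Hub lookups.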

Acceptance criteria

  • contracts/model-families/nemotron.yaml exists with size_variants for at least 4B
  • Loader handles the Nemotron-specific squared ReLU activation, along with mlp_bias and qk_layernorm
  • Discriminator fields documented (Nemotron sets mlp_bias: true and partial_rotary_factor)
  • Inference smoke test against Nemotron-Nano-4B passes and improves on the 12.3% pass rate reported in #228
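A minimal sketch of two compute pieces named above, using plain `Vec<f32>` weights instead of aprender's tensor types. All function names are hypothetical, not aprender APIs. It assumes the non-gated `up_proj → relu² → down_proj` MLP used by the Hugging Face Nemotron implementation, with biases present only when `mlp_bias` is set, and assumes the rotary width is `head_dim × partial_rotary_factor` truncated to an integer.

```rust
/// Squared ReLU ("relu2"): max(0, x)^2.
pub fn relu2(x: f32) -> f32 {
    let r = x.max(0.0);
    r * r
}

/// Dense layer y = W x + b; `w` holds one row of `in_dim` weights per output.
fn linear(w: &[Vec<f32>], b: Option<&[f32]>, x: &[f32]) -> Vec<f32> {
    w.iter()
        .enumerate()
        .map(|(i, row)| {
            let dot: f32 = row.iter().zip(x).map(|(wi, xi)| wi * xi).sum();
            // Add the bias term only when the checkpoint provides one (mlp_bias).
            dot + b.map_or(0.0, |b| b[i])
        })
        .collect()
}

/// Non-gated MLP (assumed): down(relu2(up(x))), with optional biases.
pub fn nemotron_mlp(
    up_w: &[Vec<f32>],
    up_b: Option<&[f32]>,
    down_w: &[Vec<f32>],
    down_b: Option<&[f32]>,
    x: &[f32],
) -> Vec<f32> {
    let h: Vec<f32> = linear(up_w, up_b, x).into_iter().map(relu2).collect();
    linear(down_w, down_b, &h)
}

/// Number of per-head dimensions that receive rotary embedding
/// when partial_rotary_factor < 1.0 (assumed truncating conversion).
pub fn rotary_dims(head_dim: usize, partial_rotary_factor: f32) -> usize {
    (head_dim as f32 * partial_rotary_factor) as usize
}
```

For example, under these assumptions a head with `head_dim = 128` and `partial_rotary_factor = 0.5` rotates 64 dimensions and passes the remaining 64 through unchanged, which is one reason a loader that applies full-width RoPE would produce degraded output.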

Unblock impact

  • Cookbook manifest flips from blocked to certified
  • Reopens the path to the Nemotron QA campaign (currently at 12.3%, well below the QA campaign threshold)

Cookbook reference
