[Feature] Add support for Kimi K2.6

# [Feature] Add support for Kimi K2.6

## Summary

Add NeMo RL support for `moonshotai/Kimi-K2.6` on the Megatron Core / Megatron backend, including checkpoint conversion or loading, model configuration mapping, example training recipes, and backend consistency validation for RL workflows.

## Motivation

Kimi K2.6 is a recent open-source MoE model from Moonshot AI with strong long-horizon coding and agentic-task performance. The Hugging Face model card describes it as a native multimodal agentic model with:

- 1T total parameters, 32B activated parameters
- 384 experts, 8 selected experts per token, 1 shared expert
- 61 layers, including 1 dense layer
- 256K context length
- MLA attention, SwiGLU activation, 160K vocabulary
- MoonViT vision encoder

NeMo RL already exposes a Megatron Core path for large models, long context, MoE, sequence packing, and RL training. Adding Kimi K2.6 support would make it possible to post-train and evaluate a frontier-scale MoE coding/agent model in NeMo RL without users hand-rolling the Megatron mapping and validation.

## Proposed Scope

### MCore model integration

- Add or reuse the Megatron Core model mapping required for Kimi K2.6's architecture.
- Map the HF config fields to the appropriate Megatron Bridge / MCore configuration:
  - MoE expert count, selected experts, shared experts, expert hidden size
  - dense layer placement
  - MLA attention settings
  - vocab size and tokenizer settings
  - long-context settings
- Confirm whether Kimi K2.6 can reuse the Kimi K2.5 architecture path: the K2.6 model card states that it shares K2.5's architecture and can reuse K2.5's deployment method.
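
To make the mapping above concrete, here is a minimal sketch of what the translation could look like. It is not the Megatron Bridge implementation. The HF-side key names (`n_routed_experts`, `first_k_dense_replace`, and so on) assume a DeepSeek-V3-style `config.json`, and the output keys assume MCore-style names; both need to be checked against the real checkpoint and bridge code. For a multimodal checkpoint the text settings may also sit in a nested sub-config, which this sketch does not handle. `map_hf_to_mcore` is a hypothetical helper, and the values marked as placeholders are not taken from the model.

```python
from typing import Any


def map_hf_to_mcore(hf: dict[str, Any]) -> dict[str, Any]:
    """Translate assumed HF config keys into assumed MCore-style keys."""
    required = [
        "num_hidden_layers", "first_k_dense_replace", "n_routed_experts",
        "num_experts_per_tok", "n_shared_experts", "moe_intermediate_size",
        "kv_lora_rank", "vocab_size", "max_position_embeddings",
    ]
    missing = [key for key in required if key not in hf]
    if missing:
        # Fail loudly instead of silently falling back to defaults.
        raise KeyError(f"HF config is missing fields: {missing}")

    n_layers = hf["num_hidden_layers"]
    n_dense = hf["first_k_dense_replace"]
    return {
        "num_layers": n_layers,
        # 0 = dense layer, 1 = MoE layer; dense layers are assumed to come first.
        "moe_layer_freq": [0] * n_dense + [1] * (n_layers - n_dense),
        "num_moe_experts": hf["n_routed_experts"],
        "moe_router_topk": hf["num_experts_per_tok"],
        "moe_ffn_hidden_size": hf["moe_intermediate_size"],
        # Assumption: shared experts are fused into one MLP of combined width.
        "moe_shared_expert_intermediate_size": (
            hf["n_shared_experts"] * hf["moe_intermediate_size"]
        ),
        "kv_lora_rank": hf["kv_lora_rank"],
        "vocab_size": hf["vocab_size"],
        "seq_length": hf["max_position_embeddings"],
    }


example_hf = {
    # Values quoted in this issue:
    "num_hidden_layers": 61, "first_k_dense_replace": 1,
    "n_routed_experts": 384, "num_experts_per_tok": 8, "n_shared_experts": 1,
    # Placeholder values, NOT taken from the real checkpoint:
    "moe_intermediate_size": 2048, "kv_lora_rank": 512,
    "vocab_size": 160_000, "max_position_embeddings": 262_144,
}
mcore = map_hf_to_mcore(example_hf)
print(mcore["num_moe_experts"], sum(mcore["moe_layer_freq"]))
```

A unit test of this shape, run against the real config, would be one cheap way to catch regressions in the mapping.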

### Checkpoint conversion / loading

- Support conversion from `moonshotai/Kimi-K2.6` Hugging Face checkpoints into the MCore format used by NeMo RL, or document the supported loading path through Megatron Bridge.
- Preserve tied/untied embedding semantics, MoE tensor layout, tokenizer assets, and chat template behavior.
- Document any unsupported weight formats. Kimi K2.6 ships with native INT4 quantization; if INT4 is out of scope for training, document the expected BF16/FP8 path.
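
As an illustration of the checks a conversion path might run before touching any weights, the sketch below inspects an already-parsed `config.json` dictionary. It assumes the checkpoint uses the common Hugging Face keys `quantization_config`, `quant_method`, `tie_word_embeddings`, and `torch_dtype`; whether Kimi K2.6 populates them this way is not confirmed here. `describe_checkpoint` is a hypothetical helper, and the example values are invented for the demo.

```python
from typing import Any


def describe_checkpoint(hf_config: dict[str, Any]) -> dict[str, Any]:
    """Summarize the properties a converter has to preserve or reject."""
    quant = hf_config.get("quantization_config")
    return {
        # A quantized checkpoint needs dequantization or an explicit rejection.
        "is_quantized": quant is not None,
        "quant_method": (quant or {}).get("quant_method"),
        # None means "not stated in the config"; do not guess a default here.
        "tied_embeddings": hf_config.get("tie_word_embeddings"),
        "declared_dtype": hf_config.get("torch_dtype"),
    }


# Invented example values, not read from the real checkpoint.
example_config = {
    "quantization_config": {"quant_method": "example-int4"},
    "tie_word_embeddings": False,
    "torch_dtype": "bfloat16",
}
summary = describe_checkpoint(example_config)
if summary["is_quantized"]:
    print(f"Quantized checkpoint ({summary['quant_method']}): "
          "convert to BF16 first or fail with a clear message.")
```

Returning `None` for an absent `tie_word_embeddings` key is deliberate: the converter should resolve the default from the model class rather than from this helper.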

### Training and rollout recipes

- Add a minimal SFT recipe for Kimi K2.6 on the Megatron backend.
- Add a minimal GRPO recipe for Kimi K2.6 on the Megatron backend.
- Include recommended parallelism settings for a smoke test and for a realistic multi-node run where possible:
  - TP / PP / CP / EP / FSDP
  - sequence packing on/off guidance for MoE
  - long-context guidance
- Clarify supported rollout backends:
  - vLLM and/or SGLang for generation
  - Megatron inference, if supported, to avoid weight conversion
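
When choosing the parallelism settings listed above, a quick divisibility check can rule out layouts that cannot work before any GPUs are allocated. The sketch below is a simplified, hypothetical helper (`check_parallelism`), not NeMo RL or Megatron code. It assumes expert tensor parallelism of 1, so that EP has to divide `world_size / PP`, and it ignores FSDP. One thing it surfaces: 61 is prime, so any PP size other than 1 or 61 needs an uneven pipeline split.

```python
def check_parallelism(world_size: int, tp: int, pp: int, cp: int, ep: int,
                      num_experts: int, num_layers: int) -> list[str]:
    """Return a list of problems; an empty list means the layout passed."""
    problems = []
    if world_size % (tp * pp * cp) != 0:
        problems.append("world_size must be divisible by TP * PP * CP")
    if num_experts % ep != 0:
        problems.append(f"{num_experts} experts do not split evenly over EP={ep}")
    # Simplifying assumption: expert tensor parallel size is 1.
    if world_size % pp != 0 or (world_size // pp) % ep != 0:
        problems.append("EP must divide world_size / PP (assuming expert TP = 1)")
    if num_layers % pp != 0:
        problems.append(
            f"{num_layers} layers need an uneven pipeline split for PP={pp}"
        )
    return problems


# Layer and expert counts are the ones quoted in this issue;
# the GPU counts and parallel sizes are arbitrary examples.
issues = check_parallelism(world_size=64, tp=1, pp=4, cp=1, ep=16,
                           num_experts=384, num_layers=61)
for issue in issues:
    print("warning:", issue)
```

In this example only the pipeline warning fires, because 384 experts split evenly over 16 expert-parallel ranks and 64 GPUs divide cleanly by the other sizes.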

### Validation

Follow the NeMo RL "Add New Models" validation workflow:

- Verify HF vs rollout-backend log probability consistency.
- Verify Megatron vs rollout-backend log probability consistency.
- Run across real and synthetic prompts, greedy and sampling modes, multiple batch sizes, and at least short/medium/long sequence lengths.
- Use the documented `1.05` error threshold for equal-precision log-probability consistency unless Kimi-specific precision caveats require a different threshold.
- Run the existing model diagnostics where applicable:
  - `max_model_len_respected.py`
  - `long_generation_decode_vs_prefill.py`
  - `check_hf_model_embeddings_untrained.py`
  - `vllm_precision_compilation_test.py`
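
For reference, the consistency check can be sketched as follows. This assumes the `1.05` threshold applies to the mean of `exp(|logprob_a - logprob_b|)` over tokens, which is how I read the guide; please confirm the exact definition there. `mult_prob_error` is a hypothetical name, and the log probabilities are made-up numbers rather than measurements.

```python
import math

THRESHOLD = 1.05  # threshold quoted above for equal-precision comparisons


def mult_prob_error(logprobs_a: list[float], logprobs_b: list[float]) -> float:
    """Mean multiplicative probability error between two backends.

    A value of 1.0 means the per-token log probabilities match exactly.
    """
    if len(logprobs_a) != len(logprobs_b) or not logprobs_a:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(math.exp(abs(a - b)) for a, b in zip(logprobs_a, logprobs_b))
    return total / len(logprobs_a)


# Made-up per-token log probabilities for the same sampled tokens.
train_logprobs = [-0.10, -2.30, -0.70]
rollout_logprobs = [-0.11, -2.28, -0.70]

err = mult_prob_error(train_logprobs, rollout_logprobs)
print(f"error={err:.4f} within_threshold={err <= THRESHOLD}")
```

Running the same function over the prompt, sampling, batch-size, and sequence-length combinations listed above would give one number per configuration to document.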

## Acceptance Criteria

- `moonshotai/Kimi-K2.6` can be loaded or converted for Megatron Core training in NeMo RL.
- A Kimi K2.6 Megatron config can complete a small SFT smoke test.
- A Kimi K2.6 Megatron config can complete a small GRPO smoke test.
- Training-backend and rollout-backend log probabilities are validated and documented.
- Example configs are added under `examples/configs/`.
- Documentation is updated to list Kimi K2.6 support, known limitations, and recommended precision/parallelism settings.
- Tests or diagnostics are added so regressions in the model mapping or checkpoint conversion path are caught.

## References

- Kimi K2.6 model card: https://huggingface.co/moonshotai/Kimi-K2.6
- NeMo RL repository: https://github.com/NVIDIA-NeMo/RL
- NeMo RL Add New Models guide: https://docs.nvidia.com/nemo/rl/latest/adding-new-models.html
- NeMo RL Megatron/MCore backend docs and README: https://docs.nvidia.com/nemo/rl/latest/index.html

[Feature] Add support for Kimi K2.6 #2412
