Nvidia Collaboration Roadmap (2026 Q2) #22960

@Fridge003

OSS Model Performance Optimization

  • Objective: Improve mainstream OSS model performance for developers on the latest Nvidia hardware.

  • Note: Agg means aggregated serving (prefill and decode on the same worker); Disagg means prefill/decode (PD) disaggregation, with prefill and decode on separate workers.

  • Qwen3.5-397B (G)B200/(G)B300/Hopper

    • April: Agg Round 1 — GDN kernel, MNNVL All Reduce, spec_V2 MTP; Disagg functional
    • May: Disagg sweeps + Round 2
    • June: Agg/Disagg Round 3
  • gpt-oss-120B

    • April: Agg Round 2 — kernel fusions, communication kernels, memcpy removal
  • DeepSeek (G)B200/(G)B300

    • April: Updated rate-matching sweeps; New sweeps MTP on/off
    • May: DeepSeek V3.2 long context Round 2
    • June: DeepSeek V3.2 long context Round 3
  • Nemotron V3 (G)B200/Hopper/Spark

    • April: async scheduling + prefix caching; Testing refactor; Return intermediate state
    • May: Disagg tuning/sweeps/kernels
    • June: Ultra support
  • GLM-5 (G)B200/(G)B300/Hopper

    • April: FP8/NVFP4 Functional; Agg Round 1
    • May: Disagg sweeps + Round 2
    • June: Agg/Disagg Round 3
  • Minimax-M2.5

    • April: FP8+NVFP4 Agg functional; Agg gap analysis
    • May: Agg Round 1
  • Qwen3-Coder-480B

    • April: FP8+NVFP4 Agg functional
  • GLM-4.7

    • April: Agg functional; Agg gap analysis
    • May: Agg Round 1

Runtime Optimizations

Objective: Incorporate and improve state-of-the-art runtime features to benefit all SGLang developers.

CI/CD/Dependencies

Dynamo

  • K8s + Planner

    • Expose deeper scheduler-level forward-pass metrics so customers can use the planner for engine tuning and performance
    • Enable SGLang with Dynamo K8s + Grove for large-scale GB200/GB300 deployments
  • Agentic Optimizations

  • HiCache

    • Enable Dynamo router to route to workers based on KV cache at multiple tiers
    • Selective prefetch/evict APIs for more granular control
    • Recipes for deploying Dynamo with shared KV cache tier
  • More blog posts!

Documentation / Recipes / Blogs

  • Cookbooks/Docs: Better documentation for the following models: Qwen3.5/gpt-oss/GLM-5/MiniMax-M2.5/Qwen3-Coder/GLM-4.7

Miles — RadixArk RL Framework

  • Enablement & Stability: Collaborate with SGLang to bring up GB300 (possibly GB200) support in the main branch; ensure key workflows function correctly (CI/CD validation + feature coverage).

  • Performance Track & Optimization: Establish perf baselines on GB300, continuously track regressions, and iterate on fixes after initial bring-up.
