
[Docs] add cookbook for Ling-2.6 family#23947

Merged
ispobock merged 8 commits into sgl-project:main from JustinTong0323:feat/ling-26-cookbook
Apr 28, 2026

Conversation

@JustinTong0323 (Collaborator) commented Apr 28, 2026

Summary

Combined cookbook covering the Ling-2.6 family from inclusionAI:

  • Ling-2.6-flash — 104B total / 7.4B active BF16 MoE
  • Ling-2.6-1T — ~1T FP8 (E4M3) MoE

Both share the 1:7 MLA + Lightning Linear hybrid attention backbone introduced in Ling-2.5, refined for token efficiency and agentic workloads (BFCL-V4, TAU2-bench, SWE-bench Verified, Claw-Eval, PinchBench).

What's included

  • docs_new/cookbook/autoregressive/InclusionAI/Ling-2.6.mdx — single combined cookbook page covering both variants, with per-variant deployment selectors, tool-calling examples, and a thinking-mode section.
  • docs_new/src/snippets/autoregressive/ling-26-flash-deployment.jsx — Hardware (H20-3e / H100 / H200 / B200), context length (128K vs 256K via YaRN), tool-call parser, and optional reasoning parser selectors.
  • docs_new/src/snippets/autoregressive/ling-26-1t-deployment.jsx — Hardware (GB300 / GB200 single-node TP=4 vs H200 / B200 two-node TP=8 + PP=2), tool-call parser, and optional reasoning parser selectors.
  • docs_new/docs.json — adds Ling-2.6 to the InclusionAI group (above the existing 2.5 entries).

Notable details

  • Ling-2.6-1T ships in FP8 (E4M3), so unlike Ling-2.5-1T (BF16, multi-node only) it fits a single GB300 node with --tp 4. H200 / B200 fall back to a 2-node TP=8 + PP=2 deployment.
  • All launch commands use the sglang serve CLI (matches the Llama3.1-style pattern elsewhere in docs_new).
  • Thinking mode is controllable on both variants. The chat template defaults to detailed thinking off and is toggled by textual directives in the system message (detailed thinking on / detailed thinking off) — it does not read the Qwen3-style enable_thinking template variable.
  • The recommended baseline does not include --reasoning-parser qwen3. SGLang's qwen3 reasoning parser assumes default-thinking semantics that mismatch Ling-2.6's default-off template; combining the two requires every tool-call request to include chat_template_kwargs.enable_thinking=false to prevent <tool_call>...</tool_call> from being routed into reasoning_content. The cookbook's §4.3 "Thinking Mode" section walks through both how to enable thinking and the parser caveat.
  • GSM8K reference run on Ling-2.6-1T: 96.21% (1269 / 1319) on a single GB300 node with TP=4.
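The directive-based toggle described above can be sketched as a request builder (a minimal illustration, assuming an OpenAI-compatible SGLang endpoint; the model ID is a placeholder, only the directive strings come from the chat template):

```python
def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat request for Ling-2.6. Thinking is toggled by a textual
    directive in the system message, not by an enable_thinking-style
    template variable."""
    directive = "detailed thinking on" if thinking else "detailed thinking off"
    return {
        "model": "inclusionAI/Ling-2.6-1T",  # assumed model id
        "messages": [
            {"role": "system", "content": directive},
            {"role": "user", "content": prompt},
        ],
    }
```

Posting this payload to `/v1/chat/completions` with `thinking=True` would enable the detailed-thinking path; omitting the directive falls back to the template's default-off behavior.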

Test plan

  • Cookbook page builds (mdx + JSX snippets follow the same pattern as Ling-2.5-1T.mdx / ling-25-1t-deployment.jsx).
  • docs_new/docs.json validates as JSON; new Ling-2.6 page registered in the InclusionAI group.
  • Ling-2.6-1T launch command in cookbook matches the GB300 4-GPU run (GSM8K 96.21%).
  • Ling-2.6-flash launch command not validated live — copied from inclusionAI's model-card README (TP=4 standard inference path).

Combined cookbook covering Ling-2.6-flash (104B/7.4B BF16 MoE) and
Ling-2.6-1T (~1T FP8 MoE), both built on the 1:7 MLA + Lightning
Linear hybrid attention backbone.

Includes per-variant deployment selectors:
- flash: single-node TP=4 across H20/H100/H200/B200, optional YaRN
  256K context extension, and qwen25 tool-call parser.
- 1T: single GB300/GB200 node with TP=4 (FP8 weights fit on a single
  node, unlike Ling-2.5-1T BF16) or two-node TP=8 + PP=2 fallback for
  H200/B200.
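The hardware-to-parallelism mapping in the 1T selector boils down to one branch; a sketch mirroring the JSX snippet's logic (hardware IDs match the snippet, the flag strings are illustrative):

```python
def ling_26_1t_flags(hardware: str) -> str:
    """Map a hardware choice to parallelism flags: GB300/GB200 fit the
    FP8 weights on a single node (TP=4); H200/B200 fall back to a
    two-node TP=8 + PP=2 deployment."""
    single_node = hardware in ("gb300", "gb200")
    return "--tp 4" if single_node else "--tp 8 --pp 2"
```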

Documents the tool-call + reasoning interaction (qwen + qwen3
parsers) requiring chat_template_kwargs.enable_thinking=false to
avoid <tool_call> being captured into reasoning_content. Includes
GSM8K result on Ling-2.6-1T (96.21%, GB300 x 4).
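When the qwen3 reasoning parser is enabled anyway, every tool-call request has to carry the per-request override; a minimal payload sketch (model ID assumed, tool schema left to the caller):

```python
def tool_call_payload(prompt: str, tools: list) -> dict:
    """Build a tool-call request for use alongside the qwen3 reasoning
    parser. enable_thinking=False is consumed by the parser, not the
    template; without it, <tool_call>...</tool_call> output gets routed
    into reasoning_content."""
    return {
        "model": "inclusionAI/Ling-2.6-1T",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "chat_template_kwargs": {"enable_thinking": False},
    }
```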

Both Hugging Face repos are private at the time of writing -- noted
in the cookbook intro.
Ling-2.6-flash is an instruct-only model -- do not pass
--reasoning-parser or chat_template_kwargs.enable_thinking=true.
The reasoning/thinking interaction note still applies to
Ling-2.6-1T only.
@gemini-code-assist (Bot) left a comment


Code Review

This pull request adds documentation for the Ling-2.6 model family, updates the navigation configuration, and introduces interactive React components for generating deployment commands. The review feedback identifies missing React hook imports in the JSX snippets, dead logic in the environment prefix calculation for the 1T model, and an unused hardware selection variable in the flash model's generator.

@@ -0,0 +1,180 @@
export const Ling261TDeployment = () => {

medium

The component uses useState and useEffect hooks but does not import them from react. This will cause a ReferenceError at runtime unless these are provided globally by the build environment, which is not standard for .jsx files.

import React, { useState, useEffect } from 'react';

export const Ling261TDeployment = () => {

const { hardware, toolcall, reasoning } = values;
const isSingleNode = hardware === 'gb300' || hardware === 'gb200';
const isGB = hardware === 'gb300' || hardware === 'gb200';
const envPrefix = isGB && !isSingleNode ? 'NCCL_IB_DISABLE=1 ' : '';

medium

The logic for envPrefix is currently ineffective because isGB and isSingleNode are derived from the same hardware IDs (gb300, gb200). As a result, isGB && !isSingleNode is always false. If multi-node GB deployments are not supported, this logic should be simplified to avoid confusion.

Suggested change
const envPrefix = isGB && !isSingleNode ? 'NCCL_IB_DISABLE=1 ' : '';
const envPrefix = '';

@@ -0,0 +1,149 @@
export const Ling26FlashDeployment = () => {

medium

This component is missing imports for useState and useEffect, which are required for the component to function.

import React, { useState, useEffect } from 'react';

export const Ling26FlashDeployment = () => {


// Generate command
const generateCommand = () => {
const { yarn, toolcall } = values;

medium

The hardware selection is extracted from values but never used to modify the generated command. If the command is indeed identical for all listed hardware, consider removing the selector or using it to display hardware-specific performance expectations to make the UI more meaningful.

Suggested change
const { yarn, toolcall } = values;
const { hardware, yarn, toolcall } = values;

Both Ling-2.6-flash and Ling-2.6-1T are controllable-reasoning
models, not pure-instruct or pure-thinking. The chat template
defaults to 'detailed thinking off' (set via {% set thinking_option =
'off' %}) and is toggled by textual directives in the system message
('detailed thinking on' / 'detailed thinking off'), not by the
Qwen3-style enable_thinking template variable.
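A minimal model of the template behavior described above (illustrative only; the real Jinja template sets `{% set thinking_option = 'off' %}` and reacts to directives in the system message):

```python
def resolve_thinking(system_message: str) -> str:
    """Resolve the thinking option the way the chat template does:
    default 'off', with the last textual directive in the system
    message winning."""
    option = "off"
    for line in system_message.splitlines():
        text = line.strip().lower()
        if text == "detailed thinking on":
            option = "on"
        elif text == "detailed thinking off":
            option = "off"
    return option
```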

Changes:
- Revert previous "flash has no thinking mode" claim -- it does, just
  defaults to off.
- Drop --reasoning-parser qwen3 from the recommended baseline launch
  command (both variants). The qwen3 reasoning parser assumes
  default-thinking semantics and mis-routes Ling-2.6 output into
  reasoning_content.
- New section 4.3 "Thinking Mode (detailed thinking on/off)" covers
  how to enable thinking via the system message and explains the
  qwen3-parser caveat (per-request chat_template_kwargs.enable_thinking
  =false is consumed by the parser, not the template).
- Flash deployment selector gains an optional Reasoning Parser toggle
  (default disabled).
- 1T deployment selector default for Reasoning Parser flipped from
  enabled -> disabled.
- Remove 'private at the time of writing' callout in the intro -- the
  cookbook should read like a normal product doc, not a release note.
- Reword the MTP tip to drop the 'requires a patched branch / once
  upstream' framing.
- 'Validated on...' -> 'Reference run on...' in the benchmark
  section.
- Drop the 'nightly PyPI builds required' framing -- the latest stable
  release works. Match the standard LLaDA-2.1-style pointer to the
  official install guide.
- Replace 'python3 -m sglang.launch_server' with 'sglang serve' in
  the cookbook and both deployment snippets, matching the
  Llama3.1-style pattern used elsewhere in docs_new.
- 1T snippet: remove the dead 'envPrefix = isGB && !isSingleNode ?
  ...' branch (gemini-bot review). The two predicates are derived
  from the same hardware IDs, so the branch never fired; multi-node
  deployment only applies to non-GB hardware here.
The previous heading '### 4.3 Thinking Mode (`detailed thinking
on/off`)' produced an unreliable anchor due to backticks, parens,
and the slash. Drop the parenthetical so the heading slugs cleanly
to '#43-thinking-mode' (matching the convention used by other
cookbooks, e.g. Gemma4 '#51-speed-benchmark').

Mintlify generates '#4-3-thinking-mode' (with dash) for the '### 4.3
Thinking Mode' heading -- not '#43-thinking-mode' as I assumed. Fix
both references to use the correct slug.
@ispobock ispobock merged commit 3fce8f2 into sgl-project:main Apr 28, 2026
42 checks passed
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026