[Docs] add cookbook for Ling-2.6 family #23947
Conversation
Combined cookbook covering Ling-2.6-flash (104B/7.4B BF16 MoE) and Ling-2.6-1T (~1T FP8 MoE), both built on the 1:7 MLA + Lightning Linear hybrid attention backbone. Includes per-variant deployment selectors:
- flash: single-node TP=4 across H20/H100/H200/B200, optional YaRN 256K context extension, and qwen25 tool-call parser.
- 1T: single GB300/GB200 node with TP=4 (FP8 weights fit on a single node, unlike Ling-2.5-1T BF16) or two-node TP=8 + PP=2 fallback for H200/B200.

Documents the tool-call + reasoning interaction (qwen + qwen3 parsers) requiring chat_template_kwargs.enable_thinking=false to avoid <tool_call> being captured into reasoning_content. Includes GSM8K result on Ling-2.6-1T (96.21%, GB300 x 4). Both Hugging Face repos are private at the time of writing -- noted in the cookbook intro.
Ling-2.6-flash is an instruct-only model -- do not pass --reasoning-parser or chat_template_kwargs.enable_thinking=true. The reasoning/thinking interaction note still applies to Ling-2.6-1T only.
Code Review
This pull request adds documentation for the Ling-2.6 model family, updates the navigation configuration, and introduces interactive React components for generating deployment commands. The review feedback identifies missing React hook imports in the JSX snippets, dead logic in the environment prefix calculation for the 1T model, and an unused hardware selection variable in the flash model's generator.
@@ -0,0 +1,180 @@
export const Ling261TDeployment = () => {
The component uses useState and useEffect hooks but does not import them from react. This will cause a ReferenceError at runtime unless these are provided globally by the build environment, which is not standard for .jsx files.
import React, { useState, useEffect } from 'react';
export const Ling261TDeployment = () => {
  const { hardware, toolcall, reasoning } = values;
  const isSingleNode = hardware === 'gb300' || hardware === 'gb200';
  const isGB = hardware === 'gb300' || hardware === 'gb200';
  const envPrefix = isGB && !isSingleNode ? 'NCCL_IB_DISABLE=1 ' : '';
The logic for envPrefix is currently ineffective because isGB and isSingleNode are derived from the same hardware IDs (gb300, gb200). As a result, isGB && !isSingleNode is always false. If multi-node GB deployments are not supported, this logic should be simplified to avoid confusion.
- const envPrefix = isGB && !isSingleNode ? 'NCCL_IB_DISABLE=1 ' : '';
+ const envPrefix = '';
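Applied to the snippet, the simplification collapses the generator to a single branch on the hardware ID. A minimal sketch of the resulting logic, assuming the parallelism settings from the PR description (the model path and helper name are illustrative, not taken from the actual snippet):

```javascript
// Sketch of the simplified 1T command generator after removing the dead
// envPrefix branch. Hardware IDs and TP/PP settings mirror the PR
// description; the model path is an assumption.
function buildLing261TCommand(hardware) {
  const isGB = hardware === 'gb300' || hardware === 'gb200';
  // GB300/GB200: FP8 weights fit on a single node with TP=4.
  // H200/B200: two-node fallback with TP=8 + PP=2.
  const parallelism = isGB ? '--tp 4' : '--tp 8 --pp 2';
  return `sglang serve inclusionAI/Ling-2.6-1T ${parallelism}`;
}

console.log(buildLing261TCommand('gb300'));
console.log(buildLing261TCommand('h200'));
```

Since `isGB` and `isSingleNode` were always equal, folding them into one predicate preserves behavior exactly while making the single-node assumption explicit.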
@@ -0,0 +1,149 @@
export const Ling26FlashDeployment = () => {
  // Generate command
  const generateCommand = () => {
    const { yarn, toolcall } = values;
The hardware selection is extracted from values but never used to modify the generated command. If the command is indeed identical for all listed hardware, consider removing the selector or using it to display hardware-specific performance expectations to make the UI more meaningful.
- const { yarn, toolcall } = values;
+ const { hardware, yarn, toolcall } = values;
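One way to act on this suggestion: keep `hardware` destructured and use it for a display-only note beside the generated command, since the command itself is identical across accelerators. A hedged sketch (the helper name and note wording are hypothetical, not from the snippet):

```javascript
// Hypothetical helper: the flash launch command is the same on
// H20-3e/H100/H200/B200, so `hardware` feeds a UI note instead of
// changing the command string.
function describeSelection(values) {
  const { hardware, yarn, toolcall } = values;
  const notes = [`selected hardware: ${hardware.toUpperCase()}`];
  if (yarn) notes.push('context extended to 256K via YaRN');
  if (toolcall) notes.push('tool-call parser: qwen25');
  return notes.join('; ');
}

console.log(describeSelection({ hardware: 'h200', yarn: true, toolcall: true }));
```

The alternative the reviewer mentions, dropping the hardware selector entirely, is simpler but loses the place where hardware-specific guidance could later be surfaced.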
Both Ling-2.6-flash and Ling-2.6-1T are controllable-reasoning
models, not pure-instruct or pure-thinking. The chat template
defaults to 'detailed thinking off' (set via {% set thinking_option =
'off' %}) and is toggled by textual directives in the system message
('detailed thinking on' / 'detailed thinking off'), not by the
Qwen3-style enable_thinking template variable.
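A request-level illustration of that toggle, assuming an OpenAI-compatible /v1/chat/completions payload (the model name and builder function are illustrative; only the system-message directive is Ling-specific):

```javascript
// Thinking is toggled by a textual directive in the system message,
// not by a Qwen3-style enable_thinking template variable.
function buildChatPayload(userMessage, thinking) {
  return {
    model: 'inclusionAI/Ling-2.6-1T', // illustrative model name
    messages: [
      {
        role: 'system',
        // Matches the template's textual directives; 'off' is the default.
        content: thinking ? 'detailed thinking on' : 'detailed thinking off',
      },
      { role: 'user', content: userMessage },
    ],
  };
}

console.log(buildChatPayload('Summarize MoE routing.', true).messages[0].content);
```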
Changes:
- Revert previous "flash has no thinking mode" claim -- it does, just
defaults to off.
- Drop --reasoning-parser qwen3 from the recommended baseline launch
command (both variants). The qwen3 reasoning parser assumes
default-thinking semantics and mis-routes Ling-2.6 output into
reasoning_content.
- New section 4.3 "Thinking Mode (detailed thinking on/off)" covers
how to enable thinking via the system message and explains the
  qwen3-parser caveat (per-request
  chat_template_kwargs.enable_thinking=false is consumed by the
  parser, not the template).
- Flash deployment selector gains an optional Reasoning Parser toggle
(default disabled).
- 1T deployment selector default for Reasoning Parser flipped from
enabled -> disabled.
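For deployments that do opt into `--reasoning-parser qwen3`, the caveat above means every tool-call request needs the per-request override. A sketch of such a request body (model name and message content are illustrative; the chat_template_kwargs field is as described in this PR):

```javascript
// With --reasoning-parser qwen3 enabled at launch, each tool-call request
// must carry enable_thinking=false so <tool_call> output is not routed
// into reasoning_content. The flag is consumed by the parser, not the
// chat template (which only reads the textual system-message directive).
const requestBody = {
  model: 'inclusionAI/Ling-2.6-1T', // illustrative
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [], // tool schemas would go here
  chat_template_kwargs: { enable_thinking: false },
};

console.log(JSON.stringify(requestBody.chat_template_kwargs));
```

Defaulting the selectors to "reasoning parser disabled" avoids this per-request burden entirely, which is why both deployment selectors now ship with it off.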
- Remove the 'private at the time of writing' callout in the intro -- the cookbook should read like a normal product doc, not a release note.
- Reword the MTP tip to drop the 'requires a patched branch / once upstream' framing.
- 'Validated on...' -> 'Reference run on...' in the benchmark section.
Drop the 'nightly PyPI builds required' framing -- the latest stable release works. Match the standard LLaDA-2.1-style pointer to the official install guide.
- Replace 'python3 -m sglang.launch_server' with 'sglang serve' in the cookbook and both deployment snippets, matching the Llama3.1-style pattern used elsewhere in docs_new.
- 1T snippet: remove the dead 'envPrefix = isGB && !isSingleNode ? ...' branch (gemini-bot review). The two predicates are derived from the same hardware IDs, so the branch never fired; multi-node deployment only applies to non-GB hardware here.
The previous heading '### 4.3 Thinking Mode (`detailed thinking on/off`)' produced an unreliable anchor due to backticks, parens, and the slash. Drop the parenthetical so the heading slugs cleanly to '#43-thinking-mode' (matching the convention used by other cookbooks, e.g. Gemma4 '#51-speed-benchmark').
Mintlify generates '#4-3-thinking-mode' (with a dash) for the '### 4.3 Thinking Mode' heading -- not '#43-thinking-mode' as I assumed. Fix both references to use the correct slug.
Summary
Combined cookbook covering the Ling-2.6 family from inclusionAI:
Both share the 1:7 MLA + Lightning Linear hybrid attention backbone introduced in Ling-2.5, refined for token efficiency and agentic workloads (BFCL-V4, TAU2-bench, SWE-bench Verified, Claw-Eval, PinchBench).

What's included
- docs_new/cookbook/autoregressive/InclusionAI/Ling-2.6.mdx -- single combined cookbook page covering both variants, with per-variant deployment selectors, tool-calling examples, and a thinking-mode section.
- docs_new/src/snippets/autoregressive/ling-26-flash-deployment.jsx -- Hardware (H20-3e / H100 / H200 / B200), context length (128K vs 256K via YaRN), tool-call parser, and optional reasoning parser selectors.
- docs_new/src/snippets/autoregressive/ling-26-1t-deployment.jsx -- Hardware (GB300 / GB200 single-node TP=4 vs H200 / B200 two-node TP=8 + PP=2), tool-call parser, and optional reasoning parser selectors.
- docs_new/docs.json -- adds Ling-2.6 to the InclusionAI group (above the existing 2.5 entries).

Notable details
- The 1T FP8 weights fit on a single GB300 / GB200 node with --tp 4. H200 / B200 fall back to a 2-node TP=8 + PP=2 deployment.
- Launch commands use the sglang serve CLI (matches the Llama3.1-style pattern elsewhere in docs_new).
- The chat template defaults to detailed thinking off and is toggled by textual directives in the system message (detailed thinking on / detailed thinking off) -- it does not read the Qwen3-style enable_thinking template variable.
- The baseline commands omit --reasoning-parser qwen3. SGLang's qwen3 reasoning parser assumes default-thinking semantics that mismatch Ling-2.6's default-off template; combining the two requires every tool-call request to include chat_template_kwargs.enable_thinking=false to prevent <tool_call>...</tool_call> from being routed into reasoning_content. The cookbook's §4.3 "Thinking Mode" section walks through both how to enable thinking and the parser caveat.

Test plan
- Structure mirrors the existing Ling-2.5 pages (Ling-2.5-1T.mdx / ling-25-1t-deployment.jsx).
- docs_new/docs.json validates as JSON; new Ling-2.6 page registered in the InclusionAI group.