Support moe lora for gpt-oss by gongyisheng · Pull Request #798 · radixark/miles

gongyisheng · 2026-03-24T07:22:27Z

Support MoE LoRA for gpt-oss models.
Megatron-Bridge:

@yushengsu-thu modified:
miles code: https://github.com/gongyisheng/miles/pulls
mg-bridge version: https://github.com/radixark/Megatron-Bridge/tree/bridge
sgl version: https://github.com/sgl-project/sglang/commits/sglang-miles-lora/

gemini-code-assist · 2026-03-24T07:22:39Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug in the SGLang LoRA adapter loading mechanism, specifically when operating in a tensor-parallel environment. The fix ensures that all necessary tensor components are correctly passed to the loading function, thereby preventing partial or failed LoRA adapter initialization and improving the robustness of the system under distributed configurations.

Highlights

- LoRA Adapter Loading Fix: Corrected an issue where only the first serialized tensor was being passed during LoRA adapter loading in a tensor-parallel (TP) setup, leading to incomplete or incorrect loading.
- Tensor Parallelism Compatibility: Ensured that the load_lora_adapter_from_tensors function correctly handles all parts of serialized tensors, making the LoRA loading process compatible with tensor parallelism.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request fixes a bug in LoRA weight loading when tensor parallelism is enabled. The original code only considered weights from the first rank of the tensor parallel group, which is incorrect. The fix ensures that weights from all ranks are gathered and sent to the SGLang engine. My review includes a suggestion to improve the readability of the list comprehension used for this fix.

gemini-code-assist · 2026-03-24T07:23:22Z

                    lora_name=lora_name,
                    config_dict=lora_config,
-                    serialized_tensors=serialized_named_tensors[0][0],
+                    serialized_tensors=[serialized_named_tensors[i][0] for i in range(len(serialized_named_tensors))],


For better readability and to make this more Pythonic, you can simplify this list comprehension. The current C-style for i in range(len(...)) loop can be replaced with a direct iteration over the list.

Suggested change

-                    serialized_tensors=[serialized_named_tensors[i][0] for i in range(len(serialized_named_tensors))],
+                    serialized_tensors=[tensors[0] for tensors in serialized_named_tensors],
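The following sketch illustrates why the suggested comprehension is a drop-in replacement for the index-based one, and how both differ from the original `[0][0]` access. The sample data is hypothetical: it assumes, following the review summary, that `serialized_named_tensors` holds one entry per tensor-parallel rank and that the first element of each entry is the payload to send.

```python
# Hypothetical stand-in data: one inner list per tensor-parallel rank.
# The real payloads are serialized tensors; strings are used here only
# to make the selection logic visible.
serialized_named_tensors = [
    ["rank0-payload", "rank0-extra"],
    ["rank1-payload"],
    ["rank2-payload"],
]

# Original (pre-fix) access: only the first rank's first payload.
first_only = serialized_named_tensors[0][0]

# Index-based form from the diff.
by_index = [
    serialized_named_tensors[i][0] for i in range(len(serialized_named_tensors))
]

# Direct-iteration form from the review suggestion.
by_item = [tensors[0] for tensors in serialized_named_tensors]

print(first_only)
print(by_item)
```

Both comprehensions walk the outer list in order and take element 0 of each entry, so they produce the same list. The direct-iteration form simply avoids the intermediate index.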

yushengsu-thu · 2026-04-20T22:16:01Z

Add --lora-use-virtual-experts to the script (optimized router).
After "Dual MoE CUDA graph capture for lora/nolora batches" (sgl-project/sglang#22809) is merged, add it to the sgl patch.

yushengsu-thu · 2026-04-20T22:30:23Z

Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>

…region clusters (#10)

* Revert "[BUGFIX] [P2PRDMA] Add rollout post-processing after P2PRDMA weight updates" (radixark#882)
* [Fix] fix ci (radixark#894)
* Avoid threading for ray getting object (radixark#886)
* Add explicit errors for unsupported Megatron profiles (radixark#887)
* Add nvfp4 quantizer files (radixark#907)
* Bump flash-linear-attention version to 0.4.2 (radixark#892)
* [BUGFIX] Invoke "post_process_quantization" by default after weight updating (radixark#890)
  Co-authored-by: Yueming Yuan <yym022502@gmail.com>
* Add heartbeat and id to session server (radixark#866)
* fix: adding thin glm5 image to docker build + latest tag sync (radixark#871)
* Add consistent hashing routing policy for rollout (radixark#891)
  Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net>
* [example] add retool v2 example with multi-turn framework interfaces (radixark#654)
  Co-authored-by: GuanxingLu <gxlu02@gmail.com>
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Expose rollout-batch-size, n-samples-per-prompt, global-batch-size as CLI args in swe-agent-v2 (radixark#954)
  Co-authored-by: Shi Dong <shi.dong@radixark.ai>
* chore: remove obsolete swe-agent server.py and run-qwen3.sh (radixark#952)
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add weight staleness control for fully async rollout (radixark#958)
* Fix/pause generation mode (radixark#924)
  Co-authored-by: Yueming Yuan <yym022502@gmail.com>
* [v0.5.10][1] Bump sglang to v0.5.10 (radixark#898)
* [v0.5.10][2] Fix apply_chat_template behavior for transformers >=5.0 (radixark#926)
  Co-authored-by: guapisolo <guapisolo@gmail.com>
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* [v0.5.10][3] Fix processor return_tensors duplicate kwarg for transformers >=5.0 (radixark#927)
  Co-authored-by: guapisolo <guapisolo@gmail.com>
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* [v0.5.10][4] Fix _no_split_modules set not subscriptable in transformers >=5.0 (radixark#931)
* [v0.5.10][5] Disable piecewise cuda graph to avoid NVLS oom (radixark#935)
* [v0.5.10][6][FSDP] fix outdated weight update logic in FSDP (radixark#948)
  Co-authored-by: guapisolo <guapisolo@gmail.com>
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com>
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [v0.5.10][7][FSDP] move FSDP to experimental and disable by default (radixark#961)
* Add skiplist and more robust calculation on val (radixark#965)
* [fix] tiny fix debug rollout only in weight version check (radixark#967)
* feat: real cp support with relayout fix for qwen3.5 train/rollout mismatch (radixark#885)
* [AMD] Upgrade to sglv0.5.10 (radixark#973)
* switch model to actor (radixark#756)
* [fix] support general logic to bypass fp32 downcast and fix qwen35 A_log dtype (radixark#975)
  Co-authored-by: yueming-yuan <yym022502@gmail.com>
* fix: populate prefix_cache_info in OpenAI/session rollout path (radixark#960)
* Remove prepare_harbor_tasks.py; use harbor-private adapters (radixark#982)
* [fix] Skip flush_cache in in_place mode and add fully async example (radixark#974)
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* GLM47 full cmd for async and sync reasoning (radixark#986)
  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: handle non-tool appended messages in TITO incremental tokenization (radixark#949)
  Co-authored-by: Yanbin Jiang <jybsuper@gmail.com>
* [docker] Add sgl-model-gateway install and download .tar.gz assets (radixark#895)
* [ci] fix hf rate limit error by caching tokenizer loading (radixark#1014)
  Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com>
* Use load_generate_function in legacy sglang_rollout path (radixark#1016)
* Update CODEOWNERS to add new reviewers (radixark#1021)
* Support moe lora for gpt-oss (radixark#798)
  Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
* [fix] restore expert_bias to fp32 before bridge weight export (radixark#811)
* [chore] drop legacy transformers upgrade pin for glm47-flash and qwen35 (radixark#1018)
* [fix] Enforce param dtype before wrap ddp (radixark#992)
  Co-authored-by: Zhichen Zeng <zczeng@uw.edu>
* [upgrade] update Megatron-Bridge source and LoRA CI to megatron e2e tests and (radixark#1023)
* [CI] Drop --use-miles-router from R3 tests and add r3 comparasion test between sgl & miles router (radixark#1015)
* wandb: raise init_timeout, add retry wrapper, fix shared-mode init for cross-region clusters

In online + shared mode, both `init_wandb_primary` and `init_wandb_secondary` make HTTPS round-trips to wandb cloud (login + run create/attach). On high-latency cross-region clusters (e.g. Abu Dhabi MBZUAI ↔ wandb-cloud US-West) with concurrent actor bursts, a single round-trip can exceed the wandb SDK's 90s default `init_timeout`, tearing down the whole run with a silent handshake abort. Observed on RL360 job 1564420, which forced `WANDB_MODE=offline` as a global default ever since (see https://github.com/LLM360/RL360/issues/87).

The issue's original diagnosis assumed a local primary↔secondary socket handshake race. That's not how shared mode works: per wandb's own feature PR (wandb/wandb#6882), each writer spawns an independent wandb-core that talks to the cloud directly; aggregation is server-side by run_id. No local socket exists. The failure mode is pure network/latency, not a local readiness race.

Changes
-------
- Bump `init_timeout` to 300s for primary and secondary Settings. Configurable via `WANDB_INIT_TIMEOUT_SECS` env var for tuning.
- Wrap both init paths in a bounded exponential-backoff retry (`_wandb_init_with_retry`) that re-attempts on wandb.errors.CommError and wandb.errors.UsageError. 3 attempts with 5→10→20s backoff by default, tunable via `WANDB_INIT_RETRY_ATTEMPTS` / `WANDB_INIT_RETRY_BACKOFF_SECS`.
- Add `x_label` tagging per wandb distributed-training docs: primary gets `rank_<rank>_primary`, secondaries get `rank_<rank>_secondary`. Enables per-rank console-log filtering in the wandb UI.
- Drop `reinit=True` from secondary init_kwargs. Shared mode natively supports concurrent writers on a single run; `reinit=True` triggered stale-state warnings on secondary actors without functional benefit.

Followups this change enables
-----------------------------
- `WANDB_MODE=offline` can be removed from scale.yaml's extra_env default once a pilot run confirms online mode boots cleanly.
- The tmux-based `~/bin/wandb-sync-rl360.sh` workaround on David's M2 account becomes obsolete (no more offline-only default).
- Near-realtime wandb dashboards replace the ~2-minute-lag offline sync; per-rank system metrics via x_label filtering.

---------

Co-authored-by: JD <jaedon.guo@gmail.com>
Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: Ziang Li <ziangli@umich.edu>
Co-authored-by: Zhichen Zeng <zczeng@uw.edu>
Co-authored-by: JensenFire <xinji1@microsoft.com>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com>
Co-authored-by: Douglas Yang <douglasyang88@gmail.com>
Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net>
Co-authored-by: Huapeng Zhou <73010314+PopSoda2002@users.noreply.github.com>
Co-authored-by: GuanxingLu <gxlu02@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Shi-Dong <Shi-Dong@users.noreply.github.com>
Co-authored-by: Shi Dong <shi.dong@radixark.ai>
Co-authored-by: Jiajun Li <48857426+guapisolo@users.noreply.github.com>
Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Yuzhen Zhou <82826991+zyzshishui@users.noreply.github.com>
Co-authored-by: Yanbin Jiang <jybsuper@gmail.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Yisheng Gong <yishenggong9437@gmail.com>
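The bounded exponential-backoff retry mentioned in the referencing commit message above can be sketched generically as follows. This is not the code from that commit: `init_with_retry`, its parameters, and the injected `sleep` hook are hypothetical names chosen for illustration, and the sketch deliberately takes the exception types as an argument rather than importing wandb.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def init_with_retry(
    init_fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    backoff_secs: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call init_fn, retrying on the listed exceptions with doubling delays.

    Hypothetical helper: the last failure is re-raised, so the number of
    sleeps is at most attempts - 1.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = backoff_secs
    for attempt in range(1, attempts + 1):
        try:
            return init_fn()
        except retry_on:
            if attempt == attempts:
                raise  # budget exhausted: surface the original error
            sleep(delay)
            delay *= 2  # 5s, 10s, 20s, ... with the default backoff
    raise AssertionError("unreachable")
```

In a real setup the caller would presumably pass a closure around the wandb init call and the wandb exception classes named in the message. Injecting `sleep` keeps the helper testable without real delays.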

chore: fix sglang lora load when tp

dd72c25

gongyisheng requested review from fzyzcjy, maocheng23, yueming-yuan and yushengsu-thu as code owners March 24, 2026 07:22

gongyisheng changed the title from "chore: fix sglang lora load when tp" to "Support moe lora for gpt-oss" Mar 24, 2026

gemini-code-assist Bot reviewed Mar 24, 2026

yushengsu-thu self-assigned this Mar 24, 2026

yushengsu-thu added run-ci-megatron run-ci-lora labels Mar 24, 2026

gongyisheng and others added 5 commits March 24, 2026 23:37

feat: add gpt-oss moe lora example

90e46d3

Merge branch 'main' into miles-gpt-oss-moe-lora

60de309

Merge branch 'main' into miles-gpt-oss-moe-lora

c2bac63

Merge branch 'main' into miles-gpt-oss-moe-lora

52b3672

feat: dynamic GPU count and MoE runner backend for lora training

676d4e6

- Auto-detect GPUS_PER_NODE from CUDA_VISIBLE_DEVICES instead of hardcoding 4
- Add --sglang-moe-runner-backend triton flag
- Fix serialized_tensors to use single tensor instead of list

Made-with: Cursor
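The GPU auto-detection described in the commit message above amounts to counting the entries in `CUDA_VISIBLE_DEVICES`. The sketch below shows one way to do that in Python. The PR's example script may well do this in shell instead, and the function name, the `env` parameter, and the fallback of 4 when the variable is unset are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def gpus_per_node(env: Optional[Mapping[str, str]] = None, fallback: int = 4) -> int:
    """Count the devices listed in CUDA_VISIBLE_DEVICES.

    Hypothetical helper: `fallback` is returned only when the variable is
    unset; an empty value means no devices are visible and yields 0.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return fallback
    # Entries may be indices ("0,1") or UUIDs; only the count matters here.
    devices = [entry.strip() for entry in raw.split(",") if entry.strip()]
    return len(devices)


if __name__ == "__main__":
    print(gpus_per_node())
```

Passing the environment as a mapping keeps the function easy to test; calling it with no arguments reads the real process environment.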

yushengsu-thu reviewed Apr 19, 2026

gongyisheng added 3 commits April 19, 2026 22:49

Merge pull request #2 from yushengsu-thu/miles-gpt-oss-moe-lora-yusheng

a5861fa

feat: dynamic GPU count and MoE runner backend for lora training

chore: example script updates: increase lr, enable cuda graph and add comments

70d0e3b

chore: remove comments

164a460

yushengsu-thu enabled auto-merge (squash) April 20, 2026 22:30

yushengsu-thu approved these changes Apr 20, 2026

yushengsu-thu merged commit 38f9183 into radixark:main Apr 20, 2026
23 checks passed

GuanxingLu pushed a commit to GuanxingLu/miles that referenced this pull request Apr 21, 2026

Support moe lora for gpt-oss (radixark#798)

273c34c

Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>

Support moe lora for gpt-oss #798
yushengsu-thu merged 9 commits into radixark:main from gongyisheng:miles-gpt-oss-moe-lora
