Use load_generate_function in legacy sglang_rollout path by maocheng23 · Pull Request #1016 · radixark/miles

maocheng23 · 2026-04-20T06:42:45Z

Summary

Replace the inline inspect.signature-based dispatch in sglang_rollout.py::generate_and_rm with the existing load_generate_function from compatibility.py.

This allows GenerateFnInput-style generate functions (e.g. agentic_tool_call.generate) to work on the legacy rollout path (without MILES_EXPERIMENTAL_ROLLOUT_REFACTOR=1), while keeping full backward compatibility with existing 3-arg and 4-arg custom generate functions.

Why not the approach in #1008?

PR #1008 adds an inline shim that inspect.signature-sniffs on every per-sample call and duplicates logic already in compatibility.py. This PR instead reuses the existing load_generate_function + LegacyGenerateFnAdapter infrastructure which:

Detects legacy vs new signatures at load time (not per-call)
Uses a more precise heuristic (len(params) >= 3 and params[0] != "input") vs len(params) == 1
Keeps a single source of truth for dispatch logic (DRY)
Has no performance impact relative to the old code (same load_function + inspect.signature cost that was already there)

Changes

miles/rollout/sglang_rollout.py: Replace 6-line inline dispatch with 4-line unified GenerateFnInput call via load_generate_function
Remove unused import inspect

Test plan

Existing 3-arg and 4-arg custom generate functions dispatch correctly through LegacyGenerateFnAdapter (no behavior change)
New GenerateFnInput-style functions (e.g. agentic_tool_call.generate) work without MILES_EXPERIMENTAL_ROLLOUT_REFACTOR=1
Per-sample generate_function_path (from eval dataset configs) continues to work

Made with Cursor

Replace the inline inspect-based dispatch in generate_and_rm with the existing load_generate_function from compatibility.py. This allows GenerateFnInput-style generate functions (e.g. agentic_tool_call.generate) to work on the legacy path without MILES_EXPERIMENTAL_ROLLOUT_REFACTOR, while keeping full backward compatibility with 3-arg/4-arg signatures. Made-with: Cursor

gemini-code-assist

Code Review

This pull request refactors the custom generation logic in sglang_rollout.py to use a standardized load_generate_function and GenerateFnInput interface, replacing manual signature inspection. Feedback suggests addressing performance concerns regarding repeated function loading and improving robustness when extracting samples from the generation output.

gemini-code-assist · 2026-04-20T06:45:28Z

+            generate_fn = load_generate_function(custom_func_path) if custom_func_path else None
+            if generate_fn is not None:
+                output = await generate_fn(
+                    GenerateFnInput(state=state, sample=sample, sampling_params=sampling_params, evaluation=evaluation)
+                )
+                sample = output.samples


There are two potential issues with this implementation:

Performance: load_generate_function is called for every sample. This involves load_function (module imports) and inspect.signature (expensive) on every call. Since args.custom_generate_function_path is usually constant for the entire rollout, it would be more efficient to load it once or use a cache.

Robustness: Line 268 assumes output is a GenerateFnOutput. While LegacyGenerateFnAdapter ensures this for legacy functions, 'new-style' functions are returned as-is by load_generate_function. If a new-style function returns a Sample or list[Sample] directly instead of wrapping it in GenerateFnOutput, this will raise an AttributeError.

Consider using a more defensive approach for extracting samples and caching the loaded function.

Suggested change

generate_fn = load_generate_function(custom_func_path) if custom_func_path else None

if generate_fn is not None:

output = await generate_fn(

GenerateFnInput(state=state, sample=sample, sampling_params=sampling_params, evaluation=evaluation)

)

sample = output.samples

generate_fn = load_generate_function(custom_func_path) if custom_func_path else None

if generate_fn is not None:

output = await generate_fn(

GenerateFnInput(state=state, sample=sample, sampling_params=sampling_params, evaluation=evaluation)

)

sample = output.samples if hasattr(output, "samples") else output

Shi-Dong

LGTM!

…region clusters (#10) * Revert "[BUGFIX] [P2PRDMA] Add rollout post-processing after P2PRDMA weight updates" (radixark#882) * [Fix] fix ci (radixark#894) * Avoid threading for ray getting object (radixark#886) * Add explicit errors for unsupported Megatron profiles (radixark#887) * Add nvfp4 quantizer files (radixark#907) * Bump flash-linear-attention version to 0.4.2 (radixark#892) * [BUGFIX] Invoke "post_process_quantization" by default after weight updating (radixark#890) Co-authored-by: Yueming Yuan <yym022502@gmail.com> * Add heartbeat and id to session server (radixark#866) * fix: adding thin glm5 image to docker build + latest tag sync (radixark#871) * Add consistent hashing routing policy for rollout (radixark#891) Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net> * [example] add retool v2 example with multi-turn framework interfaces (radixark#654) Co-authored-by: GuanxingLu <gxlu02@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Expose rollout-batch-size, n-samples-per-prompt, global-batch-size as CLI args in swe-agent-v2 (radixark#954) Co-authored-by: Shi Dong <shi.dong@radixark.ai> * chore: remove obsolete swe-agent server.py and run-qwen3.sh (radixark#952) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add weight staleness control for fully async rollout (radixark#958) * Fix/pause generation mode (radixark#924) Co-authored-by: Yueming Yuan <yym022502@gmail.com> * [v0.5.10][1] Bump sglang to v0.5.10 (radixark#898) * [v0.5.10][2] Fix apply_chat_template behavior for transformers >=5.0 (radixark#926) Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [v0.5.10][3] Fix processor return_tensors duplicate kwarg for transformers >=5.0 (radixark#927) Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [v0.5.10][4] Fix _no_split_modules set not subscriptable in transformers >=5.0 (radixark#931) * [v0.5.10][5] Disable piecewise cuda graph to avoid NVLS oom (radixark#935) * [v0.5.10][6][FSDP] fix outdated weight update logic in FSDP (radixark#948) Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * [v0.5.10][7][FSDP] move FSDP to experimental and disable by default (radixark#961) * Add skiplist and more robust calculation on val (radixark#965) * [fix] tiny fix debug rollout only in weight version check (radixark#967) * feat: real cp support with relayout fix for qwen3.5 train/rollout mismatch (radixark#885) * [AMD] Upgrade to sglv0.5.10 (radixark#973) * switch model to actor (radixark#756) * [fix] support general logic to bypass fp32 downcast and fix qwen35 A_log dtype (radixark#975) Co-authored-by: yueming-yuan <yym022502@gmail.com> * fix: populate prefix_cache_info in OpenAI/session rollout path (radixark#960) * Remove prepare_harbor_tasks.py; use harbor-private adapters (radixark#982) * [fix] Skip flush_cache in in_place mode and add fully async example (radixark#974) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * GLM47 full cmd for async and sync reasoning (radixark#986) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: handle non-tool appended messages in TITO incremental tokenization (radixark#949) Co-authored-by: Yanbin Jiang <jybsuper@gmail.com> * [docker] Add sgl-model-gateway install and download .tar.gz assets (radixark#895) * [ci] fix hf rate limit error by caching tokenizer loading (radixark#1014) Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com> * Use load_generate_function in legacy sglang_rollout path (radixark#1016) * Update CODEOWNERS to add new reviewers (radixark#1021) * Support moe lora for gpt-oss (radixark#798) Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com> * [fix] restore expert_bias to fp32 before bridge weight export (radixark#811) * [chore] drop legacy transformers upgrade pin for glm47-flash and qwen35 (radixark#1018) * [fix] Enforce param dtype before wrap ddp (radixark#992) Co-authored-by: Zhichen Zeng <zczeng@uw.edu> * [upgrade] update Megatron-Bridge source and LoRA CI to megatron e2e tests and (radixark#1023) * [CI] Drop --use-miles-router from R3 tests and add r3 comparasion test between sgl & miles router (radixark#1015) * wandb: raise init_timeout, add retry wrapper, fix shared-mode init for cross-region clusters In online + shared mode, both `init_wandb_primary` and `init_wandb_secondary` make HTTPS round-trips to wandb cloud (login + run create/attach). On high-latency cross-region clusters (e.g. Abu Dhabi MBZUAI ↔ wandb-cloud US-West) with concurrent actor bursts, a single round-trip can exceed the wandb SDK's 90s default `init_timeout` — tearing down the whole run with a silent handshake abort. Observed on RL360 job 1564420, which forced `WANDB_MODE=offline` as a global default ever since (see https://github.com/LLM360/RL360/issues/87). The issue's original diagnosis assumed a local primary↔secondary socket handshake race. That's not how shared mode works — per wandb's own feature PR (wandb/wandb#6882), each writer spawns an independent wandb-core that talks to the cloud directly; aggregation is server-side by run_id. No local socket exists. The failure mode is pure network/latency, not a local readiness race. Changes ------- - Bump `init_timeout` to 300s for primary and secondary Settings. Configurable via `WANDB_INIT_TIMEOUT_SECS` env var for tuning. - Wrap both init paths in a bounded exponential-backoff retry (`_wandb_init_with_retry`) that re-attempts on wandb.errors.CommError and wandb.errors.UsageError. 3 attempts with 5→10→20s backoff by default, tunable via `WANDB_INIT_RETRY_ATTEMPTS` / `WANDB_INIT_RETRY_BACKOFF_SECS`. - Add `x_label` tagging per wandb distributed-training docs: primary gets `rank_<rank>_primary`, secondaries get `rank_<rank>_secondary`. Enables per-rank console-log filtering in the wandb UI. - Drop `reinit=True` from secondary init_kwargs. Shared mode natively supports concurrent writers on a single run; `reinit=True` triggered stale-state warnings on secondary actors without functional benefit. Followups this change enables ----------------------------- - `WANDB_MODE=offline` can be removed from scale.yaml's extra_env default once a pilot run confirms online mode boots cleanly. - The tmux-based `~/bin/wandb-sync-rl360.sh` workaround on David's M2 account becomes obsolete (no more offline-only default). - Near-realtime wandb dashboards replace the ~2-minute-lag offline sync; per-rank system metrics via x_label filtering. --------- Co-authored-by: JD <jaedon.guo@gmail.com> Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: Ziang Li <ziangli@umich.edu> Co-authored-by: Zhichen Zeng <zczeng@uw.edu> Co-authored-by: JensenFire <xinji1@microsoft.com> Co-authored-by: Yueming Yuan <yym022502@gmail.com> Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com> Co-authored-by: Douglas Yang <douglasyang88@gmail.com> Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net> Co-authored-by: Huapeng Zhou <73010314+PopSoda2002@users.noreply.github.com> Co-authored-by: GuanxingLu <gxlu02@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Shi-Dong <Shi-Dong@users.noreply.github.com> Co-authored-by: Shi Dong <shi.dong@radixark.ai> Co-authored-by: Jiajun Li <48857426+guapisolo@users.noreply.github.com> Co-authored-by: guapisolo <guapisolo@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Yuzhen Zhou <82826991+zyzshishui@users.noreply.github.com> Co-authored-by: Yanbin Jiang <jybsuper@gmail.com> Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: Yisheng Gong <yishenggong9437@gmail.com>

gemini-code-assist Bot reviewed Apr 20, 2026

View reviewed changes

Shi-Dong approved these changes Apr 20, 2026

View reviewed changes

yueming-yuan approved these changes Apr 20, 2026

View reviewed changes

maocheng23 marked this pull request as ready for review April 20, 2026 06:56

maocheng23 requested review from fzyzcjy and guapisolo as code owners April 20, 2026 06:56

maocheng23 merged commit 9a00364 into main Apr 20, 2026
5 checks passed

maocheng23 deleted the fix/legacy-generate-fn-compat branch April 20, 2026 06:57

This was referenced Apr 20, 2026

Add GenerateFnInput compatibility shim in sglang_rollout #1008

Closed

Add run-glm47-flash-agentic-async.py launcher #1019

Merged

GuanxingLu pushed a commit to GuanxingLu/miles that referenced this pull request Apr 21, 2026

Use load_generate_function in legacy sglang_rollout path (radixark#1016)

b7a20ac

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use load_generate_function in legacy sglang_rollout path#1016

Use load_generate_function in legacy sglang_rollout path#1016
maocheng23 merged 1 commit intomainfrom
fix/legacy-generate-fn-compat

maocheng23 commented Apr 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Uh oh!

Shi-Dong left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

maocheng23 commented Apr 20, 2026

Summary

Why not the approach in #1008?

Changes

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Shi-Dong left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants