Skip to content

Use load_generate_function in legacy sglang_rollout path#1016

Merged
maocheng23 merged 1 commit intomainfrom
fix/legacy-generate-fn-compat
Apr 20, 2026
Merged

Use load_generate_function in legacy sglang_rollout path#1016
maocheng23 merged 1 commit intomainfrom
fix/legacy-generate-fn-compat

Conversation

@maocheng23
Copy link
Copy Markdown
Contributor

Summary

Replace the inline inspect.signature-based dispatch in sglang_rollout.py::generate_and_rm with the existing load_generate_function from compatibility.py.

This allows GenerateFnInput-style generate functions (e.g. agentic_tool_call.generate) to work on the legacy rollout path (without MILES_EXPERIMENTAL_ROLLOUT_REFACTOR=1), while keeping full backward compatibility with existing 3-arg and 4-arg custom generate functions.

Why not the approach in #1008?

PR #1008 adds an inline shim that inspect.signature-sniffs on every per-sample call and duplicates logic already in compatibility.py. This PR instead reuses the existing load_generate_function + LegacyGenerateFnAdapter infrastructure which:

  • Detects legacy vs new signatures at load time (not per-call)
  • Uses a more precise heuristic (len(params) >= 3 and params[0] != "input") vs len(params) == 1
  • Keeps a single source of truth for dispatch logic (DRY)
  • Has no performance impact relative to the old code (same load_function + inspect.signature cost that was already there)

Changes

  • miles/rollout/sglang_rollout.py: Replace 6-line inline dispatch with 4-line unified GenerateFnInput call via load_generate_function
  • Remove unused import inspect

Test plan

  • Existing 3-arg and 4-arg custom generate functions dispatch correctly through LegacyGenerateFnAdapter (no behavior change)
  • New GenerateFnInput-style functions (e.g. agentic_tool_call.generate) work without MILES_EXPERIMENTAL_ROLLOUT_REFACTOR=1
  • Per-sample generate_function_path (from eval dataset configs) continues to work

Made with Cursor

Replace the inline inspect-based dispatch in generate_and_rm with the
existing load_generate_function from compatibility.py. This allows
GenerateFnInput-style generate functions (e.g. agentic_tool_call.generate)
to work on the legacy path without MILES_EXPERIMENTAL_ROLLOUT_REFACTOR,
while keeping full backward compatibility with 3-arg/4-arg signatures.

Made-with: Cursor
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the custom generation logic in sglang_rollout.py to use a standardized load_generate_function and GenerateFnInput interface, replacing manual signature inspection. Feedback suggests addressing performance concerns regarding repeated function loading and improving robustness when extracting samples from the generation output.

Comment on lines +263 to +268
generate_fn = load_generate_function(custom_func_path) if custom_func_path else None
if generate_fn is not None:
output = await generate_fn(
GenerateFnInput(state=state, sample=sample, sampling_params=sampling_params, evaluation=evaluation)
)
sample = output.samples
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There are two potential issues with this implementation:

  1. Performance: load_generate_function is called for every sample. This involves load_function (module imports) and inspect.signature (expensive) on every call. Since args.custom_generate_function_path is usually constant for the entire rollout, it would be more efficient to load it once or use a cache.
  2. Robustness: Line 268 assumes output is a GenerateFnOutput. While LegacyGenerateFnAdapter ensures this for legacy functions, 'new-style' functions are returned as-is by load_generate_function. If a new-style function returns a Sample or list[Sample] directly instead of wrapping it in GenerateFnOutput, this will raise an AttributeError.

Consider using a more defensive approach for extracting samples and caching the loaded function.

Suggested change
generate_fn = load_generate_function(custom_func_path) if custom_func_path else None
if generate_fn is not None:
output = await generate_fn(
GenerateFnInput(state=state, sample=sample, sampling_params=sampling_params, evaluation=evaluation)
)
sample = output.samples
generate_fn = load_generate_function(custom_func_path) if custom_func_path else None
if generate_fn is not None:
output = await generate_fn(
GenerateFnInput(state=state, sample=sample, sampling_params=sampling_params, evaluation=evaluation)
)
sample = output.samples if hasattr(output, "samples") else output

Copy link
Copy Markdown
Contributor

@Shi-Dong Shi-Dong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@maocheng23 maocheng23 marked this pull request as ready for review April 20, 2026 06:56
@maocheng23 maocheng23 merged commit 9a00364 into main Apr 20, 2026
5 checks passed
@maocheng23 maocheng23 deleted the fix/legacy-generate-fn-compat branch April 20, 2026 06:57
GuanxingLu pushed a commit to GuanxingLu/miles that referenced this pull request Apr 21, 2026
DavidBellamy added a commit to LLM360/miles that referenced this pull request Apr 21, 2026
…region clusters (#10)

* Revert "[BUGFIX] [P2PRDMA] Add rollout post-processing after P2PRDMA weight updates" (radixark#882)

* [Fix] fix ci (radixark#894)

* Avoid threading for ray getting object (radixark#886)

* Add explicit errors for unsupported Megatron profiles (radixark#887)

* Add nvfp4 quantizer files (radixark#907)

* Bump flash-linear-attention version to 0.4.2 (radixark#892)

* [BUGFIX] Invoke "post_process_quantization" by default after weight updating (radixark#890)

Co-authored-by: Yueming Yuan <yym022502@gmail.com>

* Add heartbeat and id to session server (radixark#866)

* fix: adding thin glm5 image to docker build + latest tag sync (radixark#871)

* Add consistent hashing routing policy for rollout (radixark#891)

Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net>

* [example] add retool v2 example with multi-turn framework interfaces (radixark#654)

Co-authored-by: GuanxingLu <gxlu02@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Expose rollout-batch-size, n-samples-per-prompt, global-batch-size as CLI args in swe-agent-v2 (radixark#954)

Co-authored-by: Shi Dong <shi.dong@radixark.ai>

* chore: remove obsolete swe-agent server.py and run-qwen3.sh (radixark#952)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add weight staleness control for fully async rollout (radixark#958)

* Fix/pause generation mode (radixark#924)

Co-authored-by: Yueming Yuan <yym022502@gmail.com>

* [v0.5.10][1] Bump sglang to v0.5.10 (radixark#898)

* [v0.5.10][2] Fix apply_chat_template behavior for transformers >=5.0 (radixark#926)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [v0.5.10][3] Fix processor return_tensors duplicate kwarg for transformers >=5.0 (radixark#927)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [v0.5.10][4] Fix _no_split_modules set not subscriptable in transformers >=5.0 (radixark#931)

* [v0.5.10][5] Disable piecewise cuda graph to avoid NVLS oom (radixark#935)

* [v0.5.10][6][FSDP] fix outdated weight update logic in FSDP (radixark#948)

Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [v0.5.10][7][FSDP] move FSDP to experimental and disable by default (radixark#961)

* Add skiplist and more robust calculation on val (radixark#965)

* [fix] tiny fix debug rollout only in weight version check (radixark#967)

* feat: real cp support with relayout fix for qwen3.5 train/rollout mismatch (radixark#885)

* [AMD] Upgrade to sglv0.5.10 (radixark#973)

* switch model to actor (radixark#756)

* [fix] support general logic to bypass fp32 downcast and fix qwen35 A_log dtype (radixark#975)

Co-authored-by: yueming-yuan <yym022502@gmail.com>

* fix: populate prefix_cache_info in OpenAI/session rollout path (radixark#960)

* Remove prepare_harbor_tasks.py; use harbor-private adapters (radixark#982)

* [fix] Skip flush_cache in in_place mode and add fully async example (radixark#974)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* GLM47 full cmd for async and sync reasoning (radixark#986)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle non-tool appended messages in TITO incremental tokenization (radixark#949)

Co-authored-by: Yanbin Jiang <jybsuper@gmail.com>

* [docker] Add sgl-model-gateway install and download .tar.gz assets (radixark#895)

* [ci] fix hf rate limit error by caching tokenizer loading (radixark#1014)

Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com>

* Use load_generate_function in legacy sglang_rollout path (radixark#1016)

* Update CODEOWNERS to add new reviewers (radixark#1021)

* Support moe lora for gpt-oss (radixark#798)

Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>

* [fix] restore expert_bias to fp32 before bridge weight export (radixark#811)

* [chore] drop legacy transformers upgrade pin for glm47-flash and qwen35 (radixark#1018)

* [fix] Enforce param dtype before wrap ddp (radixark#992)

Co-authored-by: Zhichen Zeng <zczeng@uw.edu>

* [upgrade] update Megatron-Bridge source and LoRA CI to megatron e2e tests and  (radixark#1023)

* [CI] Drop --use-miles-router from R3 tests and add r3 comparasion test between sgl & miles router (radixark#1015)

* wandb: raise init_timeout, add retry wrapper, fix shared-mode init for cross-region clusters

In online + shared mode, both `init_wandb_primary` and `init_wandb_secondary`
make HTTPS round-trips to wandb cloud (login + run create/attach). On
high-latency cross-region clusters (e.g. Abu Dhabi MBZUAI ↔ wandb-cloud
US-West) with concurrent actor bursts, a single round-trip can exceed the
wandb SDK's 90s default `init_timeout` — tearing down the whole run
with a silent handshake abort. Observed on RL360 job 1564420, which
forced `WANDB_MODE=offline` as a global default ever since (see
https://github.com/LLM360/RL360/issues/87).

The issue's original diagnosis assumed a local primary↔secondary socket
handshake race. That's not how shared mode works — per wandb's own
feature PR (wandb/wandb#6882), each writer spawns
an independent wandb-core that talks to the cloud directly; aggregation
is server-side by run_id. No local socket exists. The failure mode is
pure network/latency, not a local readiness race.

Changes
-------

- Bump `init_timeout` to 300s for primary and secondary Settings.
  Configurable via `WANDB_INIT_TIMEOUT_SECS` env var for tuning.
- Wrap both init paths in a bounded exponential-backoff retry
  (`_wandb_init_with_retry`) that re-attempts on wandb.errors.CommError
  and wandb.errors.UsageError. 3 attempts with 5→10→20s backoff by
  default, tunable via `WANDB_INIT_RETRY_ATTEMPTS` /
  `WANDB_INIT_RETRY_BACKOFF_SECS`.
- Add `x_label` tagging per wandb distributed-training docs: primary
  gets `rank_<rank>_primary`, secondaries get `rank_<rank>_secondary`.
  Enables per-rank console-log filtering in the wandb UI.
- Drop `reinit=True` from secondary init_kwargs. Shared mode natively
  supports concurrent writers on a single run; `reinit=True` triggered
  stale-state warnings on secondary actors without functional benefit.

Followups this change enables
-----------------------------

- `WANDB_MODE=offline` can be removed from scale.yaml's extra_env
  default once a pilot run confirms online mode boots cleanly.
- The tmux-based `~/bin/wandb-sync-rl360.sh` workaround on David's M2
  account becomes obsolete (no more offline-only default).
- Near-realtime wandb dashboards replace the ~2-minute-lag offline
  sync; per-rank system metrics via x_label filtering.

---------

Co-authored-by: JD <jaedon.guo@gmail.com>
Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: Ziang Li <ziangli@umich.edu>
Co-authored-by: Zhichen Zeng <zczeng@uw.edu>
Co-authored-by: JensenFire <xinji1@microsoft.com>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: maocheng23 <35615230+maocheng23@users.noreply.github.com>
Co-authored-by: Douglas Yang <douglasyang88@gmail.com>
Co-authored-by: Yueming Yuan <yueming@Mac.attlocal.net>
Co-authored-by: Huapeng Zhou <73010314+PopSoda2002@users.noreply.github.com>
Co-authored-by: GuanxingLu <gxlu02@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Shi-Dong <Shi-Dong@users.noreply.github.com>
Co-authored-by: Shi Dong <shi.dong@radixark.ai>
Co-authored-by: Jiajun Li <48857426+guapisolo@users.noreply.github.com>
Co-authored-by: guapisolo <guapisolo@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Yuzhen Zhou <82826991+zyzshishui@users.noreply.github.com>
Co-authored-by: Yanbin Jiang <jybsuper@gmail.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Yisheng Gong <yishenggong9437@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants