
[skyrl-train][inference] HTTP Inference Integration (Feature-Flagged) 4/N#931

Merged
CharlieFRuan merged 34 commits into NovaSky-AI:main from kouroshHakha:kh/inference-3
Feb 2, 2026
Conversation

@kouroshHakha
Collaborator

@kouroshHakha kouroshHakha commented Jan 24, 2026

Summary: Integrates the new inference layer with training code behind a private feature flag _SKYRL_USE_NEW_INFERENCE. When enabled, uses RemoteInferenceClient + ServerGroup + InferenceRouter instead of the legacy Ray actor-based inference. Both code paths remain fully functional; the flag allows gradual rollout and validation.

Key Changes:

  1. Feature Flag (env_vars.py):

    • _SKYRL_USE_NEW_INFERENCE env var (default: 0 = legacy path)
  2. New Config Options (ppo_base_config.yaml):

    • generator.external_proxy_url - External data plane URL (optional)
    • generator.external_server_urls - External control plane URLs (optional)
    • Reorganized config sections to separate weight sync from new inference config
  3. Config Validation (utils.py):

    • _validate_new_inference_cfg() validates config combinations
    • Colocated + external endpoints → Error (must use driver-managed servers)
    • Non-colocated routing logic for various external/internal combinations
  4. Updated get_inference_client() (main_base.py):

    • When flag enabled: Build VLLMServerGroup + InferenceRouter + RemoteInferenceClient
    • When flag disabled: Use legacy InferenceEngineClient (existing behavior)
    • Renamed _get_http_inference_client() → _get_new_inference_client()
  5. Weight Sync Integration (worker.py, broadcast_strategy.py, transfer_strategy.py, cuda_ipc_strategy.py):

    • worker.py fetches inference_world_size from client.get_world_size() for new inference path
    • create_init_info() accepts optional inference_world_size parameter
    • Clean separation of code paths with if _SKYRL_USE_NEW_INFERENCE: / else: pattern
  6. API Compatibility (remote_inference_client.py):

    • Renamed init_weight_transfer → init_weight_update_communicator
    • Renamed update_weights → update_named_weights
    • Added tags parameter to sleep()/wake_up() for colocation
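
The flag-gated dispatch in item 4 can be sketched as follows. This is a minimal, self-contained illustration: the two client classes are stand-ins, and only the env var name `_SKYRL_USE_NEW_INFERENCE` comes from the PR.

```python
import os

class RemoteInferenceClient:
    kind = "new"  # HTTP path: ServerGroup + InferenceRouter behind it

class InferenceEngineClient:
    kind = "legacy"  # Ray actor-based path (existing behavior)

def get_inference_client():
    # Default "0" keeps the legacy path, matching the gradual-rollout plan.
    if os.environ.get("_SKYRL_USE_NEW_INFERENCE", "0") == "1":
        return RemoteInferenceClient()
    return InferenceEngineClient()
```

Because both paths sit behind one factory, flipping the env var switches implementations without touching call sites.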

Files Changed:

File Change
env_vars.py Add _SKYRL_USE_NEW_INFERENCE feature flag
ppo_base_config.yaml Add external_proxy_url, external_server_urls; reorganize sections
utils.py Add _validate_new_inference_cfg() with routing logic
main_base.py Update get_inference_client() with _get_new_inference_client()
remote_inference_client.py Rename methods for API compatibility
worker.py Fetch inference_world_size from client for new inference path
broadcast_strategy.py Accept inference_world_size parameter, clean conditional logic
transfer_strategy.py Update base class signature
cuda_ipc_strategy.py Accept (and ignore) inference_world_size parameter

Testing:

# Legacy path (default)
pytest tests/

# New HTTP path
_SKYRL_USE_NEW_INFERENCE=1 pytest tests/

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Phase 2b-1: Refactors inference client creation into an overridable hook.

Changes:
- Add get_inference_client() -> InferenceEngineInterface hook in BasePPOExp
- Update _setup_trainer() to use the new hook
- Refactor DAPOExp to override get_inference_client() instead of duplicating _setup_trainer()
- Update EvalOnlyEntrypoint.run() to use the hook
- Update TerminalBenchGenerateExp._setup_generator() to use the hook
- Move strategy validation for FlashRL to main() for early failure
- Fix bug: add missing tokenizer arg in DAPOExp remote engines path

This refactor eliminates code duplication and prepares for future
RemoteInferenceClient integration (Phase 2b-2).
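
The overridable-hook pattern this refactor introduces can be sketched as below: subclasses override one factory method instead of duplicating _setup_trainer(). The client values are illustrative stand-ins.

```python
class BasePPOExp:
    def get_inference_client(self):
        # Single extension point for inference client creation.
        return "default-client"

    def _setup_trainer(self):
        # Shared setup path consults the hook, so subclasses like DAPOExp
        # no longer need their own copy of this method.
        return {"inference_client": self.get_inference_client()}

class DAPOExp(BasePPOExp):
    def get_inference_client(self):
        return "dapo-client"
```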
@kouroshHakha kouroshHakha marked this pull request as ready for review January 31, 2026 19:50
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively integrates an HTTP-based inference layer behind a feature flag, providing a clear path for gradual rollout. The changes are well-structured, touching configuration, the main entrypoint, client-side logic, and weight synchronization strategies. The addition of config validation for the new HTTP inference path is a great touch. I've identified a potential resource leak regarding the new server group and router resources that are not being torn down, and a minor inconsistency in parameter handling in the remote client. Overall, this is a solid contribution towards modernizing the inference architecture.

Comment on lines +122 to +124
# HTTP inference resources (created lazily when _SKYRL_USE_HTTP_INFERENCE=1)
self._server_group = None
self._inference_router = None

Severity: high

The _server_group and _inference_router are initialized here and created in _get_http_inference_client, but there doesn't appear to be corresponding teardown logic to shut them down. This can lead to resource leaks, especially since they manage Ray actors and other resources. Consider adding a teardown method to BasePPOExp that calls shutdown() on these objects, and ensure it's called reliably (e.g., in a finally block within the run method).
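
One possible shape for the requested teardown, releasing the lazily created resources in a finally block; method names other than shutdown() are illustrative.

```python
class BasePPOExp:
    def __init__(self):
        # HTTP inference resources, created lazily when the flag is on.
        self._server_group = None
        self._inference_router = None

    def _teardown_inference(self):
        # Router first, so no new requests reach servers being shut down.
        for resource in (self._inference_router, self._server_group):
            if resource is not None:
                resource.shutdown()
        self._inference_router = None
        self._server_group = None

    def run(self):
        try:
            pass  # training loop would go here
        finally:
            # Runs even if training raises, avoiding leaked Ray actors.
            self._teardown_inference()
```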

Comment on lines +501 to +504
body = {"level": level}
if tags:
body["tags"] = tags
return await self._call_all_servers("/sleep", body)

Severity: medium

The sleep method checks for the tags parameter with if tags:, which evaluates to False for an empty list []. This is inconsistent with the wake_up method, which would correctly handle an empty list. This can lead to unexpected behavior if an empty list of tags is intentionally passed. To ensure consistent and predictable behavior, it's better to check for tags is not None.

Suggested change

Before:
    body = {"level": level}
    if tags:
        body["tags"] = tags
    return await self._call_all_servers("/sleep", body)

After:
    body = {"level": level}
    if tags is not None:
        body["tags"] = tags
    return await self._call_all_servers("/sleep", body)
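
A standalone illustration of the pitfall: `if tags:` is False for an empty list, so an explicitly passed [] is silently dropped from the request body, while `is not None` preserves it.

```python
def body_truthy(tags=None):
    body = {"level": 1}
    if tags:  # falsy for [] as well as None
        body["tags"] = tags
    return body

def body_explicit(tags=None):
    body = {"level": 1}
    if tags is not None:  # only skips the key when tags was omitted
        body["tags"] = tags
    return body

assert body_truthy([]) == {"level": 1}                # [] dropped
assert body_explicit([]) == {"level": 1, "tags": []}  # [] preserved
```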

Member

@CharlieFRuan CharlieFRuan left a comment


Thank you! Left relatively minor nit comments, mainly on the feature rollout's naming

Address PR review feedback to avoid confusion with existing HTTP endpoint feature:
- Rename env var from _SKYRL_USE_HTTP_INFERENCE to _SKYRL_USE_NEW_INFERENCE
- Rename _get_http_inference_client() to _get_new_inference_client()
- Rename _validate_http_inference_cfg() to _validate_new_inference_cfg()
- Reorganize config sections to separate weight sync from new inference config
- Clean up conditional logic in broadcast_strategy.py for clearer code paths

Co-authored-by: Cursor <cursoragent@cursor.com>
Member

@CharlieFRuan CharlieFRuan left a comment


Thank you so much! Looking forward to feature parity!

@CharlieFRuan CharlieFRuan merged commit 52c9085 into NovaSky-AI:main Feb 2, 2026
3 of 4 checks passed
3 participants