
[skyrl-train][inference] HTTP Inference Integration (Feature-Flagged) 4/N#931

Merged
CharlieFRuan merged 34 commits into NovaSky-AI:main from kouroshHakha:kh/inference-3
Feb 2, 2026
Conversation

@kouroshHakha
Collaborator

@kouroshHakha kouroshHakha commented Jan 24, 2026

Summary: Integrates the new inference layer with training code behind a private feature flag _SKYRL_USE_NEW_INFERENCE. When enabled, uses RemoteInferenceClient + ServerGroup + InferenceRouter instead of the legacy Ray actor-based inference. Both code paths remain fully functional; the flag allows gradual rollout and validation.

Key Changes:

  1. Feature Flag (env_vars.py):

    • _SKYRL_USE_NEW_INFERENCE env var (default: 0 = legacy path)
  2. New Config Options (ppo_base_config.yaml):

    • generator.external_proxy_url - External data plane URL (optional)
    • generator.external_server_urls - External control plane URLs (optional)
    • Reorganized config sections to separate weight sync from new inference config
  3. Config Validation (utils.py):

    • _validate_new_inference_cfg() validates config combinations
    • Colocated + external endpoints → Error (must use driver-managed servers)
    • Non-colocated routing logic for various external/internal combinations
  4. Updated get_inference_client() (main_base.py):

    • When flag enabled: Build VLLMServerGroup + InferenceRouter + RemoteInferenceClient
    • When flag disabled: Use legacy InferenceEngineClient (existing behavior)
    • Renamed _get_http_inference_client() → _get_new_inference_client()
  5. Weight Sync Integration (worker.py, broadcast_strategy.py, transfer_strategy.py, cuda_ipc_strategy.py):

    • worker.py fetches inference_world_size from client.get_world_size() for new inference path
    • create_init_info() accepts optional inference_world_size parameter
    • Clean separation of code paths with if _SKYRL_USE_NEW_INFERENCE: / else: pattern
  6. API Compatibility (remote_inference_client.py):

    • Renamed init_weight_transfer → init_weight_update_communicator
    • Renamed update_weights → update_named_weights
    • Added tags parameter to sleep()/wake_up() for colocation
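
The flag-gated dispatch in item 4 can be sketched as follows. This is a minimal, self-contained illustration: the two client classes are stand-ins, and only the env var name `_SKYRL_USE_NEW_INFERENCE` comes from the PR.

```python
import os

class RemoteInferenceClient:
    kind = "new"  # HTTP path: ServerGroup + InferenceRouter behind it

class InferenceEngineClient:
    kind = "legacy"  # Ray actor-based path (existing behavior)

def get_inference_client():
    # Default "0" keeps the legacy path, matching the gradual-rollout plan.
    if os.environ.get("_SKYRL_USE_NEW_INFERENCE", "0") == "1":
        return RemoteInferenceClient()
    return InferenceEngineClient()
```

Because both paths sit behind one factory, flipping the env var switches implementations without touching call sites.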

Files Changed:

File Change
env_vars.py Add _SKYRL_USE_NEW_INFERENCE feature flag
ppo_base_config.yaml Add external_proxy_url, external_server_urls; reorganize sections
utils.py Add _validate_new_inference_cfg() with routing logic
main_base.py Update get_inference_client() with _get_new_inference_client()
remote_inference_client.py Rename methods for API compatibility
worker.py Fetch inference_world_size from client for new inference path
broadcast_strategy.py Accept inference_world_size parameter, clean conditional logic
transfer_strategy.py Update base class signature
cuda_ipc_strategy.py Accept (and ignore) inference_world_size parameter

Testing:

# Legacy path (default)
pytest tests/

# New HTTP path
_SKYRL_USE_NEW_INFERENCE=1 pytest tests/

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Phase 2b-1: Refactors inference client creation into an overridable hook.

Changes:
- Add get_inference_client() -> InferenceEngineInterface hook in BasePPOExp
- Update _setup_trainer() to use the new hook
- Refactor DAPOExp to override get_inference_client() instead of duplicating _setup_trainer()
- Update EvalOnlyEntrypoint.run() to use the hook
- Update TerminalBenchGenerateExp._setup_generator() to use the hook
- Move strategy validation for FlashRL to main() for early failure
- Fix bug: add missing tokenizer arg in DAPOExp remote engines path

This refactor eliminates code duplication and prepares for future
RemoteInferenceClient integration (Phase 2b-2).
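
The overridable-hook pattern this refactor introduces can be sketched as below: subclasses override one factory method instead of duplicating _setup_trainer(). The client values are illustrative stand-ins.

```python
class BasePPOExp:
    def get_inference_client(self):
        # Single extension point for inference client creation.
        return "default-client"

    def _setup_trainer(self):
        # Shared setup path consults the hook, so subclasses like DAPOExp
        # no longer need their own copy of this method.
        return {"inference_client": self.get_inference_client()}

class DAPOExp(BasePPOExp):
    def get_inference_client(self):
        return "dapo-client"
```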
@kouroshHakha kouroshHakha marked this pull request as ready for review January 31, 2026 19:50
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively integrates an HTTP-based inference layer behind a feature flag, providing a clear path for gradual rollout. The changes are well-structured, touching configuration, the main entrypoint, client-side logic, and weight synchronization strategies. The addition of config validation for the new HTTP inference path is a great touch. I've identified a potential resource leak regarding the new server group and router resources that are not being torn down, and a minor inconsistency in parameter handling in the remote client. Overall, this is a solid contribution towards modernizing the inference architecture.

Comment on lines +122 to +124
# HTTP inference resources (created lazily when _SKYRL_USE_HTTP_INFERENCE=1)
self._server_group = None
self._inference_router = None

Severity: high

The _server_group and _inference_router are initialized here and created in _get_http_inference_client, but there doesn't appear to be corresponding teardown logic to shut them down. This can lead to resource leaks, especially since they manage Ray actors and other resources. Consider adding a teardown method to BasePPOExp that calls shutdown() on these objects, and ensure it's called reliably (e.g., in a finally block within the run method).
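
One possible shape for the requested teardown, releasing the lazily created resources in a finally block; method names other than shutdown() are illustrative.

```python
class BasePPOExp:
    def __init__(self):
        # HTTP inference resources, created lazily when the flag is on.
        self._server_group = None
        self._inference_router = None

    def _teardown_inference(self):
        # Router first, so no new requests reach servers being shut down.
        for resource in (self._inference_router, self._server_group):
            if resource is not None:
                resource.shutdown()
        self._inference_router = None
        self._server_group = None

    def run(self):
        try:
            pass  # training loop would go here
        finally:
            # Runs even if training raises, avoiding leaked Ray actors.
            self._teardown_inference()
```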

Comment on lines +501 to +504
body = {"level": level}
if tags:
body["tags"] = tags
return await self._call_all_servers("/sleep", body)

Severity: medium

The sleep method checks for the tags parameter with if tags:, which evaluates to False for an empty list []. This is inconsistent with the wake_up method, which would correctly handle an empty list. This can lead to unexpected behavior if an empty list of tags is intentionally passed. To ensure consistent and predictable behavior, it's better to check for tags is not None.

Suggested change

Before:
    body = {"level": level}
    if tags:
        body["tags"] = tags
    return await self._call_all_servers("/sleep", body)

After:
    body = {"level": level}
    if tags is not None:
        body["tags"] = tags
    return await self._call_all_servers("/sleep", body)
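
A standalone illustration of the pitfall: `if tags:` is False for an empty list, so an explicitly passed [] is silently dropped from the request body, while `is not None` preserves it.

```python
def body_truthy(tags=None):
    body = {"level": 1}
    if tags:  # falsy for [] as well as None
        body["tags"] = tags
    return body

def body_explicit(tags=None):
    body = {"level": 1}
    if tags is not None:  # only skips the key when tags was omitted
        body["tags"] = tags
    return body

assert body_truthy([]) == {"level": 1}                # [] dropped
assert body_explicit([]) == {"level": 1, "tags": []}  # [] preserved
```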

Member

@CharlieFRuan CharlieFRuan left a comment


Thank you! Left relatively minor nit comments, mainly on the feature rollout's naming

Address PR review feedback to avoid confusion with existing HTTP endpoint feature:
- Rename env var from _SKYRL_USE_HTTP_INFERENCE to _SKYRL_USE_NEW_INFERENCE
- Rename _get_http_inference_client() to _get_new_inference_client()
- Rename _validate_http_inference_cfg() to _validate_new_inference_cfg()
- Reorganize config sections to separate weight sync from new inference config
- Clean up conditional logic in broadcast_strategy.py for clearer code paths

Co-authored-by: Cursor <cursoragent@cursor.com>
Member

@CharlieFRuan CharlieFRuan left a comment


Thank you so much! Looking forward to feature parity!

@CharlieFRuan CharlieFRuan merged commit 52c9085 into NovaSky-AI:main Feb 2, 2026
3 of 4 checks passed
3 participants