[skyrl-train][inference] HTTP Inference Integration (Feature-Flagged) 4/N #931
CharlieFRuan merged 34 commits into NovaSky-AI:main
Conversation
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Phase 2b-1: Refactors inference client creation into an overridable hook.

Changes:
- Add `get_inference_client() -> InferenceEngineInterface` hook in `BasePPOExp`
- Update `_setup_trainer()` to use the new hook
- Refactor `DAPOExp` to override `get_inference_client()` instead of duplicating `_setup_trainer()`
- Update `EvalOnlyEntrypoint.run()` to use the hook
- Update `TerminalBenchGenerateExp._setup_generator()` to use the hook
- Move strategy validation for FlashRL to `main()` for early failure
- Fix bug: add missing `tokenizer` arg in `DAPOExp` remote engines path

This refactor eliminates code duplication and prepares for future `RemoteInferenceClient` integration (Phase 2b-2).
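A minimal sketch of the hook pattern this commit describes. The class and method names come from the commit message; the method bodies and return values are illustrative stand-ins, not the real implementation:

```python
class BasePPOExp:
    def get_inference_client(self):
        # Overridable hook: subclasses swap in their own client here
        # instead of re-implementing all of _setup_trainer().
        return "inference-engine-client"  # stand-in for InferenceEngineClient

    def _setup_trainer(self):
        # The trainer obtains the client via the hook, so every
        # subclass shares this setup path.
        client = self.get_inference_client()
        return {"inference_client": client}


class DAPOExp(BasePPOExp):
    def get_inference_client(self):
        # Override only the hook; _setup_trainer() is inherited unchanged.
        return "dapo-inference-client"
```

Overriding the narrow hook rather than the whole setup method is what eliminates the duplication the commit message mentions.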
Code Review
This pull request effectively integrates an HTTP-based inference layer behind a feature flag, providing a clear path for gradual rollout. The changes are well-structured, touching configuration, the main entrypoint, client-side logic, and weight synchronization strategies. The addition of config validation for the new HTTP inference path is a great touch. I've identified a potential resource leak regarding the new server group and router resources that are not being torn down, and a minor inconsistency in parameter handling in the remote client. Overall, this is a solid contribution towards modernizing the inference architecture.
```python
# HTTP inference resources (created lazily when _SKYRL_USE_HTTP_INFERENCE=1)
self._server_group = None
self._inference_router = None
```
The _server_group and _inference_router are initialized here and created in _get_http_inference_client, but there doesn't appear to be corresponding teardown logic to shut them down. This can lead to resource leaks, especially since they manage Ray actors and other resources. Consider adding a teardown method to BasePPOExp that calls shutdown() on these objects, and ensure it's called reliably (e.g., in a finally block within the run method).
```python
body = {"level": level}
if tags:
    body["tags"] = tags
return await self._call_all_servers("/sleep", body)
```
The `sleep` method checks the `tags` parameter with `if tags:`, which evaluates to `False` for an empty list `[]`. This is inconsistent with the `wake_up` method, which would correctly handle an empty list. This can lead to unexpected behavior if an empty list of tags is intentionally passed. To ensure consistent and predictable behavior, it's better to check `tags is not None`.
```python
# Before
body = {"level": level}
if tags:
    body["tags"] = tags
return await self._call_all_servers("/sleep", body)

# After (suggested)
body = {"level": level}
if tags is not None:
    body["tags"] = tags
return await self._call_all_servers("/sleep", body)
```
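The behavioral difference is easy to demonstrate in isolation. In this sketch the network call is dropped and only the body construction is kept; the helper names are hypothetical:

```python
def sleep_body_truthy(level, tags=None):
    # Original check: an empty list is silently dropped from the body.
    body = {"level": level}
    if tags:
        body["tags"] = tags
    return body


def sleep_body_none_check(level, tags=None):
    # Suggested check: only None is dropped; an empty list is preserved,
    # matching how wake_up treats the parameter.
    body = {"level": level}
    if tags is not None:
        body["tags"] = tags
    return body
```

With `tags=[]`, the first helper omits the key entirely while the second sends `"tags": []`, which is the inconsistency the review points out.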
CharlieFRuan left a comment:
Thank you! I left some relatively minor nit comments, mainly on the feature rollout's naming.
Address PR review feedback to avoid confusion with the existing HTTP endpoint feature:
- Rename env var from `_SKYRL_USE_HTTP_INFERENCE` to `_SKYRL_USE_NEW_INFERENCE`
- Rename `_get_http_inference_client()` to `_get_new_inference_client()`
- Rename `_validate_http_inference_cfg()` to `_validate_new_inference_cfg()`
- Reorganize config sections to separate weight sync from new inference config
- Clean up conditional logic in `broadcast_strategy.py` for clearer code paths

Co-authored-by: Cursor <cursoragent@cursor.com>
CharlieFRuan left a comment:
Thank you so much! Looking forward to feature parity!
Summary: Integrates the new inference layer with training code behind a private feature flag `_SKYRL_USE_NEW_INFERENCE`. When enabled, uses `RemoteInferenceClient` + `ServerGroup` + `InferenceRouter` instead of the legacy Ray actor-based inference. Both code paths remain fully functional; the flag allows gradual rollout and validation.

Key Changes:

Feature Flag (`env_vars.py`):
- `_SKYRL_USE_NEW_INFERENCE` env var (default: `0` = legacy path)

New Config Options (`ppo_base_config.yaml`):
- `generator.external_proxy_url` - External data plane URL (optional)
- `generator.external_server_urls` - External control plane URLs (optional)

Config Validation (`utils.py`):
- `_validate_new_inference_cfg()` validates config combinations

Updated `get_inference_client()` (`main_base.py`):
- New path: `VLLMServerGroup` + `InferenceRouter` + `RemoteInferenceClient`
- Legacy path: `InferenceEngineClient` (existing behavior)
- Renamed `_get_http_inference_client()` → `_get_new_inference_client()`

Weight Sync Integration (`worker.py`, `broadcast_strategy.py`, `transfer_strategy.py`, `cuda_ipc_strategy.py`):
- `worker.py` fetches `inference_world_size` from `client.get_world_size()` for the new inference path
- `create_init_info()` accepts optional `inference_world_size` parameter
- `if _SKYRL_USE_NEW_INFERENCE:` / `else:` pattern

API Compatibility (`remote_inference_client.py`):
- `init_weight_transfer` → `init_weight_update_communicator`
- `update_weights` → `update_named_weights`
- `tags` parameter added to `sleep()` / `wake_up()` for colocation

Files Changed:
- `env_vars.py`: `_SKYRL_USE_NEW_INFERENCE` feature flag
- `ppo_base_config.yaml`: `external_proxy_url`, `external_server_urls`; reorganize sections
- `utils.py`: `_validate_new_inference_cfg()` with routing logic
- `main_base.py`: `get_inference_client()` with `_get_new_inference_client()`
- `remote_inference_client.py`
- `worker.py`: `inference_world_size` from client for new inference path
- `broadcast_strategy.py`: `inference_world_size` parameter, clean conditional logic
- `transfer_strategy.py`
- `cuda_ipc_strategy.py`: `inference_world_size` parameter

Testing:
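The flag-gated client selection from Key Changes can be sketched as follows. The env-var name and the default of `0` come from this PR; the two client constructors are stand-ins for the real `RemoteInferenceClient` and `InferenceEngineClient` paths:

```python
import os


def _get_new_inference_client():
    # Stand-in for the VLLMServerGroup + InferenceRouter +
    # RemoteInferenceClient path enabled by the flag.
    return "remote-inference-client"


def _get_legacy_inference_client():
    # Stand-in for the existing Ray actor-based InferenceEngineClient path.
    return "inference-engine-client"


def get_inference_client():
    # Default "0" keeps the legacy path, so existing users are
    # unaffected; setting the flag to "1" opts into the new layer.
    if os.environ.get("_SKYRL_USE_NEW_INFERENCE", "0") == "1":
        return _get_new_inference_client()
    return _get_legacy_inference_client()
```

Keeping both branches behind one dispatch point is what lets the PR roll the new layer out gradually and fall back without code changes.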