Skip to content

Refactor train/play and create uv run workflow without dedicated virtual environments#5623

Merged
kellyguo11 merged 8 commits into
isaac-sim:developfrom
StafaH:mh/uv_train
May 15, 2026
Merged

Refactor train/play and create uv run workflow without dedicated virtual environments#5623
kellyguo11 merged 8 commits into
isaac-sim:developfrom
StafaH:mh/uv_train

Conversation

@StafaH

@StafaH StafaH commented May 15, 2026

Copy link
Copy Markdown
Contributor

Description

This PR refactors the reinforcement learning train/play scripts into unified entry points while preserving the existing library folder structure and adding a lightweight uv workflow for fresh source checkouts.

The main changes are:

  • Added unified RL entrypoints:
    • scripts/reinforcement_learning/train.py --library <library>
    • scripts/reinforcement_learning/play.py --library <library>
  • Added library-specific implementation files under the existing library folders, for example:
    • scripts/reinforcement_learning/rsl_rl/train_rsl_rl.py
    • scripts/reinforcement_learning/rsl_rl/play_rsl_rl.py
  • Kept the old per-library train.py and play.py scripts intact, with deprecation warnings and migration examples.
  • Added shared RL entrypoint utilities in scripts/reinforcement_learning/common.py.
  • Added direct Isaac Lab CLI commands:
    • ./isaaclab.sh train --library <library> ...
    • ./isaaclab.sh play --library <library> ...
  • Kept bare script aliases for ./isaaclab.sh -p train.py ... and ./isaaclab.sh -p play.py ....
  • Added Python package entry points so installed environments can run:
    • train --library <library> ...
    • play --library <library> ...
  • Added a root source-checkout pyproject.toml project so a fresh clone can run kitless Newton training with:
    • uv run train --library rsl_rl --task Isaac-Cartpole-Direct-v0 presets=newton_mjwarp --num_envs 4096
  • Pinned the source-checkout uv environment to the same PyTorch family used by the Isaac Lab installer to avoid CUDA stack churn when users switch between uv run and ./isaaclab.sh.
  • Updated docs, tests, tools, and pretrained checkpoint helpers to use the unified train/play entrypoints.
  • Fixed CLI Python discovery so ./isaaclab.sh prefers an active or repo-local virtual environment before falling back to system Python.
  • Fixed ./isaaclab.sh --install in uv-created virtual environments that do not include the pip module by using uv pip for venv-targeted pip operations.

Motivation:

The previous RL scripts duplicated substantial train/play setup logic across libraries. This made behavior harder to keep consistent and increased maintenance cost when updating shared functionality. The new structure keeps library-specific logic in each library folder while centralizing shared dispatch and common helpers.

The uv workflow gives users a fast path from a fresh clone to kitless Newton training without manually creating an environment first. Isaac Sim / Kit workflows, including PhysX, continue to use the existing full installation path.

Dependencies:

No new required runtime dependencies are added to Isaac Lab packages. The root source-checkout pyproject.toml describes the local development environment used by uv run.

Fixes # N/A

Type of change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Screenshots

Not applicable.

Validation

Ran:

  • uv lock
  • uv lock --check
  • uv run --frozen train --help
  • ./isaaclab.sh --install
  • ./isaaclab.sh -p -m pytest source/isaaclab/test/cli/test_install.py -q
  • ./isaaclab.sh -p -m pytest source/isaaclab/test/cli/test_install.py::TestGetPipCommand source/isaaclab/test/cli/test_install.py::TestExtractPythonExe -q
  • Help smoke tests for unified train/play entrypoints and ./isaaclab.sh -p train.py --help / ./isaaclab.sh -p play.py --help

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@StafaH StafaH requested a review from ooctipus as a code owner May 15, 2026 02:35
@github-actions github-actions Bot added isaac-lab Related to Isaac Lab team infrastructure labels May 15, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR delivers a well-structured refactoring of the RL train/play scripts into unified entrypoints with library-specific dispatch, while adding a convenient uv run workflow for Newton training. The architecture maintains backward compatibility through deprecation warnings on legacy scripts.

Design Assessment

Architecture: Well-designed

The dispatch pattern in common.py via dispatch_library_entrypoint() cleanly separates concerns:

  • Unified entrypoints (train.py, play.py) handle library selection
  • Library-specific implementations (train_rsl_rl.py, etc.) contain RL-specific logic
  • Shared utilities reduce code duplication across libraries

The pyproject.toml changes for uv support are comprehensive, with proper platform markers for PyTorch CUDA indices.

Findings

🔴 Critical (1)

File Issue
scripts/reinforcement_learning/rsl_rl/train.py:53 RSL-RL version downgrade: Changed from RSL_RL_VERSION = "5.0.1" to "3.0.1". This appears to be a significant regression that conflicts with the new train_rsl_rl.py which also specifies 3.0.1. Please clarify if this is intentional and update both files consistently.

🟡 Medium (2)

File Issue
pyproject.toml:270 Missing newline at EOF: The file ends without a trailing newline, which will cause pre-commit to fail (as seen in CI).
scripts/reinforcement_learning/common.py:57-67 Exception handling in dispatch_library_entrypoint: When run_as_script=True, exceptions from runpy.run_path() propagate without cleanup of modified sys.argv/sys.path. Consider wrapping in try/except to ensure cleanup.

🟢 Minor (2)

File Issue
source/isaaclab/isaaclab/cli/__init__.py:35-36 Missing return codes: train() and play() functions call run_python_command() but don't return its exit code to the caller.
scripts/reinforcement_learning/common.py:168 Docstring accuracy: import_local_module docstring could mention that ImportError is raised on failure to help callers handle errors appropriately.

Test Coverage

The PR includes manual validation steps but lacks automated tests for the new unified entrypoints. Consider adding:

  • Unit tests for dispatch_library_entrypoint() with mocked library modules
  • Integration tests verifying deprecation warnings are emitted from legacy scripts

Existing tests in test_install.py are run and pass, which validates the Python discovery improvements.

CI Status

Check Status
pre-commit ❌ Failing (likely due to missing EOF newline)
Check changelog fragments ❌ Failing
Build Wheel ✅ Passing
Installation Tests ✅ Passing
license-check ⏳ Pending

Verdict

Minor fixes needed

The core refactoring is solid and well-designed. Please address:

  1. The RSL-RL version discrepancy (clarify or fix)
  2. Add trailing newline to pyproject.toml
  3. Add changelog fragment per CI requirements

Once these are resolved, this is ready to ship.


Update (ba80967):
RSL-RL version fixed - Both files now correctly use 5.0.1
Changelog fragment added - CI now passes
⚠️ pre-commit still failing - pyproject.toml EOF newline issue persists

Additional changes in this commit:

  • --rl_library is now required=True (good change - explicit is better than implicit)
  • Added uv_run.rst documentation for the experimental workflow
  • Updated deprecation warning in sb3/train.py to reflect new command format

Remaining: Fix the trailing newline in pyproject.toml to pass pre-commit.


Update (ac13a40):
✅ Documentation improvement: TOC entry now shows "uv run (experimental)" with clearer display name

⚠️ pyproject.toml EOF newline still not fixed - pre-commit CI continues to fail. This is the last remaining blocker.


Update (9188599):
EOF newline fixed - pyproject.toml now has proper trailing newline. Pre-commit should pass now.

🎉 All issues addressed! This PR is ready to merge.


Update (ca0a704):
New commits add release preparation changes and significant improvements to the LEAPP export flow:

Changes reviewed:

  • LEAPP export.py refactored - The export script now exposes a modular API (parse_export_args(), export_rsl_rl_agent(), run_export_with_hydra(), main_cli()) instead of module-level initialization. This is a well-designed change that:
    • Enables test batching (multiple exports per Kit process)
    • Defers runtime imports until after Isaac Sim launches
    • Improves cleanup with proper try/finally blocks and contextlib.suppress()
  • Test improvements - test_rsl_rl_export_flow.py now batches tasks to reduce CI runtime; pure utility tests (test_noise.py, test_wrench_composer.py) no longer require Isaac Sim
  • Import path fixes - Changed from isaaclab.utils import configclass to from isaaclab.utils.configclass import configclass in physics configs (avoids lazy-load issues)
  • CI action enhancement - Added LEAPP package resolution logging and unset HUB__ARGS__DETECT_ONLY for OmniHub reliability
  • Changelog consolidation - Merged changelog.d fragments into CHANGELOG.rst with version bumps (isaaclab 5.2.1, isaaclab_newton 0.9.1, isaaclab_physx 0.7.1)

No new issues found. This is solid release/CI preparation work.

@greptile-apps

greptile-apps Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR refactors the reinforcement learning train/play scripts into unified entrypoints (scripts/reinforcement_learning/train.py and play.py) that dispatch to library-specific implementations, adds a uv run source-checkout workflow via a new root pyproject.toml, and improves isaaclab.sh Python discovery and pip command handling.

  • Unified dispatch — new common.py helpers route --rl_library <lib> to per-library train_<lib>.py / play_<lib>.py files; old per-library train.py/play.py emit DeprecationWarning and remain functional.
  • uv workflow — root pyproject.toml gains a [project] table with pinned PyTorch wheels indexed by CUDA version and platform, enabling uv run train from a fresh clone.
  • CLI improvementsextract_python_exe now prefers an active or repo-local venv before falling back to kit Python; get_pip_command sets UV_PYTHON for venv-targeted uv pip installs.

Confidence Score: 3/5

The refactoring is well-structured overall, but two defects in the new canonical files will break real workflows before the fix.

The train_rl_games.py file imports distutils.util.strtobool, which is absent from Python 3.12+ — the minimum version this project targets — so any attempt to train with RL-Games via the new unified entrypoint will fail immediately on import. Separately, rsl_rl/train.py has its minimum RSL-RL version silently rolled back from 5.0.1 to 3.0.1 as a side effect of the refactor, bypassing a version guard that was there for a reason. The rest of the PR — the dispatch mechanism, uv workflow, CLI Python discovery, and pip improvements — looks solid and ready.

scripts/reinforcement_learning/rl_games/train_rl_games.py (distutils import) and scripts/reinforcement_learning/rsl_rl/train.py (RSL_RL_VERSION change) need attention before merge.

Important Files Changed

Filename Overview
scripts/reinforcement_learning/rl_games/train_rl_games.py New canonical RL-Games training implementation; imports distutils.util.strtobool which is removed in Python 3.12+, causing an immediate ImportError.
scripts/reinforcement_learning/rsl_rl/train.py Deprecated wrapper for RSL-RL training; inadvertently downgrades RSL_RL_VERSION from 5.0.1 to 3.0.1 beyond the stated change of adding only a deprecation warning.
scripts/reinforcement_learning/common.py New shared dispatch utilities; dead code in dispatch_library_entrypoint (the rl_library is None help branch never fires due to default="rsl_rl").
scripts/reinforcement_learning/sb3/train.py Deprecated SB3 training wrapper; deprecation message uses the old verbose form instead of the short ./isaaclab.sh train form used by all other libraries.
source/isaaclab/isaaclab/cli/utils.py Improves Python discovery and updates get_pip_command to detect venv Python for uv pip even when not activated; logic looks correct.
source/isaaclab/isaaclab/cli/init.py Adds train/play console-script entry points and CLI shortcuts; early dispatch logic is clean and correct.
pyproject.toml Adds [project] table for the uv run source-checkout workflow with pinned PyTorch family and per-platform CUDA index routing; looks well-structured.
source/isaaclab/setup.py Adds train, play, and isaaclab console scripts as package entry points; straightforward and correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["./isaaclab.sh train or 'train' console script"] --> B["cli/__init__.py train()"]
    B --> C["run_python_command(train.py, args)"]
    C --> D["scripts/reinforcement_learning/train.py main()"]
    D --> E["dispatch_library_entrypoint() parse --rl_library"]
    E -->|run_as_script=False| F["import_local_module() train_lib.py"]
    F --> G["module.run(library_args)"]
    A2["./isaaclab.sh play or 'play' console script"] --> B2["cli/__init__.py play()"]
    B2 --> C2["run_python_command(play.py, args)"]
    C2 --> D2["scripts/reinforcement_learning/play.py main()"]
    D2 --> E2["dispatch_library_entrypoint() parse --rl_library"]
    E2 -->|run_as_script=True| F2["runpy.run_path() play_lib.py"]
Loading

Comments Outside Diff (1)

  1. scripts/reinforcement_learning/rl_games/train_rl_games.py, line 750 (link)

    P1 Python 3.12 incompatibility: distutils removed

    distutils was removed from the standard library in Python 3.12 (PEP 594), yet pyproject.toml declares requires-python = ">=3.12". Loading this module will immediately raise ModuleNotFoundError: No module named 'distutils'. A suitable drop-in is a small inline helper — for example type=lambda x: x.lower() in ("1", "true", "yes") — or setuptools' re-export of distutils if setuptools is guaranteed to be present.

Reviews (1): Last reviewed commit: "Add train play entrypoints" | Re-trigger Greptile

import isaaclab_tasks_experimental # noqa: F401

RSL_RL_VERSION = "5.0.1"
RSL_RL_VERSION = "3.0.1"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Unintended version downgrade in deprecated script

The PR description says deprecated scripts are kept intact (only a deprecation warning is added), but RSL_RL_VERSION is also changed here from 5.0.1 down to 3.0.1. Users who still invoke this deprecated script will no longer receive a version check error when running an incompatible RSL-RL version.

Suggested change
RSL_RL_VERSION = "3.0.1"
RSL_RL_VERSION = "5.0.1"

Comment on lines +204 to +213

def configure_io_descriptors(env_cfg: Any, args_cli: argparse.Namespace, logger: logging.Logger) -> None:
"""Configure IO descriptor export on supported environment configs.

Args:
env_cfg: Isaac Lab environment config.
args_cli: Parsed command-line arguments.
logger: Logger used for unsupported environment warnings.
"""
if isinstance(env_cfg, ManagerBasedRLEnvCfg):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dead code: args_cli.rl_library is None can never be true

--rl_library is declared with default="rsl_rl", so parse_known_args always produces a non-None value and the if args_cli.rl_library is None branch is unreachable. A user who runs ./isaaclab.sh train without --rl_library silently falls through to rsl_rl instead of seeing the unified help page. Removing the default (or using default=None) would let the help branch trigger.

Comment on lines +11 to +17
warnings.warn(
"scripts/reinforcement_learning/sb3/train.py is deprecated. Use "
"`./isaaclab.sh -p scripts/reinforcement_learning/train.py --rl_library sb3 --task <TASK>` instead. "
"Example: `./isaaclab.sh -p scripts/reinforcement_learning/train.py --rl_library sb3 --task Isaac-Cartpole-v0`.",
DeprecationWarning,
stacklevel=1,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inconsistent deprecation message format

All other per-library train.py deprecation warnings direct users to ./isaaclab.sh train --rl_library <lib>, but this one references the older form ./isaaclab.sh -p scripts/reinforcement_learning/train.py --rl_library sb3. This is confusing for users migrating from sb3.

Suggested change
warnings.warn(
"scripts/reinforcement_learning/sb3/train.py is deprecated. Use "
"`./isaaclab.sh -p scripts/reinforcement_learning/train.py --rl_library sb3 --task <TASK>` instead. "
"Example: `./isaaclab.sh -p scripts/reinforcement_learning/train.py --rl_library sb3 --task Isaac-Cartpole-v0`.",
DeprecationWarning,
stacklevel=1,
)
warnings.warn(
"scripts/reinforcement_learning/sb3/train.py is deprecated. Use "
"`./isaaclab.sh train --rl_library sb3 --task <TASK>` instead. "
"Example: `./isaaclab.sh train --rl_library sb3 --task Isaac-Cartpole-v0`.",
DeprecationWarning,
stacklevel=1,
)

@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 15, 2026
@kellyguo11 kellyguo11 moved this to In review in Isaac Lab May 15, 2026
@kellyguo11 kellyguo11 moved this from In review to Ready to merge in Isaac Lab May 15, 2026
@kellyguo11 kellyguo11 merged commit 7015be2 into isaac-sim:develop May 15, 2026
33 of 34 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to merge to Done in Isaac Lab May 15, 2026
hujc7 added a commit to hujc7/IsaacLab that referenced this pull request May 21, 2026
The typed preset selectors (physics=NAME, renderer=NAME, presets=NAME) from
isaac-sim#5587 fail on the unified `scripts/reinforcement_learning/train.py` dispatcher
with `Key 'physics' is not in struct`, because the new
`train_<library>.py` / `play_<library>.py` scripts introduced by isaac-sim#5623
forward the argparse remainder straight to Hydra without folding the typed
tokens into the canonical `presets=<csv>` form. They also don't register the
preset-selection help text, so `--help` output doesn't list the syntax.

isaac-sim#5623 landed at 2026-05-15 13:03 PDT, isaac-sim#5587 at 14:31 PDT the same day.
isaac-sim#5587's diff applied `setup_preset_cli` + `fold_preset_tokens` to the older
`*/train.py` / `*/play.py` scripts that existed when the PR was opened;
the post-isaac-sim#5623 rebase pulled in the new entrypoint files but didn't apply
the same integration because they were not in the PR's diff context.

Apply the canonical pattern from `isaaclab_tasks.utils.preset_cli` to the 4
unified train scripts and the 4 unified play scripts. `setup_preset_cli`
registers the preset-selection help text on each library's parser and runs
`parse_known_args`; `fold_preset_tokens` rewrites the typed selectors into
the canonical form post-argparse. Both help-text discovery and the fold
work whether invoked through the unified dispatcher or directly.

`train_rlinf.py` / `play_rlinf.py` are out of scope -- they manage their own
GlobalHydra instance and have never been integrated with the typed-preset
CLI.

Verified locally on develop:

  python scripts/reinforcement_learning/train.py --rl_library rsl_rl \
      --task=Isaac-Ant-v0 physics=newton_mjwarp --headless

reproduces the "Key 'physics' is not in struct" error before the fix and
launches Newton training successfully after. Same command via direct
invocation (`python scripts/reinforcement_learning/rsl_rl/play_rsl_rl.py
... physics=newton_mjwarp ...`) and `--help` discoverability both verified.
kellyguo11 pushed a commit that referenced this pull request May 22, 2026
…y scripts (#5715) (#5724)

Cherry-pick of #5715 to `release/3.0.0-beta2`.

## 1. Summary

- Typed preset selectors (`physics=NAME`, `renderer=NAME`,
`presets=NAME`) from #5587 silently fail on the unified
`scripts/reinforcement_learning/train.py` dispatcher with `Key 'physics'
is not in struct`.
- Root cause is merge-ordering between #5587 (preset CLI, 2026-05-15
14:31 PDT) and #5623 (unified train/play refactor, 2026-05-15 13:03
PDT): the new `train_<library>.py` / `play_<library>.py` files were
authored without the `setup_preset_cli` + `fold_preset_tokens`
integration that #5587 had applied to the older scripts.
- Wires the integration into the 4 unified train scripts and the 4
unified play scripts. Adds a regression test
(`scripts/reinforcement_learning/test/test_typed_preset_cli_train_play.py`)
that drives `physics=does_not_exist` through the unified dispatcher for
all 8 entrypoints and asserts the Hydra struct error is absent and the
resolver's `Unknown preset(s)` error is present.

## 2. Cherry-picked commits

| Commit on `develop` | Cherry-pick |
|---|---|
| `4e9ebc75595` – Wire preset CLI into unified train/play entrypoints |
`debaf15404b` |
| `0dc06846b18` – Add subprocess test that typed preset selectors reach
the resolver | `c239fbf2e27` |

No conflicts; clean cherry-pick.

## 3. Test plan

- [x] Repro confirmed before the fix.
- [x] All 4 train scripts smoke-tested with `physics=newton_mjwarp` on
Isaac-Ant-v0 — each reaches Newton sim init (no Hydra struct error).
Both invocation paths (dispatcher + direct) verified.
- [x] All 4 play scripts smoke-tested — typed selector resolves;
downstream failure is the expected fake-checkpoint `FileNotFoundError`.
- [x] `--help` output includes `physics=NAME` / `renderer=NAME` /
`presets=NAME[,NAME,...]` with task variant lists.
- [x] Invalid value (`physics=does_not_exist`) produces a clear `Unknown
preset(s)` error from the resolver.
- [x]
`scripts/reinforcement_learning/test/test_typed_preset_cli_train_play.py`
— 8 parametrized cases pass in ~20 s on CPU.
- [x] Pre-commit clean.
- [ ] CI green on `Docker + Tests`.

## 4. Out of scope

`train_rlinf.py` / `play_rlinf.py` — manage their own `GlobalHydra`
instance, never integrated with the typed preset CLI. Adding support is
a feature, not a regression fix.

## 5. Related

- Develop PR: #5715
- Source PR for the typed preset CLI: #5587
- Source PR for the unified train/play refactor: #5623
@StafaH StafaH deleted the mh/uv_train branch May 28, 2026 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation infrastructure isaac-lab Related to Isaac Lab team

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants