refactor: update TypedDict to BaseModel (MasterConfig & ClippedPGLossConfig) #2325

Merged: yuki-97 merged 20 commits into main from yukih/dataclass on May 14, 2026

yuki-97 commented Apr 23, 2026 (Contributor)

Related issues

  1. Use dataclass instead of TypedDict for config #1675
  2. Use dataclasses instead of TypedDict to handle defaults (but not #2102

Summary

This PR refactors MasterConfig (in all algorithm entry points: grpo.py, sft.py, dpo.py, rm.py, distillation.py, eval.py) and ClippedPGLossConfig from TypedDict to pydantic.BaseModel.

This enables four capabilities when config fields are overridden via Hydra/OmegaConf:

  1. Default values — Fields omitted from the YAML use their Python-defined defaults.
    # disable_ppo_ratio is defined with `= False` in ClippedPGLossConfig
    # but absent from examples/configs/grpo_math_1B.yaml — resolves to False
    loss_config.disable_ppo_ratio  # False
  2. Custom/extra fields — Ad-hoc fields can be injected from the command line without modifying the config class.
    uv run python examples/run_grpo.py ++loss_fn.custom=1
    # cfg.custom is now accessible inside ClippedPGLossFn.__init__
  3. Type validation — Pydantic validates field types at runtime.
    uv run python examples/run_grpo.py ++loss_fn.disable_ppo_ratio=1.2
    # Raises a validation error: "Input should be a valid boolean [type=bool_type, input_value=1.2, input_type=float]"
  4. Go-to-definition — IDE navigation works on attribute access. Clicking on config.disable_ppo_ratio jumps directly to the field definition in ClippedPGLossConfig, whereas the previous dict-style access (config["disable_ppo_ratio"]) was opaque to static analysis tools.
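
The four behaviors listed above can be reproduced in isolation with a small sketch. It assumes pydantic v2; `LossConfigSketch` and its `ratio_clip` field are hypothetical stand-ins (only `disable_ppo_ratio` comes from the PR description), and a plain dict stands in for the resolved Hydra/OmegaConf config.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LossConfigSketch(BaseModel):
    # Hypothetical stand-in for ClippedPGLossConfig, not the real class.
    # extra="allow" keeps fields that are not declared on the model.
    model_config = ConfigDict(extra="allow")

    ratio_clip: float  # invented field, for illustration only
    disable_ppo_ratio: bool = False  # default used when the YAML omits it


# Plain dict standing in for the resolved YAML plus a `++loss_fn.custom=1` override.
raw = {"ratio_clip": 0.2, "custom": 1}
cfg = LossConfigSketch(**raw)

# 1. Default value: the field is absent from `raw`, so the Python default applies.
default_value = cfg.disable_ppo_ratio

# 2. Extra field: `custom` is not declared, but is kept and readable as an attribute.
extra_value = cfg.custom

# 3. Type validation: 1.2 cannot be interpreted as a boolean.
try:
    LossConfigSketch(ratio_clip=0.2, disable_ppo_ratio=1.2)
    validation_failed = False
except ValidationError:
    validation_failed = True

# 4. Attribute access (cfg.disable_ppo_ratio) is what lets an IDE resolve the field.
```

Because extra fields are stored unvalidated, a misspelled field name would also be accepted silently under `extra="allow"`; that is the trade-off for allowing ad-hoc overrides.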

Changes

  • MasterConfig in grpo.py, sft.py, dpo.py, rm.py, distillation.py, eval.py: TypedDict → BaseModel(extra="allow")
  • ClippedPGLossConfig in loss_functions.py: TypedDict → BaseModel(extra="allow")
  • All dict-style access (config["key"]) on these two config types updated to attribute access (config.key)
  • skills/config-conventions/SKILL.md updated to reflect the new BaseModel convention
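
As a rough illustration of what the migration looks like at a call site, the sketch below contrasts dict-style access on a TypedDict with attribute access on a BaseModel. It assumes pydantic v2; the class names, the `loss_fn` field, and the `checkpointing` key are hypothetical and are not the definitions used in the repository.

```python
from typing import TypedDict

from pydantic import BaseModel


# Before: a TypedDict only annotates a plain dict. Nothing is validated or
# defaulted at runtime, and values are read with string keys.
class LossConfigBefore(TypedDict):
    disable_ppo_ratio: bool


# After: a BaseModel that also tolerates undeclared (extra) fields.
class LossConfigAfter(BaseModel, extra="allow"):
    disable_ppo_ratio: bool = False


class MasterConfigAfter(BaseModel, extra="allow"):
    # Hypothetical field name; nested dicts are coerced into the nested model.
    loss_fn: LossConfigAfter


before: LossConfigBefore = {"disable_ppo_ratio": True}
after = MasterConfigAfter(
    loss_fn={"disable_ppo_ratio": True},
    checkpointing={"enabled": False},  # undeclared section, kept as an extra field
)

old_style = before["disable_ppo_ratio"]      # dict-style access
new_style = after.loss_fn.disable_ppo_ratio  # attribute access

# Code that still expects a plain dict can convert back explicitly.
as_dict = after.model_dump()
```

`model_dump()` includes the extra fields, so converting back to a dict does not lose sections that were not declared on the model.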

Results before / after the changes

[Image: results before and after the changes]

copy-pr-bot (Bot) commented Apr 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

yuki-97 added the CI:Lfast label (runs a fast test suite and re-uses the nightly `main` container, but syncs dependencies to the PR's version) on Apr 23, 2026

yuki-97 commented Apr 23, 2026 (Contributor, Author)

/ok to test a2ce5d6

yuki-97 added the CI:L1 label (runs doctests, unit tests, and functional tests) and removed the CI:Lfast label (runs a fast test suite and re-uses the nightly `main` container, but syncs dependencies to the PR's version) on Apr 24, 2026

yuki-97 commented Apr 24, 2026 (Contributor, Author)

/ok to test 88ffb9f

yuki-97 commented May 6, 2026 (Contributor, Author)

/ok to test 73fcbb9

yuki-97 commented May 6, 2026 (Contributor, Author)

/ok to test 3be0da1

yuki-97 commented May 7, 2026 (Contributor, Author)

/ok to test 9ea8a92

yuki-97 added the CI:Lfast label (runs a fast test suite and re-uses the nightly `main` container, but syncs dependencies to the PR's version) and removed the CI:L1 label (runs doctests, unit tests, and functional tests) on May 7, 2026

yuki-97 commented May 7, 2026 (Contributor, Author)

/ok to test f6e2623

yuki-97 commented May 7, 2026 (Contributor, Author)

/ok to test 3f17dce

yuki-97 added 14 commits May 13, 2026 20:43
Signed-off-by: Yuki Huang <yukih@nvidia.com>

copy-pr-bot (Bot) commented May 14, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

@kajalj22 (Contributor)

/ok to test 6fd074f

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 commented May 14, 2026 (Contributor, Author)

/ok to test 92bf84f

Comment thread examples/nemo_gym/run_grpo_nemo_gym.py Outdated
Comment thread nemo_rl/algorithms/grpo.py Outdated
Comment thread nemo_rl/algorithms/rm.py Outdated
Comment thread nemo_rl/algorithms/loss/loss_functions.py
Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 commented May 14, 2026 (Contributor, Author)

/ok to test 8719153

yuki-97 commented May 14, 2026 (Contributor, Author)

Thanks @RayenTian, good catch! Resolved in 8719153. Could you take another look?

Comment thread skills/config-conventions/SKILL.md

@RayenTian (Contributor)

Thanks @yuki-97. LGTM! Approved!

Labels

CI:L1 Run doctests, unit tests, and functional tests

4 participants