fix: run dynamic sampling on unshaped rewards #2478

Merged
terrykong merged 7 commits into main from ashors/fix-dynamic-sampling-reward-shaping
May 21, 2026

Conversation

@ashors1

@ashors1 ashors1 commented May 12, 2026

Contributor

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Issues

closes #2431

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
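
As a rough illustration of the behavior named in the PR title and the linked issue (dynamic sampling should filter on the raw task reward, not the shaped one), here is a minimal, self-contained sketch. It is not the NeMo RL implementation: the function names `dynamic_sampling_mask` and `apply_length_penalty` are hypothetical, and the length penalty is only an assumed stand-in for whatever reward shaping is configured.

```python
from typing import List, Sequence


def dynamic_sampling_mask(rewards: Sequence[float], group_size: int) -> List[bool]:
    """Return one keep/drop flag per sample (hypothetical helper).

    Samples are assumed to be laid out prompt by prompt, with `group_size`
    generations per prompt. A prompt is kept only if its generations do not
    all share the same reward, i.e. the group carries a learning signal.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("rewards must split evenly into groups of group_size")
    mask: List[bool] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        keep = max(group) != min(group)  # non-zero spread within the group
        mask.extend([keep] * group_size)
    return mask


def apply_length_penalty(
    rewards: Sequence[float],
    lengths: Sequence[int],
    max_len: int,
    penalty: float = 0.5,
) -> List[float]:
    """Assumed example of reward shaping: subtract a penalty from overlong responses."""
    return [r - penalty if n > max_len else r for r, n in zip(rewards, lengths)]


# Two prompts, four generations each.
# Prompt A: every generation is correct; one response is overlong.
# Prompt B: mixed correctness; no response is overlong.
unshaped = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0]
lengths = [10, 50, 10, 10, 10, 10, 10, 10]
shaped = apply_length_penalty(unshaped, lengths, max_len=32)

# Filtering on unshaped rewards drops prompt A, which has no task-level signal.
mask_unshaped = dynamic_sampling_mask(unshaped, group_size=4)

# Filtering on shaped rewards keeps prompt A, because the penalty alone
# introduces spread within the group.
mask_shaped = dynamic_sampling_mask(shaped, group_size=4)
```

In this sketch, shaping would still be applied to the rewards used for training; only the keep/drop decision reads the unshaped values.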

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Signed-off-by: ashors1 <ashors@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ashors1 ashors1 changed the title fix: run dynamic sampling on _unshaped_ rewards fix: run dynamic sampling on unshaped rewards May 12, 2026
Signed-off-by: Anna Shors <ashors@nvidia.com>
@ashors1 ashors1 marked this pull request as ready for review May 19, 2026 21:37
@ashors1 ashors1 requested review from a team as code owners May 19, 2026 21:37
terrykong
terrykong previously approved these changes May 20, 2026
@terrykong terrykong enabled auto-merge (squash) May 20, 2026 05:38
ashors1 added 2 commits May 20, 2026 14:04
…om:NVIDIA-NeMo/RL into ashors/fix-dynamic-sampling-reward-shaping
@ashors1 ashors1 added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label May 20, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1 ashors1 requested a review from a team as a code owner May 20, 2026 21:15
@github-actions github-actions Bot added the Documentation Improvements or additions to documentation label May 20, 2026
terrykong
terrykong previously approved these changes May 20, 2026
@ashors1 ashors1 added CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels May 20, 2026
@ashors1

ashors1 commented May 21, 2026

Contributor Author

/ok to test 9c7f619

Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1

ashors1 commented May 21, 2026

Contributor Author

/ok to test bd1ab6e

Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1

ashors1 commented May 21, 2026

Contributor Author

/ok to test fee9a4a

@terrykong terrykong merged commit fc60573 into main May 21, 2026
37 checks passed
@terrykong terrykong deleted the ashors/fix-dynamic-sampling-reward-shaping branch May 21, 2026 22:58
yfw pushed a commit that referenced this pull request May 27, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>

Labels

CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version)
Documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DAPO dynamic sampling filters on shaped reward instead of raw task metric

2 participants