feat: Auto research skill by vinhngx · Pull Request #2419 · NVIDIA-NeMo/RL

vinhngx · 2026-05-06T02:23:57Z

What does this PR do?

This PR adds an auto research skill that guides agents through a prolonged research session with Nemo-RL and Nemo-gym. It sets operating guidelines for forming and testing hypotheses, organizing git branches, monitoring and reporting progress, and explicitly checking the campaign's stopping conditions.

Issues

N/A

Usage

For example, you can prompt Codex with:

Use the @skill/auto_research skill and train the Qwen-3-VL-2B-instruct model to high accuracy in the Nemo-gym circle click environment. Time budget: 5h

For this skill to be effective, Codex should have sufficient knowledge of the local operating environment (e.g., Slurm or a local machine). A prerequisite for using the auto research skill is therefore that the agent can already run a baseline workload in that environment automatically.

copy-pr-bot · 2026-05-06T02:24:01Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

terrykong · 2026-05-07T20:11:39Z

/claude review

terrykong

Review: PR #2419 — feat: Auto research skill

Nice work — this skill provides a well-structured framework for running iterative RL experiments with git as the experiment journal. The exploration-ideas guide and git-workflow reference are thorough and practical. The safety guardrails in git-workflow.md (no stash/reset/overwrite without consent) are particularly good.

A few suggestions to align with existing repo conventions and improve consistency:

Directory naming mismatch

All other skill directories use hyphens (build-and-dependency, config-conventions, launch-nemo-rl, etc.), but this one uses an underscore (auto_research). The frontmatter name field is auto-research (with hyphen), creating an inconsistency. Consider renaming the directory to auto-research/ to match the convention.

Nemo-gym coverage

The PR description mentions guiding agents on research with "Nemo-RL and Nemo-gym", but the SKILL.md workflow (step 3) only references NeMo-RL paths (examples/run_grpo.py, nemo_rl/models/, etc.). The Nemo-gym entrypoints (examples/nemo_gym/) are not mentioned. Consider either adding Nemo-gym paths to the workflow or adjusting the PR description to match the actual scope.

See inline comments for additional suggestions.

Generated by Claude Code

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

vinhngx · 2026-05-12T01:17:11Z

Thanks, @terrykong, for the prompt review. I fixed the reported issues and tightened all 3 skills. I also added best practices and gotchas observed with Codex, which could apply to other agents as well.

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

yuki-97 · 2026-05-19T12:22:24Z

/ok to test 9880d5e

chtruong814 · 2026-05-20T12:37:46Z

/ok to test 39de685

NVIDIA-NeMo#2419 workaround) Automodel's _restore_loaded_model_dtype (HF/force_hf load path) re-casts loaded params back to the bf16 checkpoint dtype, silently undoing NeMo-RL's intended torch_dtype=float32 master-weight load. With bf16 master weights, AdamW updates underflow and the policy never learns: grpo-nano-v2-12b reward[30] stuck ~0.18 (vs ~0.54) and sft-nanov3-30BA3B loss plateaus. Only force_hf models (NemotronH nano-v2/nano-v3) are affected; custom-impl models (gemma4, Llama) load via the DCP copy path that preserves fp32. Add _disable_automodel_checkpoint_dtype_restore() to no-op that restore before from_pretrained so the requested fp32 is honored. Validated: nano-v2-12b reward[30] 0.176 -> 0.541 PASS; nanov3-30BA3B-lora loss[20] 2.027 PASS. This is temporary until the automodel pin includes NVIDIA-NeMo/Automodel#2419 (rewrites _restore_loaded_model_dtype to honor an explicit torch_dtype). Add an obsolescence tripwire test that fails when NVIDIA-NeMo#2419 lands so the workaround is removed promptly, plus an analogous tripwire for the existing Qwen-VL vision-tower key-mapping workaround (fires when transformers #45358 / >=5.6 reaches the pin).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
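The workaround described in this commit amounts to replacing a module-level function with a no-op before the model is loaded. The sketch below illustrates only that pattern: `automodel_stub`, `load`, and the dict-based "model" are hypothetical stand-ins (the real Automodel module path and the signature of `_restore_loaded_model_dtype` are not shown here), `disable_checkpoint_dtype_restore` is a simplified analogue of `_disable_automodel_checkpoint_dtype_restore()`, and dtypes are plain strings rather than `torch.dtype` objects.

```python
import types

# Hypothetical stand-in for the Automodel module; not the real import path.
automodel_stub = types.ModuleType("automodel_stub")


def _restore_loaded_model_dtype(model: dict, checkpoint_dtype: str) -> dict:
    # Mimics the reported behavior: params are re-cast back to the checkpoint dtype.
    model["dtype"] = checkpoint_dtype
    return model


automodel_stub._restore_loaded_model_dtype = _restore_loaded_model_dtype


def disable_checkpoint_dtype_restore(module: types.ModuleType) -> None:
    """Globally replace the restore step with a no-op so a requested fp32 load survives."""

    def _noop_restore(model, *args, **kwargs):
        return model  # keep whatever dtype the loader produced

    module._restore_loaded_model_dtype = _noop_restore


def load(module: types.ModuleType, requested_dtype: str, checkpoint_dtype: str) -> dict:
    # Toy loader: load at the requested dtype, then run the (possibly patched) restore step.
    model = {"dtype": requested_dtype}
    return module._restore_loaded_model_dtype(model, checkpoint_dtype)
```

Because `load` looks the function up on the module at call time, patching must happen before loading; it is also global to the process, which is exactly what later caused the tripwire false positive.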

The automodel fp32-master-weight tripwire test (test_automodel_dtype_restore_workaround_still_needed) failed in CI as a false positive. _disable_automodel_checkpoint_dtype_restore() globally and irreversibly replaces _restore_loaded_model_dtype with a no-op; earlier setup_model_and_optimizer tests in the same process leave that no-op installed, so the tripwire exercised the no-op (which preserves fp32) instead of Automodel's real downgrading function. Stash the original on the no-op and have the test recover it via _nrl_original. Also pass requested_dtype=fp32 to the function when its signature accepts it, so the tripwire actually fires once Automodel NVIDIA-NeMo#2419 is pinned: the rewritten function honors the explicit fp32 request only via that new parameter (promote_types), not via hf_config/load_kwargs. Correct the Skywork reward baseline (-5.4062 -> -5.2500) to the value the CI build produces (also the historical pre-refresh value); the incorrect-answer score is sensitive to the transformers/torch/kernel build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>


vinhngx requested a review from a team as a code owner May 6, 2026 02:23

github-actions Bot added the community-request label May 6, 2026

vinhngx force-pushed the vinhn/autoresearch branch from fafbf11 to ca90191 Compare May 6, 2026 02:28

vinhngx changed the title ~~Auto research skill~~ feat: Auto research skill May 6, 2026

claude Bot reviewed May 7, 2026

Comment thread skills/auto-research/SKILL.md Outdated

claude Bot reviewed May 7, 2026

Comment thread skills/auto-research/SKILL.md

svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label May 9, 2026

terrykong reviewed May 11, 2026

Comment thread skills/auto-research/SKILL.md Outdated

Comment thread skills/auto-research/SKILL.md Outdated

Comment thread skills/auto-research/SKILL.md Outdated

Comment thread skills/auto-research/references/git-workflow.md Outdated

Comment thread skills/auto-research/references/git-workflow.md

svcnvidia-nemo-ci added waiting-on-customer Waiting on the original author to respond and removed waiting-on-maintainers Waiting on maintainers to respond labels May 11, 2026

revise auto research skill

bd35c3e

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

vinhngx force-pushed the vinhn/autoresearch branch from 29bb26f to 7aee365 Compare May 12, 2026 01:09

vinhngx added 15 commits May 12, 2026 01:22

add Brev skill. Rename auto-research

d5b497c

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

add session memory skill

3455e96

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: address auto research skill review feedback

6e3250a

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: clarify auto research execution environment

c4b9013

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: refine auto research skill triggers

41ba60d

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: clarify auto research objectives

55d5034

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: clarify auto research experiment count stop rule

b623125

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: rename auto research experiment count target

7184baf

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: link auto research to Brev etiquette

e90907f

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: clarify Brev detection for auto research

397de2c

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: trigger Brev etiquette on user mention

06e97b3

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: require session memory for auto research

c856914

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: tidy research support skills

a16b0a9

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: add auto research gotchas

43e16a3

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

docs: preserve auto research context across handoffs

4cdf4a1

Signed-off-by: Vinh Nguyen <vinhn@nvidia.com>

copy-pr-bot Bot temporarily deployed to public May 18, 2026 18:21 Inactive

copy-pr-bot Bot temporarily deployed to public May 18, 2026 18:25 Inactive

svcnvidia-nemo-ci removed the waiting-on-maintainers Waiting on maintainers to respond label May 18, 2026

Merge branch 'main' into vinhn/autoresearch

9880d5e

copy-pr-bot Bot temporarily deployed to public May 19, 2026 12:22 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci May 19, 2026 12:22 Failure

copy-pr-bot Bot temporarily deployed to public May 19, 2026 12:22 Inactive

copy-pr-bot Bot temporarily deployed to public May 19, 2026 12:23 Inactive

copy-pr-bot Bot temporarily deployed to public May 19, 2026 12:26 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci May 20, 2026 09:32 Error

Merge branch 'main' into vinhn/autoresearch

39de685

copy-pr-bot Bot temporarily deployed to public May 20, 2026 12:38 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 20, 2026 12:38 Inactive

copy-pr-bot Bot temporarily deployed to public May 20, 2026 12:38 Inactive

copy-pr-bot Bot temporarily deployed to public May 20, 2026 12:42 Inactive

terrykong merged commit 012bf17 into NVIDIA-NeMo:main May 20, 2026
42 checks passed

terrykong mentioned this pull request May 26, 2026

fix: move session-memory back to skills/ with auto-research group #2577

Merged

terrykong merged 21 commits into NVIDIA-NeMo:main from vinhngx:vinhn/autoresearch

5 participants
