Skip to content

SearchR1 reproduction fixes#65

Merged
erictang000 merged 43 commits intomainfrom
erictang000/searchr1
Jul 10, 2025
Merged

SearchR1 reproduction fixes#65
erictang000 merged 43 commits intomainfrom
erictang000/searchr1

Conversation

@erictang000
Copy link
Copy Markdown
Collaborator

@erictang000 erictang000 commented Jul 7, 2025

What does this PR do?

Config Change

  • Makes generator.eval_sampling_params.temperature default to 0 instead of 1.

Adds fixes for reproducing results from SearchR1:

  • Updates the skyrl-gym SearchEnv implementation to correctly postprocess actions and use a consistent tool parsing schema (<search> and <answer>)
  • Adds session management to the SearchToolGroup to avoid a session being started for each request.
  • Updates the local retrieval server endpoint to respond to individual queries (and removed problematic torch.cuda.empty_cache() calls)
  • Cleaning up unnecessary printing/unused reward code paths
  • Adds comprehensive testing for the search environment
  • Adds config arguments for the search env (skyrl_gym.search.*)

Adds docs (examples/search.rst) and (recipes.searchr1.rst) for our reproduction runs

Results below:
image
image

@erictang000 erictang000 marked this pull request as draft July 7, 2025 20:57
@lynnliu030 lynnliu030 self-requested a review July 7, 2025 21:23
@erictang000 erictang000 requested a review from tyler-griggs July 8, 2025 21:52
Comment thread skyrl-train/pyproject.toml
Comment thread skyrl-train/tests/gpu/test_skyrl_gym_generator.py
@erictang000 erictang000 marked this pull request as ready for review July 8, 2025 21:54
@erictang000 erictang000 changed the title [WIP] SearchR1 reproduction fixes SearchR1 reproduction fixes Jul 8, 2025
Comment thread skyrl-gym/skyrl_gym/tools/search.py
Comment thread skyrl-gym/skyrl_gym/tools/search.py Outdated
Comment thread skyrl-gym/tests/test_search.py Outdated
Comment thread skyrl-gym/tests/test_search.py Outdated
Comment thread skyrl-train/tests/gpu/test_skyrl_gym_generator.py
Copy link
Copy Markdown
Member

@tyler-griggs tyler-griggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approval on the assumption that you know the right way to make these updates without breaking skyrl-train's dependency on skyrl-gym :D. Does #71 resolve this concern?

Comment thread skyrl-gym/skyrl_gym/tools/search.py
Comment thread skyrl-train/docs/examples/search.rst Outdated
Comment thread skyrl-train/examples/search/run_search.sh Outdated
@erictang000
Copy link
Copy Markdown
Collaborator Author

Approval on the assumption that you know the right way to make these updates without breaking skyrl-train's dependency on skyrl-gym :D. Does #71 resolve this concern?

Yep! That added the same workaround I was using

@erictang000 erictang000 merged commit afbf88b into main Jul 10, 2025
3 checks passed
@erictang000 erictang000 deleted the erictang000/searchr1 branch July 10, 2025 00:51
@SumanthRH SumanthRH mentioned this pull request Jul 11, 2025
2 tasks
fannie1208 pushed a commit to vinid/SkyRL that referenced this pull request Aug 19, 2025
# What does this PR do?
### Config Change
- Makes `generator.eval_sampling_params.temperature` default to 0
instead of 1.

### Adds fixes for reproducing results from SearchR1:
- Updates the `skyrl-gym` `SearchEnv` implementation to correctly
postprocess actions and use a consistent tool parsing schema (`<search>`
and `<answer>`)
- Adds session management to the `SearchToolGroup` to avoid a session
being started for each request.
- Updates the local retrieval server endpoint to respond to individual
queries (and removed problematic torch.cuda.empty_cache() calls)
- Cleaning up unnecessary printing/unused reward code paths
- Adds comprehensive testing for the search environment
- Adds config arguments for the search env (`skyrl_gym.search.*`)

Adds docs (examples/search.rst) and (recipes.searchr1.rst) for our
reproduction runs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants