**Merged.** karinazad approved these changes on Jul 1, 2025.
This pull request improves synthetic dataset generation, UME model training, and reward computation for RL training. Key changes: modality tags for generated sequences, an updated training configuration, refined reward logging, and reward computation that penalizes invalid completions.
**Synthetic Dataset Enhancements:**
- Updated `examples/generate_synthetic_dataset.py` to wrap generated sequences in modality tags (`<smiles>`, `<amino_acid>`, `<dna>`) for better content identification; a sketch of the scheme is shown below.
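A minimal sketch of what the tagging might look like. The tag names come from the PR summary; the function and variable names below are assumptions, not the actual identifiers in `generate_synthetic_dataset.py`:

```python
# Hypothetical sketch of the tagging scheme; only the tag set comes from the PR.
TAGS = ("smiles", "amino_acid", "dna")

def wrap_with_tag(sequence: str, modality: str) -> str:
    """Wrap a generated sequence in its modality tag, e.g. <smiles>CCO</smiles>."""
    if modality not in TAGS:
        raise ValueError(f"Unknown modality: {modality}")
    return f"<{modality}>{sequence}</{modality}>"

# Example: tag a SMILES string for ethanol
print(wrap_with_tag("CCO", "smiles"))  # -> <smiles>CCO</smiles>
```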
**Training Configuration Updates:**
- Switched `examples/train_ume_grpo.py` to `ume-medium-base-480M` for training, replacing the smaller debugging model.
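In the training script this presumably amounts to swapping the checkpoint name; only `ume-medium-base-480M` comes from the PR summary, and the variable name here is an assumption:

```python
# Sketch of the model selection in examples/train_ume_grpo.py; the previous
# debugging model is not named in the PR summary, so it is omitted here.
model_name = "ume-medium-base-480M"  # replaces the smaller debugging model
```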
**Reward Computation Improvements:**
- Modified `src/lobster/rl_training/reward_functions.py` to penalize invalid completions with a configurable `penalty_for_invalid` parameter, and added logic to extract tagged content for modality detection so that only valid tagged sequences are rewarded; see the sketch after this list.
- Added invalid-completion metrics (`no_tag_count`, `empty_content_count`) and adjusted logging to reflect them.
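A hedged sketch of the reward logic described above. The real implementation in `reward_functions.py` scores sequences with UME; here a stand-in `score_fn` is used, and every name except `penalty_for_invalid`, `no_tag_count`, and `empty_content_count` is an assumption:

```python
import re

# Matches a tagged span like <smiles>...</smiles>; \1 backreferences the tag.
TAG_PATTERN = re.compile(r"<(smiles|amino_acid|dna)>(.*?)</\1>", re.DOTALL)

def compute_rewards(completions, score_fn, penalty_for_invalid=-1.0):
    rewards = []
    no_tag_count = 0
    empty_content_count = 0
    for completion in completions:
        match = TAG_PATTERN.search(completion)
        if match is None:
            # No recognizable modality tag: penalize instead of rewarding.
            no_tag_count += 1
            rewards.append(penalty_for_invalid)
            continue
        modality, content = match.group(1), match.group(2).strip()
        if not content:
            # Tagged but empty: also treated as invalid.
            empty_content_count += 1
            rewards.append(penalty_for_invalid)
            continue
        rewards.append(score_fn(content, modality))
    # The counts would be logged alongside rewards in the real code.
    return rewards, {"no_tag_count": no_tag_count,
                     "empty_content_count": empty_content_count}

# Example with a dummy scorer that rewards longer valid sequences
rewards, stats = compute_rewards(
    ["<smiles>CCO</smiles>", "no tags here", "<dna></dna>"],
    score_fn=lambda content, modality: float(len(content)),
)
print(rewards, stats)  # [3.0, -1.0, -1.0] {'no_tag_count': 1, 'empty_content_count': 1}
```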
**Logging Refinements:**
- Updated `src/lobster/callbacks/_ume_grpo_logging_callback.py` to log sample examples only when data is available, and introduced step-specific tables so samples appear immediately during training; a sketch follows.
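A sketch of the guarded, step-specific sample logging, assuming the callback logs to Weights & Biases (the method name and arguments here are assumptions, not the callback's actual interface):

```python
import wandb

def log_samples(samples, step):
    """Log (completion, reward) pairs as a step-keyed W&B table."""
    # Only log when there is data, to avoid empty tables in the dashboard.
    if not samples:
        return
    table = wandb.Table(columns=["completion", "reward"])
    for completion, reward in samples:
        table.add_data(completion, reward)
    # A fresh table per step makes samples visible immediately during
    # training, rather than only after the run finishes.
    wandb.log({f"samples/step_{step}": table}, step=step)

# Usage (requires an active run):
# wandb.init(project="ume-grpo")
# log_samples([("<smiles>CCO</smiles>", 3.0)], step=10)
```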
**Documentation Updates:**
- Updated `src/lobster/rl_training/README.md`, including guidelines for choosing an appropriate `penalty_for_invalid` based on observed reward distributions.
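To illustrate the kind of guideline described: the penalty should sit below the rewards that valid completions actually receive, so invalid outputs are never preferred. The percentile-with-margin rule below is an assumption for illustration, not the README's exact recipe:

```python
import numpy as np

def suggest_penalty(valid_rewards, margin=0.5):
    """Pick a penalty safely below the low end of observed valid rewards."""
    low = float(np.percentile(valid_rewards, 1))  # near the bottom of valid rewards
    return low - margin

# Example with synthetic valid rewards centered around 2.0
valid_rewards = np.random.normal(loc=2.0, scale=0.5, size=1000)
print(suggest_penalty(valid_rewards))  # e.g. roughly 0.3-0.4
```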