grpo updates #128

Merged
ncfrey merged 3 commits into main from n/grpo-whitening
Jul 1, 2025

ncfrey commented Jul 1, 2025

This pull request introduces improvements to synthetic dataset generation, UME model training, and reward computation for RL training. Key changes include adding tagging for generated sequences, updating training configurations, refining reward logging, and enhancing reward computation to handle invalid completions.

Synthetic Dataset Enhancements:

  • Updated sequence generation prompts in examples/generate_synthetic_dataset.py to include tags (<smiles>, <amino_acid>, <dna>) for better content identification.
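
As a rough illustration of the tagging convention, the sketch below wraps a generated sequence in a modality tag. The tag names come from this PR; the helper name `wrap_with_tag` and the use of a matching closing tag are assumptions made for the example, not code from examples/generate_synthetic_dataset.py.

```python
# Tag names as listed in the PR description.
MODALITY_TAGS = ("smiles", "amino_acid", "dna")


def wrap_with_tag(sequence: str, modality: str) -> str:
    """Wrap a sequence in its modality tag (hypothetical helper).

    Assumes each opening tag is paired with a matching closing tag.
    """
    if modality not in MODALITY_TAGS:
        raise ValueError(f"unknown modality: {modality!r}")
    return f"<{modality}>{sequence}</{modality}>"


tagged = wrap_with_tag("CCO", "smiles")
```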

Training Configuration Updates:

  • Changed the default UME model path in examples/train_ume_grpo.py to ume-medium-base-480M for training purposes, replacing the smaller debugging model.

Reward Computation Improvements:

  • Enhanced src/lobster/rl_training/reward_functions.py to penalize invalid completions with a configurable penalty_for_invalid parameter and added logic to extract tagged content for modality detection. This ensures only valid tagged sequences are rewarded.
  • Updated reward statistics to include counts for invalid completions (no_tag_count, empty_content_count) and adjusted logging to reflect these metrics.
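
The behavior described in these two bullets can be sketched roughly as follows. This is a minimal illustration, not the code in reward_functions.py: the names `penalty_for_invalid`, `no_tag_count`, and `empty_content_count` come from this PR, while `extract_tagged_content`, `compute_rewards`, the `score_fn` callback, the default penalty of -5.0, and the assumption of matching closing tags are all hypothetical.

```python
import re

# Assumes completions use paired tags such as <smiles>...</smiles>.
_TAG_RE = re.compile(r"<(smiles|amino_acid|dna)>(.*?)</\1>", re.DOTALL)


def extract_tagged_content(completion):
    """Return (tag, content) for the first tagged span, or None if no tag is found."""
    match = _TAG_RE.search(completion)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def compute_rewards(completions, score_fn, penalty_for_invalid=-5.0):
    """Score valid tagged completions and penalize invalid ones.

    score_fn(content, tag) is a stand-in for the real reward model call.
    """
    rewards = []
    stats = {"no_tag_count": 0, "empty_content_count": 0}
    for completion in completions:
        extracted = extract_tagged_content(completion)
        if extracted is None:
            # No recognizable tag: count it and apply the penalty.
            stats["no_tag_count"] += 1
            rewards.append(penalty_for_invalid)
            continue
        tag, content = extracted
        if not content:
            # Tag present but nothing inside it.
            stats["empty_content_count"] += 1
            rewards.append(penalty_for_invalid)
            continue
        rewards.append(score_fn(content, tag))
    return rewards, stats


rewards, stats = compute_rewards(
    ["<smiles>CCO</smiles>", "no tags here", "<dna>  </dna>"],
    score_fn=lambda content, tag: float(len(content)),  # toy scorer
)
```

Returning the counts alongside the rewards keeps the invalid-completion statistics available to whatever logging callback consumes them.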

Logging Refinements:

  • Modified src/lobster/callbacks/_ume_grpo_logging_callback.py to log sample examples only when data is available and introduced step-specific tables for immediate logging during training.

Documentation Updates:

  • Added considerations for penalty values in GRPO training to src/lobster/rl_training/README.md, including guidelines for determining appropriate penalties based on reward distributions.
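
To see why the penalty should be chosen relative to the reward distribution, the sketch below applies the usual GRPO group normalization (reward minus group mean, divided by group standard deviation) to a group with one invalid completion. It does not reproduce the README's guidelines; the reward values, the function name `group_advantages`, and the two penalty values are made up for illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def valid_spread(valid_rewards, penalty):
    """Advantage gap between the best and worst valid completion
    when one invalid completion with the given penalty joins the group."""
    adv = group_advantages(valid_rewards + [penalty])
    valid_adv = adv[: len(valid_rewards)]
    return max(valid_adv) - min(valid_adv)


valid = [0.2, 0.5, 0.9]  # illustrative rewards for valid completions
mild = valid_spread(valid, penalty=-1.0)
extreme = valid_spread(valid, penalty=-100.0)
print(f"spread with mild penalty:    {mild:.3f}")
print(f"spread with extreme penalty: {extreme:.3f}")
```

Under these assumptions, a penalty far outside the range of valid rewards inflates the group standard deviation, so differences among valid completions contribute much less to the advantages.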

ncfrey requested a review from karinazad July 1, 2025 15:25
ncfrey temporarily deployed to test.pypi.org July 1, 2025 15:29 with GitHub Actions (inactive)
ncfrey merged commit 85b6fca into main Jul 1, 2025
5 checks passed
ncfrey deleted the n/grpo-whitening branch July 1, 2025 15:41