Conversation
Pull Request Overview
This PR initializes a reinforcement learning training module (rl_training) using UME-based reward functions and GRPO training, adding implementation, tests, examples, and documentation.
- Add core RL utilities: `reward_functions.py` and `trainers.py` under `src/lobster/rl_training`
- Add comprehensive unit tests for trainers and reward functions
- Update documentation (module README, root README, docs), CI config, and examples for RL workflow
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/lobster/rl_training/test_trainers.py | Tests for create_ume_grpo_trainer and train_ume_grpo |
| tests/lobster/rl_training/test_reward_functions.py | Tests for detect_modality, compute_pseudo_likelihood, and UMERewardFunction |
| src/lobster/rl_training/trainers.py | Utility functions to create and run GRPO trainers with UME rewards |
| src/lobster/rl_training/reward_functions.py | Modality detection and pseudo-likelihood–based reward functions |
| src/lobster/rl_training/README.md | Quick start and module documentation for rl_training |
| src/lobster/rl_training/__init__.py | Module exports for RL training utilities |
| examples/train_ume_grpo.py | Example script demonstrating UME-based GRPO training |
| examples/generate_synthetic_dataset.py | Script to generate a synthetic molecular/biological dataset |
| docs/RL_TRAINING.md | Detailed guide for RL training workflow |
| pyproject.toml | Add trl and accelerate to dependencies |
| .github/workflows/push.yml | Include --extra trl in CI sync step |
Comments suppressed due to low confidence (4)
src/lobster/rl_training/README.md:61
- The import references `UmeRewardFunction`, but the actual class is named `UMERewardFunction`. Update the import to match the exact class name.
from lobster.rl_training import UmeRewardFunction, detect_modality
docs/RL_TRAINING.md:66
- The documentation refers to `UmeRewardFunction`, but the class is declared as `UMERewardFunction`. Adjust the heading to `UMERewardFunction` for consistency.
- **`UmeRewardFunction`**: Main reward function class that computes rewards based on UME pseudo-likelihood
README.md:222
- The quick-start command assumes `train_ume_grpo.py` is in the working directory, but the example script lives under `examples/`. Consider updating to `python examples/train_ume_grpo.py`.
python train_ume_grpo.py
tests/lobster/rl_training/test_reward_functions.py:308
- Consider adding a test for `create_ume_reward_wrapper` to ensure it correctly delegates to `UMERewardFunction` and preserves the function signature expected by TRL.
assert all(isinstance(l, (float, np.floating)) for l in likelihoods) # Allow numpy types
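A sketch of such a test, assuming `create_ume_reward_wrapper(reward_function)` returns a TRL-style callable `reward_fn(prompts, completions, **kwargs) -> list[float]`. The wrapper stub below is a hypothetical stand-in, not the module's actual code:

```python
from unittest.mock import MagicMock

# Stand-in for lobster.rl_training.create_ume_reward_wrapper; the real
# function is assumed to delegate completions to a UMERewardFunction.
def create_ume_reward_wrapper(reward_function):
    def reward_fn(prompts, completions, **kwargs):
        return reward_function(completions)
    return reward_fn

def test_wrapper_delegates_and_matches_trl_signature():
    mock_reward = MagicMock(return_value=[0.5, 0.7])
    reward_fn = create_ume_reward_wrapper(mock_reward)

    # TRL calls reward functions with prompts, completions, and extra kwargs
    rewards = reward_fn(["p1", "p2"], ["CCO", "MKTV"], step=0)

    mock_reward.assert_called_once_with(["CCO", "MKTV"])
    assert rewards == [0.5, 0.7]

test_wrapper_delegates_and_matches_trl_signature()
```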
## Step-by-Step Training Process
Could you add that this requires the `--extra trl` install?
examples/train_ume_grpo.py
"""Main training function."""
# Load datasets
logger.info("Loading datasets...")
train_dataset = load_from_disk("/data/bucket/freyn6/synthetic_molecular_dataset/train")
This could be a relative path, since the README says `cd examples`.
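A minimal sketch of the suggested change, assuming the dataset directory name produced by `generate_synthetic_dataset.py` and that the user has run `cd examples` first as the README instructs:

```python
from pathlib import Path

# Relative paths, valid after `cd examples` as the README instructs
data_dir = Path("synthetic_molecular_dataset")
train_path = data_dir / "train"
val_path = data_dir / "validation"

# train_dataset = load_from_disk(str(train_path))
# val_dataset = load_from_disk(str(val_path))
```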
examples/train_ume_grpo.py
val_dataset = load_from_disk("/data/bucket/freyn6/synthetic_molecular_dataset/validation")

# Load Qwen model from local cache to avoid download timeouts
qwen_model_path = "/data/bucket/freyn6/cache/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d"
- `synthetic_molecular_dataset/` - HuggingFace dataset with train/val/test splits
- `synthetic_molecular_dataset.json` - JSON file for easy inspection

### Step 2: Run UME-based GRPO Training
Could you add instructions for how to download Qwen? Or switch to the HF model name in the example training code, if that works.
logger = logging.getLogger(__name__)


def detect_modality(text: str) -> Modality:
Can we move this out to model utils? Or put it directly in UME, for when modality is not specified?
# Skip empty or very short sequences
if len(text) < 3:
    return Modality.SMILES
Should this error out instead of defaulting to SMILES?
Oh I see, SMILES is the default. I wonder if this could be an argument passed to the function: whether to fall back on a default modality or to error out / return None.
yeah i think erroring out instead of defaulting to SMILES makes sense. i'll change this
return Modality.SMILES


def compute_pseudo_likelihood(ume_model: UME, sequences: list[str], modality: Modality) -> list[float]:
This could also be useful in UME itself. I imagine for a lot of applications, people will just want likelihoods.
i'll add this as a method
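A sketch of the core computation such a method could expose. The `score_token` callback is a hypothetical stand-in for a masked-LM forward pass, i.e. it returns log P(token_i | sequence with position i masked):

```python
import math

def pseudo_log_likelihood(sequence: str, score_token) -> float:
    """Average per-token masked log-likelihood of `sequence`.

    `score_token(sequence, i)` should return the log-probability the
    model assigns to sequence[i] when position i is masked out.
    """
    if not sequence:
        raise ValueError("cannot score an empty sequence")
    total = sum(score_token(sequence, i) for i in range(len(sequence)))
    return total / len(sequence)

# With a uniform 4-letter model, every position scores log(0.25)
uniform = lambda seq, i: math.log(0.25)
```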
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Fix StopIteration in UME.embed_sequences when no parameters exist (testing)
- Update modality detection test to match actual error message
- Add wandb mocking to RL training tests to prevent initialization errors