
rl init #121

Merged
ncfrey merged 11 commits into main from n/rl-init-2
Jun 26, 2025

Conversation

@ncfrey
Contributor

@ncfrey ncfrey commented Jun 24, 2025

No description provided.

@ncfrey ncfrey self-assigned this Jun 24, 2025
@ncfrey ncfrey temporarily deployed to test.pypi.org June 24, 2025 23:12 — with GitHub Actions Inactive
@ncfrey ncfrey temporarily deployed to test.pypi.org June 24, 2025 23:16 — with GitHub Actions Inactive
@ncfrey ncfrey temporarily deployed to test.pypi.org June 25, 2025 13:59 — with GitHub Actions Inactive
@ncfrey ncfrey temporarily deployed to test.pypi.org June 25, 2025 14:39 — with GitHub Actions Inactive
@ncfrey ncfrey requested review from Copilot and karinazad and removed request for karinazad June 25, 2025 14:46
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR initializes a reinforcement learning training module (rl_training) using UME-based reward functions and GRPO training, adding implementation, tests, examples, and documentation.

  • Add core RL utilities: reward_functions.py and trainers.py under src/lobster/rl_training
  • Add comprehensive unit tests for trainers and reward functions
  • Update documentation (module README, root README, docs), CI config, and examples for RL workflow
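The reward-function shape that TRL-style GRPO training expects is a callable taking a batch of completions and returning one float per completion. A minimal sketch of such a wrapper is below; the helper name and the length-based placeholder scorer are illustrative assumptions, not the PR's actual `UMERewardFunction` logic:

```python
# Sketch of a TRL-style reward callable: fn(completions, **kwargs) -> list[float].
# The UME pseudo-likelihood scoring is stubbed out with a length-based placeholder.

def make_reward_wrapper(score_fn):
    """Wrap a per-sequence scoring function into the signature TRL expects."""
    def reward_fn(completions, **kwargs):
        return [float(score_fn(c)) for c in completions]
    return reward_fn

# Placeholder scorer standing in for UME pseudo-likelihood.
reward = make_reward_wrapper(lambda seq: len(seq) / 100.0)
scores = reward(["CCO", "CC(=O)O"])  # one float per completion
```

The real reward function would replace the lambda with a call into the UME model.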

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/lobster/rl_training/test_trainers.py | Tests for create_ume_grpo_trainer and train_ume_grpo |
| tests/lobster/rl_training/test_reward_functions.py | Tests for detect_modality, compute_pseudo_likelihood, and UMERewardFunction |
| src/lobster/rl_training/trainers.py | Utility functions to create and run GRPO trainers with UME rewards |
| src/lobster/rl_training/reward_functions.py | Modality detection and pseudo-likelihood–based reward functions |
| src/lobster/rl_training/README.md | Quick start and module documentation for rl_training |
| src/lobster/rl_training/__init__.py | Module exports for RL training utilities |
| examples/train_ume_grpo.py | Example script demonstrating UME-based GRPO training |
| examples/generate_synthetic_dataset.py | Script to generate a synthetic molecular/biological dataset |
| docs/RL_TRAINING.md | Detailed guide for RL training workflow |
| pyproject.toml | Add trl and accelerate to dependencies |
| .github/workflows/push.yml | Include --extra trl in CI sync step |
Comments suppressed due to low confidence (4)

src/lobster/rl_training/README.md:61

  • The import references UmeRewardFunction, but the actual class is named UMERewardFunction. Update the import to match the exact class name.
from lobster.rl_training import UmeRewardFunction, detect_modality

docs/RL_TRAINING.md:66

  • The documentation refers to UmeRewardFunction, but the class is declared as UMERewardFunction. Adjust the heading to UMERewardFunction for consistency.
- **`UmeRewardFunction`**: Main reward function class that computes rewards based on UME pseudo-likelihood

README.md:222

  • The quick-start command assumes train_ume_grpo.py is in the working directory, but the example script lives under examples/. Consider updating to python examples/train_ume_grpo.py.
python train_ume_grpo.py

tests/lobster/rl_training/test_reward_functions.py:308

  • Consider adding a test for create_ume_reward_wrapper to ensure it correctly delegates to UMERewardFunction and preserves the function signature expected by TRL.
        assert all(isinstance(l, (float, np.floating)) for l in likelihoods)  # Allow numpy types
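The suggested delegation test could look roughly like the following. Here `create_reward_wrapper` is a simplified stand-in for the PR's `create_ume_reward_wrapper`, and a mock replaces the real `UMERewardFunction`, so the exact constructor arguments are assumptions:

```python
# Hypothetical sketch of the suggested delegation test: verify the wrapper
# forwards completions to the underlying reward object and returns its result.
from unittest.mock import MagicMock

def create_reward_wrapper(reward_obj):
    # Stand-in for create_ume_reward_wrapper: keep the TRL-style signature
    # while delegating the actual scoring.
    def wrapper(completions, **kwargs):
        return reward_obj(completions, **kwargs)
    return wrapper

mock_reward = MagicMock(return_value=[0.5, 0.7])
wrapper = create_reward_wrapper(mock_reward)
result = wrapper(["seq_a", "seq_b"])
mock_reward.assert_called_once_with(["seq_a", "seq_b"])
```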

@ncfrey ncfrey temporarily deployed to test.pypi.org June 25, 2025 14:52 — with GitHub Actions Inactive
@ncfrey ncfrey marked this pull request as ready for review June 25, 2025 15:03

## Step-by-Step Training Process

Collaborator:

Could you add a note that this requires installing with `--extra trl`?

"""Main training function."""
# Load datasets
logger.info("Loading datasets...")
train_dataset = load_from_disk("/data/bucket/freyn6/synthetic_molecular_dataset/train")
Collaborator:

This could be a relative path, since the README says `cd examples`.

val_dataset = load_from_disk("/data/bucket/freyn6/synthetic_molecular_dataset/validation")

# Load Qwen model from local cache to avoid download timeouts
qwen_model_path = "/data/bucket/freyn6/cache/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d"
Collaborator:

same here

- `synthetic_molecular_dataset/` - HuggingFace dataset with train/val/test splits
- `synthetic_molecular_dataset.json` - JSON file for easy inspection

### Step 2: Run UME-based GRPO Training
Collaborator:

Could you add instructions for downloading Qwen? Or switch to the HF model name in the example training code, if that works.
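One way to address both concerns is to prefer the Hugging Face hub name and only use the local snapshot when it exists on disk. A minimal sketch, where the hub name comes from the example script and the helper function is hypothetical:

```python
# Hypothetical sketch: prefer the HF hub name, fall back to a local snapshot
# when it is present (e.g. on an offline cluster node).
import os

HF_NAME = "Qwen/Qwen2-0.5B-Instruct"
LOCAL_SNAPSHOT = "/data/bucket/freyn6/cache/models--Qwen--Qwen2-0.5B-Instruct"

def resolve_model_path(hf_name: str, local_path: str) -> str:
    """Use the local snapshot when present, otherwise the hub name."""
    return local_path if os.path.isdir(local_path) else hf_name

model_path = resolve_model_path(HF_NAME, LOCAL_SNAPSHOT)
```

`from_pretrained` accepts either form, so the rest of the script would not need to change.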

logger = logging.getLogger(__name__)


def detect_modality(text: str) -> Modality:
Collaborator:

Can we move this out to model utils? Or handle it directly in UME when modality is not specified?


# Skip empty or very short sequences
if len(text) < 3:
return Modality.SMILES
Collaborator:

Should this error out instead of defaulting to SMILES?

Collaborator:

Oh I see, SMILES is the default. I wonder if this could be an argument passed to the function: whether to fall back on a default modality or to error out / return None.

Contributor Author:

Yeah, I think erroring out instead of defaulting to SMILES makes sense. I'll change this.

return Modality.SMILES
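The behaviour agreed on above could be sketched roughly as follows: let the caller choose between a fallback modality and raising. The `Modality` enum and the character heuristics here are simplified stand-ins, not Lobster's actual implementation:

```python
# Hypothetical sketch of detect_modality with an explicit fallback argument.
from enum import Enum
from typing import Optional

class Modality(Enum):
    SMILES = "smiles"
    AMINO_ACID = "amino_acid"
    NUCLEOTIDE = "nucleotide"

def detect_modality(text: str, fallback: Optional[Modality] = None) -> Modality:
    """Guess the sequence modality; raise (or fall back) when ambiguous."""
    if len(text) < 3:
        if fallback is not None:
            return fallback
        raise ValueError(f"Sequence too short to detect modality: {text!r}")
    if set(text.upper()) <= set("ACGTU"):
        return Modality.NUCLEOTIDE
    # SMILES-specific punctuation such as branches, bonds, and ring closures.
    if any(c in text for c in "()=#123456789@"):
        return Modality.SMILES
    return Modality.AMINO_ACID
```

Passing `fallback=None` (the default) gives the erroring behaviour the thread settled on, while callers that want the old behaviour can pass `fallback=Modality.SMILES`.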


def compute_pseudo_likelihood(ume_model: UME, sequences: list[str], modality: Modality) -> list[float]:
Collaborator:

This could also be useful in UME itself. I imagine for a lot of applications, people will just want likelihoods.

Contributor Author:

I'll add this as a method.
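Pseudo-likelihood for a masked language model is typically computed by masking each position in turn, scoring the true token under the model, and averaging the log-probabilities. A minimal sketch of that loop, with a toy scorer standing in for the UME model (the real `compute_pseudo_likelihood` takes a `UME` model and a `Modality`):

```python
# Sketch of masked-LM pseudo-likelihood; `token_logprob(tokens, i)` stands in
# for a model call that masks position i and returns log P(true token).
import math

def pseudo_log_likelihood(tokens, token_logprob):
    """Average log P(token_i | tokens with position i masked)."""
    if not tokens:
        return float("-inf")
    total = sum(token_logprob(tokens, i) for i in range(len(tokens)))
    return total / len(tokens)

# Toy stand-in model: uniform distribution over a 4-symbol vocabulary.
uniform = lambda tokens, i: math.log(1 / 4)
score = pseudo_log_likelihood(list("ACGT"), uniform)  # math.log(0.25)
```

As a UME method, the per-position scorer would be a batched forward pass rather than a Python callback, but the averaging structure is the same.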

freyn6 and others added 9 commits June 25, 2025 20:36
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Fix StopIteration in UME.embed_sequences when no parameters exist (testing)
- Update modality detection test to match actual error message
- Add wandb mocking to RL training tests to prevent initialization errors
@ncfrey ncfrey temporarily deployed to test.pypi.org June 26, 2025 00:40 — with GitHub Actions Inactive
@ncfrey ncfrey merged commit 25b9bb1 into main Jun 26, 2025
5 checks passed
@ncfrey ncfrey deleted the n/rl-init-2 branch June 26, 2025 00:43
4 participants