
rl init #121

Merged
ncfrey merged 11 commits into main from n/rl-init-2
Jun 26, 2025

Conversation

@ncfrey
Contributor

@ncfrey ncfrey commented Jun 24, 2025

No description provided.

@ncfrey ncfrey self-assigned this Jun 24, 2025
@ncfrey ncfrey temporarily deployed to test.pypi.org June 24, 2025 23:12 — with GitHub Actions Inactive
@ncfrey ncfrey temporarily deployed to test.pypi.org June 24, 2025 23:16 — with GitHub Actions Inactive
@ncfrey ncfrey temporarily deployed to test.pypi.org June 25, 2025 13:59 — with GitHub Actions Inactive
@ncfrey ncfrey temporarily deployed to test.pypi.org June 25, 2025 14:39 — with GitHub Actions Inactive
@ncfrey ncfrey requested review from Copilot and karinazad and removed request for karinazad June 25, 2025 14:46
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR initializes a reinforcement learning training module (rl_training) using UME-based reward functions and GRPO training, adding implementation, tests, examples, and documentation.

  • Add core RL utilities: reward_functions.py and trainers.py under src/lobster/rl_training
  • Add comprehensive unit tests for trainers and reward functions
  • Update documentation (module README, root README, docs), CI config, and examples for RL workflow
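The reward-function shape that TRL-style GRPO training expects is a callable taking a batch of completions and returning one float per completion. A minimal sketch of such a wrapper is below; the helper name and the length-based placeholder scorer are illustrative assumptions, not the PR's actual `UMERewardFunction` logic:

```python
# Sketch of a TRL-style reward callable: fn(completions, **kwargs) -> list[float].
# The UME pseudo-likelihood scoring is stubbed out with a length-based placeholder.

def make_reward_wrapper(score_fn):
    """Wrap a per-sequence scoring function into the signature TRL expects."""
    def reward_fn(completions, **kwargs):
        return [float(score_fn(c)) for c in completions]
    return reward_fn

# Placeholder scorer standing in for UME pseudo-likelihood.
reward = make_reward_wrapper(lambda seq: len(seq) / 100.0)
scores = reward(["CCO", "CC(=O)O"])  # one float per completion
```

The real reward function would replace the lambda with a call into the UME model.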

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/lobster/rl_training/test_trainers.py | Tests for create_ume_grpo_trainer and train_ume_grpo |
| tests/lobster/rl_training/test_reward_functions.py | Tests for detect_modality, compute_pseudo_likelihood, and UMERewardFunction |
| src/lobster/rl_training/trainers.py | Utility functions to create and run GRPO trainers with UME rewards |
| src/lobster/rl_training/reward_functions.py | Modality detection and pseudo-likelihood–based reward functions |
| src/lobster/rl_training/README.md | Quick start and module documentation for rl_training |
| src/lobster/rl_training/__init__.py | Module exports for RL training utilities |
| examples/train_ume_grpo.py | Example script demonstrating UME-based GRPO training |
| examples/generate_synthetic_dataset.py | Script to generate a synthetic molecular/biological dataset |
| docs/RL_TRAINING.md | Detailed guide for RL training workflow |
| pyproject.toml | Add trl and accelerate to dependencies |
| .github/workflows/push.yml | Include --extra trl in CI sync step |
Comments suppressed due to low confidence (4)

src/lobster/rl_training/README.md:61

  • The import references UmeRewardFunction, but the actual class is named UMERewardFunction. Update the import to match the exact class name.
from lobster.rl_training import UmeRewardFunction, detect_modality

docs/RL_TRAINING.md:66

  • The documentation refers to UmeRewardFunction, but the class is declared as UMERewardFunction. Adjust the heading to UMERewardFunction for consistency.
- **`UmeRewardFunction`**: Main reward function class that computes rewards based on UME pseudo-likelihood

README.md:222

  • The quick-start command assumes train_ume_grpo.py is in the working directory, but the example script lives under examples/. Consider updating to python examples/train_ume_grpo.py.
python train_ume_grpo.py

tests/lobster/rl_training/test_reward_functions.py:308

  • Consider adding a test for create_ume_reward_wrapper to ensure it correctly delegates to UMERewardFunction and preserves the function signature expected by TRL.
        assert all(isinstance(l, (float, np.floating)) for l in likelihoods)  # Allow numpy types
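The suggested delegation test could look roughly like the following. Here `create_reward_wrapper` is a simplified stand-in for the PR's `create_ume_reward_wrapper`, and a mock replaces the real `UMERewardFunction`, so the exact constructor arguments are assumptions:

```python
# Hypothetical sketch of the suggested delegation test: verify the wrapper
# forwards completions to the underlying reward object and returns its result.
from unittest.mock import MagicMock

def create_reward_wrapper(reward_obj):
    # Stand-in for create_ume_reward_wrapper: keep the TRL-style signature
    # while delegating the actual scoring.
    def wrapper(completions, **kwargs):
        return reward_obj(completions, **kwargs)
    return wrapper

mock_reward = MagicMock(return_value=[0.5, 0.7])
wrapper = create_reward_wrapper(mock_reward)
result = wrapper(["seq_a", "seq_b"])
mock_reward.assert_called_once_with(["seq_a", "seq_b"])
```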

@ncfrey ncfrey temporarily deployed to test.pypi.org June 25, 2025 14:52 — with GitHub Actions Inactive
@ncfrey ncfrey marked this pull request as ready for review June 25, 2025 15:03

## Step-by-Step Training Process

Collaborator:

Could you add a note that this requires installing with `--extra trl`?

"""Main training function."""
# Load datasets
logger.info("Loading datasets...")
train_dataset = load_from_disk("/data/bucket/freyn6/synthetic_molecular_dataset/train")
Collaborator:

This could be a relative path, since the README says `cd examples`.

val_dataset = load_from_disk("/data/bucket/freyn6/synthetic_molecular_dataset/validation")

# Load Qwen model from local cache to avoid download timeouts
qwen_model_path = "/data/bucket/freyn6/cache/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d"
Collaborator:

same here

- `synthetic_molecular_dataset/` - HuggingFace dataset with train/val/test splits
- `synthetic_molecular_dataset.json` - JSON file for easy inspection

### Step 2: Run UME-based GRPO Training
Collaborator:

Could you add instructions for downloading Qwen? Or switch to the HF model name in the example training code, if that works.
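One way to address both concerns is to prefer the Hugging Face hub name and only use the local snapshot when it exists on disk. A minimal sketch, where the hub name comes from the example script and the helper function is hypothetical:

```python
# Hypothetical sketch: prefer the HF hub name, fall back to a local snapshot
# when it is present (e.g. on an offline cluster node).
import os

HF_NAME = "Qwen/Qwen2-0.5B-Instruct"
LOCAL_SNAPSHOT = "/data/bucket/freyn6/cache/models--Qwen--Qwen2-0.5B-Instruct"

def resolve_model_path(hf_name: str, local_path: str) -> str:
    """Use the local snapshot when present, otherwise the hub name."""
    return local_path if os.path.isdir(local_path) else hf_name

model_path = resolve_model_path(HF_NAME, LOCAL_SNAPSHOT)
```

`from_pretrained` accepts either form, so the rest of the script would not need to change.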

logger = logging.getLogger(__name__)


def detect_modality(text: str) -> Modality:
Collaborator:

Can we move this out to model utils? Or handle it directly in UME when modality is not specified?


# Skip empty or very short sequences
if len(text) < 3:
return Modality.SMILES
Collaborator:

Should this error out instead of defaulting to SMILES?

Collaborator:

Oh I see, SMILES is the default. I wonder if this could be an argument passed to the function: whether to fall back on a default modality or to error out / return None.

Contributor Author:

Yeah, I think erroring out instead of defaulting to SMILES makes sense. I'll change this.

return Modality.SMILES
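The behaviour agreed on above could be sketched roughly as follows: let the caller choose between a fallback modality and raising. The `Modality` enum and the character heuristics here are simplified stand-ins, not Lobster's actual implementation:

```python
# Hypothetical sketch of detect_modality with an explicit fallback argument.
from enum import Enum
from typing import Optional

class Modality(Enum):
    SMILES = "smiles"
    AMINO_ACID = "amino_acid"
    NUCLEOTIDE = "nucleotide"

def detect_modality(text: str, fallback: Optional[Modality] = None) -> Modality:
    """Guess the sequence modality; raise (or fall back) when ambiguous."""
    if len(text) < 3:
        if fallback is not None:
            return fallback
        raise ValueError(f"Sequence too short to detect modality: {text!r}")
    if set(text.upper()) <= set("ACGTU"):
        return Modality.NUCLEOTIDE
    # SMILES-specific punctuation such as branches, bonds, and ring closures.
    if any(c in text for c in "()=#123456789@"):
        return Modality.SMILES
    return Modality.AMINO_ACID
```

Passing `fallback=None` (the default) gives the erroring behaviour the thread settled on, while callers that want the old behaviour can pass `fallback=Modality.SMILES`.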


def compute_pseudo_likelihood(ume_model: UME, sequences: list[str], modality: Modality) -> list[float]:
Collaborator:

This could also be useful in UME itself. I imagine for a lot of applications, people will just want likelihoods.

Contributor Author:

I'll add this as a method.
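Pseudo-likelihood for a masked language model is typically computed by masking each position in turn, scoring the true token under the model, and averaging the log-probabilities. A minimal sketch of that loop, with a toy scorer standing in for the UME model (the real `compute_pseudo_likelihood` takes a `UME` model and a `Modality`):

```python
# Sketch of masked-LM pseudo-likelihood; `token_logprob(tokens, i)` stands in
# for a model call that masks position i and returns log P(true token).
import math

def pseudo_log_likelihood(tokens, token_logprob):
    """Average log P(token_i | tokens with position i masked)."""
    if not tokens:
        return float("-inf")
    total = sum(token_logprob(tokens, i) for i in range(len(tokens)))
    return total / len(tokens)

# Toy stand-in model: uniform distribution over a 4-symbol vocabulary.
uniform = lambda tokens, i: math.log(1 / 4)
score = pseudo_log_likelihood(list("ACGT"), uniform)  # math.log(0.25)
```

As a UME method, the per-position scorer would be a batched forward pass rather than a Python callback, but the averaging structure is the same.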

freyn6 and others added 9 commits June 25, 2025 20:36
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Fix StopIteration in UME.embed_sequences when no parameters exist (testing)
- Update modality detection test to match actual error message
- Add wandb mocking to RL training tests to prevent initialization errors
@ncfrey ncfrey temporarily deployed to test.pypi.org June 26, 2025 00:40 — with GitHub Actions Inactive
@ncfrey ncfrey merged commit 25b9bb1 into main Jun 26, 2025
5 checks passed
@ncfrey ncfrey deleted the n/rl-init-2 branch June 26, 2025 00:43
4 participants