Add 'from_pretrained' to Ume #113

Merged

karinazad merged 10 commits into main from k/ume-from-pretrained on Jun 18, 2025

Conversation


@karinazad karinazad commented Jun 18, 2025

MR #113: Add 'from_pretrained' to Ume

Overview

This merge request adds a convenient from_pretrained method to the Universal Molecular Encoder (Ume) model, making it easier to load pre-trained models without manually specifying checkpoint paths. This follows the familiar pattern used by other popular model libraries like Hugging Face Transformers.

Key Changes

1. New Constants File

  • File: src/lobster/constants/_ume_models.py
  • Purpose: Defines available pre-trained model checkpoints
  • Models Available:
    • ume-mini-base-12M (12M parameters)
    • ume-medium-base-480M (480M parameters)
    • ume-large-base-740M (740M parameters)
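The contents of the constants file are not shown in this PR description, but a checkpoint registry of this kind is commonly a name-to-path mapping plus a resolver that fails fast on unknown names. The dictionary name, placeholder paths, and `get_ume_checkpoint` helper below are all assumptions for illustration, not the actual code in `_ume_models.py`:

```python
# Hypothetical sketch of a checkpoint registry; the real dict name and
# S3 paths in src/lobster/constants/_ume_models.py are not shown in this PR.

UME_CHECKPOINTS = {
    "ume-mini-base-12M": "s3://<bucket>/ume/ume-mini-base-12M.ckpt",
    "ume-medium-base-480M": "s3://<bucket>/ume/ume-medium-base-480M.ckpt",
    "ume-large-base-740M": "s3://<bucket>/ume/ume-large-base-740M.ckpt",
}


def get_ume_checkpoint(model_name: str) -> str:
    """Resolve a model name to its checkpoint path, raising on unknown names."""
    try:
        return UME_CHECKPOINTS[model_name]
    except KeyError:
        raise ValueError(
            f"Unknown model {model_name!r}. Available: {sorted(UME_CHECKPOINTS)}"
        ) from None
```

Keeping the registry in a constants module means new model variants can be added in one place without touching the loading logic.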

2. Enhanced Ume Model

  • File: src/lobster/model/_ume.py
  • New Method: from_pretrained() class method
  • Features:
    • Automatic model name resolution to checkpoint paths
    • Device placement control (cpu/cuda)
    • Flash attention configuration
    • Custom cache directory support
    • Automatic retry on corrupted downloads
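The exact signature is not shown in the PR description; the classmethod pattern the feature list describes might be sketched roughly as below. The parameter names (`device`, `use_flash_attn`, `cache_dir`), the inline registry, and the constructor are illustrative assumptions, not the actual API in `src/lobster/model/_ume.py`:

```python
# Hypothetical sketch of the from_pretrained pattern described above;
# parameter names and the constructor are placeholders, not the real Ume API.
from pathlib import Path
from typing import Optional


class Ume:
    def __init__(self, checkpoint_path: str, device: str = "cpu",
                 use_flash_attn: bool = False):
        self.checkpoint_path = checkpoint_path
        self.device = device
        self.use_flash_attn = use_flash_attn

    @classmethod
    def from_pretrained(
        cls,
        model_name: str,
        *,
        device: str = "cpu",
        use_flash_attn: bool = False,
        cache_dir: Optional[str] = None,
    ) -> "Ume":
        # 1. Resolve the model name to a checkpoint file (placeholder registry).
        registry = {"ume-mini-base-12M": "ume-mini-base-12M.ckpt"}
        if model_name not in registry:
            raise ValueError(f"Unknown model: {model_name}")
        # 2. Locate the checkpoint in the cache directory (the real code would
        #    download from S3 here, retrying once on a corrupted file).
        cache = Path(cache_dir or "~/.cache/ume").expanduser()
        local_path = cache / registry[model_name]
        # 3. Load weights and move to the requested device (omitted in this sketch).
        return cls(str(local_path), device=device, use_flash_attn=use_flash_attn)
```

A keyword-only signature like this keeps the one positional argument (the model name) unambiguous while leaving room for more loading options later.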

3. Checkpoint Utilities

  • File: src/lobster/model/_utils_checkpoint.py
  • Purpose: Handles S3 downloads and checkpoint loading with error recovery
  • Features:
    • Automatic download from S3
    • Corruption detection and recovery
    • Proper error handling for credential issues
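The "corruption detection and recovery" behavior described above typically follows a download-load-retry loop: if loading the cached file fails, delete it, re-download once, and load again. The helper below is a generic sketch of that pattern, not the actual code in `_utils_checkpoint.py` (which works against S3 rather than injected callables):

```python
# Hypothetical sketch of the download/corruption-recovery pattern described
# above; the real S3 helpers in _utils_checkpoint.py are not shown in this PR.
import os
from typing import Callable


def load_with_retry(
    path: str,
    download: Callable[[str], None],
    load: Callable[[str], object],
) -> object:
    """Download `path` if missing, then load it; if loading fails
    (e.g. a truncated download), delete the cached copy and retry once."""
    if not os.path.exists(path):
        download(path)
    try:
        return load(path)
    except Exception:
        # Likely a corrupted or partial download: clear the cache and retry.
        os.remove(path)
        download(path)
        return load(path)
```

Retrying exactly once distinguishes a transient bad download from a persistent problem (such as missing credentials), which should surface to the user as an error rather than loop forever.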

4. Comprehensive Testing

  • Files:
    • tests/lobster/model/test__ume.py - Tests for from_pretrained method
    • tests/lobster/model/test__utils_checkpoint.py - Tests for checkpoint utilities
  • Coverage: Unit tests for all new functionality including error cases
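An error-case unit test of the kind mentioned above might look like the following standalone sketch. The `resolve_checkpoint` stand-in is hypothetical; the actual tests in `test__ume.py` exercise the real lobster API (and would normally use pytest idioms such as `pytest.raises`):

```python
# Hypothetical sketch of an error-case unit test; `resolve_checkpoint` is a
# stand-in for the real name-resolution logic, which is not shown in this PR.

def resolve_checkpoint(model_name: str) -> str:
    """Stand-in for the name-to-checkpoint resolution under test."""
    known = {"ume-mini-base-12M": "ume-mini-base-12M.ckpt"}
    if model_name not in known:
        raise ValueError(f"Unknown model: {model_name}")
    return known[model_name]


def test_unknown_model_name_raises():
    # An unrecognized model name should fail fast with a clear error message.
    try:
        resolve_checkpoint("ume-nonexistent-1B")
    except ValueError as err:
        assert "Unknown model" in str(err)
    else:
        raise AssertionError("expected ValueError for unknown model name")
```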

Usage Examples

Basic Usage

```python
from lobster.model import Ume

# Load a pre-trained model
ume = Ume.from_pretrained("ume-mini-base-12M")

# Check model properties
print(f"Supported modalities: {ume.modalities}")
print(f"Vocab size: {len(ume.get_vocab())}")
print(f"Embedding dimension: {ume.embedding_dim}")

# Protein sequences
protein_sequences = ["MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"]
protein_embeddings = ume.embed_sequences(protein_sequences, modality="amino_acid")
```

Security Notes

  • Models are currently only available to Prescient Design members
  • S3 credentials required for download
  • Clear error messages for unauthorized access attempts

Future Enhancements

  • Support for external users (planned)
  • Additional model variants
  • Integration with Hugging Face Hub
  • More sophisticated caching strategies

@karinazad karinazad requested a review from ncfrey June 18, 2025 01:47
@karinazad karinazad merged commit d695dfa into main Jun 18, 2025
5 checks passed
@karinazad karinazad deleted the k/ume-from-pretrained branch June 18, 2025 18:02
