add conversion transforms by cgrambow · Pull Request #74 · prescient-design/lobster

cgrambow · 2025-05-13T16:15:11Z

No description provided.

taylormjs

Nice!

* add initial smiles to peptide and peptide to smiles transforms
* remove smiles -> * transforms and touch up conversion functions
* rename
* add option to randomize smiles and caps

---------

Co-authored-by: Colin Grambow <grambowc@gene.com>
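For readers unfamiliar with the peptide to SMILES direction named in the commit list above, here is a minimal sketch of the idea. It is not the code merged in this PR: the function name `peptide_to_smiles`, the `capped` flag, and the `RESIDUE_FRAGMENTS` table are hypothetical, only four residues are covered, and reading "caps" as an N-terminal acetyl plus a C-terminal amide is an assumption. SMILES randomization is left out because it would normally need a cheminformatics toolkit.

```python
# Hypothetical illustration only; not the lobster implementation.
# Each fragment is an L-amino-acid residue written N-to-C,
# without the C-terminal hydroxyl, so fragments can be concatenated.
RESIDUE_FRAGMENTS = {
    "G": "NCC(=O)",              # glycine (achiral)
    "A": "N[C@@H](C)C(=O)",      # L-alanine
    "S": "N[C@@H](CO)C(=O)",     # L-serine
    "V": "N[C@@H](C(C)C)C(=O)",  # L-valine
}


def peptide_to_smiles(sequence: str, capped: bool = False) -> str:
    """Build a linear peptide SMILES by joining residue fragments.

    With capped=True the N-terminus gets an acetyl group and the
    C-terminus becomes a primary amide (an assumed meaning of "caps").
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    try:
        backbone = "".join(RESIDUE_FRAGMENTS[aa] for aa in sequence.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    if capped:
        return "CC(=O)" + backbone + "N"
    # Free peptide: close the C-terminus with the carboxylic acid oxygen.
    return backbone + "O"
```

Calling `peptide_to_smiles("GA")` joins the glycine and alanine fragments and appends the terminal oxygen; a production transform would more likely delegate to a toolkit such as RDKit to cover all residues and validate the result.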

* peer fixes, add evaluate method
* dataloader checkpoint callback (#60)
* dataloader callback
* utils
* ume
* gitignore dev
* tests
* update flash attention wheels (#61)
* lock
* torch 2.5
* torch 2.5
* part
* .env
* unpin flash attn (#62)
* fix scheduler params (#64)
* scheduler
* fix scheduler
* fix scheduler
* Add AtomicaDataset (#63)

  Processed Atomica interactions dataset

* Ume conversion/interaction tokenizer + fix SMILES and nucleotide tokenizers (#65)

  add two special tokens: <convert> and <interact> for later stages of Ume training: will be used as this: (or something like that)

  [CLS] PROT_SEQ [SEP] <convert> PROT_STRUCT(masked) [SEP]
  [CLS] PROT_SEQ [SEP] <interact> SMILES(masked) [SEP]

  extend functionality of UmeTokenizerTransform to handle dual modalities

  change the name of Ume embedding method and allow embedding from existing input_ids

  fix existing tokenizers:
  add lowercase normalized to nucleotide tokenizer (OG2 dataset contains a mix of upper and lowercase letters)
  BPE handled SMILES tokenization incorrectly, switch to WordLevel

* Ume SMILES tokenizer fix (#66)
* tokenizer
* fix tests
* lowercase normalizer for nt
* tests
* remove mod conv dataset
* embed
* Test
* merge 2mod into UmeTokenizerTransform
* fix tests
* all
* type hints
* docstrings
* tests
* fix SMILES tokenizer
* switch all tokenizer to BPE
* Revert "switch all tokenizer to BPE"

  This reverts commit 367e77d.

* tok
* fix SMILES tokenizer
* remove print statement
* Ume perplexity logging (#67)
* pplx
* tests
* src
* ignore torchmetrics warnings
* docstrings
* docstrings
* Update README.md (#69)
* Ume fix perplexity device (#68)
* pplx as attr
* pplx as attr
* pplx
* comments
* on step
* comment
* update tests, fix ruff
* ruff
* ruff ruff
* Add <cls_modality> to Ume tokenizers (#71)
* add <cls_modality> tokens
* add <cls_modality> tokens
* docstring
* RNS metric implementation (#73)
* add <cls_modality> tokens
* add <cls_modality> tokens
* modality embeddings
* module dict
* embeddings
* tests
* modality and device
* rank zero only
* rank zero
* fix back modality mask
* sync dist
* RNS implementation
* restore from main
* restore
* docstrings
* docstrings
* review
* test
* Ume modality-specific embeddings (#72)
* add <cls_modality> tokens
* add <cls_modality> tokens
* modality embeddings
* module dict
* embeddings
* tests
* modality and device
* rank zero only
* rank zero
* fix back modality mask
* sync dist
* add conversion transforms (#74)
* add initial smiles to peptide and peptide to smiles transforms
* remove smiles -> * transforms and touch up conversion functions
* rename
* add option to randomize smiles and caps

---------

Co-authored-by: Colin Grambow <grambowc@gene.com>

* fix def pad token, replace process_and_embed w/ ume.embed
* update tests w -100 pad token

---------

Co-authored-by: Taylor Joren <joren.taylor@gene.com>
Co-authored-by: Karina Zadorozhny <karina.zadorozhny@gmail.com>
Co-authored-by: Nathan Frey <ncfrey@users.noreply.github.com>
Co-authored-by: Colin Grambow <17198155+cgrambow@users.noreply.github.com>
Co-authored-by: Colin Grambow <grambowc@gene.com>
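The dual-modality layout described in the #65 entry above (`[CLS] PROT_SEQ [SEP] <convert> PROT_STRUCT(masked) [SEP]`) can be sketched in plain Python. This is only an illustration of the token ordering, not `UmeTokenizerTransform` itself: the function name `build_dual_modality_tokens`, the `<mask>` placeholder, and the choice to mask every token of the second modality are assumptions made for the example.

```python
from typing import List, Sequence

# Task tokens named in the commit message; everything else here is hypothetical.
TASK_TOKENS = ("<convert>", "<interact>")
MASK_TOKEN = "<mask>"  # assumed placeholder; the real mask token may differ


def build_dual_modality_tokens(
    first: Sequence[str],
    second: Sequence[str],
    task_token: str,
    mask_second: bool = True,
) -> List[str]:
    """Lay out two modalities as [CLS] first [SEP] task second [SEP]."""
    if task_token not in TASK_TOKENS:
        raise ValueError(f"task_token must be one of {TASK_TOKENS}")
    # Optionally hide the second modality so a model has to predict it.
    second_part = [MASK_TOKEN] * len(second) if mask_second else list(second)
    return ["[CLS]", *first, "[SEP]", task_token, *second_part, "[SEP]"]
```

For example, `build_dual_modality_tokens(list("MK"), list("CO"), "<interact>")` places the protein residues first and a fully masked SMILES segment after the `<interact>` token.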

Colin Grambow added 2 commits May 13, 2025 12:13

add initial smiles to peptide and peptide to smiles transforms

6541937

remove smiles -> * transforms and touch up conversion functions

ae57310

cgrambow temporarily deployed to test.pypi.org May 13, 2025 18:33 — with GitHub Actions Inactive

rename

fadcd0c

cgrambow temporarily deployed to test.pypi.org May 13, 2025 18:34 — with GitHub Actions Inactive

add option to randomize smiles and caps

17e137a

cgrambow temporarily deployed to test.pypi.org May 13, 2025 20:01 — with GitHub Actions Inactive

taylormjs approved these changes May 13, 2025

taylormjs merged commit 405f955 into main May 13, 2025
5 checks passed

taylormjs deleted the conversion-transforms branch May 13, 2025 21:20

taylormjs merged 4 commits into main from conversion-transforms
