
Ume fix perplexity device #68

Merged

karinazad merged 6 commits into main from ume-fix-perplexity-device on May 2, 2025

Conversation

@karinazad (Collaborator)

No description provided.

@karinazad karinazad temporarily deployed to test.pypi.org May 2, 2025 14:29 — with GitHub Actions
@ncfrey ncfrey requested a review from Copilot May 2, 2025 18:40
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR changes how perplexity metrics are stored and accessed in the UME model during training and validation. Key changes include replacing the metrics dictionary with dynamic attributes registered via setattr, and adjusting the logging of perplexity metrics accordingly.

Comments suppressed due to low confidence (1)

src/lobster/model/_ume.py:146

  • [nitpick] Using attribute names that contain a slash (e.g., "train_perplexity/{modality.value}") is unconventional and may lead to confusion when accessing these attributes. Consider a valid identifier format, such as replacing the slash with an underscore.

```python
setattr(self, f"train_perplexity/{modality.value}", Perplexity(ignore_index=-100))

metric = getattr(self, metric_name)
metric(logits_reshaped[mask], labels_reshaped[mask])

self.log(metric_name, metric, sync_dist=True)
```
Copilot AI commented on May 2, 2025


The updated logging call no longer specifies the on_step parameter, which changes the logging behavior compared to the previous implementation. If this change is intentional, please document the rationale to keep the logging behavior consistent.

Suggested change

```diff
- self.log(metric_name, metric, sync_dist=True)
+ self.log(metric_name, metric, on_step=True, sync_dist=True)
```

@karinazad karinazad temporarily deployed to test.pypi.org May 2, 2025 18:48 — with GitHub Actions
@karinazad karinazad temporarily deployed to test.pypi.org May 2, 2025 18:50 — with GitHub Actions
@karinazad karinazad temporarily deployed to test.pypi.org May 2, 2025 20:02 — with GitHub Actions
@karinazad karinazad merged commit dc040c0 into main May 2, 2025
5 checks passed
@karinazad karinazad deleted the ume-fix-perplexity-device branch May 2, 2025 20:17
taylormjs pushed a commit that referenced this pull request May 6, 2025
* pplx as attr

* pplx as attr

* pplx

* comments

* on step

* comment
karinazad added a commit that referenced this pull request May 14, 2025
* peer fixes, add evaluate method

* dataloader checkpoint callback (#60)

* dataloader callback

* utils

* ume

* gitignore dev

* tests

* update flash attention wheels (#61)

* lock

* torch 2.5

* torch 2.5

* part

* .env

* unpin flash attn (#62)

* fix scheduler params (#64)

* scheduler

* fix scheduler

* fix scheduler

* Add AtomicaDataset (#63)

Processed Atomica interactions dataset

* Ume conversion/interaction tokenizer + fix SMILES and nucleotide tokenizers (#65)

Add two special tokens, <convert> and <interact>, for later stages of Ume training. They are intended to be used roughly like this:
[CLS]  PROT_SEQ  [SEP] <convert> PROT_STRUCT(masked)  [SEP]
[CLS]  PROT_SEQ  [SEP] <interact> SMILES(masked)  [SEP]
Extend UmeTokenizerTransform to handle dual modalities.
Rename the Ume embedding method and allow embedding from existing input_ids.
Fix existing tokenizers:

Add a lowercase normalizer to the nucleotide tokenizer (the OG2 dataset contains a mix of upper- and lowercase letters).
BPE handled SMILES tokenization incorrectly; switch to WordLevel (see the sketch below).

* Ume SMILES tokenizer fix (#66)

* tokenizer

* fix tests

* lowercase normalizer for nt

* tests

* remove mod conv dataset

* embed

* Test

* merge 2mod into UmeTokenizerTransform

* fix tests

* all

* type hints

* docstrings

* tests

* fix SMILES tokenizer

* switch all tokenizer to BPE

* Revert "switch all tokenizer to BPE"

This reverts commit 367e77d.

* tok

* fix SMILES tokenizer

* remove print statement

* Ume perplexity logging (#67)

* pplx

* tests

* src

* ignore torchmetrics warnings

* docstrings

* docstrings

* Update README.md (#69)

* Ume fix perplexity device (#68)

* pplx as attr

* pplx as attr

* pplx

* comments

* on step

* comment

* update tests, fix ruff

* ruff

* ruff ruff

* Add <cls_modality> to Ume tokenizers (#71)

* add <cls_modality> tokens

* add <cls_modality> tokens

* docstring

* RNS metric implementation  (#73)

* add <cls_modality> tokens

* add <cls_modality> tokens

* modality embeddings

* module dict

* embeddings

* tests

* modality and device

* rank zero only

* rank zero

* fix back modality mask

* sync dist

* RNS implementation

* restore from main

* restore

* docstrings

* docstrings

* review

* test

* Ume modality-specific embeddings (#72)

* add <cls_modality> tokens

* add <cls_modality> tokens

* modality embeddings

* module dict

* embeddings

* tests

* modality and device

* rank zero only

* rank zero

* fix back modality mask

* sync dist

* add conversion transforms (#74)

* add initial smiles to peptide and peptide to smiles transforms

* remove smiles -> * transforms and touch up conversion functions

* rename

* add option to randomize smiles and caps

---------

Co-authored-by: Colin Grambow <grambowc@gene.com>

* fix def pad token, replace process_and_embed w/ ume.embed

* update tests w -100 pad token

---------

Co-authored-by: Taylor Joren <joren.taylor@gene.com>
Co-authored-by: Karina Zadorozhny <karina.zadorozhny@gmail.com>
Co-authored-by: Nathan Frey <ncfrey@users.noreply.github.com>
Co-authored-by: Colin Grambow <17198155+cgrambow@users.noreply.github.com>
Co-authored-by: Colin Grambow <grambowc@gene.com>
