Add 3D Pinder to Ume datamodule #45

Merged
karinazad merged 6 commits into main from ume-3d-pinder
Mar 13, 2025

Conversation

@karinazad
Collaborator

No description provided.

model_name: UME_medium
vocab_size: 640
model_name: UME_mini
vocab_size: 889

Review comment (Contributor):

TODO: update

"""Samples a single tokenized input from a batch of tokenized inputs.

Meant to be used with tokenized inputs from 3D coordinates latent generator
dataset (e.g. LatentGeneratorPinderIterableDataset) which containts 4 poses

Review comment (Contributor):

"contains"

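To illustrate the docstring under review, here is a minimal sketch of sampling one pose from a multi-pose tokenized input. The batch layout (a dict whose values hold one entry per pose) and the name `sample_single_pose` are assumptions made for illustration; the actual function in this PR is not shown in the conversation and may differ.

```python
import random
from typing import Dict, List, Optional

# Hypothetical layout: every field holds one token list per pose.
TokenizedBatch = Dict[str, List[List[int]]]


def sample_single_pose(
    batch: TokenizedBatch, rng: Optional[random.Random] = None
) -> Dict[str, List[int]]:
    """Pick one pose index at random and return that pose from every field."""
    if not batch:
        raise ValueError("batch must contain at least one field")
    rng = rng or random.Random()
    num_poses = len(next(iter(batch.values())))
    if num_poses == 0:
        raise ValueError("batch must contain at least one pose")
    if any(len(values) != num_poses for values in batch.values()):
        raise ValueError("all fields must have the same number of poses")
    # The same index is used for every field so the fields stay aligned.
    idx = rng.randrange(num_poses)
    return {key: values[idx] for key, values in batch.items()}
```

Passing a seeded `random.Random` makes the choice reproducible, which may be useful in tests.
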
@karinazad karinazad merged commit 5ac44c4 into main Mar 13, 2025
5 checks passed
@karinazad karinazad deleted the ume-3d-pinder branch March 13, 2025 23:35
taylormjs pushed a commit that referenced this pull request Mar 17, 2025
taylormjs pushed a commit that referenced this pull request Mar 18, 2025
taylormjs added a commit that referenced this pull request Mar 18, 2025
* add peer datasets

* fix ruff

* add peer datasets, fix get item for all tasks

* lg tokenizer

* lg tokenizer assets

* lint

* added test and new word level model v bpe

* rename to include coord tokenization explicitly

* ruff tests

* dataset hg

* ruff

* nathans comments

* forgot to add files

* Add 3D Pinder to Ume datamodule (#45)

* Ume tokenizers and dataset sampling options (#46)

Creates mutually-compatible multi-modal Ume tokenizers
Ensures that the vocab size is a multiple of 64
Adds an option to use multiplex sampler in Ume datamodule (in addition to round robin concatenation)
Adds stopping conditions to round robin concatenation

* add peer datasets

* peer callback 1

* add tests

* fix ruff reformat

* fix per-residue task logic

* ruff readd spaces

* ruff check

* remove unnecessary download arg

* ruff checks

* uv update

---------

Co-authored-by: Taylor Joren <joren.taylor@gene.com>
Co-authored-by: Sidney Lisanza <lisanzas@gene.com>
Co-authored-by: karinazad <karina.zadorozhny@gmail.com>
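
Two items from the #46 entry in the commit message above can be sketched briefly: rounding a vocab size up to a multiple of 64, and round-robin iteration with a stopping condition. This is a hedged illustration only. The names `pad_vocab_size` and `round_robin` and the `stop_on_first_exhausted` flag are hypothetical, and the conversation does not say which stopping conditions the datamodule actually implements.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def pad_vocab_size(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`."""
    if vocab_size <= 0 or multiple <= 0:
        raise ValueError("vocab_size and multiple must be positive")
    # Ceiling division using integer arithmetic only.
    return -(-vocab_size // multiple) * multiple


def round_robin(
    iterables: List[Iterable[T]], stop_on_first_exhausted: bool = True
) -> Iterator[T]:
    """Yield one item from each iterable in turn.

    Assumed stopping conditions: stop as soon as any source runs out,
    or keep going until every source is exhausted.
    """
    iterators = [iter(it) for it in iterables]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                item = next(it)
            except StopIteration:
                if stop_on_first_exhausted:
                    return
                continue  # drop the exhausted source, keep the others
            still_active.append(it)
            yield item
        iterators = still_active
```

Under these assumptions, a vocab size of 889 would be padded to 896, while 640 is already a multiple of 64 and stays unchanged.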
ncfrey pushed a commit that referenced this pull request Mar 19, 2025