Ume datamodule, iterable datasets, concat and multiplexed datasets, throughput + tokens per second callback #39

Merged
karinazad merged 155 commits into main from ume-datamodule
Mar 3, 2025

Conversation

@karinazad
Collaborator

No description provided.

max_length: 512
tokenizer_dir: pmlm_tokenizer
embedding_layer: linear_pos
hidden_size: 252
Contributor

Do we need to drop these?

Collaborator Author

These were dropped from the config because, when we provide the model config name (mini, medium), the model would otherwise receive two hidden_size args: one from the named config and one from this file.
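
For readers unfamiliar with this failure mode, here is a minimal sketch of how a duplicate `hidden_size` can arise. It is not the code from this PR: `MODEL_PRESETS`, `build_model`, and `build_from_preset` are hypothetical names, and the preset values are placeholders. It assumes the named config is expanded into keyword arguments alongside whatever the YAML file supplies.

```python
# Hypothetical preset table; names and values are illustrative only.
MODEL_PRESETS = {
    "mini": {"hidden_size": 128},
    "medium": {"hidden_size": 512},
}


def build_model(hidden_size: int, max_length: int = 512) -> dict:
    """Stand-in for a model constructor; returns the resolved settings."""
    return {"hidden_size": hidden_size, "max_length": max_length}


def build_from_preset(name: str, **yaml_kwargs) -> dict:
    """Expand a named preset and merge it with kwargs read from the YAML config."""
    # If yaml_kwargs also contains hidden_size, Python raises TypeError here,
    # because the same keyword argument is supplied twice.
    return build_model(**MODEL_PRESETS[name], **yaml_kwargs)


if __name__ == "__main__":
    # Works: hidden_size comes only from the preset.
    print(build_from_preset("mini", max_length=512))

    # Fails: hidden_size comes from both the preset and the YAML kwargs.
    try:
        build_from_preset("mini", hidden_size=252, max_length=512)
    except TypeError as err:
        print(f"TypeError: {err}")
```

Under that assumption, removing `hidden_size` from the YAML leaves the named config as the single source for it, which matches the explanation above.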

2 participants