
Separate TransformerEmbedding layer #33

Merged
wconstab merged 1 commit into main from whc/modular on Feb 2, 2024
Conversation

Contributor

@wconstab wconstab commented Feb 2, 2024

Make it easier to chop Transformer into pieces for PP
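The point of pulling the embedding out into its own top-level module is that pipeline parallelism (PP) can then form stages out of whole modules, with the embedding living cleanly on stage 0. A minimal, torch-free sketch of that idea (all names here are illustrative, not taken from the PR):

```python
def split_into_stages(layer_names, num_stages):
    """Partition an ordered list of layer names into contiguous
    pipeline stages of near-equal size (hypothetical helper)."""
    n = len(layer_names)
    base, rem = divmod(n, num_stages)
    stages, start = [], 0
    for i in range(num_stages):
        size = base + (1 if i < rem else 0)  # spread the remainder over early stages
        stages.append(layer_names[start:start + size])
        start += size
    return stages

# With the embedding as its own top-level layer, stage boundaries
# fall between whole modules instead of cutting through one:
layers = ["embedding"] + [f"block_{i}" for i in range(6)] + ["norm", "output"]
print(split_into_stages(layers, 3))
# → [['embedding', 'block_0', 'block_1'], ['block_2', 'block_3', 'block_4'], ['block_5', 'norm', 'output']]
```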

@wconstab wconstab requested a review from wanchaol February 2, 2024 01:07
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Feb 2, 2024
Contributor

@wanchaol wanchaol left a comment


lgtm!

```python
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


class TransformerEmbedding(nn.Module):
```
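The quoted `return` line is the SwiGLU feed-forward that precedes the new class in the diff. A self-contained sketch of such a block, reconstructed from that one line (the `__init__` signature and dimensions are assumptions, not copied from the repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForward(nn.Module):
    """Hypothetical SwiGLU feed-forward matching the quoted forward line."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu-gated elementwise product, then project back down
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```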
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe call it RotaryEmbedding, as it does not only do plain embedding but also computes the freqs_cis?
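The reviewer's point is that the new module bundles the rotary frequency table with the token embedding. For context, a sketch of the standard Llama-style `freqs_cis` precomputation that such a module would carry (assuming the usual formulation; not copied from this repo):

```python
import torch

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> torch.Tensor:
    """Precompute complex rotary frequencies for positions [0, end).

    Returns a (end, dim // 2) complex tensor of unit-magnitude phasors.
    """
    # Per-pair inverse frequencies: theta^(-2i/dim) for i in [0, dim/2)
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: dim // 2].float() / dim))
    t = torch.arange(end).float()          # position indices
    angles = torch.outer(t, freqs)         # (end, dim // 2) phase angles
    return torch.polar(torch.ones_like(angles), angles)  # e^{i * angle}
```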

Make it easier to chop Transformer into pieces for PP
@wconstab wconstab merged commit b99af33 into main Feb 2, 2024
@wconstab wconstab deleted the whc/modular branch February 2, 2024 18:46
wconstab added a commit that referenced this pull request Apr 10, 2024
Avoid diverging the model structure (FQNs and checkpoint
interoperability) with similar models.

This reverts commit b99af33.

ghstack-source-id: 0f93b1b
Pull Request resolved: #214
lessw2020 pushed a commit that referenced this pull request Apr 18, 2024
Make it easier to chop Transformer into pieces for PP
lessw2020 pushed a commit that referenced this pull request Apr 18, 2024
philippguevorguian referenced this pull request in YerevaNN/YNNtitan Aug 17, 2024
Avoid diverging the model structure (FQNs and checkpoint
interoperability) with similar models.

This reverts commit f30202c.

ghstack-source-id: 9811f5f
Pull Request resolved: pytorch#214
payoto pushed a commit to graphcore-research/torchtitan-fork that referenced this pull request Feb 7, 2025
