
Implement Deepseek-V3 model skeleton#1315

Merged
wwwjn merged 10 commits into deepseek-v3 from dsv3-model
Jun 18, 2025
Conversation

Contributor

@wwwjn wwwjn commented Jun 17, 2025

Contents

  1. Attention module
  2. MoE module (note: I only implemented the naive routing, not the "node limit routing" strategy)
  3. Deepseek-V3 model
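The "naive routing" mentioned above amounts to a plain top-k selection: each token keeps its k highest router scores and renormalizes them with a softmax over only the selected experts. The following is an illustrative stand-alone sketch in pure Python (hypothetical names), not the torchtitan implementation, which operates on batched tensors and may add bias and auxiliary-loss terms:

```python
import math

def naive_topk_route(scores, k):
    """Pick the top-k experts for one token and softmax-normalize their scores.

    scores: per-expert router logits for a single token.
    Returns (expert_indices, weights). Illustrative sketch only.
    """
    # Sort expert indices by descending logit and keep the top k.
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]
    # Softmax over only the selected logits (numerically stabilized).
    m = max(scores[e] for e in top)
    exps = [math.exp(scores[e] - m) for e in top]
    z = sum(exps)
    return top, [x / z for x in exps]

indices, weights = naive_topk_route([0.1, 2.0, -1.0, 1.0], k=2)
# indices == [1, 3]; weights sum to 1.0
```

The "node limit routing" strategy from the DeepSeek-V3 paper, which restricts each token's experts to a bounded number of nodes, is intentionally not covered by this sketch.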

Reference:

  1. Deepseek-ai: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/model.py
  2. Huggingface: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/blob/main/modeling_deepseek.py
  3. torchtitan/experiment/deepseek-v3
  4. torchtitan/experiment/llama4

TODO

  • Further clean up the DeepseekV3ModelArgs class, remove unused model args
  • Test forward pass w/ torchtitan

@wwwjn wwwjn requested review from fegin and tianyu-l as code owners June 17, 2025 22:57
@wwwjn wwwjn self-assigned this Jun 17, 2025
@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Jun 17, 2025
@wwwjn wwwjn requested a review from H-Huang June 17, 2025 22:57
@wwwjn wwwjn removed the request for review from fegin June 18, 2025 00:28
Member

@H-Huang H-Huang left a comment


LGTM!

from .args import DeepseekV3ModelArgs


# Reference: torchtitan/experiments/llama4/model/
Member


just curious, how come we are using the llama4 one instead of the original?

Contributor Author


Mainly because the llama4 version implements GroupedExperts, which can use _grouped_mm to speed up the expert computation. The Router and MoE implementations should be almost the same; I double-checked the forward pass and made some changes to the llama4 version to align it with the original Deepseek-V3 implementation.
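For context, the speedup from GroupedExperts comes from batching all experts' GEMMs into one grouped call: token indices are grouped by their assigned expert so each expert processes one contiguous batch. Below is a minimal pure-Python sketch of that dispatch pattern (hypothetical names, scalar "weights" standing in for weight matrices); the actual torchtitan code uses PyTorch grouped-matmul kernels:

```python
def grouped_expert_forward(tokens, assignments, expert_weights):
    """Apply each expert's weight to the tokens routed to it.

    tokens: per-token feature values (scalars here, vectors in practice).
    assignments: chosen expert index for each token.
    expert_weights: one multiplier per expert (stand-in for a weight matrix).
    """
    # Group token indices by expert so each expert sees one contiguous batch.
    groups = {e: [] for e in range(len(expert_weights))}
    for i, e in enumerate(assignments):
        groups[e].append(i)
    out = [0.0] * len(tokens)
    # One pass per expert; in the real model this is a single grouped matmul.
    for e, idxs in groups.items():
        for i in idxs:
            out[i] = tokens[i] * expert_weights[e]
    return out

# Tokens 0 and 2 go to expert 1, token 1 to expert 0.
print(grouped_expert_forward([1.0, 2.0, 3.0], [1, 0, 1], [10.0, 100.0]))
# [100.0, 20.0, 300.0]
```

The design point is that grouping turns many small per-expert matmuls into one batched kernel launch, which is where the speedup over a per-expert loop comes from.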

@wwwjn wwwjn merged commit 6c3369a into deepseek-v3 Jun 18, 2025
4 of 5 checks passed
H-Huang pushed a commit to H-Huang/torchtitan that referenced this pull request Jun 26, 2025
wwwjn added a commit that referenced this pull request Jul 1, 2025
wwwjn added a commit that referenced this pull request Jul 1, 2025
wwwjn added a commit that referenced this pull request Jul 2, 2025
@wwwjn wwwjn deleted the dsv3-model branch July 2, 2025 21:24
H-Huang pushed a commit to H-Huang/torchtitan that referenced this pull request Jul 3, 2025
H-Huang pushed a commit to H-Huang/torchtitan that referenced this pull request Jul 8, 2025
wwwjn added a commit that referenced this pull request Jul 8, 2025
wwwjn added a commit that referenced this pull request Jul 10, 2025
4 participants