
Implement Deepseek-V3 model skeleton#1315

Merged
wwwjn merged 10 commits into deepseek-v3 from dsv3-model
Jun 18, 2025
Conversation

Contributor

@wwwjn wwwjn commented Jun 17, 2025

Contents

  1. Attention module
  2. MoE module (note: I only implemented the naive routing, not the "node limit routing" strategy)
  3. Deepseek-V3 model
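The "naive routing" mentioned above amounts to a plain top-k selection: each token keeps its k highest router scores and renormalizes them with a softmax over only the selected experts. The following is an illustrative stand-alone sketch in pure Python (hypothetical names), not the torchtitan implementation, which operates on batched tensors and may add bias and auxiliary-loss terms:

```python
import math

def naive_topk_route(scores, k):
    """Pick the top-k experts for one token and softmax-normalize their scores.

    scores: per-expert router logits for a single token.
    Returns (expert_indices, weights). Illustrative sketch only.
    """
    # Sort expert indices by descending logit and keep the top k.
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]
    # Softmax over only the selected logits (numerically stabilized).
    m = max(scores[e] for e in top)
    exps = [math.exp(scores[e] - m) for e in top]
    z = sum(exps)
    return top, [x / z for x in exps]

indices, weights = naive_topk_route([0.1, 2.0, -1.0, 1.0], k=2)
# indices == [1, 3]; weights sum to 1.0
```

The "node limit routing" strategy from the DeepSeek-V3 paper, which restricts each token's experts to a bounded number of nodes, is intentionally not covered by this sketch.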

Reference:

  1. Deepseek-ai: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/model.py
  2. Huggingface: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/blob/main/modeling_deepseek.py
  3. torchtitan/experiment/deepseek-v3
  4. torchtitan/experiment/llama4

TODO

  • Further clean up the DeepseekV3ModelArgs class, remove unused model args
  • Test forward pass w/ torchtitan

@wwwjn wwwjn requested review from fegin and tianyu-l as code owners June 17, 2025 22:57
@wwwjn wwwjn self-assigned this Jun 17, 2025
@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Jun 17, 2025
@wwwjn wwwjn requested a review from H-Huang June 17, 2025 22:57
@wwwjn wwwjn removed the request for review from fegin June 18, 2025 00:28
Member

@H-Huang H-Huang left a comment


LGTM!

from .args import DeepseekV3ModelArgs


# Reference: torchtitan/experiments/llama4/model/
Member


just curious, how come we are using the llama4 one instead of the original?

Contributor Author


Mainly because the llama4 version implements GroupedExperts, which can use _grouped_mm to speed up the expert computation. The Router and MoE implementations should be almost the same; I double-checked the forward pass and made some changes to the llama4 version to align it with the original Deepseek-V3 implementation.
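For context, the speedup from GroupedExperts comes from batching all experts' GEMMs into one grouped call: token indices are grouped by their assigned expert so each expert processes one contiguous batch. Below is a minimal pure-Python sketch of that dispatch pattern (hypothetical names, scalar "weights" standing in for weight matrices); the actual torchtitan code uses PyTorch grouped-matmul kernels:

```python
def grouped_expert_forward(tokens, assignments, expert_weights):
    """Apply each expert's weight to the tokens routed to it.

    tokens: per-token feature values (scalars here, vectors in practice).
    assignments: chosen expert index for each token.
    expert_weights: one multiplier per expert (stand-in for a weight matrix).
    """
    # Group token indices by expert so each expert sees one contiguous batch.
    groups = {e: [] for e in range(len(expert_weights))}
    for i, e in enumerate(assignments):
        groups[e].append(i)
    out = [0.0] * len(tokens)
    # One pass per expert; in the real model this is a single grouped matmul.
    for e, idxs in groups.items():
        for i in idxs:
            out[i] = tokens[i] * expert_weights[e]
    return out

# Tokens 0 and 2 go to expert 1, token 1 to expert 0.
print(grouped_expert_forward([1.0, 2.0, 3.0], [1, 0, 1], [10.0, 100.0]))
# [100.0, 20.0, 300.0]
```

The design point is that grouping turns many small per-expert matmuls into one batched kernel launch, which is where the speedup over a per-expert loop comes from.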

@wwwjn wwwjn merged commit 6c3369a into deepseek-v3 Jun 18, 2025
4 of 5 checks passed
H-Huang pushed a commit to H-Huang/torchtitan that referenced this pull request Jun 26, 2025
wwwjn added a commit that referenced this pull request Jul 1, 2025
wwwjn added a commit that referenced this pull request Jul 1, 2025
wwwjn added a commit that referenced this pull request Jul 2, 2025
@wwwjn wwwjn deleted the dsv3-model branch July 2, 2025 21:24
H-Huang pushed a commit to H-Huang/torchtitan that referenced this pull request Jul 3, 2025
H-Huang pushed a commit to H-Huang/torchtitan that referenced this pull request Jul 8, 2025
wwwjn added a commit that referenced this pull request Jul 8, 2025
wwwjn added a commit that referenced this pull request Jul 10, 2025
4 participants