Add beta support for jsd by Tcc0403 · Pull Request #290 · linkedin/Liger-Kernel

Tcc0403 · 2024-10-01T22:15:35Z

Summary

Resolves #278.

Details

Forward:

$$\begin{align} JSD(X, Y, \beta) &= JSD_{\beta}(P \Vert Q) \\ &= \beta\, KL(P \Vert \beta P + (1-\beta)Q) + (1-\beta)\, KL(Q \Vert \beta P + (1-\beta)Q) \\ &= \sum_i \left( \beta\, P_i Y_i + (1-\beta)\, Q_i X_i - M_i \log M_i \right) \end{align}$$

where $X=\log Q$, $Y=\log P$ and $M=\beta P + (1-\beta)Q$.
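As a sanity check on the last line of the forward derivation, here is a minimal pure-Python sketch (not the Triton kernel from this PR) that evaluates both the KL-based definition and the closed form and compares them. The function names `kl`, `jsd_beta`, and `jsd_beta_from_kl` are illustrative and do not come from Liger-Kernel.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def jsd_beta(log_q, log_p, beta=0.5):
    """Closed form: sum_i beta*P_i*Y_i + (1-beta)*Q_i*X_i - M_i*log(M_i).

    log_q holds X = log Q and log_p holds Y = log P.
    """
    total = 0.0
    for x, y in zip(log_q, log_p):
        q, p = math.exp(x), math.exp(y)
        m = beta * p + (1.0 - beta) * q  # mixture M = beta*P + (1-beta)*Q
        total += beta * p * y + (1.0 - beta) * q * x - m * math.log(m)
    return total


def jsd_beta_from_kl(log_q, log_p, beta=0.5):
    """Definition: beta*KL(P || M) + (1-beta)*KL(Q || M)."""
    q = [math.exp(x) for x in log_q]
    p = [math.exp(y) for y in log_p]
    m = [beta * pi + (1.0 - beta) * qi for pi, qi in zip(p, q)]
    return beta * kl(p, m) + (1.0 - beta) * kl(q, m)


if __name__ == "__main__":
    # Arbitrary example distributions, chosen only for illustration.
    log_p = [math.log(v) for v in (0.7, 0.2, 0.1)]
    log_q = [math.log(v) for v in (0.1, 0.3, 0.6)]
    print(jsd_beta(log_q, log_p, 0.3), jsd_beta_from_kl(log_q, log_p, 0.3))
```

The two functions should agree up to floating-point error for any strictly positive distributions and any $0 < \beta < 1$.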

Gradients:

$$\frac{\partial}{\partial X_i} JSD(X, Y, \beta) = (1-\beta)\, Q_i (X_i - \log M_i)$$
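The gradient formula can be checked numerically. The sketch below is pure Python and is not the PR's kernel code; it compares the analytic expression against central finite differences of the closed-form loss. It assumes each $X_i$ is treated as a free variable with $Q_i = e^{X_i}$ (no normalization constraint), which is how the formula above is obtained. The names `jsd_beta`, `jsd_grad_x`, and `numeric_grad_x` are hypothetical.

```python
import math


def jsd_beta(log_q, log_p, beta):
    """Closed-form loss: sum_i beta*P_i*Y_i + (1-beta)*Q_i*X_i - M_i*log(M_i)."""
    total = 0.0
    for x, y in zip(log_q, log_p):
        q, p = math.exp(x), math.exp(y)
        m = beta * p + (1.0 - beta) * q
        total += beta * p * y + (1.0 - beta) * q * x - m * math.log(m)
    return total


def jsd_grad_x(log_q, log_p, beta):
    """Analytic gradient w.r.t. X: (1-beta) * Q_i * (X_i - log M_i)."""
    grads = []
    for x, y in zip(log_q, log_p):
        q, p = math.exp(x), math.exp(y)
        m = beta * p + (1.0 - beta) * q
        grads.append((1.0 - beta) * q * (x - math.log(m)))
    return grads


def numeric_grad_x(log_q, log_p, beta, eps=1e-6):
    """Central finite differences of jsd_beta w.r.t. each X_i."""
    grads = []
    for i in range(len(log_q)):
        plus = list(log_q)
        minus = list(log_q)
        plus[i] += eps
        minus[i] -= eps
        grads.append((jsd_beta(plus, log_p, beta) - jsd_beta(minus, log_p, beta)) / (2 * eps))
    return grads


if __name__ == "__main__":
    # Arbitrary example distributions, chosen only for illustration.
    log_p = [math.log(v) for v in (0.7, 0.2, 0.1)]
    log_q = [math.log(v) for v in (0.1, 0.3, 0.6)]
    print(jsd_grad_x(log_q, log_p, 0.3))
    print(numeric_grad_x(log_q, log_p, 0.3))
```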

Testing Done

Hardware Type: H100
- run `make test` to ensure correctness
- run `make checkstyle` to ensure code style
- run `make test-convergence` to ensure convergence

Tcc0403 · 2024-10-02T09:05:09Z


-    def forward(self, p, q):
-        return LigerJSDFunction.apply(p, q)
+    def forward(self, log_q, log_p):


This is the correct order: input and target (student and teacher), respectively. Would it be too confusing?

Tcc0403 · 2024-10-02T09:32:04Z

@qingquansong @yundai424 ready for review!

qingquansong

LGTM in general! In case you're interested, I think one good piece of future work is to make those KL or JSD losses similar to the fused CE loss: feed the teacher and student models' last projection layers to the kernel and fuse them with the losses. Here the teacher weight does not need grad and the student will need grad.

qingquansong · 2024-10-02T17:52:54Z


-    def forward(self, p, q):
-        return LigerJSDFunction.apply(p, q)
+    def forward(self, log_q, log_p):


yeah, the name is a bit confusing, or we can add some descriptions here to clarify
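To illustrate why the argument order in this thread matters once $\beta \neq 0.5$: from the forward formula, swapping the two inputs gives $JSD_{1-\beta}$ rather than $JSD_{\beta}$, so the result changes unless $\beta = 0.5$. The pure-Python sketch below demonstrates this with a hypothetical helper `jsd_beta(log_q, log_p, beta)` that mirrors the renamed signature; it is an illustration under that assumption, not the Liger-Kernel implementation.

```python
import math


def jsd_beta(log_q, log_p, beta):
    """beta*KL(P || M) + (1-beta)*KL(Q || M) in closed form, with M = beta*P + (1-beta)*Q.

    log_q: log-probabilities of the input (student).
    log_p: log-probabilities of the target (teacher).
    """
    total = 0.0
    for x, y in zip(log_q, log_p):
        q, p = math.exp(x), math.exp(y)
        m = beta * p + (1.0 - beta) * q
        total += beta * p * y + (1.0 - beta) * q * x - m * math.log(m)
    return total


# Arbitrary example distributions, chosen only for illustration.
student = [math.log(v) for v in (0.1, 0.3, 0.6)]
teacher = [math.log(v) for v in (0.7, 0.2, 0.1)]

in_order = jsd_beta(student, teacher, 0.1)
swapped = jsd_beta(teacher, student, 0.1)           # differs from in_order
swapped_flipped = jsd_beta(teacher, student, 0.9)   # equals in_order up to rounding

if __name__ == "__main__":
    print(in_order, swapped, swapped_flipped)
```

With $\beta = 0.5$ the loss is symmetric, so the order only affects which tensor receives gradients.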

ByronHsu · 2024-10-02T20:08:30Z

awesome work! waiting for the final nit review

Tcc0403 · 2024-10-02T22:01:18Z

> I think one good piece of future work is to make those KL or JSD losses similar to the fused CE loss: feed the teacher and student models' last projection layers to the kernel and fuse them with the losses. Here the teacher weight does not need grad and the student will need grad.

@qingquansong sure, I'm in.

Tcc0403 · 2024-10-02T22:21:39Z

Forgot to add jsd in readme and liger_kernel.transformer

Tcc0403 added 5 commits October 2, 2024 06:12

Add beta support for jsd forward pass

e093b0c

Merge branch 'main' into jsd-beta

087e35a

Fix reference torch JSD

028710f

Add functional

3a1727b

Rename parameters of forward function to indicate _input and target correctly

90e5a44

Tcc0403 marked this pull request as ready for review October 2, 2024 08:43

Tcc0403 commented Oct 2, 2024

Tcc0403 added 3 commits October 2, 2024 15:13

Update benchmark data

6db95c9

Update the jsd benchmark script

d8a6cac

Update jsd benchmark data

a5d0352

qingquansong previously approved these changes Oct 2, 2024

Add discriptions of LigerJSD and modify the reference model with lerp

96ba54e

Tcc0403 dismissed qingquansong’s stale review via 96ba54e October 2, 2024 20:20

Merge branch 'main' into jsd-beta

43c5fe1

Tcc0403 requested a review from qingquansong October 2, 2024 21:40

lancerts previously approved these changes Oct 2, 2024

lancerts enabled auto-merge (squash) October 2, 2024 21:42

Merge branch 'main' into jsd-beta

1817598

Tcc0403 added 2 commits October 3, 2024 06:36

Add JSD to README and transformer.__init__.py

b07be0a

Merge branch 'jsd-beta' of github.com:Tcc0403/Liger-Kernel into jsd-beta

b1fe5a8

auto-merge was automatically disabled October 2, 2024 22:38
Head branch was pushed to by a user without write access

Tcc0403 dismissed lancerts’s stale review via b1fe5a8 October 2, 2024 22:38

lancerts approved these changes Oct 2, 2024

lancerts enabled auto-merge (squash) October 2, 2024 23:15

lancerts merged commit 6817c2d into linkedin:main Oct 3, 2024

Tcc0403 deleted the jsd-beta branch December 1, 2024 03:13

lancerts merged 13 commits into linkedin:main from Tcc0403:jsd-beta
