🚀 The feature, motivation and pitch
A JSD kernel has already been implemented in this PR. This request is to extend it with an optional beta argument that controls the mixture distribution between student and teacher. When beta=0, the loss should be equivalent to forward KL, and when beta=1, it should be equivalent to reverse KL. Note: these corner cases could be tricky. More details can be found in the paper: https://arxiv.org/abs/2306.13649
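To make the requested behavior concrete, here is a minimal pure-Python reference sketch, not the kernel itself. It assumes the generalized JSD form beta * KL(teacher || M) + (1 - beta) * KL(student || M) with M = beta * teacher + (1 - beta) * student, takes forward KL to mean KL(teacher || student), and assumes strictly positive probabilities. The names `kl` and `generalized_jsd` are hypothetical. Under these assumptions the interpolated formula evaluates to zero at beta=0 and beta=1, which is one reason the endpoints may need explicit special-casing.

```python
import math


def kl(a, b):
    """KL(a || b) for discrete distributions given as lists of probabilities.

    Assumes b[i] > 0 wherever a[i] > 0 (no smoothing is applied here).
    """
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


def generalized_jsd(teacher, student, beta=0.5):
    """Reference sketch of a beta-weighted JSD between teacher and student.

    Assumption: beta=0 maps to forward KL, KL(teacher || student), and
    beta=1 maps to reverse KL, KL(student || teacher), as special cases.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    if beta == 0.0:
        return kl(teacher, student)  # forward KL corner case
    if beta == 1.0:
        return kl(student, teacher)  # reverse KL corner case
    # Mixture distribution M = beta * teacher + (1 - beta) * student
    mix = [beta * t + (1.0 - beta) * s for t, s in zip(teacher, student)]
    return beta * kl(teacher, mix) + (1.0 - beta) * kl(student, mix)
```

A reference like this could serve as a ground truth when testing a fused kernel against the endpoint cases and against the symmetric beta=0.5 case.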
Alternatives
No response
Additional context
No response