Fix #11752: fix numerical issue in log_softmax#21672
wolegechu wants to merge 13 commits into pytorch:master from wolegechu:fix_numerical_issue_in_log_softmax
Conversation
test/test_nn.py
Outdated
    def test_log_softmax(self):
        x_small = torch.ones(1, 2, dtype=torch.float32)
        x_big = x_small * 1e16
softmax and log_softmax should be invariant under addition, but not multiplication, right? (The multiplicative factor is the inverse temperature, right?)
Should this line read x_big = x_small + 1e16?
Oops..., this test only happens to work in this special case; it's a mistake. I've pushed a new commit.
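The corrected check can be illustrated outside of PyTorch with a small NumPy sketch. The `log_softmax` helper below is illustrative only, not the PyTorch kernel; it just uses the same max-then-sum subtraction order as the fix, so the additive shift leaves the result unchanged:

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log_softmax: subtract the row max, then the
    # log-sum-exp, keeping the left-to-right subtraction order.
    m = x.max(axis=-1, keepdims=True)
    s = np.log(np.exp(x - m).sum(axis=-1, keepdims=True))
    return x - m - s

x_small = np.ones((1, 2), dtype=np.float32)
x_big = x_small + np.float32(1e16)  # additive shift, not multiplicative scaling

# log_softmax is invariant under adding a constant to every logit,
# so the hugely shifted input must give the same output.
assert np.allclose(log_softmax(x_small), log_softmax(x_big))
```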
      scalar_t max_input = max_input_arr[j];
      vec256::map(
-         [tmp_sum](Vec x) { return x - Vec(tmp_sum); },
+         [tmp_sum, max_input](Vec x) { return x - Vec(max_input) - Vec(tmp_sum); },
Does the compiler optimize this into computing max_input + tmp_sum before the map?
It shouldn't be optimized into computing the sum beforehand.
In some cases the inputs are large values and the differences are small, so a precomputed offset would swallow the small correction term (as happened in the old code). You can see the example here: #11752 (comment)
I have also looked at the log_softmax implementations in MXNet and TensorFlow. They likewise write x - Vec(max_input) - Vec(tmp_sum) together to fix the order of evaluation.
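Why the subtraction order matters can be shown with a tiny float32 sketch (the concrete values here are illustrative assumptions, chosen to mimic the big-logit example in #11752):

```python
import numpy as np

x = np.float32(1e16)                # a big logit
max_input = np.float32(1e16)        # the row max
tmp_sum = np.float32(np.log(2.0))   # log-sum-exp after max subtraction

# Subtracting max_input first cancels the huge terms, so the small
# correction tmp_sum survives:
ordered = (x - max_input) - tmp_sum   # -log(2)

# Folding max_input and tmp_sum together first loses tmp_sum entirely:
# float32 cannot represent 1e16 + 0.693 distinctly from 1e16.
folded = x - (max_input + tmp_sum)    # 0.0

assert ordered != folded
```

This is why the lambda must compute `x - Vec(max_input) - Vec(tmp_sum)` left to right rather than subtracting a precomputed `max_input + tmp_sum`.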
Could you add a comment here so people won't "optimize" this away in the future? Thanks!
…echu/pytorch into fix_numerical_issue_in_log_softmax
wolegechu
left a comment
@VitalyFedyunin So strange... I merged the test/test_nn.py file from the upstream master branch into my branch. Now GitHub highlights code in green that doesn't belong to me.
…echu/pytorch into fix_numerical_issue_in_log_softmax
The merge pulled in some new commits that had landed on master after I created this PR. @VitalyFedyunin emmmmm... Anyway, I have removed all the unrelated changes.
facebook-github-bot
left a comment
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: pytorch/pytorch#11866 corrected this issue in the function `host_softmax` (aten/src/ATen/native/SoftMax.cpp). But when I tried the example proposed in pytorch/pytorch#11752, `log_softmax` was still not working for big logits. I looked into the source code and found that the example had called `vec_host_softmax_lastdim`, not `host_softmax`. This code fixes the issue in `_vec_log_softmax_lastdim` and adds a test for `log_softmax`.
Pull Request resolved: pytorch/pytorch#21672
Differential Revision: D15856327
Pulled By: VitalyFedyunin
fbshipit-source-id: 7a1fd3c0a03d366c99eb873e235361e4fcfa7567
@VitalyFedyunin merged this pull request in b403b10.