Skip to content

Fix print precision and match numpy behavior#12746

Closed
ailzhang wants to merge 16 commits intopytorch:masterfrom
ailzhang:fix_print_prec
Closed

Fix print precision and match numpy behavior#12746
ailzhang wants to merge 16 commits intopytorch:masterfrom
ailzhang:fix_print_prec

Conversation

@ailzhang
Copy link
Copy Markdown
Contributor

@ailzhang ailzhang commented Oct 17, 2018

Fixes #12578 #9395.

scientific notation is used when absolute value of the smallest number is < 1e-4 or maximum > 1e8 or the ratio of the maximum absolute value to the minimum is > 1e3

I hope I didn't break anything since there seems to be a lot of edge cases here... Here are some easy sanity checks.

# For int tensor, we just print, never use scientific.
In [5]: torch.tensor(1)
Out[5]: tensor(1)
Out[2]: array(1) # numpy

In [6]: torch.tensor(10)
Out[6]: tensor(10)
Out[3]: array(10) # numpy

In [8]: torch.tensor(99000000)
Out[8]: tensor(99000000)
Out[5]: array(99000000) # numpy

In [9]: torch.tensor(100000000)
Out[9]: tensor(100000000)
Out[6]: array(100000000) # numpy

In [10]: torch.tensor(100000001)
Out[10]: tensor(100000001)
Out[7]: array(100000001) # numpy

In [11]: torch.tensor(1000000000)
Out[11]: tensor(1000000000)
Out[8]: array(1000000000) # numpy

In [12]: torch.tensor([1, 1000])
Out[12]: tensor([   1, 1000])
Out[9]: array([   1, 1000]) # numpy

In [13]: torch.tensor([1, 1010])
Out[13]: tensor([   1, 1010])
Out[10]: array([   1, 1010]) # numpy

For floating points, we use scientific when max/min > 1000 || max > 1e8 || min < 1e-4
Lines with "old" are old behaviors that either has precision issue, or not aligned with numpy

In [14]: torch.tensor(0.01)
Out[14]: tensor(0.0100)
Out[11]: array(0.01) # numpy

In [15]: torch.tensor(0.1)
Out[15]: tensor(0.1000)
Out[12]: array(0.1) # numpy

In [16]: torch.tensor(0.0001)
Out[16]: tensor(0.0001)
Out[14]: array(0.0001) # numpy

In [17]: torch.tensor(0.00002)
Out[17]: tensor(2.0000e-05)
Out[15]: array(2e-05) # numpy
Out[5]: tensor(0.0000) # old

In [18]: torch.tensor(1e8)
Out[18]: tensor(100000000.)
Out[16]: array(100000000.0) # numpy

In [19]: torch.tensor(1.1e8)
Out[19]: tensor(1.1000e+08)
Out[17]: array(1.1e8) # numpy 1.14.5, In <= 1.13 this was not using scientific print
Out[10]: tensor(110000000.) # old

In [20]: torch.tensor([0.01, 10.])
Out[20]: tensor([ 0.0100, 10.0000])
Out[18]: array([  0.01,  10.  ]) # numpy

In [21]: torch.tensor([0.01, 11.])
Out[21]: tensor([1.0000e-02, 1.1000e+01])
Out[19]: array([  1.00000000e-02,   1.10000000e+01]) # numpy
Out[7]: tensor([ 0.0100, 11.0000]) # old

When print floating number in int mode, we still need to respect rules to use scientific mode first

In [22]: torch.tensor([1., 1000.])
Out[22]: tensor([   1., 1000.])
Out[20]: array([    1.,  1000.]) # numpy

In [23]: torch.tensor([1., 1010.])
Out[23]: tensor([1.0000e+00, 1.0100e+03])
Out[21]: array([  1.00000000e+00,   1.01000000e+03]) # numpy
Out[9]: tensor([   1., 1010.]) # old

Copy link
Copy Markdown
Collaborator

@soumith soumith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test cases in the issue description, where you match numpy behavior,, make them test cases.

Comment thread torch/_tensor_str.py Outdated
self.max_width = 1

# use tensor_view for 0-dim tensor iteration
tensor_view = tensor.view(tensor.nelement())

This comment was marked as off-topic.

@ailzhang
Copy link
Copy Markdown
Contributor Author

ailzhang commented Oct 17, 2018

@soumith I added a few tests. There is one thing left todo:
As you can see from examples above, Numpy does automatic trimming and padding while we use a fixed precision of 4. Numpy default is 8. But this causes a problem when we try to print a really long tensor, we actually force the print precision to be 4:

>>> np.array([123456789.])
array([1.23456789e+08])
>>> torch.tensor([123456789.])
tensor([1.2346e+08])

I actually think we should port the dragon4_scientific code from Numpy here to make the print prettier. But it should be in a separate PR and depends on the priority. I opened #12797 for it.
https://github.com/numpy/numpy/blob/f36d2d4d3f622f7901e3d5ade13e04fc05062948/numpy/core/src/multiarray/multiarraymodule.c#L3530

Copy link
Copy Markdown
Collaborator

@soumith soumith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Instead of separate expect files per each string, it feels like it's better to inline the expected strings in the tests. Making a separate file doesn't add a lot of value here, and one has to jump 1 extra file to find out what the expected value is supposed to be.

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Copy link
Copy Markdown
Contributor

ezyang commented Oct 18, 2018

I implemented inline expect tests in my spare time. Let me put it in PyTorch.

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang ezyang added the merged label Jun 25, 2019
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Fixes pytorch#12578 pytorch#9395.

* Fix and simplify print logic

* Follow numpy print rule https://github.com/numpy/numpy/blob/eb2bd11870731ea19a0eee72e616c7deb00f6c54/numpy/core/arrayprint.py#L859
> scientific notation is used when absolute value of the smallest number is < 1e-4 or maximum > 1e8 or the ratio of the maximum absolute value to the minimum is > 1e3

I hope I didn't break anything since there seems to be a lot of edge cases here... Here are some easy sanity checks.
```
In [5]: torch.tensor(1)
Out[5]: tensor(1)
Out[2]: array(1) # numpy

In [6]: torch.tensor(10)
Out[6]: tensor(10)
Out[3]: array(10) # numpy

In [8]: torch.tensor(99000000)
Out[8]: tensor(99000000)
Out[5]: array(99000000) # numpy

In [9]: torch.tensor(100000000)
Out[9]: tensor(100000000)
Out[6]: array(100000000) # numpy

In [10]: torch.tensor(100000001)
Out[10]: tensor(100000001)
Out[7]: array(100000001) # numpy

In [11]: torch.tensor(1000000000)
Out[11]: tensor(1000000000)
Out[8]: array(1000000000) # numpy

In [12]: torch.tensor([1, 1000])
Out[12]: tensor([   1, 1000])
Out[9]: array([   1, 1000]) # numpy

In [13]: torch.tensor([1, 1010])
Out[13]: tensor([   1, 1010])
Out[10]: array([   1, 1010]) # numpy
```
For floating points, we use scientific when `max/min > 1000 || max > 1e8 || min < 1e-4`
Lines with "old" are old behaviors that either has precision issue, or not aligned with numpy
```
In [14]: torch.tensor(0.01)
Out[14]: tensor(0.0100)
Out[11]: array(0.01) # numpy

In [15]: torch.tensor(0.1)
Out[15]: tensor(0.1000)
Out[12]: array(0.1) # numpy

In [16]: torch.tensor(0.0001)
Out[16]: tensor(0.0001)
Out[14]: array(0.0001) # numpy

In [17]: torch.tensor(0.00002)
Out[17]: tensor(2.0000e-05)
Out[15]: array(2e-05) # numpy
Out[5]: tensor(0.0000) # old

In [18]: torch.tensor(1e8)
Out[18]: tensor(100000000.)
Out[16]: array(100000000.0) # numpy

In [19]: torch.tensor(1.1e8)
Out[19]: tensor(1.1000e+08)
Out[17]: array(1.1e8) # numpy 1.14.5, In <= 1.13 this was not using scientific print
Out[10]: tensor(110000000.) # old

In [20]: torch.tensor([0.01, 10.])
Out[20]: tensor([ 0.0100, 10.0000])
Out[18]: array([  0.01,  10.  ]) # numpy

In [21]: torch.tensor([0.01, 11.])
Out[21]: tensor([1.0000e-02, 1.1000e+01])
Out[19]: array([  1.00000000e-02,   1.10000000e+01]) # numpy
Out[7]: tensor([ 0.0100, 11.0000]) # old
```
When print floating number in int mode, we still need to respect rules to use scientific mode first
```
In [22]: torch.tensor([1., 1000.])
Out[22]: tensor([   1., 1000.])
Out[20]: array([    1.,  1000.]) # numpy

In [23]: torch.tensor([1., 1010.])
Out[23]: tensor([1.0000e+00, 1.0100e+03])
Out[21]: array([  1.00000000e+00,   1.01000000e+03]) # numpy
Out[9]: tensor([   1., 1010.]) # old
```
Pull Request resolved: pytorch#12746

Differential Revision: D10443800

Pulled By: ailzhang

fbshipit-source-id: f5e4e3fe9bf0b44af2c64c93a9ed42b73fa613f5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Printing: mismatch result with numpy in pytorch 0.4

5 participants