
ATen Unary Ops #6030

Merged
colesbury merged 2 commits into pytorch:master from cpuhrsch:unops
Mar 28, 2018


cpuhrsch (Contributor) commented Mar 27, 2018

Implements a few unary operations for which there are AVX intrinsics.
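For readers unfamiliar with the pattern, the following is a minimal sketch of an AVX-vectorized unary op over contiguous float32 data: eight lanes per iteration with `_mm256_ceil_ps`, then a scalar loop for the remainder. It is not the code from this PR (which goes through `Vec256.h`); the names `ceil_avx` and `ceil_dispatch` are hypothetical, and the sketch assumes an x86-64 CPU and GCC or Clang (for `__attribute__((target("avx")))` and `__builtin_cpu_supports`).

```cpp
#include <immintrin.h>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not the PR's Vec256 code): ceil over a contiguous
// float buffer, 8 lanes at a time with AVX, scalar loop for the tail.
// The target attribute (GCC/Clang) lets this compile without -mavx;
// it must only be called on a CPU that supports AVX.
__attribute__((target("avx")))
void ceil_avx(const float* in, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(in + i);            // unaligned load of 8 floats
        _mm256_storeu_ps(out + i, _mm256_ceil_ps(v));  // vector ceil, unaligned store
    }
    for (; i < n; ++i) {                               // leftover elements (< 8)
        out[i] = std::ceil(in[i]);
    }
}

// Hypothetical dispatcher: take the AVX path only if the running CPU
// reports AVX support, otherwise fall back to a plain scalar loop.
void ceil_dispatch(const float* in, float* out, std::size_t n) {
    if (__builtin_cpu_supports("avx")) {
        ceil_avx(in, out, n);
    } else {
        for (std::size_t i = 0; i < n; ++i) out[i] = std::ceil(in[i]);
    }
}
```

The unaligned load/store intrinsics avoid any alignment requirement on the buffers. `floor` and `sqrt` fit the same skeleton with `_mm256_floor_ps` and `_mm256_sqrt_ps`, and `trunc` can use `_mm256_round_ps` with `_MM_FROUND_TO_ZERO`. For `round`, note that `_MM_FROUND_TO_NEAREST_INT` rounds halves to even, unlike `std::round`.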

The perf comparison script is here: https://paste.fedoraproject.org/paste/f1adcJhpGtzDNWImS34XzQ

EDIT: Appended the "This branch" timings as the last column for readability
EDIT: Removed clone; it will become part of a larger effort around copy
EDIT: I will also add log, exp, cos, sin to this PR
EDIT: The strided case and log, exp, cos, sin will be handled in another PR
EDIT: Improved formatting

Command (single core):

taskset -c 0 python unary_comp.py

Compiled with GCC 5.4.0 for Python 2.7.

Master ("elapsed") vs. this branch ("this elapsed"):

ceil:  size: 10^0      count: 1000000  elapsed: 6.72463703156   this elapsed: 2.06619906425
ceil:  size: 10^1      count: 100000   elapsed: 5.31660294533   this elapsed: 0.451114177704
ceil:  size: 10^2      count: 10000    elapsed: 5.09328198433   this elapsed: 0.458932161331
ceil:  size: 10^3      count: 1000     elapsed: 4.86605906487   this elapsed: 0.425307035446
ceil:  size: 10^4      count: 100      elapsed: 5.73223686218   this elapsed: 1.51420688629
ceil:  size: 10^5      count: 10       elapsed: 5.6767308712    this elapsed: 1.48536610603
floor: size: 10^0      count: 1000000  elapsed: 6.85544586182   this elapsed: 2.23299098015
floor: size: 10^1      count: 100000   elapsed: 5.29100513458   this elapsed: 0.477941036224
floor: size: 10^2      count: 10000    elapsed: 5.09847211838   this elapsed: 0.46099615097
floor: size: 10^3      count: 1000     elapsed: 4.86015892029   this elapsed: 0.422957897186
floor: size: 10^4      count: 100      elapsed: 5.72806310654   this elapsed: 1.51649594307
floor: size: 10^5      count: 10       elapsed: 5.67707300186   this elapsed: 1.48646092415
round: size: 10^0      count: 1000000  elapsed: 11.6632659435   this elapsed: 2.48468589783
round: size: 10^1      count: 100000   elapsed: 12.5859880447   this elapsed: 0.49551987648
round: size: 10^2      count: 10000    elapsed: 12.5193960667   this elapsed: 0.464128017426
round: size: 10^3      count: 1000     elapsed: 12.4613730907   this elapsed: 0.422877073288
round: size: 10^4      count: 100      elapsed: 13.4095180035   this elapsed: 1.50900793076
round: size: 10^5      count: 10       elapsed: 13.3347780704   this elapsed: 1.48473095894
trunc: size: 10^0      count: 1000000  elapsed: 10.8710699081   this elapsed: 2.1772069931
trunc: size: 10^1      count: 100000   elapsed: 11.7434380054   this elapsed: 0.464560985565
trunc: size: 10^2      count: 10000    elapsed: 11.8657200336   this elapsed: 0.454663038254
trunc: size: 10^3      count: 1000     elapsed: 12.1129901409   this elapsed: 0.422108888626
trunc: size: 10^4      count: 100      elapsed: 12.7893900871   this elapsed: 1.51392889023
trunc: size: 10^5      count: 10       elapsed: 12.7805650234   this elapsed: 1.48477101326
sqrt:  size: 10^0      count: 1000000  elapsed: 17.1391599178   this elapsed: 3.0278441906
sqrt:  size: 10^1      count: 100000   elapsed: 17.5117669106   this elapsed: 1.05652308464
sqrt:  size: 10^2      count: 10000    elapsed: 17.5223078728   this elapsed: 0.902415990829
sqrt:  size: 10^3      count: 1000     elapsed: 18.2096610069   this elapsed: 0.851244926453
sqrt:  size: 10^4      count: 100      elapsed: 19.1470270157   this elapsed: 1.71396112442
sqrt:  size: 10^5      count: 10       elapsed: 19.0929000378   this elapsed: 1.65262508392

Command (many-core, 20 cores):

numactl -m 1  python unary_comp.py

Master ("elapsed") vs. this branch ("this elapsed"):

ceil:  size: 10^0      count: 1000000  elapsed: 6.76774907112    this elapsed: 2.08831596375
ceil:  size: 10^1      count: 100000   elapsed: 5.26921415329    this elapsed: 0.460593938828
ceil:  size: 10^2      count: 10000    elapsed: 5.0813229084     this elapsed: 0.503376960754
ceil:  size: 10^3      count: 1000     elapsed: 0.399098873138   this elapsed: 0.141521930695
ceil:  size: 10^4      count: 100      elapsed: 0.709737062454   this elapsed: 0.657402992249
ceil:  size: 10^5      count: 10       elapsed: 0.560704946518   this elapsed: 0.364257097244
floor: size: 10^0      count: 1000000  elapsed: 7.01760697365    this elapsed: 2.20468902588
floor: size: 10^1      count: 100000   elapsed: 5.28479790688    this elapsed: 0.455933094025
floor: size: 10^2      count: 10000    elapsed: 5.08667588234    this elapsed: 0.281991958618
floor: size: 10^3      count: 1000     elapsed: 0.316580057144   this elapsed: 0.0526778697968
floor: size: 10^4      count: 100      elapsed: 0.721877813339   this elapsed: 0.676480054855
floor: size: 10^5      count: 10       elapsed: 0.525444030762   this elapsed: 0.37157702446
round: size: 10^0      count: 1000000  elapsed: 13.678330183     this elapsed: 2.45732808113
round: size: 10^1      count: 100000   elapsed: 12.8416500092    this elapsed: 0.457631111145
round: size: 10^2      count: 10000    elapsed: 12.6348309517    this elapsed: 0.550966978073
round: size: 10^3      count: 1000     elapsed: 0.559453010559   this elapsed: 0.068146944046
round: size: 10^4      count: 100      elapsed: 0.801295995712   this elapsed: 0.669235944748
round: size: 10^5      count: 10       elapsed: 0.601754903793   this elapsed: 0.388570785522
trunc: size: 10^0      count: 1000000  elapsed: 11.7732570171    this elapsed: 2.36533784866
trunc: size: 10^1      count: 100000   elapsed: 11.8114311695    this elapsed: 0.458111047745
trunc: size: 10^2      count: 10000    elapsed: 11.8901190758    this elapsed: 0.452945947647
trunc: size: 10^3      count: 1000     elapsed: 0.453590869904   this elapsed: 0.0806839466095
trunc: size: 10^4      count: 100      elapsed: 0.773014068604   this elapsed: 0.653845071793
trunc: size: 10^5      count: 10       elapsed: 0.518759965897   this elapsed: 0.372617959976
sqrt:  size: 10^0      count: 1000000  elapsed: 14.4136710167    this elapsed: 3.02479696274
sqrt:  size: 10^1      count: 100000   elapsed: 17.2432360649    this elapsed: 1.0565290451
sqrt:  size: 10^2      count: 10000    elapsed: 17.5745770931    this elapsed: 0.80877995491
sqrt:  size: 10^3      count: 1000     elapsed: 0.722177028656   this elapsed: 0.142582178116
sqrt:  size: 10^4      count: 100      elapsed: 0.935206890106   this elapsed: 0.680450201035
sqrt:  size: 10^5      count: 10       elapsed: 0.755648136139   this elapsed: 0.383094072342


Review thread on aten/src/ATen/native/cpu/Vec256.h (outdated)


cpuhrsch (Contributor, Author) commented Mar 27, 2018

Running the perf test 10x longer (10x more counts).

ceil:  size: 10^0 count: 10000000 elapsed: 40.4066889286  this elapsed: 24.6736090183  
ceil:  size: 10^1 count: 1000000  elapsed: 32.4437739849  this elapsed: 4.76001596451  
ceil:  size: 10^2 count: 100000   elapsed: 30.9976239204  this elapsed: 4.56428909302  
ceil:  size: 10^3 count: 10000    elapsed: 31.0188250542  this elapsed: 4.21380710602  
ceil:  size: 10^4 count: 1000     elapsed: 31.398140192   this elapsed: 9.11702895164  
ceil:  size: 10^5 count: 100      elapsed: 38.8919692039  this elapsed: 14.6707389355  
floor: size: 10^0 count: 10000000 elapsed: 39.4561290741  this elapsed: 23.4846880436  
floor: size: 10^1 count: 1000000  elapsed: 31.4482679367  this elapsed: 4.76394104958  
floor: size: 10^2 count: 100000   elapsed: 29.870413065   this elapsed: 4.56856608391  
floor: size: 10^3 count: 10000    elapsed: 29.8896141052  this elapsed: 4.23216915131  
floor: size: 10^4 count: 1000     elapsed: 30.3636591434  this elapsed: 9.24822616577  
floor: size: 10^5 count: 100      elapsed: 37.6921019554  this elapsed: 14.7144429684  
round: size: 10^0 count: 10000000 elapsed: 61.2804529667  this elapsed: 25.2223968506  
round: size: 10^1 count: 1000000  elapsed: 77.1899559498  this elapsed: 5.10782623291  
round: size: 10^2 count: 100000   elapsed: 77.4516279697  this elapsed: 4.61217093468  
round: size: 10^3 count: 10000    elapsed: 77.2506480217  this elapsed: 4.24468517303  
round: size: 10^4 count: 1000     elapsed: 78.2129211426  this elapsed: 9.26816487312  
round: size: 10^5 count: 100      elapsed: 86.2564959526  this elapsed: 14.7185451984  
trunc: size: 10^0 count: 10000000 elapsed: 47.831428051   this elapsed: 23.7832829952  
trunc: size: 10^1 count: 1000000  elapsed: 67.1300928593  this elapsed: 4.97098398209  
trunc: size: 10^2 count: 100000   elapsed: 68.6582689285  this elapsed: 4.57020521164  
trunc: size: 10^3 count: 10000    elapsed: 68.187787056   this elapsed: 4.23301100731  
trunc: size: 10^4 count: 1000     elapsed: 69.8524289131  this elapsed: 9.09349393845  
trunc: size: 10^5 count: 100      elapsed: 82.2862288952  this elapsed: 14.6826901436  
sqrt:  size: 10^0 count: 10000000 elapsed: 82.7828681469  this elapsed: 31.1945779324  
sqrt:  size: 10^1 count: 1000000  elapsed: 157.086585999  this elapsed: 10.826775074   
sqrt:  size: 10^2 count: 100000   elapsed: 166.278002977  this elapsed: 9.02990698814  
sqrt:  size: 10^3 count: 10000    elapsed: 166.416604996  this elapsed: 8.54917311668  
sqrt:  size: 10^4 count: 1000     elapsed: 167.497569084  this elapsed: 9.19029521942  
sqrt:  size: 10^5 count: 100      elapsed: 175.644635916  this elapsed: 16.5108749866

Review thread on aten/src/ATen/native/UnaryOps.cpp (outdated)


Review thread on aten/src/ATen/native/UnaryOps.cpp (outdated)


Review thread on aten/src/ATen/native/cpu/Vec256.h (outdated)


colesbury (Member) commented:

Nice speed-ups!

yf225 (Contributor) commented Mar 27, 2018

The GPU perf test result can be ignored (the margin of error was too small).

cpuhrsch force-pushed the unops branch 5 times, most recently from e83934a to 9e9af7c, March 27, 2018 19:30
colesbury merged commit bde2f6b into pytorch:master Mar 28, 2018
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Implements a few unary operations for which there are AVX intrinsics.

The perf comparison script is here:
https://paste.fedoraproject.org/paste/f1adcJhpGtzDNWImS34XzQ
4 participants