Skip to content

Remove unnecessary Arg instantiation in cuda/function.pyx#3253

Merged
mergify[bot] merged 1 commit intocupy:masterfrom
asi1024:perf
Apr 6, 2020
Merged

Remove unnecessary Arg instantiation in cuda/function.pyx#3253
mergify[bot] merged 1 commit intocupy:masterfrom
asi1024:perf

Conversation

@asi1024
Copy link
Copy Markdown
Member

@asi1024 asi1024 commented Apr 2, 2020

This PR resolves performance degradation in function.pyx.
note: #2920 (review)

  • v8.0.01b
time_raw             - case 10                            :    CPU:    6.197 us   +/- 0.341 (min:    5.882 / max:    6.814) us     GPU:    9.843 us   +/- 1.630 (min:    8.704 / max:   13.056) us
time_raw             - case 100                           :    CPU:    5.816 us   +/- 0.195 (min:    5.621 / max:    6.114) us     GPU:    8.787 us   +/- 0.831 (min:    8.064 / max:   10.400) us
time_raw             - case 1000                          :    CPU:    6.399 us   +/- 0.194 (min:    6.213 / max:    6.682) us     GPU:   10.067 us   +/- 0.403 (min:    9.664 / max:   10.688) us
time_raw             - case 10000                         :    CPU:    6.029 us   +/- 0.260 (min:    5.758 / max:    6.444) us     GPU:   25.376 us   +/- 0.385 (min:   24.896 / max:   26.016) us
time_raw             - case 20000                         :    CPU:    5.994 us   +/- 0.269 (min:    5.720 / max:    6.460) us     GPU:   42.470 us   +/- 0.379 (min:   42.112 / max:   43.200) us
  • master
time_raw             - case 10                            :    CPU:   10.597 us   +/- 0.464 (min:   10.216 / max:   11.361) us     GPU:   14.394 us   +/- 2.039 (min:   12.800 / max:   18.368) us
time_raw             - case 100                           :    CPU:   10.103 us   +/- 0.239 (min:    9.860 / max:   10.512) us     GPU:   13.453 us   +/- 1.118 (min:   12.640 / max:   15.648) us
time_raw             - case 1000                          :    CPU:   10.352 us   +/- 0.282 (min:    9.969 / max:   10.693) us     GPU:   13.709 us   +/- 0.425 (min:   13.248 / max:   14.432) us
time_raw             - case 10000                         :    CPU:   10.400 us   +/- 0.323 (min:   10.021 / max:   10.917) us     GPU:   29.338 us   +/- 0.430 (min:   28.928 / max:   30.080) us
time_raw             - case 20000                         :    CPU:   10.636 us   +/- 0.205 (min:   10.279 / max:   10.903) us     GPU:   46.707 us   +/- 0.332 (min:   46.432 / max:   47.360) us
  • this PR (f2bee1b)
time_raw             - case 10                            :    CPU:    7.010 us   +/- 0.406 (min:    6.679 / max:    7.688) us     GPU:   10.912 us   +/- 2.074 (min:    9.344 / max:   14.976) us
time_raw             - case 100                           :    CPU:    6.775 us   +/- 0.222 (min:    6.532 / max:    7.181) us     GPU:    9.984 us   +/- 0.759 (min:    9.280 / max:   11.456) us
time_raw             - case 1000                          :    CPU:    6.854 us   +/- 0.202 (min:    6.642 / max:    7.187) us     GPU:   10.464 us   +/- 0.385 (min:   10.176 / max:   11.200) us
time_raw             - case 10000                         :    CPU:    6.980 us   +/- 0.236 (min:    6.687 / max:    7.276) us     GPU:   26.118 us   +/- 0.298 (min:   25.760 / max:   26.592) us
time_raw             - case 20000                         :    CPU:    6.930 us   +/- 0.281 (min:    6.673 / max:    7.316) us     GPU:   43.418 us   +/- 0.335 (min:   43.104 / max:   44.000) us

@asi1024 asi1024 added the cat:performance Performance in terms of speed or memory consumption label Apr 2, 2020
Comment on lines 92 to 93
if isinstance(x, core.ndarray):
return (<core.ndarray>x).get_pointer()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question (still trying to figure out what happened with the new Arg 😅): Isn't there a NdarrayArg? Why not also removing these two lines?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called only from here now and NdarrayArg does not reach here.
All indexers and scalars are converted to IndexerArg and ScalarArg, so these lines can be removed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks @asi1024. I am still fuzzy about when exactly NdarrayArg is needed, though...A few pointers from you are greatly appreciated!

Copy link
Copy Markdown
Member

@emcastillo emcastillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@emcastillo
Copy link
Copy Markdown
Member

Jenkins, test this please

@pfn-ci-bot
Copy link
Copy Markdown
Collaborator

Successfully created a job for commit 3d85810:

@emcastillo emcastillo added this to the v8.0.0b2 milestone Apr 6, 2020
@emcastillo emcastillo added the st:test-and-merge (deprecated) Ready to merge after test pass. label Apr 6, 2020
@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 3d85810, target branch master) succeeded!

@mergify mergify bot merged commit 0cef644 into cupy:master Apr 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:performance Performance in terms of speed or memory consumption st:test-and-merge (deprecated) Ready to merge after test pass.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants