TensorIterator: Avoid nesting two levels of function_ref in for_each #53613
peterbell10 wants to merge 2 commits into pytorch:master from
Conversation
💊 CI failures summary (as of commit 310fb71): 💚 Looks good so far! There are no failures yet. 💚
I especially like this part. IIRC from looking at this previously, SmallVector overhead was a small problem.
Unfortunately, this doesn't work. `data` here is potentially shared between multiple threads, so it does need to be created inside the lambda after all.
Could we avoid duplicating what used to be `LOOP_WRAPPER` with a template function of its own that took `loop`, `data`, and `ntensor` (by reference/value as appropriate)?
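For illustration, a minimal sketch of that suggestion might look like the following. The name `iterate_2d` and the exact signature are assumptions made here for the example, not the actual ATen code:

```cpp
// Hypothetical helper factoring out the old LOOP_WRAPPER body, so the 2d
// iteration is written once and shared by every caller. Names and signatures
// are illustrative assumptions, not ATen's real implementation.
#include <cstdint>

template <typename loop1d_t>
void iterate_2d(const loop1d_t& loop,
                char** data,             // per-thread base pointers
                const int64_t* strides,  // ntensors inner strides, then ntensors outer strides
                int ntensors,
                int64_t size0,
                int64_t size1) {
  const int64_t* outer_strides = strides + ntensors;
  for (int64_t i = 0; i < size1; ++i) {
    loop(data, strides, size0);      // run the 1d body over one row
    for (int j = 0; j < ntensors; ++j) {
      data[j] += outer_strides[j];   // advance along the outer dimension
    }
  }
}
```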
ezyang left a comment:
This is really nice work, thanks!
facebook-github-bot left a comment:
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
TensorIterator: Avoid nesting two levels of function_ref in for_each (pytorch#53613)

Summary: When calling `TensorIterator::for_each` with a 1d loop, it creates a `function_ref` for the 1d iteration, then wraps it with `LOOP_WRAPPER` to transform it into a 2d loop. That 2d loop then gets wrapped in another `function_ref`. This can result in significant overhead if the 1d inner loop is over a small number of elements.

Instead, this wraps the 1d loop before type erasure so only one level of `function_ref` is introduced. A simple benchmark demonstrates this is a win:

```python
import torch
a = torch.rand((10000, 2))[::2]
%timeit a + a
```

Note the 2D tensor cannot be coalesced into 1D, and both `cpu_kernel` and `cpu_kernel_vec` use the 1D `for_each`. On master this takes 42 us; with this change it's down to 32 us.

Pull Request resolved: pytorch#53613
Reviewed By: VitalyFedyunin
Differential Revision: D26947143
Pulled By: ezyang
fbshipit-source-id: 5189ada0d82bbf74170fb446763753f02478abf6
When calling `TensorIterator::for_each` with a 1d loop, it creates a `function_ref` for the 1d iteration, then wraps it with `LOOP_WRAPPER` to transform it into a 2d loop. That 2d loop then gets wrapped in another `function_ref`. This can result in significant overhead if the 1d inner loop is over a small number of elements.

Instead, this wraps the 1d loop before type erasure so only one level of `function_ref` is introduced. A simple benchmark (`a = torch.rand((10000, 2))[::2]; %timeit a + a`) demonstrates this is a win.

Note the 2D tensor cannot be coalesced into 1D, and both `cpu_kernel` and `cpu_kernel_vec` use the 1D `for_each`. On master this takes 42 us; with this change it's down to 32 us.
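To make the change concrete, here is a minimal sketch of the idea under stated assumptions: the name `loop_2d_from_1d` and the fixed 16-operand bound are illustrative, and the real code uses `c10::function_ref` and ATen's own types rather than what is shown here.

```cpp
// Sketch only: wrap the 1d loop into a 2d loop *before* type erasure, so
// for_each stores a single function_ref instead of a function_ref that calls
// another function_ref on every row. Names and the fixed bound are
// illustrative assumptions, not the actual ATen implementation.
#include <algorithm>
#include <cstdint>

template <typename loop1d_t>
auto loop_2d_from_1d(loop1d_t loop, int ntensors) {
  return [loop, ntensors](char** base,
                          const int64_t* strides,
                          int64_t size0,
                          int64_t size1) {
    // Each thread invokes the wrapper on its own sub-range, so the working
    // copy of the base pointers must be created inside the lambda.
    char* data[16];
    std::copy(base, base + ntensors, data);
    const int64_t* outer_strides = strides + ntensors;
    for (int64_t i = 0; i < size1; ++i) {
      loop(data, strides, size0);      // direct (inlinable) call to the 1d body
      for (int j = 0; j < ntensors; ++j) {
        data[j] += outer_strides[j];   // step to the next row
      }
    }
  };
}
```

The lambda returned here is only type-erased once, when it is stored as the 2d loop callback, so the per-row call into the 1d body stays a direct call rather than a second indirection.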