Skip to content

[cpu] Modify inductor opt flag --- ftree-loop-vectorize#121782

Closed
Valentine233 wants to merge 6 commits intomainfrom
tree_vec_target
Closed

[cpu] Modify inductor opt flag --- ftree-loop-vectorize#121782
Valentine233 wants to merge 6 commits intomainfrom
tree_vec_target

Conversation

@Valentine233
Copy link
Copy Markdown
Collaborator

@Valentine233 Valentine233 commented Mar 13, 2024

Fixes #115261, #113017.

For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.

Validation on 3 benchmark suites

FP32

image

Outlier models (speedup<0.8, single socket):

BF16

image

Outlier models (speedup<0.8, single socket):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang

@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Mar 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121782

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit ae50a74 with merge base e7141d1 (image):

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Valentine233 Valentine233 changed the title [cpu] Modify inductor opt flag ftree-loop-vectorize [cpu] Modify inductor opt flag --- ftree-loop-vectorize Mar 13, 2024
@Valentine233 Valentine233 added the topic: not user facing topic category label Mar 13, 2024
@Valentine233 Valentine233 marked this pull request as draft March 13, 2024 02:46
Copy link
Copy Markdown
Collaborator

@jgong5 jgong5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please share performance numbers to make sure there is no regression.

Comment on lines +1250 to +1263
if not config.cpp.enable_tree_loop_vec_opt_flag:
base_flags += " -fno-tree-loop-vectorize"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the issue links as comment to explain why we have to disable this by default.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks and added!

resnet50,float32,dynamic,default,2.28794694
timm_efficientnet,float32,static,cpp,2.72195686
mobilenet_v3_large,float32,static,cpp,3.02274304
mobilenet_v3_large,float32,static,cpp,2.9000000
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To pass CI, modify the speedup. According to the validation, this model doesn't have a perf regression.

@leslie-fang-intel
Copy link
Copy Markdown
Collaborator

@Valentine233 please help to check if whether we can enable the vectorization for the regression models.

@Valentine233
Copy link
Copy Markdown
Collaborator Author

@Valentine233 please help to check if whether we can enable the vectorization for the regression models.

Updated in PR description.

@Valentine233
Copy link
Copy Markdown
Collaborator Author

We may wait for all the regressions fixed, if they could be solved in short term.

@github-actions
Copy link
Copy Markdown
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions Bot added the Stale label Jun 11, 2024
@github-actions github-actions Bot closed this Jul 11, 2024
@github-actions github-actions Bot deleted the tree_vec_target branch August 12, 2024 01:59
pytorchmergebot pushed a commit that referenced this pull request Nov 7, 2024
Reopen #121782, as more optimizations have landed.

Fixes #115261, #113017.
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: #136827
Approved by: https://github.com/jansel, https://github.com/jgong5
pytorchmergebot pushed a commit that referenced this pull request Nov 12, 2024
Reopen #121782, as more optimizations have landed.

Fixes #115261, #113017.
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: #136827
Approved by: https://github.com/jansel, https://github.com/jgong5
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Reopen pytorch#121782, as more optimizations have landed.

Fixes pytorch#115261, pytorch#113017.
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: pytorch#136827
Approved by: https://github.com/jansel, https://github.com/jgong5
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Reopen pytorch#121782, as more optimizations have landed.

Fixes pytorch#115261, pytorch#113017.
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: pytorch#136827
Approved by: https://github.com/jansel, https://github.com/jgong5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Inductor][cpu][miscompile] Outputs of torch.matmul abnormally change with extra outputs

4 participants