
[XPU] Update XPU C Shim Header #141086

Closed
ratnampa wants to merge 1 commit into pytorch:main from ratnampa:ratnampa/update_xpu_c_shim

@ratnampa ratnampa commented Nov 20, 2024

Contributor

Fixes #141268

Caused by these commits: 34b2165 and 34e4205

The Windows XPU builds are failing (https://github.com/pytorch/pytorch/actions/runs/11922274722/job/33228175750) because a recently merged PR changed the fallback ops: 34e4205

This PR updates the XPU C Shim header file to overcome these build failures.

pytorch-bot Bot commented Nov 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141086

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit 293b93e with merge base 8b4ae29 (image):

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ratnampa ratnampa changed the title from "Update XPU C Shim Header" to "[XPU] Update XPU C Shim Header" Nov 20, 2024
@EikanWang EikanWang added the labels topic: not user facing (topic category) and ciflow/xpu (Run XPU CI tasks) Nov 20, 2024
@EikanWang EikanWang requested a review from etaf November 20, 2024 02:13
@EikanWang EikanWang requested a review from desertfire November 20, 2024 02:54
@etaf etaf requested a review from jansel November 20, 2024 05:18

@jansel jansel left a comment

Contributor

Failing tests?

etaf commented Nov 20, 2024

Collaborator

> Failing tests?

Hi @jansel, the failed job xpu / linux-jammy-xpu-2025_0-py3.9 / test (default, 1, 4, linux.idc.xpu) is a known issue tracked in #140917 and fixed by PR #140916 (which is itself blocked on this PR to fix the build issue).

The Lint / lintrunner-noclang / linux-job failure (Lint for test/test_nestedtensor.py) should be unrelated, because this PR does not change test/test_nestedtensor.py.

@etaf etaf requested a review from jansel November 20, 2024 06:49
jansel commented Nov 21, 2024

Contributor

Why isn't pytorchbot flagging these as preexisting failures? Can you rebase to viable/strict?

etaf commented Nov 21, 2024

Collaborator

@pytorchbot rebase

@pytorchmergebot

Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot

Collaborator

Successfully rebased ratnampa/update_xpu_c_shim onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ratnampa/update_xpu_c_shim && git pull --rebase)

@dvrogozh dvrogozh left a comment

Contributor

@ratnampa: I suggest updating the PR description and commit message to link the issue (#141268) and the two commits (34b2165 and 34e4205) that caused it:

Fixes: #141268
Fixes: 34b2165bdb5 ("Insert aten.add into fallback_ops...")
Fixes: 34e420519df ("[Reland] dont decompose baddbmm (#141045)")

AOTI_TORCH_EXPORT AOTITorchError aoti_torch_xpu__addmm_activation(AtenTensorHandle self, AtenTensorHandle mat1, AtenTensorHandle mat2, double beta, double alpha, int32_t use_gelu, AtenTensorHandle* ret0);
AOTI_TORCH_EXPORT AOTITorchError aoti_torch_xpu__fused_moving_avg_obs_fq_helper_functional(AtenTensorHandle self, AtenTensorHandle observer_on, AtenTensorHandle fake_quant_on, AtenTensorHandle running_min, AtenTensorHandle running_max, AtenTensorHandle scale, AtenTensorHandle zero_point, double averaging_const, int64_t quant_min, int64_t quant_max, int64_t ch_axis, int32_t per_row_fake_quant, int32_t symmetric_quant, AtenTensorHandle* ret0, AtenTensorHandle* ret1, AtenTensorHandle* ret2, AtenTensorHandle* ret3, AtenTensorHandle* ret4, AtenTensorHandle* ret5);
AOTI_TORCH_EXPORT AOTITorchError aoti_torch_xpu__trilinear(AtenTensorHandle i1, AtenTensorHandle i2, AtenTensorHandle i3, const int64_t* expand1, int64_t expand1_len_, const int64_t* expand2, int64_t expand2_len_, const int64_t* expand3, int64_t expand3_len_, const int64_t* sumdim, int64_t sumdim_len_, int64_t unroll_dim, AtenTensorHandle* ret0);
AOTI_TORCH_EXPORT AOTITorchError aoti_torch_xpu_add_Scalar(AtenTensorHandle self, double other, double alpha, AtenTensorHandle* ret0);

Contributor

Here I am confused. Commit 34b2165 added two items to torchgen/aoti/fallback_ops.py:

  1. "aten.add.Scalar",
  2. "aten.add.Tensor"

c_shim_cuda.h and c_shim_cpu.h each have two lines updated for the functions above. However, python torchgen/gen.py --update-aoti-c-shim --xpu adds only aten.add.Scalar. The build passes for me with that change and the issue is gone. @xytintel, @fengyuan14, all: can someone explain why only one line was updated for XPU? Is that expected? Is there special logic in the generation for XPU?
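
The mismatch described above can be checked mechanically. The following is a minimal sketch, not torchgen's actual logic: `shim_symbol` and `missing_declarations` are hypothetical helpers, and the naming rule (drop the `aten.` prefix, drop a `default` overload, join the rest with underscores) is an assumption inferred only from the declarations quoted in this thread, such as `aoti_torch_xpu_add_Scalar`.

```python
import re


def shim_symbol(op_name: str, device: str = "xpu") -> str:
    """Hypothetical helper: guess the C shim symbol for a fallback op name.

    Assumed rule, inferred from the declarations quoted in this thread:
    "aten.add.Scalar" -> "aoti_torch_xpu_add_Scalar". The real generator
    may apply additional rules.
    """
    namespace, _, rest = op_name.partition(".")
    if namespace != "aten":
        raise ValueError(f"unsupported namespace in {op_name!r}")
    base, _, overload = rest.partition(".")
    # Assumption: the "default" overload contributes no suffix.
    suffix = "" if overload in ("", "default") else f"_{overload}"
    return f"aoti_torch_{device}_{base}{suffix}"


def missing_declarations(fallback_ops, header_text, device="xpu"):
    """Return the ops whose guessed symbol is not declared in header_text."""
    # Collect every identifier starting with "aoti_torch_" that is
    # immediately followed by an opening parenthesis (a declaration).
    declared = set(re.findall(r"\b(aoti_torch_\w+)\s*\(", header_text))
    return [op for op in fallback_ops if shim_symbol(op, device) not in declared]
```

Given the two ops listed above and a header that contains only the `aoti_torch_xpu_add_Scalar` declaration, `missing_declarations` reports `aten.add.Tensor`. A reported op is not necessarily a bug, since its declaration may live in a different header.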

Collaborator

@dvrogozh aten.add.Tensor can be well supported by the Inductor. We do not need to fall back to Aten.

Collaborator

@dvrogozh @EikanWang, unlike aten.add.Scalar, aten.add.Tensor is registered in torch-xpu-ops as an out-of-tree op, and its AOTI API is generated in torch/csrc/inductor/aoti_torch/generated/extend/c_shim_xpu.h.

Contributor

> @dvrogozh aten.add.Tensor can be well supported by the Inductor. We do not need to fall back to Aten.

Annoyingly, it was needed for complex numbers.

@guangyey

Collaborator

@pytorchbot rebase

@pytorchmergebot

Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot pytorchmergebot force-pushed the ratnampa/update_xpu_c_shim branch from f633808 to 03a9ec0 Compare November 22, 2024 08:57
atalman commented Nov 22, 2024

Collaborator

@chuanqi129 please fix lint. The rest of the errors seem to be unrelated.

EikanWang commented Nov 23, 2024

Collaborator

@atalman, the lint error is unrelated to this PR, since this PR does not touch rnn.py. Let me rerun the lint CI.

etaf commented Nov 23, 2024

Collaborator

@guangyey

Collaborator

@pytorchbot rebase

@pytorchmergebot

Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot

Collaborator

Successfully rebased ratnampa/update_xpu_c_shim onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ratnampa/update_xpu_c_shim && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the ratnampa/update_xpu_c_shim branch from 03a9ec0 to aa96325 Compare November 23, 2024 10:34
@guangyey

Collaborator

We will rebase this PR to the latest viable/strict branch and land this PR once there are no other failures.

@guangyey

Collaborator

@pytorchbot rebase

@pytorchmergebot

Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot

Collaborator

Successfully rebased ratnampa/update_xpu_c_shim onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ratnampa/update_xpu_c_shim && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the ratnampa/update_xpu_c_shim branch from aa96325 to 293b93e Compare November 23, 2024 13:39
@EikanWang

Collaborator

@pytorchbot merge

@pytorchmergebot

Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

@pytorchmergebot

Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: xpu / linux-jammy-xpu-2025_0-py3.9 / test (default, 1, 4, linux.idc.xpu)

@EikanWang

Collaborator

@pytorchbot merge -i

@pytorchmergebot

Collaborator

pytorchmergebot pushed a commit to toyxu/pytorch that referenced this pull request Nov 24, 2024
Fixes pytorch#141268

Caused by these commits: pytorch@34b2165 and pytorch@34e4205

The windows XPU builds are failing: https://github.com/pytorch/pytorch/actions/runs/11922274722/job/33228175750
due to recent PR merge with changes in fallback ops: pytorch@34e4205

This PR updates the XPU C Shim header file to overcome these build failures.
Pull Request resolved: pytorch#141086
Approved by: https://github.com/etaf, https://github.com/EikanWang, https://github.com/jansel, https://github.com/malfet, https://github.com/dvrogozh, https://github.com/desertfire
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024

Labels

ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), ciflow/xpu (Run XPU CI tasks), Merged, open source, topic: not user facing (topic category)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[Break XPU] xpu: build fails for XPU backend due to outdated aoti_torch/generated/c_shim_xpu.h