Skip to content

xpu: get xpu arch flags at runtime in cpp_extensions#152192

Closed
dvrogozh wants to merge 1 commit intopytorch:mainfrom
dvrogozh:extension
Closed

xpu: get xpu arch flags at runtime in cpp_extensions#152192
dvrogozh wants to merge 1 commit intopytorch:mainfrom
dvrogozh:extension

Conversation

@dvrogozh
Copy link
Contributor

@dvrogozh dvrogozh commented Apr 25, 2025

This commit moves query for xpu arch flags to runtime when building SYCL extensions which allows to adjust TORCH_XPU_ARCH_LIST at python script level. That's handy for example in ci test which gives a try few variants of the list.

CC: @malfet, @jingxu10, @EikanWang, @guangyey

cc @gujinghui @EikanWang @fengyuan14 @guangyey

@pytorch-bot
Copy link

pytorch-bot bot commented Apr 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152192

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 96f1e66 with merge base 9608e7f (image):

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@dvrogozh
Copy link
Contributor Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Apr 25, 2025
@dvrogozh
Copy link
Contributor Author

@pytorchbot label "module: xpu"

@pytorch-bot pytorch-bot bot added the module: xpu Intel XPU related issues label Apr 25, 2025
@dvrogozh
Copy link
Contributor Author

@pytorchbot label "ciflow/xpu"

@pytorch-bot
Copy link

pytorch-bot bot commented Apr 25, 2025

To add these label(s) (ciflow/xpu) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Apr 28, 2025
@guangyey guangyey moved this to Review Required in PyTorch Intel Apr 28, 2025
@guangyey guangyey added the release notes: xpu release notes category label Apr 28, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Apr 28, 2025
@dvrogozh
Copy link
Contributor Author

In the last push to PR - fix of linter issue, no other changes.

@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Apr 28, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Apr 28, 2025
@guangyey guangyey added keep-going Don't stop on first failure, keep running tests until the end ciflow/xpu Run XPU CI tasks labels Apr 29, 2025
@guangyey
Copy link
Collaborator

guangyey commented May 8, 2025

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased extension onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout extension && git pull --rebase)

@guangyey guangyey added the ciflow/trunk Trigger trunk jobs on your pull request label May 8, 2025
@guangyey guangyey requested review from albanD, atalman and malfet May 8, 2025 02:25
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

# for default spir64 target and avoid device specific compilations entirely. Further, kernels
# will be JIT compiled at runtime.
def _get_sycl_target_flags():
if _get_sycl_arch_list() != '':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you build the full arch list just to check if it's empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User can either specify the empty arch list externally (via TORCH_XPU_ARCH_LIST building his extension) or default arch list can be empty if that's how user's pytorch version was originally built (with empty TORCH_XPU_ARCH_LIST when building pytorch). Further, empty arch list is still a valid case since it corresponds to JIT runtime compilation. This if handles options difference between JIT and AOT cases.

@guangyey
Copy link
Collaborator

guangyey commented May 9, 2025

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

This commit moves query for xpu arch flags to runtime when building
SYCL extensions which allows to adjust `TORCH_XPU_ARCH_LIST` at
python script level. That's handy for example in ci test which
gives a try few variants of the list.

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@pytorchmergebot
Copy link
Collaborator

Successfully rebased extension onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout extension && git pull --rebase)

@pytorch-bot pytorch-bot bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks labels May 9, 2025
@guangyey guangyey added ciflow/xpu Run XPU CI tasks ciflow/trunk Trigger trunk jobs on your pull request labels May 9, 2025
@guangyey
Copy link
Collaborator

guangyey commented May 9, 2025

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Comment on lines 296 to 313
'-fsycl',
'-fsycl-targets=spir64_gen,spir64' if _get_sycl_arch_list() != '' else '',
]

_SYCL_DLINK_FLAGS = [
*_COMMON_SYCL_FLAGS,
'-fsycl-link',
'--offload-compress',
f'-Xs "-device {_get_sycl_arch_list()}"' if _get_sycl_arch_list() != '' else '',
]
Copy link

@pengxin99 pengxin99 Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @dvrogozh, have some wonder on this pr, why remove '-fsycl-targets=spir64_gen,spir64' in _COMMON_SYCL_FLAGS, and it will cause '-fsycl-targets is lost in _SYCL_DLINK_FLAGS and make sycl kernel link failed like below:

RuntimeError: No kernel named _ZTS26AWQDequantizeKernelFunctorIN4sycl3_V16detail9half_impl4halfES4_Li16ELi16ELi16EE was found

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my understanding, -fsycl-targets is compiler flag, not a linker flag. Tests for sycl extension which build test sycl kernels and executed them do pass after this change. Can you, please, be more specific on the scenario where you see the noted issue? file the bug.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dvrogozh According to the SYCL User’s Manual, the -fsycl-targets option is typically specified during the linking stage when AOT (Ahead-Of-Time) compilation is enabled. refer to Normally, '-fsycl-targets' is specified when linking an application, in which case the AOT compiled device binaries are embedded within the application’s fat executable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the link, this helps. However, this documentation notes 2 possible usages. One when linking the application, another - when calling compiler. As of now sycl cpp extension in pytorch follows the second scheme and I do not observe compilation, linker or runtime errors associated with that including the one @pengxin99 notes above. Snapshots which include compile/link command lines from running pytorch side tests are below.

@pengxin99, is your statement of the error an actual observation which you see using sycl cpp extension or speculation of what might happen? If that's actual observation - please, file the issue here at github and describe repro case. If you don't see the error, but believe there is something to improve in sycl cpp extension support - again, file the issue and describe what you want to achieve which currently can't be achieved or what needs to be changed and why. Either way, I strongly believe this discussion needs to be continued in the dedicated issue where you can link back to this thread.

[1/3] icpx -DTORCH_EXTENSION_NAME=torch_test_xpu_extension_ZLcQS -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1018\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /home/dvrogozh/miniforge3/include/python3.12 -fPIC -std=c++17 -fsycl -fsycl-targets=spir64_gen,spir64 -sycl-std=2020 -fsycl-host-compiler=c++ '-fsycl-host-compiler-options=-DTORCH_EXTENSION_NAME=torch_test_xpu_extension_ZLcQS -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\"_gcc\\" -DPYBIND11_STDLIB=\\"_libstdcpp\\" -DPYBIND11_BUILD_ABI=\\"_cxxabi1018\\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /home/dvrogozh/miniforge3/include/python3.12 -fPIC -std=c++17' -c -x c++ /home/dvrogozh/git/pytorch/pytorch/test/cpp_extensions/xpu_extension.sycl -o xpu_extension.sycl.o
[2/3] icpx xpu_extension.sycl.o -o sycl_dlink.o -fsycl -fsycl-link --offload-compress -Xs "-device pvc"  -Xs "-device pvc"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pengxin99, thank you for the reproducer. This helps - I can reproduce the issue. The reason why you stepped into the problem is that you've specified custom "-fsycl-targets=spir64_gen" option. This overrides the default set by cpp extension as -fsycl-targets=spir64,spir64_gen. If you will drop this, then runtime error will go away and tests will start to pass. If your desire is to support sycl kernels in sglang, then you probably want to drop this option anyways and use the default which will support both AOT targets (via spir64_gen) and JIT compilation (via spir64). There are some other options which you don't need to set as these are set by cpp extension by default - these are -fsycl and -std=c++17. @pengxin99 : Can you consider to change this on your side to be like below?

extra_compile_args = {
    "cxx": ["-O3"],
    "sycl": ["-ffast-math", "-fsycl-device-code-split=per_kernel"],
}

Above being said, behavior you've noted is indeed an actual issue. Effectively AOT compilation is currently not happening with sycl cpp extension. Instead JIT compilation happens (via spir64 target). The JIT compilation is the reason why I overlooked the problem - test did not fail. We need to adjust the test to cover such a case, i.e. need to have the case where we explicitly disable JIT compilation and have only AOT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #156249.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dvrogozh, Thanks for the explanation, will follow your suggestion in my project.

dvrogozh added a commit to dvrogozh/pytorch that referenced this pull request Jun 19, 2025
Commit fixes AOT compilation in sycl cpp extension which got
accidentally dropped on aca2c99 (fallback to JIT compilation
had happened). Commit also fixes override logic for default sycl
targets allowing flexibility to specify targets externally. Further,
commit extends test coverage to cover such a case and fixes issue
in the test where consequent tests executed same (first) compiled
extension due to name conflicts.

Fixes: pytorch#156249
Fixes: aca2c99 ("xpu: get xpu arch flags at runtime in cpp_extensions (pytorch#152192)")
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
pytorchmergebot pushed a commit that referenced this pull request Jun 19, 2025
Commit fixes AOT compilation in sycl cpp extension which got accidentally dropped on aca2c99 (fallback to JIT compilation had happened). Commit also fixes override logic for default sycl targets allowing flexibility to specify targets externally. Further, commit extends test coverage to cover such a case and fixes issue in the test where consequent tests executed same (first) compiled extension due to name conflicts.

Fixes: #156249
Fixes: aca2c99 ("xpu: get xpu arch flags at runtime in cpp_extensions (#152192)")

CC: @pengxin99, @guangyey

Pull Request resolved: #156364
Approved by: https://github.com/ezyang
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks keep-going Don't stop on first failure, keep running tests until the end Merged module: xpu Intel XPU related issues open source release notes: xpu release notes category topic: not user facing topic category

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

7 participants