xpu: get xpu arch flags at runtime in cpp_extensions #152192
dvrogozh wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152192
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 96f1e66 with merge base 9608e7f (BROKEN TRUNK - the following job failed but was present on the merge base). 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"

@pytorchbot label "module: xpu"

@pytorchbot label "ciflow/xpu"

To add these label(s) (ciflow/xpu) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page). This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
The last push to the PR is a fix for a linter issue; no other changes.

@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased
```python
# for default spir64 target and avoid device specific compilations entirely. Further, kernels
# will be JIT compiled at runtime.
def _get_sycl_target_flags():
    if _get_sycl_arch_list() != '':
```
Why do you build the full arch list just to check if it's empty?
The user can either specify an empty arch list externally (via `TORCH_XPU_ARCH_LIST` when building their extension), or the default arch list can be empty if that's how the user's PyTorch was originally built (with an empty `TORCH_XPU_ARCH_LIST` at PyTorch build time). Further, an empty arch list is still a valid case, since it corresponds to JIT runtime compilation. This `if` handles the option differences between the JIT and AOT cases.
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
This commit moves the query for XPU arch flags to runtime when building SYCL extensions, which allows adjusting `TORCH_XPU_ARCH_LIST` at the Python script level. That's handy, for example, in a CI test that tries a few variants of the list. Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
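To illustrate why the runtime query matters: a CI test can now rebuild the same extension under different arch lists simply by changing the environment between builds. The sketch below is hypothetical; the actual `torch.utils.cpp_extension.load(...)` call is only indicated in a comment.

```python
import os

# Hypothetical sketch: since arch flags are now queried when the extension is
# built (rather than frozen at module import), a test can iterate over lists.
tried = []
for arch_list in ['', 'pvc']:  # JIT-only first, then AOT for one device
    os.environ['TORCH_XPU_ARCH_LIST'] = arch_list
    # torch.utils.cpp_extension.load(name=..., sources=[...]) would pick up
    # the new value here and emit matching -fsycl-targets / -Xs flags.
    tried.append(arch_list)
```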
Successfully rebased

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
```python
    '-fsycl',
    '-fsycl-targets=spir64_gen,spir64' if _get_sycl_arch_list() != '' else '',
]

_SYCL_DLINK_FLAGS = [
    *_COMMON_SYCL_FLAGS,
    '-fsycl-link',
    '--offload-compress',
    f'-Xs "-device {_get_sycl_arch_list()}"' if _get_sycl_arch_list() != '' else '',
]
```
Hi @dvrogozh, I have a question on this PR: why remove '-fsycl-targets=spir64_gen,spir64' from _COMMON_SYCL_FLAGS? This causes '-fsycl-targets' to be lost from _SYCL_DLINK_FLAGS and makes the SYCL kernel link fail like below:

```
RuntimeError: No kernel named _ZTS26AWQDequantizeKernelFunctorIN4sycl3_V16detail9half_impl4halfES4_Li16ELi16ELi16EE was found
```
Per my understanding, -fsycl-targets is a compiler flag, not a linker flag. Tests for the SYCL extension which build test SYCL kernels and execute them do pass after this change. Can you please be more specific about the scenario where you see the noted issue and file a bug?
@dvrogozh According to the SYCL User's Manual, the -fsycl-targets option is typically specified during the linking stage when AOT (Ahead-Of-Time) compilation is enabled: "Normally, '-fsycl-targets' is specified when linking an application, in which case the AOT compiled device binaries are embedded within the application's fat executable."
Thank you for the link, this helps. However, this documentation notes two possible usages: one when linking the application, another when calling the compiler. As of now the SYCL cpp extension in PyTorch follows the second scheme, and I do not observe compilation, linker, or runtime errors associated with that, including the one @pengxin99 notes above. Snapshots which include compile/link command lines from running PyTorch side tests are below.

@pengxin99, is your statement of the error an actual observation which you see using the SYCL cpp extension, or speculation about what might happen? If it's an actual observation, please file an issue here on GitHub and describe the repro case. If you don't see the error but believe there is something to improve in SYCL cpp extension support, again, file an issue and describe what you want to achieve which currently can't be achieved, or what needs to be changed and why. Either way, I strongly believe this discussion needs to continue in a dedicated issue where you can link back to this thread.
```
[1/3] icpx -DTORCH_EXTENSION_NAME=torch_test_xpu_extension_ZLcQS -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1018\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /home/dvrogozh/miniforge3/include/python3.12 -fPIC -std=c++17 -fsycl -fsycl-targets=spir64_gen,spir64 -sycl-std=2020 -fsycl-host-compiler=c++ '-fsycl-host-compiler-options=-DTORCH_EXTENSION_NAME=torch_test_xpu_extension_ZLcQS -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\"_gcc\\" -DPYBIND11_STDLIB=\\"_libstdcpp\\" -DPYBIND11_BUILD_ABI=\\"_cxxabi1018\\" -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include -isystem /home/dvrogozh/git/pytorch/pytorch/torch/include/torch/csrc/api/include -isystem /home/dvrogozh/miniforge3/include/python3.12 -fPIC -std=c++17' -c -x c++ /home/dvrogozh/git/pytorch/pytorch/test/cpp_extensions/xpu_extension.sycl -o xpu_extension.sycl.o
[2/3] icpx xpu_extension.sycl.o -o sycl_dlink.o -fsycl -fsycl-link --offload-compress -Xs "-device pvc" -Xs "-device pvc"
```
Hi @dvrogozh, you can reproduce this issue by following: https://github.com/pengxin99/sglang/blob/xpu_build_back_issue_repro/sgl-kernel/README.md
Hi @pengxin99, thank you for the reproducer. This helps; I can reproduce the issue. The reason you stepped into the problem is that you've specified a custom "-fsycl-targets=spir64_gen" option. This overrides the default set by the cpp extension, -fsycl-targets=spir64,spir64_gen. If you drop it, the runtime error will go away and tests will start to pass. If your goal is to support SYCL kernels in sglang, then you probably want to drop this option anyway and use the default, which supports both AOT targets (via spir64_gen) and JIT compilation (via spir64). There are some other options which you don't need to set, as these are set by the cpp extension by default: -fsycl and -std=c++17. @pengxin99: can you consider changing this on your side to be like below?

```python
extra_compile_args = {
    "cxx": ["-O3"],
    "sycl": ["-ffast-math", "-fsycl-device-code-split=per_kernel"],
}
```

That being said, the behavior you've noted is indeed an actual issue. Effectively, AOT compilation is currently not happening with the SYCL cpp extension; instead, JIT compilation happens (via the spir64 target). The JIT compilation is the reason why I overlooked the problem: the test did not fail. We need to adjust the test to cover such a case, i.e. we need a case where we explicitly disable JIT compilation and have only AOT.
@dvrogozh, Thanks for the explanation, will follow your suggestion in my project.
Commit fixes AOT compilation in the SYCL cpp extension, which got accidentally dropped in aca2c99 (a fallback to JIT compilation had happened). Commit also fixes the override logic for default SYCL targets, allowing the flexibility to specify targets externally. Further, commit extends test coverage to cover such a case and fixes an issue in the test where subsequent tests executed the same (first) compiled extension due to name conflicts. Fixes: #156249 Fixes: aca2c99 ("xpu: get xpu arch flags at runtime in cpp_extensions (#152192)") Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> CC: @pengxin99, @guangyey Pull Request resolved: #156364 Approved by: https://github.com/ezyang
This commit moves the query for XPU arch flags to runtime when building SYCL extensions, which allows adjusting `TORCH_XPU_ARCH_LIST` at the Python script level. That's handy, for example, in a CI test that tries a few variants of the list. CC: @malfet, @jingxu10, @EikanWang, @guangyey
cc @gujinghui @EikanWang @fengyuan14 @guangyey