
[CD] Add pytorch xpu wheel build in nightly#129560

Closed
chuanqi129 wants to merge 6 commits into pytorch:main from chuanqi129:xpu_nightly

@chuanqi129
Collaborator

@chuanqi129 chuanqi129 commented Jun 26, 2024

Add the PyTorch XPU wheel build to nightly, now that the XPU build image enabling PR pytorch/builder#1879 has merged. Linked to #114850.

@pytorch-bot

pytorch-bot bot commented Jun 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129560

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 2 New Failures

As of commit daf563e with merge base 46c5266 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


ROCM_ARCHES = ["6.0", "6.1"]

XPU_ARCHES = ["xpu"]
Contributor

@atalman atalman Jun 27, 2024

@chuanqi129 please split this PR into two.

  • First, build the triton-xpu wheel. The changes in this file and generated-linux-binary-manywheel-nightly.yml are not required at this point.
  • Second, a follow-up PR actually adds the manywheel build. At that point, add the changes to this file and generated-linux-binary-manywheel-nightly.yml.

Collaborator Author

Thanks @atalman. Makes sense; I have split the triton-xpu wheel build into PR #129730.

@chuanqi129
Collaborator Author

The XPU 3.8 wheel build failure during numpy 1.15 installation should be fixed by PR pytorch/builder#1909.

@chuanqi129 chuanqi129 marked this pull request as ready for review July 5, 2024 08:44
@chuanqi129 chuanqi129 requested a review from a team as a code owner July 5, 2024 08:44
@chuanqi129 chuanqi129 requested a review from atalman July 5, 2024 09:16
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: xpu
Collaborator

What's the meaning of DESIRED_CUDA?

Collaborator Author

Same here. DESIRED_CUDA is shared by all devices, including cpu, rocm, etc.
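
For illustration, the values this variable takes across devices can be sketched as follows. This is a minimal sketch and not the project's actual code: the function name `desired_cuda_for` is hypothetical, and the mapping is inferred only from the `desired_cuda` values that appear in the generated build matrix quoted later in this thread (`cpu`, `cu118`, `rocm6.0`, `xpu`).

```python
def desired_cuda_for(gpu_arch_type: str, gpu_arch_version: str = "") -> str:
    """Hypothetical helper: derive a DESIRED_CUDA-style string for a build target."""
    if gpu_arch_type in ("cpu", "xpu"):
        # No version component: the device name is used directly.
        return gpu_arch_type
    if gpu_arch_type == "cuda":
        # "11.8" -> "cu118"
        return "cu" + gpu_arch_version.replace(".", "")
    if gpu_arch_type == "rocm":
        # "6.0" -> "rocm6.0"
        return "rocm" + gpu_arch_version
    raise ValueError(f"unknown gpu_arch_type: {gpu_arch_type!r}")


if __name__ == "__main__":
    for arch, version in [("cpu", ""), ("cuda", "12.1"), ("rocm", "6.1"), ("xpu", "")]:
        print(arch, version or "-", "->", desired_cuda_for(arch, version))
```

Despite its name, the variable acts as a general device selector in this sketch, which matches the TODO in the diff about eventually replacing it with GPU_ARCH_VERSION.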

conda-pytorchbot-token-test: ${{ secrets.CONDA_PYTORCHBOT_TOKEN_TEST }}
uses: ./.github/workflows/_binary-upload.yml

manywheel-py3_9-xpu-build:
Collaborator

Most of the code is copy-paste. Compared with Py3.8, is the Python version the only difference? I'm just wondering if we can reuse the code (https://yaml.org/spec/1.2.2/#rule-c-ns-anchor-property)

Collaborator Author

Hi @EikanWang, this workflow is generated by the script .github/scripts/generate_ci_workflows.py based on the template .github/templates/linux_binary_build_workflow.yml.j2
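
To show why the generated YAML repeats itself per Python version, here is a minimal sketch of template-driven job generation. It is not the real generator: it uses the standard library's `string.Template` as a stand-in for the Jinja2 template, the names `JOB_TEMPLATE` and `render_jobs` are hypothetical, and the job layout is an illustrative guess built only from the job name, `PACKAGE_TYPE` and `DESIRED_CUDA` values visible in the diff.

```python
from string import Template

# Stand-in for the Jinja2 template; the real template file is not reproduced here.
JOB_TEMPLATE = Template(
    "manywheel-py${py_tag}-${arch}-build:\n"
    "  env:\n"
    "    PACKAGE_TYPE: manywheel\n"
    "    DESIRED_CUDA: ${arch}\n"
)


def render_jobs(python_versions, arch):
    """Render one job block per Python version from the shared template."""
    jobs = []
    for version in python_versions:
        py_tag = version.replace(".", "_")  # "3.9" -> "3_9"
        jobs.append(JOB_TEMPLATE.substitute(py_tag=py_tag, arch=arch))
    return "\n".join(jobs)


if __name__ == "__main__":
    print(render_jobs(["3.8", "3.9"], "xpu"))
```

Because the reuse happens in the template rather than in the emitted YAML, the checked-in workflow looks copy-pasted even though it has a single source.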

@zou3519 zou3519 added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jul 8, 2024
@chuanqi129
Collaborator Author

The build crash will be fixed by PR #130333

@chuanqi129
Collaborator Author

Hi @atalman, all test jobs related to this PR's changes have passed, in particular the XPU manylinux wheel build and test jobs. There are some unrelated test job failures caused by conda. Could you please double-check and review the PR again? Thanks

@chuanqi129
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Jul 11, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@atalman
Contributor

atalman commented Jul 11, 2024

@pytorchmergebot merge -f "lint is successful, previous run was fully green"

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


atalman pushed a commit to pytorch/test-infra that referenced this pull request Jul 23, 2024
The Torch XPU nightly wheel build was enabled when
pytorch/pytorch#129560 landed, so add XPU to
binary build generation. Works toward
pytorch/pytorch#114850

```bash
> python tools/scripts/generate_binary_build_matrix.py --with-xpu enable
{
  "include": [
    {
      "python_version": "3.8",
      "gpu_arch_type": "cpu",
      "gpu_arch_version": "",
      "desired_cuda": "cpu",
      "container_image": "pytorch/manylinux-builder:cpu",
      "package_type": "manywheel",
      "build_name": "manywheel-py3_8-cpu",
      "validation_runner": "linux.2xlarge",
      "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu",
      "channel": "nightly",
      "upload_to_base_bucket": "no",
      "stable_version": "2.3.1",
      "use_split_build": false
    },
    {
      "python_version": "3.8",
      "gpu_arch_type": "cuda",
      "gpu_arch_version": "11.8",
      "desired_cuda": "cu118",
      "container_image": "pytorch/manylinux-builder:cuda11.8",
      "package_type": "manywheel",
      "build_name": "manywheel-py3_8-cuda11_8",
      "validation_runner": "linux.g5.4xlarge.nvidia.gpu",
      "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118",
      "channel": "nightly",
      "upload_to_base_bucket": "no",
      "stable_version": "2.3.1",
      "use_split_build": false
    },
    {
      "python_version": "3.8",
      "gpu_arch_type": "cuda",
      "gpu_arch_version": "12.1",
      "desired_cuda": "cu121",
      "container_image": "pytorch/manylinux-builder:cuda12.1",
      "package_type": "manywheel",
      "build_name": "manywheel-py3_8-cuda12_1",
      "validation_runner": "linux.g5.4xlarge.nvidia.gpu",
      "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121",
      "channel": "nightly",
      "upload_to_base_bucket": "no",
      "stable_version": "2.3.1",
      "use_split_build": false
    },
    {
      "python_version": "3.8",
      "gpu_arch_type": "cuda",
      "gpu_arch_version": "12.4",
      "desired_cuda": "cu124",
      "container_image": "pytorch/manylinux-builder:cuda12.4",
      "package_type": "manywheel",
      "build_name": "manywheel-py3_8-cuda12_4",
      "validation_runner": "linux.g5.4xlarge.nvidia.gpu",
      "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124",
      "channel": "nightly",
      "upload_to_base_bucket": "no",
      "stable_version": "2.3.1",
      "use_split_build": false
    },
    {
      "python_version": "3.8",
      "gpu_arch_type": "rocm",
      "gpu_arch_version": "6.0",
      "desired_cuda": "rocm6.0",
      "container_image": "pytorch/manylinux-builder:rocm6.0",
      "package_type": "manywheel",
      "build_name": "manywheel-py3_8-rocm6_0",
      "validation_runner": "linux.2xlarge",
      "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0",
      "channel": "nightly",
      "upload_to_base_bucket": "no",
      "stable_version": "2.3.1",
      "use_split_build": false
    },
    {
      "python_version": "3.8",
      "gpu_arch_type": "rocm",
      "gpu_arch_version": "6.1",
      "desired_cuda": "rocm6.1",
      "container_image": "pytorch/manylinux-builder:rocm6.1",
      "package_type": "manywheel",
      "build_name": "manywheel-py3_8-rocm6_1",
      "validation_runner": "linux.2xlarge",
      "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1",
      "channel": "nightly",
      "upload_to_base_bucket": "no",
      "stable_version": "2.3.1",
      "use_split_build": false
    },
    {
      "python_version": "3.8",
      "gpu_arch_type": "xpu",
      "gpu_arch_version": "",
      "desired_cuda": "xpu",
      "container_image": "pytorch/manylinux2_28-builder:xpu",
      "package_type": "manywheel",
      "build_name": "manywheel-py3_8-xpu",
      "validation_runner": "linux.2xlarge",
      "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu",
      "channel": "nightly",
      "upload_to_base_bucket": "no",
      "stable_version": "2.3.1",
      "use_split_build": false
    }
  ]
}
```
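
As a hedged illustration of how a consumer might use this matrix, the sketch below filters entries by `gpu_arch_type`. The helper name `select_builds` is hypothetical, and the embedded JSON is a trimmed copy of the output above (two entries, a subset of fields), not something the script itself emits in this form.

```python
import json

# Trimmed copy of the matrix above: two entries, a subset of fields.
MATRIX_JSON = """
{
  "include": [
    {"python_version": "3.8", "gpu_arch_type": "cpu", "desired_cuda": "cpu",
     "build_name": "manywheel-py3_8-cpu"},
    {"python_version": "3.8", "gpu_arch_type": "xpu", "desired_cuda": "xpu",
     "build_name": "manywheel-py3_8-xpu"}
  ]
}
"""


def select_builds(matrix: dict, gpu_arch_type: str) -> list:
    """Return the matrix entries whose gpu_arch_type matches."""
    return [e for e in matrix["include"] if e["gpu_arch_type"] == gpu_arch_type]


if __name__ == "__main__":
    matrix = json.loads(MATRIX_JSON)
    for entry in select_builds(matrix, "xpu"):
        print(entry["build_name"])
```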
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Add pytorch xpu wheel build in nightly after the xpu build image enabling PR pytorch/builder#1879 merged

Pull Request resolved: pytorch#129560
Approved by: https://github.com/atalman

Labels

  • ciflow/binaries: Trigger all binary build and upload jobs on the PR
  • ciflow/trunk: Trigger trunk jobs on your pull request
  • Merged
  • open source
  • topic: not user facing (topic category)
  • triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

6 participants