
CI for Windows Arm64 #148753

Closed
iremyux wants to merge 177 commits into pytorch:main from iremyux:trunk-win-arm64

Conversation

@iremyux
Collaborator

@iremyux iremyux commented Mar 7, 2025

This pull request adds a new CI workflow for Windows Arm64, named win-arm64-build-test.yml.
It can be triggered on any pull request by including the ciflow/win-arm64 tag.
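For illustration, a label-triggered workflow of this kind is usually wired up through tag pushes: the assumption here is that adding the ciflow/win-arm64 label causes a matching tag to be pushed, and the workflow listens for that tag. The sketch below is not taken from this PR; the tag pattern, job name, runner label, and step contents are all hypothetical placeholders.

```yaml
# Sketch only: tag pattern, job name, runner label, and steps are hypothetical.
name: win-arm64-build-test

on:
  push:
    tags:
      # Assumed tag pattern pushed when the ciflow/win-arm64 label is added.
      - ciflow/win-arm64/*

jobs:
  build:
    runs-on: windows-11-arm   # placeholder runner label
    steps:
      - uses: actions/checkout@v4
      - name: Build PyTorch
        shell: pwsh
        run: echo "build steps go here"
```

Because the trigger is a tag push rather than a branch push, a workflow shaped like this would not run on ordinary commits, which is what makes it opt-in.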

@pytorch-bot

pytorch-bot bot commented Mar 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148753

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e65e294 with merge base d27d361:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: releng release notes category label Mar 7, 2025
@iremyux iremyux added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 7, 2025
@iremyux iremyux added ciflow/win-arm64 Trigger Windows Arm64 CI Workflows and removed ciflow/trunk Trigger trunk jobs on your pull request labels Mar 10, 2025
@pytorch-bot

pytorch-bot bot commented Mar 10, 2025

Warning: Unknown label ciflow/win-arm64.
Currently recognized labels are

  • ciflow/binaries
  • ciflow/binaries_libtorch
  • ciflow/binaries_wheel
  • ciflow/inductor
  • ciflow/inductor-periodic
  • ciflow/inductor-rocm
  • ciflow/inductor-perf-test-nightly-rocm
  • ciflow/inductor-perf-compare
  • ciflow/inductor-micro-benchmark
  • ciflow/inductor-micro-benchmark-cpu-x86
  • ciflow/inductor-cu126
  • ciflow/linux-aarch64
  • ciflow/mps
  • ciflow/nightly
  • ciflow/periodic
  • ciflow/rocm
  • ciflow/rocm-mi300
  • ciflow/s390
  • ciflow/slow
  • ciflow/trunk
  • ciflow/unstable
  • ciflow/xpu
  • ciflow/torchbench
  • ciflow/autoformat

Please add the new label to .github/pytorch-probot.yml
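As a rough sketch of what the bot is asking for, the new label would be appended to the list of recognized ciflow labels in that file. The key name `ciflow_push_tags` is an assumption about the file's layout, and the other entries are elided here.

```yaml
# .github/pytorch-probot.yml (sketch; key name assumed, most entries elided)
ciflow_push_tags:
  - ciflow/trunk
  - ciflow/unstable
  - ciflow/win-arm64   # new entry so the bot stops reporting the label as unknown
```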

@iremyux iremyux added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 12, 2025
@iremyux
Collaborator Author

iremyux commented Jul 7, 2025

Thank you for working on the PR. A few high-level comments:

  • Thank you for adding build_pytorch.ps1, but please delete build_pytorch.bat from the same folder (or explain why both of them need to be present)

  • Please explain why more code cannot be shared between _win-build.yml and _win-arm64-build.yml (I had exactly the same feedback for the binary builds)

  • Please do not add it to trunk, but rather create an opt-in win-arm64.yml for now that can be invoked on ciflow/... The reason for that is twofold:

    • We don't want the trunk workflow signal to be red, nor to make developers think about how to fix build/test regressions there
    • Cost (win-arm64 runners are no longer free and adding them to trunk workflow means they'll run on all commits)

Hey @malfet ,

  • The build_pytorch.bat file you mentioned is used by the older Windows x64 workflows. I’ve removed the one that was added as part of this PR.
  • I can move the win-arm64 build & test jobs off the trunk and instead have them triggered only by a tag (e.g., ciflow/win-arm64). However, merging win-x64-build.yml and win-arm64-build.yml into a single workflow while keeping the tag condition isn't ideal. It would require users to manually time tag pushes, and there's a risk that the win-arm64 workflow won’t consistently trigger when needed.
    So the options seem to be:
    • Merge both .yml files and keep win-arm64 running on trunk,
      or
    • Keep a separate win-arm64 workflow that’s triggered via a tag.
  • Also, I am afraid that the cache usage will be affected if win-arm64 is no longer run from trunk. We’ve added specific permissions in trunk.yml to ensure reliable cache access.

@iremyux iremyux added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 7, 2025
@iremyux iremyux removed the ciflow/trunk Trigger trunk jobs on your pull request label Jul 7, 2025
Contributor

@malfet malfet left a comment

This does not seem like a working PR (i.e., it fails after 9 minutes, doesn't it?)

@@ -0,0 +1,115 @@
name: win-arm64-test
Contributor

This file is unused right now, isn't it?

  contents: read

jobs:
  build:
Contributor

Why can't this file call _win-build.yml and the newly added _win-arm64-test.yml?

@malfet
Contributor

malfet commented Jul 7, 2025

  • The build_pytorch.bat file you mentioned is used by the older Windows x64 workflows. I’ve removed the one that was added as part of this PR.

I understand that x86 and arm builds are unrelated, but it would be good to move both to a modern basis, and enabling arm64 builds looks like a good time for this exercise. Still, we can leave those separate.

  • I can move the win-arm64 build & test jobs off the trunk and instead have them triggered only by a tag (e.g., ciflow/win-arm64). However, merging win-x64-build.yml and win-arm64-build.yml into a single workflow while keeping the tag condition isn't ideal. It would require users to manually time tag pushes, and there's a risk that the win-arm64 workflow won’t consistently trigger when needed.

Sure, that's a risk, but it's no greater than having win-arm64-build.yml as a separate file.

So the options seem to be:

  • Merge both .yml files and keep win-arm64 running on trunk,

I don't think it's reasonable to enable new platform builds on every commit until we have some measure of the platform's stability. One way to get a sense of that is to first add it as opt-in, then as periodic, and finally enable it on every commit.

or
  • Keep a separate win-arm64 workflow that’s triggered via a tag.

Sure, you can keep workflows separate while they are opt-in via tag, but whenever you want to move them to periodic I would insist that x86 and arm build workflows should share more and more logic.

  • Also, I am afraid that the cache usage will be affected if win-arm64 is no longer run from trunk. We’ve added specific permissions in trunk.yml to ensure reliable cache access.
    The cache will eventually be populated once the workflow runs periodically. But first, it needs to run to completion.
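The opt-in, then periodic, then every-commit progression described in this comment maps onto the workflow's `on:` triggers. A minimal sketch, assuming the tag-based opt-in convention; the tag pattern and the cron cadence are hypothetical and not taken from the PR:

```yaml
# Stage 1 (opt-in): run only when the ciflow tag is pushed.
on:
  push:
    tags:
      - ciflow/win-arm64/*   # assumed tag pattern

# Stage 2 (periodic): additionally run on a schedule.
#   schedule:
#     - cron: "0 */6 * * *"   # hypothetical cadence
#
# Stage 3 (every commit): additionally run on pushes to main.
#   push:
#     branches:
#       - main
```

Each stage only widens the trigger set, so the jobs themselves would not need to change as the platform earns more trust.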

@iremyux iremyux changed the title from "Trunk workflow for Windows Arm64" to "CI for Windows Arm64" Jul 8, 2025
Member

@seemethere seemethere left a comment

I'm also in agreement with @malfet here: I don't think Windows arm64 is different enough from Windows x86 to justify completely separating the two workflows.

I'd ideally like to see this merged into our existing windows base jobs and integrated into workflows using the same reusable workflows to match what we're doing for x86 while keeping it separate and opt in only.

For an example you can refer to:

win-vs2022-cpu-py3-build:
  name: win-vs2022-cpu-py3
  uses: ./.github/workflows/_win-build.yml
  needs: get-label-type
  with:
    build-environment: win-vs2022-cpu-py3
    cuda-version: cpu
    runner: "${{ needs.get-label-type.outputs.label-type }}windows.4xlarge.nonephemeral"
    test-matrix: |
      { include: [
        { config: "default", shard: 1, num_shards: 3, runner: "${{ needs.get-label-type.outputs.label-type }}windows.4xlarge.nonephemeral" },
        { config: "default", shard: 2, num_shards: 3, runner: "${{ needs.get-label-type.outputs.label-type }}windows.4xlarge.nonephemeral" },
        { config: "default", shard: 3, num_shards: 3, runner: "${{ needs.get-label-type.outputs.label-type }}windows.4xlarge.nonephemeral" },
      ]}
  secrets: inherit
win-vs2022-cpu-py3-test:
  name: win-vs2022-cpu-py3
  uses: ./.github/workflows/_win-test.yml
  needs:
    - win-vs2022-cpu-py3-build
    - target-determination
  with:
    build-environment: win-vs2022-cpu-py3
    cuda-version: cpu
    test-matrix: ${{ needs.win-vs2022-cpu-py3-build.outputs.test-matrix }}
    disable-monitor: false
  secrets: inherit

My biggest worry here is that by creating a bespoke pathway for arm64 we're ultimately making it even more difficult to maintain our Windows builds (by introducing even more code to maintain).

@iremyux
Collaborator Author

iremyux commented Jul 9, 2025

@malfet @seemethere , Thanks for the comments.

I believe attempting to merge both platforms is complicating the process and decreasing readability. Since there are few common steps between x86 and arm64, using the same workflow file requires us to either:

  • Add a condition to each step (so that if it's an arm64 architecture without the appropriate tag, the workflow is skipped)
    or
  • Create a separate job, which ends up looking like the example below, leading to a lot of skipped jobs of the same type.
[image]

This is assuming we follow what @seemethere suggested, calling from trunk again.

If we decide to keep the x86 runs on the trunk and trigger the same workflow with a tag for arm64, we'll end up with similar check logic again, leading to a lot of if conditions. Both approaches make the code harder to read.
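To make the per-step condition option concrete, here is a minimal sketch of a shared reusable workflow that branches on architecture. The `architecture` input, the runner label, and the step contents are hypothetical and do not come from _win-build.yml or this PR.

```yaml
# Sketch only: the "architecture" input and all step contents are hypothetical.
on:
  workflow_call:
    inputs:
      architecture:
        type: string
        required: false
        default: x64

jobs:
  build:
    runs-on: windows-latest   # placeholder runner label
    steps:
      - name: Set up x64 toolchain
        if: inputs.architecture == 'x64'
        run: echo "x64-specific setup"
      - name: Set up arm64 toolchain
        if: inputs.architecture == 'arm64'
        run: echo "arm64-specific setup"
      - name: Build
        run: echo "steps common to both architectures"
```

The readability concern is visible even at this size: every architecture-specific step carries its own `if:`, and the number of conditions grows with the number of steps that differ between the two platforms.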

@iremyux
Collaborator Author

iremyux commented Jul 15, 2025

hey @malfet ,

I don't think it's reasonable to enable new platform builds on every commit until we have some measure of the platform's stability. One way to get a sense of that is to first add it as opt-in, then as periodic, and finally enable it on every commit.

This PR right now does exactly that. There is one single workflow for building and testing - triggered by the ciflow/win-arm64 tag.

@seemethere, I understand your concerns, but currently, Windows Arm64 runners require considerable extra effort. They don't share many steps in common with the other Windows runners. So for now, I believe having a separate workflow will actually be simpler to manage.

When we have the green light to go periodic, we will be merging the two workflows as you requested.

If we can settle on having a separate, non-periodic workflow, I'd appreciate it if you could review the changes introduced in this PR.

@malfet malfet added the topic: not user facing topic category label Jul 23, 2025
@malfet
Contributor

malfet commented Jul 23, 2025

@pytorchbot merge -f "Lint + opt-in workflow are green"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@iremyux iremyux deleted the trunk-win-arm64 branch July 29, 2025 08:41

Labels

ciflow/win-arm64 Trigger Windows Arm64 CI Workflows Merged open source release notes: releng release notes category topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

8 participants