Skip to content

Add option to define OpenBLAS version for manylinux Dockerfile_2_28_aarch64#150106

Closed
davsva01 wants to merge 2 commits intopytorch:mainfrom
davsva01:manylinux_openblas_version
Closed

Add option to define OpenBLAS version for manylinux Dockerfile_2_28_aarch64#150106
davsva01 wants to merge 2 commits intopytorch:mainfrom
davsva01:manylinux_openblas_version

Conversation

@davsva01
Copy link
Contributor

@davsva01 davsva01 commented Mar 27, 2025

Adds optional variable OPENBLAS_VERSION to .ci/docker/common/install_openblas.sh used to define which version of OpenBLAS to install. Adds argument to Dockerfile_2_28_aarch64 image.

cc @malfet @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01

@davsva01 davsva01 requested a review from jeffdaily as a code owner March 27, 2025 11:25
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Mar 27, 2025
@pytorch-bot
Copy link

pytorch-bot bot commented Mar 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150106

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 4700d8e with merge base 28cb3c0 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Mar 31, 2025
@nikhil-arm
Copy link
Collaborator

@pytorchbot label "ciflow/linux-aarch64"
@pytorchbot label "module: arm"

@pytorch-bot pytorch-bot bot added the ciflow/linux-aarch64 linux aarch64 CI workflow label May 12, 2025
@pytorch-bot
Copy link

pytorch-bot bot commented May 12, 2025

To add the ciflow label ciflow/linux-aarch64 please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/linux-aarch64 linux aarch64 CI workflow label May 12, 2025
@nikhil-arm
Copy link
Collaborator

@pytorchbot label "module: arm"

@pytorch-bot pytorch-bot bot added the module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 label May 12, 2025
Copy link
Collaborator

@fadara01 fadara01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside of this PR is that when we need to update the OpenBLAS version, we now have to (remember to) change the version in 2 scripts (.ci/docker/common/install_openblas.sh and .ci/docker/manywheel/build.sh) instead of just 1.

Is it possible to make sure that the version is only set/hardcoded in one place?

We currently have this problem of having to update multiple scripts on a version change in ACL. The price we paid for that was having ACL version in manylinux ahead of that in CI, leading to green CI, but having test failures with manylinux builds

@Mousius
Copy link
Contributor

Mousius commented May 21, 2025

The downside of this PR is that when we need to update the OpenBLAS version, we now have to (remember to) change the version in 2 scripts (.ci/docker/common/install_openblas.sh and .ci/docker/manywheel/build.sh) instead of just 1.

Is it possible to make sure that the version is only set/hardcoded in one place?

We currently have this problem of having to update multiple scripts on a version change in ACL. The price we paid for that was having ACL version in manylinux ahead of that in CI, leading to green CI, but having test failures with manylinux builds

You mean remove the 0.3.29 default in install_openblas.sh and ensure all the version come from build.sh?

@fadara01
Copy link
Collaborator

fadara01 commented May 22, 2025

You mean remove the 0.3.29 default in install_openblas.sh and ensure all the version come from build.sh?

Yeah, having the version hard-coded in just one place.

@davsva01 davsva01 force-pushed the manylinux_openblas_version branch from 37a0d91 to 806d87a Compare May 27, 2025 09:38
@davsva01
Copy link
Contributor Author

You mean remove the 0.3.29 default in install_openblas.sh and ensure all the version come from build.sh?

Yeah, having the version hard-coded in just one place.

Currently the only image using install_openblas.sh is .ci/docker/manywheel/Dockerfile_2_28_aarch64. We have planned work to refactor the build process ensuring manylinux and CI rely on the same build scripts, so will revisit defaults and where version definition should reside as part of that work.

@fadara01 fadara01 self-requested a review May 27, 2025 10:55
Copy link
Collaborator

@fadara01 fadara01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thank you

@fadara01
Copy link
Collaborator

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Tried to rebase and push PR #150106, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@fadara01
Copy link
Collaborator

CI failure is unrelated

@malfet
Copy link
Contributor

malfet commented May 29, 2025

@pytorchbot merge

@fadara01
Copy link
Collaborator

@pytorchbot rebase

@fadara01
Copy link
Collaborator

pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu)

@malfet this seems unrelated, would it be acceptable for me to force-merge here?

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased manylinux_openblas_version onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout manylinux_openblas_version && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the manylinux_openblas_version branch from 3141748 to 22ade68 Compare May 29, 2025 17:10
@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label May 29, 2025
@fadara01
Copy link
Collaborator

fadara01 commented Jun 2, 2025

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased manylinux_openblas_version onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout manylinux_openblas_version && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the manylinux_openblas_version branch from 22ade68 to 17dbdc4 Compare June 2, 2025 07:46
@fadara01
Copy link
Collaborator

fadara01 commented Jun 2, 2025

@pytorchbot merge -h

@pytorch-bot
Copy link

pytorch-bot bot commented Jun 2, 2025

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. Some commands may be used on issues as specified below.

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label,drci,cherry-pick,close}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR
    drci                Update Dr. CI
    cherry-pick         Cherry pick a PR onto a release branch
    close               Close a PR

Merge

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditting purpose, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
                        
                        Please use `-f` as last resort, prefer `--ignore-current` to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
  -i, --ignore-current  Merge while ignoring the currently failing jobs.  Behaves like -f if there are no pending jobs.
  -ic                   Old flag for --ignore-current. Deprecated in favor of -i.
  -r [{viable/strict,main}], --rebase [{viable/strict,main}]
                        Rebase the PR to re run checks before merging.  Accepts viable/strict or main as branch options and will default to viable/strict if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
Repeat contributor may use this command to rebase their PR.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds label to a PR or Issue [Can be used on Issues]

positional arguments:
  labels  Labels to add to given Pull Request or Issue [Can be used on Issues]

Dr CI

usage: @pytorchbot drci 

Update Dr. CI. Updates the Dr. CI comment on the PR in case it's gotten out of sync with actual CI results.

cherry-pick

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Cherry pick a pull request onto a release branch for inclusion in a release

optional arguments:
  --onto ONTO, --into ONTO
                        Branch you would like to cherry pick onto (Example: release/2.1)
  --fixes FIXES         Link to the issue that your PR fixes (Example: https://github.com/pytorch/pytorch/issues/110666)
  -c {regression,critical,fixnewfeature,docs,release}, --classification {regression,critical,fixnewfeature,docs,release}
                        A machine-friendly classification of the cherry-pick reason.

Close

usage: @pytorchbot close

Close a PR [Can be used on issues]

@fadara01
Copy link
Collaborator

fadara01 commented Jun 3, 2025

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

davsva01 and others added 2 commits June 3, 2025 08:05
…arch64

Adds optional variable OPENBLAS_VERSION to .ci/docker/common/install_openblas.sh
used to define which version of OpenBLAS to install. Adds argument to Dockerfile_2_28_aarch64 image.
Co-authored-by: Fadi Arafeh <115173828+fadara01@users.noreply.github.com>
@pytorchmergebot
Copy link
Collaborator

Successfully rebased manylinux_openblas_version onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout manylinux_openblas_version && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the manylinux_openblas_version branch from 17dbdc4 to 4700d8e Compare June 3, 2025 08:05
@fadara01
Copy link
Collaborator

fadara01 commented Jun 3, 2025

failures are unrelated.
@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 3, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 4 checks: Build manywheel docker images for s390x / build-docker-cpu-s390x, docker-builds / docker-build (linux.12xlarge, pytorch-linux-jammy-py3-clang12-executorch), pull / linux-jammy-py3-clang12-executorch / build, pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull request Jun 4, 2025
…arch64 (pytorch#150106)

Adds optional variable OPENBLAS_VERSION to `.ci/docker/common/install_openblas.sh` used to define which version of OpenBLAS to install. Adds argument to `Dockerfile_2_28_aarch64` image.

Pull Request resolved: pytorch#150106
Approved by: https://github.com/aditew01, https://github.com/fadara01, https://github.com/malfet

Co-authored-by: Fadi Arafeh <115173828+fadara01@users.noreply.github.com>
angelayi pushed a commit to angelayi/pytorch that referenced this pull request Jun 5, 2025
…arch64 (pytorch#150106)

Adds optional variable OPENBLAS_VERSION to `.ci/docker/common/install_openblas.sh` used to define which version of OpenBLAS to install. Adds argument to `Dockerfile_2_28_aarch64` image.

Pull Request resolved: pytorch#150106
Approved by: https://github.com/aditew01, https://github.com/fadara01, https://github.com/malfet

Co-authored-by: Fadi Arafeh <115173828+fadara01@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants