Skip to content

MAINT: bump OpenBLAS "the old way"#20215

Merged
tylerjereddy merged 5 commits intoscipy:mainfrom
tylerjereddy:treddy_issue_16930
Mar 12, 2024
Merged

MAINT: bump OpenBLAS "the old way"#20215
tylerjereddy merged 5 commits intoscipy:mainfrom
tylerjereddy:treddy_issue_16930

Conversation

@tylerjereddy
Copy link
Copy Markdown
Contributor

Not ready for merge... this is actually quite messy to fix I think, because of our current infrastructure transition around OpenBLAS.

[skip cirrus] [skip circle]

@tylerjereddy tylerjereddy added defect A clear bug or issue that prevents SciPy from being installed or used as expected maintenance Items related to regular maintenance tasks labels Mar 7, 2024
@tylerjereddy tylerjereddy added this to the 1.13.0 milestone Mar 7, 2024
@tylerjereddy
Copy link
Copy Markdown
Contributor Author

cc @mattip

@mattip
Copy link
Copy Markdown
Contributor

mattip commented Mar 7, 2024

I think you can use the v0.3.26 tarballs from https://anaconda.org/scientific-python-nightly-wheels/openblas-libs, so you would need to modify tools/openblas_support.py:

- OPENBLAS_V = '0.3.21.dev'
- OPENBLAS_LONG = 'v0.3.20-571-g3dec11c6'
+ OPENBLAS_V = '0.3.26'
+ OPENBLAS_LONG = 'v0.3.26'

@tylerjereddy
Copy link
Copy Markdown
Contributor Author

I tried that, but it turns into scipy-openblas32 land problems instead of the usual old approach:

  Run-time dependency scipy-openblas found: YES 0.3.26

  ../scipy/meson.build:242:4: ERROR: File _distributor_init_local.py does not exist.

I'd probably have to start making shims to adjust this branch to match your new infrastructure, no?

@mattip
Copy link
Copy Markdown
Contributor

mattip commented Mar 8, 2024

Sorry. I must have overwritten it at some point. It seems the previous one was

- OPENBLAS_V = '0.3.21.dev'
- OPENBLAS_LONG = 'v0.3.20-571-g3dec11c6'
+ OPENBLAS_V = '0.3.24-dev'
+ OPENBLAS_LONG = 'v0.3.24-95-g9d425a5f'

@tylerjereddy
Copy link
Copy Markdown
Contributor Author

That fails too--I'm pretty sure I tried most of the binaries up there yesterday. I'll paste the error I got yesterday for that scenario below:

  [241/1494] Linking target scipy/special/_ellip_harm_2.cpython-311-x86_64-linux-gnu.so
  FAILED: scipy/special/_ellip_harm_2.cpython-311-x86_64-linux-gnu.so
  cc  -o scipy/special/_ellip_harm_2.cpython-311-x86_64-linux-gnu.so scipy/special/_ellip_harm_2.cpython-311-x86_64-linux-gnu.so.p/meson-generated__ellip_harm_2.c.o scipy/special/_ellip_harm_2.cpython-311-x86_64-linux-gnu.so.p/sf_error.c.o -Wl,--as-needed -Wl,--allow-shlib-undefined -Wl,-O1 -shared -fPIC -Wl,--start-group -lm -Wl,--version-script=/project/scipy/_build_utils/link-version-pyinit.map -L/usr/local/lib -lopenblas -Wl,--end-group
  /opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld: cannot find -lopenblas
  collect2: error: ld returned 1 exit status

@mattip
Copy link
Copy Markdown
Contributor

mattip commented Mar 9, 2024

Ok I will make sure there is a 0.3.26 you can use. It might take me a day or so.

@mattip
Copy link
Copy Markdown
Contributor

mattip commented Mar 11, 2024

Which platforms do you need openblas tarballs for? Do you ship linux aarch64? That builds on travis which is flaky and not cooperating (this is a known problem with travis CI, would be nice to find a different aarch64 CI system but that is another problem).

@andyfaff
Copy link
Copy Markdown
Contributor

Cirrusci can handle Linux aarch64 relatively straightforwardly

@mattip
Copy link
Copy Markdown
Contributor

mattip commented Mar 11, 2024

It seems #20228 is working, using the 0.3.26 tarballs I uploaded today to https://anaconda.org/multibuild-wheels-staging/openblas-libs/files

tylerjereddy and others added 4 commits March 11, 2024 10:12
* Related to scipygh-16930

* added a regression test, based on the reproducer in the issue,
that I confirmed does indeed fail locally on x86_64 Linux via:
`CIBW_BUILD=cp311-* cibuildwheel --platform linux --archs x86_64`

* this assumes that scipygh-20074 needs a little more time because
of all the moving parts related to that

* I spent a few hours on this today, and unfortunately I don't
see an obvious fix because the "newest of the old" OpenBLAS binaries we were
using at https://anaconda.org/multibuild-wheels-staging/openblas-libs/files
are too old to fix the test via `cibuildwheel` in my hands; conversely,
the new OpenBLAS binaries (https://anaconda.org/scientific-python-nightly-wheels/openblas-libs/files)
don't seem to offer a version that is simultaneously new enough while
not introducing at least a subset of the challenges experienced
by scipygh-20074 (yikes!)

[skip cirrus] [skip circle]
* attempt a wheel build after pulling in fixes
from Matti P. to bump to OpenBLAS `0.3.26` (and
after Matti built special "old style" binaries
for this upstream)

[wheel build]
@tylerjereddy tylerjereddy marked this pull request as ready for review March 11, 2024 16:22
@tylerjereddy tylerjereddy requested a review from rgommers as a code owner March 11, 2024 16:23
@tylerjereddy
Copy link
Copy Markdown
Contributor Author

@mattip Awesome, thanks a lot!

I confirmed locally that with your commits pulled in from the cross-linked PR combined with your work to provide a new OpenBLAS build upstream, this branch passes the regression test to fix the issue in question.

I'm going to proceed with a full wheel build with your two commits cherry-picked in here now, and if there are no related failures we can move forward I think.

@tylerjereddy
Copy link
Copy Markdown
Contributor Author

tylerjereddy commented Mar 11, 2024

@andyfaff is there a way for me to manually override the Cirrus Linux ARM wheel build skips on a PR/feature branch like this? (without changing the source code--I'm pretty sure my manual re-run trigger just skipped again and uploaded some previous artifact)

@andyfaff
Copy link
Copy Markdown
Contributor

Do I understand you want the cirrus runs to be forced to go again? If so, I've just forced them to run again.

@rgommers
Copy link
Copy Markdown
Member

So close! A single failure on the macOS x86-64 jobs:

  ../venv-test/lib/python3.10/site-packages/scipy/linalg/tests/test_blas.py:1102: in test_gh_169309
      assert_allclose(actual, expected)
          actual     = 0.0
          expected   = 22.360679774997898
          x          = array([10, 10, 10, 10, 10, 10, 10, 10, 10])
  /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/contextlib.py:79: in inner
      return func(*args, **kwds)
  E   AssertionError: 
  E   Not equal to tolerance rtol=1e-07, atol=0
  E   
  E   Mismatched elements: 1 / 1 (100%)
  E   Max absolute difference among violations: 22.36067977
  E   Max relative difference among violations: 1.
  E    ACTUAL: array(0.)
  E    DESIRED: array(22.36068)

So perhaps not fully fixed upstream. Perhaps drop the added test from this PR, and @ev-br may want to investigate further? Getting OpenBLAS 0.3.26 shipped in the next release will be quite nice either way.

@ev-br
Copy link
Copy Markdown
Member

ev-br commented Mar 11, 2024

Sure, feel free to skip it, I'll take a look. dnrm2 sounds familiar.

* skip the new regression test because
OpenBLAS `0.3.26` doesn't fully fix scipygh-16930

[skip circle] [skip cirrus]
@tylerjereddy tylerjereddy changed the title WIP, BUG, MAINT: bump OpenBLAS "the old way" MAINT: bump OpenBLAS "the old way" Mar 11, 2024
@tylerjereddy
Copy link
Copy Markdown
Contributor Author

tylerjereddy commented Mar 11, 2024

Looks like the wheels failed the new test on a few different platforms even with OpenBLAS 0.3.26. I've pushed in a commit to xfail (and run=False) the new test for now, with reference to the issue that should remain open for now.

Once the regular CI is happy, perhaps we should squash-merge.

@tylerjereddy
Copy link
Copy Markdown
Contributor Author

I believe the remaining CI failures are gh-20230.

@tylerjereddy tylerjereddy merged commit 8315173 into scipy:main Mar 12, 2024
@tylerjereddy tylerjereddy deleted the treddy_issue_16930 branch March 12, 2024 16:20
@tylerjereddy
Copy link
Copy Markdown
Contributor Author

I squash-merged--I'd like to see a few days of the new OpenBLAS getting flushed through before branching for 1.13.0 around Sunday (I think).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

defect A clear bug or issue that prevents SciPy from being installed or used as expected maintenance Items related to regular maintenance tasks scipy.linalg

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants