MAINT: Test during import to detect bugs with Accelerate(MacOS) LAPACK#15695
MAINT: Test during import to detect bugs with Accelerate(MacOS) LAPACK#15695mattip merged 4 commits intonumpy:masterfrom
Conversation
mattip
left a comment
There was a problem hiding this comment.
This seems like a good start for a test, now we need to make sure it works as advertised.
We run macOS on azure-pipelines CI via the azure-pipelines.yml file, around lines 58-148. Is there a strategy to add a macOS run that will avoid the "install pre-built openblas" step, successfully build NumPy in the "Build NumPy step, but then get a runtime error when importing it. I think you might be able to use conditionals with a variable like "TEST_NO_BLAS" to skip/execute some steps.
|
I can give setting up the azure build a shot if you are too busy to do it. |
|
On the azure-pipeline setup, is it as straight-forward as editing the yaml file, any backed setup in azure to do for it? I can set it up, no issues. Let me look at the docs and do that as well |
|
I was thinking of the following way to test the mac OS accelerate build import failure In the step: "install pre-built openblas" Store the local variable $target After the step : "Verify OpenBLAS version" Do the following # Clean up OpenBLAS installed in system path, build
- script: |
git clean -xdf
libs=$(find $obltarget/lib/* -maxdepth 1 -not -type d | xargs -n 1 basename)
includes=$(find $obltarget/include/* -maxdepth 1 -not -type d | xargs -n 1 basename)
for lib in $libs; do rm /usr/local/lib/$lib; done
for include in $includes; do rm /usr/local/include/$include; done
displayName: "Clean build for Accelerate Bug Testing"
- script: python setup.py build -j 4 build_src --verbose-cfg install
displayName: "Build NumPy without OpenBLAS"
env:
BLAS: None
LAPACK: None
ATLAS: None
ACCELERATE: None
CC: /usr/bin/clang
# wait until after dev build of NumPy to pip
- script: |
set +e
python -c "import numpy as np"
checkoutput=$?
if [ $checkoutput == 1 ]; then exit 0; else exit 1;fi
displayName: "Check if numpy import fails with accelerate"There might be some gap in the way I am thinking on this, first time ever using Azure pipelines. perhaps there is a better way to do this Trying to clean this up using strategy section as mentioned above |
|
The steps are called for the "normal" OpenBLAS build as well, so you will need to toggle behaviour depending on whether you want the import to succeed or fail. Also if you are testing the "Accelerate" option, you will need to somehow skip the rest of the tests once import fails. And if you are already toggling steps on and off, maybe instead of setting up the OpenBLAS |
|
Here is an updated azure pipeline job for MacOS - job: macOS
pool:
# NOTE: at time of writing, there is a danger
# that using an invalid vmIMage string for macOS
# image silently redirects to a Windows build on Azure;
# for now, use the only image name officially present in
# the docs even though i.e., numba uses another in their
# azure config for mac os -- Microsoft has indicated
# they will patch this issue
vmImage: macOS-10.14
strategy:
maxParallel: 3
matrix:
Python36:
PYTHON_VERSION: '3.6'
USE_OPENBLAS: '1'
Python36-ILP64:
PYTHON_VERSION: '3.6'
NPY_USE_BLAS_ILP64: '1'
USE_OPENBLAS: '1'
Accelerate:
PYTHON_VERSION: '3.6'
USE_OPENBLAS: '0'
steps:
# the @0 refers to the (major) version of the *task* on Microsoft's
# end, not the order in the build matrix nor anything to do
# with version of Python selected
- task: UsePythonVersion@0
inputs:
versionSpec: $(PYTHON_VERSION)
addToPath: true
architecture: 'x64'
# NOTE: do we have a compelling reason to use older / newer
# versions of Xcode toolchain for testing?
- script: /bin/bash -c "sudo xcode-select -s /Applications/Xcode_10.app/Contents/Developer"
displayName: 'select Xcode version'
# NOTE: might be better if we could avoid installing
# two C compilers, but with homebrew looks like we're
# now stuck getting the full gcc toolchain instead of
# just pulling in gfortran
- script: |
# same version of gfortran as the wheel builds
brew install gcc49
# manually link critical gfortran libraries
ln -s /usr/local/Cellar/gcc@4.9/4.9.4_1/lib/gcc/4.9/libgfortran.3.dylib /usr/local/lib/libgfortran.3.dylib
ln -s /usr/local/Cellar/gcc@4.9/4.9.4_1/lib/gcc/4.9/libquadmath.0.dylib /usr/local/lib/libquadmath.0.dylib
# Manually symlink gfortran-4.9 to plain gfortran for f2py.
# No longer needed after Feb 13 2020 as gfortran is already present
# and the attempted link errors. Keep this for future reference.
# ln -s /usr/local/bin/gfortran-4.9 /usr/local/bin/gfortran
displayName: 'make gfortran available on mac os vm'
# use the pre-built openblas binary that most closely
# matches our MacOS wheel builds -- currently based
# primarily on file size / name details
- script: |
python -m pip install urllib3
target=$(python tools/openblas_support.py)
ls -lR $target
# manually link to appropriate system paths
cp $target/lib/* /usr/local/lib/
cp $target/include/* /usr/local/include/
displayName: 'install pre-built openblas'
condition: eq($(USE_OPENBLAS), '1')
- script: python -m pip install --upgrade pip setuptools wheel
displayName: 'Install tools'
- script: |
python -m pip install -r test_requirements.txt
python -m pip install vulture docutils sphinx==2.2.0 numpydoc
displayName: 'Install dependencies; some are optional to avoid test skips'
- script: /bin/bash -c "! vulture . --min-confidence 100 --exclude doc/,numpy/distutils/ | grep 'unreachable'"
displayName: 'Check for unreachable code paths in Python modules'
# prefer usage of clang over gcc proper
# to match likely scenario on many user mac machines
- script: python setup.py build -j 4 build_src --verbose-cfg install
displayName: 'Build NumPy'
env:
BLAS: None
LAPACK: None
ATLAS: None
ACCELERATE: None
CC: /usr/bin/clang
# wait until after dev build of NumPy to pip
# install matplotlib to avoid pip install of older numpy
- script: python -m pip install matplotlib
displayName: 'Install matplotlib before refguide run'
- script: python runtests.py -g --refguide-check
displayName: 'Run Refuide Check'
condition: eq($(USE_OPENBLAS), '1')
- script: python runtests.py -n --mode=full -- -rsx --junitxml=junit/test-results.xml
displayName: 'Run Full NumPy Test Suite'
condition: eq($USE_OPENBLAS, '1')
- bash: python tools/openblas_support.py --check_version
displayName: 'Verify OpenBLAS version'
condition: eq($(USE_OPENBLAS), '1')
- script: |
set +e
python -c "import numpy as np"
checkoutput=$?
if [ $checkout == 1 ]; then exit 0; else exit 1;fi
displayName: "Check if numpy import fails with accelerate"
condition: eq($(USE_OPENBLAS), '0')
- task: PublishTestResults@2
condition: succeededOrFailed()
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
testRunTitle: 'Publish test results for Python 3.6 64-bit full Mac OS'Does this look correct?
|
|
Looks good. You can add it to the PR to test it out. I think the actual test could be improved by grepping the stdout/stderr for Any idea by how much the test slows down import on darwin? Is it measurable with |
bbef455 to
1b5dc27
Compare
|
Added a grep command to verify as well. Checking if the pipeline passes |
a14ca3e to
517ddd0
Compare
|
@mattip , got all the changes passing. Requesting your feedback on the same |
|
Does the test affect import time on macOS? |
As far I checked it does not seem to affect import speed much # With the new test
time python -c "import numpy as np"
python(53875,0x1108b25c0) malloc: can't allocate region
*** mach_vm_map(size=18446744072026976256) failed (error code=3)
python(53875,0x1108b25c0) malloc: *** set a breakpoint in malloc_error_break to debug
init_dgelsd failed init
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/rakvas/venvs/numpyenv/lib/python3.7/site-packages/numpy-1.19.0.dev0+9034b17-py3.7-macosx-10.14-x86_64.egg/numpy/__init__.py", line 284, in <module>
raise RuntimeError(msg)
RuntimeError: Polyfit sanity test emitted a warning, most likely due to using a buggy Accelerate backend. If you compiled yourself, see site.cfg.example for information. Otherwise report this to the vendor that provided NumPy.
RankWarning: Polyfit may be poorly conditioned
real 0m0.185s
user 0m0.135s
sys 0m0.039s
(numpyenv) 8c859043244e:Workspace rakvas$ Without the new test # Without test
time python -c "import numpy as np"
real 0m0.155s
user 0m0.116s
sys 0m0.032sSubsequent runs of the same time test has slight variance above or below the above figure. This on a mac running a 4 core cpu with 16gb RAM with timeit code code = '''
try:
import numpy as np
except RuntimeError as e:
pass
'''
import timeit
print(timeit.timeit(stmt=code, number=10))The results are # With test
0.11233833800000001
# Without test
0.107218399 |
2a9197c to
8e2ca9d
Compare
|
Regarding , doing the pip, without pip The test does fails with numpy source folder error, I am trying to make the build (in build step) --inplace now |
This comment has been minimized.
This comment has been minimized.
|
@seberg Now weirdly windows build is failing - mingw (exited 1) - mingw not installed. An error occurred during installation:And I touched no part of that |
|
Re-based on master and all the test seems to pass , @mattip Hope this puts the PR in a much better shape. Anything else we need to look into? |
|
Could you compare the time to import NumPy when it is compiled with OpenBLAS with and without the extra check? |
Here are some updated numbers when running with OpenBLAS compiled # With test change
0.119404168
# Without test change
0.112477977 |
|
I assume the units are seconds on those timings. So the test seems to slow things down by 7msec or about 6% of total import time. Can we find a faster way to test this? |
|
The test did produce variable results, there were occasions where the opposite was true, i.e the new one imported faster than the older one . Let me do a run with more iterations , and cleaner environment with nothing else in background. I took the lowest number from an average of 5 runs of the script above previously . I will also look into possibly shortening it |
|
Redid the test with 1000 iterations to average on Ensured, no other process was running. Run system information ----------Python Info----------
Version : 3.7.4
Compiler : Clang 11.0.0 (clang-1100.0.33.8)
Build : ('default', 'Oct 12 2019 18:55:28')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 20.0.2
Directory : /Users/rakvas/venvs/numpyenv/lib/python3.7/site-packages/pip
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform : Darwin-18.7.0-x86_64-i386-64bit
system : Darwin
release : 18.7.0
version : Darwin Kernel Version 18.7.0: Thu Jan 23 06:52:12 PST 2020; root:xnu-4903.278.25~1/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 HLE AVX2 SMEP BMI2 ERMS INVPCID RTM FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT MDCLEAR TSXFA IBRS STIBP L1DF SSBD'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
code = '''
try:
import numpy as np
except RuntimeError as e:
pass
'''
import timeit
print(timeit.timeit(stmt=code, number=1000))The result for a build built with openblas was as follows After further analysis, the culprit turned out to be platform.system() which calls uname underneath, essentially a subprocess call to check for mac Replacing this with sys.platform (String baked in with python compilation) resolves the additional time taken Comparing sys.platform and platform.system() import timeit
code = '''
import sys
x=sys.platform
'''
code_2 = '''
import platform
x=platform.system()
'''
print("SysTime:{}".format(timeit.timeit(stmt=code,number=1000)))
print("PlatTime:{}".format(timeit.timeit(stmt=code_2,number=1000)))
Output
SysTime:0.00039271700000000076
PlatTime:0.023952089000000003After making the change #With and without change for openBLAS build
hovers between 0.094 and 0.096 |
|
Nice catch. One last fix and I think this is ready. |
Done. Made the change |
|
/azp run |
|
Azure Pipelines could not run because the pipeline triggers exclude this branch/path. |
|
Is azure pipelines down?? |
|
@vrakesh, yes. Here's a comment by a github staff member Friday evening: #15747 (comment) |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Thanks @vrakesh |
fixes #15647