Skip to content

MAINT: Test during import to detect bugs with Accelerate(MacOS) LAPACK#15695

Merged
mattip merged 4 commits intonumpy:masterfrom
vrakesh:mac_os_test
Mar 15, 2020
Merged

MAINT: Test during import to detect bugs with Accelerate(MacOS) LAPACK#15695
mattip merged 4 commits intonumpy:masterfrom
vrakesh:mac_os_test

Conversation

@vrakesh
Copy link
Copy Markdown
Member

@vrakesh vrakesh commented Mar 3, 2020

fixes #15647

Copy link
Copy Markdown
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good start for a test, now we need to make sure it works as advertised.

We run macOS on azure-pipelines CI via the azure-pipelines.yml file, around lines 58-148. Is there a strategy to add a macOS run that will avoid the "install pre-built openblas" step, successfully build NumPy in the "Build NumPy step, but then get a runtime error when importing it. I think you might be able to use conditionals with a variable like "TEST_NO_BLAS" to skip/execute some steps.

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 3, 2020

I can give setting up the azure build a shot if you are too busy to do it.

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 3, 2020

On the azure-pipeline setup, is it as straight-forward as editing the yaml file, any backed setup in azure to do for it? I can set it up, no issues. Let me look at the docs and do that as well

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 4, 2020

I was thinking of the following way to test the mac OS accelerate build import failure

In the step: "install pre-built openblas"

Store the local variable $target
echo "##vso[task.setvariable variable=obltarget]$target"

After the step : "Verify OpenBLAS version"

Do the following

    # Clean up OpenBLAS installed in system path, build
    - script: |
        git clean -xdf
        libs=$(find $obltarget/lib/* -maxdepth 1 -not -type d | xargs -n 1 basename)
        includes=$(find $obltarget/include/* -maxdepth 1 -not -type d | xargs -n 1 basename)
        for lib in $libs; do rm /usr/local/lib/$lib; done
        for include in $includes; do rm /usr/local/include/$include; done
      displayName: "Clean build for Accelerate Bug Testing"
    - script: python setup.py build -j 4 build_src --verbose-cfg install
      displayName: "Build NumPy without OpenBLAS"
      env:
        BLAS: None
        LAPACK: None
        ATLAS: None
        ACCELERATE: None
        CC: /usr/bin/clang
      # wait until after dev build of NumPy to pip
    - script: |
      set +e
      python -c "import numpy as np"
      checkoutput=$?
      if [ $checkoutput == 1 ]; then exit 0; else exit 1;fi
      displayName: "Check if numpy import fails with accelerate"

There might be some gap in the way I am thinking on this, first time ever using Azure pipelines. perhaps there is a better way to do this

Trying to clean this up using strategy section as mentioned above

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 4, 2020

The steps are called for the "normal" OpenBLAS build as well, so you will need to toggle behaviour depending on whether you want the import to succeed or fail. Also if you are testing the "Accelerate" option, you will need to somehow skip the rest of the tests once import fails. And if you are already toggling steps on and off, maybe instead of setting up the OpenBLAS target then deleting it, just skip the "install pre-built openblas" in the first place?

@charris charris changed the title TST: Test during import to detect bugs with Accelerate(MacOS) LAPACK MAINT: Test during import to detect bugs with Accelerate(MacOS) LAPACK Mar 4, 2020
@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 4, 2020

Here is an updated azure pipeline job for MacOS

- job: macOS
    pool:
      # NOTE: at time of writing, there is a danger
      # that using an invalid vmIMage string for macOS
      # image silently redirects to a Windows build on Azure;
      # for now, use the only image name officially present in
      # the docs even though i.e., numba uses another in their
      # azure config for mac os -- Microsoft has indicated
      # they will patch this issue
      vmImage: macOS-10.14
    strategy:
      maxParallel: 3
      matrix:
          Python36:
            PYTHON_VERSION: '3.6'
            USE_OPENBLAS: '1'
          Python36-ILP64:
            PYTHON_VERSION: '3.6'
            NPY_USE_BLAS_ILP64: '1'
            USE_OPENBLAS: '1'
          Accelerate:
            PYTHON_VERSION: '3.6'
            USE_OPENBLAS: '0'

    steps:
    # the @0 refers to the (major) version of the *task* on Microsoft's
    # end, not the order in the build matrix nor anything to do
    # with version of Python selected
    - task: UsePythonVersion@0
      inputs:
        versionSpec: $(PYTHON_VERSION)
        addToPath: true
        architecture: 'x64'
    # NOTE: do we have a compelling reason to use older / newer
    # versions of Xcode toolchain for testing?
    - script: /bin/bash -c "sudo xcode-select -s /Applications/Xcode_10.app/Contents/Developer"
      displayName: 'select Xcode version'
    # NOTE: might be better if we could avoid installing
    # two C compilers, but with homebrew looks like we're
    # now stuck getting the full gcc toolchain instead of
    # just pulling in gfortran
    - script: |
        # same version of gfortran as the wheel builds
        brew install gcc49
        # manually link critical gfortran libraries
        ln -s /usr/local/Cellar/gcc@4.9/4.9.4_1/lib/gcc/4.9/libgfortran.3.dylib /usr/local/lib/libgfortran.3.dylib
        ln -s /usr/local/Cellar/gcc@4.9/4.9.4_1/lib/gcc/4.9/libquadmath.0.dylib /usr/local/lib/libquadmath.0.dylib
        # Manually symlink gfortran-4.9 to plain gfortran for f2py.
        # No longer needed after Feb 13 2020 as gfortran is already present
        # and the attempted link errors. Keep this for future reference.
        # ln -s /usr/local/bin/gfortran-4.9 /usr/local/bin/gfortran
      displayName: 'make gfortran available on mac os vm'
    # use the pre-built openblas binary that most closely
    # matches our MacOS wheel builds -- currently based
    # primarily on file size / name details
    - script: |
        python -m pip install urllib3
        target=$(python tools/openblas_support.py)
        ls -lR $target
        # manually link to appropriate system paths
        cp $target/lib/* /usr/local/lib/
        cp $target/include/* /usr/local/include/
      displayName: 'install pre-built openblas'
      condition: eq($(USE_OPENBLAS), '1')
    - script: python -m pip install --upgrade pip setuptools wheel
      displayName: 'Install tools'
    - script: |
        python -m pip install -r test_requirements.txt
        python -m pip install vulture docutils sphinx==2.2.0 numpydoc
      displayName: 'Install dependencies; some are optional to avoid test skips'
    - script: /bin/bash -c "! vulture . --min-confidence 100 --exclude doc/,numpy/distutils/ | grep 'unreachable'"
      displayName: 'Check for unreachable code paths in Python modules'
    # prefer usage of clang over gcc proper
    # to match likely scenario on many user mac machines
    - script: python setup.py build -j 4 build_src --verbose-cfg install
      displayName: 'Build NumPy'
      env:
        BLAS: None
        LAPACK: None
        ATLAS: None
        ACCELERATE: None
        CC: /usr/bin/clang
    # wait until after dev build of NumPy to pip
    # install matplotlib to avoid pip install of older numpy
    - script: python -m pip install matplotlib
      displayName: 'Install matplotlib before refguide run'
    - script: python runtests.py -g --refguide-check
      displayName: 'Run Refuide Check'
      condition: eq($(USE_OPENBLAS), '1')
    - script: python runtests.py -n --mode=full -- -rsx --junitxml=junit/test-results.xml
      displayName: 'Run Full NumPy Test Suite'
      condition: eq($USE_OPENBLAS, '1')
    - bash: python tools/openblas_support.py --check_version
      displayName: 'Verify OpenBLAS version'
      condition: eq($(USE_OPENBLAS), '1')
    - script: |
      set +e
      python -c "import numpy as np"
      checkoutput=$?
      if [ $checkout == 1 ]; then exit 0; else exit 1;fi
      displayName: "Check if numpy import fails with accelerate"
      condition: eq($(USE_OPENBLAS), '0')
    - task: PublishTestResults@2
      condition: succeededOrFailed()
      inputs:
        testResultsFiles: '**/test-*.xml'
        failTaskOnFailedTests: true
        testRunTitle: 'Publish test results for Python 3.6 64-bit full Mac OS'

Does this look correct?

  1. I added another strategy called Acclerate,
  2. Used USE_OPENBLAS variable as a conditional on certain steps, including installing openblas.
  3. Added one more step when openblas is not installed, to test "numpy import"

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 5, 2020

Looks good. You can add it to the PR to test it out. I think the actual test could be improved by grepping the stdout/stderr for Accelerate in the error message, but let's leave that for fine-tuning when this is running.

Any idea by how much the test slows down import on darwin? Is it measurable with timeit ?

@vrakesh vrakesh force-pushed the mac_os_test branch 2 times, most recently from bbef455 to 1b5dc27 Compare March 6, 2020 21:42
@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 6, 2020

Added a grep command to verify as well. Checking if the pipeline passes

@vrakesh vrakesh force-pushed the mac_os_test branch 7 times, most recently from a14ca3e to 517ddd0 Compare March 7, 2020 00:28
@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 7, 2020

@mattip , got all the changes passing. Requesting your feedback on the same

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 7, 2020

Does the test affect import time on macOS?

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 9, 2020

Does the test affect import time on macOS?

As far I checked it does not seem to affect import speed much

# With the new test

time python -c "import numpy as np"
python(53875,0x1108b25c0) malloc: can't allocate region
*** mach_vm_map(size=18446744072026976256) failed (error code=3)
python(53875,0x1108b25c0) malloc: *** set a breakpoint in malloc_error_break to debug
init_dgelsd failed init
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/rakvas/venvs/numpyenv/lib/python3.7/site-packages/numpy-1.19.0.dev0+9034b17-py3.7-macosx-10.14-x86_64.egg/numpy/__init__.py", line 284, in <module>
    raise RuntimeError(msg)
RuntimeError: Polyfit sanity test emitted a warning, most likely due to using a buggy Accelerate backend. If you compiled yourself, see site.cfg.example for information. Otherwise report this to the vendor that provided NumPy.
RankWarning: Polyfit may be poorly conditioned


real	0m0.185s
user	0m0.135s
sys	0m0.039s
(numpyenv) 8c859043244e:Workspace rakvas$ 

Without the new test

# Without test
time python -c "import numpy as np"

real	0m0.155s
user	0m0.116s
sys	0m0.032s

Subsequent runs of the same time test has slight variance above or below the above figure. This on a mac running a 4 core cpu with 16gb RAM

with timeit code

code = '''                                                                                                                                                                                                                
try:
    import numpy as np
except RuntimeError as e:
    pass
'''

import timeit
print(timeit.timeit(stmt=code, number=10))

The results are

# With test
0.11233833800000001
# Without test
0.107218399

@vrakesh vrakesh force-pushed the mac_os_test branch 3 times, most recently from 2a9197c to 8e2ca9d Compare March 9, 2020 23:19
@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 9, 2020

Regarding , doing the pip, without pip The test does fails with numpy source folder error, I am trying to make the build (in build step) --inplace now

@seberg

This comment has been minimized.

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 10, 2020

@seberg
I figured it out, in the cloud, with Acclerate=None, the cloud mac instance, does not link to Acclerate BLAS , unlike my laptop Mac or other previous experiences of Mac users here, where even setting Acclerate to None, linked default blas/lapack to accelerate. Hence while building for this specific test on cloud, Accelerate=None should not be set.

Now weirdly windows build is failing

 - mingw (exited 1) - mingw not installed. An error occurred during installation:

And I touched no part of that

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 10, 2020

Re-based on master and all the test seems to pass , @mattip Hope this puts the PR in a much better shape. Anything else we need to look into?

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 11, 2020

Could you compare the time to import NumPy when it is compiled with OpenBLAS with and without the extra check?

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 11, 2020

Could you compare the time to import NumPy when it is compiled with OpenBLAS with and without the extra check?

Here are some updated numbers when running with OpenBLAS compiled
The numbers are obtained running the timeit code mentioned above

# With test change
0.119404168

# Without test change
0.112477977

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 12, 2020

I assume the units are seconds on those timings. So the test seems to slow things down by 7msec or about 6% of total import time. Can we find a faster way to test this?

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 12, 2020

The test did produce variable results, there were occasions where the opposite was true, i.e the new one imported faster than the older one . Let me do a run with more iterations , and cleaner environment with nothing else in background. I took the lowest number from an average of 5 runs of the script above previously . I will also look into possibly shortening it

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 13, 2020

Redid the test with 1000 iterations to average on

Ensured, no other process was running.

Run system information

----------Python Info----------
Version      : 3.7.4
Compiler     : Clang 11.0.0 (clang-1100.0.33.8)
Build        : ('default', 'Oct 12 2019 18:55:28')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 20.0.2
Directory    : /Users/rakvas/venvs/numpyenv/lib/python3.7/site-packages/pip
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform     : Darwin-18.7.0-x86_64-i386-64bit
system       : Darwin
release      : 18.7.0
version      : Darwin Kernel Version 18.7.0: Thu Jan 23 06:52:12 PST 2020; root:xnu-4903.278.25~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 HLE AVX2 SMEP BMI2 ERMS INVPCID RTM FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT MDCLEAR TSXFA IBRS STIBP L1DF SSBD'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
code = '''                                                                                                                                                                                                                
try:
    import numpy as np
except RuntimeError as e:
    pass
'''

import timeit
print(timeit.timeit(stmt=code, number=1000))

The result for a build built with openblas was as follows

#With change 
0.104423457
#Without change
0.09629199

8.4% increase in import time

After further analysis, the culprit turned out to be platform.system() which calls uname underneath, essentially a subprocess call to check for mac

Replacing this with sys.platform (String baked in with python compilation) resolves the additional time taken

Comparing sys.platform and platform.system()

import timeit
code = '''
import sys
x=sys.platform
'''
code_2 = '''
import platform
x=platform.system()
'''

print("SysTime:{}".format(timeit.timeit(stmt=code,number=1000)))
print("PlatTime:{}".format(timeit.timeit(stmt=code_2,number=1000)))

Output
SysTime:0.00039271700000000076
PlatTime:0.023952089000000003

After making the change

#With and without change for openBLAS build
hovers between 0.094 and 0.096

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 13, 2020

Nice catch. One last fix and I think this is ready.

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 13, 2020

Nice catch. One last fix and I think this is ready.

Done. Made the change

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 14, 2020

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@vrakesh
Copy link
Copy Markdown
Member Author

vrakesh commented Mar 15, 2020

Is azure pipelines down??

@WarrenWeckesser
Copy link
Copy Markdown
Member

@vrakesh, yes. Here's a comment by a github staff member Friday evening: #15747 (comment)
As far as I know, the problem still exists.

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 15, 2020

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mattip mattip merged commit cd3b44d into numpy:master Mar 15, 2020
@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 15, 2020

Thanks @vrakesh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ENH: Check for buggy Accelerate on MacOS

5 participants