DEBUG bus errors on macos wheels #29628

Closed
ogrisel wants to merge 22 commits into scikit-learn:main from ogrisel:debug-bus-errors-on-macos-wheels

ogrisel commented Aug 6, 2024

Member

github-actions Bot commented Aug 6, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: dcb41d8. Link to the linter CI: here

ogrisel commented Aug 6, 2024

Member Author

Victory!

We managed to trigger a crash and collect a core dump file:

https://github.com/scikit-learn/scikit-learn/actions/runs/10267815411/job/28409279851?pr=29628#step:6:2328

Here are the relevant snippets from the Python level and lldb native backtraces:

  Current thread 0x00000001ec854c00 (most recent call first):
    File "/private/var/folders/zn/hj183dg15s713b47j2wlhwzw0000gn/T/cibw-run-rl1ximw0/cp311-macosx_arm64/venv-test-arm64/lib/python3.11/site-packages/numpy/lib/_arraysetops_impl.py", line 356 in _unique1d
    File "/private/var/folders/zn/hj183dg15s713b47j2wlhwzw0000gn/T/cibw-run-rl1ximw0/cp311-macosx_arm64/venv-test-arm64/lib/python3.11/site-packages/numpy/lib/_arraysetops_impl.py", line 289 in unique
    File "/private/var/folders/zn/hj183dg15s713b47j2wlhwzw0000gn/T/cibw-run-rl1ximw0/cp311-macosx_arm64/venv-test-arm64/lib/python3.11/site-packages/numpy/lib/_arraysetops_impl.py", line 1142 in union1d
    File "/private/var/folders/zn/hj183dg15s713b47j2wlhwzw0000gn/T/cibw-run-rl1ximw0/cp311-macosx_arm64/venv-test-arm64/lib/python3.11/site-packages/sklearn/utils/_array_api.py", line 212 in _union1d
    File "/private/var/folders/zn/hj183dg15s713b47j2wlhwzw0000gn/T/cibw-run-rl1ximw0/cp311-macosx_arm64/venv-test-arm64/lib/python3.11/site-packages/sklearn/metrics/_classification.py", line 119 in _check_targets
    File "/private/var/folders/zn/hj183dg15s713b47j2wlhwzw0000gn/T/cibw-run-rl1ximw0/cp311-macosx_arm64/venv-test-arm64/lib/python3.11/site-packages/sklearn/metrics/_classification.py", line 219 in accuracy_score
    File "/private/var/folders/zn/hj183dg15s713b47j2wlhwzw0000gn/T/cibw-run-rl1ximw0/cp311-macosx_arm64/venv-test-arm64/lib/python3.11/site-packages/sklearn/utils/_param_validation.py", line 216 in wrapper
    File "/private/var/folders/zn/hj183dg15s713b47j2wlhwzw0000gn/T/cibw-run-rl1ximw0/cp311-macosx_arm64/venv-test-arm64/lib/python3.11/site-packages/sklearn/base.py", line 764 in score
    File "/private/var/folders/zn/hj183dg15s713b47j2wlhwzw0000gn/T/cibw-run-rl1ximw0/cp311-macosx_arm64/venv-test-arm64/lib/python3.11/site-packages/sklearn/utils/estimator_checks.py", line 1875 in check_pipeline_consistency
   * thread #1
    * frame #0: 0x00000001849e2a60 libsystem_kernel.dylib`__pthread_kill + 8
      frame #1: 0x0000000184a1ac20 libsystem_pthread.dylib`pthread_kill + 288
      frame #2: 0x00000001848f11e0 libsystem_c.dylib`raise + 32
      frame #3: 0x0000000105e0d83c Python`faulthandler_fatal_error + 392
      frame #4: 0x0000000184a4b584 libsystem_platform.dylib`_sigtramp + 56
      frame #5: 0x000000010681654c _multiarray_umath.cpython-311-darwin.so`void hwy::N_NEON::detail::Recurse<(hwy::N_NEON::detail::RecurseMode)0, hwy::N_NEON::Simd<long long, 2ul, 0>, hwy::N_NEON::detail::SharedTraits<hwy::N_NEON::detail::TraitsLane<hwy::N_NEON::detail::OrderAscending<long long>>>, long long>(hwy::N_NEON::Simd<long long, 2ul, 0>, hwy::N_NEON::detail::SharedTraits<hwy::N_NEON::detail::TraitsLane<hwy::N_NEON::detail::OrderAscending<long long>>>, long long*, unsigned long, long long*, unsigned long long*, unsigned long, unsigned long) + 800
      frame #6: 0x000000010681654c _multiarray_umath.cpython-311-darwin.so`void hwy::N_NEON::detail::Recurse<(hwy::N_NEON::detail::RecurseMode)0, hwy::N_NEON::Simd<long long, 2ul, 0>, hwy::N_NEON::detail::SharedTraits<hwy::N_NEON::detail::TraitsLane<hwy::N_NEON::detail::OrderAscending<long long>>>, long long>(hwy::N_NEON::Simd<long long, 2ul, 0>, hwy::N_NEON::detail::SharedTraits<hwy::N_NEON::detail::TraitsLane<hwy::N_NEON::detail::OrderAscending<long long>>>, long long*, unsigned long, long long*, unsigned long long*, unsigned long, unsigned long) + 800
      frame #7: 0x0000000106810644 _multiarray_umath.cpython-311-darwin.so`void np::highway::qsort_simd::QSort_ASIMD<long long>(long long*, long) + 108
      frame #8: 0x0000000106715d24 _multiarray_umath.cpython-311-darwin.so`quicksort_long + 68
      frame #9: 0x00000001066d7184 _multiarray_umath.cpython-311-darwin.so`_new_sortlike + 688
      frame #10: 0x00000001066eab20 _multiarray_umath.cpython-311-darwin.so`array_sort + 572
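
The "Current thread ... (most recent call first)" header in the Python-level trace is the format written by the standard-library `faulthandler` module, and frame #3 of the native trace (`faulthandler_fatal_error`) shows that its signal handler was active when the process died. How the CI job enabled it is not shown in this thread. The sketch below only illustrates the mechanism: the helper name `capture_python_traceback` is hypothetical, and it dumps a traceback on demand instead of waiting for a fatal signal.

```python
import faulthandler
import tempfile


def capture_python_traceback():
    """Return the text that faulthandler writes for the running threads."""
    # faulthandler writes straight to a file descriptor, so a real
    # temporary file is needed (an io.StringIO would not work).
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


if __name__ == "__main__":
    # In a real debugging session one would instead call
    # faulthandler.enable() early, or start Python with -X faulthandler,
    # so that the same kind of dump is written when a fatal signal arrives.
    print(capture_python_traceback())
```

Calling `faulthandler.enable()` installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL, which is why a bus error can still produce a Python-level traceback.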

This may be a SIMD-related problem, and memory alignment might come into play.
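
One way to probe the alignment hypothesis locally is to sort int64 buffers whose data pointer sits at a chosen offset from a 16-byte boundary, since the native frames go through `array_sort` and `quicksort_long` into the NEON quicksort. This is only a sketch under assumptions: the thread does not establish that alignment is the cause, the helper `int64_array_with_alignment` is hypothetical, and the loop may well run without crashing on any given machine or numpy version.

```python
import numpy as np


def int64_array_with_alignment(values, remainder=8, boundary=16):
    """Copy `values` into an int64 array whose address % boundary == remainder."""
    values = np.asarray(values, dtype=np.int64).ravel()
    # Over-allocate raw bytes so that a suitable start offset always exists.
    raw = np.empty(values.nbytes + boundary, dtype=np.uint8)
    start = (remainder - raw.ctypes.data) % boundary
    out = raw[start:start + values.nbytes].view(np.int64)
    out[:] = values
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (1, 2, 3, 7, 16, 33, 100, 1000):
        labels = rng.integers(0, 3, size=n)
        for remainder in (0, 8):
            arr = int64_array_with_alignment(labels, remainder)
            assert arr.ctypes.data % 16 == remainder
            arr.sort()  # in-place sort, as in the array_sort frame
            assert arr.tolist() == sorted(labels.tolist())
    print("no crash observed")
```

The small label range mimics the classification targets passed to `accuracy_score` in the Python-level trace; the array sizes are arbitrary choices.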

Here is a link to the dumped core file for that crash. I will try to have a look interactively:

https://github.com/scikit-learn/scikit-learn/actions/runs/10267815411/artifacts/1781211741

ogrisel commented Aug 7, 2024

Member Author

It seems that numpy dev has fixed the problem: I triggered the macOS arm64 wheel builds and they passed 5 times in a row.

I will trigger them a few more times just to be sure that our problem is fixed.
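
Because the crash is intermittent, a single green run proves little, hence the repeated builds. A local analogue of that strategy can be sketched as below. The helper `run_repeatedly` is hypothetical and is not how the CI job works: it simply runs a snippet in fresh interpreters and reports any run that was killed by a signal (on POSIX, `subprocess` reports this as a negative return code).

```python
import signal
import subprocess
import sys


def run_repeatedly(snippet, n_runs=5):
    """Run `snippet` in `n_runs` fresh interpreters.

    Returns one entry per run: the signal name if the process was killed
    by a signal (POSIX only), otherwise its integer exit code.
    """
    outcomes = []
    for _ in range(n_runs):
        proc = subprocess.run(
            [sys.executable, "-X", "faulthandler", "-c", snippet],
            capture_output=True,
        )
        if proc.returncode < 0:
            outcomes.append(signal.Signals(-proc.returncode).name)
        else:
            outcomes.append(proc.returncode)
    return outcomes


if __name__ == "__main__":
    # Replace the snippet with the code under suspicion.
    print(run_repeatedly("print('ok')", n_runs=5))
```

A run that died from a bus error would show up as `"SIGBUS"` in the returned list, while a clean run shows up as `0`.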

ogrisel commented Aug 7, 2024

Member Author

Let's close.

ogrisel closed this Aug 7, 2024
ogrisel deleted the debug-bus-errors-on-macos-wheels branch August 7, 2024 12:05