
EXC_BAD_ACCESS when using GemmImplUsingEigen on macOS #98002

@FrancescAlted

Description


Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes (main branch, as I am compiling from source)

Source

source

TensorFlow version

2.19.0

Custom code

No

OS platform and distribution

macOS 10.13 and later

Mobile device

No response

Python version

3.10, 3.12

Bazel version

N/A (building with CMake 3.5)

GCC/compiler version

Default clang/gcc shipped with Xcode

CUDA/cuDNN version

None

GPU model and memory

None

Current behavior?

When I compile a shared library that includes TensorFlow (2.19.0) for use in my library (https://github.com/ironArray/Blosc2-Btune), the code crashes with EXC_BAD_ACCESS. The same library works well on Linux and Windows (see https://github.com/ironArray/Blosc2-Btune/actions/runs/16655330101/job/47138750245); macOS is the only platform that fails.

The same code worked last year with TensorFlow 2.14.0 (https://pypi.org/project/blosc2-btune/#files). I have now tried building against TensorFlow 2.14.0 again, but with the current toolchain it crashes in the same way. Here is a backtrace obtained with lldb:

> BLOSC_TRACE=1 BTUNE_TRACE=1 lldb python btune_config.py                               (btune)
(lldb) target create "python"
Current executable set to '/Users/faltet/miniforge3/envs/btune/bin/python' (arm64).
(lldb) settings set -- target.run-args  "btune_config.py"
(lldb) run
Process 69210 launched: '/Users/faltet/miniforge3/envs/btune/bin/python' (arm64)
[info] - Failed to load libblosc2_btune.so directly, error: dlopen(libblosc2_btune.so, 0x0001): tried: 'libblosc2_btune.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibblosc2_btune.so' (no such file), '/Users/faltet/miniforge3/envs/btune/bin/../lib/libblosc2_btune.so' (no such file), '/usr/lib/libblosc2_btune.so' (no such file, not in dyld cache), 'libblosc2_btune.so' (no such file)
 (/var/folders/y6/nj790rtn62lfktb1sh__79hc0000gn/T/tmp7mu28tbt/build/_deps/blosc2-src/blosc/blosc-private.h:285)
[info] - Trying to get plugin path with python
 (/var/folders/y6/nj790rtn62lfktb1sh__79hc0000gn/T/tmp7mu28tbt/build/_deps/blosc2-src/blosc/blosc-private.h:250)
[info] - Successfully loaded library with Python path: /Users/faltet/miniforge3/envs/btune/lib/python3.12/site-packages/blosc2_btune/libblosc2_btune.so
 (/var/folders/y6/nj790rtn62lfktb1sh__79hc0000gn/T/tmp7mu28tbt/build/_deps/blosc2-src/blosc/blosc-private.h:304)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Btune version: 1.2.1.dev0
Performance Mode: DECOMP, Compression tradeoff: 0.300000, Bandwidth: 20 GB/s
Behaviour: Waits - 0, Softs - 5, Hards - 10, Repeat Mode - STOP
TRACE: time load model: 0.006913
Process 69210 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1006ef218)
    frame #0: 0x00000001034b006c libblosc2_btune.so`___lldb_unnamed_symbol4990 + 1356
libblosc2_btune.so`___lldb_unnamed_symbol4990:
->  0x1034b006c <+1356>: ldr    s18, [x3]
    0x1034b0070 <+1360>: fmul   s18, s17, s18
    0x1034b0074 <+1364>: fadd   s16, s16, s18
    0x1034b0078 <+1368>: ldr    s18, [x25]
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1006ef218)
  * frame #0: 0x00000001034b006c libblosc2_btune.so`___lldb_unnamed_symbol4990 + 1356
    frame #1: 0x00000001034afa98 libblosc2_btune.so`___lldb_unnamed_symbol4989 + 196
    frame #2: 0x00000001034af05c libblosc2_btune.so`tflite::cpu_backend_gemm::detail::GemmImplUsingEigen::Run(tflite::cpu_backend_gemm::MatrixParams<float> const&, float const*, tflite::cpu_backend_gemm::MatrixParams<float> const&, float const*, tflite::cpu_backend_gemm::MatrixParams<float> const&, float*, tflite::cpu_backend_gemm::GemmParams<float, float, (tflite::cpu_backend_gemm::QuantizationFlavor)0> const&, tflite::CpuBackendContext*) + 680
    frame #3: 0x000000010350a7fc libblosc2_btune.so`___lldb_unnamed_symbol5440 + 496
    frame #4: 0x0000000103508e24 libblosc2_btune.so`___lldb_unnamed_symbol5435 + 504
    frame #5: 0x00000001034ffdb8 libblosc2_btune.so`TfLiteStatus tflite::ops::builtin::fully_connected::Eval<(tflite::ops::builtin::fully_connected::KernelType)1>(TfLiteContext*, TfLiteNode*) + 484
    frame #6: 0x00000001033bd75c libblosc2_btune.so`tflite::Subgraph::InvokeImpl() + 1340
    frame #7: 0x00000001033bd1f8 libblosc2_btune.so`tflite::Subgraph::Invoke() + 20
    frame #8: 0x00000001033af618 libblosc2_btune.so`tflite::impl::Interpreter::Invoke() + 124
    frame #9: 0x000000010328bc00 libblosc2_btune.so`btune_model_inference + 712
    frame #10: 0x000000010328a14c libblosc2_btune.so`btune_next_cparams + 280
    frame #11: 0x0000000101774444 blosc2_ext.cpython-312-darwin.so`___lldb_unnamed_symbol1802 + 620
    frame #12: 0x0000000101773dfc blosc2_ext.cpython-312-darwin.so`___lldb_unnamed_symbol1801 + 112
    frame #13: 0x0000000101796488 blosc2_ext.cpython-312-darwin.so`___lldb_unnamed_symbol2029 + 5436
    frame #14: 0x0000000101794160 blosc2_ext.cpython-312-darwin.so`___lldb_unnamed_symbol2024 + 60
    frame #15: 0x00000001017940e4 blosc2_ext.cpython-312-darwin.so`___lldb_unnamed_symbol2023 + 664
    frame #16: 0x0000000101683648 blosc2_ext.cpython-312-darwin.so`___lldb_unnamed_symbol971 + 868
    frame #17: 0x00000001000780b8 python`_PyVectorcall_Call + 132
    frame #18: 0x00000001001b408c python`_PyEval_EvalFrameDefault + 59672
    frame #19: 0x00000001001a48e8 python`PyEval_EvalCode + 276
    frame #20: 0x000000010021fabc python`run_mod + 228
    frame #21: 0x000000010021f38c python`_PyRun_SimpleFileObject + 1548
    frame #22: 0x000000010021e51c python`_PyRun_AnyFileObject + 264
    frame #23: 0x000000010024d668 python`pymain_run_file + 368
    frame #24: 0x000000010024cfbc python`Py_RunMain + 2452
    frame #25: 0x000000010024e0b0 python`pymain_main + 668
    frame #26: 0x0000000100004650 python`main + 56
    frame #27: 0x0000000187f1ab98 dyld`start + 6076
(lldb)

I am fairly convinced that the issue is some odd interaction between the modern toolchain and TensorFlow (I have also tested with macOS 10.13, with the same result), but despite my attempts I cannot pin down what it is. Thanks in advance for any hint you can provide!

Standalone code to reproduce the issue

Step-by-step instructions to reproduce the issue are in https://github.com/ironArray/Blosc2-Btune/blob/main/README-DEVELOPERS.md and https://github.com/ironArray/Blosc2-Btune/blob/main/RELEASING.rst

Relevant log output

Build logs are available at, e.g., https://github.com/ironArray/Blosc2-Btune/actions/runs/16655330101/job/47138750245
