Bug: Segmentation Fault during CUDA Initialization with GPU Offloading Enabled #696
Description
Contact Details
No response
What happened?
Description:
When the binary runs with GPU offloading enabled (e.g., -ngl 1), the application crashes with a segmentation fault at address 0x328. Running it without GPU support (e.g., --gpu disable) works correctly. The logs indicate that the crash happens during CUDA initialization. Because 0x328 is a small offset from address zero, this looks like a null pointer dereference, possibly caused by a misconfiguration while the CUDA module is dynamically linked.
Environment:
- OS: Linux (Debian-based, kernel 6.1.x); binary built with Cosmopolitan 4.0.2
- GPU: NVIDIA A100 (or similar)
- Driver/CUDA: NVIDIA driver version 535.x; CUDA Toolkit version 12.x
- CUDA Installation: Installed in a custom location (configured via environment variables)
- Build System: Cosmocc toolchain with Make
- Model: Qwen2.5-0.5B-Instruct-GGUF (a small model with no expected GPU memory issues)
Steps to Reproduce:
- Set the environment variables:
  export CUDA_PATH=<CUSTOM_CUDA_PATH>
  export CUDA_HOME=<CUSTOM_CUDA_PATH>
  export CUDA_INC_PATH=<CUSTOM_CUDA_PATH>/include
  export PATH="<CUSTOM_CUDA_PATH>/bin:$PATH"
  export LD_LIBRARY_PATH="<CUSTOM_CUDA_PATH>/lib64:$LD_LIBRARY_PATH"
- Build the project:
  make -j8
- Run the binary with GPU offloading enabled:
  ./o/llama.cpp/main -m /path/to/model.gguf -ngl 999
The binary crashes with a segmentation fault (see the log output below). The crash is consistent whenever any GPU offloading is enabled: even a minimal layer count (e.g., -ngl 1) triggers it. Running with --gpu disable lets the model load and operate normally. The crash address (0x328) and the early log messages point to a possible issue in the CUDA initialization code (referenced in llama.cpp/ggml-cuda.cu and llama.cpp/ggml-cuda.h).
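Since CUDA is installed in a custom location, one way to rule out a partially configured toolkit is to check the exported variables before running with -ngl. The sketch below is a hypothetical helper (check_cuda_env is not part of llamafile); it only checks that the paths from the reproduction steps exist and that nvcc resolves to the custom install. Whether llamafile actually consults each of these variables is an assumption, not something this report confirms.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, NOT part of llamafile: confirms that the
# CUDA variables exported in the reproduction steps point at a complete
# toolkit. Returns 0 if all checks pass, 1 otherwise.
check_cuda_env() {
  local root="${CUDA_PATH:-}"
  local status=0

  if [ -z "$root" ]; then
    echo "CUDA_PATH is not set" >&2
    return 1
  fi

  # The exports above assume CUDA_HOME and CUDA_PATH name the same tree.
  if [ -n "${CUDA_HOME:-}" ] && [ "$CUDA_HOME" != "$root" ]; then
    echo "warning: CUDA_HOME ($CUDA_HOME) differs from CUDA_PATH ($root)" >&2
  fi

  # Directories referenced by PATH, CUDA_INC_PATH and LD_LIBRARY_PATH.
  local dir
  for dir in "$root/bin" "$root/include" "$root/lib64"; do
    if [ ! -d "$dir" ]; then
      echo "missing directory: $dir" >&2
      status=1
    fi
  done

  # The CUDA runtime library is expected in lib64 (assumed toolkit layout).
  if ! ls "$root"/lib64/libcudart.so* >/dev/null 2>&1; then
    echo "libcudart.so not found in $root/lib64" >&2
    status=1
  fi

  # nvcc should resolve from PATH, ideally to the custom install.
  local nvcc_path
  nvcc_path="$(command -v nvcc 2>/dev/null)"
  if [ -z "$nvcc_path" ]; then
    echo "nvcc not found on PATH" >&2
    status=1
  elif [ "$nvcc_path" != "$root/bin/nvcc" ]; then
    echo "warning: nvcc resolves to $nvcc_path, not $root/bin/nvcc" >&2
  fi

  if [ "$status" -eq 0 ]; then
    echo "CUDA environment looks consistent: $root"
  fi
  return "$status"
}
```

For example: `check_cuda_env && ./o/llama.cpp/main -m /path/to/model.gguf -ngl 1`. A passing check only rules out an incomplete or shadowed toolkit path; on its own it would not explain the fault at 0x328.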
Any assistance or direction would be greatly appreciated.
Version
llamafile v0.9.0
What operating system are you seeing the problem on?
No response
Relevant log output
██╗ ██╗ █████╗ ███╗ ███╗ █████╗ ███████╗██╗██╗ ███████╗
██║ ██║ ██╔══██╗████╗ ████║██╔══██╗██╔════╝██║██║ ██╔════╝
██║ ██║ ███████║██╔████╔██║███████║█████╗ ██║██║ █████╗
██║ ██║ ██╔══██║██║╚██╔╝██║██╔══██║██╔══╝ ██║██║ ██╔══╝
███████╗███████╗██║ ██║██║ ╚═╝ ██║██║ ██║██║ ██║███████╗███████╗
╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚══════╝
launching server...
error: Uncaught SIGSEGV (SEGV_MAPERR) at 0x328 on coder-cspiegel-gpu-dc4f9657c-lq6mx pid 50320 tid 50324
./main
No error information
Linux Cosmopolitan 4.0.2 MODE=x86_64; #1 SMP PREEMPT_DYNAMIC Debian 6.1.119-1 (2024-11-22) coder-cspiegel-gpu-dc4f9657c-lq6mx 6.1.0-28-amd64
RAX 0000000000000320 RBX 000000007fcbffe0 RDI 0000000000000001
RCX 00007f9a6f842880 RDX 0000000000000001 RSI 0000000000000000
RBP 00007f9a6bc88940 RSP 00007f9a6bc888f8 RIP 00007f9a6f8d3ae8
R8 0000000005c00000 R9 00007f9a6bc88cf0 R10 0000000000100000
R11 0000000000000001 R12 00007f9a6bc88968 R13 00007f9a6bc88968
R14 0000000000000001 R15 0000000000000000
TLS 00007f9a60482b40
XMM0 00000000000000000000000000000000 XMM8 00000000000000000000000000000000
XMM1 00000000000000000000000000000000 XMM9 ffffffffffffffffffffffffffffffff
XMM2 00000000000000000000000000000190 XMM10 ffffffffffffffffffffffffffffffff
XMM3 00007f9a6bf06e4000007f9a6bf07e10 XMM11 00000000000000000000000000000000
XMM4 000000000000000000007f9a6bf07b70 XMM12 00000000000000000000000000000000
XMM5 00000000000000000000000000000000 XMM13 ffffffffffffffffffffffffffffffff
XMM6 6e6577512f736c65646f6d2f73746365 XMM14 00000000000000000000000000000000
XMM7 6a6f72702f74327369612f617461642f XMM15 00000000000000000000000000000000
cosmoaddr2line /home/htc/cspiegel/repositories/llamafile/o/llama.cpp/main/main.com.dbg 7f9a6f8d3ae8 7f9a6bfd0389 7f9a6bfc5982 7f9a6bfa1c08 7f9a6bfe66db 7f9a6bf1eb7b 7f9a6f842880
7f9a6bc85e40 7f9a6f8d3ae8 NULL+0
7f9a6bc88940 7f9a6bfd0389 NULL+0
7f9a6bc88990 7f9a6bfc5982 NULL+0
7f9a6bc889b0 7f9a6bfa1c08 NULL+0
7f9a6bc889f0 7f9a6bfe66db NULL+0
7f9a6bc88b00 7f9a6bf1eb7b NULL+0
<dangerous frame>
000000400000-000000ae31e0 r-xi- 7052kb
000000ae4000-000003252000 rw-i- 39mb
000003252000-0006fe000000 28gb
0006fe000000-0006fe001000 rw-pa 4096b
0006fe001000-7f9a0f9b9000 128tb
7f9a0f9b9000-7f9a27ffff60 r--s- 390mb
7f9a28000000-7f9a4e934000 617mb
7f9a4e934000-7f9a4eb34000 rw-pa 2048kb
7f9a4eb34000-7f9a4f800000 13mb
7f9a4f800000-7f9a50000000 rw-pa 8192kb
7f9a50000000-7f9a602a3000 259mb
7f9a602a3000-7f9a632a3000 rw-pa 48mb
7f9a632a3000-7f9a6bc77000 138mb
7f9a6bc77000-7f9a6bc78000 ---pa 4096b
7f9a6bc78000-7f9a6bc8c000 rw-pa 80kb
7f9a6bc8c000-7f9a6fa31000 62mb
7f9a6fa31000-7f9a6fa31980 rw-pa 2432b
7f9a6fa32000-7f9a6fa7e000 304kb
7f9a6fa7e000-7f9a6fbb85d0 rw-pa 1257kb
7f9a6fbb9000-7f9a6fcae3c8 r--s- 981kb
7f9a6fcaf000-7f9a6feef000 rw-pa 2304kb
7f9a6feef000-7ffe26769000 399gb
7ffe26769000-7ffe26869000 ---pa 1024kb
7ffe26869000-7ffe27069000 rw-pa 8192kb
# 532'811'776 bytes in 15 mappings