Name and Version
llama-cli --version crashes (see below). Built from git:
git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
root@a30e40de2e02:/opt/llama/build/bin# git log
commit 3fc65063d9c356510b86fc2f15ca8aea711bfc47 (grafted, HEAD -> master, origin/master, origin/HEAD)
Operating systems
Linux
GGML backends
SYCL
Hardware
I have 2x Intel B70 (32GB).
Models
I can't get far enough to try a model: even querying the version or listing the available devices crashes.
Problem description & steps to reproduce
I am attempting to build a container that serves my 2x Intel B70 cards via RPC, so that I can spread the load across my other ROCm hosts. To this end, I modified and simplified the SYCL Dockerfile to build llama.cpp with additional RPC support. The build succeeds and sycl-ls shows both devices, but llama-cli crashes on initialization.
Dockerfile
ARG ONEAPI_VERSION=2025.3.3-0-devel-ubuntu24.04
FROM docker.io/intel/deep-learning-essentials:${ONEAPI_VERSION} as base
ARG intel_arch=bmg_g21
ARG IGC_VERSION=v2.30.1
ARG IGC_VERSION_FULL=2_2.30.1+20950
ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
ARG IGDGMM_VERSION=22.9.0
RUN --mount=type=cache,destination=/tmp/neo \
    cd /tmp/neo && wget -c \
        https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
        https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
        https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
        https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
        https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-opencl-icd-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
        https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
        https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
        https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/libze-intel-gpu1-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
        https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
    && dpkg --install *.deb
FROM base as build
RUN --mount=type=cache,destination=/var/lib/apt \
    --mount=type=cache,destination=/var/cache/apt \
    apt-get update \
    && apt-get dist-upgrade -y \
    && apt-get install -y \
        ccache \
        git \
        libgomp1 \
        libssl-dev \
        ninja-build
ARG CCACHE_DIR=/var/cache/ccache
ARG CFLAGS="${CFLAGS} -O3"
ARG CXXFLAGS="${CFLAGS} -O3"
ARG rebuild=''
ARG branch=master
RUN git clone --depth=1 --recurse-submodules --branch=${branch:-master} \
    https://github.com/ggml-org/llama.cpp /opt/llama
WORKDIR /opt/llama
RUN --mount=type=cache,destination=${CCACHE_DIR} \
    bash -c "source /opt/intel/oneapi/setvars.sh --force && \
    cmake -B build -G Ninja \
        -DCMAKE_BUILD_TYPE=Release \
        -DGGML_RPC=ON \
        -DGGML_SYCL=ON \
        -DGGML_SYCL_DEVICE_ARCH=${intel_arch} \
        -DCMAKE_C_COMPILER=icx \
        -DCMAKE_CXX_COMPILER=icpx \
    && cmake --build build --target rpc-server -j$(nproc) \
    && mkdir -vp /app \
    && cp -vrL build/bin/* /app/"
FROM base as app
COPY --from=build /app /app
WORKDIR /app
VOLUME /var/cache/llama
ENV ZES_ENABLE_SYSMAN=1
ENV UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS=1
ENV GGML_RPC_DEBUG=1
ENV LLAMA_CACHE=/var/cache/llama
ENV ONEAPI_DEVICE_SELECTOR="level_zero:0"
ENTRYPOINT ["/app/rpc-server"]
CMD ["--host", "0.0.0.0", "--cache"]
EXPOSE 50052
First Bad Commit
Unknown.
Relevant log output
Logs
sycl-ls output:
INFO: Output filtered by ONEAPI_DEVICE_SELECTOR environment variable, which is set to level_zero:*.
To see device ids, use the --ignore-device-selectors CLI option.
[level_zero:gpu] Intel(R) oneAPI Unified Runtime over Level-Zero V2, Intel(R) Graphics [0xe223] 20.2.0 [1.14.37435+1]
[level_zero:gpu] Intel(R) oneAPI Unified Runtime over Level-Zero V2, Intel(R) Graphics [0xe223] 20.2.0 [1.14.37435+1]
llama-cli --list-devices output:
/opt/llama/build/bin/libggml-base.so.0(+0x15ae8)[0x7fc30ac16ae8]
/opt/llama/build/bin/libggml-base.so.0(ggml_print_backtrace+0x285)[0x7fc30ac16ac5]
/opt/llama/build/bin/libggml-base.so.0(+0x2eee6)[0x7fc30ac2fee6]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb0da)[0x7fc30aa3e0da]
/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt10unexpectedv+0x0)[0x7fc30aa28a55]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb391)[0x7fc30aa3e391]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(+0x102557)[0x7fc302887557]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(+0x1f0e8b)[0x7fc302975e8b]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(+0x32f783)[0x7fc302ab4783]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(_ZN4sycl3_V17contextC2ERKSt6vectorINS0_6deviceESaIS3_EESt8functionIFvNS0_14exception_listEEERKNS0_13property_listE+0x5df)[0x7fc302ab37cf]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(_ZN4sycl3_V17contextC2ERKSt6vectorINS0_6deviceESaIS3_EERKNS0_13property_listE+0x44)[0x7fc302ab3164]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(+0x26a508)[0x7fc3029ef508]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(+0x21b25d)[0x7fc3029a025d]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(+0x392b73)[0x7fc302b17b73]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(_ZN4sycl3_V15queueC1ERKNS0_6deviceERKSt8functionIFvNS0_14exception_listEEERKNS0_13property_listE+0x3d)[0x7fc302b130dd]
/opt/llama/build/bin/libggml-sycl.so.0(_ZN4sycl3_V15queueC2ERKNS0_13property_listE+0x8b)[0x7fc30ae62d5b]
/opt/llama/build/bin/libggml-sycl.so.0(_ZSt11make_sharedIN4dpct10device_extEJRN4sycl3_V16deviceEEESt10shared_ptrINSt9enable_ifIXntsr8is_arrayIT_EE5valueES8_E4typeEEDpOT0_+0x98)[0x7fc30ae61208]
/opt/llama/build/bin/libggml-sycl.so.0(_ZN4dpct7dev_mgrC2Ev+0x10d)[0x7fc30ae5f08d]
/opt/llama/build/bin/libggml-sycl.so.0(+0x12b560)[0x7fc30ae2c560]
/opt/llama/build/bin/libggml-sycl.so.0(ggml_backend_sycl_reg+0x543)[0x7fc30ae2ef53]
/opt/llama/build/bin/libggml.so.0(_ZN21ggml_backend_registryC2Ev+0x1d)[0x7fc30d82588d]
/opt/llama/build/bin/libggml.so.0(+0x5ae2)[0x7fc30d823ae2]
/opt/llama/build/bin/libggml.so.0(ggml_backend_load_all_from_path+0x6d)[0x7fc30d82228d]
llama-cli[0x52baa4]
llama-cli[0x543795]
llama-cli[0x42df78]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7fc30a6821ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7fc30a68228b]
llama-cli[0x42de45]
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): level_zero backend failed with error: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
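
For anyone skimming the trace above, here is a rough Python sketch that collapses a glibc-style backtrace into the chain of modules it passes through, innermost frame first. It is only a reading aid and not part of llama.cpp; the names `FRAME_RE`, `parse_frames`, `module_chain`, and `SAMPLE` are made up for this illustration, and `SAMPLE` holds a handful of frames copied from the log above rather than the full trace.

```python
import re
from itertools import groupby

# Hypothetical helper: matches frames shaped like
#   <module>(<symbol>+<offset>)[<address>]   or   <module>[<address>]
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"                # path or name of the binary / shared object
    r"(?:\((?P<symbol>[^+)]*)\+?[^)]*\))?"  # optional "(symbol+offset)"; symbol may be empty
    r"\[(?P<addr>0x[0-9a-f]+)\]$"           # absolute address
)

def parse_frames(text):
    """Return (module basename, symbol or None) for every frame line in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # non-frame lines such as "terminate called ..." are skipped
            module = match.group("module").rsplit("/", 1)[-1]
            frames.append((module, match.group("symbol") or None))
    return frames

def module_chain(frames):
    """Collapse consecutive frames from the same module into (module, count)."""
    return [(mod, len(list(group))) for mod, group in groupby(frames, key=lambda f: f[0])]

# A few frames copied from the log above (innermost first).
SAMPLE = """\
/opt/llama/build/bin/libggml-base.so.0(ggml_print_backtrace+0x285)[0x7fc30ac16ac5]
/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt10unexpectedv+0x0)[0x7fc30aa28a55]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(+0x102557)[0x7fc302887557]
/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8(+0x1f0e8b)[0x7fc302975e8b]
/opt/llama/build/bin/libggml-sycl.so.0(_ZN4dpct7dev_mgrC2Ev+0x10d)[0x7fc30ae5f08d]
/opt/llama/build/bin/libggml.so.0(ggml_backend_load_all_from_path+0x6d)[0x7fc30d82228d]
llama-cli[0x52baa4]
"""

for module, count in module_chain(parse_frames(SAMPLE)):
    print(f"{count:2d} frame(s) in {module}")
```

Read from the bottom up, the chain matches the full trace: llama-cli loads the backends through libggml, the SYCL backend registers itself in libggml-sycl, and the exception is thrown inside libsycl while a context and queue are being constructed for a device.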