Fix build for Android by rgerganov · Pull Request #125 · ggml-org/llama.cpp

rgerganov · 2023-03-14T09:05:11Z

The project can be built for Android with NDK and CMake like this:

cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI='arm64-v8a' -DANDROID_PLATFORM=android-23 ..

However, vdotq_* intrinsics are not available on Android. Fix this by checking for __ANDROID__ and, in that case, using the code that commit 84d9015 replaced.
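To illustrate the kind of guard described above, here is a minimal sketch, not the code from this PR. The helper name dot_i8 is hypothetical, and the sketch keys on the ACLE feature macro __ARM_FEATURE_DOTPROD rather than __ANDROID__, on the assumption that the compiler defines it when the dotprod extension is enabled. The widening-multiply fallback is only one plausible shape for a pre-vdotq path and is not claimed to match what commit 84d9015 replaced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from llama.cpp): dot product of two int8 arrays.
 * n must be a multiple of 16 so every path can work on full 16-byte blocks. */
int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n) {
    assert(n % 16 == 0);
#if defined(__ARM_NEON) && defined(__aarch64__)
    int32x4_t acc = vdupq_n_s32(0);
    for (size_t i = 0; i < n; i += 16) {
        const int8x16_t va = vld1q_s8(a + i);
        const int8x16_t vb = vld1q_s8(b + i);
#if defined(__ARM_FEATURE_DOTPROD)
        /* dotprod extension enabled: one instruction per 16 products */
        acc = vdotq_s32(acc, va, vb);
#else
        /* no dotprod: widen to int16 products, then pairwise-add into int32 */
        const int16x8_t lo = vmull_s8(vget_low_s8(va), vget_low_s8(vb));
        const int16x8_t hi = vmull_s8(vget_high_s8(va), vget_high_s8(vb));
        acc = vaddq_s32(acc, vaddq_s32(vpaddlq_s16(lo), vpaddlq_s16(hi)));
#endif
    }
    return vaddvq_s32(acc); /* horizontal sum of the four int32 lanes */
#else
    /* portable scalar path for non-AArch64 targets */
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
#endif
}
```

All three paths compute the same integer result, so the choice between them only affects speed. Guarding on a feature macro instead of the platform means an Android build that does enable dotprod would still take the vdotq_s32 path.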

rgerganov · 2023-03-14T09:22:31Z

Turns out this is not needed as long as we pass -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod to CMake.

#21944)

* Thread safety per request only
* Fix ROPE yarn case
* Fix sticky stateful config
* Use i4/i8 directly for symmetric quant
* Use weightless caching
* Add WeightlessCacheAttribute to reduce NPU memory usage
* Gelu tanh support (#125)
* Imrope support (#126)
* fix(openvino): explicit ov::Tensor frees in ggml_backend_openvino_free
* add GPU,NPU support in OV Dockerfile
* add build-openvino.yml ci
* Fix sticky stateful config
* add concurrency to ov-gpu ci runs. Move OV CI to build-openvino.yml
* fix thread-safety of shared runtime context
* rope type abstraction for frontend translations
* fix editorconfig

---------

Co-authored-by: Mustafa Cavus <mustafa.cavus@intel.com>
Co-authored-by: Dan Hoffman <dhoff749@gmail.com>
Co-authored-by: Ravi Panchumarthy <ravi.panchumarthy@intel.com>

* q4_0_r4: 6% faster PP on NEON
* qx_0_r4_q8_0 template

  Applied to q4_0_r4 and q5_0_r4. It makes q5_0_r4 PP ~7% faster.
* Apply qx_0_r4_q8_0 template also to q6_0_r4 and iq4_nl_x4
* Simplify
* Minor iq4_xs_r4 improvement on NEON

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

… (#71)

Measured perplexity on Qwen3.5-0.8B-BF16 / wikitext-2 / ctx=512:

| cache-type | PPL     | vs f16    |
|------------|---------|-----------|
| f16        | 19.08   | baseline  |
| q8_0       | 19.08   | lossless  |
| tbq3_0     | 1252.30 | 65x worse |
| tbq4_0     | 1393.00 | 73x worse |

TBQ KV-cache produces near-random output. Likely root cause is statistical: TBQ's rotated-domain codebook was calibrated for weight distributions, not the K/V tensor distributions seen during inference. The encoding scheme itself cannot faithfully represent KV values.

Snoop-kube's cluster audit confirms zero deployments use tbq* KV-cache (every host uses q8_0 or q4_0). DFlash also defaults to q8_0 (PR #65). No production consumer exists.

This PR adds a one-line experimental note to the --cache-type-k/v and --cache-type-k-draft/v-draft help text, referencing issue #70 for the full data + recommendation. Code path stays in place; Markus may have roadmap intent I'm not aware of. This just stops anyone reading --help from assuming tbq* is a usable choice without checking.

Follow-ups if Markus prefers full removal:

* drop tbq3_0/tbq4_0 from common/arg.cpp's kv_cache_types list
* keep the ftypes (TBQ weight quantization is separate from KV use)
* close issues ggml-org#124 + ggml-org#125 as wont-fix

rgerganov closed this Mar 14, 2023

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this pull request Dec 19, 2023

Merge pull request ggml-org#125 from Stonelinks/app-server-module-imp…

79ba9ed

…ortable Make app server module importable

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

thedanhoffman pushed a commit to thedanhoffman/llama.cpp that referenced this pull request Apr 14, 2026

Gelu tanh support (ggml-org#125)

77bd354

marksverdhei mentioned this pull request Jun 12, 2026

fix(ggml): TurboQ rotate-block OOB — unbreaks Windows x64 CI heiervang-technologies/ht-llama.cpp#104

Merged

Fix build for Android #125
rgerganov wants to merge 1 commit into ggml-org:master from rgerganov:fix-android
