fix: add POSIX functionality for Linux compilation #51
ggerganov merged 2 commits into ggml-org:master from
Conversation
Can you add a short comment why this is required? (I assume some functions are not recognized, but it is not obvious which ones.)
Sure!
I meant into the source code, just above the

Btw, the issue #54 mentions
Which OS and which GCC version do you use? For me, the master compiles just fine on Debian 11 with GCC 10.2.1 (even without the proposed define).
CentOS 7 with GCC 10.2.0
This flag has been discussed in
Now thinking more about this, probably the cleanest option is to add compilation flags to the build system, as suggested in ggml-org/whisper.cpp#37 + ggml-org/whisper.cpp#576. We can probably set this for all compilers on all OSes, because either the compiler understands the flag and sets the value to the supported level, or the flag is ignored.

@valentynbez can you confirm this fixed the issue for you on CentOS 7?
* NEON Flash Attention: add support for Q8_0, Q4_0, Q4_1
* NEON Flash Attention: quantized K*Q for q4_0

  I could finally take advantage of the matrix multiplication templates. We get quite a bit of speedup that way for q4_0: for Gemma-2b, using mul_mat_qX_0_q8_0<DequantizerQ40, q_step> results in PP-2048 = 287 t/s vs 268 t/s when converting the q4_0 k-cache and Q to fp16 and using fp16 multiplication.

* NEON Flash Attention: quantized K*Q for q4_1
* NEON Flash Attention: quantized K*Q for q8_0

  This makes quite a bit of difference: for Gemma2-2b, PP-8192 is 228 t/s with quantized K*Q vs 178 t/s when converting things to fp16 and using fp16 matrix multiplication. We have PP-512 = 307 t/s, so PP-8192 is now ~75% of the performance of PP-512. In contrast, llama.cpp with Q8_0 cache is at 38% of PP-512.

* Zen4 Flash Attention: quantized K*Q for q4_0, q4_1, q8_0
* AVX2 Flash Attention: quantized K*Q for q4_0, q4_1, q8_0
* Tidy up FlashMS
* Delete no longer used stuff

  With the usage of quantized matrix multiplications for the quantized k- and/or v-cache, we no longer need the helper methods loading entire rows.

* Disallow mixing bf16 with other types for kv caches

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
- turbo4 K+V results on Qwen3.5-27B (-0.32% vs q8_0) and Qwen3-14B (+6.3%)
- Sparse V dequant benchmarks: MoE native dequant +10.9% at 8K
- Gemma-3 turbo3 results post-iSWA fix (+3.3%)
- KVLinC no-K-rotation negative result
- Speculative decoding negative result
- CUDA 13.2 compatibility verified
- Experiments TheTom#31, TheTom#39, TheTom#42, TheTom#45, ggml-org#49, ggml-org#50, ggml-org#51 status updates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Small fix to compile binaries properly on Linux:

CLOCK_MONOTONIC in ggml.c