fix: add POSIX functionality for Linux compilation#51

Merged
ggerganov merged 2 commits into ggml-org:master from valentynbez:master
Mar 22, 2023
@valentynbez
Contributor

@valentynbez valentynbez commented Mar 12, 2023

Small fix to compile binaries properly on Linux:

@prusnak
Contributor

prusnak commented Mar 12, 2023

Can you add a short comment explaining why this is required? (I assume some functions are not recognized, but it is not obvious which ones.)

@valentynbez
Contributor Author

Sure!

@prusnak
Contributor

prusnak commented Mar 12, 2023

Sure!

I meant into the source code just above the #define

Btw, issue #54 mentions _POSIX_C_SOURCE=199309L; why did you choose a newer standard and thus less compatibility?
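
For readers following along, here is a minimal sketch of what such a define could look like in a C source file. It assumes the missing symbol is `CLOCK_MONOTONIC` (the issue linked to this pull request reports it as undeclared) and uses the `199309L` level mentioned above. `monotonic_ms` is a hypothetical helper for illustration, not a function from this PR.

```c
/* Feature test macros must be defined before the first #include,
 * otherwise the C library headers have already decided what to expose.
 * 199309L requests the POSIX.1b (real-time) interfaces, which is the
 * level at which clock_gettime() and CLOCK_MONOTONIC are declared. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif

#include <time.h>

/* Hypothetical helper: monotonic time in milliseconds, or -1 on failure. */
static long long monotonic_ms(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1;
    }
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}
```

Two consecutive calls to `monotonic_ms()` should return non-decreasing values. On older glibc versions, `clock_gettime` may additionally require linking with `-lrt`.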

@prusnak
Contributor

prusnak commented Mar 13, 2023

Small fix to compile binaries properly on Linux:

Which OS and which GCC version do you use?

For me the master compiles just fine on Debian 11 with GCC 10.2.1 (even without the proposed define).

@valentynbez
Contributor Author

CentOS7 with GCC 10.2.0

@ggerganov
Member

This flag has been discussed in whisper.cpp too, and I still don't know when it should be added and when not.
Hopefully someone can clarify.

ggml-org/whisper.cpp#37
ggml-org/whisper.cpp#576

@ggerganov ggerganov added the help wanted Needs help from the community label Mar 14, 2023
@prusnak
Contributor

prusnak commented Mar 18, 2023

Thinking more about this, probably the cleanest option is to add compilation flags to the build system, as suggested in ggml-org/whisper.cpp#37 + ggml-org/whisper.cpp#576:

CFLAGS += -D_POSIX_SOURCE -D_GNU_SOURCE
CXXFLAGS += -D_POSIX_SOURCE -D_GNU_SOURCE

We can probably set this for all compilers on all OSes, because either a compiler understands the flag and sets the value to the supported level or the flag is ignored.
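
One way to check what these flags actually enable on a given system is a small probe like the sketch below. It relies on the fact that `-D` only defines a preprocessor macro; it is the C library headers, not the compiler itself, that read the macro and decide which declarations to expose. The helper names (`posix_level`, `has_clock_monotonic`, `report_feature_macros`) are hypothetical and not part of the project.

```c
#include <stdio.h>
#include <time.h>

/* Returns the _POSIX_C_SOURCE level in effect after the libc headers
 * have been processed, or 0 if the macro is not defined at all. */
static long posix_level(void) {
#ifdef _POSIX_C_SOURCE
    return (long)_POSIX_C_SOURCE;
#else
    return 0L;
#endif
}

/* Returns 1 if <time.h> exposed CLOCK_MONOTONIC as a macro, else 0. */
static int has_clock_monotonic(void) {
#ifdef CLOCK_MONOTONIC
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary; handy when comparing build flags. */
static void report_feature_macros(void) {
    printf("_POSIX_C_SOURCE=%ld CLOCK_MONOTONIC=%s\n",
           posix_level(),
           has_clock_monotonic() ? "declared" : "missing");
}
```

Calling `report_feature_macros()` from a test program built once with and once without `-D_POSIX_SOURCE -D_GNU_SOURCE` should show whether the flags change anything for a particular compiler and libc combination.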

@valentynbez can you confirm this fixed the issue for you on CentOS 7?

@gjmulder gjmulder added the build Compilation issues label Mar 20, 2023
@ggerganov ggerganov merged commit 9794052 into ggml-org:master Mar 22, 2023
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025
* NEON Flash Attention: add support for Q8_0, Q4_0, Q4_1

* NEON Flash Attention: quantized K*Q for q4_0

I could finally take advantage of the matrix multiplication
templates. We get quite a bit of speedup that way for q4_0:
For Gemma-2b using mul_mat_qX_0_q8_0<DequantizerQ40, q_step>
results in PP-2048 = 287 t/s vs 268 t/s when converting the
q4_0 k-cache and Q to fp16 and using fp16 multiplication.

* NEON Flash Attention: quantized K*Q for q4_1

* NEON Flash Attention: quantized K*Q for q8_0

This makes quite a bit of difference:
For Gemma2-2b PP-8192 is 228 t/s with quantized K*Q vs
178 t/s when converting things to fp16 and using fp16
matrix multiplication.
We have PP-512 = 307 t/s, so PP-8192 is now ~75% of the
performance of PP-512. In contrast, llama.cpp with Q8_0
cache is 38% of PP-512.

* Zen4 Flash Attention: quantized K*Q for q4_0, q4_1, q8_0

* AVX2 Flash Attention: quantized K*Q for q4_0, q4_1, q8_0

* Tidy up FlashMS

* Delete no longer used stuff

With the usage of quantized matrix multiplications for
quantized k- and/or v-cache, we no longer need the
helper methods loading entire rows.

* Disallow mixing bf16 with other types for kv caches

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
spiritbuun added a commit to spiritbuun/llama-cpp-turboquant-cuda that referenced this pull request Mar 27, 2026
- turbo4 K+V results on Qwen3.5-27B (-0.32% vs q8_0) and Qwen3-14B (+6.3%)
- Sparse V dequant benchmarks: MoE native dequant +10.9% at 8K
- Gemma-3 turbo3 results post-iSWA fix (+3.3%)
- KVLinC no-K-rotation negative result
- Speculative decoding negative result
- CUDA 13.2 compatibility verified
- Experiments TheTom#31, TheTom#39, TheTom#42, TheTom#45, ggml-org#49, ggml-org#50, ggml-org#51 status updates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Labels

build Compilation issues help wanted Needs help from the community

Development

Successfully merging this pull request may close these issues.

error: 'CLOCK_MONOTONIC' undeclared

4 participants