Skip to content

Add section to README on how to run the project on Android#130

Merged
ggerganov merged 1 commit intoggml-org:masterfrom
rgerganov:android
Mar 14, 2023
Merged

Add section to README on how to run the project on Android#130
ggerganov merged 1 commit intoggml-org:masterfrom
rgerganov:android

Conversation

@rgerganov
Copy link
Collaborator

No description provided.

@ggerganov ggerganov merged commit 60f819a into ggml-org:master Mar 14, 2023
rooprob pushed a commit to rooprob/llama.cpp that referenced this pull request Aug 2, 2023
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025
* Adding q6_k_r4

* q6_k_r4: 1st functional AVX2 version

* q6_k_r4: AVX2 and simple Zen4

"Simple" as in processing 4 instead of 8 rows at once.
On Zen4 we get PP-512(LLaMA-3.1-8B) = 238.3 t/s vs
195.2 t/s for Q6_K. TG-128 @ 1 thread is 7.94 t/s
vs 5.38 t/s for Q6_K.

* q6_k_r4: 1st NEON version

PP-512(LLaMA-3.1-8B) = 78 t/s vs 57.6 t/s for q6_K.
TG-128 is slightly lower rthan q6_K for low number of threads,
becomes very slightly better at 8 threads.

* q6_k_r4: slightly faster NEON

PP-512(LLaMA-3.1-8B) = 83.25 t/s

* q6_k_r4: slightly faster Zen4

238.3 t/s -> 243.2 t/s

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants