AVX2 optimizations for Q5_0, Q5_1 by sw · Pull Request #1195 · ggml-org/llama.cpp

sw · 2023-04-26T19:02:22Z

Note: This isn't for master, but the branch in PR #1187.

This adds AVX2 optimizations for Q5_0 and Q5_1, with the help of another look-up table.
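As a rough scalar illustration of what such a look-up table provides (a sketch, not the PR's AVX2 code): each of the 8 bits in one byte of high bits is spread into its own output byte as 0x10, so a single 64-bit OR attaches the fifth bit to eight already-unpacked 4-bit values. The helper names `expand_bits_shl4` and `attach_fifth_bit` are hypothetical, the byte layout is assumed for illustration only, and the `memcpy` round trip assumes a little-endian target such as x86.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper: bit k of `bits` becomes byte k of the result,
// 0x10 if the bit is set and 0x00 otherwise. A 256-entry table would
// simply cache this function's result for every possible input byte.
static uint64_t expand_bits_shl4(uint8_t bits) {
    uint64_t out = 0;
    for (int k = 0; k < 8; k++) {
        if (bits & (1u << k)) {
            out |= (uint64_t)0x10 << (8 * k);
        }
    }
    return out;
}

// Hypothetical helper: combine 8 low nibbles (one per byte) with 8 high bits.
// Assumes little-endian byte order, so that memory byte k maps to bits
// 8k..8k+7 of the 64-bit integer.
static void attach_fifth_bit(const uint8_t nibbles[8], uint8_t high_bits,
                             uint8_t out[8]) {
    uint64_t lo;
    memcpy(&lo, nibbles, sizeof lo);
    uint64_t v = lo | expand_bits_shl4(high_bits);
    memcpy(out, &v, sizeof v);
}
```

With `high_bits = 0x81`, only the first and last of the eight values gain the 0x10 bit; the other six pass through unchanged.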

I didn't test ARM NEON, so I may have broken something there. You should probably check whether the u8->u32->u8 roundtrip still makes sense.

I'll look into using _mm256_shuffle_epi8, but you may want to merge this to the PR branch as it is.

ggerganov · 2023-04-26T19:26:55Z

ggml.c

+#define B1(c,s,n)  0x ## n ## c ,  0x ## n ## s
+#define B2(c,s,n) B1(c,s,n ## c), B1(c,s,n ## s)
+#define B3(c,s,n) B2(c,s,n ## c), B2(c,s,n ## s)
+#define B4(c,s,n) B3(c,s,n ## c), B3(c,s,n ## s)
+#define B5(c,s,n) B4(c,s,n ## c), B4(c,s,n ## s)
+#define B6(c,s,n) B5(c,s,n ## c), B5(c,s,n ## s)
+#define B7(c,s,n) B6(c,s,n ## c), B6(c,s,n ## s)
+#define B8(c,s  ) B7(c,s,     c), B7(c,s,     s)
+
+// precomputed tables for expanding 8bits to 8 bytes (shl 4)
+static const uint64_t table_b2b_u[1 << 8] = { B8(00, 10) };
+static const uint64_t table_b2b_i[1 << 8] = { B8(F0, 00) };
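To make the macro expansion easier to follow, here is a small self-contained sketch that reproduces the two tables from the diff and compares them against a plain loop. The functions `expand_ref` and `tables_match_reference` are hypothetical names introduced only for this check. Each `B` level appends one hex byte to the literal and doubles the entry count, so bit 7 of the index selects the most significant byte and bit 0 the least significant one; `c` is the byte used when the bit is clear, `s` when it is set.

```c
#include <stdint.h>

#define B1(c,s,n)  0x ## n ## c ,  0x ## n ## s
#define B2(c,s,n) B1(c,s,n ## c), B1(c,s,n ## s)
#define B3(c,s,n) B2(c,s,n ## c), B2(c,s,n ## s)
#define B4(c,s,n) B3(c,s,n ## c), B3(c,s,n ## s)
#define B5(c,s,n) B4(c,s,n ## c), B4(c,s,n ## s)
#define B6(c,s,n) B5(c,s,n ## c), B5(c,s,n ## s)
#define B7(c,s,n) B6(c,s,n ## c), B6(c,s,n ## s)
#define B8(c,s  ) B7(c,s,     c), B7(c,s,     s)

// Tables as in the diff: one 64-bit entry per possible input byte.
static const uint64_t table_b2b_u[1 << 8] = { B8(00, 10) };
static const uint64_t table_b2b_i[1 << 8] = { B8(F0, 00) };

// Hypothetical reference: byte k of the result is `if_set` when bit k of
// `bits` is 1 and `if_clear` when it is 0.
static uint64_t expand_ref(uint8_t bits, uint8_t if_clear, uint8_t if_set) {
    uint64_t out = 0;
    for (int k = 0; k < 8; k++) {
        uint8_t byte = ((bits >> k) & 1) ? if_set : if_clear;
        out |= (uint64_t)byte << (8 * k);
    }
    return out;
}

// Returns 1 if every table entry equals the loop-based reference, else 0.
static int tables_match_reference(void) {
    for (int i = 0; i < 256; i++) {
        if (table_b2b_u[i] != expand_ref((uint8_t)i, 0x00, 0x10)) return 0;
        if (table_b2b_i[i] != expand_ref((uint8_t)i, 0xF0, 0x00)) return 0;
    }
    return 1;
}
```

Read this way, `table_b2b_u` yields 0x10 in each byte whose bit is set, while `table_b2b_i` yields 0xF0 in each byte whose bit is clear and 0x00 otherwise.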


ggerganov

The ARM NEON results are the same

* ggml : add Q5_0 quantization (cuBLAS only)
* ggml : fix Q5_0 qh -> uint32_t
* ggml : fix q5_0 histogram stats
* ggml : q5_0 scalar dot product
* ggml : q5_0 ARM NEON dot
* ggml : q5_0 more efficient ARM NEON using uint64_t masks
* ggml : rename Q5_0 -> Q5_1
* ggml : adding Q5_0 mode
* quantize : add Q5_0 and Q5_1 to map
* ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195)

---------

Co-authored-by: Stephan Walter <stephan@walter.name>

AVX2 optimizations for Q5_0, Q5_1

33e50f7

sw requested a review from ggerganov April 26, 2023 19:13

ggerganov reviewed Apr 26, 2023

ggerganov approved these changes Apr 26, 2023

ggerganov merged commit 2bfa1fe into ggml-org:q5_0 Apr 26, 2023

sw deleted the q5-cpp branch April 26, 2023 19:44
