
Add windows to the CI#98

Merged
ggerganov merged 1 commit into ggml-org:master from etra0:windows-ci
Mar 13, 2023
Conversation

@etra0 (Contributor) commented Mar 13, 2023

No description provided.

@ggerganov ggerganov merged commit 2f700a2 into ggml-org:master Mar 13, 2023
blackhole89 pushed a commit that referenced this pull request Mar 15, 2023
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025
Introduces caching of the GGML graph to avoid an unnecessary full rebuild between tokens.
KV-cache parameters, which change with each token, are updated directly in the cached GGML
graph. Can be disabled with the GGML_DISABLE_GRAPH_CACHING environment variable.
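The caching scheme described above can be sketched roughly as follows. This is an illustrative model, not the PR's actual API: the names `Graph`, `GraphCache`, and the shape-signature key are assumptions; only the `GGML_DISABLE_GRAPH_CACHING` environment variable comes from the commit message.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical sketch: build the full graph once per shape signature, then
// on later tokens only patch the token-dependent KV-cache parameters into
// the cached graph instead of rebuilding it from scratch.
struct Graph {
    int n_past      = 0; // token-dependent KV-cache parameter, patched in place
    int build_count = 0; // how many full builds this graph went through
};

struct GraphCache {
    // Caching is on unless GGML_DISABLE_GRAPH_CACHING is set in the environment.
    bool enabled = std::getenv("GGML_DISABLE_GRAPH_CACHING") == nullptr;
    std::map<std::string, Graph> graphs;

    Graph &get(const std::string &signature, int n_past) {
        Graph &g = graphs[signature];
        if (!enabled || g.build_count == 0) {
            ++g.build_count; // full graph (re)build
        }
        g.n_past = n_past;   // cheap per-token parameter update
        return g;
    }
};
```

With caching enabled, repeated lookups under the same signature pay only the parameter update, which is why the later Bitnet work could revert it wholesale when it broke Metal text generation.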
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025
* Adapting iq2_bn to work without separate scale tensors

Why? It is becoming burdensome to maintain the special Bitnet
conversion in convert_hf_to_gguf.py, so I think it is better
to make iq1_bn and iq2_bn just work with the mainline
conversion script (which does not generate scales).

* Adapting iq1_bn to work without separate scale tensors

* Adapting iq2_bn: CUDA dequantize

* Adapting iq2_bn: CUDA works

* Adapting iq1_bn: CUDA works

* Adapting iq1_bn, iq2_bn: NEON

* Adapting iq1_bn, iq2_bn: Metal

Dequantize works, but there is still something wrong
with the dot products.

* WIP

Absolutely don't see what is wrong with the iq1_bn and iq2_bn
vector dot product kernels.

* Remove iq1_tn and iq2_tn - Part 1

Now that iq1_bn and iq2_bn have per row scales, there is no
reason to also have iq1_tn and iq2_tn.

* Remove iq1_tn and iq2_tn - Part 2

* Bitnet: use the standard llm_build_kv to build self attention

My main motivation was to enable FA. But FA does not work anyway
because the head size is 100 for the Bitnet ternary models
(and I had forgotten this little detail).

* Revert "Avoid rebuild of GGML graph for each token (ggml-org#98)"

This reverts commit f2d315b.
As far as I can tell, the commit breaks Metal TG.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
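The per-row-scale ternary scheme the commits above migrate iq1_bn/iq2_bn to can be sketched roughly as below. This is a generic illustration of ternary quantization with one scale per row, not the actual ggml kernels; the function name and the 0.5 rounding threshold are assumptions.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical sketch: each weight is stored as one of {-1, 0, +1}, and a
// single float scale per row reconstructs approximate values as
// x_hat[i] = scale * q[i], so no separate scale tensor is needed.
float quantize_row_ternary(const float *x, int8_t *q, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) {
        amax = std::fmax(amax, std::fabs(x[i]));
    }
    const float scale = amax; // per-row scale stored next to the ternary data
    for (int i = 0; i < n; ++i) {
        const float v = scale > 0.0f ? x[i] / scale : 0.0f;
        q[i] = v > 0.5f ? 1 : (v < -0.5f ? -1 : 0);
    }
    return scale;
}
```

Folding the scale into each row is what makes the separate iq1_tn/iq2_tn types redundant, as the "Remove iq1_tn and iq2_tn" commits note.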
jesusmb1995 pushed a commit to jesusmb1995/llama.cpp that referenced this pull request Feb 20, 2026
* ggml-vulkan: Add TQ2_0 dequantize and mul_mat vec

* ggml-vulkan: Enable coopmat support for Android

* ggml-vulkan: Add mul_mm path for TQ2_0

* SET_ROWS and GET_ROWS have no TQ2_0 support yet.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Vulkan: Fix TQ2_0 mul_mm pipeline

* Add support for microsoft/bitnet-b1.58-2B-4T (HF to GGUF).

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

* Vulkan: TQ2_0 x Q8_1 MUL_MAT perf improvements

* Vulkan: Add TQ1_0 infra

* Vulkan: Add MUL_MAT_MAT and MUL_MAT_VEC support for TQ1

* Make sure we report the supported ops + datatypes.

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>

---------

Signed-off-by: Marcus Edel <marcus.edel@collabora.com>
Co-authored-by: vineet <vineet.suryan@collabora.com>
Co-authored-by: Marcus Edel <marcus.edel@collabora.com>
Co-authored-by: Italo Nicola <italo.nicola@collabora.com>
jesusmb1995 pushed a commit to jesusmb1995/llama.cpp that referenced this pull request Mar 4, 2026