Add LICENSE by prusnak · Pull Request #21 · ggml-org/llama.cpp

prusnak · 2023-03-11T20:33:41Z

This repository is missing a LICENSE file. I assume it should be MIT, the same license as used in https://github.com/ggerganov/whisper.cpp and https://github.com/ggerganov/ggml, so I went ahead and created this PR.

If the plan is not to release the source code under the MIT license, feel free to close the PR.

DDBE12 · 2023-07-27T06:51:31Z

This is a mere fork or portation of this: https://github.com/facebookresearch/llama The original is made and owned by Facebook/Meta and has a proprietary Meta license: https://github.com/facebookresearch/llama/blob/main/LICENSE On the one hand, I'm not even sure if that's a legal way of licensing to slap an MIT license on a simple portation of somebody else's code.

On the other hand, being made and owned by Meta, it's no surprise that Replicate, a framework to run Llama locally, states in their very own TOS they intend on stealing, publishing, and probably selling to third parties every single bit of content their users are giving them, just like Facebook/Meta does: https://replicate.com/terms In Meta's own Llama license linked above, they even provide for lawsuits against Llama and Meta for stealing user content, by threatening to immediately terminate user accounts for trying to get the rights to one's property back, while Replicate, Llama, and Meta themselves are keeping the stolen user content.

prusnak · 2023-07-27T08:02:56Z

This is a mere fork or portation of this

You argument is invalid because:

this is not a fork, because the code is written from scratch, not reusing any of the original code
software licenses do not recognize such thing as "portation"
when the llama.cpp project was conceived, the LLaMA license was still an open-source license (GPLv3) - see https://github.com/facebookresearch/llama/blob/llama_v1/LICENSE - if this was a fork, the llama.cpp would also need to have the same license, but since it is not (see point 1), it is fine to have MIT here as well

DDBE12 · 2023-07-27T13:34:24Z

"when the llama.cpp project was conceived, the LLaMA license was still an open-source license (GPLv3) - see https://github.com/facebookresearch/llama/blob/llama_v1/LICENSE - if this was a fork, the llama.cpp would also need to have the same license, but since it is not (see point 1), it is fine to have MIT here as well"

The only reason this fork or portation now has an MIT license is because you as an outsider literally "assumed" that it should have an MIT license when you didn't even know this is originally Meta code ported to C++, just see your very own OP above,

@yairpatch

Adding support for Qwen3-VL by @yairpatch

) This allows for a better comparison between different models or different tensors of the same model where the magnitude of the model weights may differ. Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

…padding

agent : show command/path details in tool execution display

Added Metal shader implementations: - quantize_turbo3_0 / quantize_turbo4_0 (per-block quantization) - dequantize_turbo3_0 / dequantize_turbo4_0 (type4x4 and type4 variants) - kernel_set_rows_turbo template (128-element block size) - Flash attention instantiations for all dk/dv variants Added TURBO3_0/TURBO4_0 to Metal device SET_ROWS validation. Builds successfully. Testing with Qwen 3.5 35B-A3B MoE on M5 Max. Note: Initial version uses simplified quantization (no rotation matrix) for Metal compatibility. Full rotation requires custom kernel with extra buffer bindings — tracked for follow-up. Co-Authored-By: tturney@psyguard.ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…g#21 Embedded pre-computed 128×128 rotation and QJL matrices (256KB constant memory) directly in the Metal shader. Both quantize and dequantize now perform the full TurboQuant algorithm: Quantize: normalize → rotate → codebook → inverse rotate → residual → QJL Dequantize: codebook → inverse rotate → QJL correction → rescale Previous version (no rotation) produced garbage. This should produce meaningful output since the rotation Gaussianizes the KV distribution. Note: dequantize does full 128-element rotation per chunk (8× work). Optimization possible with caching or restructured kernel in follow-up. Co-Authored-By: tturney@psyguard.ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ml-org#21 - Inlined turbo-matrices.h directly into ggml-metal.metal (256KB) to fix JIT compilation failure with #include - Added C round-trip test (test-turbo-quant.c): turbo3 cosine=0.906, turbo4 cosine=0.966 — matches Python prototype - Metal library loads successfully ("loaded in 5.9 sec") - Model runs on Metal but output quality needs debugging (Metal quantize/dequantize may have a bug vs the working C version) C round-trip PROVES the algorithm works in C. Metal shader needs debugging — likely an issue with the dequantize chunk addressing or the large constant arrays in thread-local memory. Co-Authored-By: tturney@psyguard.ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Added Metal shader implementations: - quantize_turbo3_0 / quantize_turbo4_0 (per-block quantization) - dequantize_turbo3_0 / dequantize_turbo4_0 (type4x4 and type4 variants) - kernel_set_rows_turbo template (128-element block size) - Flash attention instantiations for all dk/dv variants Added TURBO3_0/TURBO4_0 to Metal device SET_ROWS validation. Builds successfully. Testing with Qwen 3.5 35B-A3B MoE on M5 Max. Note: Initial version uses simplified quantization (no rotation matrix) for Metal compatibility. Full rotation requires custom kernel with extra buffer bindings — tracked for follow-up. Co-Authored-By: tturney@psyguard.ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…g#21 Embedded pre-computed 128×128 rotation and QJL matrices (256KB constant memory) directly in the Metal shader. Both quantize and dequantize now perform the full TurboQuant algorithm: Quantize: normalize → rotate → codebook → inverse rotate → residual → QJL Dequantize: codebook → inverse rotate → QJL correction → rescale Previous version (no rotation) produced garbage. This should produce meaningful output since the rotation Gaussianizes the KV distribution. Note: dequantize does full 128-element rotation per chunk (8× work). Optimization possible with caching or restructured kernel in follow-up. Co-Authored-By: tturney@psyguard.ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ml-org#21 - Inlined turbo-matrices.h directly into ggml-metal.metal (256KB) to fix JIT compilation failure with #include - Added C round-trip test (test-turbo-quant.c): turbo3 cosine=0.906, turbo4 cosine=0.966 — matches Python prototype - Metal library loads successfully ("loaded in 5.9 sec") - Model runs on Metal but output quality needs debugging (Metal quantize/dequantize may have a bug vs the working C version) C round-trip PROVES the algorithm works in C. Metal shader needs debugging — likely an issue with the dequantize chunk addressing or the large constant arrays in thread-local memory. Co-Authored-By: tturney@psyguard.ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Added Metal shader implementations: - quantize_turbo3_0 / quantize_turbo4_0 (per-block quantization) - dequantize_turbo3_0 / dequantize_turbo4_0 (type4x4 and type4 variants) - kernel_set_rows_turbo template (128-element block size) - Flash attention instantiations for all dk/dv variants Added TURBO3_0/TURBO4_0 to Metal device SET_ROWS validation. Builds successfully. Testing with Qwen 3.5 35B-A3B MoE on M5 Max. Note: Initial version uses simplified quantization (no rotation matrix) for Metal compatibility. Full rotation requires custom kernel with extra buffer bindings — tracked for follow-up. Co-Authored-By: tturney@psyguard.ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…g#21 Embedded pre-computed 128×128 rotation and QJL matrices (256KB constant memory) directly in the Metal shader. Both quantize and dequantize now perform the full TurboQuant algorithm: Quantize: normalize → rotate → codebook → inverse rotate → residual → QJL Dequantize: codebook → inverse rotate → QJL correction → rescale Previous version (no rotation) produced garbage. This should produce meaningful output since the rotation Gaussianizes the KV distribution. Note: dequantize does full 128-element rotation per chunk (8× work). Optimization possible with caching or restructured kernel in follow-up. Co-Authored-By: tturney@psyguard.ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ml-org#21 - Inlined turbo-matrices.h directly into ggml-metal.metal (256KB) to fix JIT compilation failure with #include - Added C round-trip test (test-turbo-quant.c): turbo3 cosine=0.906, turbo4 cosine=0.966 — matches Python prototype - Metal library loads successfully ("loaded in 5.9 sec") - Model runs on Metal but output quality needs debugging (Metal quantize/dequantize may have a bug vs the working C version) C round-trip PROVES the algorithm works in C. Metal shader needs debugging — likely an issue with the dequantize chunk addressing or the large constant arrays in thread-local memory. Co-Authored-By: tturney@psyguard.ai Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add LICENSE

fc3f02e

ggerganov merged commit 6a9a67f into ggml-org:master Mar 12, 2023

prusnak deleted the license branch March 12, 2023 08:53

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

JJJYmmm pushed a commit to JJJYmmm/llama.cpp that referenced this pull request Oct 22, 2025

Merge pull request ggml-org#21 from yairpatch/master

b065bf3

Adding support for Qwen3-VL by @yairpatch

uttampc1 mentioned this pull request Nov 18, 2025

Throughput improvement for small batch sizes #17342

Open

Aishor added a commit to Aishor/llama.cpp-chaman that referenced this pull request Jan 9, 2026

[CHAMAN-REC-ggml-org#21] Force preserve user n_seq_max intent during …

c7570bd

…padding

sainnhe mentioned this pull request Jan 25, 2026

Eval bug: coredump due to ops of discontinuous tensor memory #19078

Closed

feyleth mentioned this pull request Mar 11, 2026

Eval bug: vision model crash #20418

Open

julien-c pushed a commit to julien-c/llama.cpp that referenced this pull request Mar 17, 2026

Merge pull request ggml-org#21 from gary149/agent-display-tool-details

de9dd1f

agent : show command/path details in tool execution display

rubin55 mentioned this pull request Mar 26, 2026

Eval bug: Unresolved Symbol <__memcpy_chk> when running (any?) model #21041

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add LICENSE#21

Add LICENSE#21
ggerganov merged 1 commit intoggml-org:masterfrom
prusnak:license

prusnak commented Mar 11, 2023

Uh oh!

DDBE12 commented Jul 27, 2023 •

edited

Loading

Uh oh!

prusnak commented Jul 27, 2023

Uh oh!

DDBE12 commented Jul 27, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

prusnak commented Mar 11, 2023

Uh oh!

DDBE12 commented Jul 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

prusnak commented Jul 27, 2023

Uh oh!

DDBE12 commented Jul 27, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

DDBE12 commented Jul 27, 2023 •

edited

Loading