
Add LICENSE #21

Merged

ggerganov merged 1 commit into ggml-org:master from prusnak:license on Mar 12, 2023

Conversation

@prusnak (Contributor) commented Mar 11, 2023

This repository is missing a LICENSE file. I assume it should be MIT, the same license as used in https://github.com/ggerganov/whisper.cpp and https://github.com/ggerganov/ggml, so I went ahead and created this PR.

If the plan is not to release the source code under the MIT license, feel free to close the PR.

@ggerganov ggerganov merged commit 6a9a67f into ggml-org:master Mar 12, 2023
@prusnak prusnak deleted the license branch March 12, 2023 08:53
@DDBE12 commented Jul 27, 2023

This is a mere fork or portation of this: https://github.com/facebookresearch/llama. The original is made and owned by Facebook/Meta and has a proprietary Meta license: https://github.com/facebookresearch/llama/blob/main/LICENSE. On the one hand, I'm not even sure it's legal to slap an MIT license on a simple portation of somebody else's code.

On the other hand, Llama being made and owned by Meta, it's no surprise that Replicate, a framework to run Llama locally, states in its very own TOS that it intends to steal, publish, and probably sell to third parties every single bit of content its users give it, just like Facebook/Meta does: https://replicate.com/terms. Meta's own Llama license linked above even anticipates lawsuits against Llama and Meta for stealing user content, threatening to immediately terminate the accounts of users who try to get the rights to their own property back, while Replicate, Llama, and Meta themselves keep the stolen user content.

@prusnak (Contributor, Author) commented Jul 27, 2023

This is a mere fork or portation of this

Your argument is invalid because:

  1. this is not a fork, because the code was written from scratch, without reusing any of the original code
  2. software licenses do not recognize such a thing as "portation"
  3. when the llama.cpp project was conceived, the LLaMA license was still an open-source license (GPLv3) - see https://github.com/facebookresearch/llama/blob/llama_v1/LICENSE - if this were a fork, llama.cpp would also need to carry the same license, but since it is not one (see point 1), it is fine to use MIT here as well

@DDBE12 commented Jul 27, 2023

"when the llama.cpp project was conceived, the LLaMA license was still an open-source license (GPLv3) - see https://github.com/facebookresearch/llama/blob/llama_v1/LICENSE - if this was a fork, the llama.cpp would also need to have the same license, but since it is not (see point 1), it is fine to have MIT here as well"

The only reason this fork or portation now has an MIT license is that you, as an outsider, literally "assumed" it should have one when you didn't even know this was originally Meta code ported to C++. Just see your very own OP above.

JJJYmmm pushed a commit to JJJYmmm/llama.cpp that referenced this pull request Oct 22, 2025
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025

This allows for a better comparison between different models
or different tensors of the same model where the magnitude of
the model weights may differ.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
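
The commit message above suggests some error metric is being divided out by the magnitude of the weights so that tensors of different scales can be compared. As a rough illustration only (the function below is a guess at the idea, not code from the actual commit), a magnitude-normalized RMSE in C could look like this:

```c
#include <math.h>
#include <stddef.h>

// Hypothetical sketch: RMS error between reference and quantized weights,
// divided by the RMS of the reference weights, so tensors whose weights
// have very different magnitudes can be compared on the same scale.
static double normalized_rmse(const float * ref, const float * q, size_t n) {
    double err = 0.0, mag = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double d = (double) ref[i] - (double) q[i];
        err += d * d;
        mag += (double) ref[i] * (double) ref[i];
    }
    return mag > 0.0 ? sqrt(err / mag) : 0.0;
}
```
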
Aishor added a commit to Aishor/llama.cpp-chaman that referenced this pull request Jan 9, 2026
julien-c pushed a commit to julien-c/llama.cpp that referenced this pull request Mar 17, 2026
agent : show command/path details in tool execution display
TheTom referenced this pull request in TheTom/llama-cpp-turboquant Mar 25, 2026


Added Metal shader implementations:
- quantize_turbo3_0 / quantize_turbo4_0 (per-block quantization)
- dequantize_turbo3_0 / dequantize_turbo4_0 (type4x4 and type4 variants)
- kernel_set_rows_turbo template (128-element block size)
- Flash attention instantiations for all dk/dv variants

Added TURBO3_0/TURBO4_0 to Metal device SET_ROWS validation.

Builds successfully. Testing with Qwen 3.5 35B-A3B MoE on M5 Max.

Note: Initial version uses simplified quantization (no rotation matrix)
for Metal compatibility. Full rotation requires custom kernel with extra
buffer bindings — tracked for follow-up.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
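
For readers unfamiliar with the per-block layout mentioned above, here is a rough C sketch of what a 128-element block quantizer of this general shape can look like. The struct layout, scale choice, and names are guesses for illustration; this is not the actual TURBO4_0 format:

```c
#include <math.h>
#include <stdint.h>

#define TURBO_BLOCK 128  // block size mentioned in the commit

typedef struct {
    float   d;                    // per-block scale (assumed)
    uint8_t qs[TURBO_BLOCK / 2];  // two 4-bit codes per byte (4-bit variant)
} block_turbo4_sketch;            // hypothetical layout, not the real one

static void quantize_turbo4_sketch(const float * x, block_turbo4_sketch * b) {
    // find the largest magnitude in the block and derive a scale from it
    float amax = 0.0f;
    for (int i = 0; i < TURBO_BLOCK; i++) {
        const float a = fabsf(x[i]);
        if (a > amax) amax = a;
    }
    const float d  = amax / 7.0f;  // map [-amax, amax] onto [-7, 7]
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    b->d = d;
    // round each value to a 4-bit code, two codes packed per byte
    for (int i = 0; i < TURBO_BLOCK; i += 2) {
        const int q0 = (int) lroundf(x[i + 0] * id) + 8;  // offset into [1, 15]
        const int q1 = (int) lroundf(x[i + 1] * id) + 8;
        b->qs[i / 2] = (uint8_t) (q0 | (q1 << 4));
    }
}
```
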
TheTom referenced this pull request in TheTom/llama-cpp-turboquant Mar 25, 2026
…g#21

Embedded pre-computed 128×128 rotation and QJL matrices (256KB constant
memory) directly in the Metal shader. Both quantize and dequantize now
perform the full TurboQuant algorithm:

Quantize: normalize → rotate → codebook → inverse rotate → residual → QJL
Dequantize: codebook → inverse rotate → QJL correction → rescale

Previous version (no rotation) produced garbage. This should produce
meaningful output since the rotation Gaussianizes the KV distribution.

Note: dequantize does full 128-element rotation per chunk (8× work).
Optimization possible with caching or restructured kernel in follow-up.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
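
The quantize pipeline above can be summarized in plain C. The sketch below only shows the stage ordering (normalize → rotate → codebook → inverse rotate → residual): the rotation is written as a plain 128×128 matrix multiply, the codebook is reduced to simple rounding, and the QJL step is omitted. None of these helpers are the project's real functions:

```c
#include <math.h>

#define D 128  // rotation dimension from the commit (128x128 matrices)

// y = R * x, with R a precomputed 128x128 rotation matrix.
static void rotate(const float R[D][D], const float * x, float * y) {
    for (int i = 0; i < D; i++) {
        float acc = 0.0f;
        for (int j = 0; j < D; j++) acc += R[i][j] * x[j];
        y[i] = acc;
    }
}

// Placeholder "codebook": round to the nearest integer step.
static float codebook_quantize(float v) { return roundf(v); }

// Stage-by-stage sketch of the quantize path described above.
static void quantize_sketch(const float R[D][D], const float Rinv[D][D],
                            const float * x, float * out, float * residual) {
    float norm = 0.0f;
    for (int i = 0; i < D; i++) norm += x[i] * x[i];
    norm = sqrtf(norm);
    const float inorm = norm > 0.0f ? 1.0f / norm : 0.0f;

    float tmp[D], rot[D];
    for (int i = 0; i < D; i++) tmp[i] = x[i] * inorm;              // 1. normalize
    rotate(R, tmp, rot);                                            // 2. rotate
    for (int i = 0; i < D; i++) rot[i] = codebook_quantize(rot[i]); // 3. codebook
    rotate(Rinv, rot, out);                                         // 4. inverse rotate
    for (int i = 0; i < D; i++) residual[i] = tmp[i] - out[i];      // 5. residual
    // 6. the residual would then be compressed with a QJL sketch (omitted here)
}
```
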
TheTom referenced this pull request in TheTom/llama-cpp-turboquant Mar 25, 2026
…ml-org#21

- Inlined turbo-matrices.h directly into ggml-metal.metal (256KB)
  to fix JIT compilation failure with #include
- Added C round-trip test (test-turbo-quant.c):
  turbo3 cosine=0.906, turbo4 cosine=0.966 — matches Python prototype
- Metal library loads successfully ("loaded in 5.9 sec")
- Model runs on Metal but output quality needs debugging
  (Metal quantize/dequantize may have a bug vs the working C version)

C round-trip PROVES the algorithm works in C. Metal shader needs
debugging — likely an issue with the dequantize chunk addressing
or the large constant arrays in thread-local memory.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
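
The round-trip check described above presumably reduces to a cosine similarity between the original vector and its quantize→dequantize reconstruction. A minimal standalone version of that metric (not the actual test-turbo-quant.c):

```c
#include <math.h>
#include <stddef.h>

// Cosine similarity between two vectors; accumulate in double for stability.
static double cosine_similarity(const float * a, const float * b, size_t n) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t i = 0; i < n; i++) {
        dot += (double) a[i] * (double) b[i];
        na  += (double) a[i] * (double) a[i];
        nb  += (double) b[i] * (double) b[i];
    }
    return (na > 0.0 && nb > 0.0) ? dot / (sqrt(na) * sqrt(nb)) : 0.0;
}
// A value near 1.0 means the round-trip preserved the direction of the
// vector; the commit reports 0.906 for turbo3 and 0.966 for turbo4.
```
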
TheTom referenced this pull request in TheTom/llama-cpp-turboquant Mar 26, 2026
didlawowo pushed a commit to didlawowo/llama.cpp that referenced this pull request Mar 27, 2026