Upstream: TurboQuant discussion + contribution requirements for llama.cpp #27
Open
Labels
- P2 (Normal development)
- type:port (Porting to llama.cpp/MLX/etc)
Description
Upstream llama.cpp Activity
Active discussions
- Discussion #20969: TurboQuant early discussion
- Issue #20977: Feature request (Mar 25). mudler has an experimental fork.
Contribution requirements (PR #19762, merged Mar 13)
To upstream a new quant type, a contribution must provide:
- GGUF conversion support
- Perplexity vs FP16 comparison
- KL divergence data
- CPU performance baselines
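For reference, the two quality metrics in the list above can be sketched as follows. This is a minimal illustration of what perplexity and KL divergence measure, not the upstream tooling or the exact procedure PR #19762 asks for. The function names and the toy logits are assumptions made for the example; real numbers would come from running the FP16 and quantized models over the same evaluation text.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for a single token position.

    ref_logits:   logits from the FP16 reference model (hypothetical input)
    quant_logits: logits from the quantized model at the same position
    """
    lp = log_softmax(ref_logits)
    lq = log_softmax(quant_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

def perplexity(logits_per_pos, target_ids):
    """exp(mean negative log-likelihood) of the target tokens."""
    nll = -sum(log_softmax(l)[t] for l, t in zip(logits_per_pos, target_ids))
    return math.exp(nll / len(target_ids))

if __name__ == "__main__":
    # Toy data: 2 positions, vocabulary of 4 tokens (illustrative values only).
    fp16 = [[2.0, 0.5, 0.1, -1.0], [0.0, 1.5, 0.2, 0.3]]
    quant = [[1.9, 0.6, 0.1, -0.9], [0.1, 1.4, 0.2, 0.3]]
    targets = [0, 1]

    mean_kl = sum(kl_divergence(p, q) for p, q in zip(fp16, quant)) / len(fp16)
    print("FP16 perplexity: ", perplexity(fp16, targets))
    print("quant perplexity:", perplexity(quant, targets))
    print("mean KL (nats):  ", mean_kl)
```

Perplexity compares each model against the ground-truth text, while KL divergence compares the quantized model's output distribution directly against the FP16 one, so it can surface drift that perplexity alone averages away.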
Relevant optimizations to rebase on
- PR #20962: Metal Tensor API — 26% mul_mat improvement
- PR #20609: MXFP flash attention SoA layout pattern
Key lesson from upstream
Custom quant types that aren't in the Metal SET_ROWS whitelist silently fall back to CPU.
We hit this bug ourselves. The same pattern is confirmed by the MXFP4 experience (PR #20609).
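The failure mode is easier to see in a small model of it. The sketch below is not llama.cpp code: the whitelist contents, the type name `tq_custom`, and the `pick_backend` function are all hypothetical. It only illustrates why a whitelist-based dispatch hides the problem unless the fallback is logged or turned into a hard error during testing.

```python
import logging

log = logging.getLogger("backend_dispatch")

# Hypothetical whitelist; these names are illustrative and are NOT the
# actual set of types supported by the Metal SET_ROWS kernel.
GPU_SET_ROWS_TYPES = {"f32", "f16", "q8_0", "q4_0"}

def pick_backend(op, tensor_type, *, strict=False):
    """Return "gpu" or "cpu" for an op, modelling a whitelist-based dispatch.

    With strict=False an unsupported type falls back to CPU (with a warning
    here; the bug described above is the same fallback with no message).
    With strict=True the fallback becomes an error, which is useful in CI.
    """
    if op == "SET_ROWS" and tensor_type not in GPU_SET_ROWS_TYPES:
        msg = f"{op}: type {tensor_type!r} not in GPU whitelist"
        if strict:
            raise RuntimeError(msg)
        log.warning("%s, falling back to CPU", msg)
        return "cpu"
    return "gpu"

if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(pick_backend("SET_ROWS", "f16"))        # whitelisted type
    print(pick_backend("SET_ROWS", "tq_custom"))  # new type, falls back
```

When porting a new quant type, a strict mode like this (or simply checking which backend each op actually ran on) makes the missing whitelist entry show up as a test failure instead of as an unexplained slowdown.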