Upstream: TurboQuant discussion + contribution requirements for llama.cpp #27

@TheTom

Description

Upstream llama.cpp Activity

Active discussions

Contribution requirements (PR #19762, merged Mar 13)

To upstream a new quant type, a PR must provide:

  • GGUF conversion support
  • Perplexity vs FP16 comparison
  • KL divergence data
  • CPU performance baselines
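As a rough illustration of the KL divergence requirement (this is our own sketch, not llama.cpp's evaluation code, and the function names are ours): for each position, compare the FP16 and quantized models' next-token distributions obtained by softmaxing their logits.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Softmax over a logit vector, with max-subtraction for numerical stability.
static std::vector<double> softmax(const std::vector<double>& z) {
    double m = *std::max_element(z.begin(), z.end());
    std::vector<double> e(z.size());
    double s = 0.0;
    for (size_t i = 0; i < z.size(); ++i) { e[i] = std::exp(z[i] - m); s += e[i]; }
    for (double& x : e) x /= s;
    return e;
}

// KL(p || q) where p = softmax(FP16 logits), q = softmax(quantized logits).
// Softmax guarantees strictly positive probabilities, so the log is safe.
double kl_divergence(const std::vector<double>& logits_fp16,
                     const std::vector<double>& logits_quant) {
    std::vector<double> p = softmax(logits_fp16);
    std::vector<double> q = softmax(logits_quant);
    double kl = 0.0;
    for (size_t i = 0; i < p.size(); ++i)
        kl += p[i] * std::log(p[i] / q[i]);
    return kl;
}
```

In practice this is averaged over many token positions on a held-out corpus; identical logits give KL = 0, and the value grows as quantization distorts the distribution.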

Relevant optimizations to rebase on

  • PR #20962: Metal Tensor API — 26% mul_mat improvement
  • PR #20609: MXFP flash attention SoA layout pattern

Key lesson from upstream

Custom quant types that aren't in the Metal SET_ROWS whitelist silently fall back to the CPU backend.
We hit this bug ourselves; the pattern is confirmed by the MXFP4 experience (PR #20609).
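The failure mode can be sketched as follows. Note that `dispatch_set_rows`, the `Backend` enum, and the whitelist contents below are hypothetical stand-ins for illustration, not llama.cpp's real dispatch code; the point is that an unlisted type takes the fallback path without any warning.

```cpp
#include <set>
#include <string>

enum class Backend { Metal, CPU };

// Hypothetical sketch of the whitelist-gated dispatch pattern:
// a quant type missing from the Metal whitelist is routed to CPU
// with no error or log message, so the slowdown is easy to miss.
Backend dispatch_set_rows(const std::string& quant_type) {
    // Illustrative whitelist only; the real list lives in the Metal backend.
    static const std::set<std::string> metal_whitelist = {
        "F16", "Q4_0", "Q8_0"
    };
    if (metal_whitelist.count(quant_type))
        return Backend::Metal;
    return Backend::CPU;  // silent fallback for custom types
}
```

The practical takeaway: when upstreaming a new quant type, verify it is added to every backend whitelist it should hit, and benchmark to confirm the fast path is actually taken.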

Metadata

Labels: P2 (Normal development), type:port (Porting to llama.cpp/MLX/etc)
