forked from ggml-org/llama.cpp
Experiment: Layer-adaptive turbo3 (LA-1) on Metal #7
Open
Description
Hypothesis
Keeping the KV cache of the first 4 and last 4 layers at q8_0, with the remaining layers at turbo3, improves perplexity (PPL) while maintaining compression on Apple Silicon.
Background
buun validated this configuration on CUDA (RTX 3090): PPL 5.769 (1.17% better than q8_0) at 3.5x compression, with 99.6% prefill speed and 97.5% decode speed. It is the recommended config for contexts up to 65K.
What to test
- `TURBO_LAYER_ADAPTIVE=1` env var (if implemented) or manual per-layer cache type selection
- PPL comparison vs uniform turbo3 and q8_0
- Decode speed at 2K, 8K, and 32K context
- Prefill speed
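A possible shape for the measurement matrix, written as a dry-run script that prints the commands to run. The `-m`/`-f`/`-ctk`/`-ctv` flags are real llama.cpp options, but `turbo3` as a cache type and `TURBO_LAYER_ADAPTIVE` are assumed fork-specific here, and the exact `llama-bench` flags for decode-at-depth vary by version; treat this as a sketch, not the fork's documented workflow.

```shell
#!/bin/sh
# Dry-run sketch of the test plan above: print the command matrix.
# "turbo3" and TURBO_LAYER_ADAPTIVE are assumed fork-specific.
MODEL=model.gguf

for kv in q8_0 turbo3; do
    # PPL comparison per uniform cache type
    echo "llama-perplexity -m $MODEL -f wiki.test.raw -ctk $kv -ctv $kv"
    # prefill/decode speed (check your llama-bench version for
    # the right flag to measure decode at 2K/8K/32K depth)
    echo "llama-bench -m $MODEL -ctk $kv -ctv $kv -p 2048,8192,32768 -n 128"
done

# layer-adaptive run, if the env var is implemented in the fork
echo "TURBO_LAYER_ADAPTIVE=1 llama-perplexity -m $MODEL -f wiki.test.raw"
```

Comparing the three PPL numbers (uniform q8_0, uniform turbo3, layer-adaptive) directly tests the hypothesis; the bench lines cover the speed columns.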
Expected outcome
Similar PPL improvement on Metal. No kernel changes needed — just config.
Priority
High — zero code changes, highest expected value.
Source
AutoRepl: cexp_55ab69 (buun, fork_dc582a)