README.md: 16 additions & 1 deletion
@@ -58,6 +58,21 @@ To run the extreme context benchmark suite on your device, execute:
 bash tests/run_extreme_context.sh <model-id>
 ```
 
+### Prompt Cache & Sliding Window Regression Test
+To verify the stability of the prompt cache when interleaving long contexts with sliding window attention (e.g. Gemma 4/Mistral 3), run this extreme test sequence:
+```bash
+# 1. Start the server with a large sliding-window MoE model
-The stabilization of the Gemma 4 inference engine on Apple Silicon is fully chronicled in our [Development Journal](journal.md.resolved).
+The stabilization of the Gemma 4 inference engine on Apple Silicon is fully chronicled in our [Development Journal](journal.md).
 
 **The "2+2=4" Aha Moment**: During development, we encountered a severe "silent failure" where the model would successfully load and evaluate all 32 layers at high speed, but generate nothing but infinite whitespace. The model logits showed the correct *shape* but the wrong *magnitudes*.