Commit 32ce0e2
fix: stabilize Gemma4 MoE inference — dynamic attention mask slicing
- Update mlx-swift-lm submodule with safeMask fix for broadcast_shapes
crash during sliding window attention on long prompts (1370+ tokens)
- Add tmp/ to .gitignore
HomeSec-Bench: 87/96 (90%), zero server crashes across full 96-test suite

1 parent c59f6a1 commit 32ce0e2
2 files changed
Lines changed: 2 additions & 1 deletion
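
The commit message names a `broadcast_shapes` crash in sliding window attention and a `safeMask` fix, but the submodule diff itself is not shown here. As a rough illustration only, the sketch below assumes the failure mode is a mask built for the full context being combined with scores whose key dimension has been trimmed to the window. The helper names (`broadcastable`, `causal_mask`, `safe_mask`) and the toy sizes are hypothetical and are not taken from mlx-swift-lm.

```python
def broadcastable(a, b):
    """NumPy-style rule: aligned trailing dims must be equal or one must be 1."""
    return all(x == y or x == 1 or y == 1
               for x, y in zip(reversed(a), reversed(b)))


def shape_of(matrix):
    """Shape of a rectangular list-of-lists as (rows, cols)."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def causal_mask(q_len, k_len):
    """True where query i may attend key j; queries are the last q_len positions."""
    offset = k_len - q_len
    return [[j <= i + offset for j in range(k_len)] for i in range(q_len)]


def safe_mask(mask, key_len):
    """Hypothetical fix: keep only the last key_len columns of a too-wide mask."""
    if not mask:
        return mask
    width = len(mask[0])
    if width <= key_len:
        return mask  # already narrow enough; leave untouched
    return [row[width - key_len:] for row in mask]


# Toy sizes (assumed): 2 new queries, 8 tokens of context, window keeps 4 keys.
full_mask = causal_mask(q_len=2, k_len=8)   # shape (2, 8)
scores_shape = (2, 4)                       # keys trimmed to the window

mismatch = not broadcastable(scores_shape, shape_of(full_mask))
sliced = safe_mask(full_mask, key_len=4)    # shape (2, 4)
```

With these toy sizes the unsliced mask cannot be broadcast against the scores, while the sliced one can. Whether the real fix slices from the end of the mask, or handles the case differently, is not stated in the commit message.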