Commit eac5ab3
fix: Prevent crash on full prompt cache hit (100% match)
When a repeated prompt matched 100% of cached tokens, the remaining
token slice was empty (0 tokens). Passing this to the model caused
'[reshape] Cannot infer the shape of an empty array' fatal error.
Fix: replay the last cached token (with KV trim-back by 1) so the
model always receives at least 1 token for next-token logit production.
1 parent 32dd183 commit eac5ab3
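
The fix described in the commit message can be sketched as follows. This is a minimal illustration, not the code from this commit (the changed lines are not preserved here): the function name `plan_prefill`, its parameters, and the return shape are hypothetical. It assumes the cache holds KV entries for the first `cached_len` prompt tokens and that the caller can trim entries from the end of the cache.

```python
from typing import List, Tuple


def plan_prefill(prompt: List[int], cached_len: int) -> Tuple[List[int], int]:
    """Decide which tokens to feed the model and how far to trim the KV cache.

    Hypothetical helper: returns (tokens_to_feed, kv_trim), where kv_trim is
    the number of trailing KV entries the caller should drop before the
    forward pass.
    """
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    if not 0 <= cached_len <= len(prompt):
        raise ValueError("cached_len must be between 0 and len(prompt)")

    remaining = prompt[cached_len:]
    if remaining:
        # Partial hit or miss: feed only the uncached suffix, keep the cache.
        return remaining, 0

    # Full hit: the suffix is empty, which would crash the model call.
    # Replay the last cached token and drop its KV entry so it is not
    # counted twice.
    return prompt[-1:], 1
```

With a 100% match, `plan_prefill([1, 2, 3], 3)` yields `([3], 1)`: one token to feed and one KV entry to trim, so the model still produces logits for the next token.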
1 file changed
Lines changed: 9 additions & 1 deletion
Diff hunk (line contents not preserved): original line 964 was replaced by new lines 964-972. Context lines 961-963 are unchanged, and original lines 965-967 became new lines 973-975.