refact : fix convert script + zero out KV cache to avoid nans by ggerganov · Pull Request #3523 · ggml-org/llama.cpp

ggerganov · 2023-10-07T08:24:36Z

Copied tokenization from convert-starcoder-hf-to-gguf.py
ALiBi is sensitive to garbage data in the KV cache, so we have to zero out the cache at the start. Since "llama : custom attention mask + parallel decoding + no context swaps" (#3228), we can read uninitialized KV cache data because of:
https://github.com/ggerganov/llama.cpp/blob/bdbe11719d81dfdc955b762b6d99796724e292b7/llama.cpp#L5024
If this data happens to contain NaNs, generation fails.
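A toy model of the failure: the sketch below computes one attention row over a small K cache in which only the first slots were ever written. It is a simplified pure-Python illustration under stated assumptions (single head, made-up query/key vectors, a hypothetical ALiBi slope of 0.5, bias added before an additive -INF mask), not llama.cpp's actual compute graph.

```python
import math

NEG_INF = float("-inf")

def softmax(row):
    """Softmax over one row of attention scores (shifted by the row max)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(q, k_cache, n_valid, slope):
    """Toy single-head attention row: QK score + ALiBi bias + additive -inf mask.

    Only the first n_valid cache slots were ever written; the rest are masked.
    `slope` is a hypothetical per-head ALiBi slope chosen for illustration.
    """
    row = []
    for j, k in enumerate(k_cache):
        score = sum(qi * ki for qi, ki in zip(q, k))
        score += -slope * (n_valid - 1 - j)          # ALiBi-style linear distance bias
        score += 0.0 if j < n_valid else NEG_INF     # additive mask for unused slots
        row.append(score)
    return softmax(row)

q = [1.0, 0.5]
written = [[0.3, -0.2], [0.1, 0.4]]

# Slot 2 was never written; its memory happens to decode as NaN.
garbage_cache = written + [[float("nan"), 0.0]]
# Same cache, but zero-initialized before use.
zeroed_cache = written + [[0.0, 0.0]]

garbage_w = attention_weights(q, garbage_cache, n_valid=2, slope=0.5)
zeroed_w = attention_weights(q, zeroed_cache, n_valid=2, slope=0.5)
print(garbage_w)  # every weight is nan: the masked NaN poisons the whole row
print(zeroed_w)   # finite weights summing to 1; the unused slot gets weight 0.0
```

Zeroing makes the unused slot's score finite before the -INF mask is added, so `exp(-inf)` drops it out of the softmax cleanly. The same reasoning applies to the V side: in IEEE 754, `0.0 * nan` is still `nan`, so a zero attention weight does not protect against a NaN value vector either.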

Question: should we first mask the KV tensor and then apply ALiBi?

If that were the case, the KV cache initialization above wouldn't be needed, since any uninitialized values would be masked with -INF.

slaren · 2023-10-07T10:59:45Z

> If that were the case, then the above KV cache initialization wouldn't be needed since any uninitialized values will be masked with -INF

But nan - INF is still nan, so I don't think this would work for removing NaNs before ALiBi.
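The point above follows from IEEE 754 arithmetic: NaN combined with anything, including -INF, stays NaN, so with an additive mask the order of masking and ALiBi makes no difference. The sketch contrasts that with a hypothetical select-style mask that overwrites masked scores instead of adding to them; all helper names here are illustrative and do not correspond to ggml functions.

```python
import math

nan, inf = float("nan"), float("inf")

def mask_additive(score, masked):
    """Additive masking: -inf is added to masked scores."""
    return score + (-inf if masked else 0.0)

def mask_select(score, masked):
    """Hypothetical select-style masking: masked scores are overwritten."""
    return -inf if masked else score

def alibi(score, bias):
    """Add a (made-up) ALiBi bias to a single score."""
    return score + bias

garbage = nan  # uninitialized cache slot that happens to decode as NaN
bias = -0.5

a = alibi(mask_additive(garbage, True), bias)  # mask first, then ALiBi
b = mask_additive(alibi(garbage, bias), True)  # ALiBi first, then mask
c = mask_select(alibi(garbage, bias), True)    # overwrite discards the NaN
print(a, b, c)  # nan nan -inf
```

Only an operation that discards the original value (a select/overwrite, or zeroing the cache up front) gets rid of the NaN; any arithmetic combination keeps it.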

refact : fix convert script + zero out KV cache to avoid nans

bdbe117

ggerganov mentioned this pull request Oct 7, 2023

add refact model #3329

Merged

ggml : silu(-inf) should never happen
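The commit message only states that silu(-inf) should never happen. A small sketch shows why such an input is a problem for a SiLU written as `x / (1 + exp(-x))` (this formula is assumed here for illustration): mathematically SiLU tends to 0 as x goes to -inf, but in floating point the expression becomes `-inf / inf`, which is NaN.

```python
import math

def silu_naive(x):
    # SiLU(x) = x * sigmoid(x) = x / (1 + exp(-x))
    return x / (1.0 + math.exp(-x))

print(silu_naive(0.0))            # 0.0
print(silu_naive(-50.0))          # tiny negative number, effectively 0
print(silu_naive(float("-inf")))  # nan: -inf / (1 + inf) is -inf / inf
```

Note that `math.exp` raises `OverflowError` for large finite arguments (e.g. `-x = 1000`), so this sketch only evaluates moderate inputs and the infinite case.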

42833bc

martell mentioned this pull request Oct 8, 2023

model: refact-1_6B-fim unable to load model #3531

Closed

ggerganov added 2 commits October 8, 2023 11:04

metal : assert various kernel requirements

0f8df39

Merge branch 'master' into fix-refact

acead65

ggerganov added the "need feedback" label (Testing and feedback with results are needed) Oct 8, 2023

ggerganov merged commit fcca0a7 into master Oct 9, 2023
