Fix memory management bug in llava and server code #5491
ggerganov merged 2 commits into ggml-org:master
Conversation
Fixes this error:
llama_new_context_with_model: graph splits (measure): 3
Available slots:
-> Slot 0 - max context: 6000
{"timestamp":1707926446,"level":"INFO","function":"main","line":2623,"message":"model loaded"}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 - loaded image
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
slot 0 - encoding image [id: 1]
munmap_chunk(): invalid pointer
Aborted
examples/llava/clip.h
CLIP_API void clip_image_u8_batch_free (struct clip_image_u8 * data);
CLIP_API void clip_image_f32_batch_free(struct clip_image_f32 * data);
Wouldn't it be better to change these to:
CLIP_API void clip_image_u8_batch_free(struct clip_image_u8_batch * batch) {
    if (batch->size > 0) {
        delete[] batch->data;
    }
    batch->size = 0;
}
Agreed, I changed it and retested.
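A minimal, self-contained sketch of the size-guarded free wrapper discussed above (the struct definitions here are hypothetical stand-ins; the real types live in examples/llava/clip.h). The point is that delete[] runs only when the batch actually owns entries, and the fields are reset so a second call is a harmless no-op instead of a double free:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the clip.h types, for illustration only.
struct clip_image_u8 { /* pixel data omitted */ };

struct clip_image_u8_batch {
    clip_image_u8 * data;
    size_t          size;
};

// Size-guarded free in the spirit of the suggestion above: delete[]
// only when the batch owns entries, then reset the fields so calling
// the wrapper twice cannot double free.
void clip_image_u8_batch_free(struct clip_image_u8_batch * batch) {
    if (batch->size > 0) {
        delete[] batch->data;
        batch->data = nullptr;
    }
    batch->size = 0;
}
```

Resetting data to nullptr in addition to size is an extra belt-and-braces step beyond the original suggestion; with size reset to 0 the guard alone already prevents a second delete[].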
    pad_to_square = false;
}
// free the previous res_imgs if any set
if (res_imgs.size > 0 && res_imgs.size < 100) {
oh, I removed the upper bound because there didn't seem to be any justification for it, but if there is then let me know @cmp-nct and I'll restore it
The reason for the upper bound was a safety check in case the passed structure points to uninitialized memory; in that case the size would almost certainly fall outside that range.
So it's only relevant if someone uses the API wrong; I'm fine either way.
Glad you spotted the double free, it's another remnant of the vector->pointer refactor.
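The trade-off discussed above can be sketched as follows (the struct and the free_previous helper are hypothetical names for illustration; the real logic is inline in clip.cpp). A caller that zero-initializes the batch makes the "free the previous res_imgs" guard safe on the first call, while the upper bound is the safety net against an uninitialized batch handing a garbage pointer to delete[]:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the clip.h f32 batch, for illustration only.
struct clip_image_f32 { /* pixel data omitted */ };

struct clip_image_f32_batch {
    clip_image_f32 * data;
    size_t           size;
};

// Illustrative version of the "free the previous res_imgs" guard.
// The upper bound is the safety check described above: a batch read
// from uninitialized memory would almost certainly report a size
// outside [1, 100), so its garbage pointer is skipped, not delete[]'d.
static void free_previous(clip_image_f32_batch & res_imgs) {
    if (res_imgs.size > 0 && res_imgs.size < 100) {
        delete[] res_imgs.data;
    }
    res_imgs.data = nullptr;
    res_imgs.size = 0;
}
```

Callers that value-initialize the batch (clip_image_f32_batch imgs = {};) never trip the guard on the first call, which is why dropping the upper bound is safe for correct usage.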
* Fix memory management in llava and server code
Fixes the error shown above.
* Make it cleaner by checking size in batch free wrapper
Fixes the same munmap_chunk(): invalid pointer abort as above
when running the server binary like this:
./bin/server -m ../models/mistral-7b-q_5_k.gguf --mmproj ../models/mmproj-mistral7b-f16-q6_k.gguf -ngl 50 -c 6000 --host 0.0.0.0 --port 8007 --no-mmap
Tested on:
Linux, WSL (Debian)
GPU: 4090