support llava 1.6 image embedding dimension in server #5553
ggerganov merged 7 commits into ggml-org:master
Conversation
Nice to see. Regarding your question: those structs use vectors, and clip.h is C-style. That's why you have to include them manually.
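To illustrate the point about C-style headers and vector-backed structs, here is a minimal sketch of one common way to handle it: the header only forward-declares the struct and exposes functions that take a pointer, while the definition with the std::vector member stays in the C++ source file. All names here (demo_image_u8 and the demo_* functions) are hypothetical and chosen for illustration; this is not the actual clip.h API and not necessarily how this PR solves it.

```cpp
// Sketch only: demo_image_u8 and the demo_* functions are hypothetical names,
// not the real clip.h API.
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// ---- what a C-style header could expose ----
extern "C" {
    struct demo_image_u8;  // opaque: callers of the header only ever hold a pointer

    struct demo_image_u8 * demo_image_u8_init(int nx, int ny);
    void                   demo_image_u8_free(struct demo_image_u8 * img);
    size_t                 demo_image_u8_nbytes(const struct demo_image_u8 * img);
}

// ---- what stays in the C++ translation unit ----
struct demo_image_u8 {
    int nx;
    int ny;
    std::vector<uint8_t> buf;  // C++-only member, so the definition cannot live in a C header
};

extern "C" struct demo_image_u8 * demo_image_u8_init(int nx, int ny) {
    assert(nx > 0 && ny > 0);
    auto * img = new demo_image_u8{nx, ny, {}};
    img->buf.resize(static_cast<size_t>(nx) * ny * 3);  // assume 3 bytes per pixel (RGB)
    return img;
}

extern "C" void demo_image_u8_free(struct demo_image_u8 * img) {
    delete img;
}

extern "C" size_t demo_image_u8_nbytes(const struct demo_image_u8 * img) {
    assert(img != nullptr);
    return img->buf.size();
}
```

With this layout, code that needs direct access to the struct's fields must see the full definition, which is consistent with the remark above about having to include the structs manually.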

This is awesome! Thanks so much!

This is a great point; the print here is definitely wrong. From a quick peek at the code, it looks like the same issue was in the previous version as well, though I haven't verified this via testing. I think this should be fixed, so I'll take a quick look for anything obvious.
Edit: From a quick look, one thing stands out in particular. I am not familiar enough with this value yet, so I'm not sure whether it only affects the log or has a bigger impact across the generation.

@cjpais thanks for looking into this. Maybe @cmp-nct or @ggerganov has more ideas on why llama.cpp reports the number of prompt tokens as 1 when an image is in the input, and how to fix it?

It seems to be a server-only problem; llava-cli seems to work.
From llava-cli:
From server through API:
From server console:

* server: init working 1.6
* move clip_image to header
* remove commented code
* remove c++ style from header
* remove todo
* expose llava_image_embed_make_with_clip_img
* fix zig build

Should address #5514. I haven't tested extensively, but the results for 1.6 are as follows. 1.5 seems to work fine from very brief testing.
Baseline
Command:
./llava-cli -ngl 99 -n 325 -c 4096 --temp 0 --mmproj ~/models/llava/1.6/llava-v1.6-mistral-7b/mmproj-model-f16.gguf -m ~/models/llava/1.6/llava-v1.6-mistral-7b/llava-v1.6-mistral-7b.Q5_K_M.gguf --image ~/Downloads/beach.jpg -p "describe the image in detail"
Result:
The image shows a highway scene with a clear blue sky overhead. The road is lined with trees and appears to be in a rural or semi-rural area, as indicated by the presence of palm trees along the side. There are several vehicles on the road, including cars and trucks, suggesting that it's a busy time of day. The perspective of the image suggests it was taken from inside a vehicle traveling down the highway.
This PR

Previous

Questions:
clip_image_u8 into clip.h?
cc: @cmp-nct