tool parser: add GigaChatV3/3.1 models support in PEG format #19931
Conversation
force-pushed f774481 to 2d12cf9
Can you check the PR please? Thanks!
@Mishusha sorry, I wanted to get the autoparser in first. Could you rebase on latest master? Nothing should change except the positioning of the functions and of the detection code.
@pwilkin Hello! I tried setting up an autoparser with the GigaChatV3 model. The tests themselves pass, but the autoparser doesn't save the special tokens. Therefore, when I run my server without the -sp flag, tool calls aren't automatically parsed because the special tokens are discarded. What's the best way to handle this: set up my own custom parser for GigaChatV3 (keeping the one currently committed), or do you recommend something else? P.S. For GigaChatV3.1, there's a single <|function_call|> token, and there are no such problems with the autoparser.
@Mishusha Yeah, I checked it with the autoparser and determined that it needs a separate parser due to the role handling (similar to Functionary); that's why I told you to do that instead of just closing it as deprecated :) Sorry, I should've made myself clearer. The parser can stay as it is since it's already PEG-based; you just have to move the detection code to the right place in chat.cpp and that's it.
force-pushed 2d12cf9 to 6ea01ad
@pwilkin Ok, thank you very much!
pwilkin left a comment
Waiting for CI and then will merge.
@pwilkin Some checks failed, but they don't seem to be related to this PR.
Yup.
Co-authored-by: Mishusha <pmv26021975@gmail.com>
* 'master' of github.com:ggml-org/llama.cpp: (33 commits)
  convert : better mtp check and fix return [no ci] (ggml-org#20419)
  vulkan: fix SSM_CONV PP scaling with large ubatch sizes (ggml-org#20379)
  New conversations now auto-select the first loaded model (ggml-org#20403)
  ggml-virtgpu: Fix some build commands (ggml-org#20341)
  metal : avoid divisions in bin kernel (ggml-org#20426)
  ci: Setup self-hosted CI for Intel Linux Vulkan backend (ggml-org#20154)
  vulkan: fix l2_norm epsilon handling (ggml-org#20350)
  vulkan: fix OOB check in flash_attn_mask_opt (ggml-org#20296)
  vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (ggml-org#20059)
  opencl: use larger workgroup size for get_rows (ggml-org#20316)
  opencl: add cumsum op (ggml-org#18981)
  hip: compile debug builds with -O2 on hip to avoid a compiler bug (ggml-org#20392)
  common/parser: add GigaChatV3/3.1 models support (ggml-org#19931)
  model : add support for Phi4ForCausalLMV (ggml-org#20168)
  graph : add optional scale parameter to build_lora_mm [no ci] (ggml-org#20427)
  common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (ggml-org#20416)
  ggml-webgpu: Add supports for `GGML_OP_REPEAT` (ggml-org#20230)
  llama : enable chunked fused GDN path (ggml-org#20340)
  llama : whitespace cleanup (ggml-org#20422)
  ggml : add NVFP4 quantization type support (ggml-org#19769)
  ...
I have recreated PR #17924 for cleaner commits and to avoid merge conflicts.