Gemini API compatibility by marksverdhei · Pull Request #7 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-03-06T10:38:15Z

For gemini cli compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Register Qwen2_5OmniThinkerForConditionalGeneration architecture for text and mmproj GGUF conversion. Handle config structure difference where the Thinker-only variant has vision/audio configs at the top level. Add pooling type detection for embedding use cases. Fix audio tensor routing to base MmprojModel class. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

)

) * ggml: backend-agnostic tensor parallelism * support for GPT-OSS, Qwen 3 MoE * partial Vulkan fix * add support for 4/8 GPUs * unconditional peer access * re-use buffers + ggml contexts * fix output pattern * NCCL support * GGML: HIP: add RCCL support * Remove shfl and AllReduce from backend interface * move allocation workaround out of ggml-alloc.c * 2d tensor set/get support * Fix the seg fault without NCCL * Apply suggestion from JohannesGaessler * support for tensor dims % n_devs != 0 * fix view_offs scaling * arbitrary num. of GPUs/tensor split * fix compilation * better granularity estimate * Support device-specific host buffer types if all underlying backends expose the same type. This allows using pinned memory instead of pageable memory for CUDA. Fix compilation errors. * partial Qwen 3 Next support * Fix qwen3 30b (#8) * Fix crash with Qwen-30B-A3B Q4_0 Qwen-30B-A3B Q4_0 has an intermediate dimension of 768. Using a granularity of 256 forces an uneven split between GPUs, which is not supported by the current implementation. * Decide block size based on tensor quantization type * Fix crashes due to KV cache serialization (#9) KV cache serialization requires non-zero offsets on the tensor. Add support in the meta backend to set/get a tensor with a non-zero offset. * metal : fix build (#7) * static memory allocations, fix usage count * fix tensor granularity * more even memory distribution * use BF16 for allreduce * rebase fixup * better error message for unsupported architectures * Fix device mismatch during scatter of allReduce. (#11) There is a mismatch between the dst buffer device and the backend device, causing the use of sync copies * Enable the previous allreduce implementation. It is better in both perf and stability (#12) * delay AllReduce for Moe for less I/O * build : clean-up compile warnings * backend : move most of the meta backend API to ggml-backend-impl.h * cont : hide unused public API in the implementation * llama : use llama_device + remove ggml_backend_dev_is_meta() * ggml-backend : remove unused alloc include * minor : remove regex include * ggml : introduce ggml-ext.h for staging new APIs * rebase fixup * fix tests * llama : more robust logic for determining Meta devices (#16) * llama : more robust logic for determining Meta devices * cont : fix devs size check Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * cont : fix log type Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * disable roundtrip for meta backend * fix arch selection * Qwen 3.5 support * fix Gemma 4 MoE * fix OpenVino, SYCL * fix test-llama-archs for CPU-only builds * Fix Qwen 3.5 MoE * disable meta backend tests for WebGPU * tests : filter CPU-based devices from the Meta backend tests (#17) * meta : formatting, naming, indentation (#18) * formatting : llama-model.cpp * formatting : ggml-ext.h * formatting : ggml-backend-meta.cpp * meta : add TODO * add documentation * better error messages * fix GPT-OSS --------- Co-authored-by: Carl Philipp Klemm <carl@uvos.xyz> Co-authored-by: Gaurav Garg <gaugarg@nvidia.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

marksverdhei · 2026-04-27T13:59:10Z

Closing as stale — branch is 724 commits behind ht and Gemini API compatibility hasn't surfaced as a recent need. The codebase has moved on (model picker / chain picker / OpenAI-compat path). Reopen on a fresh branch off ht if this is needed again.

* spec: support MTP * fix batch size * rename files * cont : simplify (#7) * MTP: clean-up (#9) * MTP: clean-up * review: use llama_context_type instead of llama_graph_type * review: remove llama_model_has_mtp * review: fix convert issues * convert: fix pycheck * review: formatting * use `mtp-` for identifying mtp models * convert: fix mtp conversion * mtp -> draft-mtp * remove unused llama_arch * add need_embd in speculative * llama: allow partial seq_rm for GDN models for speculative decoding Currently speculative checkpoint needs to restart from a checkpoint after some draft tokens are not accepted, this leads to some wastage in running the target again. This PR adds the ability to rollback upto `draft_max` by storing the GDN intermediates. * fix pending state * vulkan: add GDN partial rollback * meta: extend check to axis 1 * metal: add GDN partial rollback Extend the gated delta net kernel to store intermediate states for partial rollback support on the Metal backend. - Add K (snapshot slot count) as a function constant - Read input state from slot 0 of the 3D state tensor - Write intermediate states to different slots during token loop - For K=1, maintain backward-compatible single-slot behavior Ref: ggml-org@8c05923 Assisted-by: llama.cpp:local pi * delta_net_base: use ggml_pad instead of new_tensor * review: add need_rs_seq * review: rename part_bounded to n_rs * review: deslop comments * review: rename, add asserts * server : adjust checkpoint logic (#11) * server : adjust checkpoint logic * cont : rm asserts * server-context: fix early exit * spec : fix compatibility with n-gram and add TODOs (#13) * metal : cleanup * llama : fix faulty bitwise check in recurrent memory * server : disable RS-based MTP in combination with other spec types * spec : add TODOs * cont : fix comment * cont : update comment * common : fix logic for ngram + mtp compat * llama-memory: enable checkpointing with partial rollback * cont: add test-case for loading into a dirty ctx * llama-memory-recurrent: clear rs_idx in clear * download: fix mtp path * llama-arch: fix enorm op * docs: update docs * conversion: fix type annotations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

marksverdhei and others added 8 commits March 6, 2026 07:45

docs: add ht-fork documentation, branding, and discussion links

8f2b631

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

convert: support LoRA conversion for MLA kv_b_proj

bea79e9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ci: add fork sync automation

213f1bb

feat: add --remap-developer-role flag to translate developer→system

f07b116

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ci: add ht branch to flake8 lint workflow triggers

8afb015

feat: welcome agentic contributions, remove upstream AI restrictions (#6

796f1b5

)

add code

2b3453c

marksverdhei marked this pull request as draft March 6, 2026 10:38

marksverdhei force-pushed the ht branch 5 times, most recently from b4243b4 to 9167c57 Compare March 11, 2026 08:33

marksverdhei force-pushed the ht branch 5 times, most recently from 5772904 to 22e54b4 Compare March 31, 2026 11:50

marksverdhei force-pushed the ht branch 2 times, most recently from cb142b0 to f40775c Compare April 7, 2026 12:35

marksverdhei force-pushed the ht branch from 6846da3 to 139f68e Compare April 12, 2026 09:32

marksverdhei closed this Apr 27, 2026

marksverdhei mentioned this pull request May 13, 2026

Runbook: rebase ht onto upstream/master (parked 2026-05-12) #38

Closed

marksverdhei mentioned this pull request Jun 6, 2026

branch hygiene: 12 stale remote branches (3 merged-undeleted, 6 closed-rejected, 3 PR-less) #90

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gemini API compatibility#7

Gemini API compatibility#7
marksverdhei wants to merge 8 commits into
htfrom
feat/gemini-api-compatibility

marksverdhei commented Mar 6, 2026

Uh oh!

marksverdhei commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

marksverdhei commented Mar 6, 2026

Uh oh!

marksverdhei commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant