Add Native Gemini API Compatibility to Server
Overview
Currently, llama-server provides an OpenAI-compatible API (/v1/chat/completions) and natively supports Anthropic's message schema via an inline request/response translator built into the server (convert_anthropic_to_oai).
We want to add native support for the Google Gemini API (e.g., /v1beta/models/...:generateContent or similar endpoints) using the exact same request/response translation pattern that was used for the Anthropic implementation.
Reference Implementations
- LiteLLM: See how LiteLLM intercepts Gemini API requests and translates them on-the-fly to OpenAI formats: LiteLLM Gemini Handler
- Anthropic Implementation in
llama.cpp: See tools/server/server-common.cpp (convert_anthropic_to_oai) and tools/server/server-task.cpp (to_json_anthropic) in our existing codebase for reference on how to inline the translation.
Checklist
Add Native Gemini API Compatibility to Server
Overview
Currently,
llama-serverprovides an OpenAI-compatible API (/v1/chat/completions) and natively supports Anthropic's message schema via an inline request/response translator built into the server (convert_anthropic_to_oai).We want to add native support for the Google Gemini API (e.g.,
/v1beta/models/...:generateContentor similar endpoints) using the exact same request/response translation pattern that was used for the Anthropic implementation.Reference Implementations
llama.cpp: Seetools/server/server-common.cpp(convert_anthropic_to_oai) andtools/server/server-task.cpp(to_json_anthropic) in our existing codebase for reference on how to inline the translation.Checklist
convert_gemini_to_oai()intools/server/server-common.cppto map Gemini'scontents,parts, andtextinto OpenAI'smessagesandcontent.TASK_RESPONSE_TYPE_GEMINIintools/server/server-task.h(similar toTASK_RESPONSE_TYPE_ANTHROPIC).to_json_gemini()intools/server/server-task.cppto emit Google-formatted responses (candidates,content,parts).format_gemini_sse()intools/server/server-context.cppfor streaming response support via SSE.handle_completions_impland related logic inserver-context.cppto process Gemini endpoints.ctx_http.post("/v1beta/models/:model:generateContent", ...)) intools/server/server.cpp.tools/server/tests/unit/(e.g.,test_compat_gemini.py) verifying correct request translation and response wrapping.