server: /v1/responses (partial) by openingnow · Pull Request #18486 · ggml-org/llama.cpp

openingnow · 2025-12-30T10:10:56Z

previous PR: #18227

Conversations need to be resolved:

This PR implements:

Converting Responses requests to Chat Completions requests, while preserving reasoning_content.
Emitting Responses SSEs.

Current caveats:

~~If there are consecutive function calls, the response.output_item.added event will not be emitted for the latter ones.~~
All response.output_item.done events are emitted at the end. For example, when there is both reasoning and a function call, response.function_call_arguments.delta is emitted before the response.output_item.done of the reasoning item.

The issue arises from server_task_result_cmpl_partial::update and does not seem to be a problem for codex-cli, since it does not check the order of events.

Here's a visualization of handling reasoning texts. Let's start with codex 'Explain this repo in one sentence'

# config.toml
# match model name github.com/openai/codex/blob/0a568a/codex-rs/core/src/models_manager/model_info.rs#L116
model = "gpt-oss_local_gguf"
model_provider = "llama_cpp"

[model_providers.llama_cpp]
name = "llama_cpp API"
base_url = "http://127.0.0.1:8080/v1"
wire_api = "responses"
stream_idle_timeout_ms = 10000000

First request (Responses format):

{"model":"gpt-oss_local_gguf","instructions":"You are a ...","input":[
    {"type":"message","role":"developer","content":[{"type":"input_text","text":"<permissions instructions>..."}]},
    {"type":"message","role":"user","content":[{"type":"input_text","text":"# AGENTS.md instructions for ..."}]},
    {"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>..."}]},
    {"type":"message","role":"user","content":[{"type":"input_text","text":"Explain this repo in one sentence"}]}
],"tools":["..."],"tool_choice":"auto","parallel_tool_calls":false,"reasoning":null,"store":false,"stream":true,"include":[],"prompt_cache_key":"019bd4e7-1366-7571-93a6-f1b960ee1c59"}

convert_responses_to_chatcmpl converts this into the Chat Completions format:

{"model":"gpt-oss_local_gguf","tool_choice":"auto","parallel_tool_calls":false,"reasoning":null,"store":false,"stream":true,"include":[],"prompt_cache_key":"019bd4e7-1366-7571-93a6-f1b960ee1c59","messages":[
    {"role":"system","content":"You are a ..."},
    {"role":"developer","content":[{"text":"<permissions instructions>...","type":"text"}]},
    {"role":"user","content":[{"text":"# AGENTS.md instructions for ...","type":"text"}]},
    {"role":"user","content":[{"text":"<environment_context>...","type":"text"}]},
    {"role":"user","content":[{"text":"Explain this repo in one sentence","type":"text"}]}
],"tools":["..."]}
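A minimal Python sketch of the message-only part of this conversion, for illustration. The function name `convert_message_items` is hypothetical and this is not the code of `convert_responses_to_chatcmpl`; it only assumes what the two payloads above show: `instructions` becomes the first `system` message, `input_text` parts become `text` parts, and the remaining top-level fields are carried over.

```python
import copy

def convert_message_items(body: dict) -> dict:
    """Simplified, hypothetical analogue of the Responses -> Chat Completions conversion."""
    # Carry over every top-level field except the two that are restructured.
    out = {k: copy.deepcopy(v) for k, v in body.items()
           if k not in ("instructions", "input")}
    messages = []
    # "instructions" becomes the first (system) message.
    if isinstance(body.get("instructions"), str):
        messages.append({"role": "system", "content": body["instructions"]})
    for item in body.get("input", []):
        if item.get("type") != "message":
            raise ValueError(f"unsupported input item type: {item.get('type')}")
        parts = []
        for part in item["content"]:
            if part.get("type") != "input_text":
                raise ValueError(f"unsupported content type: {part.get('type')}")
            parts.append({"text": part["text"], "type": "text"})
        messages.append({"role": item["role"], "content": parts})
    out["messages"] = messages
    return out

request = {
    "model": "gpt-oss_local_gguf",
    "instructions": "You are a ...",
    "input": [{"type": "message", "role": "user",
               "content": [{"type": "input_text",
                            "text": "Explain this repo in one sentence"}]}],
    "stream": True,
}
converted = convert_message_items(request)
```

Reasoning and tool-call items are deliberately rejected here; they need the merging shown in the second request.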

generated prompt:

<|start|>system<|message|>You are ChatGPT, ...<|end|>
<|start|>developer<|message|># Instructions\n\nYou are a ...\n\n# Tools...<|end|>
<|start|>user<|message|># AGENTS.md instructions for ...<|end|>
<|start|>user<|message|><environment_context>...<|end|>
<|start|>user<|message|>Explain this repo in one sentence<|end|>
<|start|>assistant

With this prompt, the model produces a reasoning text ("We need to explain repo ...") and a function call (ls -R).

    {"type":"response.created","response":{"id":"resp_WWZZqZyHtSnMteAaYg1oCkKtxeMk1mPO","object":"response","status":"in_progress"}}
    {"type":"response.in_progress","response":{"id":"resp_WWZZqZyHtSnMteAaYg1oCkKtxeMk1mPO","object":"response","status":"in_progress"}}
    {"type":"response.output_item.added","item":{"id":"rs_HL7QCzI9EEpldiojvIeiYPb6AOe3EnUi","summary":[],"type":"reasoning","content":[],"encrypted_content":"","status":"in_progress"}}
    {"type":"response.reasoning_text.delta","delta":"We","item_id":"rs_HL7QCzI9EEpldiojvIeiYPb6AOe3EnUi"}
    deltas ...
    {"type":"response.output_item.added","item":{"arguments":"","call_id":"fc_wxzvZd6LrQJetz7V9ZxjjmAFObzRzzg0","name":"shell","type":"function_call","status":"in_progress"}}
    {"type":"response.function_call_arguments.delta","delta":"{\"","item_id":"fc_wxzvZd6LrQJetz7V9ZxjjmAFObzRzzg0"}
    deltas ...
--> {"type":"response.output_item.done","item":{"id":"rs_HL7QCzI9EEpldiojvIeiYPb6AOe3EnUi","summary":[],"type":"reasoning","content":[{"text":"We need to explain repo in one sentence. Let's inspect repo.","type":"reasoning_text"}],"encrypted_content":""}}
    {"type":"response.output_item.done","item":{"type":"function_call","status":"completed","arguments":"{\"command\":[\"bash\",\"-lc\",\"ls -R\"],\"workdir\":\"./foobar\"}","call_id":"fc_wxzvZd6LrQJetz7V9ZxjjmAFObzRzzg0","name":"shell"}}
    {"type": "response.completed", ...}
(The arrow shows delayed `response.output_item.done` of reasoning)
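To illustrate why the delayed `response.output_item.done` is harmless for a client that does not check event order, here is a small Python sketch. It is an assumption about how an order-insensitive consumer might work, not codex-cli's actual logic: deltas are grouped by `item_id`, so the position of the `done` events in the stream does not affect the result. The short IDs are made up for the example.

```python
from collections import defaultdict

def collect_stream(events):
    """Group streamed text by item_id; where the 'done' events land does not matter."""
    deltas = defaultdict(str)
    done_items = []
    for ev in events:
        kind = ev["type"]
        if kind in ("response.reasoning_text.delta",
                    "response.function_call_arguments.delta"):
            deltas[ev["item_id"]] += ev["delta"]
        elif kind == "response.output_item.done":
            done_items.append(ev["item"])
    return dict(deltas), done_items

# Same ordering as the trace above: the reasoning item is closed only
# after the function-call argument deltas have been streamed.
events = [
    {"type": "response.reasoning_text.delta", "delta": "We", "item_id": "rs_1"},
    {"type": "response.reasoning_text.delta", "delta": " need", "item_id": "rs_1"},
    {"type": "response.function_call_arguments.delta", "delta": "{\"", "item_id": "fc_1"},
    {"type": "response.function_call_arguments.delta", "delta": "command\":[]}", "item_id": "fc_1"},
    {"type": "response.output_item.done", "item": {"id": "rs_1", "type": "reasoning"}},
    {"type": "response.output_item.done", "item": {"type": "function_call", "call_id": "fc_1"}},
]
deltas, done_items = collect_stream(events)
```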

Codex sends a new request after executing ls -R.

Second request (Responses format):

 {"model":"gpt-oss_local_gguf","instructions":"You are a ...","input":[
     {"type":"message","role":"developer","content":[{"type":"input_text","text":"<permissions instructions>..."}]},
     {"type":"message","role":"user","content":[{"type":"input_text","text":"# AGENTS.md instructions for ..."}]},
     {"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>..."}]},
     {"type":"message","role":"user","content":[{"type":"input_text","text":"Explain this repo in one sentence"}]},
+    {"type":"reasoning","summary":[],"content":[{"type":"reasoning_text","text":"We need to explain repo in one sentence. Let's inspect repo."}],"encrypted_content":""},
+    {"type":"function_call","name":"shell","arguments":"{\"command\":[\"bash\",\"-lc\",\"ls -R\"],\"workdir\":\"./foobar\"}","call_id":"fc_wxzvZd6LrQJetz7V9ZxjjmAFObzRzzg0"},
+    {"type":"function_call_output","call_id":"fc_wxzvZd6LrQJetz7V9ZxjjmAFObzRzzg0","output":"{\"output\":\".:\\nfoo.cpp\\n\",\"metadata\":{\"exit_code\":0,\"duration_seconds\":0.0}}"}
 ],"tools":["..."],"tool_choice":"auto","parallel_tool_calls":false,"reasoning":null,"store":false,"stream":true,"include":[],"prompt_cache_key":"019bd4e7-1366-7571-93a6-f1b960ee1c59"}

Converted request (Chat Completions format):

 {"model":"gpt-oss_local_gguf","tool_choice":"auto","parallel_tool_calls":false,"reasoning":null,"store":false,"stream":true,"include":[],"prompt_cache_key":"019bd4e7-1366-7571-93a6-f1b960ee1c59","messages":[
     {"role":"system","content":"You are a ..."},
     {"role":"developer","content":[{"text":"<permissions instructions>...","type":"text"}]},
     {"role":"user","content":[{"text":"# AGENTS.md instructions for ...","type":"text"}]},
     {"role":"user","content":[{"text":"<environment_context>...","type":"text"}]},
     {"role":"user","content":[{"text":"Explain this repo in one sentence","type":"text"}]},
+    {"role":"assistant","tool_calls":[{"function":{"arguments":"{\"command\":[\"bash\",\"-lc\",\"ls -R\"],\"workdir\":\"./foobar\"}","name":"shell"},"id":"fc_wxzvZd6LrQJetz7V9ZxjjmAFObzRzzg0","type":"function"}],"reasoning_content":"We need to explain repo in one sentence. Let's inspect repo."},
+    {"role":"tool","content":"{\"output\":\".:\\nfoo.cpp\\n\",\"metadata\":{\"exit_code\":0,\"duration_seconds\":0.0}}","tool_call_id":"fc_wxzvZd6LrQJetz7V9ZxjjmAFObzRzzg0"}
 ],"tools":["..."]}
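The three added items map onto two chat messages. A hedged Python sketch of that merge follows; `convert_tool_items` is a hypothetical name, and the real converter may handle more cases (for example several consecutive function calls). It only reproduces what the diff above shows: the reasoning text is attached as `reasoning_content` to the assistant message that carries the tool call, and the output becomes a `tool` message.

```python
def convert_tool_items(items):
    """Hypothetical sketch: fold reasoning/function_call/function_call_output into chat messages."""
    messages = []
    pending_reasoning = None  # reasoning text waiting for the next function_call
    for item in items:
        kind = item["type"]
        if kind == "reasoning":
            pending_reasoning = "".join(
                part["text"] for part in item.get("content", [])
                if part.get("type") == "reasoning_text")
        elif kind == "function_call":
            msg = {"role": "assistant",
                   "tool_calls": [{"function": {"arguments": item["arguments"],
                                                "name": item["name"]},
                                   "id": item["call_id"],
                                   "type": "function"}]}
            if pending_reasoning:
                msg["reasoning_content"] = pending_reasoning
                pending_reasoning = None
            messages.append(msg)
        elif kind == "function_call_output":
            messages.append({"role": "tool",
                             "content": item["output"],
                             "tool_call_id": item["call_id"]})
        else:
            raise ValueError(f"unsupported item type: {kind}")
    return messages

items = [
    {"type": "reasoning", "summary": [], "encrypted_content": "",
     "content": [{"type": "reasoning_text", "text": "Let's inspect repo."}]},
    {"type": "function_call", "name": "shell",
     "arguments": "{\"command\":[\"ls\"]}", "call_id": "fc_1"},
    {"type": "function_call_output", "call_id": "fc_1", "output": "foo.cpp"},
]
chat_messages = convert_tool_items(items)
```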

generated prompt:

 <|start|>system<|message|>You are ChatGPT, ...<|end|>
 <|start|>developer<|message|># Instructions\n\nYou are a ...\n\n# Tools\n\n...<|end|>
 <|start|>user<|message|># AGENTS.md instructions for ...<|end|>
 <|start|>user<|message|><environment_context>...<|end|>
 <|start|>user<|message|>Explain this repo in one sentence<|end|>
-<|start|>assistant
+<|start|>assistant<|channel|>analysis<|message|>We need to explain repo in one sentence. Let's inspect repo.<|end|>
+<|start|>assistant to=functions.shell<|channel|>commentary json<|message|>"{\"command\":[\"bash\",\"-lc\",\"ls -R\"],\"workdir\":\"./foobar\"}"<|call|>
+<|start|>functions.shell to=assistant<|channel|>commentary<|message|>"{\"output\":\".:\\nfoo.cpp\\n\",\"metadata\":{\"exit_code\":0,\"duration_seconds\":0.0}}"<|end|>
+<|start|>assistant

This generates another function call (sed -n '1,200p' foo.cpp), and the next (third) request is built in a similar way.

 <|start|>system<|message|>You are ChatGPT, ...<|end|>
 <|start|>developer<|message|># Instructions\n\nYou are a ...\n\n# Tools\n\n...<|end|>
 <|start|>user<|message|># AGENTS.md instructions for ...<|end|>
 <|start|>user<|message|><environment_context>...<|end|>
 <|start|>user<|message|>Explain this repo in one sentence<|end|>
 <|start|>assistant<|channel|>analysis<|message|>We need to explain repo in one sentence. Let's inspect repo.<|end|>
 <|start|>assistant to=functions.shell<|channel|>commentary json<|message|>"{\"command\":[\"bash\",\"-lc\",\"ls -R\"],\"workdir\":\"./foobar\"}"<|call|>
 <|start|>functions.shell to=assistant<|channel|>commentary<|message|>"{\"output\":\".:\\nfoo.cpp\\n\",\"metadata\":{\"exit_code\":0,\"duration_seconds\":0.0}}"<|end|>
-<|start|>assistant
+<|start|>assistant<|channel|>analysis<|message|>Only one file foo.cpp. Open it.<|end|>
+<|start|>assistant to=functions.shell<|channel|>commentary json<|message|>"{\"command\":[\"bash\",\"-lc\",\"sed -n '1,200p' foo.cpp\"],\"workdir\":\"./foobar\"}"<|call|>
+<|start|>functions.shell to=assistant<|channel|>commentary<|message|>"{\"output\":\"#include <iostream>\\n\\nint main() {\\n    std::cout << \\\"Hello!\\\" << std::endl;\\n}\\n\",\"metadata\":{\"exit_code\":0,\"duration_seconds\":0.0}}"<|end|>
+<|start|>assistant

This generates:

reasoning_text: "Repo contains single C++ hello world program. Provide one sentence."
output_text: "A single C++ file that prints “Hello!” to the console."

Next, the user types "Compile and run in one line" and the fourth request is sent. Reasoning contents from the previous turn are excluded (by the chat template).

 <|start|>system<|message|>You are ChatGPT, ...<|end|>
 <|start|>developer<|message|># Instructions\n\nYou are a ...\n\n# Tools\n\n...<|end|>
 <|start|>user<|message|># AGENTS.md instructions for ...<|end|>
 <|start|>user<|message|><environment_context>...<|end|>
 <|start|>user<|message|>Explain this repo in one sentence<|end|>
-<|start|>assistant<|channel|>analysis<|message|>We need to explain repo in one sentence. Let's inspect repo.<|end|>
 <|start|>assistant to=functions.shell<|channel|>commentary json<|message|>"{\"command\":[\"bash\",\"-lc\",\"ls -R\"],\"workdir\":\"./foobar\"}"<|call|>
 <|start|>functions.shell to=assistant<|channel|>commentary<|message|>"{\"output\":\".:\\nfoo.cpp\\n\",\"metadata\":{\"exit_code\":0,\"duration_seconds\":0.0}}"<|end|>
-<|start|>assistant<|channel|>analysis<|message|>Only one file foo.cpp. Open it.<|end|>
 <|start|>assistant to=functions.shell<|channel|>commentary json<|message|>"{\"command\":[\"bash\",\"-lc\",\"sed -n '1,200p' foo.cpp\"],\"workdir\":\"./foobar\"}"<|call|>
 <|start|>functions.shell to=assistant<|channel|>commentary<|message|>"{\"output\":\"#include <iostream>\\n\\nint main() {\\n    std::cout << \\\"Hello!\\\" << std::endl;\\n}\\n\",\"metadata\":{\"exit_code\":0,\"duration_seconds\":0.0}}"<|end|>
-<|start|>assistant
+<|start|>assistant<|channel|>final<|message|>A single C++ file that prints “Hello!” to the console.<|end|>
+<|start|>user<|message|>Compile and run in one line<|end|>
+<|start|>assistant
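The dropping of earlier reasoning is done by the chat template, not by the server code. As a rough Python approximation of the effect seen in the prompts above (an assumption, not the template's actual rule), reasoning is kept only on assistant messages that come after the last user message:

```python
def drop_stale_reasoning(messages):
    """Approximation: strip reasoning_content from messages before the last user message."""
    last_user = max((i for i, m in enumerate(messages) if m["role"] == "user"),
                    default=-1)
    result = []
    for i, msg in enumerate(messages):
        if i < last_user and "reasoning_content" in msg:
            # Copy without the reasoning so the caller's message is not mutated.
            msg = {k: v for k, v in msg.items() if k != "reasoning_content"}
        result.append(msg)
    return result

third_request = [
    {"role": "user", "content": "Explain this repo in one sentence"},
    {"role": "assistant", "tool_calls": [], "reasoning_content": "Let's inspect repo."},
    {"role": "tool", "content": "foo.cpp", "tool_call_id": "fc_1"},
]
fourth_request = third_request + [
    {"role": "assistant", "content": "A single C++ file that prints Hello!"},
    {"role": "user", "content": "Compile and run in one line"},
]
kept = drop_stale_reasoning(third_request)
dropped = drop_stale_reasoning(fourth_request)
```

In the third request the reasoning survives because the turn is still in progress; in the fourth it is gone because a newer user message follows.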

And so on.

aldehir · 2026-01-02T11:37:38Z

Now converting tools.

The gpt-oss models require feeding the reasoning from prior assistant tool calls. In the common library, this is handled via the reasoning_content field in the message. Is this something that can be handled by the stateless responses API?

ngxson · 2026-01-02T13:08:41Z

Is this something that can be handled by the stateless responses API?

Just note that we do support state tracking for streamed API responses, as documented in the server dev docs.

openingnow · 2026-01-04T15:06:29Z

OpenAI models do not provide raw reasoning text. (model spec)

Hidden chain-of-thought message: some of OpenAI’s models can generate a hidden chain-of-thought message to reason through a problem before generating a final answer. This chain of thought is used to guide the model’s behavior, but is not exposed to the user or developer except potentially in summarized form.

As Aldehir mentioned, if an LLM request includes tool call output, reasoning contents should also be included. (docs/function-calling)

for reasoning models like GPT-5 or o4-mini, any reasoning items returned in model responses with tool calls must also be passed back with tool call outputs.

I see 2 problems:

While there is a field for providing previous reasoning contents, it is for summary and not for raw reasoning text.
Chat Completions API does not have a field for providing reasoning text. This could be a difference between /chat/completions and /responses.

I hope inspecting codex-cli will help.

coder543 · 2026-01-04T17:14:25Z

@openingnow I think the question was about gpt-oss, not the closed-source OpenAI models. That said, the closed-source OpenAI models do provide full reasoning traces that you hand back to the server with each request so that the reasoning can persist between tool calls; they are just encrypted so that no one outside of OpenAI can read them. They are not summaries, although unencrypted summaries are also available for user-facing usage.

aldehir · 2026-01-04T17:35:21Z

@openingnow

See https://cookbook.openai.com/articles/gpt-oss/handle-raw-cot. Indeed, I am only concerned about gpt-oss.

llama.cpp sends reasoning traces to the client via reasoning_content and accepts them back in the same field, adjusting them as needed for the template:

llama.cpp/common/chat.cpp

Lines 1944 to 1957 in cef1d23

    
    // Copy reasoning to the "thinking" field as expected by the gpt-oss template
    auto adjusted_messages = json::array();
    for (const auto & msg : inputs.messages) {
        auto has_reasoning_content = msg.contains("reasoning_content") && msg.at("reasoning_content").is_string();
        auto has_tool_calls = msg.contains("tool_calls") && msg.at("tool_calls").is_array();
        if (has_reasoning_content && has_tool_calls) {
            auto adjusted_message = msg;
            adjusted_message["thinking"] = msg.at("reasoning_content");
            adjusted_messages.push_back(adjusted_message);
        } else {
            adjusted_messages.push_back(msg);
        }
    }
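For readers who prefer Python, the same adjustment can be sketched as below. This is a transliteration for illustration only (the function name is made up); the C++ above is the actual llama.cpp code.

```python
def copy_reasoning_to_thinking(messages):
    """Mirror reasoning_content into "thinking" for assistant messages that carry tool calls."""
    adjusted = []
    for msg in messages:
        has_reasoning = isinstance(msg.get("reasoning_content"), str)
        has_tool_calls = isinstance(msg.get("tool_calls"), list)
        if has_reasoning and has_tool_calls:
            msg = dict(msg)  # shallow copy so the input list is left untouched
            msg["thinking"] = msg["reasoning_content"]
        adjusted.append(msg)
    return adjusted

sample = [
    {"role": "assistant", "tool_calls": [], "reasoning_content": "Open foo.cpp."},
    {"role": "assistant", "content": "Done.", "reasoning_content": "No tools here."},
]
adjusted = copy_reasoning_to_thinking(sample)
```

Only the first message gains a `thinking` field, since the second has no `tool_calls`.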

This deviates from the recommended approach, which uses reasoning for the Chat Completions API, but it's what llama.cpp and vLLM have settled on.

My hope is to align with OpenAI's recommendation for the Responses API.

This doesn't have to be tackled in this PR, which is basic support. I only bring it up for awareness, as it is a highly desired feature.

openingnow · 2026-01-05T03:59:24Z

As this API is for OpenAI compatibility and aims to be a drop-in replacement, shouldn't the behavior match that of the closed-source models?

coder543 · 2026-01-05T06:06:53Z

Encryption of the reasoning traces is not a compatibility concern, so… I don’t see any reason to encrypt them, if that’s what you’re asking? Otherwise, I’m not sure what you’re asking.

It sounds like the desire is to have compatibility, which I agree with entirely.

Given how buggy codex is with the Chat API provided by llama-server today, I would definitely like it to have proper Responses API support soon, which would include passing back reasoning traces.

openingnow · 2026-01-05T12:37:35Z

My concern is which field codex-cli uses to deliver the reasoning contents.

For reasoning input and output, we have summary, encrypted_content, and content. The problem is that proprietary models will use the first two fields, while oss models will only use the last one. Therefore, applications built for the proprietary models will look at summary or encrypted_content and append it to subsequent requests. They will not look at the plain content field, since proprietary models will not fill it.

Supporting codex-cli with reasoning + tool calling would be a great test for this PR, and here is my plan.

Figure out which field codex-cli uses
On output, write the reasoning to that field
On input, read the field and move it into message["thinking"] (or another appropriate place)
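A small Python sketch of the output and input steps of this plan, assuming the field turns out to be `content` with `reasoning_text` parts (the shape that appears in the streamed `response.output_item.done` event in the PR description). The helper names are hypothetical.

```python
def make_reasoning_item(item_id, text):
    """Output side: put the raw reasoning into `content`; leave the other fields empty."""
    return {
        "id": item_id,
        "type": "reasoning",
        "summary": [],
        "content": [{"type": "reasoning_text", "text": text}],
        "encrypted_content": "",
    }

def reasoning_text_of(item):
    """Input side: recover the raw reasoning text from a replayed reasoning item."""
    return "".join(part.get("text", "")
                   for part in item.get("content") or []
                   if part.get("type") == "reasoning_text")

item = make_reasoning_item("rs_example", "Only one file foo.cpp. Open it.")
round_tripped = reasoning_text_of(item)
```

If a client replays the item unchanged, the text read back on input is exactly what was emitted on output.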

@coder543 Does this make sense? And can you provide an edge case or dump where codex-cli with llama-server fails?

coder543 · 2026-01-05T14:22:45Z

What you wrote makes sense, but I think it's not something we should have to worry so much about.

For the Responses API path, Codex CLI just replays the ResponseItem items it received; it doesn’t pick any specific fields. The schema explicitly supports summary, content (raw reasoning text), and encrypted_content in the same item. See ResponseItem::Reasoning and ReasoningItemContent.

On the Responses request path, the prompt.input array (which includes prior ResponseItems) is passed through unchanged into the request body:

stream_prompt builds request from prompt.input (https://github.com/openai/codex/blob/0b53aed2d05fb96e4df634900528c3d8283af34e/codex-rs/codex-api/src/endpoint/responses.rs#L80-L91)
ResponsesRequestBuilder uses input verbatim (https://github.com/openai/codex/blob/0b53aed2d05fb96e4df634900528c3d8283af34e/codex-rs/codex-api/src/requests/responses.rs#L97-L126)

Codex CLI explicitly asks for encrypted reasoning only when the model is known to support reasoning summaries: ["reasoning.encrypted_content"]

Codex CLI seems to fully support GPT-OSS, which makes sense because OpenAI defined the spec for it. We don't have to fake the encrypted_content field.

If we wanted to offer a CLI option to move the reasoning text into the encrypted_content field as a hack to support a poorly written client, we could do that, but I don't think it makes sense as the default.

Or could we pass the reasoning text back in the encrypted_content field when the client asks for encrypted_content? Asking is required for a client to get encrypted_content from OpenAI's main API, but Codex CLI will not do so for GPT-OSS, at least not by default, since it recognizes that GPT-OSS implementations don't usually support encrypted_content.

This is my best understanding of the situation from poking around the code.

openingnow · 2026-01-19T10:31:51Z

I edited the main text, as the explanation is too long to fit in a comment.
@ngxson Is it acceptable to make events from server_task_result_cmpl_partial::update()?

ngxson · 2026-01-19T10:51:34Z

I don't get what you mean. We're doing functional programming here, and it's unclear from your question which is the state and which is the derived state.

openingnow · 2026-01-19T11:28:06Z

The primary state is the set of variables involved in state.update_chat_msg(content, true, oaicompat_msg_diffs);, namely oaicompat_msg_diffs and task_result_state & state (excluding task_result_state::openai_responses_item_ids).
The derived state includes openai_responses_item_ids (which is modified in update()) and openai_responses_current_events (the SSEs generated from the current diff chunk).
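To make the primary/derived split concrete, here is a hedged Python sketch. Everything in it is illustrative: the class name, the diff dictionary shape, and the counter-based IDs are assumptions, not the server's data structures. The diff is the primary input; the remembered item IDs and the returned events are derived from it.

```python
import itertools

class ResponsesEventState:
    """Derived state only: item IDs remembered across diffs so events stay consistent."""

    def __init__(self):
        self._counter = itertools.count(1)  # stand-in for random ID generation
        self.reasoning_id = None
        self.fc_id = None

    def update(self, diff):
        """Turn one diff chunk into the SSE payloads for that chunk."""
        events = []
        text = diff.get("reasoning_content_delta", "")
        if text:
            if self.reasoning_id is None:
                # First reasoning chunk: announce the item exactly once.
                self.reasoning_id = f"rs_{next(self._counter)}"
                events.append({"type": "response.output_item.added",
                               "item": {"id": self.reasoning_id, "type": "reasoning",
                                        "status": "in_progress"}})
            events.append({"type": "response.reasoning_text.delta",
                           "delta": text, "item_id": self.reasoning_id})
        call = diff.get("tool_call_delta")
        if call:
            if call.get("id"):
                # A new tool call starts: remember its ID for later argument deltas.
                self.fc_id = call["id"]
                events.append({"type": "response.output_item.added",
                               "item": {"type": "function_call",
                                        "call_id": "fc_" + self.fc_id,
                                        "name": call.get("name", ""),
                                        "arguments": "", "status": "in_progress"}})
            if call.get("arguments") and self.fc_id is not None:
                events.append({"type": "response.function_call_arguments.delta",
                               "delta": call["arguments"],
                               "item_id": "fc_" + self.fc_id})
        return events

state = ResponsesEventState()
first = state.update({"reasoning_content_delta": "We"})
second = state.update({"reasoning_content_delta": " need"})
third = state.update({"tool_call_delta": {"id": "abc", "name": "shell", "arguments": ""}})
fourth = state.update({"tool_call_delta": {"arguments": "{\""}})
```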


openingnow · 2026-01-21T02:58:17Z

Rebased to resolve conflict around task_result_state(const common_chat_parser_params & chat_parser_params)

ngxson · 2026-01-21T09:36:57Z

tools/server/server-common.cpp

+                                {"file_data", input_item.at("file_data")},
+                                {"filename",  input_item.at("filename")},
+                            }},
+                            {"type", "file"},


I don't think we support this type yet. It should probably be converted into a text chunk (please verify), or maybe we just reject this type for now

I think it should be rejected unless file is supported from chat completions.
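A sketch of what rejecting the type could look like, in Python for illustration. The function name and error text are hypothetical; the `input_image` branch assumes the usual Responses shape (`image_url` as a string) and Chat Completions shape (`image_url` as an object with `url`).

```python
def convert_content_part(part):
    """Hypothetical converter for one Responses content part; rejects input_file."""
    kind = part.get("type")
    if kind == "input_text":
        return {"type": "text", "text": part["text"]}
    if kind == "input_image":
        return {"type": "image_url", "image_url": {"url": part["image_url"]}}
    if kind == "input_file":
        # Not representable in Chat Completions here, so fail loudly instead of guessing.
        raise ValueError("'input_file' is not supported by the Chat Completions path")
    raise ValueError(f"unknown content type: {kind!r}")

text_part = convert_content_part({"type": "input_text", "text": "hi"})
try:
    convert_content_part({"type": "input_file", "file_data": "...", "filename": "a.txt"})
    file_rejected = False
except ValueError:
    file_rejected = True
```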

ngxson · 2026-01-21T09:37:55Z

tools/server/server-task.cpp

+            {"type",      "function_call"},
+            {"status",    "completed"},
+            {"arguments", tool_call.arguments},
+            {"call_id",   "fc_" + tool_call.id},


do we expect to use oai_resp_fc_id here?

No, since oai_resp_fc_id is for keeping function call's id while generating args, it only exists in task_result_state and server_task_result_cmpl_partial and not in server_task_result_cmpl_final.

ngxson · 2026-01-21T09:40:24Z

tools/server/server-task.cpp

+                {"data", json {
+                    {"type",    "response.function_call_arguments.delta"},
+                    {"delta",   diff.tool_call_delta.arguments},
+                    {"item_id", "fc_" + oai_resp_fc_id},


it's unclear to me, does oai_resp_fc_id value already include fc_ prefix inside it?

No, it does not have the "fc_" prefix. It is copied from diff.tool_call_delta.id without any prefix.
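Since the raw ID carries no prefix on the way out but comes back with one when the client replays the call, an idempotent helper avoids double-prefixing. This Python sketch is an assumption about one way to do it (both function names are made up), not the PR's code.

```python
def with_fc_prefix(call_id: str) -> str:
    """Add the "fc_" prefix only when it is not already present."""
    return call_id if call_id.startswith("fc_") else "fc_" + call_id

def arguments_delta_event(raw_id: str, delta: str) -> dict:
    """Build the SSE payload for one chunk of function-call arguments."""
    return {
        "type": "response.function_call_arguments.delta",
        "delta": delta,
        "item_id": with_fc_prefix(raw_id),
    }

event = arguments_delta_event("wxzvZd6L", "{\"")
replayed = with_fc_prefix(event["item_id"])
```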


* from previous PR
* Make instruction(system) as first message
* Convert [input_message] (text/image/file)
* Rename convert_responses_to_chatcmpl(body) -> response_body
* Initial tool call support
* Erase instructions field from chatcmpl body
* Feed reasoning texts to chat template
* Use std::vector instead of opaque json array
* Make output_item.added events consistent
* Move `server_task_result_cmpl_partial::update` from header to source
* Match ID of output_item.added and .done events
* Add function_call only if there is no "fc_" prefix
* Add function call output at non-streaming API
* Test if ID is persistent
* Add doc
* Fix style - use trailing comma
* Rewrite state management
* catch up with upstream/master
* Fix style - "type" is the first item of SSE data
* Explicitly check "instructions" from response_body
* Make lambdas static
* Check if reasoning content exists
* Add `oai_resp_id` to task_result_state(also initialized at ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final
* Reject `input_file` since it is not supported by chatcmpl
* Add "fc_" prefix to non-straming function call id as coderabbit pointed out

---------

Co-authored-by: openingnow <>

openingnow requested review from CISC, ggerganov and ngxson as code owners December 30, 2025 10:10

github-actions bot added examples python python script changes server labels Dec 30, 2025

openingnow force-pushed the v1_responses branch from 293b94e to 5dcc7fa Compare December 30, 2025 10:15

loci-dev mentioned this pull request Dec 30, 2025

UPSTREAM PR #18486: server: /v1/responses (partial) auroralabs-loci/llama.cpp#759

Open

2 tasks

openingnow force-pushed the v1_responses branch from 5dcc7fa to 9f09745 Compare January 1, 2026 14:05

openingnow force-pushed the v1_responses branch 2 times, most recently from fda1d43 to fdb26fb Compare January 19, 2026 10:00

ngxson reviewed Jan 19, 2026

openingnow force-pushed the v1_responses branch 2 times, most recently from 34c54c2 to e8061a2 Compare January 20, 2026 07:25

openingnow added 13 commits January 20, 2026 23:54

Match ID of output_item.added and .done events

d9dca02

Add function_call only if there is no "fc_" prefix

cd9b4cf

Add function call output at non-streaming API

6c200df

Test if ID is persistent

63c6013

Add doc

f232a1b

Fix style - use trailing comma

8a2dd2d

Rewrite state management

42a6eb3

catch up with upstream/master

5e1f65c

Fix style - "type" is the first item of SSE data

951fe42

Explicitly check "instructions" from response_body

ebb6438

Make lambdas static

cf83e1a

Check if reasoning content exists

0d5e3de

Add oai_resp_id to task_result_state(also initialized at ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final

5ac23d2

openingnow force-pushed the v1_responses branch from cd11168 to 5ac23d2 Compare January 20, 2026 23:59

ngxson mentioned this pull request Jan 21, 2026

[Mirror] server: /v1/responses (partial) ngxson/llama.cpp#85

Open

ngxson reviewed Jan 21, 2026

openingnow added 2 commits January 21, 2026 11:56

Reject input_file since it is not supported by chatcmpl

da3ed76

Add "fc_" prefix to non-straming function call id as coderabbit pointed out

96995a6

ngxson approved these changes Jan 21, 2026

ngxson merged commit fbbf3ad into ggml-org:master Jan 21, 2026
85 checks passed

openingnow mentioned this pull request Jan 21, 2026

Misc. bug: OpenAI API v1/responses llama-server #14702

Open

openingnow deleted the v1_responses branch January 22, 2026 10:31

RodriMora mentioned this pull request Jan 23, 2026

server: add /v1/responses support ikawrakow/ik_llama.cpp#1184

Merged

github-actions bot mentioned this pull request Jan 24, 2026

Reddit News Daily 2026-01-24 gitlawr/reddit-daily-news#134

Open

jeremyckahn mentioned this pull request Jan 26, 2026

bug: From Codex CLI: Support for the "chat" wire API is deprecated and will soon be removed. Update your model provider definition in config.toml to use wire_api = "responses". janhq/jan#7413

Open

3 tasks

This was referenced Jan 27, 2026

Feature Request: Support OpenAI Responses API (/v1/responses) in llama.cpp server #19138

Open

Eval bug: Responses API (/v1/responses) can`t cancel a stream to stop generation #19173

Open

bfroemel mentioned this pull request Jan 29, 2026

Performance of newer versions of codex with gpt-oss has fallen off a cliff openai/codex#8272

Open

4 participants
