Fix Ollama 404 error and streaming response parsing #3684
lsby wants to merge 6 commits into microsoft:main from
Conversation
@microsoft-github-policy-service agree

LGTM

Ollama has a built-in OAI-compatible endpoint /v1/chat/completions, which was working well previously. Why not just fix that path issue?

Thanks for pointing that out. I wasn’t aware that Ollama provides a built-in OpenAI-compatible endpoint at /v1/chat/completions. After testing, it works as expected. Given that, updating the endpoint path is indeed a cleaner solution and avoids the need for a custom streaming parser. I’ve revised the commits accordingly to use the compatible endpoint and removed the additional parsing logic. Thanks again for the suggestion.
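The revised approach can be sketched as a small override (the class and method names follow the PR description; the actual signatures in the codebase may differ):

```typescript
// Sketch only: AbstractOpenAICompatibleLMProvider and createOpenAIEndPoint
// are named in the PR description; their real shapes may differ.
abstract class AbstractOpenAICompatibleLMProvider {
  constructor(protected baseUrl: string) {}

  // Default OpenAI-compatible path construction.
  createOpenAIEndPoint(): string {
    return `${this.baseUrl}/chat/completions`;
  }
}

class OllamaLMProvider extends AbstractOpenAICompatibleLMProvider {
  // Ollama serves its OpenAI-compatible API under the /v1 prefix.
  override createOpenAIEndPoint(): string {
    return `${this.baseUrl}/v1/chat/completions`;
  }
}
```

Because Ollama's /v1 endpoint speaks the same protocol as OpenAI's, only the path construction needs to change; the existing request and streaming logic can stay as-is.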
wrenchpilot left a comment:

Tested the update that includes the /v1/chat/completions endpoint. LGTM2.
It's taking a while to approve just 6 lines of code :/

CoPilot needs its time. 😅 🙃

Can someone with write access review this PR, please?

Just one more person, come on humanity!!!

Reopening to try again to merge

Since this is stuck in approvals, I created another PR to merge this - #3858
Pull request was closed
Thanks @sandy081. Can we talk about why this remained broken for a week+ and required back-channel communication? Is there something we can do in the future to better alert folks to critical issues like this?

@sandy081 Can you confirm this is scheduled for release ~February 28, 2026? Is there a faster hotfix pipeline this could potentially be pushed into?

The only agent I'm able to use is stuck behind a paywall. Can this not be released sooner than a week from now?

Summary
This PR fixes a 404 error when calling Ollama models. The root causes include:
(Screenshot of the error omitted; originally linked here.)
Environment
Ollama
mistral-small3.2:latest (supports vision and tool calling)
VSCode
Issue 1: Incorrect API Endpoint
Ollama uses the API path
/api/chat(see official documentation), not/chat/completions.In the current implementation,
OllamaLMProviderextendsAbstractOpenAICompatibleLMProvider, which constructs the request path based on the OpenAI-compatible protocol. As a result, requests were incorrectly sent to/chat/completions, leading to a404error.I overrode
createOpenAIEndPointinOllamaLMProviderto provide the correct path.Issue 2: Incompatible Streaming Response Parsing
Ollama’s streaming responses use JSON Lines format (one JSON object per line).
The current parser does not handle this format, which prevents it from correctly parsing the streamed content.
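For reference, a streamed /api/chat response looks roughly like the following (shape based on Ollama's documented chat API, with fields abbreviated) — each line is a standalone JSON object rather than an SSE `data:` event:

```
{"model":"mistral-small3.2:latest","message":{"role":"assistant","content":"Hel"},"done":false}
{"model":"mistral-small3.2:latest","message":{"role":"assistant","content":"lo"},"done":false}
{"model":"mistral-small3.2:latest","message":{"role":"assistant","content":""},"done":true}
```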
To resolve this, I implemented a dedicated response parser, processOllamaStreamResponse, for Ollama. Since the parser needs to be conditionally applied based on the endpoint type, I introduced a subclass in src/extension/byok/node/ollamaEndpoint to indicate whether the current endpoint targets Ollama.
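Although the final revision of this PR switched to the OpenAI-compatible endpoint instead, the JSON Lines parsing idea can be sketched as follows. This is a minimal illustration, not the actual processOllamaStreamResponse implementation; the chunk shape is an assumption based on Ollama's documented /api/chat format:

```typescript
// Minimal shape of one streamed /api/chat chunk (assumed from Ollama docs).
interface OllamaChatChunk {
  message?: { role: string; content: string };
  done: boolean;
}

// Accumulates raw text chunks and returns the assistant content of each
// complete JSON line; a trailing partial line is buffered until more data
// arrives, since network chunks can split a JSON object mid-line.
class OllamaStreamParser {
  private buffer = "";

  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the incomplete tail, if any
    const out: string[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed) continue;
      const parsed = JSON.parse(trimmed) as OllamaChatChunk;
      if (parsed.message?.content) out.push(parsed.message.content);
    }
    return out;
  }
}
```

The buffering step is the part the default OpenAI-style parser lacks for this format: it assumes SSE `data:` framing, whereas here a newline is the only delimiter between objects.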