passthrough tests update readme
Walkthrough

This PR introduces comprehensive documentation and testing infrastructure for vLLM-MLX integration with Olla. It adds new API reference and backend integration documentation, updates navigation and README files, and implements a new passthrough test script for validating Anthropic Messages API translator functionality across backends.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/content/index.md (1)
100-100: ⚠️ Potential issue | 🟡 Minor

`X-Olla-Backend-Type` example values are missing the two new backends.

The illustrative values list `(ollama/openai/openai-compatible/lm-studio/llamacpp/vllm/sglang/lemonade)` doesn't include `vllm-mlx` or `docker-model-runner`, both of which are introduced in this PR.

📝 Proposed fix
```diff
-| `X-Olla-Backend-Type` | Backend type (ollama/openai/openai-compatible/lm-studio/llamacpp/vllm/sglang/lemonade) |
+| `X-Olla-Backend-Type` | Backend type (ollama/openai/openai-compatible/lm-studio/llamacpp/vllm/vllm-mlx/sglang/lemonade/docker-model-runner) |
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/index.md` at line 100, Update the illustrative example values for the `X-Olla-Backend-Type` header in docs/content/index.md to include the two new backends; modify the value list `(ollama/openai/openai-compatible/lm-studio/llamacpp/vllm/sglang/lemonade)` to also contain `vllm-mlx` and `docker-model-runner` so the example reflects all supported backend types.
🧹 Nitpick comments (6)
test/scripts/passthrough/README.md (1)
64-64: Fenced code block missing language specifier.

The markdownlint MD040 rule flags this. Adding a language identifier (e.g., `text`) silences the warning and helps renderers.

📝 Proposed fix

````diff
-```
+```text
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/scripts/passthrough/README.md` at line 64, The fenced code block in the README (the triple-backtick block at the end of the passthrough README) is missing a language specifier; update the opening fence from ``` to ```text so the block becomes a fenced code block with a language identifier (e.g., change the lone ``` to ```text) to satisfy markdownlint MD040 and improve rendering.

test/scripts/passthrough/test-passthrough.py (4)
285-285: Remove extraneous `f` prefix from strings with no placeholders.

Ruff F541 flags six occurrences of `f"..."` where no `{}` interpolation exists: lines 285, 350, 428, 479, 502, 527. These should be plain strings.

📝 Proposed fix (showing all six)

```diff
- self.pcolor(YELLOW, f" Non-streaming: ", end="")
+ self.pcolor(YELLOW, " Non-streaming: ", end="")

- self.pcolor(YELLOW, f" Streaming: ", end="")
+ self.pcolor(YELLOW, " Streaming: ", end="")

- self.pcolor(YELLOW, f" OpenAI check: ", end="")
+ self.pcolor(YELLOW, " OpenAI check: ", end="")

- self.pcolor(YELLOW, f" Non-existent model: ", end="")
+ self.pcolor(YELLOW, " Non-existent model: ", end="")

- self.pcolor(YELLOW, f" System parameter: ", end="")
+ self.pcolor(YELLOW, " System parameter: ", end="")

- self.pcolor(YELLOW, f" Multi-turn conversation: ", end="")
+ self.pcolor(YELLOW, " Multi-turn conversation: ", end="")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/scripts/passthrough/test-passthrough.py` at line 285, Several calls to self.pcolor use f-strings with no interpolation (e.g., f" Non-streaming: ") which triggers Ruff F541; remove the unnecessary f prefix so they are plain string literals (e.g., " Non-streaming: "). Locate the other similar occurrences and update each call to use normal strings (references: self.pcolor invocations that pass f"..." for messages with no {} placeholders), run the linter to confirm F541 is resolved.
136-146: Silent `except` swallows all errors during health check.

Ruff S110 flags the bare `except Exception: pass`. While the fall-through to the failure message is fine for a test script, logging or printing the exception in verbose mode would aid debugging when Olla is misconfigured (e.g., SSL errors, DNS resolution failures).

📝 Proposed fix

```diff
 try:
     r = requests.get(f"{self.base_url}/internal/health", timeout=5)
     if r.status_code == 200:
         self.pcolor(GREEN, "[OK] Olla is reachable")
         return True
-except Exception:
-    pass
+except Exception as e:
+    if self.verbose:
+        self.pcolor(GREY, f" ({e})")
 self.pcolor(RED, f"[FAIL] Cannot reach Olla at {self.base_url}")
 return False
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/scripts/passthrough/test-passthrough.py` around lines 136 - 146, The bare except in check_health silently swallows errors; change it to except Exception as e and, before falling through to the failure message, emit the exception when a verbose flag is set (e.g. use getattr(self, "verbose", False) or an existing self.verbose) by calling self.pcolor(YELLOW, f"Error contacting {self.base_url}: {e}") or similar so DNS/SSL/timeout errors are visible while preserving the existing return False behavior; keep requests.get and the existing success path intact.
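As a standalone illustration of the pattern this finding suggests, here is a hypothetical health-check helper. The function name, the injectable `fetch` parameter, and the messages are illustrative only, not the script's actual API; the real script uses `requests` directly.

```python
import urllib.request


def check_health(base_url, verbose=False, fetch=None, log=print):
    """Return True when <base_url>/internal/health answers HTTP 200.

    Sketch only: `fetch` is injectable so the error path can be
    exercised without a network connection.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=5) as r:
                return r.status
    try:
        if fetch(f"{base_url}/internal/health") == 200:
            log("[OK] Olla is reachable")
            return True
    except Exception as e:
        if verbose:
            # Surface DNS/SSL/timeout details instead of swallowing them
            log(f" ({e})")
    log(f"[FAIL] Cannot reach Olla at {base_url}")
    return False
```

The True/False contract and the failure message are preserved; only the diagnostic visibility changes, which is the point of the review comment.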
330-339: Double JSON parse in verbose path.

`r.json()` is called once at line 332 to validate, and again at line 339 for verbose output. Store the result to avoid redundant parsing.

📝 Proposed fix

```diff
 # Valid JSON
 try:
-    r.json()
+    body = r.json()
 except Exception:
     ok = False
     notes.append("invalid JSON response")
+    body = None

 if self.verbose and ok:
     self.pcolor(GREY, "")
-    self.pcolor(GREY, f" {json.dumps(r.json(), indent=2)[:300]}")
+    self.pcolor(GREY, f" {json.dumps(body, indent=2)[:300]}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/scripts/passthrough/test-passthrough.py` around lines 330 - 339, The code calls r.json() twice (once to validate and again for verbose output); change the try/except that currently calls r.json() at validation to parse once into a local variable (e.g., parsed_json or json_body) and set ok based on that, then reuse that variable in the verbose block (self.pcolor(...) f" {json.dumps(parsed_json, indent=2)[:300]}") to avoid redundant parsing and potential double work or errors; ensure the exception handler still sets ok = False and appends the same note when parsing fails.
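The parse-once pattern can be shown in isolation. This sketch (the names `validate_body` and `raw_text` are hypothetical, not from the script) validates and builds the verbose preview from a single parse:

```python
import json


def validate_body(raw_text, verbose=False, limit=300):
    """Parse the response body once and reuse the result.

    Returns (ok, notes, preview): validation outcome, failure notes,
    and a truncated pretty-printed preview when verbose and valid.
    """
    ok, notes, body = True, [], None
    try:
        body = json.loads(raw_text)
    except Exception:
        ok = False
        notes.append("invalid JSON response")
    preview = json.dumps(body, indent=2)[:limit] if (verbose and ok) else ""
    return ok, notes, preview
```

Besides avoiding redundant work, parsing once means the verbose path can never raise on a body that already failed validation.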
402-413: Unused `missing` variable on line 404; potential copy-paste oversight.

`missing = PASSTHROUGH_EVENTS - event_types` is computed but never referenced. The actual validation on line 406 checks `TRANSLATION_MIN_EVENTS - event_types` instead. The intent (require at least the minimum set) is fine, but the dead assignment is confusing and may mislead future maintainers into thinking the full set is being validated.

📝 Proposed fix

```diff
 # Validate event types
 if bi.expects_passthrough:
-    missing = PASSTHROUGH_EVENTS - event_types
     # Some events are optional depending on content length; require at least the minimum set
     if TRANSLATION_MIN_EVENTS - event_types:
         ok = False
         notes.append(f"missing events: {TRANSLATION_MIN_EVENTS - event_types}")
 else:
-    missing = TRANSLATION_MIN_EVENTS - event_types
-    if missing:
+    if TRANSLATION_MIN_EVENTS - event_types:
         ok = False
-        notes.append(f"missing events: {missing}")
+        notes.append(f"missing events: {TRANSLATION_MIN_EVENTS - event_types}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/scripts/passthrough/test-passthrough.py` around lines 402 - 413, The assignment missing = PASSTHROUGH_EVENTS - event_types inside the bi.expects_passthrough branch is unused and should be removed (or used consistently) to avoid dead code confusion; update the block in test-passthrough.py so that when bi.expects_passthrough you either compute and assert against PASSTHROUGH_EVENTS (using the missing variable in the validation/notes) or simply remove the unused missing assignment and keep the existing check against TRANSLATION_MIN_EVENTS, ensuring references to missing, PASSTHROUGH_EVENTS, TRANSLATION_MIN_EVENTS, and bi.expects_passthrough are consistent.

docs/content/api-reference/vllm-mlx.md (1)
1-5: Consider adding YAML front matter for SEO, but note this is not standard across API reference pages.

The backend integration counterpart (`integrations/backend/vllm-mlx.md`) includes `title`, `description`, and `keywords` front matter. However, most API reference pages (10 out of 14) do not include front matter. A few pages (`anthropic.md`, `llamacpp.md`, `overview.md`) do include it. Adding front matter here would align this page with those examples and improve SEO, though it is not a consistent pattern across the API reference section.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/api-reference/vllm-mlx.md` around lines 1 - 5, Add YAML front matter to this API page by inserting a top matter block containing title: "vLLM-MLX API", a concise description summarizing "Proxy endpoints for vLLM-MLX inference servers running on Apple Silicon" and a keywords array similar to the backend counterpart (e.g., ["vllm", "mlx", "Apple Silicon", "inference", "API"]); ensure the front matter precedes the existing "vLLM-MLX API" header so the page metadata (title/description/keywords) aligns with integrations/backend/vllm-mlx.md and other pages that include front matter.
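A front-matter block along the lines the prompt describes might look like the following; the exact title, description, and keyword values here are illustrative, mirroring the style of the backend integration page rather than copied from it:

```yaml
---
title: vLLM-MLX API
description: Proxy endpoints for vLLM-MLX inference servers running on Apple Silicon
keywords:
  - vllm
  - mlx
  - apple silicon
  - inference
---
```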
Commented lines in `docs/content/api-reference/vllm-mlx.md`:

````
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{"role":"assistant"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{"content":"MLX"},"logprobs":null,"finish_reason":null}]}

...

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334401,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}

data: [DONE]
```
````
Add a language identifier to the fenced code block.
The streaming response example at line 128 uses a plain triple-backtick fence without a language specifier, triggering markdownlint MD040. Use `text` or `sse` to satisfy the linter and enable consistent rendering.
📝 Proposed fix
````diff
-```
+```text
 data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",...}
````

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
````markdown
```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{"role":"assistant"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{"content":"MLX"},"logprobs":null,"finish_reason":null}]}
...
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334401,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
data: [DONE]
```
````
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 128-128: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/content/api-reference/vllm-mlx.md` around lines 128 - 138, The fenced
code block showing the streaming response examples (the triple-backtick block
containing lines starting with `data: {"id":"chatcmpl-abc123"...}`) lacks a
language identifier; update that opening fence to include a language token such
as text or sse (e.g., change ``` to ```text) so markdownlint MD040 is satisfied
and the examples render consistently.
Commented lines in `docs/content/integrations/backend/vllm-mlx.md`:

```html
<li>Model Detection & Normalisation</li>
<li>OpenAI API Compatibility</li>
```
Use `&amp;` instead of bare `&` in HTML context.

Line 33 uses an unescaped `&` inside an HTML `<li>` tag, which is technically invalid HTML. The adjacent docker-model-runner.md uses `&amp;` for the same list item (Model Detection & Normalisation).

📝 Proposed fix
📝 Proposed fix
```diff
-<li>Model Detection & Normalisation</li>
+<li>Model Detection &amp; Normalisation</li>
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```html
<li>Model Detection &amp; Normalisation</li>
<li>OpenAI API Compatibility</li>
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/content/integrations/backend/vllm-mlx.md` around lines 33 - 34, Replace the bare ampersand in the list item text "Model Detection & Normalisation" with the HTML-escaped entity `&amp;` so the markdown/html is valid; locate the list entry in docs/content/integrations/backend/vllm-mlx.md containing "Model Detection & Normalisation" and update it to use "Model Detection &amp; Normalisation" (no other changes).
Commented lines in `test/scripts/README.md`:

```markdown
### `/passthrough` - Anthropic Passthrough Tests

Validates that the Anthropic Messages API translator correctly selects passthrough or translation mode based on backend capability.

- Auto-discovers available backends and models
- Verifies passthrough mode for natively supported backends (vLLM, LM Studio, Ollama, llama.cpp, Lemonade)
- Verifies translation mode for backends without native support (OpenAI-compatible, LiteLLM)
- Non-streaming and streaming response validation
- OpenAI baseline comparison per backend
- Edge cases: non-existent models, system parameters, multi-turn conversations
- Translator statistics reporting
```
vLLM-MLX missing from the natively supported backends list.
Line 63 lists "vLLM, LM Studio, Ollama, llama.cpp, Lemonade" but omits vLLM-MLX, which is the focus of this PR and is present in PASSTHROUGH_TYPES in the test script.
Proposed fix
```diff
-- Verifies passthrough mode for natively supported backends (vLLM, LM Studio, Ollama, llama.cpp, Lemonade)
+- Verifies passthrough mode for natively supported backends (vLLM, vLLM-MLX, LM Studio, Ollama, llama.cpp, Lemonade)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
### `/passthrough` - Anthropic Passthrough Tests

Validates that the Anthropic Messages API translator correctly selects passthrough or translation mode based on backend capability.

- Auto-discovers available backends and models
- Verifies passthrough mode for natively supported backends (vLLM, vLLM-MLX, LM Studio, Ollama, llama.cpp, Lemonade)
- Verifies translation mode for backends without native support (OpenAI-compatible, LiteLLM)
- Non-streaming and streaming response validation
- OpenAI baseline comparison per backend
- Edge cases: non-existent models, system parameters, multi-turn conversations
- Translator statistics reporting
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/scripts/README.md` around lines 60 - 68, The README section titled
`/passthrough` omits vLLM-MLX from the list of natively supported backends;
update the human-readable list under that header to include "vLLM-MLX" alongside
the existing entries (vLLM, LM Studio, Ollama, llama.cpp, Lemonade) so it
matches the PASSTHROUGH_TYPES definition used in the test scripts and the intent
of this PR.
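Taken together, the passthrough/translation selection these comments describe reduces to set membership plus a set-difference check on SSE event types. A hypothetical sketch follows; the set contents are inferred from the review discussion, not copied from the script, so treat them as assumptions:

```python
# Backend types expected to support the Anthropic Messages API natively
# (assumed values; the script defines its own PASSTHROUGH_TYPES).
PASSTHROUGH_TYPES = {"vllm", "vllm-mlx", "lm-studio", "ollama", "llamacpp", "lemonade"}

# Minimum SSE event types every streamed response must contain
# (illustrative; the script's TRANSLATION_MIN_EVENTS may differ).
TRANSLATION_MIN_EVENTS = {"message_start", "content_block_delta", "message_stop"}


def expects_passthrough(backend_type):
    """True when the translator should pass the request through unmodified."""
    return backend_type in PASSTHROUGH_TYPES


def validate_events(event_types):
    """Require at least the minimum event set; extra events are allowed."""
    missing = TRANSLATION_MIN_EVENTS - event_types
    if missing:
        return False, [f"missing events: {sorted(missing)}"]
    return True, []
```

Including "vllm-mlx" in the membership set is exactly what the README fix above asks the prose to reflect: the human-readable backend list should match the set the script actually checks.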
adds passthrough tests
updates docs

Summary by CodeRabbit

Release Notes

Documentation

- Added vLLM-MLX API reference and backend integration documentation; updated navigation and README files.

Tests

- Added a passthrough test script validating Anthropic Messages API translator behaviour across backends.