
docs: vllm-mlx #110

Merged
thushan merged 1 commit into main from docs/vllm-mlx
Feb 20, 2026

Conversation

Owner

@thushan thushan commented Feb 20, 2026

adds passthrough tests

updates docs

Summary by CodeRabbit

Release Notes

  • Documentation

    • Added comprehensive vLLM-MLX API reference and integration documentation for Apple Silicon support
    • Enhanced README with verification examples and detailed API querying guides, including OpenAI-compatible endpoints and Anthropic Messages API usage
    • Updated navigation structure to include new vLLM-MLX resources
  • Tests

    • Added Anthropic passthrough test suite with automatic backend discovery, validation, and edge case coverage

passthrough tests

update readme

coderabbitai bot commented Feb 20, 2026

Walkthrough

This PR introduces comprehensive documentation and testing infrastructure for vLLM-MLX integration with Olla. It adds new API reference and backend integration documentation, updates navigation and readme files, and implements a new passthrough test script for validating Anthropic Messages API translator functionality across backends.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **vLLM-MLX Documentation**<br>`docs/content/api-reference/vllm-mlx.md`, `docs/content/integrations/backend/vllm-mlx.md` | New comprehensive API reference (266 lines) documenting OpenAI-compatible endpoints, request/response samples, and streaming behaviour. New backend integration guide (552 lines) covering features, configuration, CLI options, troubleshooting, best practices, and integration examples for Apple Silicon deployment. |
| **Documentation Navigation & Metadata**<br>`docs/mkdocs.yml`, `docs/content/integrations/overview.md` | Added two vLLM-MLX navigation entries to the mkdocs configuration and updated the integrations overview table to include vLLM-MLX as a new backend option. |
| **README & Badge Updates**<br>`docs/content/index.md`, `readme.md` | Updated badges by removing LM Deploy and adding vLLM-MLX and Docker Model Runner entries. Expanded the main README with new Verification and Querying Olla sections, including OpenAI-compatible proxy, Anthropic Messages API, and provider-specific endpoint examples. |
| **Version & Test Documentation Updates**<br>`docs/content/integrations/backend/docker-model-runner.md`, `test/scripts/README.md` | Updated the Docker Model Runner version from v0.0.17 to v0.0.23. Added new test category documentation for passthrough tests under Script Categories. |
| **Passthrough Test Implementation**<br>`test/scripts/passthrough/README.md`, `test/scripts/passthrough/test-passthrough.py` | New README (119 lines) and comprehensive test script (724 lines) implementing autonomous backend discovery, test matrix generation, and validation of passthrough versus translation modes for Anthropic-compatible backends, with edge-case handling and translator statistics collection. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • bug: proxy streaming is broken #41: Related through test infrastructure additions, as both PRs expand the repository's test suite with new test scripts and documentation for testing functionality.

Suggested labels

documentation, llm-backend

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.69%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'docs: vllm-mlx' is vague and generic, using non-descriptive terms that don't clearly convey the scope of changes. | Consider a more descriptive title such as 'docs: add vLLM-MLX integration documentation and passthrough tests' to better communicate the extent of changes. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/content/index.md (1)

100-100: ⚠️ Potential issue | 🟡 Minor

X-Olla-Backend-Type example values are missing the two new backends.

The illustrative values list (ollama/openai/openai-compatible/lm-studio/llamacpp/vllm/sglang/lemonade) doesn't include vllm-mlx or docker-model-runner, both of which are introduced in this PR.

📝 Proposed fix
-| `X-Olla-Backend-Type` | Backend type (ollama/openai/openai-compatible/lm-studio/llamacpp/vllm/sglang/lemonade) |
+| `X-Olla-Backend-Type` | Backend type (ollama/openai/openai-compatible/lm-studio/llamacpp/vllm/vllm-mlx/sglang/lemonade/docker-model-runner) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/content/index.md` at line 100, Update the illustrative example values
for the `X-Olla-Backend-Type` header in docs/content/index.md to include the two
new backends; modify the value list
`(ollama/openai/openai-compatible/lm-studio/llamacpp/vllm/sglang/lemonade)` to
also contain `vllm-mlx` and `docker-model-runner` so the example reflects all
supported backend types.
🧹 Nitpick comments (6)
test/scripts/passthrough/README.md (1)

64-64: Fenced code block missing language specifier.

The markdownlint MD040 rule flags this. Adding a language identifier (e.g., text) silences the warning and helps renderers.

Proposed fix
-```
+```text
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/passthrough/README.md` at line 64, The fenced code block in the
README (the triple-backtick block at the end of the passthrough README) is
missing a language specifier; update the opening fence from ``` to ```text so
the block becomes a fenced code block with a language identifier (e.g., change
the lone ``` to ```text) to satisfy markdownlint MD040 and improve rendering.
test/scripts/passthrough/test-passthrough.py (4)

285-285: Remove extraneous f prefix from strings with no placeholders.

Ruff F541 flags six occurrences of f"..." where no {} interpolation exists: lines 285, 350, 428, 479, 502, 527. These should be plain strings.

Proposed fix (showing all six)
-        self.pcolor(YELLOW, f"  Non-streaming: ", end="")
+        self.pcolor(YELLOW, "  Non-streaming: ", end="")
-        self.pcolor(YELLOW, f"  Streaming:     ", end="")
+        self.pcolor(YELLOW, "  Streaming:     ", end="")
-        self.pcolor(YELLOW, f"  OpenAI check:  ", end="")
+        self.pcolor(YELLOW, "  OpenAI check:  ", end="")
-        self.pcolor(YELLOW, f"  Non-existent model: ", end="")
+        self.pcolor(YELLOW, "  Non-existent model: ", end="")
-        self.pcolor(YELLOW, f"  System parameter: ", end="")
+        self.pcolor(YELLOW, "  System parameter: ", end="")
-        self.pcolor(YELLOW, f"  Multi-turn conversation: ", end="")
+        self.pcolor(YELLOW, "  Multi-turn conversation: ", end="")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/passthrough/test-passthrough.py` at line 285, Several calls to
self.pcolor use f-strings with no interpolation (e.g., f"  Non-streaming: ")
which triggers Ruff F541; remove the unnecessary f prefix so they are plain
string literals (e.g., "  Non-streaming: "). Locate the other similar
occurrences and update each call to use normal strings (references: self.pcolor
invocations that pass f"..." for messages with no {} placeholders), run the
linter to confirm F541 is resolved.

136-146: Silent except swallows all errors during health check.

Ruff S110 flags the bare except Exception: pass. While the fall-through to the failure message is fine for a test script, logging or printing the exception in verbose mode would aid debugging when Olla is misconfigured (e.g., SSL errors, DNS resolution failures).

Proposed fix
         try:
             r = requests.get(f"{self.base_url}/internal/health", timeout=5)
             if r.status_code == 200:
                 self.pcolor(GREEN, "[OK] Olla is reachable")
                 return True
-        except Exception:
-            pass
+        except Exception as e:
+            if self.verbose:
+                self.pcolor(GREY, f"  ({e})")
         self.pcolor(RED, f"[FAIL] Cannot reach Olla at {self.base_url}")
         return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/passthrough/test-passthrough.py` around lines 136 - 146, The
bare except in check_health silently swallows errors; change it to except
Exception as e and, before falling through to the failure message, emit the
exception when a verbose flag is set (e.g. use getattr(self, "verbose", False)
or an existing self.verbose) by calling self.pcolor(YELLOW, f"Error contacting
{self.base_url}: {e}") or similar so DNS/SSL/timeout errors are visible while
preserving the existing return False behavior; keep requests.get and the
existing success path intact.

330-339: Double JSON parse in verbose path.

r.json() is called once at line 332 to validate, and again at line 339 for verbose output. Store the result to avoid redundant parsing.

Proposed fix
         # Valid JSON
         try:
-            r.json()
+            body = r.json()
         except Exception:
             ok = False
             notes.append("invalid JSON response")
+            body = None

         if self.verbose and ok:
             self.pcolor(GREY, "")
-            self.pcolor(GREY, f"    {json.dumps(r.json(), indent=2)[:300]}")
+            self.pcolor(GREY, f"    {json.dumps(body, indent=2)[:300]}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/passthrough/test-passthrough.py` around lines 330 - 339, The
code calls r.json() twice (once to validate and again for verbose output);
change the try/except that currently calls r.json() at validation to parse once
into a local variable (e.g., parsed_json or json_body) and set ok based on that,
then reuse that variable in the verbose block (self.pcolor(...) f"   
{json.dumps(parsed_json, indent=2)[:300]}") to avoid redundant parsing and
potential double work or errors; ensure the exception handler still sets ok =
False and appends the same note when parsing fails.

402-413: Unused missing variable on line 404; potential copy-paste oversight.

missing = PASSTHROUGH_EVENTS - event_types is computed but never referenced. The actual validation on line 406 checks TRANSLATION_MIN_EVENTS - event_types instead. The intent (require at least the minimum set) is fine, but the dead assignment is confusing and may mislead future maintainers into thinking the full set is being validated.

Proposed fix
         # Validate event types
         if bi.expects_passthrough:
-            missing = PASSTHROUGH_EVENTS - event_types
             # Some events are optional depending on content length; require at least the minimum set
             if TRANSLATION_MIN_EVENTS - event_types:
                 ok = False
                 notes.append(f"missing events: {TRANSLATION_MIN_EVENTS - event_types}")
         else:
-            missing = TRANSLATION_MIN_EVENTS - event_types
-            if missing:
+            if TRANSLATION_MIN_EVENTS - event_types:
                 ok = False
-                notes.append(f"missing events: {missing}")
+                notes.append(f"missing events: {TRANSLATION_MIN_EVENTS - event_types}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/passthrough/test-passthrough.py` around lines 402 - 413, The
assignment missing = PASSTHROUGH_EVENTS - event_types inside the
bi.expects_passthrough branch is unused and should be removed (or used
consistently) to avoid dead code confusion; update the block in
test-passthrough.py so that when bi.expects_passthrough you either compute and
assert against PASSTHROUGH_EVENTS (using the missing variable in the
validation/notes) or simply remove the unused missing assignment and keep the
existing check against TRANSLATION_MIN_EVENTS, ensuring references to missing,
PASSTHROUGH_EVENTS, TRANSLATION_MIN_EVENTS, and bi.expects_passthrough are
consistent.
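The set-difference check this comment describes can be illustrated in isolation. This is a minimal sketch: the members of `TRANSLATION_MIN_EVENTS` below are assumptions based on the Anthropic SSE event format, not the script's actual constant values.

```python
# Minimum Anthropic SSE event types a valid streaming response must contain
# (assumed values for illustration).
TRANSLATION_MIN_EVENTS = {"message_start", "content_block_delta", "message_stop"}

def validate_events(event_types: set) -> tuple:
    """Return (ok, notes); fails when any required event type is absent."""
    notes = []
    missing = TRANSLATION_MIN_EVENTS - event_types
    if missing:
        notes.append(f"missing events: {sorted(missing)}")
    return (not missing, notes)

# A stream with extra events (e.g. ping) still passes; a truncated one fails.
ok, notes = validate_events({"message_start", "content_block_delta", "message_stop", "ping"})
bad, bad_notes = validate_events({"message_start"})
```

Computing the difference once and reusing it, as the proposed fix does for the translation branch, avoids exactly the kind of dead assignment flagged above.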
docs/content/api-reference/vllm-mlx.md (1)

1-5: Consider adding YAML front matter for SEO, but note this is not standard across API reference pages.

The backend integration counterpart (integrations/backend/vllm-mlx.md) includes title, description, and keywords front matter. However, most API reference pages (10 out of 14) do not include front matter. A few pages (anthropic.md, llamacpp.md, overview.md) do include it. Adding front matter here would align this page with those examples and improve SEO, though it is not a consistent pattern across the API reference section.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/content/api-reference/vllm-mlx.md` around lines 1 - 5, Add YAML front
matter to this API page by inserting a top matter block containing title:
"vLLM-MLX API", a concise description summarizing "Proxy endpoints for vLLM-MLX
inference servers running on Apple Silicon" and a keywords array similar to the
backend counterpart (e.g., ["vllm", "mlx", "Apple Silicon", "inference",
"API"]); ensure the front matter precedes the existing "vLLM-MLX API" header so
the page metadata (title/description/keywords) aligns with
integrations/backend/vllm-mlx.md and other pages that include front matter.

Comment on lines +128 to +138
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{"role":"assistant"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{"content":"MLX"},"logprobs":null,"finish_reason":null}]}

...

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334401,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}

data: [DONE]
```

⚠️ Potential issue | 🟡 Minor

Add a language identifier to the fenced code block.

The streaming response example at line 128 uses a plain triple-backtick fence without a language specifier, triggering markdownlint MD040. Use text or sse to satisfy the linter and enable consistent rendering.

📝 Proposed fix
-```
+```text
 data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",...}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{"role":"assistant"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{"content":"MLX"},"logprobs":null,"finish_reason":null}]}
...
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334401,"model":"mlx-community/Llama-3.2-3B-Instruct-4bit","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
data: [DONE]
```
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 128-128: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/content/api-reference/vllm-mlx.md` around lines 128 - 138, The fenced
code block showing the streaming response examples (the triple-backtick block
containing lines starting with `data: {"id":"chatcmpl-abc123"...}`) lacks a
language identifier; update that opening fence to include a language token such
as text or sse (e.g., change ``` to ```text) so markdownlint MD040 is satisfied
and the examples render consistently.
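As an aside, SSE bodies like the streaming example quoted above can be consumed with a few lines of Python. This is a generic sketch of OpenAI-style `data:` framing, illustrative only and not code from the PR or from Olla.

```python
import json

def parse_sse_chunks(raw: str):
    """Yield decoded JSON chunks from an OpenAI-style SSE body, stopping at [DONE]."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # sentinel marking end of stream
        yield json.loads(payload)

# Abbreviated version of the documented stream.
raw = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"MLX"}}]}\n\n'
    'data: [DONE]\n'
)
# Concatenate the content deltas to recover the generated text.
text = "".join(
    c["choices"][0]["delta"].get("content", "") for c in parse_sse_chunks(raw)
)
```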

Comment on lines +33 to +34
<li>Model Detection & Normalisation</li>
<li>OpenAI API Compatibility</li>

⚠️ Potential issue | 🟡 Minor

Use &amp; instead of bare & in HTML context.

Line 33 uses an unescaped & inside an HTML <li> tag, which is technically invalid HTML. The adjacent docker-model-runner.md uses &amp; for the same list item (Model Detection &amp; Normalisation).

📝 Proposed fix
-<li>Model Detection & Normalisation</li>
+<li>Model Detection &amp; Normalisation</li>
Suggested change
-<li>Model Detection & Normalisation</li>
+<li>Model Detection &amp; Normalisation</li>
 <li>OpenAI API Compatibility</li>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/content/integrations/backend/vllm-mlx.md` around lines 33 - 34, Replace
the bare ampersand in the list item text "Model Detection & Normalisation" with
the HTML-escaped entity "&amp;" so the markdown/html is valid; locate the list
entry in docs/content/integrations/backend/vllm-mlx.md containing "Model
Detection & Normalisation" and update it to use "Model Detection &amp;
Normalisation" (no other changes).

Comment on lines +60 to +68
### `/passthrough` - Anthropic Passthrough Tests
Validates that the Anthropic Messages API translator correctly selects passthrough or translation mode based on backend capability.
- Auto-discovers available backends and models
- Verifies passthrough mode for natively supported backends (vLLM, LM Studio, Ollama, llama.cpp, Lemonade)
- Verifies translation mode for backends without native support (OpenAI-compatible, LiteLLM)
- Non-streaming and streaming response validation
- OpenAI baseline comparison per backend
- Edge cases: non-existent models, system parameters, multi-turn conversations
- Translator statistics reporting

⚠️ Potential issue | 🟡 Minor

vLLM-MLX missing from the natively supported backends list.

Line 63 lists "vLLM, LM Studio, Ollama, llama.cpp, Lemonade" but omits vLLM-MLX, which is the focus of this PR and is present in PASSTHROUGH_TYPES in the test script.

Proposed fix
-- Verifies passthrough mode for natively supported backends (vLLM, LM Studio, Ollama, llama.cpp, Lemonade)
+- Verifies passthrough mode for natively supported backends (vLLM, vLLM-MLX, LM Studio, Ollama, llama.cpp, Lemonade)
Suggested change
 ### `/passthrough` - Anthropic Passthrough Tests
 Validates that the Anthropic Messages API translator correctly selects passthrough or translation mode based on backend capability.
 - Auto-discovers available backends and models
-- Verifies passthrough mode for natively supported backends (vLLM, LM Studio, Ollama, llama.cpp, Lemonade)
+- Verifies passthrough mode for natively supported backends (vLLM, vLLM-MLX, LM Studio, Ollama, llama.cpp, Lemonade)
 - Verifies translation mode for backends without native support (OpenAI-compatible, LiteLLM)
 - Non-streaming and streaming response validation
 - OpenAI baseline comparison per backend
 - Edge cases: non-existent models, system parameters, multi-turn conversations
 - Translator statistics reporting
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/scripts/README.md` around lines 60 - 68, The README section titled
`/passthrough` omits vLLM-MLX from the list of natively supported backends;
update the human-readable list under that header to include "vLLM-MLX" alongside
the existing entries (vLLM, LM Studio, Ollama, llama.cpp, Lemonade) so it
matches the PASSTHROUGH_TYPES definition used in the test scripts and the intent
of this PR.

@thushan thushan merged commit 95130ad into main Feb 20, 2026
8 checks passed
@thushan thushan deleted the docs/vllm-mlx branch February 20, 2026 12:28
