
feat: backend/vllm#44

Merged
thushan merged 8 commits into main from feature/backend/vllm
Aug 7, 2025

Conversation

@thushan
Owner

@thushan thushan commented Aug 7, 2025

This PR implements the backend provider type for vLLM.

You can configure a vLLM endpoint in the configuration like this (an example is also included in the default config.yaml):

- url: "http://192.168.0.1:8000"
  name: "local-vllm"
  type: "vllm"
  priority: 100
  model_url: "/v1/models"
  health_check_url: "/health"
  check_interval: 5s
  check_timeout: 2s

Features supported:

  • Health checks
  • Model unification across vLLM instances
  • Native and OpenAI-compatible API methods

TODO:

  • Determine how to handle model swaps (e.g. the process starts serving tinyllama, is shut down, and the server comes back up serving llama4, yet Olla still believes it can route tinyllama requests to that endpoint).
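One possible shape of a fix, sketched below under stated assumptions: when an endpoint transitions back to healthy, re-fetch its model list and evict registry entries it no longer serves. The `refreshModels` helper is hypothetical and not Olla's actual API; it only illustrates the diffing step.

```go
package main

import "fmt"

// refreshModels is a hypothetical sketch: on endpoint recovery, compare the
// models the registry still holds against the freshly discovered list, keep
// the intersection, and report stale entries for eviction.
func refreshModels(registered, discovered []string) (kept, evicted []string) {
	seen := make(map[string]bool, len(discovered))
	for _, m := range discovered {
		seen[m] = true
	}
	for _, m := range registered {
		if seen[m] {
			kept = append(kept, m)
		} else {
			evicted = append(evicted, m)
		}
	}
	return kept, evicted
}

func main() {
	// Before the restart the endpoint served tinyllama; it came back with llama4.
	kept, evicted := refreshModels([]string{"tinyllama"}, []string{"llama4"})
	fmt.Println(kept, evicted)
}
```

The same comparison could run on every successful health check rather than only on recovery, at the cost of an extra models request per interval.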

Summary by CodeRabbit

New Features

  • Added native support for the vLLM backend, including enhanced model metadata, health checks, and dedicated API endpoints.
  • Introduced vLLM-specific model discovery and routing, with profile auto-detection and improved resource/concurrency management.
  • Enabled vLLM model streaming and chat completion in test suites and scripts.

Bug Fixes

  • Improved provider detection logic and ensured compatibility with vLLM endpoints.

Documentation

  • Updated documentation to reflect native vLLM support and revised usage instructions.

Tests

  • Added comprehensive unit and integration tests for vLLM discovery, parsing, and chat completion.
  • Extended test scripts to cover vLLM-specific endpoints and streaming responses.

Chores

  • Refactored provider constants for consistency and maintainability.

thushan added 6 commits August 7, 2025 13:42
adds vllm converter for parsing

fix parsing errors when no models are present in vllm

change healthcheck endpoint for vllm to standard one

Update tests for converter factory

add vllm profile

fix vllm profile typo
@coderabbitai

coderabbitai bot commented Aug 7, 2025

Walkthrough

This change introduces native vLLM support throughout the codebase. It adds a vLLM converter, parser, and profile, updates configuration and documentation, and extends discovery, routing, and test scripts to handle vLLM as a first-class provider. New integration and unit tests ensure correct vLLM behaviour and compatibility.

Changes

Cohort / File(s) Change Summary
Configuration: Static Discovery & vLLM Profile
config/config.yaml, config/profiles/vllm.yaml
Adds a static vLLM discovery endpoint and a comprehensive vLLM profile with model/resource/concurrency metadata, new API paths, detection hints, and timeout scaling.
Converter Factory & Providers
internal/adapter/converter/factory.go, internal/adapter/converter/factory_test.go, internal/adapter/converter/lmstudio_converter.go, internal/adapter/converter/ollama_converter.go
Registers the new vLLM converter, updates tests to expect vLLM, and replaces hardcoded provider strings with constants for consistency.
vLLM Converter & Unit Tests
internal/adapter/converter/vllm_converter.go, internal/adapter/converter/vllm_converter_test.go
Implements the vLLM model response converter with extended metadata and permissions, and provides comprehensive unit tests for conversion logic and helpers.
Model Discovery & Registry
internal/adapter/discovery/http_client.go, internal/adapter/registry/profile/parsers.go, internal/adapter/registry/profile/vllm.go, internal/adapter/registry/profile/vllm_parser.go, internal/adapter/registry/profile/vllm_parser_test.go
Adds vLLM to the discovery profile order, implements vLLM model/permission structs, introduces a vLLM profile parser, and provides robust unit tests for parsing.
Domain Profile Constant
internal/core/domain/profile.go
Adds the ProfileVLLM constant for use throughout the codebase.
Integration Tests: vLLM
internal/integration/providers/vllm_integration_test.go
Introduces end-to-end integration tests for vLLM model discovery, health checks, and chat completion against a live vLLM server.
Documentation
readme.md
Updates documentation to reflect native vLLM support, new endpoints, and usage instructions.
Test Scripts: Provider Routing & Streaming
test/scripts/logic/test-model-routing-provider.py, test/scripts/logic/test-model-routing.sh, test/scripts/logic/test-provider-models.sh, test/scripts/logic/test-provider-routing.sh, test/scripts/streaming/test-streaming-responses.py
Extends test scripts to include vLLM as a provider, tests vLLM-specific endpoints, formats, and streaming, and updates usage instructions.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant OLLA
    participant vLLM Server

    Client->>OLLA: GET /olla/vllm/v1/models
    OLLA->>vLLM Server: GET /v1/models
    vLLM Server-->>OLLA: vLLM models response (native format)
    OLLA-->>Client: vLLM models response (native format)

    Client->>OLLA: POST /olla/vllm/v1/chat/completions
    OLLA->>vLLM Server: POST /v1/chat/completions
    vLLM Server-->>OLLA: Chat completion response
    OLLA-->>Client: Chat completion response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • feat: API Redesign #33: Introduces provider-specific routing and profile-driven registration, providing the framework that this PR extends with concrete vLLM support, but does not itself implement vLLM-specific converters, parsers, or tests.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (7)
readme.md (1)

43-49: Fix markdown list style consistency.

The static analysis tool has flagged inconsistent list formatting. The project uses dashes for unordered lists, but these lines use asterisks.

Apply this diff to maintain consistent markdown formatting:

-* [Ollama](https://github.com/ollama/ollama) - native support for Ollama, including model unification. \
+- [Ollama](https://github.com/ollama/ollama) - native support for Ollama, including model unification. \
   Use: `/olla/ollama/`
-* [LM Studio](https://lmstudio.ai/) - native support for LMStudio, including model unification. \
+- [LM Studio](https://lmstudio.ai/) - native support for LMStudio, including model unification. \
   Use: `/olla/lmstudio/` || `/olla/lm-studio/` || `/olla/lm_studio/`
-* [vLLM](https://github.com/vllm-project/vllm) - native support for vllm, including model unification. \
+- [vLLM](https://github.com/vllm-project/vllm) - native support for vllm, including model unification. \
   Use: `/olla/vllm/` \
   Models from vLLM will be available under `/olla/models` and `/olla/vllm/v1/models`
-* [OpenAI](https://platform.openai.com/docs/overview) - You can use OpenAI API that provides a unified query API across all providers. \
+- [OpenAI](https://platform.openai.com/docs/overview) - You can use OpenAI API that provides a unified query API across all providers. \
   Use: `/olla/openai/`
test/scripts/logic/test-provider-routing.sh (1)

202-210: Address shellcheck warnings about variable declarations.

The static analysis correctly identifies that declaring and assigning variables in one line can mask return values from curl commands.

Apply this diff to address the warnings:

-    local vllm_response=$(curl -s "${TARGET_URL}/olla/vllm/v1/models" 2>/dev/null || echo '{}')
-    local vllm_count=$(echo "$vllm_response" | grep -o '"id"' | wc -l)
-    
-    local openai_response=$(curl -s "${TARGET_URL}/olla/openai/v1/models" 2>/dev/null || echo '{}')
-    local openai_count=$(echo "$openai_response" | grep -o '"id"' | wc -l)
+    local vllm_response
+    vllm_response=$(curl -s "${TARGET_URL}/olla/vllm/v1/models" 2>/dev/null || echo '{}')
+    local vllm_count
+    vllm_count=$(echo "$vllm_response" | grep -o '"id"' | wc -l)
+    
+    local openai_response
+    openai_response=$(curl -s "${TARGET_URL}/olla/openai/v1/models" 2>/dev/null || echo '{}')
+    local openai_count
+    openai_count=$(echo "$openai_response" | grep -o '"id"' | wc -l)
test/scripts/logic/test-provider-models.sh (1)

222-232: Excellent vLLM-specific metadata validation.

The test correctly validates both the standard OpenAI format structure and vLLM-specific fields like max_model_len. This ensures the enhanced metadata is properly exposed.

Address the shellcheck warning about variable declaration:

-                local model_count=$(echo "$response_body" | grep -o '"id"' | wc -l)
+                local model_count
+                model_count=$(echo "$response_body" | grep -o '"id"' | wc -l)
internal/integration/providers/vllm_integration_test.go (1)

148-151: Improve error handling for EOF conditions

The error handling for EOF could be more idiomatic and clearer.

Use io.ReadAll for simpler and more robust reading:

-		// Read response body properly
-		buf := make([]byte, 10240)
-		n, err := resp.Body.Read(buf)
-		if err != nil && err.Error() != "EOF" && err.Error() != "unexpected EOF" {
-			require.NoError(t, err)
-		}
+		// Read response body
+		body, err := io.ReadAll(resp.Body)
+		require.NoError(t, err)

-		models, err := vllmProfile.ParseModelsResponse(buf[:n])
+		models, err := vllmProfile.ParseModelsResponse(body)

Don't forget to add "io" to the imports.

config/profiles/vllm.yaml (3)

126-148: Remove trailing spaces & ensure POSIX newline.

yamllint flags trailing whitespace on Lines 126, 133 & 144 and a missing newline at EOF (Line 148). Keeping the file lint-clean avoids noisy CI failures.

-    - patterns: ["*1b*", "*1.1b*", "*1.5b*"]␠
+    - patterns: ["*1b*", "*1.1b*", "*1.5b*"]
@@
-    requires_gpu: true␠
+    requires_gpu: true
@@
-      max_concurrent: 100␠
+      max_concurrent: 100
+
 # (EOF now ends with a single newline)

135-143: Verify matching precedence in concurrency_limits.

concurrency_limits is evaluated top-to-bottom by the registry parser. Because thresholds are listed from highest (min_memory_gb: 100) to lowest (0), a lookup using >= may accidentally match multiple entries, with only the first honoured. Double-check the parsing logic and, if it keeps the last match, reverse the order for clarity.

concurrency_limits:
  - min_memory_gb: 0      # smaller models
    max_concurrent: 100
  - min_memory_gb: 20     # 13B+ models
    max_concurrent: 50
  - min_memory_gb: 50     # 30B+ models
    max_concurrent: 20
  - min_memory_gb: 100    # 70B+ models
    max_concurrent: 10
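The precedence concern can be made concrete with a first-match lookup. `pickLimit` and the `limit` struct below are hypothetical, assuming the `>=` matching the comment describes; they show why the YAML order decides which tier wins.

```go
package main

import "fmt"

type limit struct {
	minMemoryGB   int
	maxConcurrent int
}

// pickLimit walks the list top-to-bottom and returns the first entry whose
// threshold the model's memory estimate meets. Under first-match semantics
// thresholds must be listed highest-first, or the loop must track the best
// match instead of returning early.
func pickLimit(limits []limit, memGB int) int {
	for _, l := range limits {
		if memGB >= l.minMemoryGB {
			return l.maxConcurrent
		}
	}
	return 0
}

func main() {
	// Highest-first: a 60 GB model falls through 100 and matches the 50 GB tier.
	ordered := []limit{{100, 10}, {50, 20}, {20, 50}, {0, 100}}
	fmt.Println(pickLimit(ordered, 60)) // 20

	// Lowest-first: the 0 GB catch-all shadows every other tier.
	shadowed := []limit{{0, 100}, {20, 50}, {50, 20}, {100, 10}}
	fmt.Println(pickLimit(shadowed, 60)) // 100
}
```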

36-38: Consider adding common alternative ports to default_ports.

Many self-hosted vLLM demos run on 8080 or 8001; adding them improves auto-discovery hit-rate without harm.

 default_ports:
   - 8000
+  - 8001
+  - 8080
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7eb705 and 8393afd.

📒 Files selected for processing (21)
  • config/config.yaml (1 hunks)
  • config/profiles/vllm.yaml (4 hunks)
  • internal/adapter/converter/factory.go (1 hunks)
  • internal/adapter/converter/factory_test.go (2 hunks)
  • internal/adapter/converter/lmstudio_converter.go (3 hunks)
  • internal/adapter/converter/ollama_converter.go (3 hunks)
  • internal/adapter/converter/vllm_converter.go (1 hunks)
  • internal/adapter/converter/vllm_converter_test.go (1 hunks)
  • internal/adapter/discovery/http_client.go (1 hunks)
  • internal/adapter/registry/profile/parsers.go (2 hunks)
  • internal/adapter/registry/profile/vllm.go (1 hunks)
  • internal/adapter/registry/profile/vllm_parser.go (1 hunks)
  • internal/adapter/registry/profile/vllm_parser_test.go (1 hunks)
  • internal/core/domain/profile.go (1 hunks)
  • internal/integration/providers/vllm_integration_test.go (1 hunks)
  • readme.md (2 hunks)
  • test/scripts/logic/test-model-routing-provider.py (8 hunks)
  • test/scripts/logic/test-model-routing.sh (2 hunks)
  • test/scripts/logic/test-provider-models.sh (3 hunks)
  • test/scripts/logic/test-provider-routing.sh (3 hunks)
  • test/scripts/streaming/test-streaming-responses.py (9 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*_test.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

Unit tests should test individual components in isolation.

Files:

  • internal/adapter/converter/factory_test.go
  • internal/adapter/converter/vllm_converter_test.go
  • internal/integration/providers/vllm_integration_test.go
  • internal/adapter/registry/profile/vllm_parser_test.go
internal/{app,adapter}/**/*.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

Endpoints should be exposed at /internal/health and /internal/status.

Files:

  • internal/adapter/converter/factory_test.go
  • internal/adapter/converter/lmstudio_converter.go
  • internal/adapter/converter/ollama_converter.go
  • internal/adapter/discovery/http_client.go
  • internal/adapter/registry/profile/parsers.go
  • internal/adapter/converter/factory.go
  • internal/adapter/converter/vllm_converter_test.go
  • internal/adapter/converter/vllm_converter.go
  • internal/adapter/registry/profile/vllm.go
  • internal/adapter/registry/profile/vllm_parser.go
  • internal/adapter/registry/profile/vllm_parser_test.go
test/scripts/logic/test-model-routing.sh

📄 CodeRabbit Inference Engine (CLAUDE.md)

Test routing and headers using /test/scripts/logic/test-model-routing.sh.

Files:

  • test/scripts/logic/test-model-routing.sh
🧠 Learnings (7)
📚 Learning: applies to internal/adapter/proxy/*_test.go : shared proxy tests should ensure compatibility between...
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines.

Applied to files:

  • internal/adapter/converter/factory_test.go
  • internal/adapter/converter/vllm_converter_test.go
  • test/scripts/logic/test-provider-routing.sh
  • internal/integration/providers/vllm_integration_test.go
  • internal/adapter/registry/profile/vllm_parser_test.go
📚 Learning: applies to {proxy_sherpa.go,proxy_olla.go} : proxy implementations should be in `proxy_sherpa.go` an...
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to {proxy_sherpa.go,proxy_olla.go} : Proxy implementations should be in `proxy_sherpa.go` and `proxy_olla.go`.

Applied to files:

  • internal/adapter/converter/ollama_converter.go
📚 Learning: applies to internal/adapter/proxy/*.go : expose the following response headers: `x-olla-endpoint`, `...
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.788Z
Learning: Applies to internal/adapter/proxy/*.go : Expose the following response headers: `X-Olla-Endpoint`, `X-Olla-Model`, `X-Olla-Backend-Type`, `X-Olla-Request-ID`, `X-Olla-Response-Time`.

Applied to files:

  • internal/adapter/converter/ollama_converter.go
  • internal/adapter/registry/profile/parsers.go
  • internal/adapter/converter/vllm_converter.go
  • internal/adapter/registry/profile/vllm.go
  • internal/integration/providers/vllm_integration_test.go
  • config/profiles/vllm.yaml
📚 Learning: applies to test/scripts/logic/test-model-routing.sh : test routing and headers using `/test/scripts/...
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to test/scripts/logic/test-model-routing.sh : Test routing and headers using `/test/scripts/logic/test-model-routing.sh`.

Applied to files:

  • test/scripts/logic/test-model-routing.sh
  • test/scripts/logic/test-provider-models.sh
  • test/scripts/logic/test-provider-routing.sh
  • test/scripts/logic/test-model-routing-provider.py
📚 Learning: applies to **/*_test.go : unit tests should test individual components in isolation....
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation.

Applied to files:

  • internal/adapter/converter/vllm_converter_test.go
  • internal/integration/providers/vllm_integration_test.go
  • internal/adapter/registry/profile/vllm_parser_test.go
📚 Learning: applies to internal/adapter/proxy/*_test.go : integration tests should test the full request flow th...
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to internal/adapter/proxy/*_test.go : Integration tests should test the full request flow through the proxy.

Applied to files:

  • internal/adapter/converter/vllm_converter_test.go
  • test/scripts/logic/test-provider-routing.sh
  • internal/integration/providers/vllm_integration_test.go
  • internal/adapter/registry/profile/vllm_parser_test.go
📚 Learning: applies to internal/adapter/proxy/*_test.go : benchmark tests should measure performance of critical...
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to internal/adapter/proxy/*_test.go : Benchmark tests should measure performance of critical paths, proxy engine comparisons, connection pooling efficiency, and circuit breaker behavior.

Applied to files:

  • internal/integration/providers/vllm_integration_test.go
  • internal/adapter/registry/profile/vllm_parser_test.go
🧬 Code Graph Analysis (7)
internal/adapter/converter/lmstudio_converter.go (1)
internal/core/constants/providers.go (1)
  • ProviderPrefixLMStudio1 (17-17)
internal/adapter/converter/ollama_converter.go (1)
internal/core/constants/providers.go (1)
  • ProviderTypeOllama (4-4)
internal/adapter/discovery/http_client.go (1)
internal/core/domain/profile.go (3)
  • ProfileOllama (4-4)
  • ProfileLmStudio (5-5)
  • ProfileVLLM (6-6)
internal/adapter/registry/profile/parsers.go (1)
internal/core/constants/providers.go (4)
  • ProviderTypeOllama (4-4)
  • ProviderPrefixLMStudio1 (17-17)
  • ProviderTypeVLLM (8-8)
  • ProviderTypeOpenAI (6-6)
internal/adapter/converter/factory.go (1)
internal/adapter/converter/vllm_converter.go (1)
  • NewVLLMConverter (50-52)
internal/adapter/converter/vllm_converter.go (4)
internal/adapter/registry/profile/vllm.go (1)
  • VLLMModelPermission (22-35)
internal/core/ports/model_converter.go (2)
  • ModelResponseConverter (10-15)
  • ModelFilters (18-23)
internal/core/constants/providers.go (1)
  • ProviderTypeVLLM (8-8)
internal/core/domain/unified_model.go (1)
  • UnifiedModel (15-31)
internal/adapter/registry/profile/vllm.go (1)
internal/adapter/converter/vllm_converter.go (1)
  • VLLMModelPermission (31-44)
🪛 markdownlint-cli2 (0.17.2)
readme.md

43-43: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


45-45: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


47-47: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

🪛 Shellcheck (0.10.0)
test/scripts/logic/test-provider-models.sh

[warning] 226-226: Declare and assign separately to avoid masking return values.

(SC2155)

test/scripts/logic/test-provider-routing.sh

[warning] 203-203: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 204-204: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 208-208: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 209-209: Declare and assign separately to avoid masking return values.

(SC2155)

🪛 YAMLlint (1.37.1)
config/profiles/vllm.yaml

[error] 126-126: trailing spaces

(trailing-spaces)


[error] 133-133: trailing spaces

(trailing-spaces)


[error] 144-144: trailing spaces

(trailing-spaces)


[error] 148-148: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (30)
internal/core/domain/profile.go (1)

6-6: LGTM! Consistent constant definition.

The addition of ProfileVLLM follows the established pattern and provides a centralized definition for the vLLM provider identifier.

config/config.yaml (1)

64-71: Verify vLLM-specific health check endpoint and timing values.

The configuration uses /health as the health check URL and longer intervals (5s check, 2s timeout) compared to other providers. Please verify these values are appropriate for vLLM's health check behaviour.

internal/adapter/converter/factory.go (1)

25-25: LGTM! Proper converter registration.

The VLLM converter is correctly registered following the established pattern used by other converters.

internal/adapter/converter/ollama_converter.go (3)

6-6: LGTM! Good practice to centralise constants.

Adding the constants import enables the use of centralised provider type definitions instead of magic strings.


42-42: LGTM! Eliminates magic string.

Replacing hardcoded "ollama" with constants.ProviderTypeOllama improves maintainability and consistency.


69-69: LGTM! Consistent constant usage.

Using the same constant for alias source checking maintains consistency with the GetFormatName method.

internal/adapter/converter/factory_test.go (3)

50-50: LGTM! Test correctly validates vLLM format in error messages.

The test appropriately checks that "vllm" is included in the supported formats list within error messages.


55-55: LGTM! Correct format count update.

The expected count is properly updated to 5 to account for the addition of the vLLM format.


66-66: LGTM! Complete test coverage for vLLM format.

The test correctly verifies that the vLLM format is present in the supported formats map.

internal/adapter/discovery/http_client.go (1)

96-104: LGTM! vLLM correctly integrated into auto-detection.

The addition of vLLM to the auto-detection sequence is well-positioned and the updated comment accurately reflects the new detection order. The placement before OpenAI Compatible makes sense as vLLM can be detected more specifically.

test/scripts/logic/test-model-routing.sh (2)

34-34: Good default change to support cross-provider testing.

Changing the default provider to "openai" aligns well with Olla's unified API approach, allowing users to test cross-provider routing by default rather than being limited to a single provider.


95-99: Excellent examples showcasing vLLM integration.

The updated examples clearly demonstrate how to test different providers including the new vLLM support, making it easy for users to understand the available options.

internal/adapter/converter/lmstudio_converter.go (1)

4-4: Excellent refactoring to use constants.

Replacing hardcoded string literals with constants from the constants package improves maintainability and ensures consistency across the codebase. This aligns well with similar changes being made for other providers.

Also applies to: 36-36, 63-63

internal/adapter/registry/profile/parsers.go (1)

8-8: Excellent integration of vLLM parser with consistent refactoring.

The addition of vLLM parser support alongside the conversion of hardcoded strings to constants improves both functionality and maintainability. The fallback logic remains intact and the new parser follows the established pattern.

Also applies to: 21-32

readme.md (2)

10-10: Perfect badge update reflecting native vLLM support.

The change from "OpenAI Compatible" (light blue) to "Native Support" (light green) accurately reflects the enhanced vLLM integration provided by this PR.


43-49: Excellent documentation of vLLM native support.

The updated backend documentation clearly explains vLLM's native support capabilities including model unification and provides helpful usage paths. This gives users clear guidance on how to leverage the new functionality.

internal/adapter/registry/profile/vllm_parser.go (1)

10-73: Well-structured parser implementation with robust metadata handling.

The parser correctly handles empty input, validates model IDs, and conditionally populates metadata fields. The use of hasDetails flag ensures only populated details are attached to models.
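The defensive behaviour described here, empty input yielding an empty result and blank model IDs being skipped, can be sketched as follows. `parseModels` and the response struct are illustrative stand-ins and not Olla's actual parser; the field names follow the OpenAI-style payload vLLM emits.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelsResponse is a minimal, assumed shape of vLLM's /v1/models payload.
type modelsResponse struct {
	Data []struct {
		ID          string `json:"id"`
		MaxModelLen int    `json:"max_model_len"`
	} `json:"data"`
}

// parseModels mirrors the defensive pattern described above: empty input
// yields an empty slice rather than an error, and entries without an ID are
// skipped instead of producing blank models.
func parseModels(body []byte) ([]string, error) {
	if len(body) == 0 {
		return []string{}, nil
	}
	var resp modelsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	models := make([]string, 0, len(resp.Data))
	for _, m := range resp.Data {
		if m.ID == "" {
			continue
		}
		models = append(models, m.ID)
	}
	return models, nil
}

func main() {
	body := []byte(`{"data":[{"id":"meta-llama/Llama-3-8B","max_model_len":8192},{"id":""}]}`)
	models, _ := parseModels(body)
	fmt.Println(models)
}
```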

test/scripts/logic/test-provider-routing.sh (1)

287-299: Comprehensive vLLM endpoint testing added.

The test coverage for vLLM endpoints is excellent, including both model listing and proxy routing for chat completions and completions endpoints.

test/scripts/streaming/test-streaming-responses.py (3)

47-52: vLLM provider correctly added to supported providers.

The integration follows the established pattern consistently across the codebase.


166-178: Well-implemented vLLM streaming test method.

The method follows the same pattern as other providers, using the correct OpenAI-compatible endpoint format that vLLM supports.


239-239: Correct categorisation of vLLM as SSE-format provider.

vLLM uses OpenAI-compatible Server-Sent Events format for streaming, so grouping it with OpenAI and LM Studio is appropriate.

test/scripts/logic/test-provider-models.sh (1)

44-44: Good clarification of vLLM enhanced metadata.

The banner update accurately reflects that vLLM provides enhanced metadata compared to standard OpenAI format.

internal/adapter/converter/vllm_converter_test.go (5)

13-97: Comprehensive conversion testing with realistic test data.

The tests cover the main conversion scenarios thoroughly, including metadata extraction, ownership determination, and permission generation. The test data uses realistic model identifiers and endpoint configurations.


99-134: Good edge case coverage for models without vLLM native names.

This test ensures the converter gracefully handles models that don't have explicit vLLM names, falling back to aliases appropriately.


136-175: Proper filtering validation ensures correct model selection.

The test verifies that model filters work correctly with the vLLM format, which is essential for provider-specific endpoints.


209-245: Thorough testing of helper method edge cases.

The determineOwner tests cover various model ID formats and ensure consistent ownership extraction logic.


247-289: Complete coverage of vLLM name resolution logic.

The findVLLMNativeName tests validate the priority order: source endpoints first, then vLLM aliases, ensuring correct model name resolution.

internal/adapter/registry/profile/vllm_parser_test.go (1)

12-463: Excellent test coverage!

The test suite is comprehensive and well-structured, covering valid responses, edge cases, error handling, and performance considerations. The tests properly validate the parser's behaviour across typical and boundary conditions.

test/scripts/logic/test-model-routing-provider.py (1)

4-4: LGTM! Consistent vLLM provider integration

The vLLM provider has been integrated consistently with the existing provider patterns. All necessary endpoints and test cases have been added appropriately.

Also applies to: 39-40, 117-117, 147-147, 233-238, 346-376, 417-417, 431-432, 435-435

config/profiles/vllm.yaml (1)

14-22: Confirm OpenAI-compat flag vs custom response format.

openai_compatible: true together with response_format: "vllm" assumes the new converter correctly rewrites native-vLLM payloads into OpenAI shape. If any endpoints bypass the converter, mismatches might surface.

Please run an integration test covering:

  1. /v1/chat/completions – streaming & non-streaming
  2. /v1/embeddings – batch inputs
  3. Error responses (e.g., 429/503)

to ensure Olla’s OpenAI shim remains contract-compatible.

Comment on lines +121 to +138
func (c *VLLMConverter) findVLLMNativeName(model *domain.UnifiedModel) string {
    for _, endpoint := range model.SourceEndpoints {
        // Check if this is from a vLLM endpoint based on the native name format
        if strings.Contains(endpoint.NativeName, "/") {
            // vLLM models typically use organisation/model-name format
            return endpoint.NativeName
        }
    }

    // Check aliases for vLLM source
    for _, alias := range model.Aliases {
        if alias.Source == constants.ProviderTypeVLLM {
            return alias.Name
        }
    }

    return ""
}

🛠️ Refactor suggestion

Improve vLLM model name detection logic

The current logic assumes any model name containing "/" is a vLLM model, which is a weak heuristic that could incorrectly match other formats.

Consider checking the endpoint type or adding a more specific vLLM indicator:

```diff
 func (c *VLLMConverter) findVLLMNativeName(model *domain.UnifiedModel) string {
 	for _, endpoint := range model.SourceEndpoints {
-		// Check if this is from a vLLM endpoint based on the native name format
-		if strings.Contains(endpoint.NativeName, "/") {
-			// vLLM models typically use organisation/model-name format
-			return endpoint.NativeName
+		// Check if this is from a vLLM endpoint
+		if endpoint.Type == constants.ProviderTypeVLLM && endpoint.NativeName != "" {
+			return endpoint.NativeName
 		}
 	}
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
func (c *VLLMConverter) findVLLMNativeName(model *domain.UnifiedModel) string {
	for _, endpoint := range model.SourceEndpoints {
		// Check if this is from a vLLM endpoint
		if endpoint.Type == constants.ProviderTypeVLLM && endpoint.NativeName != "" {
			return endpoint.NativeName
		}
	}

	// Check aliases for vLLM source
	for _, alias := range model.Aliases {
		if alias.Source == constants.ProviderTypeVLLM {
			return alias.Name
		}
	}

	return ""
}
```
🤖 Prompt for AI Agents
In internal/adapter/converter/vllm_converter.go around lines 121 to 138, the
current logic identifies vLLM models by checking if the NativeName contains "/",
which is too broad and may cause false positives. To fix this, enhance the
detection by verifying the endpoint's type or adding a more specific indicator
that confirms it is a vLLM source before returning the NativeName. This could
involve checking a dedicated field or constant that explicitly marks the
endpoint as vLLM.

Comment on lines +3 to +8
import (
"fmt"
"time"

"github.com/thushan/olla/internal/core/domain"
)

⚠️ Potential issue

Missing json import required for unmarshaling.

Line 18 uses json.Unmarshal() but the encoding/json package is not imported.

Add the missing import:

```diff
 import (
 	"fmt"
 	"time"
+	"encoding/json"

 	"github.com/thushan/olla/internal/core/domain"
 )
```
📝 Committable suggestion

```go
import (
	"encoding/json"
	"fmt"
	"time"

	"github.com/thushan/olla/internal/core/domain"
)
```
🤖 Prompt for AI Agents
In internal/adapter/registry/profile/vllm_parser.go around lines 3 to 8, the
encoding/json package is missing from the imports, but json.Unmarshal() is used
later in the file. Add "encoding/json" to the import block to enable JSON
unmarshaling functionality.

Comment on lines +22 to +35

```go
type VLLMModelPermission struct {
	Group              *string `json:"group"`
	ID                 string  `json:"id"`
	Object             string  `json:"object"`
	Organization       string  `json:"organization"`
	Created            int64   `json:"created"`
	AllowCreateEngine  bool    `json:"allow_create_engine"`
	AllowSampling      bool    `json:"allow_sampling"`
	AllowLogprobs      bool    `json:"allow_logprobs"`
	AllowSearchIndices bool    `json:"allow_search_indices"`
	AllowView          bool    `json:"allow_view"`
	AllowFineTuning    bool    `json:"allow_fine_tuning"`
	IsBlocking         bool    `json:"is_blocking"`
}
```

🛠️ Refactor suggestion

Eliminate duplicate struct definition to improve maintainability

The VLLMModelPermission struct is duplicated in internal/adapter/converter/vllm_converter.go (lines 30-43). This violates the DRY principle and could lead to inconsistencies during maintenance.

Consider having the converter import and use the types from this package instead:

```go
// In internal/adapter/converter/vllm_converter.go
import "github.com/thushan/olla/internal/adapter/registry/profile"

// Then use profile.VLLMModelPermission instead of defining it again
```
🤖 Prompt for AI Agents
In internal/adapter/registry/profile/vllm.go lines 22 to 35, the
VLLMModelPermission struct is defined but duplicated in
internal/adapter/converter/vllm_converter.go lines 30 to 43. To fix this, remove
the duplicate struct definition from vllm_converter.go and import the profile
package there. Then update all references to use profile.VLLMModelPermission
instead of the local definition, ensuring a single source of truth for this
struct.

Comment on lines +6 to +22
import (
"context"
"fmt"
"log/slog"
"net/http"
"os"
"strings"
"testing"
"time"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/thushan/olla/internal/adapter/discovery"
"github.com/thushan/olla/internal/adapter/registry/profile"
"github.com/thushan/olla/internal/core/domain"
"github.com/thushan/olla/internal/logger"
)

⚠️ Potential issue

Add missing json import

The json package is used in TestVLLMChatCompletion_Integration but not imported.

Add the missing import:

```diff
 import (
 	"context"
+	"encoding/json"
 	"fmt"
 	"log/slog"
 	"net/http"
```
📝 Committable suggestion

```go
import (
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
	"net/http"
	"os"
	"strings"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"github.com/thushan/olla/internal/adapter/discovery"
	"github.com/thushan/olla/internal/adapter/registry/profile"
	"github.com/thushan/olla/internal/core/domain"
	"github.com/thushan/olla/internal/logger"
)
```
🤖 Prompt for AI Agents
In internal/integration/providers/vllm_integration_test.go around lines 6 to 22,
the json package is used in the test function but not imported. Add the missing
import "encoding/json" to the import block to resolve this issue.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/adapter/converter/vllm_converter_test.go (1)

13-97: Comprehensive test coverage with one minor improvement needed.

The test thoroughly validates model conversion including metadata, permissions, and edge cases. However, the inline array creation pattern on line 47 is unnecessarily complex.

Apply this diff to improve readability:

```diff
-				MaxContextLength: &[]int64{131072}[0],
+				MaxContextLength: &maxContext2,
```

And add the variable declaration at the top:

```diff
 		maxContext := int64(2048)
+		maxContext2 := int64(131072)
```
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8393afd and 8f7acd4.

📒 Files selected for processing (3)
  • internal/adapter/converter/vllm_converter.go (1 hunks)
  • internal/adapter/converter/vllm_converter_test.go (1 hunks)
  • internal/adapter/registry/profile/vllm.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/adapter/registry/profile/vllm.go
  • internal/adapter/converter/vllm_converter.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*_test.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

Unit tests should test individual components in isolation.

Files:

  • internal/adapter/converter/vllm_converter_test.go
internal/{app,adapter}/**/*.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

Endpoints should be exposed at /internal/health and /internal/status.

Files:

  • internal/adapter/converter/vllm_converter_test.go
🧠 Learnings (3)
📚 Learning: applies to **/*_test.go : unit tests should test individual components in isolation....
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation.

Applied to files:

  • internal/adapter/converter/vllm_converter_test.go
📚 Learning: applies to internal/adapter/proxy/*_test.go : shared proxy tests should ensure compatibility between...
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines.

Applied to files:

  • internal/adapter/converter/vllm_converter_test.go
📚 Learning: applies to internal/adapter/proxy/*_test.go : integration tests should test the full request flow th...
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to internal/adapter/proxy/*_test.go : Integration tests should test the full request flow through the proxy.

Applied to files:

  • internal/adapter/converter/vllm_converter_test.go
🔇 Additional comments (6)
internal/adapter/converter/vllm_converter_test.go (6)

3-11: Imports are appropriate and well-organised.

The imports include necessary testing dependencies and internal packages required for the converter tests.


99-134: Well-designed fallback behaviour test.

The test properly validates handling of models without vLLM native names, including default ownership assignment and nil handling for optional fields.


136-175: Filtering logic is properly tested.

The test validates that model filters work correctly with the vLLM format, ensuring only models matching the criteria are returned.


177-202: Excellent permissions validation test.

The test ensures consistent permission generation with deterministic IDs and appropriate timestamp validation using a reasonable tolerance window.


204-207: Appropriate test for getter method.

Simple and effective test for the format name retrieval functionality.


209-310: Thorough testing of helper methods.

The table-driven test for determineOwner covers various naming patterns effectively. The findVLLMNativeName tests properly validate the critical logic of only considering vLLM sources for native names, which prevents incorrect assumptions about model compatibility.

@thushan thushan merged commit 4077925 into main Aug 7, 2025
3 checks passed
@thushan thushan deleted the feature/backend/vllm branch August 7, 2025 06:20
@coderabbitai coderabbitai bot mentioned this pull request Sep 26, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 8, 2025