Conversation
- adds vllm converter for parsing
- fix parsing errors when no models are present in vllm
- change healthcheck endpoint for vllm to standard one
- Update tests for converter factory
- add vllm profile
- fix vllm profile typo
Walkthrough

This change introduces native vLLM support throughout the codebase. It adds a vLLM converter, parser, and profile; updates configuration and documentation; and extends discovery, routing, and test scripts to handle vLLM as a first-class provider. New integration and unit tests ensure correct vLLM behaviour and compatibility.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant OLLA
    participant vLLM Server
    Client->>OLLA: GET /olla/vllm/v1/models
    OLLA->>vLLM Server: GET /v1/models
    vLLM Server-->>OLLA: vLLM models response (native format)
    OLLA-->>Client: vLLM models response (native format)
    Client->>OLLA: POST /olla/vllm/v1/chat/completions
    OLLA->>vLLM Server: POST /v1/chat/completions
    vLLM Server-->>OLLA: Chat completion response
    OLLA-->>Client: Chat completion response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 5
🧹 Nitpick comments (7)
readme.md (1)
43-49: Fix markdown list style consistency.

The static analysis tool flagged inconsistent list formatting: the project uses dashes for unordered lists, but these lines use asterisks.
Apply this diff to maintain consistent markdown formatting:
```diff
-* [Ollama](https://github.com/ollama/ollama) - native support for Ollama, including model unification. \
+- [Ollama](https://github.com/ollama/ollama) - native support for Ollama, including model unification. \
   Use: `/olla/ollama/`
-* [LM Studio](https://lmstudio.ai/) - native support for LMStudio, including model unification. \
+- [LM Studio](https://lmstudio.ai/) - native support for LMStudio, including model unification. \
   Use: `/olla/lmstudio/` || `/olla/lm-studio/` || `/olla/lm_studio/`
-* [vLLM](https://github.com/vllm-project/vllm) - native support for vllm, including model unification. \
+- [vLLM](https://github.com/vllm-project/vllm) - native support for vllm, including model unification. \
   Use: `/olla/vllm/` \
   Models from vLLM will be available under `/olla/models` and `/olla/vllm/v1/models`
-* [OpenAI](https://platform.openai.com/docs/overview) - You can use OpenAI API that provides a unified query API across all providers. \
+- [OpenAI](https://platform.openai.com/docs/overview) - You can use OpenAI API that provides a unified query API across all providers. \
   Use: `/olla/openai/`
```

test/scripts/logic/test-provider-routing.sh (1)
202-210: Address shellcheck warnings about variable declarations.

The static analysis correctly identifies that declaring and assigning variables in one line can mask return values from the curl commands.
Apply this diff to address the warnings:
```diff
-    local vllm_response=$(curl -s "${TARGET_URL}/olla/vllm/v1/models" 2>/dev/null || echo '{}')
-    local vllm_count=$(echo "$vllm_response" | grep -o '"id"' | wc -l)
-
-    local openai_response=$(curl -s "${TARGET_URL}/olla/openai/v1/models" 2>/dev/null || echo '{}')
-    local openai_count=$(echo "$openai_response" | grep -o '"id"' | wc -l)
+    local vllm_response
+    vllm_response=$(curl -s "${TARGET_URL}/olla/vllm/v1/models" 2>/dev/null || echo '{}')
+    local vllm_count
+    vllm_count=$(echo "$vllm_response" | grep -o '"id"' | wc -l)
+
+    local openai_response
+    openai_response=$(curl -s "${TARGET_URL}/olla/openai/v1/models" 2>/dev/null || echo '{}')
+    local openai_count
+    openai_count=$(echo "$openai_response" | grep -o '"id"' | wc -l)
```

test/scripts/logic/test-provider-models.sh (1)
222-232: Excellent vLLM-specific metadata validation.

The test correctly validates both the standard OpenAI format structure and vLLM-specific fields like `max_model_len`, ensuring the enhanced metadata is properly exposed.

Address the shellcheck warning about variable declaration:

```diff
-    local model_count=$(echo "$response_body" | grep -o '"id"' | wc -l)
+    local model_count
+    model_count=$(echo "$response_body" | grep -o '"id"' | wc -l)
```

internal/integration/providers/vllm_integration_test.go (1)
148-151: Improve error handling for EOF conditions.

The EOF handling could be more idiomatic. Use `io.ReadAll` for simpler and more robust reading:

```diff
-	// Read response body properly
-	buf := make([]byte, 10240)
-	n, err := resp.Body.Read(buf)
-	if err != nil && err.Error() != "EOF" && err.Error() != "unexpected EOF" {
-		require.NoError(t, err)
-	}
+	// Read response body
+	body, err := io.ReadAll(resp.Body)
+	require.NoError(t, err)

-	models, err := vllmProfile.ParseModelsResponse(buf[:n])
+	models, err := vllmProfile.ParseModelsResponse(body)
```

Don't forget to add `"io"` to the imports.

config/profiles/vllm.yaml (3)
126-148: Remove trailing spaces & ensure POSIX newline.

`yamllint` flags trailing whitespace on lines 126, 133 & 144 and a missing newline at EOF (line 148). Keeping the file lint-clean avoids noisy CI failures.

```diff
-  - patterns: ["*1b*", "*1.1b*", "*1.5b*"]␠
+  - patterns: ["*1b*", "*1.1b*", "*1.5b*"]
@@
-    requires_gpu: true␠
+    requires_gpu: true
@@
-    max_concurrent: 100␠
+    max_concurrent: 100
+# (EOF now ends with a single newline)
```
135-143: Verify matching precedence in `concurrency_limits`.

`concurrency_limits` is evaluated top-to-bottom by the registry parser. Because thresholds are listed from highest (`min_memory_gb: 100`) to lowest (0), a lookup using `>=` may accidentally match multiple entries, with only the first honoured. Double-check the parsing logic and, if it keeps the last match, reverse the order for clarity:

```yaml
concurrency_limits:
  - min_memory_gb: 0     # smaller models
    max_concurrent: 100
  - min_memory_gb: 20    # 13B+ models
    max_concurrent: 50
  - min_memory_gb: 50    # 30B+ models
    max_concurrent: 20
  - min_memory_gb: 100   # 70B+ models
    max_concurrent: 10
```
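A first-match scan over thresholds like these only picks the tightest bucket if the slice is ordered highest-first. A minimal sketch of that lookup (field and function names are ours, not the registry parser's):

```go
package main

import "fmt"

type limit struct {
	minMemoryGB   int
	maxConcurrent int
}

// limits mirror the profile's concurrency_limits, ordered highest
// threshold first so a linear first-match scan returns the tightest
// applicable bucket.
var limits = []limit{
	{100, 10}, // 70B+ models
	{50, 20},  // 30B+ models
	{20, 50},  // 13B+ models
	{0, 100},  // smaller models
}

// maxConcurrentFor returns the concurrency cap for a model of the
// given memory footprint, or 0 if no bucket matches.
func maxConcurrentFor(memGB int) int {
	for _, l := range limits {
		if memGB >= l.minMemoryGB {
			return l.maxConcurrent
		}
	}
	return 0
}

func main() {
	fmt.Println(maxConcurrentFor(30)) // 13B-class model falls into the 20GB bucket
}
```

If the real parser instead keeps the *last* match, the YAML must be ordered lowest-first to get the same result, which is exactly the ambiguity the comment above asks to verify.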
36-38: Consider adding common alternative ports to `default_ports`.

Many self-hosted vLLM demos run on `8080` or `8001`; adding them improves the auto-discovery hit rate without harm.

```diff
 default_ports:
   - 8000
+  - 8001
+  - 8080
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- config/config.yaml (1 hunks)
- config/profiles/vllm.yaml (4 hunks)
- internal/adapter/converter/factory.go (1 hunks)
- internal/adapter/converter/factory_test.go (2 hunks)
- internal/adapter/converter/lmstudio_converter.go (3 hunks)
- internal/adapter/converter/ollama_converter.go (3 hunks)
- internal/adapter/converter/vllm_converter.go (1 hunks)
- internal/adapter/converter/vllm_converter_test.go (1 hunks)
- internal/adapter/discovery/http_client.go (1 hunks)
- internal/adapter/registry/profile/parsers.go (2 hunks)
- internal/adapter/registry/profile/vllm.go (1 hunks)
- internal/adapter/registry/profile/vllm_parser.go (1 hunks)
- internal/adapter/registry/profile/vllm_parser_test.go (1 hunks)
- internal/core/domain/profile.go (1 hunks)
- internal/integration/providers/vllm_integration_test.go (1 hunks)
- readme.md (2 hunks)
- test/scripts/logic/test-model-routing-provider.py (8 hunks)
- test/scripts/logic/test-model-routing.sh (2 hunks)
- test/scripts/logic/test-provider-models.sh (3 hunks)
- test/scripts/logic/test-provider-routing.sh (3 hunks)
- test/scripts/streaming/test-streaming-responses.py (9 hunks)
🪛 markdownlint-cli2 (0.17.2)
readme.md
43-43: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
45-45: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
47-47: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
🪛 Shellcheck (0.10.0)
test/scripts/logic/test-provider-models.sh
[warning] 226-226: Declare and assign separately to avoid masking return values.
(SC2155)
test/scripts/logic/test-provider-routing.sh
[warning] 203-203: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 204-204: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 208-208: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 209-209: Declare and assign separately to avoid masking return values.
(SC2155)
🪛 YAMLlint (1.37.1)
config/profiles/vllm.yaml
[error] 126-126: trailing spaces
(trailing-spaces)
[error] 133-133: trailing spaces
(trailing-spaces)
[error] 144-144: trailing spaces
(trailing-spaces)
[error] 148-148: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (30)
internal/core/domain/profile.go (1)
6-6: LGTM! Consistent constant definition.

The addition of `ProfileVLLM` follows the established pattern and provides a centralised definition for the vLLM provider identifier.

config/config.yaml (1)
64-71: Verify vLLM-specific health check endpoint and timing values.

The configuration uses `/health` as the health check URL and longer intervals (5s check, 2s timeout) compared to other providers. Please verify these values are appropriate for vLLM's health check behaviour.

internal/adapter/converter/factory.go (1)
25-25: LGTM! Proper converter registration.

The VLLM converter is correctly registered following the established pattern used by other converters.

internal/adapter/converter/ollama_converter.go (3)

6-6: LGTM! Good practice to centralise constants.

Adding the constants import enables the use of centralised provider type definitions instead of magic strings.

42-42: LGTM! Eliminates magic string.

Replacing hardcoded "ollama" with `constants.ProviderTypeOllama` improves maintainability and consistency.

69-69: LGTM! Consistent constant usage.

Using the same constant for alias source checking maintains consistency with the GetFormatName method.
internal/adapter/converter/factory_test.go (3)
50-50: LGTM! Test correctly validates vLLM format in error messages.

The test appropriately checks that "vllm" is included in the supported formats list within error messages.

55-55: LGTM! Correct format count update.

The expected count is properly updated to 5 to account for the addition of the vLLM format.

66-66: LGTM! Complete test coverage for vLLM format.

The test correctly verifies that the vLLM format is present in the supported formats map.
internal/adapter/discovery/http_client.go (1)
96-104: LGTM! vLLM correctly integrated into auto-detection.

The addition of vLLM to the auto-detection sequence is well positioned, and the updated comment accurately reflects the new detection order. Placing it before OpenAI Compatible makes sense, as vLLM can be detected more specifically.
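The ordering concern can be illustrated with a tiny first-match probe chain; the detector predicates here are invented stand-ins, not the real detection logic:

```go
package main

import (
	"fmt"
	"strings"
)

// detector reports whether a models payload looks like its provider.
type detector struct {
	name  string
	match func(body string) bool
}

// Ordered most-specific first: vLLM must be tried before the generic
// OpenAI-compatible fallback, because vLLM responses are a superset of
// the OpenAI shape and would otherwise be misclassified.
var detectors = []detector{
	{"ollama", func(b string) bool { return strings.Contains(b, `"models"`) }},
	{"vllm", func(b string) bool { return strings.Contains(b, `"max_model_len"`) }},
	{"openai-compatible", func(b string) bool { return strings.Contains(b, `"object":"list"`) }},
}

// detect returns the first provider whose predicate matches the payload.
func detect(body string) string {
	for _, d := range detectors {
		if d.match(body) {
			return d.name
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(detect(`{"object":"list","data":[{"id":"m","max_model_len":4096}]}`)) // vllm
}
```

Swapping the last two entries would make every vLLM payload detect as "openai-compatible", which is why the placement noted above matters.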
test/scripts/logic/test-model-routing.sh (2)
34-34: Good default change to support cross-provider testing.

Changing the default provider to "openai" aligns well with Olla's unified API approach, allowing users to test cross-provider routing by default rather than being limited to a single provider.

95-99: Excellent examples showcasing vLLM integration.

The updated examples clearly demonstrate how to test different providers, including the new vLLM support, making it easy for users to understand the available options.
internal/adapter/converter/lmstudio_converter.go (1)
4-4: Excellent refactoring to use constants.

Replacing hardcoded string literals with constants from the `constants` package improves maintainability and ensures consistency across the codebase. This aligns well with similar changes being made for other providers.

Also applies to: 36-36, 63-63
internal/adapter/registry/profile/parsers.go (1)
8-8: Excellent integration of vLLM parser with consistent refactoring.

The addition of vLLM parser support alongside the conversion of hardcoded strings to constants improves both functionality and maintainability. The fallback logic remains intact and the new parser follows the established pattern.
Also applies to: 21-32
readme.md (2)
10-10: Perfect badge update reflecting native vLLM support.

The change from "OpenAI Compatible" (light blue) to "Native Support" (light green) accurately reflects the enhanced vLLM integration provided by this PR.

43-49: Excellent documentation of vLLM native support.

The updated backend documentation clearly explains vLLM's native support capabilities, including model unification, and provides helpful usage paths. This gives users clear guidance on how to leverage the new functionality.
internal/adapter/registry/profile/vllm_parser.go (1)
10-73: Well-structured parser implementation with robust metadata handling.

The parser correctly handles empty input, validates model IDs, and conditionally populates metadata fields. The use of the `hasDetails` flag ensures only populated details are attached to models.

test/scripts/logic/test-provider-routing.sh (1)

287-299: Comprehensive vLLM endpoint testing added.

The test coverage for vLLM endpoints is excellent, including both model listing and proxy routing for chat completions and completions endpoints.
test/scripts/streaming/test-streaming-responses.py (3)
47-52: vLLM provider correctly added to supported providers.

The integration follows the established pattern consistently across the codebase.

166-178: Well-implemented vLLM streaming test method.

The method follows the same pattern as other providers, using the correct OpenAI-compatible endpoint format that vLLM supports.

239-239: Correct categorisation of vLLM as an SSE-format provider.

vLLM uses OpenAI-compatible Server-Sent Events for streaming, so grouping it with OpenAI and LM Studio is appropriate.
test/scripts/logic/test-provider-models.sh (1)
44-44: Good clarification of vLLM enhanced metadata.

The banner update accurately reflects that vLLM provides enhanced metadata compared to the standard OpenAI format.
internal/adapter/converter/vllm_converter_test.go (5)
13-97: Comprehensive conversion testing with realistic test data.

The tests cover the main conversion scenarios thoroughly, including metadata extraction, ownership determination, and permission generation. The test data uses realistic model identifiers and endpoint configurations.

99-134: Good edge case coverage for models without vLLM native names.

This test ensures the converter gracefully handles models that don't have explicit vLLM names, falling back to aliases appropriately.

136-175: Proper filtering validation ensures correct model selection.

The test verifies that model filters work correctly with the vLLM format, which is essential for provider-specific endpoints.

209-245: Thorough testing of helper method edge cases.

The `determineOwner` tests cover various model ID formats and ensure consistent ownership extraction logic.

247-289: Complete coverage of vLLM name resolution logic.

The `findVLLMNativeName` tests validate the priority order: source endpoints first, then vLLM aliases, ensuring correct model name resolution.

internal/adapter/registry/profile/vllm_parser_test.go (1)
12-463: Excellent test coverage!

The test suite is comprehensive and well-structured, covering valid responses, edge cases, error handling, and performance considerations. The tests properly validate the parser's behaviour across typical and boundary conditions.
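The behaviour being tested — skipping entries without an id, tolerating empty input, and surfacing vLLM's `max_model_len` — can be sketched in a few lines (struct and function names are ours, not the actual parser's):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type vllmModel struct {
	ID          string `json:"id"`
	MaxModelLen *int64 `json:"max_model_len,omitempty"` // vLLM-specific extension
}

type modelsResponse struct {
	Object string      `json:"object"`
	Data   []vllmModel `json:"data"`
}

// parseModels decodes a vLLM /v1/models payload, returning nil for
// empty input and dropping entries that lack an id.
func parseModels(raw []byte) ([]vllmModel, error) {
	if len(raw) == 0 {
		return nil, nil
	}
	var resp modelsResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	out := make([]vllmModel, 0, len(resp.Data))
	for _, m := range resp.Data {
		if m.ID == "" {
			continue
		}
		out = append(out, m)
	}
	return out, nil
}

func main() {
	raw := []byte(`{"object":"list","data":[{"id":"meta-llama/Llama-3-8B","max_model_len":8192},{"id":""}]}`)
	models, err := parseModels(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(models), models[0].ID, *models[0].MaxModelLen)
}
```

Using a pointer for `max_model_len` keeps "field absent" distinguishable from "field zero", which matters when only populated details should be attached to a model.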
test/scripts/logic/test-model-routing-provider.py (1)
4-4: LGTM! Consistent vLLM provider integration.

The vLLM provider has been integrated consistently with the existing provider patterns. All necessary endpoints and test cases have been added appropriately.
Also applies to: 39-40, 117-117, 147-147, 233-238, 346-376, 417-417, 431-432, 435-435
config/profiles/vllm.yaml (1)
14-22: Confirm OpenAI-compat flag vs custom response format.
`openai_compatible: true` together with `response_format: "vllm"` assumes the new converter correctly rewrites native-vLLM payloads into OpenAI shape. If any endpoints bypass the converter, mismatches might surface.

Please run an integration test covering:

- `/v1/chat/completions` – streaming & non-streaming
- `/v1/embeddings` – batch inputs
- Error responses (e.g., 429/503)
to ensure Olla’s OpenAI shim remains contract-compatible.
```go
func (c *VLLMConverter) findVLLMNativeName(model *domain.UnifiedModel) string {
	for _, endpoint := range model.SourceEndpoints {
		// Check if this is from a vLLM endpoint based on the native name format
		if strings.Contains(endpoint.NativeName, "/") {
			// vLLM models typically use organisation/model-name format
			return endpoint.NativeName
		}
	}

	// Check aliases for vLLM source
	for _, alias := range model.Aliases {
		if alias.Source == constants.ProviderTypeVLLM {
			return alias.Name
		}
	}

	return ""
}
```
🛠️ Refactor suggestion
Improve vLLM model name detection logic
The current logic assumes any model name containing "/" is a vLLM model, which is a weak heuristic that could incorrectly match other formats.
Consider checking the endpoint type or adding a more specific vLLM indicator:
```diff
 func (c *VLLMConverter) findVLLMNativeName(model *domain.UnifiedModel) string {
 	for _, endpoint := range model.SourceEndpoints {
-		// Check if this is from a vLLM endpoint based on the native name format
-		if strings.Contains(endpoint.NativeName, "/") {
-			// vLLM models typically use organisation/model-name format
-			return endpoint.NativeName
+		// Check if this is from a vLLM endpoint
+		if endpoint.Type == constants.ProviderTypeVLLM && endpoint.NativeName != "" {
+			return endpoint.NativeName
 		}
 	}
```
```go
import (
	"fmt"
	"time"

	"github.com/thushan/olla/internal/core/domain"
)
```
Missing json import required for unmarshaling.
Line 18 uses json.Unmarshal() but the encoding/json package is not imported.
Add the missing import:
```diff
 import (
 	"fmt"
 	"time"
+	"encoding/json"

 	"github.com/thushan/olla/internal/core/domain"
 )
```
```go
type VLLMModelPermission struct {
	Group              *string `json:"group"`
	ID                 string  `json:"id"`
	Object             string  `json:"object"`
	Organization       string  `json:"organization"`
	Created            int64   `json:"created"`
	AllowCreateEngine  bool    `json:"allow_create_engine"`
	AllowSampling      bool    `json:"allow_sampling"`
	AllowLogprobs      bool    `json:"allow_logprobs"`
	AllowSearchIndices bool    `json:"allow_search_indices"`
	AllowView          bool    `json:"allow_view"`
	AllowFineTuning    bool    `json:"allow_fine_tuning"`
	IsBlocking         bool    `json:"is_blocking"`
}
```
🛠️ Refactor suggestion
Eliminate duplicate struct definition to improve maintainability
The VLLMModelPermission struct is duplicated in internal/adapter/converter/vllm_converter.go (lines 30-43). This violates the DRY principle and could lead to inconsistencies during maintenance.
Consider having the converter import and use the types from this package instead:
```go
// In internal/adapter/converter/vllm_converter.go
import "github.com/thushan/olla/internal/adapter/registry/profile"

// Then use profile.VLLMModelPermission instead of defining it again
```
In internal/adapter/registry/profile/vllm.go lines 22 to 35, the
VLLMModelPermission struct is defined but duplicated in
internal/adapter/converter/vllm_converter.go lines 30 to 43. To fix this, remove
the duplicate struct definition from vllm_converter.go and import the profile
package there. Then update all references to use profile.VLLMModelPermission
instead of the local definition, ensuring a single source of truth for this
struct.
```go
import (
	"context"
	"fmt"
	"log/slog"
	"net/http"
	"os"
	"strings"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"github.com/thushan/olla/internal/adapter/discovery"
	"github.com/thushan/olla/internal/adapter/registry/profile"
	"github.com/thushan/olla/internal/core/domain"
	"github.com/thushan/olla/internal/logger"
)
```
Add missing json import
The json package is used in TestVLLMChatCompletion_Integration but not imported.
Add the missing import:
```diff
 import (
 	"context"
+	"encoding/json"
 	"fmt"
 	"log/slog"
 	"net/http"
```
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/adapter/converter/vllm_converter_test.go (1)
13-97: Comprehensive test coverage with one minor improvement needed.

The test thoroughly validates model conversion including metadata, permissions, and edge cases. However, the inline array creation pattern on line 47 is unnecessarily complex.

Apply this diff to improve readability:

```diff
-		MaxContextLength: &[]int64{131072}[0],
+		MaxContextLength: &maxContext2,
```

And add the variable declaration at the top:

```diff
 	maxContext := int64(2048)
+	maxContext2 := int64(131072)
```
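As an alternative to a second named variable, a tiny generic helper (Go 1.18+) removes the `&[]int64{131072}[0]` contortion entirely; the helper name is ours, not from the codebase:

```go
package main

import "fmt"

// ptr returns a pointer to v, avoiding the &[]T{v}[0] trick otherwise
// needed to take the address of a literal.
func ptr[T any](v T) *T { return &v }

type modelInfo struct {
	MaxContextLength *int64
}

func main() {
	m := modelInfo{MaxContextLength: ptr(int64(131072))}
	fmt.Println(*m.MaxContextLength) // 131072
}
```

Each call returns the address of its own copy of the argument, so the resulting pointers are independent.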
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- internal/adapter/converter/vllm_converter.go (1 hunks)
- internal/adapter/converter/vllm_converter_test.go (1 hunks)
- internal/adapter/registry/profile/vllm.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/adapter/registry/profile/vllm.go
- internal/adapter/converter/vllm_converter.go
🔇 Additional comments (6)
internal/adapter/converter/vllm_converter_test.go (6)
3-11: Imports are appropriate and well organised.

The imports include necessary testing dependencies and internal packages required for the converter tests.

99-134: Well-designed fallback behaviour test.

The test properly validates handling of models without vLLM native names, including default ownership assignment and nil handling for optional fields.

136-175: Filtering logic is properly tested.

The test validates that model filters work correctly with the vLLM format, ensuring only models matching the criteria are returned.

177-202: Excellent permissions validation test.

The test ensures consistent permission generation with deterministic IDs and appropriate timestamp validation using a reasonable tolerance window.

204-207: Appropriate test for getter method.

Simple and effective test for the format name retrieval functionality.

209-310: Thorough testing of helper methods.

The table-driven test for `determineOwner` covers various naming patterns effectively. The `findVLLMNativeName` tests properly validate the critical logic of only considering vLLM sources for native names, which prevents incorrect assumptions about model compatibility.
This PR implements the backend provider type for vLLM.
You can configure a vLLM endpoint like this in the configuration (see the default `config.yaml` for an example).

Features supported:
TODO:
- Handle model changes across restarts (e.g. `tinyllama` shutdown, server comes back up with `llama4`, Olla thinks it's still okay to have / route to `tinyllama`).

Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores