Caution: Review failed. The pull request is closed.

Walkthrough

Adds a new “Compare” documentation section with five comparison/integration pages, extends README, FAQ, index and usage docs with cross-links and guidance about integrating Olla with LiteLLM, GPUStack, LocalAI, etc., and updates MkDocs navigation. All changes are documentation-only; no code or public API changes.

Sequence Diagram(s): omitted — changes are documentation-only and do not modify control flow.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 0
🧹 Nitpick comments (34)
docs/content/usage.md (1)
30-32: Align unordered list style to asterisk to satisfy markdownlint (MD004).

Earlier lists in this doc use asterisks; these three use dashes, which triggers lint. Consistent style also improves readability.

```diff
-- **Model Experimentation**: Easy switching between Ollama, LM Studio and OpenAI backends
-- **Resource Management**: Automatic failover when local resources are busy
-- **Cost Optimisation**: Priority routing (local first, cloud fallback via [LiteLLM](compare/litellm.md))
+* **Model Experimentation**: Easy switching between Ollama, LM Studio and OpenAI backends
+* **Resource Management**: Automatic failover when local resources are busy
+* **Cost Optimisation**: Priority routing (local first, cloud fallback via [LiteLLM](compare/litellm.md))
```

docs/content/compare/localai.md (6)
33-33: Add a language identifier to the fenced code block (MD040).

Specifying a language silences lint and improves rendering. For ASCII diagrams, use “text”.

````diff
-```
+```text
````
118-118: Add a language identifier to the fenced code block (MD040).

Same rationale as above; use “text” for ASCII diagrams.

````diff
-```
+```text
````
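The MD040 fixes above are mechanical, so they can be scripted across a docs tree. A rough sketch in Python, assuming plain CommonMark-style triple-backtick fences (no indented or tilde fences); the function name is made up for illustration:

```python
FENCE = "`" * 3  # built programmatically so this snippet nests cleanly in docs

def tag_bare_fences(markdown: str, lang: str = "text") -> str:
    # Add a language to opening code fences that lack one (markdownlint MD040).
    out, in_fence = [], False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            # Only an *opening* fence with no info string gets the language tag.
            if not in_fence and stripped == FENCE:
                line = FENCE + lang
            in_fence = not in_fence
        out.append(line)
    return "\n".join(out)

print(tag_bare_fences(FENCE + "\nApps -> Olla -> Endpoints\n" + FENCE))
```

Running something like this before markdownlint keeps the diagrams in monospace while silencing the warning.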
67-67: Remove trailing punctuation in headings (MD026).

Colons at the end of headings trigger lint and don’t add value.

```diff
-### Use Olla When:
+### Use Olla When
```
75-75: Remove trailing punctuation in headings (MD026).

```diff
-### Use LocalAI When:
+### Use LocalAI When
```
107-107: Remove trailing punctuation in headings (MD026).

```diff
-### Benefits:
+### Benefits
```
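The MD026 suggestions all apply the same one-line rule; a throwaway sketch of it in Python (the punctuation set mirrors markdownlint's default and is an assumption here):

```python
import re

def fix_md026(heading: str, punctuation: str = ".,;:!") -> str:
    # Strip trailing punctuation from an ATX heading (markdownlint MD026).
    return re.sub(rf"[{re.escape(punctuation)}]+\s*$", "", heading.rstrip())

print(fix_md026("### Use Olla When:"))  # → ### Use Olla When
```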
196-199: Qualify latency claims to avoid overprecision and environment-dependence.

Hard numbers can be misleading across environments. Suggest softening to “typical/approximate” to reflect variability.

```diff
-- **Direct to LocalAI**: Baseline
-- **Through Olla**: +2ms routing overhead
-- **Benefit**: Faster failover than timeout/retry
+- **Direct to LocalAI**: Baseline
+- **Through Olla**: ~2 ms routing overhead (typical; varies by environment)
+- **Benefit**: Faster failover than client-side timeout/retry in most deployments
```

If you have recent benchmarks, consider linking them here for transparency.
docs/content/compare/overview.md (3)
57-57: Add language identifiers to fenced code blocks (MD040).

Use “text” for ASCII diagrams to satisfy lint and improve rendering.

````diff
-```
+```text
````
68-68: Add language identifiers to fenced code blocks (MD040).

````diff
-```
+```text
````
79-79: Add language identifiers to fenced code blocks (MD040).

````diff
-```
+```text
````

docs/content/compare/litellm.md (6)
33-33: Add language identifier to fenced code block (MD040).

ASCII architecture diagrams should use “text” for lint compliance.

````diff
-```
+```text
````
41-41: Add language identifier to fenced code block (MD040).

````diff
-```
+```text
````
116-116: Add language identifier to fenced code block (MD040).

````diff
-```
+```text
````
74-74: Remove trailing punctuation in headings (MD026).

```diff
-### Use Olla When:
+### Use Olla When
```
82-82: Remove trailing punctuation in headings (MD026).

```diff
-### Use LiteLLM When:
+### Use LiteLLM When
```
156-161: Qualify latency figures to reflect environmental variance.

Numbers like “<2ms” and “10–50ms” vary by hardware/network. Softening the language avoids misleading readers.

```diff
-- **Olla alone**: <2ms overhead
-- **LiteLLM alone**: 10-50ms overhead
-- **Olla + LiteLLM**: ~12-52ms total overhead
+- **Olla alone**: typically <2 ms overhead
+- **LiteLLM alone**: typically 10–50 ms overhead
+- **Olla + LiteLLM**: ~12–52 ms total overhead (typical)
```

If you have internal measurements, link or footnote them for credibility.
docs/content/compare/gpustack.md (7)
33-33: Add language identifier to fenced code block (MD040).

Use “text” for the stack-position ASCII diagram.

````diff
-```
+```text
````
74-74: Remove trailing punctuation in headings (MD026).

```diff
-### Use Olla When:
+### Use Olla When
```
82-82: Remove trailing punctuation in headings (MD026).

```diff
-### Use GPUStack When:
+### Use GPUStack When
```
87-87: Fix minor verb agreement.

Reads more naturally as singular in this bullet list.

```diff
-- Require automatic model distribution
+- Requires automatic model distribution
```
136-136: Add language identifier to fenced code block (MD040).

Use “text” for the ASCII diagram.

````diff
-```
+```text
````
153-153: Add language identifier to fenced code block (MD040).

Use “text” for the ASCII diagram.

````diff
-```
+```text
````
163-163: Add language identifier to fenced code block (MD040).

Use “text” for the ASCII diagram.

````diff
-```
+```text
````

docs/content/compare/integration-patterns.md (7)
9-9: Tighten phrasing (missing article).

Minor copy edit for readability in en-AU: add an article before “robust LLM infrastructure”.

```diff
-This guide shows how to combine Olla with other tools to build robust LLM infrastructure for different use cases.
+This guide shows how to combine Olla with other tools to build a robust LLM infrastructure for different use cases.
```
17-24: Add a language to fenced block to satisfy markdownlint (MD040).

These ASCII diagrams are fenced without a language. Mark as text to fix linting and improve rendering.

````diff
-```
+```text
 Before: Apps → LLM Endpoints (single point of failure)
 After: Apps → Olla → LLM Endpoints (automatic failover)
        ├── Primary endpoint
        ├── Secondary endpoint
        └── Tertiary endpoint
````

---

`50-55`: **Add a language to fenced block to satisfy markdownlint (MD040)**

Same issue here; use “text”.

````diff
-```
+```text
 Olla
 ├── Local GPU (priority 1)
 ├── [LiteLLM](./litellm.md) → Cloud APIs (priority 10)
 └── [LocalAI](./localai.md)/[Ollama](https://github.com/ollama/ollama) (priority 2)
````

---

`85-89`: **Add a language to fenced block to satisfy markdownlint (MD040)**

Marking this as “text” keeps the monospace diagram formatting while passing linting.

````diff
-```
+```text
 Team A Apps → Olla Config A → Team A Resources
 Team B Apps → Olla Config B → Shared Resources + Team B
 Production → Olla Config C → Production Pool
````

---

`103-108`: **Add a language to fenced block to satisfy markdownlint (MD040)**

Mark these region diagrams as “text”.

````diff
-```
+```text
 Global Olla
 ├── Sydney [GPUStack](./gpustack.md) (for ANZ users)
 ├── Singapore [LocalAI](./localai.md) (for APAC users)
 └── US [vLLM](https://github.com/vllm-project/vllm) (for Americas users)
````

---

`328-345`: **Use the GHCR image and pin to a version for reproducibility**

The Quick Start elsewhere uses GHCR. Align the compose example and pin to a version tag to avoid “latest” drift.

```diff
   olla:
-    image: thushan/olla:latest
+    image: ghcr.io/thushan/olla:vX.Y.Z
     ports:
       - "8080:8080"
     volumes:
       - ./config.yaml:/config.yaml
@@
   litellm:
-    image: ghcr.io/berriai/litellm:latest
+    image: ghcr.io/berriai/litellm:vA.B.C
```

If you’d prefer to keep “latest”, consider at least documenting the tested versions above the block.
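Putting the pinned images together, the relevant compose fragment might look like this (a sketch only; the `vX.Y.Z` / `vA.B.C` tags are the review's placeholders, not tested versions):

```yaml
services:
  olla:
    image: ghcr.io/thushan/olla:vX.Y.Z   # placeholder: substitute a tested release tag
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/config.yaml
  litellm:
    image: ghcr.io/berriai/litellm:vA.B.C  # placeholder tag
```

Pinning both services makes a `docker compose pull` reproducible across machines.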
352-362: Healthcheck command may require curl inside the container.

Many minimal images don’t include curl. Consider a CMD-SHELL form with wget or switching to a small helper sidecar.

```diff
   healthcheck:
-    test: ["CMD", "curl", "-f", "http://localhost:8080/internal/health"]
+    test: ["CMD-SHELL", "wget -qO- http://localhost:8080/internal/health >/dev/null 2>&1 || exit 1"]
     interval: 10s
```

docs/mkdocs.yml (1)
119-123: Fix indentation for nested nav items and standardise casing.

YAML lint flagged indent; children under “Compare” should be further indented. Also use “GPUStack” casing consistently with the rest of the docs.

```diff
-  - Compare:
-    - Overview: compare/overview.md
-    - Patterns: compare/integration-patterns.md
-    - vs GpuStack: compare/gpustack.md
-    - vs LiteLLM: compare/litellm.md
-    - vs LocalAI: compare/localai.md
+  - Compare:
+      - Overview: compare/overview.md
+      - Patterns: compare/integration-patterns.md
+      - vs GPUStack: compare/gpustack.md
+      - vs LiteLLM: compare/litellm.md
+      - vs LocalAI: compare/localai.md
```

docs/content/index.md (1)
27-28: Neutral, clearer phrasing for positioning (en-AU).

Minor copy tweak to avoid “Unlike … like” phrasing and keep the emphasis on reliability.

```diff
-Unlike API gateways like [LiteLLM](compare/litellm.md) or orchestration platforms like [GPUStack](compare/gpustack.md), Olla focuses on making your existing LLM infrastructure reliable through intelligent routing and failover.
+Compared to API gateways such as [LiteLLM](compare/litellm.md) or orchestration platforms such as [GPUStack](compare/gpustack.md), Olla focuses on improving the reliability of your existing LLM infrastructure through intelligent routing and failover.
```

docs/content/faq.md (2)
194-209: Clarify API key expectations when proxying “OpenAI-compatible”.

Great example. Consider a brief note that if an upstream (e.g. LiteLLM or LocalAI) enforces auth, an API key may still be required even when talking to Olla.

```diff
 Yes, Olla provides OpenAI-compatible endpoints (similar to [LocalAI](compare/localai.md)):
@@
 response = client.chat.completions.create(
     model="llama3.2",
     messages=[{"role": "user", "content": "Hello"}]
 )
+
+Note: When Olla proxies to a backend that enforces authentication (e.g. a secured LiteLLM or LocalAI instance), you’ll still need to provide a valid API key expected by that upstream.
```
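To make the upstream-auth note concrete, a stdlib-only sketch; the port, path and key below are illustrative placeholders, and the assumption (per this review) is that Olla forwards the Authorization header to the backend unchanged:

```python
import urllib.request

# Placeholder URL for a local Olla instance; the bearer token is whatever
# the upstream (e.g. a secured LiteLLM) expects. Olla neither mints nor
# validates it here, it just forwards the header.
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",          # placeholder address/path
    headers={"Authorization": "Bearer sk-upstream-key"},  # placeholder key
)
print(req.get_header("Authorization"))  # → Bearer sk-upstream-key
```

No request is sent above; the point is simply that the credential belongs to the upstream, not to Olla.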
210-213: Reinforce “works together” with a direct link to patterns.

Nice positioning. Add a pointer to concrete integration patterns.

```diff
-[LiteLLM](compare/litellm.md) is an API translation layer for cloud providers, while Olla is an infrastructure proxy for self-hosted endpoints. They solve different problems and work well together - LiteLLM for cloud APIs, Olla for local infrastructure reliability.
+[LiteLLM](compare/litellm.md) is an API translation layer for cloud providers, while Olla is an infrastructure proxy for self-hosted endpoints. They solve different problems and work well together - LiteLLM for cloud APIs, Olla for local infrastructure reliability. See [integration patterns](compare/integration-patterns.md#tool-specific-integrations) for concrete setups.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
- docs/content/compare/gpustack.md (1 hunks)
- docs/content/compare/integration-patterns.md (1 hunks)
- docs/content/compare/litellm.md (1 hunks)
- docs/content/compare/localai.md (1 hunks)
- docs/content/compare/overview.md (1 hunks)
- docs/content/faq.md (5 hunks)
- docs/content/index.md (2 hunks)
- docs/content/usage.md (2 hunks)
- docs/mkdocs.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{go,md}
📄 CodeRabbit Inference Engine (CLAUDE.md)
Use Australian English for comments and documentation, and write comments explaining why rather than what
Files:
- docs/content/compare/localai.md
- docs/content/compare/gpustack.md
- docs/content/compare/overview.md
- docs/content/compare/litellm.md
- docs/content/index.md
- docs/content/faq.md
- docs/content/usage.md
- docs/content/compare/integration-patterns.md
🪛 YAMLlint (1.37.1)
docs/mkdocs.yml
[warning] 119-119: wrong indentation: expected 10 but found 8
(indentation)
🪛 markdownlint-cli2 (0.17.2)
docs/content/compare/localai.md
33-33: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
67-67: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
75-75: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
107-107: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
118-118: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/content/compare/gpustack.md
32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
40-40: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
74-74: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
82-82: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
116-116: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/content/compare/overview.md
57-57: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
68-68: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
79-79: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/content/compare/litellm.md
32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
40-40: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
74-74: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
82-82: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
116-116: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/content/usage.md
30-30: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
31-31: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
32-32: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
docs/content/compare/integration-patterns.md
32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
40-40: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
74-74: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
82-82: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
116-116: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 LanguageTool
docs/content/compare/gpustack.md
[grammar] ~87-~87: Possible verb agreement error. Did you mean “requires”? (Some collective nouns can be treated as both singular and plural, so ‘Require’ is not always incorrect.)
Context: ...stration - Managing a cluster of GPUs - Require automatic model distribution - Need GPU...
(COLLECTIVE_NOUN_VERB_AGREEMENT_VBP)
[uncategorized] ~123-~123: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...rastructure** - Deploys models based on demand - Handles GPU allocation - Manage...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
docs/content/compare/litellm.md
[grammar] ~54-~54: A determiner may be missing.
Context: ...c fallbacks | | Round-robin | ✅ | ❌ | | Least connections | ✅ | ❌ | | Circuit breaker...
(THE_SUPERLATIVE)
docs/content/compare/integration-patterns.md
[uncategorized] ~9-~9: Possible missing article found.
Context: ... combine Olla with other tools to build robust LLM infrastructure for different use ca...
(AI_HYDRA_LEO_MISSING_A)
🔇 Additional comments (9)
docs/content/usage.md (1)
98-99: Nice cross-linking to related compare pages. Clear guidance to pair Olla with LiteLLM and GPUStack, and a pointer to integration patterns. This helps users choose the right tooling mix.
docs/content/compare/gpustack.md (1)
122-132: Good, clear “Better Together” articulation. Explains the complementary roles succinctly and gives readers a practical mental model of where each tool fits.
docs/content/compare/integration-patterns.md (2)
217-227: Good call-out on circuit breakers and engine selection. Clear guidance that circuit breakers require the Olla engine, plus sensible default health intervals. No changes needed.
389-391: Conclusion reads well and matches the “integrator” positioning. Nicely reinforces Olla’s role alongside other tools. No changes needed.
docs/mkdocs.yml (1)
118-118: Great addition: Compare section under Home. Adding a Compare hub improves discoverability of the new docs set.
docs/content/index.md (1)
108-109: Useful cross-links to the new comparison guides. Good addition that helps readers discover the Compare section from the landing page.
docs/content/faq.md (3)
15-16: Nice ecosystem cross-link. Good to surface the comparison overview early in the FAQ.
239-242: Crisp distinction between deployment vs routing. Clear guidance that Olla doesn’t deploy models and complements GPUStack. No changes needed.
346-354: Actionable best-practice guidance; reads well. Concise, prescriptive recommendations with links to deeper docs. Looks good.
Documentation update comparing Olla to common LLM infra, tools & backends.