bug: proxy streaming is broken by thushan · Pull Request #41 · thushan/olla

thushan · 2025-08-05T10:20:58Z

This PR addresses the choppiness in using Olla with streaming text clients like OpenWebUI and degraded buffered output caused by irregular buffering.

Reported by #40 and internally as OLLA-102.

We've added comprehensive test suites to validate token streaming and preparations for a profile type to be added so an engine can run for streaming or buffering (images etc) modes.

You can find the scripts in test/scripts/streaming, opt to run individually or ./run-all-tests.sh (make sure you have phi4:latest which is our current baseline - fast and cheap on most hosts).

Summary by CodeRabbit

New Features

Introduced comprehensive streaming detection and latency test scripts for Olla, including automated test runners and question datasets.
Added detailed documentation for new and existing test suites, covering setup, usage, troubleshooting, and CI integration.

Bug Fixes

Improved Windows console Unicode support and graceful handling of user interruptions in test scripts.
Enhanced HTTP response streaming to reduce choppiness by ensuring buffered data is flushed promptly.

Documentation

Added and updated README files for test directories, detailing test categories, usage, and troubleshooting steps.

Chores

Updated ignore rules to better manage virtual environments and temporary files.

coderabbitai · 2025-08-05T10:21:04Z

Caution

Review failed

The pull request is closed.

Walkthrough

This update introduces new and revised test scripts, documentation, and ignore rules for streaming behaviour validation in the Olla project. It adds comprehensive streaming detection and latency test suites, supporting scripts, and detailed documentation. Minor code improvements include graceful handling of user interrupts and enhanced Unicode support. Some redundant or outdated files were removed.

Changes

Cohort / File(s)	Change Summary
Ignore Rules Updates `.gitignore`, `test/scripts/.gitignore`, `test/scripts/logic/.gitignore`	`.gitignore` and `test/scripts/.gitignore` updated to ignore `.venv/`, `tmp/`, `.env`, and `.env.local`. The redundant `test/scripts/logic/.gitignore` was deleted.
Middleware Enhancement `internal/app/middleware/logging.go`	Added a `Flush()` method to the `responseWriter` type, implementing the `http.Flusher` interface to enable immediate flushing of buffered HTTP data for streaming responses.
Streaming Test Infrastructure `test/scripts/streaming/run-all-tests.sh`, `test/scripts/streaming/questions.txt`	Added a Bash script to automate running all streaming tests, and a questions file with 20 prompts for latency testing.
Streaming Detection & Latency Scripts `test/scripts/streaming/test-streaming-detection.py`, `test/scripts/streaming/test-streaming-latency.py`	Introduced comprehensive Python scripts for detecting streaming vs buffering and measuring streaming latency, with robust CLI, timing analysis, model selection, and detailed output.
Streaming Responses Script Improvements `test/scripts/streaming/test-streaming-responses.py`	Enhanced Unicode output handling and graceful KeyboardInterrupt support. Main logic refactored into a dedicated function for better structure and signal handling.
Streaming Documentation `test/scripts/streaming/README.md`	New README detailing the streaming test suite, script purposes, configuration, troubleshooting, expected behaviours, and CI/CD integration.
General Test Scripts Documentation `test/scripts/README.md`	New README providing setup instructions, script categories, usage patterns, conventions, troubleshooting, and Windows compatibility notes for all test scripts.
Logic Test Documentation Update `test/scripts/logic/README.md`	References to the removed `test-streaming-responses.py` script were deleted. Installation and test running instructions updated accordingly.
Logic Test Script Improvement `test/scripts/logic/test-model-routing-provider.py`	Added KeyboardInterrupt handling for graceful exit with status code 130 and a coloured message on user interruption.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TestScript
    participant OllaServer

    User->>TestScript: Run streaming detection/latency script
    TestScript->>OllaServer: Health check (/internal/health)
    OllaServer-->>TestScript: Health status
    TestScript->>OllaServer: Fetch models (/olla/models)
    OllaServer-->>TestScript: Model list
    User->>TestScript: (Optional) Select model
    loop For each test question
        TestScript->>OllaServer: Send streaming request (chat/image)
        OllaServer-->>TestScript: Stream response chunks
        TestScript->>TestScript: Record chunk timings, analyse streaming/buffering
    end
    TestScript->>User: Output results, metrics, and streaming classification

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

bug: proxy streaming is broken #41: Contains identical changes to .gitignore and the addition of the Flush() method on responseWriter in internal/app/middleware/logging.go, indicating the same or directly related fixes addressing streaming flush behavior.

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d379795 and de31335.

📒 Files selected for processing (1)

test/scripts/streaming/run-all-tests.sh (1 hunks)

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch bug/proxy-streaming

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🔭 Outside diff range comments (1)

test/scripts/streaming/test-streaming-responses.py (1)

396-455: Remove duplicate main() function

The original main() function (lines 396-455) should be removed since its logic has been moved to run_main(). Currently, there's duplicate code and the original main() has incorrect self references that will cause runtime errors.

-def main():
-    parser = argparse.ArgumentParser(description='Test Olla streaming responses')
-    parser.add_argument('--url', default=TARGET_URL, help='Olla base URL')
-    parser.add_argument('--max-time', type=int, default=DEFAULT_MAX_STREAM_TIME,
-                        help='Maximum streaming time per test (seconds)')
-    parser.add_argument('--models', nargs='+', help='Specific models to test')
-    parser.add_argument('--providers', nargs='+', choices=['openai', 'ollama', 'lmstudio'],
-                        help='Providers to test (default: all)')
-    parser.add_argument('--sample', action='store_true',
-                        help='Test only a few models per provider')
-    parser.add_argument('--analyze', action='store_true',
-                        help='Show detailed streaming pattern analysis')
-    
-    args = parser.parse_args()
-    
-    providers = args.providers or ['openai', 'ollama', 'lmstudio']
-    
-    tester = StreamingTester(args.url, args.max_time)
-    tester.print_header()
-    
-    if not tester.check_health():
-        sys.exit(1)
-        
-    print()
-    
-    if not tester.fetch_models():
-        self.print_color(RED, "No models found!")
-        sys.exit(1)
-        
-    print()
-    self.print_color(WHITE, f"Configuration:")
-    print(f"  Max streaming time: {CYAN}{args.max_time}s{RESET}")
-    print(f"  Max tokens:         {CYAN}{DEFAULT_MAX_TOKENS}{RESET}")
-    print(f"  Providers:          {CYAN}{', '.join(providers)}{RESET}")
-    
-    # Determine which models to test
-    if args.models:
-        models_to_test = args.models
-    elif args.sample:
-        # Sample 2-3 models per provider
-        models_to_test = []
-        for provider in providers:
-            provider_models = tester.provider_models.get(provider, [])
-            models_to_test.extend(provider_models[:3])
-        models_to_test = list(set(models_to_test))  # Remove duplicates
-    else:
-        models_to_test = tester.all_models
-        
-    self.print_color(WHITE, f"\nTesting {len(models_to_test)} models for streaming capability...")
-    
-    # Test each model
-    for model in models_to_test:
-        tester.test_model_streaming(model, providers, sample_only=args.sample)
-        
-    # Show analysis if requested
-    if args.analyze and tester.test_results:
-        tester.analyze_streaming_patterns()
-        
-    tester.print_summary()

♻️ Duplicate comments (1)

test/scripts/logic/README.md (1)

234-235: Same path change as above

This line repeats the updated path; see previous comment for the verification step.
No further action needed here.

🧹 Nitpick comments (12)

test/scripts/streaming/test-streaming-responses.py (2)
14-14: Remove unused imports

The static analysis correctly identifies unused imports that should be removed.
-from typing import Dict, List, Tuple, Optional, Any
+from typing import Dict, List, Optional, Any
 from collections import defaultdict
-from datetime import datetime
Also applies to: 16-16

456-464: Simplify the MainRunner class

The MainRunner class with a single static method is unnecessary complexity. The print_color function can be defined directly in run_main().
 def run_main():
     """Main execution function wrapped for interrupt handling"""
-    # Fix 'self' references in main()
-    class MainRunner:
-        @staticmethod
-        def print_color(color: str, message: str):
-            print(f"{color}{message}{RESET}")
-            
-    runner = MainRunner()
+    def print_color(color: str, message: str):
+        print(f"{color}{message}{RESET}")
Then update the references from runner.print_color to just print_color in lines 491, 495, 513.
test/scripts/streaming/run-all-tests.sh (1)
23-27: Consider making dependency installation optional

The script automatically installs dependencies which might not be desired in all environments (e.g., production, CI/CD with pre-installed deps).

Consider adding a flag to control this behaviour:
+# Skip dependency installation if NO_INSTALL is set
+if [ -z "$NO_INSTALL" ]; then
 # Install dependencies if needed
 if ! python -c "import requests" 2>/dev/null; then
     echo "Installing dependencies..."
     pip install -r ../requirements.txt
     echo
 fi
+fi
test/scripts/README.md (3)
31-31: Fix repeated word in headers

Remove the duplicate word "Tests" in the section headers.
-### `/streaming` - Streaming Behavior Tests
-Tests for validating Olla's streaming vs buffering behavior for LLM responses.
+### `/streaming` - Streaming Behavior Tests
+For validating Olla's streaming vs buffering behavior for LLM responses.

-### `/logic` - Model Routing & Header Tests
-Tests for Olla's request routing logic and header propagation.
+### `/logic` - Model Routing & Header Tests
+For Olla's request routing logic and header propagation.
Also applies to: 37-37

58-58: Format bare URL properly

Convert the bare URL to a proper markdown link.
-- `--url` - Olla base URL (default: http://localhost:40114)
+- `--url` - Olla base URL (default: `http://localhost:40114`)
49-49: Add hyphens to compound adjectives

Add hyphens to compound adjectives as suggested by the linter.
-### `/security` - Security & Rate Limiting Tests
+### `/security` - Security & Rate-Limiting Tests
 Security-focused test scripts.
-- Rate limiting validation
+- Rate-limiting validation
Also applies to: 51-51
test/scripts/streaming/README.md (1)
128-129: Use en dash for number ranges

Replace hyphens with en dashes for number ranges.
-- 1-2 chunks: Definitely buffered
-- 3-10 chunks: Possibly buffered or very short response
+- 1–2 chunks: Definitely buffered
+- 3–10 chunks: Possibly buffered or very short response
test/scripts/streaming/test-streaming-latency.py (4)
15-17: Remove unused imports

Remove imports that are not used in the code.
-from typing import Dict, List, Tuple, Optional, Any
-from collections import deque
-from datetime import datetime
+from typing import Dict, List, Optional, Any
68-69: Remove unused variable assignment

The exception variable e is assigned but never used.
-        except Exception as e:
+        except Exception:
             pass
229-229: Remove unnecessary f-string prefixes

These strings don't contain any placeholders, so the f prefix is unnecessary.

For example, line 229:
-            self.print_color(YELLOW, f"    Streaming: ", end='')
+            self.print_color(YELLOW, "    Streaming: ", end='')
Apply similar changes to lines 338, 350, 360, 367, 426, 440, and 444.

Also applies to: 338-338, 350-350, 360-360, 367-367, 426-426, 440-440, 444-444

303-306: Combine if branches using logical OR

Simplify the conditional logic by combining the branches.
-                    elif avg_interval < 0.05:  # 10-50ms is typical for good streaming
-                        result['streaming_quality'] = 'smooth'
-                    elif avg_interval < 0.2:  # 50-200ms is acceptable streaming
-                        result['streaming_quality'] = 'smooth'
+                    elif avg_interval < 0.2:  # <200ms is good streaming
+                        result['streaming_quality'] = 'smooth'
test/scripts/streaming/test-streaming-detection.py (1)
356-356: Remove unnecessary f-string prefix.

The f-string doesn't contain any placeholders.
-                self.print_color(GREEN, f"  Mode: BUFFERED (as expected with stream:false)")
+                self.print_color(GREEN, "  Mode: BUFFERED (as expected with stream:false)")

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07f1eb1 and d379795.

📒 Files selected for processing (13)

.gitignore (1 hunks)
internal/app/middleware/logging.go (1 hunks)
test/scripts/.gitignore (1 hunks)
test/scripts/README.md (1 hunks)
test/scripts/logic/.gitignore (0 hunks)
test/scripts/logic/README.md (1 hunks)
test/scripts/logic/test-model-routing-provider.py (1 hunks)
test/scripts/streaming/README.md (1 hunks)
test/scripts/streaming/questions.txt (1 hunks)
test/scripts/streaming/run-all-tests.sh (1 hunks)
test/scripts/streaming/test-streaming-detection.py (1 hunks)
test/scripts/streaming/test-streaming-latency.py (1 hunks)
test/scripts/streaming/test-streaming-responses.py (4 hunks)

💤 Files with no reviewable changes (1)

test/scripts/logic/.gitignore

🧰 Additional context used

📓 Path-based instructions (1)

internal/{app,adapter}/**/*.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

Endpoints should be exposed at /internal/health and /internal/status.

Files:

internal/app/middleware/logging.go

🧠 Learnings (3)

📚 Learning: applies to test/scripts/security/*.sh : security tests should validate rate limiting and size restri...

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to test/scripts/security/*.sh : Security tests should validate rate limiting and size restrictions (see `/test/scripts/security/`).

Applied to files:

test/scripts/streaming/run-all-tests.sh
test/scripts/logic/README.md
test/scripts/streaming/README.md
test/scripts/README.md

📚 Learning: applies to test/scripts/logic/test-model-routing.sh : test routing and headers using `/test/scripts/...

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to test/scripts/logic/test-model-routing.sh : Test routing and headers using `/test/scripts/logic/test-model-routing.sh`.

Applied to files:

test/scripts/logic/README.md
test/scripts/logic/test-model-routing-provider.py

📚 Learning: applies to internal/adapter/proxy/*_test.go : integration tests should test the full request flow th...

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-27T12:59:29.787Z
Learning: Applies to internal/adapter/proxy/*_test.go : Integration tests should test the full request flow through the proxy.

Applied to files:

test/scripts/logic/README.md

🪛 Shellcheck (0.10.0)

test/scripts/streaming/run-all-tests.sh

[error] 1-1: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 2-2: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 3-3: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 4-4: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 5-5: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 6-6: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 7-7: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 8-8: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 9-9: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 10-10: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 11-11: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 12-12: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 13-13: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 14-14: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 15-15: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 16-16: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 17-17: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 18-18: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 19-19: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 20-20: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 21-21: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 22-22: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 23-23: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 24-24: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 25-25: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 26-26: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 27-27: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 28-28: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 29-29: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 30-30: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 31-31: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 32-32: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 33-33: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 34-34: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 35-35: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 36-36: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 37-37: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 38-38: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 39-39: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 40-40: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 41-41: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 42-42: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 43-43: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 44-44: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 45-45: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 46-46: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 47-47: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 48-48: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 49-49: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 50-50: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 51-51: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 52-52: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 53-53: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 54-54: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 55-55: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 56-56: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 57-57: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 58-58: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 59-59: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 60-60: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

🪛 Ruff (0.12.2)

test/scripts/streaming/test-streaming-responses.py

14-14: typing.Tuple imported but unused

Remove unused import: typing.Tuple

(F401)

16-16: datetime.datetime imported but unused

Remove unused import: datetime.datetime

(F401)

test/scripts/streaming/test-streaming-latency.py

15-15: typing.Tuple imported but unused

Remove unused import: typing.Tuple

(F401)

16-16: collections.deque imported but unused

Remove unused import: collections.deque

(F401)

17-17: datetime.datetime imported but unused

Remove unused import: datetime.datetime

(F401)

68-68: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

229-229: f-string without any placeholders

Remove extraneous f prefix

(F541)

303-306: Combine if branches using logical or operator

Combine if branches

(SIM114)

338-338: f-string without any placeholders

Remove extraneous f prefix

(F541)

350-350: f-string without any placeholders

Remove extraneous f prefix

(F541)

360-360: f-string without any placeholders

Remove extraneous f prefix

(F541)

367-367: f-string without any placeholders

Remove extraneous f prefix

(F541)

426-426: f-string without any placeholders

Remove extraneous f prefix

(F541)

440-440: f-string without any placeholders

Remove extraneous f prefix

(F541)

444-444: f-string without any placeholders

Remove extraneous f prefix

(F541)

test/scripts/streaming/test-streaming-detection.py

15-15: datetime.datetime imported but unused

Remove unused import: datetime.datetime

(F401)

63-63: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

356-356: f-string without any placeholders

Remove extraneous f prefix

(F541)

526-526: Loop control variable test_name not used within loop body

(B007)

🪛 LanguageTool

test/scripts/streaming/README.md

[typographical] ~128-~128: If specifying a range, consider using an en dash instead of a hyphen.
Context: ...g between chunks

Chunk Count:

1-2 chunks: Definitely buffered
3-10 chu...

(HYPHEN_TO_EN)

[typographical] ~129-~129: If specifying a range, consider using an en dash instead of a hyphen.
Context: ...:

1-2 chunks: Definitely buffered
3-10 chunks: Possibly buffered or very short...

(HYPHEN_TO_EN)

test/scripts/README.md

[duplication] ~31-~31: Possible typo: you repeated a word.
Context: ...

`/streaming` - Streaming Behavior Tests

Tests for validating Olla's streaming vs buff...

(ENGLISH_WORD_REPEAT_RULE)

[duplication] ~37-~37: Possible typo: you repeated a word.
Context: ...

`/logic` - Model Routing & Header Tests

Tests for Olla's request routing logic and he...

(ENGLISH_WORD_REPEAT_RULE)

[uncategorized] ~49-~49: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...itoring

`/security` - Security & Rate Limiting Tests

Security-focused test scripts.
...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

[uncategorized] ~51-~51: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ests
Security-focused test scripts.

Rate limiting validation
Request size limits
Au...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

[misspelling] ~87-~87: This word is normally spelled as one.
Context: ...Python is using UTF-8 encoding (scripts auto-configure this)

Connection Errors

If scri...

(EN_COMPOUNDS_AUTO_CONFIGURE)

🪛 markdownlint-cli2 (0.17.2)

test/scripts/README.md

58-58: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (15)

test/scripts/logic/README.md (1)

223-224: Path verification complete – requirements.txt exists
The file test/scripts/requirements.txt is present, so the ../requirements.txt reference in test/scripts/logic/README.md is correct. No further changes required.

.gitignore (1)

21-21: LGTM!

Standard addition for excluding Python virtual environments from version control.

test/scripts/.gitignore (1)

1-4: LGTM!

Appropriate .gitignore entries for test scripts directory, excluding virtual environments, temporary files, and environment configuration files.

test/scripts/logic/test-model-routing-provider.py (1)

421-425: LGTM!

Excellent addition of graceful keyboard interrupt handling with appropriate exit code 130 and user-friendly messaging. This enhances the user experience for long-running tests.

internal/app/middleware/logging.go (1)

41-50: LGTM!

Excellent implementation of the http.Flusher interface to address the streaming choppiness issue (OLLA-102). The type assertion pattern is correct Go idiom, and the detailed comment clearly explains the purpose. This is a critical fix for ensuring buffered data is sent immediately in streaming responses.

test/scripts/streaming/questions.txt (1)

1-23: LGTM!

Comprehensive and well-structured test dataset with diverse questions covering various topics and complexity levels. The clear formatting with comment lines and varied question types will provide excellent coverage for streaming latency testing scenarios.

test/scripts/streaming/test-streaming-responses.py (1)

526-531: Good practice for handling interrupts

The graceful handling of KeyboardInterrupt with proper exit code (130) is excellent for CI/CD integration.

test/scripts/streaming/README.md (1)

70-75: Excellent documentation of quality classifications

The quality classification categories are well-defined and will help users understand test results effectively.

test/scripts/streaming/test-streaming-latency.py (2)

465-469: Excellent interrupt handling

Good practice using the standard exit code 130 for SIGINT, consistent with other test scripts.

196-319: Well-structured streaming analysis logic

The streaming quality classification based on chunk intervals is well thought out and provides meaningful insights into streaming behaviour.

test/scripts/streaming/test-streaming-detection.py (5)

90-136: Well-implemented model selection logic.

The method provides good user experience with automatic selection of phi4:latest, filtering of embedding models, and graceful handling of user interrupts.

137-245: Comprehensive streaming test implementation.

The method effectively tests both OpenAI and Ollama endpoint formats, provides visual feedback, and accurately collects timing data for analysis.

246-326: Robust streaming detection heuristics.

The analysis methods use well-thought-out criteria to distinguish between streaming and buffering modes, accounting for model loading times and chunk patterns.

472-599: Excellent comprehensive test suite and analysis.

The implementation provides thorough testing of various scenarios, clear result analysis with color-coded output, and helpful configuration guidance for users.

674-679: Proper keyboard interrupt handling.

Good practice using the standard Unix exit code 130 for SIGINT termination.

test/scripts/streaming/run-all-tests.sh

coderabbitai · 2025-08-05T10:32:49Z

test/scripts/streaming/run-all-tests.sh

+# Test 2: Streaming Latency (3 questions)
+echo "2. Running streaming latency test..."
+python test-streaming-latency.py --url "$TARGET_URL" --count 3 || {
+    echo "❌ Streaming latency test failed"
+    exit 1
+}
+echo


🛠️ Refactor suggestion

Fix inconsistency between comment and command

The comment says "3 questions" but the command uses --count 3 instead of --questions parameter as shown in the summary.

Based on the AI summary mentioning --questions parameter pointing to a questions file, consider:

-# Test 2: Streaming Latency (3 questions) +# Test 2: Streaming Latency echo "2. Running streaming latency test..." -python test-streaming-latency.py --url "$TARGET_URL" --count 3 || { +python test-streaming-latency.py --url "$TARGET_URL" --questions questions.txt || { echo "❌ Streaming latency test failed" exit 1 }

Or update the comment to match the command:

-# Test 2: Streaming Latency (3 questions) +# Test 2: Streaming Latency (3 tests)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Test 2: Streaming Latency (3 questions)

echo "2. Running streaming latency test..."

python test-streaming-latency.py --url "$TARGET_URL" --count 3 || {

echo "❌ Streaming latency test failed"

exit 1

}

echo

# Test 2: Streaming Latency (3 tests)

echo "2. Running streaming latency test..."

python test-streaming-latency.py --url "$TARGET_URL" --count 3 || {

echo "❌ Streaming latency test failed"

exit 1

}

echo

🧰 Tools

🪛 Shellcheck (0.10.0)

[error] 43-43: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 44-44: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 45-45: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 46-46: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 47-47: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 48-48: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

[error] 49-49: Literal carriage return. Run script through tr -d '\r' .

(SC1017)

🤖 Prompt for AI Agents

In test/scripts/streaming/run-all-tests.sh around lines 43 to 49, the comment mentions "3 questions" but the python command uses the parameter --count 3 instead of --questions. To fix this inconsistency, either update the comment to reflect that the test runs with a count of 3 or modify the command to use the --questions parameter with the appropriate questions file as indicated in the summary. Choose one approach to ensure the comment and command align correctly.

coderabbitai · 2025-08-05T10:32:50Z

test/scripts/streaming/test-streaming-detection.py

+import requests
+import os
+from typing import Dict, List, Tuple, Optional, Any
+from datetime import datetime


⚠️ Potential issue

Remove unused import.

The datetime module is imported but never used in the code.

-from datetime import datetime

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from datetime import datetime

🧰 Tools

🪛 Ruff (0.12.2)

15-15: datetime.datetime imported but unused

Remove unused import: datetime.datetime

(F401)

🤖 Prompt for AI Agents

In test/scripts/streaming/test-streaming-detection.py at line 15, the datetime module is imported but not used anywhere in the code. Remove the line importing datetime to clean up unused imports and improve code clarity.

coderabbitai · 2025-08-05T10:32:50Z

test/scripts/streaming/test-streaming-detection.py

+        except Exception as e:
+            pass


⚠️ Potential issue

Remove unused exception variable.

The exception variable e is caught but never used.

- except Exception as e: + except Exception:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

except Exception as e:

pass

except Exception:

pass

🧰 Tools

🪛 Ruff (0.12.2)

63-63: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🤖 Prompt for AI Agents

In test/scripts/streaming/test-streaming-detection.py around lines 63 to 64, the exception variable 'e' is caught but never used in the except block. Remove the variable 'e' from the except clause to avoid unused variable warnings by changing 'except Exception as e:' to 'except Exception:'.

thushan added 2 commits August 5, 2025 20:13

test scripts and infra for streaming repro

d086582

we need to flush the reponsewriter properly in order to push tokens

d379795

coderabbitai bot reviewed Aug 5, 2025

View reviewed changes

line endings?

de31335

thushan merged commit 26ffd12 into main Aug 5, 2025
2 of 3 checks passed

thushan deleted the bug/proxy-streaming branch August 5, 2025 10:38

coderabbitai bot mentioned this pull request Feb 20, 2026

docs: vllm-mlx #110

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

bug: proxy streaming is broken#41

bug: proxy streaming is broken#41
thushan merged 3 commits intomainfrom
bug/proxy-streaming

thushan commented Aug 5, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Aug 5, 2025 •

edited

Loading

Review failed

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

`/streaming` - Streaming Behavior Tests

`/logic` - Model Routing & Header Tests

`/security` - Security & Rate Limiting Tests

Connection Errors

Uh oh!

Uh oh!

coderabbitai bot Aug 5, 2025

Uh oh!

coderabbitai bot Aug 5, 2025

Uh oh!

coderabbitai bot Aug 5, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

thushan commented Aug 5, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

/streaming - Streaming Behavior Tests

/logic - Model Routing & Header Tests

/security - Security & Rate Limiting Tests

Connection Errors

Uh oh!

Uh oh!

coderabbitai bot Aug 5, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 5, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 5, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thushan commented Aug 5, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 5, 2025 •

edited

Loading

`/streaming` - Streaming Behavior Tests

`/logic` - Model Routing & Header Tests

`/security` - Security & Rate Limiting Tests