Describe the bug
When a custom fallback provider returns a non-streaming JSON response to a stream=True request, the OpenAI SDK's streaming parser receives zero chunks. This causes:
content_parts stays empty → full_content = "".join([]) or None = None
- Response is flagged as "empty" → retry loop → fallback cascade
- The provider's valid response is silently discarded
This affects any custom provider that doesn't implement SSE streaming (e.g., lightweight proxies, self-hosted endpoints, Vertex AI REST API).
To reproduce
- Configure a custom fallback provider that returns valid JSON but not SSE:
fallback_providers:
- provider: custom
model: my-model
base_url: http://my-proxy:8080/v1
- Primary provider hits rate limit → Hermes falls back to custom provider
- Custom provider returns valid
{"choices": [...]} JSON
- Hermes logs:
⚠️ Empty response from model — retrying (1/3)
- After 3 retries: cascades to next fallback or gives up
Root cause
run_agent.py line ~8089: _use_streaming = True is unconditional — there's no per-provider or per-fallback streaming toggle. The comment says "Always prefer the streaming path" for health-monitoring benefits, but this assumption breaks custom providers.
When client.chat.completions.create(stream=True) receives a JSON response instead of SSE, the SDK's Stream iterator yields zero chunks. The streaming response builder at line ~5040 produces full_content = None with no tool calls → flagged as invalid.
Expected behavior
Either:
- (A) Add a per-provider config flag to disable streaming:
fallback_providers: [{provider: custom, model: x, base_url: y, stream: false}]
- (B) Detect non-SSE responses in the streaming path and fall back to non-streaming parsing
- (C) Document that custom providers MUST support SSE streaming
Workaround
We built a lightweight proxy (~200 lines Python) that translates OpenAI streaming requests to Vertex AI's native streamGenerateContent?alt=sse endpoint and converts the chunks back to OpenAI chat.completion.chunk format. Happy to contribute this as a reference implementation or built-in adapter.
Environment
- Hermes version: 0.8.0
- Provider: custom (Vertex AI via proxy)
- Platform: Docker (gateway mode)
- OS: macOS (Apple Silicon)
Describe the bug
When a custom fallback provider returns a non-streaming JSON response to a
stream=Truerequest, the OpenAI SDK's streaming parser receives zero chunks. This causes:content_partsstays empty →full_content = "".join([]) or None=NoneThis affects any custom provider that doesn't implement SSE streaming (e.g., lightweight proxies, self-hosted endpoints, Vertex AI REST API).
To reproduce
{"choices": [...]}JSON⚠️ Empty response from model — retrying (1/3)Root cause
run_agent.pyline ~8089:_use_streaming = Trueis unconditional — there's no per-provider or per-fallback streaming toggle. The comment says "Always prefer the streaming path" for health-monitoring benefits, but this assumption breaks custom providers.When
client.chat.completions.create(stream=True)receives a JSON response instead of SSE, the SDK'sStreamiterator yields zero chunks. The streaming response builder at line ~5040 producesfull_content = Nonewith no tool calls → flagged as invalid.Expected behavior
Either:
fallback_providers: [{provider: custom, model: x, base_url: y, stream: false}]Workaround
We built a lightweight proxy (~200 lines Python) that translates OpenAI streaming requests to Vertex AI's native
streamGenerateContent?alt=sseendpoint and converts the chunks back to OpenAIchat.completion.chunkformat. Happy to contribute this as a reference implementation or built-in adapter.Environment