Custom fallback providers fail silently when they don't support SSE streaming

**Describe the bug**

When a custom fallback provider returns a non-streaming JSON response to a `stream=True` request, the OpenAI SDK's streaming parser receives zero chunks. This causes:
- `content_parts` stays empty → `full_content = "".join([]) or None` = `None`
- Response is flagged as "empty" → retry loop → fallback cascade
- The provider's valid response is silently discarded

This affects any custom provider that doesn't implement SSE streaming (e.g., lightweight proxies, self-hosted endpoints, Vertex AI REST API).

**To reproduce**

1. Configure a custom fallback provider that returns valid JSON but not SSE:
```yaml
fallback_providers:
  - provider: custom
    model: my-model
    base_url: http://my-proxy:8080/v1
```

2. Primary provider hits rate limit → Hermes falls back to custom provider
3. Custom provider returns valid `{"choices": [...]}` JSON
4. Hermes logs: `⚠️ Empty response from model — retrying (1/3)`
5. After 3 retries: cascades to next fallback or gives up

**Root cause**

`run_agent.py` line ~8089: `_use_streaming = True` is unconditional — there's no per-provider or per-fallback streaming toggle. The comment says "Always prefer the streaming path" for health-monitoring benefits, but this assumption breaks custom providers.

When `client.chat.completions.create(stream=True)` receives a JSON response instead of SSE, the SDK's `Stream` iterator yields zero chunks. The streaming response builder at line ~5040 produces `full_content = None` with no tool calls → flagged as invalid.

**Expected behavior**

Either:
- (A) Add a per-provider config flag to disable streaming: `fallback_providers: [{provider: custom, model: x, base_url: y, stream: false}]`
- (B) Detect non-SSE responses in the streaming path and fall back to non-streaming parsing
- (C) Document that custom providers MUST support SSE streaming

**Workaround**

We built a lightweight proxy (~200 lines Python) that translates OpenAI streaming requests to Vertex AI's native `streamGenerateContent?alt=sse` endpoint and converts the chunks back to OpenAI `chat.completion.chunk` format. Happy to contribute this as a reference implementation or built-in adapter.

**Environment**
- Hermes version: 0.8.0
- Provider: custom (Vertex AI via proxy)
- Platform: Docker (gateway mode)
- OS: macOS (Apple Silicon)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom fallback providers fail silently when they don't support SSE streaming #21522

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Custom fallback providers fail silently when they don't support SSE streaming #21522

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions