[Bug]: Infinite retry loop caused by stream stale timeout during local LLM prefill phase

### Bug Description

### Describe the bug
When using Hermes Agent with a local LLM backend(especiallly heavy ones), the agent falls into an infinite loop of timeouts and immediate retries if the local model's prompt processing (prefill) time exceeds the default stream timeout (e.g., 180s). 

The agent aborts the request (`[stream_generate] Aborting request`) before the first token is generated, causing the backend to interrupt the prefill (`prefill interrupted`). The agent then immediately retries the same heavy request, leading to a permanent deadlock where the model never gets enough time to finish the prefill phase.


### Additional context
Increasing the stream timeout in the configuration serves as a temporary workaround, but the core issue lies in the retry logic aggressively looping without allowing the local backend sufficient time to complete prompt ingestion.

### Steps to Reproduce

1. Set up Hermes Agent with an MLX-based local LLM backend on an Apple Silicon machine (e.g., Mac M3 Ultra with 512GB Unified RAM).
2. Load a massive local model, such as `baa-ai/GLM-5.1-RAM-420GB-MLX`.
3. Send a complex prompt to the agent. Due to the sheer size of the model (420GB), the initial prompt processing (prefill) naturally takes longer than 180 seconds.
4. Observe the logs: The agent times out after 180s, the backend throws a `prefill interrupted` error, and the agent immediately retries, starting the infinite loop.

### Expected Behavior

1. The agent should ideally have a separate, longer timeout configuration for the "Time To First Token" (TTFT) or prefill phase, distinct from the inter-token stream timeout.
2. When a timeout occurs, the retry mechanism should implement an exponential backoff or limit the maximum number of retries, rather than immediately and endlessly spamming the backend with the same heavy context.


### Actual Behavior

The agent forcibly aborts the connection after the default 180-second timeout (`[stream_generate] Aborting request`). This causes the local backend to halt its prompt evaluation (`prefill interrupted`). Immediately after the timeout, the agent retries the exact same heavy request without any delay or backoff. This results in an infinite retry loop, completely preventing the model from ever finishing the prefill phase and returning a response.

### Affected Component

Other

### Messaging Platform (if gateway-related)

Telegram

### Operating System

MacOS Tahoe

### Python Version

3.11

### Hermes Version

0.8.0

### Relevant Logs / Traceback

```shell

```

### Root Cause Analysis (optional)

_No response_

### Proposed Fix (optional)

_No response_

### Are you willing to submit a PR for this?

- [ ] I'd like to fix this myself and submit a PR

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Infinite retry loop caused by stream stale timeout during local LLM prefill phase #7069

Bug Description

Describe the bug

Additional context

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Operating System

Python Version

Hermes Version

Relevant Logs / Traceback

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: Infinite retry loop caused by stream stale timeout during local LLM prefill phase #7069

Description

Bug Description

Describe the bug

Additional context

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Operating System

Python Version

Hermes Version

Relevant Logs / Traceback

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions