Summary
Switching the main session model to arcee/trinity-large-thinking can brick the session: the TUI repeatedly shows "connection error", Telegram becomes nonresponsive, and the main lane stalls until fallback finally fires ~20-24s later. The same model can work in other contexts, which makes it a dangerous weak link in the main-session path.
This is not fixed by uncapping output tokens in config and restarting.
Severity
High. This degrades the primary interactive session path and effectively wedges both the local TUI and Telegram responsiveness whenever Trinity is selected as the main session model.
Environment
- OpenClaw runtime: main agent session on macOS
- Model: arcee/trinity-large-thinking (alias: trinity)
- Provider: arcee
- Session type affected: main session
- Observed date: 2026-04-07
What we already ruled out
We explicitly tested the obvious config variable and ruled it out:
- Trinity config persists correctly
- the output-token cap was removed in config
- the gateway was restarted cleanly
- the failure still persists
So this is not just a token-cap config issue.
User-visible behavior
- Switch main session to Trinity
- TUI starts spamming "connection error"
- Telegram goes dark / nonresponsive
- Session appears bricked until fallback eventually succeeds on another model
This is not a graceful provider failure. It is a main-session stability failure.
Fresh log evidence (last ~15 min after uncap + restart)
From ~/.openclaw/logs/gateway.err.log:
2026-04-07T22:15:40.639-04:00 [agent] embedded run agent end: runId=c40546af-1962-4622-ac32-cff3b3006ba9 isError=true model=trinity-large-thinking provider=arcee error=LLM request failed: network connection error. rawError=Connection error.
2026-04-07T22:15:44.125-04:00 [agent] embedded run agent end: runId=c40546af-1962-4622-ac32-cff3b3006ba9 isError=true model=trinity-large-thinking provider=arcee error=LLM request failed: network connection error. rawError=Connection error.
2026-04-07T22:15:49.538-04:00 [agent] embedded run agent end: runId=c40546af-1962-4622-ac32-cff3b3006ba9 isError=true model=trinity-large-thinking provider=arcee error=LLM request failed: network connection error. rawError=Connection error.
2026-04-07T22:16:00.614-04:00 [agent] embedded run agent end: runId=c40546af-1962-4622-ac32-cff3b3006ba9 isError=true model=trinity-large-thinking provider=arcee error=LLM request failed: network connection error. rawError=Connection error.
2026-04-07T22:16:54.442-04:00 [model-fallback] model fallback decision: decision=candidate_failed requested=arcee/trinity-large-thinking candidate=arcee/trinity-large-thinking reason=overloaded next=openai-codex/gpt-5.4
2026-04-07T22:17:06.013-04:00 [model-fallback] model fallback decision: decision=candidate_succeeded requested=arcee/trinity-large-thinking candidate=openai-codex/gpt-5.4 reason=unknown next=none
2026-04-07T22:17:10.473-04:00 [agent] embedded run agent end: runId=7b00bf0c-c380-43a3-ab31-93cef91d2346 isError=true model=trinity-large-thinking provider=arcee error=LLM request failed: network connection error. rawError=Connection error.
2026-04-07T22:17:13.901-04:00 [agent] embedded run agent end: runId=7b00bf0c-c380-43a3-ab31-93cef91d2346 isError=true model=trinity-large-thinking provider=arcee error=LLM request failed: network connection error. rawError=Connection error.
2026-04-07T22:17:19.323-04:00 [agent] embedded run agent end: runId=7b00bf0c-c380-43a3-ab31-93cef91d2346 isError=true model=trinity-large-thinking provider=arcee error=LLM request failed: network connection error. rawError=Connection error.
2026-04-07T22:17:28.686-04:00 [agent] embedded run agent end: runId=7b00bf0c-c380-43a3-ab31-93cef91d2346 isError=true model=trinity-large-thinking provider=arcee error=LLM request failed: network connection error. rawError=Connection error.
2026-04-07T22:17:29.704-04:00 [agent] embedded run failover decision: runId=7b00bf0c-c380-43a3-ab31-93cef91d2346 stage=assistant decision=fallback_model reason=timeout provider=arcee/trinity-large-thinking profile=-
2026-04-07T22:17:29.705-04:00 [diagnostic] lane task error: lane=main durationMs=23552 error="FailoverError: LLM request failed: network connection error."
2026-04-07T22:17:29.705-04:00 [diagnostic] lane task error: lane=session:agent:main:main durationMs=23553 error="FailoverError: LLM request failed: network connection error."
2026-04-07T22:17:29.706-04:00 [model-fallback] model fallback decision: decision=candidate_failed requested=arcee/trinity-large-thinking candidate=arcee/trinity-large-thinking reason=timeout next=openai-codex/gpt-5.4
2026-04-07T22:17:39.723-04:00 [model-fallback] model fallback decision: decision=candidate_succeeded requested=arcee/trinity-large-thinking candidate=openai-codex/gpt-5.4 reason=unknown next=none
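The two lane task error diagnostics above carry the stall duration directly. A small parse of those lines (copied verbatim from the log; the regex is illustrative, not OpenClaw's log schema) confirms the ~20-24s wedge:

```python
# Parse the "lane task error" diagnostics above to confirm how long the
# main lane was wedged before fallback fired.
import re

log_lines = [
    '2026-04-07T22:17:29.705-04:00 [diagnostic] lane task error: lane=main durationMs=23552 error="FailoverError: LLM request failed: network connection error."',
    '2026-04-07T22:17:29.705-04:00 [diagnostic] lane task error: lane=session:agent:main:main durationMs=23553 error="FailoverError: LLM request failed: network connection error."',
]

for line in log_lines:
    m = re.search(r'lane=(\S+) durationMs=(\d+)', line)
    if m:
        lane, ms = m.group(1), int(m.group(2))
        print(f"{lane}: stalled {ms / 1000:.1f}s")  # prints e.g. main: stalled 23.6s
```

Both lanes report ~23.5s, matching the user-visible stall before fallback.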
Important detail
Earlier failures presented as "Internal server error", then later as repeated "network connection error" / timeout behavior. So the failure mode appears to have shifted, but the main-session brick remains.
Example earlier same-day evidence:
2026-04-07T20:40:19.906-04:00 [agent] embedded run failover decision: runId=9edf7806-201c-4d5f-a565-10a70c454af2 stage=assistant decision=fallback_model reason=timeout provider=arcee/trinity-large-thinking profile=-
2026-04-07T20:42:25.974-04:00 [agent] embedded run failover decision: runId=7218d66b-dc0c-415d-8753-7e90d777cf2a stage=assistant decision=fallback_model reason=timeout provider=arcee/trinity-large-thinking profile=-
Why this is bad
The current behavior does not fail fast and recover cleanly. Instead it:
- retries repeatedly in the main interactive lane,
- surfaces repeated connection errors to the TUI,
- starves responsiveness on Telegram,
- and only later falls back.
That means one unstable model/provider pairing can effectively poison the main session UX.
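The pattern described above can be sketched as a blocking retry loop inside the interactive lane. This is a hypothetical illustration of the anti-pattern, not OpenClaw's actual code; names and parameters are invented:

```python
# Hypothetical sketch of the problematic pattern: blocking in-lane retries.
# The caller (the main lane) is stalled for the whole retry budget, and each
# failed attempt surfaces another "connection error" to the TUI.
import time

def call_model_sticky(send, retries=4, backoff=3.0):
    """Retry in place; the interactive lane blocks until all retries burn out."""
    last_err = None
    for _ in range(retries):
        try:
            return send()
        except ConnectionError as err:
            last_err = err       # each failure is visible to the user
            time.sleep(backoff)  # main lane stalls here, ~retries*backoff total
    raise last_err               # only now does fallback get a chance
```

With retries=4 and backoff=3.0 this alone accounts for roughly the observed 20-24s stall before the fallback path even runs.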
Strong suspicion / likely failure area
One or more of these is still wrong in the main-session path:
- main-session handling of Trinity/Arcee failures is too sticky and does not fail fast,
- a hidden runtime cap or request shaping difference still exists in the main lane,
- Trinity response handling in the main lane differs from subagent lane,
- provider transport errors are not being isolated from the user-facing session loop.
The key point: OpenClaw should not allow a model switch to brick the primary session experience.
Expected behavior
If Trinity/Arcee is unhealthy for a main session request, OpenClaw should:
- fail fast,
- mark the candidate unhealthy,
- immediately fallback,
- keep TUI responsive,
- keep Telegram responsive,
- and avoid repeating visible "connection error" spam.
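The expected behavior above can be sketched as a fail-fast router: one transport error marks the candidate unhealthy and the next candidate is tried immediately, with no in-lane retries. This is a minimal illustration under assumed names, not OpenClaw's actual API:

```python
# Hedged sketch of fail-fast fallback: mark a candidate unhealthy on the
# first transport error and move on immediately. Class and method names
# are illustrative only.
class FailFastRouter:
    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.unhealthy = set()

    def run(self, send):
        for model in self.candidates:
            if model in self.unhealthy:
                continue
            try:
                return model, send(model)
            except ConnectionError:
                self.unhealthy.add(model)  # one error marks it; no in-lane retries
        raise RuntimeError("all candidates unhealthy")

# Usage: Trinity fails once; the router answers from the fallback immediately.
def fake_send(model):
    if model == "arcee/trinity-large-thinking":
        raise ConnectionError("network connection error")
    return "ok"

router = FailFastRouter(["arcee/trinity-large-thinking", "openai-codex/gpt-5.4"])
model, reply = router.run(fake_send)
print(model, reply)  # → openai-codex/gpt-5.4 ok
```

The key property is that the interactive lane sees at most one failure per candidate before the fallback answers, so the TUI and Telegram stay responsive.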
Repro steps
- Configure arcee/trinity-large-thinking
- Switch the main session model to Trinity
- Send a normal main-session prompt
- Observe repeated "connection error" in the TUI and stalled Telegram responsiveness
- Wait ~20-24s for eventual fallback to another model
Request
Please treat this as a stability bug in the main-session lane, not a cosmetic provider hiccup. A broken model/provider should degrade gracefully, not wedge the user’s primary session surfaces.