Webchat race: chat final / session.message / sessions.changed triggers eager chat.history reload, causing flicker, collapse, or duplicate bubbles #66875

@BiznessFish

Summary

After chat final, the webchat client can trigger eager chat.history reloads that replace optimistic/live messages, causing flicker, collapse, or duplicate assistant bubbles. This happens even with "Toggle assistant thinking/working output" turned off.

Related Issues

This issue consolidates the root cause analysis covering all these symptoms.

Environment

  • OpenClaw v2026.4.12+
  • Bundle: /opt/homebrew/lib/node_modules/openclaw/dist/control-ui/assets/index-CsLHusjn.js
  • macOS, Chrome/Safari

Symptoms Observed

  1. Assistant response streams correctly, then flickers/disappears/collapses
  2. Multiple separate assistant bubbles appear instead of one
  3. User messages duplicate (seen 4x copies of same message)
  4. Transient "thinking/working" bubble appears, then response vanishes

All symptoms occur even with thinking toggle OFF — the toggle affects presentation, not the underlying event/reload behavior.

Websocket Event Sequence

event:"chat" state:"delta"   → streams normally
event:"chat" state:"final"   → assistant message finalized
event:"session.message"      → arrives ~immediately after
event:"sessions.changed"     → arrives shortly after
→ chat.history request/reload triggered

The backend emits the message correctly. This is a client reconciliation/render bug.

Root Cause Analysis

Key Functions Traced

Function     Role
MC(e,t)      Event dispatcher
AC(e,t)      Handles chat events
cp(e,t)      Processes chat state transitions
kC(e,t,n)    Terminal state cleanup
jC(e,t)      Handles session.message events
Qf(e)        Forces chat.history reload
gp(e)        Handles sessions.changed

Bug #1: Unconditional Qf() on chat final

In AC():

let r = cp(e,t), i = kC(e,t,r);
r===`final`&&!i&&aC(t)&&Qf(e)  // ← always calls Qf on normal final

This forces a history reload after every successful message, even when local state is already correct.

Bug #2: jC() too eager after finalization

function jC(e,t){
  let n=t?.sessionKey?.trim();
  if(!n||n!==e.sessionKey)return;
  if(e.chatRunId)return;  // ← guard only works while run active
  Qf(e)                    // ← triggers reload after chatRunId cleared
}

Once cp() clears chatRunId on final, subsequent session.message events trigger additional Qf() calls.

Bug #3: Cleanup not reliably executed in cp(... final ...)

The cleanup of stream state:

e.chatStream = null
e.chatRunId = null
e.chatStreamStartedAt = null

appears to be tied to a ternary/conditional branch rather than always executing after final. This can leave stale stream state and amplify reconciliation issues.

Race Condition Sequence

1. User message optimistically added to chatMessages
2. Assistant streams via chat delta events
3. cp(...final...) appends assistant message
4. chatRunId is cleared ← CRITICAL MOMENT
5. session.message / sessions.changed arrive
6. jC() and/or AC() trigger Qf()
7. Qf() replaces chatMessages wholesale from chat.history
8. If the chat.history snapshot lags behind local state, the optimistic message disappears, duplicates, or flickers
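
To make the sequence concrete, here is a minimal simulation of steps 3 to 8. It is a sketch, not the bundle's code: replaceFromHistory() is a hypothetical stand-in for the wholesale replacement attributed to Qf(), and the message shape ({ id, role, text }) is an assumption.

```javascript
// Hypothetical stand-in for the wholesale replacement attributed to Qf():
// any local message missing from the snapshot is lost.
function replaceFromHistory(state, historySnapshot) {
  state.chatMessages = historySnapshot.slice();
}

// Assumed state shape; only chatRunId and chatMessages are named in the issue.
const state = {
  chatRunId: "run-1",
  chatMessages: [{ id: "u1", role: "user", text: "hello" }],
};

// Steps 3-4: local finalization appends the assistant message and clears the run id.
state.chatMessages.push({ id: "a1", role: "assistant", text: "hi there" });
state.chatRunId = null;

// Steps 5-7: a reload returns a snapshot that does not contain "a1" yet.
const laggingHistory = [{ id: "u1", role: "user", text: "hello" }];
replaceFromHistory(state, laggingHistory);

// Step 8: remainingIds is ["u1"], so the finalized assistant message is gone
// until a later reload brings it back (the visible flicker).
const remainingIds = state.chatMessages.map((m) => m.id);
```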

Diagnostic Patch (Confirmed)

Changed in AC():

// Original:
r===`final`&&!i&&aC(t)&&Qf(e)

// Patched:
r===`final`&&!i&&aC(t)&&false&&Qf(e)

Result:

  • Flicker/disappearing/extra-bubble behavior stopped immediately
  • Streaming worked normally
  • Final answer rendered cleanly

This confirms the eager post-final Qf() is a causal part of the bug.

Caveat

The blunt suppression may prevent some legitimate finalized message hydration (including thinking blocks that need server-side persistence). The patch is diagnostic evidence, not the final fix.

Proposed Fix

1. Fix cp(... final ...) cleanup

Ensure cleanup always runs after any final:

// Always clear after final:
e.chatStream = null
e.chatRunId = null  
e.chatStreamStartedAt = null
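
One way to guarantee this is to move the cleanup out of the conditional branch and into a finally block. The sketch below assumes a readable (non-minified) final handler; clearStreamState() and handleFinal() are hypothetical names, and only the three field names come from the bundle.

```javascript
// Hypothetical helper: resets the three stream fields named in this issue.
function clearStreamState(state) {
  state.chatStream = null;
  state.chatRunId = null;
  state.chatStreamStartedAt = null;
}

// Hypothetical final handler: the append step may be skipped (no message)
// or may throw, but the cleanup in `finally` runs in every case.
function handleFinal(state, message) {
  try {
    if (message) {
      state.chatMessages.push(message);
    }
  } finally {
    clearStreamState(state);
  }
}
```

With this shape, no branch inside the handler can leave stale chatStream or chatRunId behind.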

2. Remove unconditional Qf() on same-session chat final

In AC(), only reload when:

  • Local incorporation failed
  • Reconciling a different run/session
  • After a short delay/cooldown
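
The first two conditions could be expressed as a small predicate that AC() consults instead of calling Qf() directly. This is a sketch under assumptions: shouldReloadAfterFinal() and all of its parameter names are hypothetical, and the delay/cooldown condition is left out here.

```javascript
// Hypothetical predicate; every name is an assumption for illustration.
// `activeRunId` has to be captured before the final handler clears chatRunId.
function shouldReloadAfterFinal(ctx) {
  const {
    incorporatedLocally, // true if the final message was appended locally
    eventSessionKey,     // session key carried by the chat final event
    currentSessionKey,   // session key the UI is currently showing
    eventRunId,          // run id carried by the chat final event
    activeRunId,         // run id that was streaming locally
  } = ctx;

  if (!incorporatedLocally) return true;                    // local incorporation failed
  if (eventSessionKey !== currentSessionKey) return true;   // different session
  if (eventRunId !== activeRunId) return true;              // different run
  return false;                                             // local state is already correct
}
```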

3. Add post-final cooldown to jC()

The current guard only checks chatRunId, which cp() clears on final, before the trailing session.message / sessions.changed events have arrived. Options:

  • Track lastFinalizedAt timestamp, skip Qf() for 500ms after
  • Track lastFinalizedRunId, skip if recent run just completed
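
A rough sketch of the timestamp option follows. The first two checks mirror the existing guard in jC(); markFinalized(), shouldReloadOnSessionMessage() and POST_FINAL_COOLDOWN_MS are hypothetical names, and the clock is passed in so the behavior can be checked without waiting.

```javascript
// Assumed cooldown window, taken from the 500ms suggestion above.
const POST_FINAL_COOLDOWN_MS = 500;

// Hypothetical: called from the final handler right after local finalization.
function markFinalized(state, runId, now = Date.now()) {
  state.lastFinalizedAt = now;
  state.lastFinalizedRunId = runId;
}

// Hypothetical replacement for the decision inside jC(): returns true only
// when a chat.history reload should be triggered.
function shouldReloadOnSessionMessage(state, event, now = Date.now()) {
  const key = event?.sessionKey?.trim();
  if (!key || key !== state.sessionKey) return false; // other session: ignore
  if (state.chatRunId) return false;                  // run still active
  const finalizedAt = state.lastFinalizedAt;
  if (typeof finalizedAt === "number" && now - finalizedAt < POST_FINAL_COOLDOWN_MS) {
    return false;                                     // inside post-final cooldown
  }
  return true;
}
```

A fixed window is a heuristic: if the trailing events arrive later than 500ms after final, the reload still fires, so the value would need tuning against real traces.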

4. Longer-term: Merge instead of replace

Qf() should merge history into existing local state rather than replacing chatMessages wholesale. This would naturally handle reconciliation without flicker.
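
A minimal sketch of such a merge, assuming every message carries a stable id shared by the client and the server (the issue does not confirm this). mergeHistory() is a hypothetical name. History wins for any id it contains, and local-only messages are kept after it.

```javascript
// Hypothetical merge: server history is authoritative for ids it contains;
// local messages the snapshot has not caught up with are preserved.
function mergeHistory(localMessages, historyMessages) {
  const merged = new Map(); // Map keeps insertion order
  for (const message of historyMessages) {
    merged.set(message.id, message); // repeated ids collapse to one entry
  }
  for (const message of localMessages) {
    if (!merged.has(message.id)) {
      merged.set(message.id, message); // keep local-only (optimistic) messages
    }
  }
  return [...merged.values()];
}

// Usage: a lagging snapshot no longer drops the finalized assistant message.
const local = [
  { id: "u1", role: "user", text: "hello" },
  { id: "a1", role: "assistant", text: "hi there" },
];
const lagging = [{ id: "u1", role: "user", text: "hello (server copy)" }];
const mergedIds = mergeHistory(local, lagging).map((m) => m.id); // ["u1", "a1"]
```

If optimistic messages use client-generated ids that differ from the server's, this sketch would still produce duplicates, so a real fix would need a mapping between the two.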

Minimal Safe Fix

  1. Keep local final handling in cp()
  2. Remove unconditional Qf() on same-session final in AC()
  3. Suppress jC()-triggered reloads for ~500ms after local finalization
