
fix(chat): thinking execution graph #880

Merged

hazeone merged 13 commits into main from fix/thinking_execution_graph
Apr 20, 2026

Conversation

Contributor

@hazeone hazeone commented Apr 20, 2026

Summary

Fix thinking status for the session.

Type of Change

  • Bug fix
  • New feature
  • Documentation
  • Refactor
  • Other

Checklist

  • I ran relevant checks/tests locally.
  • I updated docs if behavior or interfaces changed.
  • I verified there are no unrelated changes in this PR.

hazeone and others added 13 commits April 20, 2026 15:52

… from reply bubble

- Prevent execution graph from auto-collapsing while reply is still
  streaming by excluding from autoCollapsedRunKeys and keeping
  expanded=true via controlled prop
- Strip thinking blocks from the streaming ChatMessage when the reply
  renders as a separate bubble, so thinking content doesn't duplicate
  alongside the response text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Gateway history contains `role: 'user'` messages that are actually
tool-result wrappers (Anthropic API format). These were incorrectly
treated as run boundaries in nextUserMessageIndexes, causing:
- isLatestOpenRun=false during tool execution → graph collapses
- Run split into multiple segments → incorrect step attribution

Add isRealUserMessage() that detects tool-result wrappers by checking
if all content blocks are type 'tool_result', and use it in both
nextUserMessageIndexes computation and userRunCards filtering.

Also remove debug logging from previous iterations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
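The tool-result-wrapper detection described in this commit can be sketched roughly as follows. The message and content-block shapes here are assumptions based on the Anthropic API format mentioned above, not the exact types from the PR:

```typescript
// Sketch of the isRealUserMessage() check described above. A "user"
// message whose content blocks are all tool_result entries is just the
// Anthropic-format wrapper for tool output, not a real run boundary.
type ContentBlock = { type?: string };
type ChatMessage = { role: string; content: string | ContentBlock[] };

function isRealUserMessage(m: ChatMessage): boolean {
  if (m.role !== "user") return false;
  // Plain-string content is always a genuine user prompt.
  if (!Array.isArray(m.content)) return true;
  // All-tool_result content means this is a tool-result wrapper.
  return !(
    m.content.length > 0 &&
    m.content.every((b) => b.type === "tool_result")
  );
}
```

Using this in both `nextUserMessageIndexes` and `userRunCards` keeps tool rounds inside a single run segment instead of splitting them.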
…ph cache

- hasCompletedToolPhase now checks that the last assistant message in the
  segment has no tool_use blocks, preventing false positives during
  intermediate tool rounds that would suppress the trailing thinking indicator
- Filter reply text from cached graph steps when a completed run falls
  back to the step cache, preventing the final response from appearing
  inside the graph when expanding after completion
- Remove debug logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Revert hasCompletedToolPhase to simple check (segmentHasTools only).
  The lastAssistantHasNoTools guard was too restrictive: during reply
  streaming the last assistant in history still has tool_use (reply only
  exists in streamingMessage). The intermediate-narration edge case is
  already handled by stripProcessMessagePrefix producing empty
  trimmedReplyText, causing graceful fallback to buildSteps(false).

- Fix stale graph cache: filter out stream-generated message steps
  (id prefix 'stream-message') instead of brittle exact-match. These
  steps contain accumulated narration+reply text from streaming phase
  that should not persist after completion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
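The cache-filter fix described here can be sketched as below. The step shape is an assumption; only the `'stream-message'` id prefix comes from the commit message:

```typescript
// Sketch of the stale-cache fix described above: drop steps whose ids
// were generated during streaming (prefix "stream-message") instead of
// exact-matching their text, which was brittle because the accumulated
// narration+reply text changes between renders.
type GraphStep = { id: string; text: string };

function pruneStreamSteps(steps: GraphStep[]): GraphStep[] {
  return steps.filter((s) => !s.id.startsWith("stream-message"));
}
```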
…ay events

loadHistory repeatedly set sending=false during server-side tool execution
by incorrectly inferring run completion from message content.

Run completion is now ONLY signalled by:
1. Gateway's phase 'completed' event (gateway.ts)
2. Streaming 'final' event (runtime-event-handlers.ts)
3. Safety timeout after 90s of no events

Also: fully controlled graph expanded prop, stable key, card.active
decoupled from streamingReplyText, suppressThinking prop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The Gateway sends phase "end" after each tool-execution round (sub-run),
not just when the entire conversation finishes. This caused sending=false
between tool rounds, breaking the thinking indicator and input state.

Add a 5-second grace timer: on phase "end", delay sending=false. If a
new streaming event, "started" phase, or chat data arrives within the
window, the timer is cancelled and sending stays true. Only if the
grace period expires with no new activity does the run finalize.

Also: remove loadHistory finalize logic entirely — run completion is
now handled exclusively by Gateway phase events (with grace) and
streaming final events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
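The grace-timer mechanism this commit describes (later removed again by a follow-up commit in this same PR) could look roughly like this; the class and its wiring into the store are assumptions for illustration:

```typescript
// Sketch of the 5-second grace window described above: on Gateway phase
// "end", delay sending=false; any new activity cancels the pending
// finalize so sending stays true across tool rounds.
class RunFinalizer {
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private onFinalize: () => void, // e.g. sets sending=false in the store
    private graceMs = 5000,
  ) {}

  // Gateway phase "end": arm the grace timer instead of finalizing now.
  onPhaseEnd(): void {
    this.cancel();
    this.timer = setTimeout(() => {
      this.timer = null;
      this.onFinalize(); // grace expired with no activity → run is done
    }, this.graceMs);
  }

  // A streaming event, "started" phase, or chat data keeps the run alive.
  onActivity(): void {
    this.cancel();
  }

  private cancel(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```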
… completion handling

Eliminate the phase completion timer and its associated logic from the Gateway. Run completion is now based solely on Gateway phase events and streaming final events. This simplifies the code and makes state transitions more reliable, since completion is no longer inferred from a timer.

Additionally, update the runtime send actions to finalize the sending state immediately after the chat.send RPC completes, ensuring accurate state management during agent conversations.

…alidation

- Bump version in package.json to 0.3.10-beta.4.
- Add a new GitHub Actions job to validate that the version in package.json matches the release tag.
- Introduce scripts for versioning and release validation to streamline the release process.
- Refactor reply text overrides to use useMemo for improved performance and clarity.
- Update unit test to reflect changes in execution graph behavior during streaming replies, ensuring the graph remains expanded while replies are streaming.

…state

Gateway fires phase "end" per tool-execution round, setting sending=false
between tool calls. Instead of fighting this at the store level (grace
timers, loadHistory heuristics), handle it at the UI layer:

- isLatestOpenRun now includes runStillExecutingTools: if historical
  messages have tool_use but no pure-text final reply, the run is still
  in progress regardless of sending state
- ChatInput receives sending || hasActiveExecutionGraph so the stop
  button stays visible during server-side tool execution
- autoCollapsedRunKeys collapses only when !card.active (run has final
  reply) — not during intermediate tool rounds
- Revert gateway.ts and runtime-send-actions.ts to their original
  behavior (no grace timer, no RPC-based finalization)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
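The UI-layer completion check this commit describes could be sketched as follows. The message and block shapes are assumptions; only the logic (tool_use present, no pure-text final reply) comes from the commit message:

```typescript
// Sketch of the runStillExecutingTools check described above: a run is
// still in progress, regardless of the store's sending flag, when its
// assistant messages contain tool_use blocks but no assistant message
// yet carries a pure-text final reply.
type Block = { type?: string; text?: string };
type Msg = { role: string; content: Block[] };

function runStillExecutingTools(segment: Msg[]): boolean {
  const assistants = segment.filter((m) => m.role === "assistant");
  const hasToolUse = assistants.some((m) =>
    m.content.some((b) => b.type === "tool_use"),
  );
  // Final reply = non-empty text with no tool_use in the same message.
  const hasFinalReply = assistants.some(
    (m) =>
      m.content.some((b) => b.type === "text" && (b.text ?? "").trim() !== "") &&
      !m.content.some((b) => b.type === "tool_use"),
  );
  return hasToolUse && !hasFinalReply;
}
```

Feeding `sending || runStillExecutingTools(...)` into ChatInput is what keeps the stop button visible across the Gateway's per-round phase "end" events.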
- Eliminated the in-flight user prompt and its associated history from the chat task visualizer tests.
- Updated the test suite to reflect changes in the execution graph behavior, ensuring it accurately represents the current state without the removed elements.
@hazeone hazeone marked this pull request as ready for review April 20, 2026 12:49
@hazeone hazeone merged commit 9a15751 into main Apr 20, 2026
5 checks passed
@hazeone hazeone deleted the fix/thinking_execution_graph branch April 20, 2026 12:53
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dae16adaa

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread: src/pages/Chat/index.tsx, lines +241 to +245

    if (extractText(m).trim().length === 0) return false;
    const content = m.content;
    if (!Array.isArray(content)) return true;
    return !(content as Array<{ type?: string }>).some(
      (b) => b.type === 'tool_use' || b.type === 'toolCall',

P1: Close runs when terminal assistant output is non-text

hasFinalReply currently returns true only for assistant messages that contain text and no tool_use/toolCall blocks. For completed runs whose last assistant turn is image-only (or mixed text + tool call), this leaves runStillExecutingTools true forever, so isLatestOpenRun never clears. Because this commit also wires ChatInput to sending || hasActiveExecutionGraph, those sessions can stay stuck in “sending” mode and block follow-up prompts. The completion check should align with the broader terminal-output semantics used elsewhere (e.g., reply detection that is not text-only).
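One possible shape of the fix the review suggests — treating any assistant message without further tool calls as terminal output, whether its content is text, image, or mixed — is sketched below. The types and function name are assumptions, not code from the PR:

```typescript
// Sketch of a broadened completion check per the review above: a run is
// terminal when the assistant message contains no tool_use/toolCall
// blocks, regardless of whether its content is text, image, or mixed.
// This avoids image-only replies leaving runStillExecutingTools stuck.
type Block = { type?: string };
type AssistantMsg = { role: string; content: Block[] };

function isTerminalAssistantOutput(m: AssistantMsg): boolean {
  if (m.role !== "assistant" || m.content.length === 0) return false;
  // Completion is defined by the absence of further tool calls, not by
  // the presence of text.
  return !m.content.some((b) => b.type === "tool_use" || b.type === "toolCall");
}
```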

Useful? React with 👍 / 👎.

DigitalNomad-Chat added a commit to DigitalNomad-Chat/ClawX that referenced this pull request Apr 26, 2026
… dedupe (ValueCell-ai#821 ValueCell-ai#845 ValueCell-ai#870 ValueCell-ai#873 ValueCell-ai#875 ValueCell-ai#878 ValueCell-ai#880 ValueCell-ai#885 ValueCell-ai#886 ValueCell-ai#887 ValueCell-ai#891 ValueCell-ai#903)

Overhaul execution graph card (collapse/expand, narration steps, web_fetch links),
separate thinking messages, render LaTeX math, dedupe optimistic messages,
hide recoverable gateway timeouts, add startup history recovery.