feat(chat): session recovery with retry on errors by MarkSiqiZhang · Pull Request #81 · OpenLAIR/dr-claw

MarkSiqiZhang · 2026-03-23T05:39:26Z

Summary

Sessions interrupted by usage limits or API errors can now be recovered instead of being lost
Error messages show a Retry button that pre-fills the last user message for easy re-send
Usage limit errors display a clean "AI usage limit reached" message

Problem

When a session was interrupted (usage limit, API error, network issue), users couldn't continue the conversation:

Error messages showed raw text with no way to retry
No error classification — frontend couldn't distinguish retryable errors from permanent ones

Changes

Backend (all 3 providers)

server/claude-sdk.js — Record session to index on early errors; classify errors (usage_limit, overloaded, network, auth) and send
errorType/isRetryable with claude-error
server/openai-codex.js — Add error classification to codex-error
server/gemini-cli.js — Add error classification to sendGeminiError

Frontend

ChatMessage type — Added errorType and isRetryable fields
useChatRealtimeHandlers — Pass error metadata to ChatMessage for all providers; clear session timers on error to prevent stale loading state
MessageComponent — Render Retry button on retryable errors with usage limit hint
ChatInterface — Stable handleRetry callback using ref pattern (preserves React.memo)
formatUsageLimitText — Simplified to "AI usage limit reached. Please try again later."
i18n — All 3 locales (en, zh-CN, ko) updated

Test plan

Trigger a usage limit error → Retry button appears with hint text
Click Retry → last user message fills the input box (does NOT auto-send)
Non-retryable errors (auth) → no Retry button shown

Zhang-Henry

Code Review

Overall: Good feature — error classification across all 3 providers with retry UX. Consistent approach. A few issues to address.

Issues

1. [Refactor] Error classification logic duplicated 3 times

claude-sdk.js, gemini-cli.js, and openai-codex.js contain identical regex blocks:

let errorType = 'unknown';
if (/usage[_ ]limit|rate[_ ]limit/i.test(errorMsg)) errorType = 'usage_limit';
else if (/overloaded/i.test(errorMsg)) errorType = 'overloaded';
else if (/network|ECONNREFUSED|ETIMEDOUT/i.test(errorMsg)) errorType = 'network';
else if (/\bauth\b|unauthorized|forbidden/i.test(errorMsg)) errorType = 'auth';

Should be extracted into a shared utility: classifyError(message) → { errorType, isRetryable }. Copy-pasted regex is a maintenance hazard — when a new error pattern is added, all 3 files must be updated identically.

2. [Bug/UX regression] formatUsageLimitText loses reset time information

The original code (chatFormatting.ts:133-163) parsed Claude AI usage limit reached|<timestamp> and converted the Unix timestamp to a user-friendly local time: "Your limit will reset at 14:00 GMT+8 (Shanghai) - 23 Mar 2026".

The new code replaces this with a blanket 'AI usage limit reached. Please try again later.' — users completely lose the information about when they can retry. If this is an intentional design decision, it should be called out in the PR description with rationale. Otherwise, the original timestamp parsing should be preserved.

3. [Bug] sessionStorage cleanup scope is too broad

const timerKeys = Object.keys(sessionStorage).filter((k) => k.startsWith('session_timer_start_'));
timerKeys.forEach((k) => sessionStorage.removeItem(k));
sessionStorage.removeItem('pendingSessionId');

This clears all session timers, not just the one for the errored session. If the user has multiple tabs, this could interfere with other active sessions. The fix direction is correct (error path was indeed missing pendingSessionId cleanup — the success path at claude-complete does clean it up conditionally), but should be scoped to the specific session ID.

4. [Nit] handleRetry dependency array includes unused textareaRef

const handleRetry = useCallback(() => { ... }, [setInput, textareaRef]);

textareaRef is not referenced inside the callback body. Remove it from the deps array.

5. [Design] Retry only pre-fills input, doesn't auto-send

For usage_limit errors, pre-filling makes sense (user should wait). But for transient errors like network/overloaded, users typically expect "Retry" to actually retry the request. Consider auto-sending for transient errors, or at least differentiate the button behavior.

6. [Verify] Session creation → immediate removal timing

The new logic in claude-sdk.js (lines 657-671) sends session-created on early errors, then immediately calls removeSession(capturedSessionId). The frontend receives session-created followed quickly by claude-error. Verify the frontend handles this "create then destroy" sequence gracefully — sidebar shouldn't flash or leave ghost sessions.

MarkSiqiZhang · 2026-03-25T05:25:39Z

Thanks for the thorough review! All 6 issues have been addressed:

[Refactor] Error classification duplicated 3 times — Extracted into shared/errorClassifier.js. Exports classifyError() (regex-based for process-level errors) and classifySDKError() (per-provider structured error code mappings for Claude AssistantMessageError, Codex ResponseError.code, and Gemini gRPC codes). All 3 provider files now import from the shared module.
[Bug/UX regression] formatUsageLimitText loses reset time — Restored. Now parses the Unix timestamp and displays local reset time with proper timezone: "Your limit will reset at 06:46 PM GMT-4 (America/New_York) - Mar 24, 2026". Also fixed GMT offset for fractional-hour timezones (e.g. UTC+5:30).
[Bug] sessionStorage cleanup scope too broad — Scoped. clearSessionTimerStart was already called by markSessionsAsCompleted so the redundant call was removed. pendingSessionId is now only cleared when it matches the errored session, matching the pattern in the abort handler.
[Nit] Unused textareaRef in deps — Removed.
[Design] Retry only pre-fills, doesn't auto-send — Retry now always auto-resends the previous message via submitProgrammaticInput.
[Verify] Session create → immediate removal timing — Verified safe. recordIndexedSession persists to disk before removeSession clears memory. Frontend session-created handler adds the session to sidebar, and claude-error handler only marks it completed (clears loading state) — doesn't remove the sidebar entry. No ghost sessions or flashing.

Additional improvements in this update:

Detected SDK-level streaming errors (message.error on Claude AssistantMessage, error/turn.failed events on Codex) that previously bypassed the error UI entirely
Fixed dual error emission in Codex — error/turn.failed events now send only codex-error, not both codex-response + codex-error
Broadened regex patterns to match real provider error strings (Gemini "capacity exhausted (HTTP 429)", common Node.js network errors like ECONNRESET/ENOTFOUND)

- Record sessions eagerly on error so they appear in sidebar even if the first API call fails (claude-sdk.js) - Classify errors (usage_limit, overloaded, network, auth) across all three providers and send errorType/isRetryable to frontend - Add Retry button on retryable error messages that pre-fills the last user message into the input box - Simplify usage limit display to "AI usage limit reached" without timezone-dependent reset timestamps - Update all 3 i18n locales (en, zh-CN, ko)

Error handler now clears all pending session_timer_start_* keys from sessionStorage, preventing useChatSessionState from re-triggering isLoading after clearLoadingIndicators has already run.

…ery issues - Extract duplicated error classification regex into shared/errorClassifier.js with per-provider SDK error code mappings (Claude, Codex, Gemini) - Detect SDK-level streaming errors (e.g. rate_limit, rate_limit_exceeded) that previously bypassed the error UI in Claude and Codex - Restore usage limit reset time display instead of generic "try again later" - Scope sessionStorage cleanup to errored session only (multi-tab safe) - Fix handleRetry to auto-resend the previous message on retry - Remove unused textareaRef from handleRetry dependency array - Fix GMT offset for fractional-hour timezones (e.g. UTC+5:30) - Skip codex-response for error events to prevent duplicate error messages

Zhang-Henry reviewed Mar 23, 2026

View reviewed changes

Zhang-Henry force-pushed the feat/session-recovery branch from a54d724 to 7f9936a Compare March 26, 2026 06:43

MarkSiqiZhang added 3 commits March 26, 2026 01:50

fix(chat): clear session timers on error to prevent stale loading state

e53b417

Error handler now clears all pending session_timer_start_* keys from sessionStorage, preventing useChatSessionState from re-triggering isLoading after clearLoadingIndicators has already run.

Zhang-Henry force-pushed the feat/session-recovery branch from 7f9936a to fbaa8de Compare March 26, 2026 06:50

Zhang-Henry merged commit ff4ab40 into main Mar 26, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(chat): session recovery with retry on errors#81

feat(chat): session recovery with retry on errors#81
Zhang-Henry merged 3 commits intomainfrom
feat/session-recovery

MarkSiqiZhang commented Mar 23, 2026

Uh oh!

Zhang-Henry left a comment

Uh oh!

MarkSiqiZhang commented Mar 25, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

MarkSiqiZhang commented Mar 23, 2026

Summary

Problem

Changes

Backend (all 3 providers)

Frontend

Test plan

Uh oh!

Zhang-Henry left a comment

Choose a reason for hiding this comment

Code Review

Issues

Uh oh!

MarkSiqiZhang commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MarkSiqiZhang commented Mar 25, 2026 •

edited

Loading