feat(chat): session recovery with retry on errors#81
Conversation
Zhang-Henry
left a comment
There was a problem hiding this comment.
Code Review
Overall: Good feature — error classification across all 3 providers with retry UX. Consistent approach. A few issues to address.
Issues
1. [Refactor] Error classification logic duplicated 3 times
claude-sdk.js, gemini-cli.js, and openai-codex.js contain identical regex blocks:
let errorType = 'unknown';
if (/usage[_ ]limit|rate[_ ]limit/i.test(errorMsg)) errorType = 'usage_limit';
else if (/overloaded/i.test(errorMsg)) errorType = 'overloaded';
else if (/network|ECONNREFUSED|ETIMEDOUT/i.test(errorMsg)) errorType = 'network';
else if (/\bauth\b|unauthorized|forbidden/i.test(errorMsg)) errorType = 'auth';Should be extracted into a shared utility: classifyError(message) → { errorType, isRetryable }. Copy-pasted regex is a maintenance hazard — when a new error pattern is added, all 3 files must be updated identically.
2. [Bug/UX regression] formatUsageLimitText loses reset time information
The original code (chatFormatting.ts:133-163) parsed Claude AI usage limit reached|<timestamp> and converted the Unix timestamp to a user-friendly local time: "Your limit will reset at 14:00 GMT+8 (Shanghai) - 23 Mar 2026".
The new code replaces this with a blanket 'AI usage limit reached. Please try again later.' — users completely lose the information about when they can retry. If this is an intentional design decision, it should be called out in the PR description with rationale. Otherwise, the original timestamp parsing should be preserved.
3. [Bug] sessionStorage cleanup scope is too broad
const timerKeys = Object.keys(sessionStorage).filter((k) => k.startsWith('session_timer_start_'));
timerKeys.forEach((k) => sessionStorage.removeItem(k));
sessionStorage.removeItem('pendingSessionId');This clears all session timers, not just the one for the errored session. If the user has multiple tabs, this could interfere with other active sessions. The fix direction is correct (error path was indeed missing pendingSessionId cleanup — the success path at claude-complete does clean it up conditionally), but should be scoped to the specific session ID.
4. [Nit] handleRetry dependency array includes unused textareaRef
const handleRetry = useCallback(() => { ... }, [setInput, textareaRef]);textareaRef is not referenced inside the callback body. Remove it from the deps array.
5. [Design] Retry only pre-fills input, doesn't auto-send
For usage_limit errors, pre-filling makes sense (user should wait). But for transient errors like network/overloaded, users typically expect "Retry" to actually retry the request. Consider auto-sending for transient errors, or at least differentiate the button behavior.
6. [Verify] Session creation → immediate removal timing
The new logic in claude-sdk.js (lines 657-671) sends session-created on early errors, then immediately calls removeSession(capturedSessionId). The frontend receives session-created followed quickly by claude-error. Verify the frontend handles this "create then destroy" sequence gracefully — sidebar shouldn't flash or leave ghost sessions.
|
Thanks for the thorough review! All 6 issues have been addressed:
Additional improvements in this update:
|
a54d724 to
7f9936a
Compare
- Record sessions eagerly on error so they appear in sidebar even if
the first API call fails (claude-sdk.js)
- Classify errors (usage_limit, overloaded, network, auth) across all
three providers and send errorType/isRetryable to frontend
- Add Retry button on retryable error messages that pre-fills the last
user message into the input box
- Simplify usage limit display to "AI usage limit reached" without
timezone-dependent reset timestamps
- Update all 3 i18n locales (en, zh-CN, ko)
Error handler now clears all pending session_timer_start_* keys from sessionStorage, preventing useChatSessionState from re-triggering isLoading after clearLoadingIndicators has already run.
…ery issues
- Extract duplicated error classification regex into shared/errorClassifier.js
with per-provider SDK error code mappings (Claude, Codex, Gemini)
- Detect SDK-level streaming errors (e.g. rate_limit, rate_limit_exceeded)
that previously bypassed the error UI in Claude and Codex
- Restore usage limit reset time display instead of generic "try again later"
- Scope sessionStorage cleanup to errored session only (multi-tab safe)
- Fix handleRetry to auto-resend the previous message on retry
- Remove unused textareaRef from handleRetry dependency array
- Fix GMT offset for fractional-hour timezones (e.g. UTC+5:30)
- Skip codex-response for error events to prevent duplicate error messages
7f9936a to
fbaa8de
Compare
Summary
Problem
When a session was interrupted (usage limit, API error, network issue), users couldn't continue the conversation:
Changes
Backend (all 3 providers)
server/claude-sdk.js— Record session to index on early errors; classify errors (usage_limit,overloaded,network,auth) and senderrorType/isRetryablewithclaude-errorserver/openai-codex.js— Add error classification tocodex-errorserver/gemini-cli.js— Add error classification tosendGeminiErrorFrontend
ChatMessagetype — AddederrorTypeandisRetryablefieldsuseChatRealtimeHandlers— Pass error metadata to ChatMessage for all providers; clear session timers on error to prevent stale loading stateMessageComponent— Render Retry button on retryable errors with usage limit hintChatInterface— StablehandleRetrycallback using ref pattern (preservesReact.memo)formatUsageLimitText— Simplified to "AI usage limit reached. Please try again later."Test plan