fix(chrome-relay): auto-reconnect, MV3 persistence, and keepalive by derrickburns · Pull Request #15817 · openclaw/openclaw

derrickburns · 2026-02-13T22:50:34Z

Problem

The Chrome extension relay drops connection after page navigation, sleep/wake cycles, or MV3 service worker restarts — and never recovers. Users must manually re-click the toolbar icon after every navigation. Related: #1160

Root Cause

Five failure modes identified via code audit of background.js and the relay server:

1. No reconnection logic: WebSocket drops are permanent (extension clears state and stops)
2. MV3 state amnesia: service worker restarts wipe all in-memory Maps
3. No keepalive: Chrome kills idle service worker after ~30s
4. Navigation detaches debugger: chrome.debugger auto-detaches on page navigation with no re-attach
5. No pending request cleanup: dropped messages leak memory

Fix

Drop-in replacement for background.js + one manifest permission (alarms):

- Auto-reconnect: exponential backoff (1s→30s cap, 10 attempts) on WS drop
- State persistence: chrome.storage.local saves attached tabs and sessions, so they survive worker restarts
- Keepalive alarm: chrome.alarms every 24s (under MV3 30s limit) checks WS health
- Navigation re-attach: on target_closed detach, waits 500ms then re-attaches if the tab exists
- Per-tab locks: prevent double-attach race from rapid toolbar clicks
- Tab lifecycle cleanup: onRemoved/onUpdated listeners clean state on close/navigate
- Request timeouts: 30s timeout on pending requests prevents memory leaks
- Child session cleanup: proper detach events for child sessions when parent disconnects
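
As a rough illustration of the auto-reconnect schedule listed above, the delay calculation could look like the sketch below. The constant names and the doubling factor are assumptions for illustration; the description only states a 1s start, a 30s cap, and 10 attempts.

```javascript
// Hypothetical constants mirroring the numbers in the PR description.
const BASE_DELAY_MS = 1000
const MAX_DELAY_MS = 30000
const MAX_ATTEMPTS = 10

// Returns the delay before the given 0-based attempt, or null once the
// attempt budget is exhausted. Doubling per attempt is an assumption.
function backoffDelay(attempt) {
  if (attempt >= MAX_ATTEMPTS) return null
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS)
}

// backoffDelay(0..5) -> 1000, 2000, 4000, 8000, 16000, 30000
```

With doubling, the cap is reached on the sixth attempt, so the remaining attempts would all wait 30s.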

Testing

Tested on macOS (Chrome Profile 11) against Ancestry.com:

✅ Snapshot through relay
✅ Navigate to different page + snapshot (previously broke here)
✅ Extension reload + reconnect

Changes

assets/chrome-extension/background.js — +280 lines (reconnect, persistence, keepalive, lifecycle)
assets/chrome-extension/manifest.json — added alarms permission

No changes to relay server protocol, options page, or CDP command handling.

Greptile Overview

Greptile Summary

This PR rewrites the Chrome extension service worker (assets/chrome-extension/background.js) to make the relay connection resilient: it adds auto-reconnect with backoff, per-tab operation locks, request timeouts for pending relay RPCs, and state persistence via chrome.storage.local. It also introduces a keepalive alarm (chrome.alarms) to keep the MV3 service worker active, and tab lifecycle handling (onRemoved/onUpdated) plus navigation-triggered re-attach logic for debugger detaches.
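
A minimal sketch of what the persistence step might look like, assuming the stored shape matches what the quoted restoreState() reads back (an `extensionState` object with `nextSession`, `attachedTabs`, and `childSessions`). The function names and the injected `storage` parameter are hypothetical; `storage` stands in for chrome.storage.local so the sketch can run outside an extension.

```javascript
// Convert the in-memory Maps to arrays of [key, value] entries, since a Map
// does not round-trip through JSON-style serialization.
function snapshotState(tabs, childSessionToTab, nextSession) {
  return {
    nextSession,
    attachedTabs: [...tabs.entries()],
    childSessions: [...childSessionToTab.entries()],
  }
}

// `storage` is assumed to expose a promise-based set(), as
// chrome.storage.local does in MV3.
async function persistState(storage, tabs, childSessionToTab, nextSession) {
  await storage.set({
    extensionState: snapshotState(tabs, childSessionToTab, nextSession),
  })
}
```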

assets/chrome-extension/manifest.json is updated to request the new alarms permission required for the keepalive.
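
For illustration, registering such a keepalive could be sketched as follows. The alarm name comes from the handler quoted in the review; `startKeepalive`, `onTick`, and the injected `alarms` parameter (standing in for chrome.alarms) are hypothetical. Chrome's documentation describes a minimum alarm period for packed extensions, so whether a 24s period is honored as-is is worth verifying.

```javascript
const KEEPALIVE_ALARM = 'relay-keepalive' // name used by the quoted handler
const KEEPALIVE_SECONDS = 24 // interval stated in the PR description

// `alarms` is expected to look like chrome.alarms; it is passed in so the
// sketch can run outside an extension.
function startKeepalive(alarms, onTick) {
  alarms.create(KEEPALIVE_ALARM, { periodInMinutes: KEEPALIVE_SECONDS / 60 })
  alarms.onAlarm.addListener((alarm) => {
    // Ignore alarms registered by other parts of the extension.
    if (alarm.name === KEEPALIVE_ALARM) onTick()
  })
}
```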

Confidence Score: 3/5

This PR is close to mergeable but has reconnection/persistence logic gaps that can prevent recovery in common scenarios.
Core reconnect/keepalive/persistence changes look coherent, but the keepalive path can fail to schedule reconnect when connection attempts throw early, and restored state currently marks tabs as attached without re-attaching the debugger, which can leave the extension in an inconsistent state after MV3 restarts.
assets/chrome-extension/background.js (keepalive/reconnect failure paths, restoreState/reattach behavior)

Last reviewed commit: 1891255


Relates to #15099

The Chrome extension relay loses connection after navigation, sleep/wake, or service worker restarts and never recovers. This is because:

1. No reconnection logic exists; WebSocket drops are permanent
2. MV3 service worker restarts wipe all in-memory state
3. No keepalive prevents Chrome from killing the idle worker
4. chrome.debugger detaches on navigation with no re-attach

This patch adds:

- Auto-reconnect with exponential backoff (1s-30s cap, 10 attempts)
- State persistence via chrome.storage.local (survives worker restarts)
- chrome.alarms keepalive (24s interval, under MV3 30s limit)
- Re-attach on debugger detach from navigation/reload
- Per-tab operation locks (prevents double-attach race)
- Tab lifecycle listeners (cleanup on close/navigate)
- Pending request timeouts (30s, prevents memory leaks)
- Child session cleanup on parent detach

Also adds 'alarms' permission to manifest.json.

Tested: snapshot, navigate+snapshot (previously broke here), and extension reload, all working on macOS with Ancestry.com.

Relates to openclaw#1160

greptile-apps

2 files reviewed, 2 comments


greptile-apps · 2026-02-13T22:55:41Z

assets/chrome-extension/background.js

+chrome.alarms.onAlarm.addListener(async (alarm) => {
+  if (alarm.name === 'relay-keepalive') {
+    // Check WebSocket health and reconnect if needed
+    if (!relayWs || relayWs.readyState !== WebSocket.OPEN) {
+      if (!relayConnectPromise && !reconnectTimer) {
+        console.log('Keepalive: WebSocket unhealthy, triggering reconnect')
+        await ensureRelayConnection().catch(() => {
+          // If connection fails, scheduleReconnect will be called by onRelayClosed
+        })
+      }


Keepalive never reconnects

In the chrome.alarms keepalive handler, ensureRelayConnection() failures are swallowed, but ensureRelayConnection() can throw (e.g., preflight fetch fails or WS connect rejects) without triggering onRelayClosed(), so scheduleReconnect() never runs. This means after an MV3 restart or when the relay is down, the keepalive tick can repeatedly do nothing and the extension may never auto-reconnect. Consider calling scheduleReconnect() from the keepalive catch/failure path (or from ensureRelayConnection failure paths) so reconnect is guaranteed to be scheduled.
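
One way the suggestion above could be sketched, with the failure path scheduling the retry explicitly. `onKeepaliveTick` and its return values are hypothetical; the three callbacks stand in for the extension's own health check, `ensureRelayConnection()`, and `scheduleReconnect()`.

```javascript
// Keepalive tick with the failure path covered. The callbacks are injected
// so the sketch can run without the extension's real state.
async function onKeepaliveTick({ isHealthy, ensureRelayConnection, scheduleReconnect }) {
  if (isHealthy()) return 'healthy'
  try {
    await ensureRelayConnection()
    return 'connected'
  } catch (err) {
    // The connection attempt may reject before a socket exists, in which
    // case no close event fires, so the retry is scheduled here.
    scheduleReconnect()
    return 'scheduled'
  }
}
```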

Path: assets/chrome-extension/background.js, Line: 687:696

greptile-apps · 2026-02-13T22:55:42Z

assets/chrome-extension/background.js

+async function restoreState() {
+  try {
+    const { extensionState } = await chrome.storage.local.get(['extensionState'])
+    if (extensionState) {
+      // Restore nextSession counter to avoid ID conflicts
+      if (typeof extensionState.nextSession === 'number') {
+        nextSession = extensionState.nextSession
+      }
+
+      // Validate and restore tabs - some may have closed during service worker downtime
+      if (Array.isArray(extensionState.attachedTabs)) {
+        for (const [tabId, tabState] of extensionState.attachedTabs) {
+          try {
+            const tab = await chrome.tabs.get(tabId)
+            if (tab) {
+              tabs.set(tabId, tabState)
+              if (tabState.sessionId) {
+                tabBySession.set(tabState.sessionId, tabId)
+              }
+            }
+          } catch {
+            // Tab no longer exists, skip it
+          }
+        }
+      }
+
+      // Restore child session mappings for still-valid tabs
+      if (Array.isArray(extensionState.childSessions)) {
+        for (const [sessionId, tabId] of extensionState.childSessions) {
+          if (tabs.has(tabId)) {
+            childSessionToTab.set(sessionId, tabId)
+          }
+        }
+      }
+    }
+  } catch (err) {
+    console.warn('Failed to restore state:', err)
+  }
+}


Restored state not attached

restoreState() repopulates tabs/tabBySession/childSessionToTab, but it never re-attaches chrome.debugger for those tabs or refreshes badge/title. After an MV3 service worker restart, this leaves the extension believing tabs are attached/connected while Chrome has no debugger session, which can break command routing and makes the UI state misleading until the user manually toggles. A fix is to either (a) re-attach debuggers (and send attached events) for restored tabs, or (b) treat restored tabs as disconnected and require a fresh attach.
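
Option (a) could be sketched roughly as below. `reattachRestoredTabs`, the `state` field on the tab record, and the injected `debuggerApi` (standing in for chrome.debugger with promise-based `attach`) are assumptions for illustration; sending attached events and refreshing the badge are left out.

```javascript
// Re-attach the debugger for every tab restored from storage. Tabs whose
// attach fails are dropped so the Map never claims more than Chrome holds.
async function reattachRestoredTabs(tabs, debuggerApi) {
  const dropped = []
  for (const [tabId, tabState] of tabs) {
    try {
      await debuggerApi.attach({ tabId }, '1.3')
      tabState.state = 'connected' // hypothetical status field
    } catch {
      // Attach failed (tab gone, another debugger attached, ...).
      tabs.delete(tabId)
      dropped.push(tabId)
    }
  }
  return dropped
}
```

Option (b) would instead clear the restored entries, or mark them disconnected, and wait for a fresh attach from the user.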

Path: assets/chrome-extension/background.js, Line: 77:115

codexGW · 2026-02-14T05:21:35Z

Hey — just a heads up, I posted a detailed root cause analysis and working fixes for these exact issues on #15099 before this PR was opened. Auto-reconnect with exponential backoff, debugger re-attach on navigation, MV3 state persistence — all covered there with code snippets.

Would've been nice to get a mention or a "relates to #15099." Not a big deal, but credit where it's due.

derrickburns · 2026-02-14T07:31:33Z

To be honest, I never looked at GitHub at all. Not even a little. I just stumbled over the problem and it blocked me. So I asked openclaw to fix itself using Claude and Codex. It did, and after testing I told it to create an issue and a PR. Then, when the issue was rejected as a duplicate, I told it to attach this to the original.

No disrespect was intended! Apologies!

…tate re-attaches debuggers

Fixes two issues found in code review:

1. Keepalive handler: ensureRelayConnection() can throw without triggering onRelayClosed (e.g. preflight fetch fails before WS creation), leaving no reconnect scheduled. Now explicitly calls scheduleReconnect() from the catch path.
2. restoreState(): After MV3 service worker restart, tab maps were repopulated but chrome.debugger was never re-attached, leaving the extension in a stale state. Now marks restored tabs as disconnected, then re-attaches debuggers after relay connects.

Relates to openclaw#15099

…th 409

After a Chrome restart, the old extension WebSocket may not have fired its close event yet. The gateway was rejecting the new connection with 409 'Extension already connected', requiring a gateway restart to clear the stale state. Now: close the stale connection and accept the new one seamlessly.

…nect+reconnect

When navigating between pages, Chrome detaches the debugger with reason 'target_closed'. Previously, this triggered a full detachTab() which sent Target.detachedFromTarget events to the relay, breaking active CDP sessions. The 500ms re-attach then created a new session.

Now: on navigation detach, skip the relay disconnect notification, show a 'connecting' badge, and re-attach after 500ms. The gateway sees a seamless session replacement instead of a disruptive disconnect+reconnect cycle. Full cleanup only happens for non-navigation detaches (user action, crash, etc.).

If the user clicks the toolbar button to detach during the 500ms navigation re-attach grace period, the timeout would re-attach the tab anyway. Now checks tabs.has(tabId) before re-attaching — if the tab was manually detached, the timeout is a no-op.
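
The grace-period check described above might look like the following sketch. `reattachAfterNavigation`, `sleep`, and the injected `attach` and `wait` parameters are hypothetical names; `tabs` is the Map of tracked tabs.

```javascript
const REATTACH_DELAY_MS = 500 // grace period stated in the commit message

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Waits out the grace period, then re-attaches only if the tab is still
// tracked. Returns true if a re-attach was attempted.
async function reattachAfterNavigation(tabId, tabs, attach, wait = sleep) {
  await wait(REATTACH_DELAY_MS)
  // The user may have detached manually while we were waiting.
  if (!tabs.has(tabId)) return false
  await attach(tabId)
  return true
}
```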

… state corruption

- extension-relay.ts: Guard close handler against stale WS nulling new connection. When a replaced WS fires its close event, it was clearing extensionWs, connectedTargets, and disconnecting all CDP clients even though a new connection was already active.
- background.js: Reset reconnectAttempts on any successful connection, not just auto-reconnect. Prevents exhausted counter from blocking future auto-reconnects after manual recovery.
- background.js: Add tabOperationLocks to reattachKnownTabs to prevent races with concurrent user toolbar clicks during reconnection.
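
The stale-socket guard can be illustrated with a small model of the relay's extension-socket slot. The real handler lives in extension-relay.ts; this JavaScript sketch uses hypothetical names (`createExtensionSlot`, `accept`, `handleClose`) and only shows the identity check, not the cleanup of connectedTargets or CDP clients.

```javascript
// Minimal model of a single "current extension socket" slot.
function createExtensionSlot() {
  let extensionWs = null
  return {
    // A new connection replaces (and closes) any stale one instead of
    // being rejected.
    accept(ws) {
      const stale = extensionWs
      extensionWs = ws
      if (stale && stale !== ws) stale.close()
    },
    // Close handler: ignore close events from sockets already replaced.
    // Returns true only when the active connection was actually cleared.
    handleClose(ws) {
      if (extensionWs !== ws) return false
      extensionWs = null
      return true
    },
    current: () => extensionWs,
  }
}
```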

derrickburns · 2026-02-15T01:07:01Z

Testing Results

Torture Test Suite

Two test suites were created and run against a managed (isolated) Chrome instance (openclaw profile) to avoid interfering with user sessions.

1. Aggressive Stress Test

Rapid-fire operations with minimal delays:

Sequential navigation (5 pages, 500ms intervals)
Navigate + immediate snapshot race (5x)
Rapid tab open (5 tabs, 300ms intervals)
Snapshot across all tabs
Rapid tab close
Error/edge-case URLs (404, 500, about:blank)
Machine-gun navigation (10 concurrent navigates, 200ms intervals)
Final health check

Result: 8/8 passed

2. Human-Paced Endurance Test (30 minutes)

Simulates realistic browsing with natural delays (2-6s reading pauses, 15-30s idle periods):

Multi-page research sessions (navigate → read → follow link)
New tab side-quests (open → browse → close)
Quick back-and-forth navigation
Error page recovery (404 → navigate away)
Idle periods (15-30s, simulating user away)
Health checks each iteration

Result: 226/228 passed over 30 minutes (99.1%)

The 2 failures were both the managed Chrome process crashing (not relay bugs) — the relay detected and recovered automatically each time. Zero relay/extension failures across the entire run.

Code Review

Two independent code reviews were run (different model from the author):

Security/Safety Review found:

🔴 No auth on /extension WebSocket (any local process can connect) — noted as follow-up
🔴 No CDP method allowlist — architectural, by design
✅ Loopback-only binding, auth on /cdp, proper cleanup

Logic/Correctness Review found:

🔴 Stale WS close handler nulling new connection → fixed in commit 6
🔴 Navigation re-attach racing with manual detach → fixed in commit 5
🔴 Missing tabOperationLocks in reattach paths → fixed in commit 6
🟡 reconnectAttempts not resetting on manual connect → fixed in commit 6

All critical findings from code review were addressed before testing.
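
For readers unfamiliar with the tabOperationLocks finding above, a per-tab lock can be sketched as follows. The use of a Set and the `withTabLock` helper are assumptions for illustration; the thread only names `tabOperationLocks`.

```javascript
const tabOperationLocks = new Set()

// Runs `operation` unless another operation is already in flight for the
// same tab, in which case the call is skipped rather than queued.
async function withTabLock(tabId, operation) {
  if (tabOperationLocks.has(tabId)) return { skipped: true }
  tabOperationLocks.add(tabId)
  try {
    return { skipped: false, value: await operation() }
  } finally {
    // Always release the lock, even if the operation throws.
    tabOperationLocks.delete(tabId)
  }
}
```

Skipping instead of queuing suits a toggle button, where a second rapid click should not trigger a second attach.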

iMikio · 2026-02-20T20:36:30Z

Running OpenClaw on WSL2 and hit this exact issue — after every gateway restart, the extension drops and requires a manual re-click to re-attach.

We patched background.js locally with auto-reconnect + exponential backoff (similar approach to what's described here), and it works great. The flow is:

Relay disconnects → save attached tab IDs
Retry with exponential backoff (2s, 4s, 6s... up to 10 attempts)
On failure → notify via gateway's /hooks/agent API, with chrome.storage.local as fallback for when the gateway itself is down

Would love to see this merged officially so we don't have to maintain a local patch. The current UX (manual re-click after every restart) is a real pain point for anyone running a persistent setup. 🙏

steipete · 2026-02-26T14:33:57Z

Thanks for the detailed work here.

Closing as superseded by newer main implementation in the same area. Current main already includes the reconnect/persistence/race-hardening set (for example reconnect race hardening, stale socket replacement guards, tab/session state handling, keepalive, and related relay tests), but on top of the newer relay/auth architecture.

Merging this branch now would effectively roll back newer changes and reintroduce divergence.

openclaw-barnacle bot added the size: M label Feb 13, 2026

greptile-apps bot reviewed Feb 13, 2026


This was referenced Feb 14, 2026

Chrome extension relay: frequent disconnects require manual re-attach (fixable in background.js) #15099

Closed

Browser extension relay returns stale tab cache after CDP connections die #6175

Closed

derrickburns added 5 commits February 14, 2026 09:50

codexGW mentioned this pull request Feb 14, 2026

fix(chrome-relay): resilient reconnect, MV3 persistence, and navigation re-attach #16023

Closed

thewilloftheshadow force-pushed the main branch from bfc1ccb to f92900f February 15, 2026 18:46

codexGW mentioned this pull request Feb 17, 2026

fix(browser): handle stale extension WebSocket on reconnect #18698

Closed

steipete closed this Feb 26, 2026

derrickburns wants to merge 6 commits into openclaw:main from derrickburns:fix/chrome-relay-reconnect

4 participants
