Skip to content

fix(connection-manager): cancel in-flight init on disconnect to stop pool resurrection#984

Open
yashkot007 wants to merge 1 commit into
garrytan:masterfrom
yashkot007:fix/disconnect-cancel-pending-init
Open

fix(connection-manager): cancel in-flight init on disconnect to stop pool resurrection#984
yashkot007 wants to merge 1 commit into
garrytan:masterfrom
yashkot007:fix/disconnect-cancel-pending-init

Conversation

@yashkot007

Copy link
Copy Markdown

Summary

ConnectionManager.disconnect() (src/core/connection-manager.ts:396) has two race holes when a caller is mid-flight in getDirectPool():

  1. The _directInit = null line lives inside the if (this._directPool) guard. If disconnect runs while init is still pending (pool not assigned yet), the guard fails and _directInit stays pointing at the in-flight Promise. When that Promise resolves, its .then(pool => { this._directPool = pool; ... }) callback fires and resurrects _directPool after disconnect has supposedly torn everything down.

  2. The resurrected pool's socket is now held by _directPool with no tracking — a clean fd leak across rapid reconnect cycles (tests, Conductor worktrees, worker shutdown flows).

Pre-fix shape:

async disconnect(): Promise<void> {
  if (this._directPool) {                              // ← guard fails when
    try { await this._directPool.end(); } catch {}     //   init is in flight
    this._directPool = null;
    this._directInit = null;                           // ← only cleared inside guard
  }
  ...
}

Repro

Spying on private initDirectPool to return a controllable Promise (no DB needed):

  1. Caller A starts getDirectPool()_directInit set to chained Promise
  2. Caller B calls disconnect() BEFORE init resolves → guard fails → _directInit stays pointing at the live Promise
  3. Init Promise resolves → .then callback fires → _directPool = pool
  4. Final state: _directPool SET (resurrected), _directInit SET (stale), but disconnect was supposed to tear everything down. The pool's socket is now an orphan fd.

Pinned in test/connection-manager-disconnect-race.test.ts. Pre-fix: 2 of 3 tests fail (orphan pool's .end() was never called; post-disconnect getDirectPool() silently tried to reach the real Supabase URL and ENOTFOUNDs).

Fix

Three coordinated pieces:

 export class ConnectionManager {
   ...
+  /**
+   * Set by disconnect() so any in-flight initDirectPool() chain that
+   * resolves AFTER disconnect can detect the cancellation, close its
+   * now-orphaned pool, and skip writing to `_directPool`.
+   */
+  private _disconnected = false;
 
   private async getDirectPool(): Promise<Sql> {
     if (this._directPool) return this._directPool;
+    if (this._disconnected) {
+      throw new Error('connection-manager: disconnect() was called; create a new instance to reconnect');
+    }
     if (!this._directInit) {
       this._directInit = this.initDirectPool().then(pool => {
+        if (this._disconnected) {
+          pool.end().catch(() => { /* best-effort cleanup */ });
+          return null;
+        }
         this._directPool = pool;
         return pool;
       }).catch(err => { ... });
     }
     ...
   }
 
   async disconnect(): Promise<void> {
+    this._disconnected = true;
+    const pendingInit = this._directInit;
+    this._directInit = null;                      // ← lifted out of the guard
     if (this._directPool) {
       try { await this._directPool.end(); } catch {}
       this._directPool = null;
-      this._directInit = null;                    // ← was here
     }
+    if (pendingInit) {
+      try { await pendingInit; } catch {}         // ← orphan closes inside .then
+    }
     if (this._readPool && !this._readPoolOwnedExternally) { ... }
   }

Behavior change

getDirectPool() after disconnect() now throws connection-manager: disconnect() was called; create a new instance to reconnect instead of silently starting a fresh init. Matches the engine-ownership invariant test/worker-shutdown-disconnect.test.ts already pins — callers construct their own engine; once disposed, it stays disposed.

Test plan

New test/connection-manager-disconnect-race.test.ts (3 cases, all passing):

  • Race case — orphan pool closed exactly once, _directPool null after race, _directInit null, caller surfaces a clean error instead of silently holding a leaked pool
  • Post-disconnect guardgetDirectPool() after disconnect throws instead of silently re-initializing
  • Trivial idempotency — repeated disconnect on a fresh manager doesn't throw

RED-verified by stashing the fix and re-running: tests 1 and 2 fail on pre-fix code with the exact symptoms above.

Verification:

  • bun test test/connection-manager-disconnect-race.test.ts3 pass / 0 fail
  • bun test test/connection-manager.serial.test.ts24 existing pass / 0 fail (happy-path unchanged)
  • bun run verify — all 11 invariant gates green
  • Parallel-safe: spyOn on instance method (no mock.module, no env mutation) — meets R2 of scripts/check-test-isolation.sh without needing the .serial suffix
  • No new dependencies, no API surface change for happy-path callers

🤖 Generated with Claude Code

…pool resurrection

`ConnectionManager.disconnect()` had two race holes when a caller was
mid-flight in `getDirectPool()`:

  1. The `_directInit = null` line lived INSIDE the `if (this._directPool)`
     guard. If disconnect ran while init was still pending (pool not
     assigned yet), the guard failed and `_directInit` stayed pointing at
     the in-flight Promise. When that Promise resolved, its
     `.then(pool => { this._directPool = pool; ... })` callback fired
     and resurrected `_directPool` AFTER disconnect had supposedly
     torn everything down.

  2. The resurrected pool's socket was now held by `_directPool` with no
     tracking — a clean fd leak across rapid reconnect cycles (tests,
     Conductor worktrees, worker shutdown flows).

Pre-fix shape:

    async disconnect(): Promise<void> {
      if (this._directPool) {                     // guard fails when
        try { await this._directPool.end(); } catch {}  // init is in flight
        this._directPool = null;
        this._directInit = null;                  // ← only cleared inside guard
      }
      ...
    }

Fix wires three pieces:

  - new `_disconnected: boolean` flag, flipped first thing in disconnect()
  - `.then` callback in `getDirectPool` checks the flag: if disconnect
    raced ahead, close the orphan pool (`pool.end()`) and resolve null
  - `disconnect()` always clears `_directInit` (lifted out of the guard)
    AND awaits the captured pending Promise so the orphan close fires
    before disconnect resolves

Behavior change for callers: `getDirectPool()` after `disconnect()` now
throws `connection-manager: disconnect() was called; create a new
instance to reconnect` instead of silently spinning up a fresh init
chain. Matches the engine-ownership invariant
`test/worker-shutdown-disconnect.test.ts` already pins (callers
construct their own engine; once disposed, it stays disposed).

Test plan:
  - New test/connection-manager-disconnect-race.test.ts pins three
    invariants: orphan pool closed on race, no _directPool resurrection,
    post-disconnect calls throw cleanly. RED-verified against pre-fix
    code: tests 1 + 2 fail (orphan end() never called; post-disconnect
    silently tries to reach real Supabase, ENOTFOUNDs). With fix: 3/3 pass.
  - `bun test test/connection-manager.serial.test.ts` — 24 existing tests
    still pass (no behavioral change on the happy path)
  - `bun run verify` — all 11 invariant gates green
  - Parallel-safe: spyOn on instance method, no mock.module, no env
    mutation — does not need .serial suffix per R2 of
    scripts/check-test-isolation.sh

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant