Skip to content

fix: auto-reconnect connection pool before batch retry#1615

Closed
justemu wants to merge 4 commits into
garrytan:masterfrom
justemu:fix/batch-retry-auto-reconnect
Closed

fix: auto-reconnect connection pool before batch retry#1615
justemu wants to merge 4 commits into
garrytan:masterfrom
justemu:fix/batch-retry-auto-reconnect

Conversation

@justemu

@justemu justemu commented May 29, 2026

Copy link
Copy Markdown

Problem

The batchRetry helper retries transient connection errors but never re-establishes the database connection between attempts. When the postgres.js pool drops (network blip, Supavisor circuit-breaker) or the module-level db.sql is nulled mid-cycle ("No database connection: connect() has not been called"), every retry fails with the same error and all rows in the batch are silently lost.

Observed in the wild: extract.links_fs reports connection blip, retrying (1/3)... (2/3)... (3/3) followed by batch error (66 link rows lost): No database connection: connect() has not been called.

Fix

Two minimal changes:

  1. PostgresEngine.batchRetry(): call this.reconnect() before each retry attempt. reconnect() already handles both instance-level and module-level connection styles, has its own _reconnecting mutex to prevent races, and no-ops if the pool is healthy.

  2. withRetry (retry.ts): await the onRetry callback so async reconnect operations complete before the inter-attempt delay starts. Non-async callbacks resolve immediately (backward-compatible interface change: void | Promise<void>).

Testing

  • bun run build — compiles cleanly
  • bun test test/core/retry.test.ts test/retry-matcher.test.ts — all 52 tests pass
  • gbrain dream --phase extract --json — cycle completes cleanly on 365-page brain

Closes #1603

justemu added 4 commits May 27, 2026 07:56
…user-provided models

Two fixes in gateway.ts:

1. Embed preflight (): When a user configures
   ,
   the preflight incorrectly returns  because
   it checks  without verifying whether the
   model string actually carries a model id. Add  guard
   so that a non-empty model id after the colon in
   satisfies the preflight.

2. Chat response normalization: AI SDK v6 returns reasoning-model output
   (e.g. DeepSeek ) as typed  blocks in
    or as . The current code skips
   these, producing empty text output when a reasoning model responds
   without a separate text part. Handle both paths:
   - In the  array: treat  as text
   - In the flat fallback: check  when  is empty
DeepSeek v4-flash defaults to thinking mode, producing ~50% unnecessary
output tokens for gbrain's non-reasoning use cases. The same risk exists
for any OpenAI-compatible provider (Ollama, OpenRouter, Groq, Together,
llama-server, etc.) that may add thinking support in the future.

Rather than targeting only DeepSeek, apply the thinking disable to ALL
openai-compatible providers (recipe.implementation === 'openai-compatible').
The @ai-sdk/openai-compatible provider spreads providerOptions[recipe.id]
into the request body; providers that don't recognize 'thinking' silently
ignore the parameter — 100% safe.

Does not affect native providers (OpenAI o-series, Anthropic Claude, Google
Gemini) which have different parameter names for reasoning control.
The batchRetry helper retries transient connection errors but never
re-establishes the database connection between attempts. When the
postgres.js pool drops (network blip, Supavisor circuit-breaker) or
the module-level db.sql is nulled mid-cycle, every retry fails with
the same 'No database connection' error and all rows in the batch
are silently lost.

Fix:
1. PostgresEngine.batchRetry(): call this.reconnect() before each
   retry attempt. reconnect() already handles both instance-level
   and module-level connection styles, has its own _reconnecting
   mutex to prevent races, and no-ops if the pool is healthy.
2. withRetry (retry.ts): await the onRetry callback so async
   reconnect operations complete before the inter-attempt delay
   starts. Non-async callbacks resolve immediately (backward-
   compatible interface change: void | Promise<void>).

Fixes garrytan#1603
@garrytan

garrytan commented Jun 8, 2026

Copy link
Copy Markdown
Owner

Thanks for this! The Postgres module-singleton / "connect() has not been called" ownership bug was fixed canonically in #1805 (atomic owner-only-disconnect), with follow-ups in #1608 and #1572 — all merged to master. Closing as already-resolved. If you still reproduce after upgrading, please reopen, and thank you for the contribution.

@garrytan garrytan closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

getConfig lacks retry protection on transient connection loss

2 participants