Skip to content

Add client-side detection of load balancer idle timeout scenarios#662

Closed
Claude wants to merge 5 commits into
mainfrom
claude/add-client-side-lb-timeout-detection
Closed

Add client-side detection of load balancer idle timeout scenarios#662
Claude wants to merge 5 commits into
mainfrom
claude/add-client-side-lb-timeout-detection

Conversation

@Claude

@Claude Claude AI commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Implements client-side detection of potential load balancer timeout scenarios by monitoring packet activity. A new idle_packet_timeout configuration parameter (default: 5 minutes) tracks when no data (headers or body chunks) is received from the server, helping distinguish LB idle timeouts from query execution timeouts.

Changes

Configuration

  • Added idle_packet_timeout parameter to BaseClickHouseClientConfigOptions (default: 300,000ms)
  • Configurable per-client or can be disabled by setting to 0

Implementation

  • Idle packet timer starts when request begins
  • Timer resets on response headers receipt
  • Timer resets on each data chunk received from response stream
  • Timer cleared when stream ends, errors, or closes
  • On timeout, request aborted with descriptive error: "Idle packet timeout: no data received from the server for Xms. This might indicate a load balancer idle timeout."
  • Comprehensive logging at TRACE and WARN levels

Usage Example

const client = createClient({
  request_timeout: 400_000,      // Total request timeout
  idle_packet_timeout: 300_000,  // Abort if no packets for 5 minutes
})

For long-running queries (e.g., INSERT FROM SELECT), this helps detect when a load balancer silently drops the connection while the server is still processing, rather than waiting for the full request_timeout to expire.

Checklist

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG
  • For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

Claude AI and others added 2 commits April 17, 2026 22:14
Agent-Logs-Url: https://github.com/ClickHouse/clickhouse-js/sessions/bc795cae-2b89-41b5-9655-6887a1adddf9

Co-authored-by: peter-leonov-ch <209667683+peter-leonov-ch@users.noreply.github.com>
@CLAassistant

CLAassistant commented Apr 17, 2026

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

✅ peter-leonov-ch
❌ Copilot
❌ Claude
You have signed the CLA already but the status is still pending? Let us recheck it.

@Claude Claude AI requested review from Copilot and removed request for Copilot April 17, 2026 22:24
@Claude Claude AI requested a review from peter-leonov-ch April 17, 2026 22:25
…tion or class'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 5, 2026 11:36
@peter-leonov-ch peter-leonov-ch marked this pull request as ready for review May 5, 2026 11:37
@peter-leonov-ch peter-leonov-ch requested a review from mshustov as a code owner May 5, 2026 11:37
@peter-leonov-ch

Copy link
Copy Markdown
Collaborator

@copilot resolve the merge conflicts in this pull request

…e-lb-timeout-detection

Co-authored-by: peter-leonov-ch <209667683+peter-leonov-ch@users.noreply.github.com>

Copilot AI commented May 5, 2026

Copy link
Copy Markdown
Contributor

@copilot resolve the merge conflicts in this pull request

Merged origin/main into this branch (commit 92b8981). The only conflict was in packages/client-common/src/config.ts — resolved by keeping request_timeout as the pre-computed variable (from main) and retaining the idle_packet_timeout line. All 116 unit tests pass.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new idle_packet_timeout client configuration option to detect “no response activity” periods that may indicate load balancer idle timeouts, and documents the recommended mitigation strategies.

Changes:

  • Added idle_packet_timeout to shared config/connection params with a default of 300,000ms (5 minutes).
  • Implemented Node.js-side idle packet monitoring in the socket pool (currently via WARN logging).
  • Added Node.js integration coverage and updated troubleshooting docs + changelog entry.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
skills/clickhouse-js-node-troubleshooting/reference/socket-hangup.md Documents idle_packet_timeout as a recommended approach for diagnosing LB idle drops and adds guidance structure (options).
packages/client-node/src/connection/socket_pool.ts Adds idle-packet inactivity tracking and WARN logging during streaming responses.
packages/client-node/tests/unit/node_client.test.ts Updates unit test fixtures to include the new connection param.
packages/client-node/tests/integration/node_idle_packet_timeout.test.ts Adds integration tests covering warn/no-warn/disabled scenarios.
packages/client-common/src/connection.ts Extends exported ConnectionParams with idle_packet_timeout.
packages/client-common/src/config.ts Adds the new config option docs and defaults it in getConnectionParams.
CHANGELOG.md Adds a 1.18.3 changelog entry describing the new option and usage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +375 to +381
// Set up data event listener to reset idle packet timeout on each chunk
if (idlePacketTimeout > 0) {
responseStream.on('data', resetIdlePacketTimeout)
// Clean up the timer when stream ends or errors
responseStream.once('end', clearIdlePacketTimeout)
responseStream.once('error', clearIdlePacketTimeout)
responseStream.once('close', clearIdlePacketTimeout)
Comment on lines +163 to +190
const resetIdlePacketTimeout = () => {
if (idlePacketTimeout > 0) {
if (idlePacketTimeoutHandle) {
clearTimeout(idlePacketTimeoutHandle)
}
idlePacketTimeoutHandle = setTimeout(() => {
// Only emit warning once to avoid log spam
if (
!idlePacketWarningEmitted &&
log_level <= ClickHouseLogLevel.WARN
) {
idlePacketWarningEmitted = true
log_writer.warn({
message: `${op}: no data received from the server for ${idlePacketTimeout}ms. The connection might be dropped by a load balancer soon. Consider using send_progress_in_http_headers ClickHouse setting to keep the connection alive.`,
args: {
operation: op,
connection_id: this.connectionId,
query_id,
request_id,
idle_packet_timeout_ms: idlePacketTimeout,
},
})
}
// Reset timer to potentially warn again if still no data
resetIdlePacketTimeout()
}, idlePacketTimeout)
}
}
Comment on lines 154 to +166
const start = Date.now()
const request = this.createClientRequest(params)
const request_id = this.getNewRequestId()

return new Promise((resolve, reject) => {
// Idle packet timeout tracking - warns when no packets received
let idlePacketTimeoutHandle: ReturnType<typeof setTimeout> | undefined
let idlePacketWarningEmitted = false

const resetIdlePacketTimeout = () => {
if (idlePacketTimeout > 0) {
if (idlePacketTimeoutHandle) {
clearTimeout(idlePacketTimeoutHandle)
Comment on lines +169 to +177
// Only emit warning once to avoid log spam
if (
!idlePacketWarningEmitted &&
log_level <= ClickHouseLogLevel.WARN
) {
idlePacketWarningEmitted = true
log_writer.warn({
message: `${op}: no data received from the server for ${idlePacketTimeout}ms. The connection might be dropped by a load balancer soon. Consider using send_progress_in_http_headers ClickHouse setting to keep the connection alive.`,
args: {
/** Idle packet timeout in milliseconds. If no data (headers or body chunks) is received
* from the server for this duration, a warning will be logged to help detect potential
* load balancer idle connection timeouts. The connection remains active and is not aborted.
* Set to 0 to disable this check.
export interface ConnectionParams {
url: URL
request_timeout: number
idle_packet_timeout: number
Comment on lines +52 to +70
// Should succeed and emit a warning
const rs = await client.query({ query: 'SELECT 1' })
const text = await rs.text()
expect(text).toContain('initial data')
expect(text).toContain('final data')

// Should have emitted a warning about no data received
expect(warnSpy).toHaveBeenCalled()
const warningCall = warnSpy.mock.calls.find((call) =>
call[0].message?.includes('no data received from the server'),
)
expect(warningCall).toBeDefined()
expect(warningCall[0].message).toContain(
'might be dropped by a load balancer',
)

await client.close()
await closeServer(server)
})
@peter-leonov-ch

Copy link
Copy Markdown
Collaborator

Yeah, too intrusive. Gonna take another turn after some changed to the core.

@peter-leonov-ch peter-leonov-ch deleted the claude/add-client-side-lb-timeout-detection branch May 6, 2026 12:00
peter-leonov-ch added a commit that referenced this pull request May 9, 2026
`AGENTS.md` had drifted from reality after PRs #662#712: it claimed
`examples/web` has no `performance/` folder, pointed at a placeholder
docs path, and said nothing about the `skills/` directory or the
`.claude/skills/setup/` skill that replaced `copilot-setup-steps.yml`.

## Summary

- **Examples › `performance/`** — drop the "no `performance/` folder
under `examples/web`" claim; #701 added one for browser-safe streaming.
- **Log-message docs guidance** — point at the real `docs/` layout from
#677 (`docs/socket_hang_up_econnreset.md`, `docs/howto/`) instead of a
fictional `docs/example-log-message.md`. The example URL in the snippet
now also resolves to a real page (`docs/socket_hang_up_econnreset.md`).
- **New `Skills` section** — documents the repo-root `skills/` directory
(#681, #682, #702): `client-node`'s `prepack` ships it via
`@clickhouse/client`, the `agents.skills` field of
`packages/client-node/package.json` declares which skills are
discoverable, and the `Skills E2E` workflow
(`.github/workflows/e2e-skills.yml`, backed by
`tests/e2e/skills/check.js`) asserts the packaged tarball contains the
declared skills. Routes contributors to `.claude/skills/setup/SKILL.md`
(#705) instead of re-introducing `copilot-setup-steps.yml`.
- **`agents.skills` manifest** — adds `clickhouse-js-node-coding`
alongside `clickhouse-js-node-troubleshooting` so both shipped skills
are discoverable to downstream tooling. `tests/e2e/skills/check.js` is
extended to verify both skills land in the packaged `@clickhouse/client`
tarball.
- **New `Embedded docs` section** — surfaces `docs/` as the preferred
home for pages linked from log messages.
- **CHANGELOG guidance** — aligns with the PR template: entry goes in
the PR description and is folded into `CHANGELOG.md` at release time.

Documentation refresh plus a small manifest/test change to make the
second shipped skill discoverable.

## Checklist

- [x] A human-readable description of the changes was provided to
include in CHANGELOG

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: peter-leonov-ch <209667683+peter-leonov-ch@users.noreply.github.com>
Co-authored-by: Peter Leonov <peter.leonov@clickhouse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants