Add client-side detection of load balancer idle timeout scenarios#662
Add client-side detection of load balancer idle timeout scenarios#662Claude wants to merge 5 commits into
Conversation
Agent-Logs-Url: https://github.com/ClickHouse/clickhouse-js/sessions/bc795cae-2b89-41b5-9655-6887a1adddf9 Co-authored-by: peter-leonov-ch <209667683+peter-leonov-ch@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ClickHouse/clickhouse-js/sessions/bc795cae-2b89-41b5-9655-6887a1adddf9 Co-authored-by: peter-leonov-ch <209667683+peter-leonov-ch@users.noreply.github.com>
|
|
Agent-Logs-Url: https://github.com/ClickHouse/clickhouse-js/sessions/0103dea9-11ff-4437-950f-39c8d577be11 Co-authored-by: peter-leonov-ch <209667683+peter-leonov-ch@users.noreply.github.com>
…tion or class' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
|
@copilot resolve the merge conflicts in this pull request |
…e-lb-timeout-detection Co-authored-by: peter-leonov-ch <209667683+peter-leonov-ch@users.noreply.github.com>
Merged |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new idle_packet_timeout client configuration option to detect “no response activity” periods that may indicate load balancer idle timeouts, and documents the recommended mitigation strategies.
Changes:
- Added
idle_packet_timeoutto shared config/connection params with a default of 300,000ms (5 minutes). - Implemented Node.js-side idle packet monitoring in the socket pool (currently via WARN logging).
- Added Node.js integration coverage and updated troubleshooting docs + changelog entry.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/clickhouse-js-node-troubleshooting/reference/socket-hangup.md | Documents idle_packet_timeout as a recommended approach for diagnosing LB idle drops and adds guidance structure (options). |
| packages/client-node/src/connection/socket_pool.ts | Adds idle-packet inactivity tracking and WARN logging during streaming responses. |
| packages/client-node/tests/unit/node_client.test.ts | Updates unit test fixtures to include the new connection param. |
| packages/client-node/tests/integration/node_idle_packet_timeout.test.ts | Adds integration tests covering warn/no-warn/disabled scenarios. |
| packages/client-common/src/connection.ts | Extends exported ConnectionParams with idle_packet_timeout. |
| packages/client-common/src/config.ts | Adds the new config option docs and defaults it in getConnectionParams. |
| CHANGELOG.md | Adds a 1.18.3 changelog entry describing the new option and usage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Set up data event listener to reset idle packet timeout on each chunk | ||
| if (idlePacketTimeout > 0) { | ||
| responseStream.on('data', resetIdlePacketTimeout) | ||
| // Clean up the timer when stream ends or errors | ||
| responseStream.once('end', clearIdlePacketTimeout) | ||
| responseStream.once('error', clearIdlePacketTimeout) | ||
| responseStream.once('close', clearIdlePacketTimeout) |
| const resetIdlePacketTimeout = () => { | ||
| if (idlePacketTimeout > 0) { | ||
| if (idlePacketTimeoutHandle) { | ||
| clearTimeout(idlePacketTimeoutHandle) | ||
| } | ||
| idlePacketTimeoutHandle = setTimeout(() => { | ||
| // Only emit warning once to avoid log spam | ||
| if ( | ||
| !idlePacketWarningEmitted && | ||
| log_level <= ClickHouseLogLevel.WARN | ||
| ) { | ||
| idlePacketWarningEmitted = true | ||
| log_writer.warn({ | ||
| message: `${op}: no data received from the server for ${idlePacketTimeout}ms. The connection might be dropped by a load balancer soon. Consider using send_progress_in_http_headers ClickHouse setting to keep the connection alive.`, | ||
| args: { | ||
| operation: op, | ||
| connection_id: this.connectionId, | ||
| query_id, | ||
| request_id, | ||
| idle_packet_timeout_ms: idlePacketTimeout, | ||
| }, | ||
| }) | ||
| } | ||
| // Reset timer to potentially warn again if still no data | ||
| resetIdlePacketTimeout() | ||
| }, idlePacketTimeout) | ||
| } | ||
| } |
| const start = Date.now() | ||
| const request = this.createClientRequest(params) | ||
| const request_id = this.getNewRequestId() | ||
|
|
||
| return new Promise((resolve, reject) => { | ||
| // Idle packet timeout tracking - warns when no packets received | ||
| let idlePacketTimeoutHandle: ReturnType<typeof setTimeout> | undefined | ||
| let idlePacketWarningEmitted = false | ||
|
|
||
| const resetIdlePacketTimeout = () => { | ||
| if (idlePacketTimeout > 0) { | ||
| if (idlePacketTimeoutHandle) { | ||
| clearTimeout(idlePacketTimeoutHandle) |
| // Only emit warning once to avoid log spam | ||
| if ( | ||
| !idlePacketWarningEmitted && | ||
| log_level <= ClickHouseLogLevel.WARN | ||
| ) { | ||
| idlePacketWarningEmitted = true | ||
| log_writer.warn({ | ||
| message: `${op}: no data received from the server for ${idlePacketTimeout}ms. The connection might be dropped by a load balancer soon. Consider using send_progress_in_http_headers ClickHouse setting to keep the connection alive.`, | ||
| args: { |
| /** Idle packet timeout in milliseconds. If no data (headers or body chunks) is received | ||
| * from the server for this duration, a warning will be logged to help detect potential | ||
| * load balancer idle connection timeouts. The connection remains active and is not aborted. | ||
| * Set to 0 to disable this check. |
| export interface ConnectionParams { | ||
| url: URL | ||
| request_timeout: number | ||
| idle_packet_timeout: number |
| // Should succeed and emit a warning | ||
| const rs = await client.query({ query: 'SELECT 1' }) | ||
| const text = await rs.text() | ||
| expect(text).toContain('initial data') | ||
| expect(text).toContain('final data') | ||
|
|
||
| // Should have emitted a warning about no data received | ||
| expect(warnSpy).toHaveBeenCalled() | ||
| const warningCall = warnSpy.mock.calls.find((call) => | ||
| call[0].message?.includes('no data received from the server'), | ||
| ) | ||
| expect(warningCall).toBeDefined() | ||
| expect(warningCall[0].message).toContain( | ||
| 'might be dropped by a load balancer', | ||
| ) | ||
|
|
||
| await client.close() | ||
| await closeServer(server) | ||
| }) |
|
Yeah, too intrusive. Gonna take another turn after some changed to the core. |
`AGENTS.md` had drifted from reality after PRs #662–#712: it claimed `examples/web` has no `performance/` folder, pointed at a placeholder docs path, and said nothing about the `skills/` directory or the `.claude/skills/setup/` skill that replaced `copilot-setup-steps.yml`. ## Summary - **Examples › `performance/`** — drop the "no `performance/` folder under `examples/web`" claim; #701 added one for browser-safe streaming. - **Log-message docs guidance** — point at the real `docs/` layout from #677 (`docs/socket_hang_up_econnreset.md`, `docs/howto/`) instead of a fictional `docs/example-log-message.md`. The example URL in the snippet now also resolves to a real page (`docs/socket_hang_up_econnreset.md`). - **New `Skills` section** — documents the repo-root `skills/` directory (#681, #682, #702): `client-node`'s `prepack` ships it via `@clickhouse/client`, the `agents.skills` field of `packages/client-node/package.json` declares which skills are discoverable, and the `Skills E2E` workflow (`.github/workflows/e2e-skills.yml`, backed by `tests/e2e/skills/check.js`) asserts the packaged tarball contains the declared skills. Routes contributors to `.claude/skills/setup/SKILL.md` (#705) instead of re-introducing `copilot-setup-steps.yml`. - **`agents.skills` manifest** — adds `clickhouse-js-node-coding` alongside `clickhouse-js-node-troubleshooting` so both shipped skills are discoverable to downstream tooling. `tests/e2e/skills/check.js` is extended to verify both skills land in the packaged `@clickhouse/client` tarball. - **New `Embedded docs` section** — surfaces `docs/` as the preferred home for pages linked from log messages. - **CHANGELOG guidance** — aligns with the PR template: entry goes in the PR description and is folded into `CHANGELOG.md` at release time. Documentation refresh plus a small manifest/test change to make the second shipped skill discoverable. ## Checklist - [x] A human-readable description of the changes was provided to include in CHANGELOG --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: peter-leonov-ch <209667683+peter-leonov-ch@users.noreply.github.com> Co-authored-by: Peter Leonov <peter.leonov@clickhouse.com>
Summary
Implements client-side detection of potential load balancer timeout scenarios by monitoring packet activity. A new
idle_packet_timeoutconfiguration parameter (default: 5 minutes) tracks when no data (headers or body chunks) is received from the server, helping distinguish LB idle timeouts from query execution timeouts.Changes
Configuration
idle_packet_timeoutparameter toBaseClickHouseClientConfigOptions(default: 300,000ms)0Implementation
"Idle packet timeout: no data received from the server for Xms. This might indicate a load balancer idle timeout."Usage Example
For long-running queries (e.g.,
INSERT FROM SELECT), this helps detect when a load balancer silently drops the connection while the server is still processing, rather than waiting for the fullrequest_timeoutto expire.Checklist