Skip to content

feat(api-proxy): add WebSocket upgrade support to fix Codex /v1/responses streaming#1486

Merged
lpcox merged 4 commits intomainfrom
copilot/fix-websocket-upgrade-handling
Mar 28, 2026
Merged

feat(api-proxy): add WebSocket upgrade support to fix Codex /v1/responses streaming#1486
lpcox merged 4 commits intomainfrom
copilot/fix-websocket-upgrade-handling

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 28, 2026

The api-proxy sidecar had no upgrade event handler, so WebSocket connections from Codex CLI (ws://172.30.0.30:10000/v1/responses) were silently mishandled — exhausting retries (~90s) before falling back to broken HTTP, causing smoke test failures.

Approach

Raw CONNECT tunnel through Squid (Node.js built-ins only, no new dependencies):

client --[HTTP Upgrade]--> api-proxy --[CONNECT]--> Squid:3128 --[TLS]--> api.openai.com:443
  1. Validate upgrade type (websocket only) and URL (relative paths, SSRF prevention)
  2. Rate-limit the initial handshake (WS frames not tracked)
  3. CONNECT tunnel through Squid → TLS handshake with SNI
  4. Replay the GET … HTTP/1.1 / Upgrade: websocket request with auth injected, client-supplied auth stripped
  5. Bidirectional raw socket pipe for the session lifetime

Changes

  • server.js: proxyWebSocket(req, socket, head, targetHost, injectHeaders, provider, basePath) — full metrics (gaugeInc/Dec, requests_total, request_duration_ms) and structured logging (websocket_upgrade_start/complete/failed)
  • All four servers get upgrade handlers: OpenAI :10000, Anthropic :10001, Copilot :10002, OpenCode :10004; unconfigured OpenAI server returns 503
  • server.test.js: 12 new unit tests — validation rejection, CONNECT/TLS failure paths, auth injection, client-auth stripping, head byte forwarding, Squid proxy routing

📍 Connect Copilot coding agent with Jira, Azure Boards or Linear to delegate work to Copilot in one click without leaving your project management tool.

Copilot AI and others added 2 commits March 28, 2026 16:37
Add proxyWebSocket() function using raw CONNECT tunnel through Squid
(no new dependencies). Handles WebSocket upgrade requests on all four
proxy servers: OpenAI (10000), Anthropic (10001), Copilot (10002),
OpenCode (10004).

- Validates upgrade type and URL (rejects non-WebSocket, SSRF prevention)
- Rate-limits the initial upgrade request
- Opens CONNECT tunnel through Squid to upstream:443
- TLS-upgrades the tunnel and replays Upgrade request with auth injected
- Strips client-supplied auth headers; injects proxy credentials
- Bidirectionally pipes raw TCP sockets for the WebSocket session
- Logs upgrade start/complete/failed events with metrics
- Adds 12 unit tests covering validation and tunnel paths"

Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/f1b847f0-b4fd-412c-b244-1132754fe11b

Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
…clarity

- Use separate close handlers for socket/tlsSocket to avoid double-destroy
- Add comment explaining WebSocket frame bytes in test fixture

Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/f1b847f0-b4fd-412c-b244-1132754fe11b

Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix WebSocket upgrade handling in api-proxy sidecar feat(api-proxy): add WebSocket upgrade support to fix Codex /v1/responses streaming Mar 28, 2026
Copilot AI requested a review from lpcox March 28, 2026 16:41
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 28, 2026

✅ Coverage Check Passed

Overall Coverage

Metric Base PR Delta
Lines 82.71% 82.81% 📈 +0.10%
Statements 82.37% 82.47% 📈 +0.10%
Functions 81.22% 81.22% ➡️ +0.00%
Branches 76.03% 76.08% 📈 +0.05%
📁 Per-file Coverage Changes (1 files)
File Lines (Before → After) Statements (Before → After)
src/docker-manager.ts 86.0% → 86.5% (+0.42%) 85.5% → 86.0% (+0.41%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

The detection job (added by gh-aw v0.64.2 compiler) runs npm ci and
npm run build but has no Checkout repository step, causing npm ci to
fail with EUSAGE (missing package-lock.json).

Add checkout step to the lock file and update the postprocessing script
to automatically inject checkout steps before awf build steps in any
job that lacks one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

Smoke Test Results — PASS

💥 [THE END] — Illustrated by Smoke Claude for issue #1486

@github-actions
Copy link
Copy Markdown
Contributor

🤖 Smoke Test Results

GitHub MCP — Last 2 merged PRs: #1484 "[WIP] Fix failing GitHub Actions workflow agent", #1483 "fix: recompile smoke-codex workflow with gh-aw v0.64.2 to unblock github.com"
Playwright — github.com title contains "GitHub"
File Write/tmp/gh-aw/agent/smoke-test-copilot-23690297380.txt created and verified
Bash Tool — File contents confirmed via cat

Overall: PASS | Author: @Copilot | Assignees: @lpcox, @Copilot

📰 BREAKING: Report filed by Smoke Copilot for issue #1486

@github-actions
Copy link
Copy Markdown
Contributor

Chroot Version Comparison Results

Runtime Host Version Chroot Version Match?
Python Python 3.12.13 Python 3.12.3 ❌ NO
Node.js v24.14.0 v20.20.1 ❌ NO
Go go1.22.12 go1.22.12 ✅ YES

Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot for issue #1486

@github-actions
Copy link
Copy Markdown
Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx All passed ✅ PASS
Node.js execa All passed ✅ PASS
Node.js p-limit All passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #1486 ·

@lpcox lpcox marked this pull request as ready for review March 28, 2026 17:45
@lpcox lpcox requested a review from Mossaka as a code owner March 28, 2026 17:45
Copilot AI review requested due to automatic review settings March 28, 2026 17:45
@lpcox lpcox merged commit b9e0e86 into main Mar 28, 2026
57 of 60 checks passed
@lpcox lpcox deleted the copilot/fix-websocket-upgrade-handling branch March 28, 2026 17:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds WebSocket upgrade handling to the api-proxy sidecar so Codex CLI /v1/responses streaming works correctly through the Squid proxy, and updates CI smoke workflow post-processing to ensure the repo is checked out before local AWF build steps.

Changes:

  • Add proxyWebSocket(...) implementation and register upgrade handlers for OpenAI/Anthropic/Copilot/OpenCode proxy servers.
  • Add unit tests covering WebSocket upgrade validation, CONNECT/TLS failure modes, header injection/stripping, and head forwarding.
  • Ensure smoke workflows include a checkout step before the injected local build steps (and apply this to smoke-codex.lock.yml).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
containers/api-proxy/server.js Implements WebSocket upgrade tunneling via CONNECT→TLS through Squid; wires upgrade handlers.
containers/api-proxy/server.test.js Adds unit tests for proxyWebSocket validation and failure/success paths.
scripts/ci/postprocess-smoke-workflows.ts Injects checkout steps before “Install awf dependencies” when missing.
.github/workflows/smoke-codex.lock.yml Adds a checkout step before npm ci/npm run build in the smoke codex workflow.
Comments suppressed due to low confidence (1)

containers/api-proxy/server.js:631

  • The client socket only gets an 'error' handler inside the secureConnect callback. If the client disconnects or the socket errors earlier (during CONNECT/TLS), Node can emit an 'error' event with no listener, which can crash the process and/or leak active_requests (finalize never runs). Attach socket.once('error', …)/socket.once('close', …) immediately after validation (and have it call abort/finalize) so early failures are handled safely.
      // Finalize once when either side closes; destroy the other side.
      socket.once('close', () => { finalize(false); tlsSocket.destroy(); });
      tlsSocket.once('close', () => { finalize(false); socket.destroy(); });

      // Suppress unhandled-error crashes; destroy triggers the close handler.
      socket.on('error', () => socket.destroy());
      tlsSocket.on('error', () => tlsSocket.destroy());
    });

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +141 to +149
const checkoutStep = [
`${indent}- name: Checkout repository`,
`${indent} uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2`,
`${indent} with:`,
`${indent} persist-credentials: false`,
].join('\n');
lines.splice(i, 0, checkoutStep);
injectedCheckouts++;
i += 4; // Skip past the inserted lines
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkoutStep is built as a multi-line string and inserted via lines.splice(i, 0, checkoutStep), which adds only one array element containing embedded \ns. That makes i += 4 incorrect (it didn’t insert 4 elements) and can cause the loop to skip/behave incorrectly when there are multiple install steps in a file. Build checkoutStepLines as an array and splice with ...checkoutStepLines, then advance i by the number of inserted lines (or remove the manual increment).

Suggested change
const checkoutStep = [
`${indent}- name: Checkout repository`,
`${indent} uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2`,
`${indent} with:`,
`${indent} persist-credentials: false`,
].join('\n');
lines.splice(i, 0, checkoutStep);
injectedCheckouts++;
i += 4; // Skip past the inserted lines
const checkoutStepLines = [
`${indent}- name: Checkout repository`,
`${indent} uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2`,
`${indent} with:`,
`${indent} persist-credentials: false`,
];
lines.splice(i, 0, ...checkoutStepLines);
injectedCheckouts++;

Copilot uses AI. Check for mistakes.
Comment on lines +570 to +593
// ── Step 1: CONNECT tunnel through Squid to targetHost:443 ────────────
const connectReq = http.request({
host: proxyHost,
port: proxyPort,
method: 'CONNECT',
path: `${targetHost}:443`,
headers: { 'Host': `${targetHost}:443` },
});

connectReq.once('error', (err) => abort(`CONNECT error: ${err.message}`));

connectReq.once('connect', (connectRes, tunnel) => {
if (connectRes.statusCode !== 200) {
abort(`CONNECT failed: HTTP ${connectRes.statusCode}`, tunnel);
return;
}

// ── Step 2: TLS-upgrade the raw tunnel ──────────────────────────────
const tlsSocket = tls.connect({ socket: tunnel, servername: targetHost, rejectUnauthorized: true });

// Pre-TLS error handler: removed once TLS is established.
const onTlsError = (err) => abort(`TLS handshake error: ${err.message}`, tunnel);
tlsSocket.once('error', onTlsError);

Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CONNECT and TLS handshake path has no explicit timeouts (connectReq or the TLS socket). If Squid/upstream stalls mid-handshake, the sockets (and active_requests gauge) can hang indefinitely. Consider setting reasonable timeouts for CONNECT and TLS negotiation and calling abort(...) on timeout to ensure resources/metrics are released.

Copilot uses AI. Check for mistakes.
Comment on lines +343 to +350
describe('proxy configuration errors', () => {
it('returns 502 when HTTPS_PROXY is not configured', () => {
// The module was loaded without HTTPS_PROXY; proxyWebSocket should fail-safe.
const socket = makeMockSocket();
proxyWebSocket(makeUpgradeReq(), socket, Buffer.alloc(0), 'api.openai.com', {}, 'openai');
expect(socket.write).toHaveBeenCalledWith(expect.stringContaining('HTTP/1.1 502 Bad Gateway'));
expect(socket.destroy).toHaveBeenCalled();
});
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test assumes proxyWebSocket was loaded with HTTPS_PROXY unset, but ./server is required at the top of the file so the module-level HTTPS_PROXY constant will reflect whatever env the test runner already has. In environments where HTTPS_PROXY is set by default, this will fail. Make the test deterministic by deleting process.env.HTTPS_PROXY/HTTP_PROXY before requiring the module (e.g. via jest.isolateModules/jest.resetModules) and restoring afterward.

Copilot uses AI. Check for mistakes.
Comment on lines +453 to +466
return new Promise(resolve => setTimeout(() => {
// The upgrade request is written as a string to tlsSocket
const upgradeWrite = tlsSocket.write.mock.calls.find(
c => typeof c[0] === 'string' && c[0].startsWith('GET ')
);
expect(upgradeWrite).toBeDefined();
const upgradeReqStr = upgradeWrite[0];
expect(upgradeReqStr).toContain('Authorization: Bearer secret');
expect(upgradeReqStr).toContain('host: api.openai.com');
// Both sides should be piped
expect(tlsSocket.pipe).toHaveBeenCalledWith(socket);
expect(socket.pipe).toHaveBeenCalledWith(tlsSocket);
resolve();
}, 30));
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several of these tests rely on small fixed setTimeout(..., 30) delays to wait for async events, which can be flaky on slower CI machines. Prefer awaiting a specific event/condition (e.g., resolve a Promise when tlsSocket.write is called or when secureConnect is emitted) rather than sleeping for an arbitrary duration.

Copilot uses AI. Check for mistakes.
const indent = installMatch[1];
const checkoutStep = [
`${indent}- name: Checkout repository`,
`${indent} uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2`,
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This injects actions/checkout@de0fac2e... with a # v6.0.2 comment. Elsewhere in the repo the same SHA is commented as # v4 (e.g. .github/workflows/build.yml:28), so the version annotation is currently inconsistent. Consider using the same comment format/version across workflows to avoid confusion about what the SHA represents.

Suggested change
`${indent} uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2`,
`${indent} uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v4`,

Copilot uses AI. Check for mistakes.
Comment on lines +624 to +626
// Finalize once when either side closes; destroy the other side.
socket.once('close', () => { finalize(false); tlsSocket.destroy(); });
tlsSocket.once('close', () => { finalize(false); socket.destroy(); });
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finalize(false) is triggered solely by socket close events, so an upstream handshake rejection (e.g. 401/403/400 instead of 101 Switching Protocols) will still be logged/recorded as websocket_upgrade_complete with status_class: '1xx'. If you want metrics/logs to reflect actual upgrade success, consider parsing the first upstream response line/headers to confirm status=101 before marking success; otherwise treat early close as an upgrade failure.

This issue also appears on line 624 of the same file.

Suggested change
// Finalize once when either side closes; destroy the other side.
socket.once('close', () => { finalize(false); tlsSocket.destroy(); });
tlsSocket.once('close', () => { finalize(false); socket.destroy(); });
// Track whether the upstream actually completed the HTTP 101 upgrade.
let upgradeEstablished = false;
const inspectStatusLine = (chunk) => {
// Only need the first response chunk to determine if the upgrade succeeded.
try {
const str = chunk.toString('ascii');
const firstLineEnd = str.indexOf('\r\n');
const statusLine = firstLineEnd === -1 ? str : str.slice(0, firstLineEnd);
const match = /^HTTP\/1\.[01]\s+(\d{3})/.exec(statusLine);
if (match && match[1] === '101') {
upgradeEstablished = true;
}
} finally {
// Ensure we don't keep inspecting further data chunks.
tlsSocket.removeListener('data', inspectStatusLine);
}
};
tlsSocket.once('data', inspectStatusLine);
// Finalize once when either side closes; destroy the other side.
socket.once('close', () => { finalize(!upgradeEstablished); tlsSocket.destroy(); });
tlsSocket.once('close', () => { finalize(!upgradeEstablished); socket.destroy(); });

Copilot uses AI. Check for mistakes.
forwardHeaders[name] = value;
}
}
Object.assign(forwardHeaders, injectHeaders);
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proxyWebSocket generates a requestId when the client doesn't supply a valid x-request-id, but that generated ID is never forwarded upstream. In proxyRequest you always set headers['x-request-id'] = requestId; doing the same here would keep tracing/log correlation consistent for WebSocket upgrades too.

Suggested change
Object.assign(forwardHeaders, injectHeaders);
Object.assign(forwardHeaders, injectHeaders);
forwardHeaders['x-request-id'] = requestId; // Ensure consistent tracing for WebSocket upgrades

Copilot uses AI. Check for mistakes.
- name: Install Codex CLI
run: npm install -g @openai/codex@latest
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pinned actions/checkout@de0fac2e... SHA is annotated here as # v6.0.2, but the same SHA is annotated as # v4 in other workflows (e.g. .github/workflows/build.yml:28). Please align the version comment with the repo’s standard so the pin is unambiguous.

Suggested change
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v4

Copilot uses AI. Check for mistakes.
Comment on lines +810 to +816
opencodeServer.on('upgrade', (req, socket, head) => {
const anthropicHeaders = { 'x-api-key': ANTHROPIC_API_KEY };
if (!req.headers['anthropic-version']) {
anthropicHeaders['anthropic-version'] = '2023-06-01';
}
proxyWebSocket(req, socket, head, ANTHROPIC_API_TARGET, anthropicHeaders, 'opencode');
});
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the OpenCode server, the new WebSocket upgrade path uses provider: 'opencode', but the existing HTTP handler just above still calls proxyRequest(...) without passing a provider (so metrics/rate limiting labels differ and may end up as provider=undefined). Consider updating the HTTP path to pass 'opencode' (and basePath if relevant) so both HTTP and WS requests are attributed consistently.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

api-proxy sidecar does not support WebSocket upgrades (Codex /v1/responses streaming fails)

3 participants