api-proxy sidecar does not support WebSocket upgrades (Codex /v1/responses streaming fails) #1485
Problem
The api-proxy sidecar (containers/api-proxy/server.js) does not handle HTTP WebSocket upgrade requests. When the Codex CLI connects to the OpenAI proxy at ws://172.30.0.30:10000/v1/responses to stream responses via WebSocket, the connection fails because the proxy treats the upgrade request as a normal HTTP request.
Observed behavior (Smoke Codex CI)
From run 23688803182:
```
ERROR: Reconnecting... 2/5
ERROR: Reconnecting... 3/5
ERROR: Reconnecting... 4/5
ERROR: Reconnecting... 5/5
WARN: falling back to HTTP
```
After exhausting its WebSocket retries, the Codex CLI falls back to HTTP, but the agent ultimately produces no safe outputs (no PR comments, no labels), and the smoke test fails.
Root cause
The api-proxy is a plain `http.createServer()` with request/response handling only:
- No `.on('upgrade')` event handler, so WebSocket upgrades are not intercepted
- No `ws` or other WebSocket library in its dependencies
- No handling for HTTP 101 Switching Protocols
When a WebSocket upgrade request (`Connection: Upgrade`, `Upgrade: websocket`) arrives:
- The proxy treats it as a normal HTTP GET
- It collects the (empty) body via `req.on('data')`/`req.on('end')`
- It forwards the request via `https.request()` to the upstream API, which fails because the WebSocket handshake is lost
- The upstream API rejects it (it is not a valid WebSocket handshake) and the connection drops
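The first step can be reproduced locally. This is a minimal sketch, assuming Node.js 10 or later, where a server with no `upgrade` listener delivers an upgrade request to its ordinary `request` handler. The server below is a hypothetical stand-in for the api-proxy, not its actual code; the client sends a bare WebSocket opening handshake over a raw TCP socket:

```javascript
const http = require('http');
const net = require('net');

let statusLine = null;
let sawUpgradeHeader = false;

// Hypothetical stand-in for the current api-proxy: request handler only, no 'upgrade' listener
const server = http.createServer((req, res) => {
  // Without an 'upgrade' listener, the upgrade request arrives here like any other GET
  sawUpgradeHeader = req.headers.upgrade === 'websocket';
  res.setHeader('Connection', 'close'); // close the socket so the client sees 'end'
  res.end('plain HTTP response');
});

server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  const client = net.connect(port, '127.0.0.1', () => {
    // Minimal WebSocket opening handshake, as a WebSocket client would send it
    client.write(
      'GET /v1/responses HTTP/1.1\r\n' +
        `Host: 127.0.0.1:${port}\r\n` +
        'Connection: Upgrade\r\n' +
        'Upgrade: websocket\r\n' +
        'Sec-WebSocket-Version: 13\r\n' +
        'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n\r\n'
    );
  });
  let raw = '';
  client.on('data', (chunk) => { raw += chunk; });
  client.on('end', () => {
    statusLine = raw.split('\r\n')[0];
    console.log(statusLine); // an ordinary status line, never "101 Switching Protocols"
    server.close();
  });
});
```

The client never receives a 101 response, so from its point of view the handshake simply fails, which matches the reconnect loop in the CI log.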
Impact
- Codex CLI's preferred transport (WebSocket streaming via `/v1/responses`) is broken
- The HTTP fallback works for the API call itself, but only after ~90 seconds of failed WebSocket reconnection attempts, which wastes time and destabilizes the run
- Smoke Codex tests fail intermittently because the agent exhausts its time budget after the retry delays
Proposed fix
Option A: WebSocket tunnel forwarding (recommended)
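Add an `upgrade` event handler to each HTTP server that opens an authenticated WebSocket to the upstream API (through Squid) and, once that connection is open, completes the client handshake and relays messages in both directions. The client handshake is completed by `ws` itself rather than by writing a 101 response by hand, because that response must carry a `Sec-WebSocket-Accept` value derived from the client's own `Sec-WebSocket-Key`:

```javascript
const { WebSocket, WebSocketServer } = require('ws');
const { HttpsProxyAgent } = require('https-proxy-agent');

// noServer mode: ws only performs the handshakes handed to it via handleUpgrade()
const wss = new WebSocketServer({ noServer: true });

// Headers not copied upstream: ws generates its own handshake headers,
// Host must name the upstream, and Authorization is injected below
const SKIP_HEADERS = new Set([
  'host', 'connection', 'upgrade', 'authorization', 'sec-websocket-key',
  'sec-websocket-version', 'sec-websocket-extensions', 'sec-websocket-protocol',
]);

server.on('upgrade', (req, socket, head) => {
  const headers = {};
  for (const [name, value] of Object.entries(req.headers)) {
    if (!SKIP_HEADERS.has(name)) headers[name] = value;
  }
  // Set after copying, so a client-supplied Authorization can never override it
  headers['Authorization'] = `Bearer ${OPENAI_API_KEY}`;

  const upstream = new WebSocket(`wss://${OPENAI_API_TARGET}${req.url}`, {
    headers,
    agent: new HttpsProxyAgent(proxyUrl), // route through Squid
  });

  // Complete the client handshake only once the upstream connection is open
  upstream.on('open', () => {
    wss.handleUpgrade(req, socket, head, (client) => {
      // Relay messages in both directions, preserving text vs. binary frames
      client.on('message', (data, isBinary) => upstream.send(data, { binary: isBinary }));
      upstream.on('message', (data, isBinary) => client.send(data, { binary: isBinary }));
      client.on('close', () => upstream.close());
      upstream.on('close', () => client.close());
      client.on('error', () => upstream.terminate());
    });
  });

  upstream.on('error', (err) => {
    logRequest('error', 'websocket_upgrade_failed', { error: err.message });
    socket.destroy();
  });
});
```

Dependencies to add: `ws` (npm package). The sketch also uses `https-proxy-agent` to route through Squid; `OPENAI_API_TARGET`, `OPENAI_API_KEY`, `proxyUrl` and `logRequest` are assumed to come from `server.js`.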
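Why a hand-written 101 is not enough: RFC 6455 requires the server's 101 response to include `Sec-WebSocket-Accept`, computed from the client's `Sec-WebSocket-Key`, and compliant clients reject a 101 without it. An accept value from an upstream handshake made with a different key cannot simply be copied either. A minimal computation with Node's `crypto` module, checked against the example handshake in RFC 6455:

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + GUID))
function computeAccept(secWebSocketKey) {
  return crypto.createHash('sha1').update(secWebSocketKey + WS_GUID).digest('base64');
}

// Key/accept pair from the RFC 6455 example handshake
console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ==')); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Libraries such as `ws` perform this computation inside `handleUpgrade()`; a raw tunnel avoids it entirely by forwarding the client's key upstream and the upstream's accept value back.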
Option B: Raw socket tunnel (simpler, no ws dependency)
Use Node.js net/tls to establish a raw TCP tunnel to the upstream server through Squid (via CONNECT), then replay the upgrade request with injected auth headers:
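```javascript
const http = require('http');
const tls = require('tls');

server.on('upgrade', (req, socket, head) => {
  // Establish a CONNECT tunnel through Squid to the upstream API
  const connectReq = http.request({
    host: SQUID_HOST, port: SQUID_PORT,
    method: 'CONNECT',
    path: `${OPENAI_API_TARGET}:443`,
  });
  const fail = (event, details) => {
    logRequest('error', event, details);
    socket.destroy();
  };
  connectReq.on('error', (err) => fail('websocket_upgrade_failed', { error: err.message }));
  connectReq.on('connect', (res, tunnel) => {
    if (res.statusCode !== 200) {
      tunnel.destroy();
      return fail('websocket_connect_failed', { status: res.statusCode });
    }
    // TLS on top of the tunnel; writes made before the TLS handshake finishes are queued
    const tlsSocket = tls.connect({ socket: tunnel, servername: OPENAI_API_TARGET });
    tlsSocket.on('error', (err) => fail('websocket_upgrade_failed', { error: err.message }));
    socket.on('error', () => tlsSocket.destroy());
    // Replay the upgrade request: Host and Authorization are replaced; all other headers,
    // including Sec-WebSocket-Key, pass through so the upstream's 101 stays valid for the client
    let requestHead = `GET ${req.url} HTTP/1.1\r\nHost: ${OPENAI_API_TARGET}\r\n` +
      `Authorization: Bearer ${OPENAI_API_KEY}\r\n`;
    for (const [k, v] of Object.entries(req.headers)) {
      if (k !== 'host' && k !== 'authorization') requestHead += `${k}: ${v}\r\n`;
    }
    tlsSocket.write(requestHead + '\r\n'); // blank line ends the header block
    if (head.length > 0) tlsSocket.write(head); // bytes that arrived after the client's headers
    // Bidirectional pipe
    tlsSocket.pipe(socket);
    socket.pipe(tlsSocket);
  });
  connectReq.end();
});
```

This avoids adding the `ws` dependency, but it requires more careful error handling. `SQUID_HOST`, `SQUID_PORT`, `OPENAI_API_TARGET`, `OPENAI_API_KEY` and `logRequest` are assumed to come from `server.js`.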
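The raw-tunnel approach can be checked locally without network access. This is a test sketch under a simplifying assumption: plain TCP replaces both the Squid CONNECT tunnel and TLS. A fake upstream records every `Authorization` header it receives and answers the handshake with an accept value computed from the key it sees; the proxy replays the upgrade with an injected key; the client checks that the 101 is valid for its own `Sec-WebSocket-Key`. `API_KEY` is a hypothetical placeholder:

```javascript
const http = require('http');
const net = require('net');
const crypto = require('crypto');

const API_KEY = 'sk-test-placeholder'; // hypothetical key, not a real secret
const acceptFor = (key) => crypto.createHash('sha1')
  .update(key + '258EAFA5-E914-47DA-95CA-C5AB0DC85B11').digest('base64');
const ignoreErrors = (s) => s.on('error', () => {}); // keep teardown quiet
const sockets = [];
let seenAuth = null;
let handshakeOk = false;

// Fake upstream: records Authorization headers and answers like a WebSocket server
const upstream = net.createServer((sock) => {
  ignoreErrors(sock);
  let buf = '';
  sock.on('data', (chunk) => {
    if (seenAuth !== null) return; // handshake already answered
    buf += chunk;
    if (!buf.includes('\r\n\r\n')) return; // wait for the full header block
    const lines = buf.split('\r\n\r\n')[0].split('\r\n').slice(1);
    const values = (name) => lines
      .filter((l) => l.toLowerCase().startsWith(`${name}:`))
      .map((l) => l.slice(name.length + 1).trim());
    seenAuth = values('authorization');
    sock.write('HTTP/1.1 101 Switching Protocols\r\nUpgrade: websocket\r\nConnection: Upgrade\r\n' +
      `Sec-WebSocket-Accept: ${acceptFor(values('sec-websocket-key')[0])}\r\n\r\n`);
  });
});

// Proxy: the Option B replay logic, minus Squid and TLS
const proxy = http.createServer();
proxy.on('upgrade', (req, socket, head) => {
  const up = net.connect(upstream.address().port, '127.0.0.1', () => {
    let requestHead = `GET ${req.url} HTTP/1.1\r\nHost: 127.0.0.1\r\n` +
      `Authorization: Bearer ${API_KEY}\r\n`;
    for (const [k, v] of Object.entries(req.headers)) {
      if (k !== 'host' && k !== 'authorization') requestHead += `${k}: ${v}\r\n`;
    }
    up.write(requestHead + '\r\n');
    if (head.length > 0) up.write(head);
    up.pipe(socket);
    socket.pipe(up);
  });
  sockets.push(ignoreErrors(up), ignoreErrors(socket));
});

upstream.listen(0, '127.0.0.1', () => proxy.listen(0, '127.0.0.1', () => {
  const key = crypto.randomBytes(16).toString('base64');
  const client = ignoreErrors(net.connect(proxy.address().port, '127.0.0.1', () => {
    // The client sends its own Authorization, which the proxy must replace
    client.write('GET /v1/responses HTTP/1.1\r\nHost: 127.0.0.1\r\nConnection: Upgrade\r\n' +
      `Upgrade: websocket\r\nSec-WebSocket-Version: 13\r\nSec-WebSocket-Key: ${key}\r\n` +
      'Authorization: Bearer client-supplied\r\n\r\n');
  }));
  let resp = '';
  client.on('data', (chunk) => {
    resp += chunk;
    if (!resp.includes('\r\n\r\n')) return;
    handshakeOk = resp.startsWith('HTTP/1.1 101') &&
      resp.includes(`Sec-WebSocket-Accept: ${acceptFor(key)}`);
    console.log({ handshakeOk, seenAuth });
    client.destroy();
    sockets.forEach((s) => s.destroy());
    proxy.close();
    upstream.close();
  });
}));
```

A real test would additionally route through a Squid instance to cover the CONNECT path; the handshake and auth-injection checks stay the same.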
Additional considerations
- All four proxy servers (OpenAI:10000, Anthropic:10001, Copilot:10002, OpenCode:10004) should add upgrade handling, even if only OpenAI uses it today
- Rate limiting should apply to the initial upgrade request (not per-frame)
- Logging should record WebSocket upgrade attempts and outcomes
- The Squid proxy already allows CONNECT tunnels for HTTPS, so WebSocket-over-TLS should work through the existing ACL rules
- Tests should verify WebSocket upgrade + auth injection + Squid routing
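Rate limiting and logging can be attached once per handshake by wrapping whichever upgrade handler is chosen. This is a minimal sketch: `createFixedWindowLimiter` and `withUpgradeGuard` are hypothetical helpers, and the api-proxy's existing limiter could replace the fixed-window one. Because the wrapper runs only when the `upgrade` event fires, frames exchanged after the 101 are never counted:

```javascript
// Hypothetical fixed-window limiter: at most `limit` upgrades per `windowMs`
function createFixedWindowLimiter(limit, windowMs) {
  let windowStart = -Infinity;
  let count = 0;
  return (now = Date.now()) => {
    if (now - windowStart >= windowMs) {
      windowStart = now; // start a new window
      count = 0;
    }
    if (count >= limit) return false;
    count += 1;
    return true;
  };
}

// Wraps an upgrade handler so limiting and logging happen once per handshake
function withUpgradeGuard(provider, tryAcquire, log, handler) {
  return (req, socket, head) => {
    if (!tryAcquire()) {
      log('warn', 'websocket_upgrade_rate_limited', { provider, url: req.url });
      socket.end('HTTP/1.1 429 Too Many Requests\r\nConnection: close\r\nContent-Length: 0\r\n\r\n');
      return;
    }
    log('info', 'websocket_upgrade_attempt', { provider, url: req.url });
    handler(req, socket, head);
  };
}

// Usage with fake sockets and a stub tunnel handler: the second upgrade is rejected
const events = [];
const log = (level, event) => events.push(`${level}:${event}`);
const guard = withUpgradeGuard('openai', createFixedWindowLimiter(1, 60000), log,
  () => events.push('tunnel'));
const fakeSocket = () => ({ written: '', end(data) { this.written = data; } });
const s1 = fakeSocket();
const s2 = fakeSocket();
guard({ url: '/v1/responses' }, s1, Buffer.alloc(0));
guard({ url: '/v1/responses' }, s2, Buffer.alloc(0));
console.log(events);
```

The same wrapper can be attached to all four servers, for example `server.on('upgrade', withUpgradeGuard(name, limiter, logRequest, handler))` in a loop over the providers, where the per-provider `handler` and `limiter` are hypothetical.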
References
- CI failure: https://github.com/github/gh-aw-firewall/actions/runs/23688803182/job/69012711966
- api-proxy source: `containers/api-proxy/server.js`
- Codex CLI Responses API: uses `ws://` transport for `/v1/responses` streaming