fix: decompress gzip responses for Anthropic token extraction #1550
Conversation
Claude Code CLI uses WebSocket streaming to the Anthropic API, which routes through `proxyWebSocket()` instead of `proxyRequest()`. The `proxyWebSocket` function did not call `trackTokenUsage()`, so all Anthropic/Claude token usage went unrecorded.

This adds:
- `parseWebSocketFrames()`: lightweight server→client frame parser
- `trackWebSocketTokenUsage()`: sniffs upstream TLS socket data events, skips the HTTP 101 header, parses WebSocket text frames, and extracts token usage using the existing `extractUsageFromSseLine()`
- 12 new tests for frame parsing and WebSocket token extraction

The fix is non-blocking: it adds a data listener alongside the existing bidirectional pipe relay, with no impact on latency or throughput.

Closes #1536

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
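The frame format the parser above handles can be illustrated with a minimal round-trip sketch (illustrative only, not the PR's actual code; this covers only unmasked server→client text frames with payloads under 126 bytes):

```javascript
// Build a minimal unmasked server→client text frame (FIN=1, opcode=1).
// Server frames are unmasked per RFC 6455, so the top payload-length bit is 0.
function buildTextFrame(text) {
  const payload = Buffer.from(text, 'utf8');
  const header = Buffer.from([0x81, payload.length]); // FIN + text opcode, no mask
  return Buffer.concat([header, payload]);
}

// Parse one such frame back to text (assumes the 2-byte header form).
function parseTextFrame(buf) {
  const fin = (buf[0] & 0x80) !== 0;
  const opcode = buf[0] & 0x0f;
  const len = buf[1] & 0x7f;
  if (!fin || opcode !== 1) return null; // only final text frames
  return buf.slice(2, 2 + len).toString('utf8');
}
```

A real parser must additionally handle extended payload lengths, fragmentation, and control frames interleaved in the stream.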
✅ Coverage Check Passed — Overall Coverage
📁 Per-file Coverage Changes (1 file)
Pull request overview
Adds token usage tracking for Anthropic/Claude traffic that streams over WebSockets (which previously bypassed the existing HTTP/SSE token tracker), ensuring token-usage records are produced for Claude smoke runs.
Changes:
- Added a lightweight WebSocket frame parser and a WebSocket token-usage tracker that sniffs upstream TLS socket data and extracts usage from JSON text frames.
- Wired WebSocket token tracking into `proxyWebSocket()` alongside the existing bidirectional socket piping.
- Added unit tests covering WebSocket frame parsing and WebSocket token usage extraction/finalization behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| containers/api-proxy/token-tracker.js | Adds parseWebSocketFrames() and trackWebSocketTokenUsage() and logs token usage for WebSocket-based Anthropic streaming. |
| containers/api-proxy/server.js | Calls trackWebSocketTokenUsage() from the WebSocket proxy path and adds safe fallback when token-tracker is unavailable. |
| containers/api-proxy/token-tracker.test.js | Adds test coverage for WebSocket frame parsing and token extraction/finalization. |
Comments suppressed due to low confidence (1)
containers/api-proxy/token-tracker.test.js:25
- The test sets `process.env.AWF_TOKEN_LOG_DIR` in `beforeAll`, but `require('./token-tracker')` happens at module top-level before that. Since `TOKEN_LOG_DIR`/`TOKEN_LOG_FILE` are computed at require time, the env override won't take effect and tests may attempt to write under `/var/log/api-proxy`. Set the env var before requiring the module (or refactor token-tracker to read the env var lazily).
```js
const {
  extractUsageFromJson,
  extractUsageFromSseLine,
  parseSseDataLines,
  parseWebSocketFrames,
  normalizeUsage,
  isStreamingResponse,
  trackTokenUsage,
  trackWebSocketTokenUsage,
} = require('./token-tracker');
const { EventEmitter } = require('events');
const os = require('os');
const path = require('path');
const fs = require('fs');

// Redirect token log output to a temp dir to avoid /var/log permission errors
let tmpLogDir;
beforeAll(() => {
  tmpLogDir = fs.mkdtempSync(path.join(os.tmpdir(), 'token-tracker-test-'));
  process.env.AWF_TOKEN_LOG_DIR = tmpLogDir;
});
```
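One way to address the reviewer's note about require-time computation, sketched under the assumption that the module currently reads the env var once at the top level (`getTokenLogDir` is a hypothetical helper name, not the module's actual API):

```javascript
// Sketch: read the log directory lazily at call time instead of require time,
// so a test's beforeAll override of AWF_TOKEN_LOG_DIR takes effect.
// The default path '/var/log/api-proxy' is taken from the review comment.
function getTokenLogDir() {
  return process.env.AWF_TOKEN_LOG_DIR || '/var/log/api-proxy';
}
```

Call sites that previously referenced a module-level `TOKEN_LOG_DIR` constant would call this function instead, trading a negligible lookup cost for testability.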
```js
  provider,
  model: streamingModel || 'unknown',
  path: reqPath,
  status: 200,
```
In WebSocket token usage records, status is hard-coded to 200. For a WebSocket upgrade the actual HTTP status is 101 (and the surrounding WebSocket request metrics/logging treat it as 1xx). Please set this to 101 (or capture the real upgrade status) so downstream analysis doesn’t misclassify these entries as normal 200 HTTP responses.
Suggested change:
```diff
-  status: 200,
+  status: 101,
```
```js
upstreamSocket.on('data', (chunk) => {
  totalBytes += chunk.length;
  buffer = Buffer.concat([buffer, chunk]);

  // Safety: drop buffer if it grows too large (malformed frames)
  if (buffer.length > MAX_WS_BUFFER) {
    buffer = Buffer.alloc(0);
    httpHeaderParsed = true; // skip header parsing
    return;
  }

  // Skip the HTTP 101 Switching Protocols response header
  if (!httpHeaderParsed) {
    const headerEnd = buffer.indexOf('\r\n\r\n');
    if (headerEnd === -1) return; // need more data for full header
    buffer = buffer.slice(headerEnd + 4);
    httpHeaderParsed = true;
```
response_bytes in trackWebSocketTokenUsage currently counts the initial HTTP 101 upgrade header bytes as well as WebSocket frame bytes. In trackTokenUsage (HTTP) response_bytes counts only the response body, so this makes the field inconsistent across transports. Consider tracking header bytes separately and reporting only post-header WebSocket payload bytes as response_bytes (or renaming/adding a field to clarify semantics).
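The separation the reviewer asks for can be sketched as a small stateful counter (illustrative only; names like `makeByteCounter` are hypothetical, not the PR's code):

```javascript
// Sketch: count HTTP 101 upgrade-header bytes separately from WebSocket
// payload bytes, so response_bytes keeps the HTTP tracker's body-only meaning.
function makeByteCounter() {
  const counts = { header_bytes: 0, response_bytes: 0 };
  let headerDone = false;
  let pending = Buffer.alloc(0);
  return {
    counts,
    feed(chunk) {
      if (headerDone) {
        counts.response_bytes += chunk.length;
        return;
      }
      pending = Buffer.concat([pending, chunk]);
      const end = pending.indexOf('\r\n\r\n');
      if (end === -1) return; // header still incomplete
      counts.header_bytes += end + 4;
      counts.response_bytes += pending.length - (end + 4);
      headerDone = true;
      pending = Buffer.alloc(0);
    },
  };
}
```

Feeding each upstream `data` chunk through `feed()` yields `response_bytes` that excludes the upgrade header even when the header is split across chunks.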
```js
if (masked) headerSize += 4; // skip masking key

const frameEnd = pos + headerSize + payloadLength;
if (frameEnd > buf.length) break;

// Extract text frames (opcode 1) with FIN set
if (opcode === 1 && fin) {
  messages.push(buf.slice(pos + headerSize, frameEnd).toString('utf8'));
```
parseWebSocketFrames advances past the masking key when masked is set, but it does not unmask the payload before decoding it as UTF-8. That means masked text frames would produce corrupted messages. Since this parser is exported and already branches on masked, either properly unmask the payload or explicitly treat masked frames as unsupported and skip returning them.
Suggested change:
```diff
-if (masked) headerSize += 4; // skip masking key
-
-const frameEnd = pos + headerSize + payloadLength;
-if (frameEnd > buf.length) break;
-
-// Extract text frames (opcode 1) with FIN set
-if (opcode === 1 && fin) {
-  messages.push(buf.slice(pos + headerSize, frameEnd).toString('utf8'));
+let maskingKey = null;
+if (masked) {
+  // Ensure we have enough bytes for the masking key
+  if (pos + headerSize + 4 > buf.length) break;
+  maskingKey = buf.slice(pos + headerSize, pos + headerSize + 4);
+  headerSize += 4;
+}
+const payloadStart = pos + headerSize;
+const frameEnd = payloadStart + payloadLength;
+if (frameEnd > buf.length) break;
+// Extract text frames (opcode 1) with FIN set
+if (opcode === 1 && fin) {
+  if (masked && maskingKey) {
+    const maskedPayload = buf.slice(payloadStart, frameEnd);
+    const unmaskedPayload = Buffer.allocUnsafe(payloadLength);
+    for (let i = 0; i < payloadLength; i++) {
+      unmaskedPayload[i] = maskedPayload[i] ^ maskingKey[i % 4];
+    }
+    messages.push(unmaskedPayload.toString('utf8'));
+  } else {
+    messages.push(buf.slice(payloadStart, frameEnd).toString('utf8'));
+  }
```
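The XOR unmasking in the suggestion relies on RFC 6455 masking being its own inverse; a minimal standalone sketch of that property (illustrative helper name, not repo code):

```javascript
// RFC 6455 masking: XOR each payload byte with the 4-byte key, cycling.
// Applying the same operation twice restores the original payload.
function xorMask(payload, key) {
  const out = Buffer.allocUnsafe(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ key[i % 4];
  }
  return out;
}
```

This is why the same loop serves for both masking (client side) and unmasking (proxy/parser side).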
|
Smoke Test Results — PASS
Chroot Version Comparison Results
Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.
Smoke Test Results — Run 23860683072 ✅ GitHub MCP — Last 2 merged PRs: #1549 "feat: include api-proxy token logs in firewall audit artifact", #1544 "fix: disable IPv6 in agent container to prevent squid proxy bypass" — Overall: PASS
Writes token-diag.log alongside token-usage.jsonl in the mounted log volume. Since api-proxy container stdout is not captured in workflow logs, this file provides visibility into:
- Whether trackTokenUsage (HTTP) or trackWebSocketTokenUsage (WS) is called
- Content-type, status code, streaming flag for each request
- Whether usage data was found and which fields were extracted
- Frame counts and message counts for WebSocket tracking

This will help diagnose why Claude/Anthropic produces no token records.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Smoke Test Results — Claude (run 23861249879) ✅ GitHub MCP: #1549 feat: include api-proxy token logs in firewall audit artifact / #1544 fix: disable IPv6 in agent container to prevent squid proxy bypass Overall: PASS
🤖 Smoke test results for ✅ GitHub MCP — Last 2 merged PRs: #1549 "feat: include api-proxy token logs in firewall audit artifact", #1544 "fix: disable IPv6 in agent container to prevent squid proxy bypass" Overall: PASS
Chroot Version Comparison Results
Overall: FAILED — Python and Node.js versions differ between host and chroot.
Add first 500 bytes of raw response data to token-diag.log entries. This will reveal the actual SSE format from the Anthropic beta API that the parser is failing to extract usage from. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Smoke Test Results ✅ GitHub MCP — Last 2 merged PRs:
✅ Playwright — github.com title contains "GitHub" Overall: PASS
Smoke Test Results — Copilot Engine ✅ PASS
Author:
Chroot Version Comparison Results
Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot.
The Anthropic API returns gzip-compressed SSE responses (content-encoding: gzip). The token tracker was trying to parse compressed binary data as SSE text, which silently failed to extract any usage information.

Changes:
- Add gzip/deflate/brotli decompression support in trackTokenUsage()
- Create decompression pipeline when content-encoding header is present
- Raw compressed bytes still flow to client unchanged via pipe()
- Gate diagnostic logging behind AWF_DEBUG_TOKENS=1 env var
- Add isCompressedResponse() and createDecompressor() helpers
- Add 8 new tests for compressed response handling (gzip SSE, gzip JSON, multi-chunk gzip, backward compat with uncompressed)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Smoke Test Results — Run 23862098126 ✅ GitHub MCP — Last 2 merged PRs: #1549 "feat: include api-proxy token logs in firewall audit artifact" — Overall: PASS
🏗️ Build Test Suite Results
Overall: 0/8 ecosystems passed — ❌ FAIL
Error Details: All repository clones failed. Root cause:
Smoke Test: PASS
Chroot Version Comparison Results
Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot environments.
- Set WebSocket record status to 101 instead of 200
- Track header bytes separately; report only WS payload in response_bytes
- Properly unmask masked WebSocket frames with XOR key
- Sanitize diag() to strip raw_sample before writing to disk (CodeQL)
- Add test for masked frame unmasking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Smoke test results for
Overall: PASS
Smoke Test Results — PASS
Smoke Test: GitHub Actions Services Connectivity ✅ All checks passed.
Chroot Version Comparison Results
Result: ❌ FAILED — Python and Node.js versions differ between host and chroot environment.
🔮 The ancient spirits stir at the network boundary; this smoke-test oracle has passed through and marked the run. (Discussion target unavailable in current toolset, so this omen is left on the PR.)
Problem
Token tracking was not extracting usage data from Anthropic (Claude) API responses, despite working correctly for OpenAI/Copilot.
Root Cause
The Anthropic API returns gzip-compressed SSE responses (`content-encoding: gzip`). The token tracker was trying to parse the raw compressed binary as SSE text, finding `0x1F 0x8B 0x08` (the gzip magic bytes) instead of `data:` lines.

This was discovered through diagnostic logging showing `has_usage: false` and `model: null` for all Claude requests, with `raw_sample` containing gzip binary data instead of SSE text.

Solution

In `trackTokenUsage()`:
- When the `content-encoding` header indicates compression, create a decompression pipeline
- Raw compressed bytes still flow to the client unchanged via `pipe()` — zero impact on proxy latency
- Diagnostic logging is gated behind the `AWF_DEBUG_TOKENS=1` env var (off by default)

Tests
8 new tests for compressed response handling:
- `isCompressedResponse` — gzip, deflate, brotli, identity
- `trackTokenUsage` (compressed) — gzip SSE streaming, gzip JSON, multi-chunk gzip, backward compat

All 187 api-proxy tests pass.
How to verify
Run the Claude smoke test workflow — `token-usage.jsonl` should now contain entries with `provider: anthropic`.