
feat: add token usage tracking to api-proxy sidecar #1539

Merged: lpcox merged 2 commits into main from feat/api-proxy-token-tracking (Apr 1, 2026)

Conversation


@lpcox lpcox commented Apr 1, 2026

Summary

Adds token usage tracking to the api-proxy sidecar, enabling visibility into LLM API token consumption across all providers (OpenAI, Anthropic, Copilot).

Changes

New: containers/api-proxy/token-tracker.js

  • Intercepts LLM API responses (both streaming SSE and non-streaming JSON) to extract token usage data
  • Non-streaming: buffers response body, parses JSON on end to extract usage field
  • Streaming (SSE): scans each chunk for usage events as they pass through (message_start, message_delta, final chunk)
  • Normalizes provider-specific fields into unified format: input_tokens, output_tokens, cache_read_tokens, cache_write_tokens
  • Writes JSONL logs to /var/log/api-proxy/token-usage.jsonl
  • Updates input_tokens_total and output_tokens_total metrics counters
  • Zero latency impact: attaches additional EventEmitter listeners alongside existing proxyRes.pipe(res) — purely observational, no Transform stream needed
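The streaming path above can be sketched roughly as follows. The message_start / message_delta event names come from this description; the function name and parsing details are illustrative, not the actual token-tracker.js code:

```javascript
// Sketch: accumulate token usage from SSE chunks as they pass through.
// Later events (message_delta) overwrite earlier values for the same key,
// matching the "accumulate across events" behavior described above.
function accumulateSseUsage(chunk, acc) {
  for (const line of chunk.toString('utf8').split('\n')) {
    if (!line.startsWith('data:')) continue;
    let event;
    try {
      event = JSON.parse(line.slice(5).trim());
    } catch {
      continue; // partial or non-JSON data line
    }
    const usage =
      (event.type === 'message_start' && event.message && event.message.usage) ||
      (event.type === 'message_delta' && event.usage) ||
      null;
    if (usage) {
      for (const [key, value] of Object.entries(usage)) {
        if (typeof value === 'number') acc[key] = value;
      }
    }
  }
  return acc;
}
```

A real implementation also has to buffer partial SSE lines across chunk boundaries; this sketch assumes each `data:` line arrives whole.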

Modified: containers/api-proxy/server.js

  • Imports trackTokenUsage and closeLogStream from token-tracker
  • Calls trackTokenUsage(proxyRes, opts) after proxyRes.pipe(res) in the proxy response handler
  • Calls closeLogStream() in SIGTERM/SIGINT shutdown handlers

New: containers/api-proxy/token-tracker.test.js

  • 32 unit tests covering all extraction and normalization paths
  • Tests for OpenAI, Anthropic, and edge cases (invalid JSON, missing usage, non-2xx responses)
  • Integration tests with mock EventEmitter for end-to-end streaming and non-streaming flows

Provider Token Field Mapping

| Provider | Input | Output | Cache Read | Cache Write |
|---|---|---|---|---|
| Anthropic | usage.input_tokens | usage.output_tokens | usage.cache_read_input_tokens | usage.cache_creation_input_tokens |
| OpenAI/Copilot | usage.prompt_tokens | usage.completion_tokens | | |
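The mapping above can be collapsed into the unified format named in the summary along these lines (the function shape is an assumption, not the actual token-tracker.js implementation):

```javascript
// Sketch of the normalization described by the mapping table above.
// The unified output field names come from the PR summary.
function normalizeUsage(usage) {
  const n = (v) => (typeof v === 'number' ? v : 0);
  return {
    // Anthropic uses input/output_tokens; OpenAI/Copilot use prompt/completion_tokens
    input_tokens: n(usage.input_tokens) + n(usage.prompt_tokens),
    output_tokens: n(usage.output_tokens) + n(usage.completion_tokens),
    cache_read_tokens: n(usage.cache_read_input_tokens),
    cache_write_tokens: n(usage.cache_creation_input_tokens),
  };
}
```

Since a given response carries only one provider's field family, summing each pair is equivalent to picking whichever field is present.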

Testing

  • All 32 new tests pass
  • All 134 existing api-proxy tests pass (166 total)
  • Root test suite: 1229/1232 pass (3 pre-existing failures in docker-manager.test.ts, unrelated)

Closes #1536

Intercept LLM API responses in the api-proxy to extract and log token
usage data. Supports both streaming (SSE) and non-streaming JSON
responses from OpenAI, Anthropic, and Copilot providers.

- Add token-tracker.js module with SSE and JSON usage extraction
- Integrate trackTokenUsage into server.js proxy response pipeline
- Write JSONL logs to /var/log/api-proxy/token-usage.jsonl
- Update input_tokens_total and output_tokens_total metrics
- Add 32 unit tests covering all extraction and normalization paths

Closes #1536

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lpcox lpcox requested a review from Mossaka as a code owner April 1, 2026 01:14
Copilot AI review requested due to automatic review settings April 1, 2026 01:14

github-actions bot commented Apr 1, 2026

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.67% | 82.77% | 📈 +0.10% |
| Statements | 82.34% | 82.43% | 📈 +0.09% |
| Functions | 81.22% | 81.22% | ➡️ +0.00% |
| Branches | 75.94% | 76.00% | 📈 +0.06% |

📁 Per-file Coverage Changes (1 file)

| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/docker-manager.ts | 85.8% → 86.2% (+0.41%) | 85.3% → 85.7% (+0.40%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts


Copilot AI left a comment


Pull request overview

Adds provider-agnostic token usage tracking to the api-proxy sidecar by observing upstream responses (JSON and SSE), normalizing usage fields, emitting metrics, and writing JSONL usage logs for later analysis/correlation.

Changes:

  • Added token-tracker.js to extract/normalize token usage from OpenAI/Copilot and Anthropic responses (streaming + non-streaming) and emit logs/metrics.
  • Integrated token tracking into the proxy response path and added graceful shutdown cleanup via closeLogStream().
  • Added a comprehensive Jest test suite for extraction, normalization, and end-to-end tracking behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File Description
containers/api-proxy/token-tracker.js New token usage observer that parses JSON/SSE responses, normalizes fields, writes JSONL logs, and increments token counters.
containers/api-proxy/server.js Hooks token tracking into the proxy response handler and closes the log stream on shutdown signals.
containers/api-proxy/token-tracker.test.js New unit/integration tests covering JSON and SSE parsing and metrics updates.


```js
const { generateRequestId, sanitizeForLog, logRequest } = require('./logging');
const metrics = require('./metrics');
const rateLimiter = require('./rate-limiter');
const { trackTokenUsage, closeLogStream } = require('./token-tracker');
```

Copilot AI Apr 1, 2026


server.js now requires ./token-tracker, but the api-proxy Docker image currently only copies server.js logging.js metrics.js rate-limiter.js (see containers/api-proxy/Dockerfile:18). Without updating the Dockerfile (or switching to COPY . ./), the container will fail to start with a module-not-found error.

Suggested change:

```diff
-const { trackTokenUsage, closeLogStream } = require('./token-tracker');
+let trackTokenUsage;
+let closeLogStream;
+try {
+  ({ trackTokenUsage, closeLogStream } = require('./token-tracker'));
+} catch (err) {
+  if (err && err.code === 'MODULE_NOT_FOUND') {
+    // Fallback no-op implementations when token-tracker is not available
+    trackTokenUsage = () => {};
+    closeLogStream = () => {};
+  } else {
+    throw err;
+  }
+}
```
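Concretely, the simpler fix is a one-line change to the COPY instruction the comment cites (placement at Dockerfile:18 is per the comment; the surrounding instructions are not shown here and are assumed):

```dockerfile
# containers/api-proxy/Dockerfile — add token-tracker.js to the existing COPY list
COPY server.js logging.js metrics.js rate-limiter.js token-tracker.js ./
```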

Comment on lines +866 to +874
```js
process.on('SIGTERM', () => {
  logRequest('info', 'shutdown', { message: 'Received SIGTERM, shutting down gracefully' });
  closeLogStream();
  process.exit(0);
});

process.on('SIGINT', () => {
  logRequest('info', 'shutdown', { message: 'Received SIGINT, shutting down gracefully' });
  closeLogStream();
```

Copilot AI Apr 1, 2026


closeLogStream() calls logStream.end() but the SIGTERM/SIGINT handlers call process.exit(0) immediately afterward, which can terminate the process before the stream flushes buffered writes. Consider awaiting the stream 'finish' callback (or using end(() => process.exit(0))) to avoid losing the last token-usage entries.

Suggested change:

```diff
-process.on('SIGTERM', () => {
-  logRequest('info', 'shutdown', { message: 'Received SIGTERM, shutting down gracefully' });
-  closeLogStream();
-  process.exit(0);
-});
-process.on('SIGINT', () => {
-  logRequest('info', 'shutdown', { message: 'Received SIGINT, shutting down gracefully' });
-  closeLogStream();
+process.on('SIGTERM', async () => {
+  logRequest('info', 'shutdown', { message: 'Received SIGTERM, shutting down gracefully' });
+  await closeLogStream();
+  process.exit(0);
+});
+process.on('SIGINT', async () => {
+  logRequest('info', 'shutdown', { message: 'Received SIGINT, shutting down gracefully' });
+  await closeLogStream();
```

Comment on lines +8 to +14
```js
 * proxyRes → PassThrough (accumulates chunks) → res (client)
 *                  ↓ on('end')
 *          parse usage → log to file + metrics
 *
 * For non-streaming responses: parse the buffered JSON body on 'end'.
 * For streaming (SSE) responses: scan each chunk for usage events as they
 * pass through, accumulate usage from message_start / message_delta / final
```

Copilot AI Apr 1, 2026


The file header comment describes an architecture using PassThrough (accumulates chunks), but the implementation only attaches data/end listeners and does not use a PassThrough. Updating the comment to match the actual approach will prevent confusion for future maintenance.

Suggested change:

```diff
- * proxyRes → PassThrough (accumulates chunks) → res (client)
- *                  ↓ on('end')
- *          parse usage → log to file + metrics
- *
- * For non-streaming responses: parse the buffered JSON body on 'end'.
- * For streaming (SSE) responses: scan each chunk for usage events as they
- * pass through, accumulate usage from message_start / message_delta / final
+ * proxyRes (LLM response) → res (client)
+ *   ├─ on('data'): buffer/inspect chunks for usage extraction
+ *   └─ on('end'): finalize parsing → log to file + metrics
+ *
+ * For non-streaming responses: buffer the JSON body (up to MAX_BUFFER_SIZE),
+ * then parse it on 'end' to extract usage fields.
+ * For streaming (SSE) responses: scan each chunk for usage events as they
+ * are received, accumulate usage from message_start / message_delta / final
```

Comment on lines +94 to +117
```js
    result.usage = {};
    // Anthropic fields
    if (typeof json.usage.input_tokens === 'number') {
      result.usage.input_tokens = json.usage.input_tokens;
    }
    if (typeof json.usage.output_tokens === 'number') {
      result.usage.output_tokens = json.usage.output_tokens;
    }
    if (typeof json.usage.cache_creation_input_tokens === 'number') {
      result.usage.cache_creation_input_tokens = json.usage.cache_creation_input_tokens;
    }
    if (typeof json.usage.cache_read_input_tokens === 'number') {
      result.usage.cache_read_input_tokens = json.usage.cache_read_input_tokens;
    }
    // OpenAI/Copilot fields
    if (typeof json.usage.prompt_tokens === 'number') {
      result.usage.prompt_tokens = json.usage.prompt_tokens;
    }
    if (typeof json.usage.completion_tokens === 'number') {
      result.usage.completion_tokens = json.usage.completion_tokens;
    }
    if (typeof json.usage.total_tokens === 'number') {
      result.usage.total_tokens = json.usage.total_tokens;
    }
```

Copilot AI Apr 1, 2026


extractUsageFromJson sets result.usage = {} whenever json.usage is an object, even if none of the recognized numeric token fields are present. This can cause normalizeUsage({}) to produce all zeros and then log/emit metrics for 0 tokens, polluting logs/metrics. Consider leaving usage as null unless at least one numeric token field was extracted.

Suggested change:

```diff
-    result.usage = {};
-    // Anthropic fields
-    if (typeof json.usage.input_tokens === 'number') {
-      result.usage.input_tokens = json.usage.input_tokens;
-    }
-    if (typeof json.usage.output_tokens === 'number') {
-      result.usage.output_tokens = json.usage.output_tokens;
-    }
-    if (typeof json.usage.cache_creation_input_tokens === 'number') {
-      result.usage.cache_creation_input_tokens = json.usage.cache_creation_input_tokens;
-    }
-    if (typeof json.usage.cache_read_input_tokens === 'number') {
-      result.usage.cache_read_input_tokens = json.usage.cache_read_input_tokens;
-    }
-    // OpenAI/Copilot fields
-    if (typeof json.usage.prompt_tokens === 'number') {
-      result.usage.prompt_tokens = json.usage.prompt_tokens;
-    }
-    if (typeof json.usage.completion_tokens === 'number') {
-      result.usage.completion_tokens = json.usage.completion_tokens;
-    }
-    if (typeof json.usage.total_tokens === 'number') {
-      result.usage.total_tokens = json.usage.total_tokens;
-    }
+    const usage = {};
+    let hasField = false;
+    // Anthropic fields
+    if (typeof json.usage.input_tokens === 'number') {
+      usage.input_tokens = json.usage.input_tokens;
+      hasField = true;
+    }
+    if (typeof json.usage.output_tokens === 'number') {
+      usage.output_tokens = json.usage.output_tokens;
+      hasField = true;
+    }
+    if (typeof json.usage.cache_creation_input_tokens === 'number') {
+      usage.cache_creation_input_tokens = json.usage.cache_creation_input_tokens;
+      hasField = true;
+    }
+    if (typeof json.usage.cache_read_input_tokens === 'number') {
+      usage.cache_read_input_tokens = json.usage.cache_read_input_tokens;
+      hasField = true;
+    }
+    // OpenAI/Copilot fields
+    if (typeof json.usage.prompt_tokens === 'number') {
+      usage.prompt_tokens = json.usage.prompt_tokens;
+      hasField = true;
+    }
+    if (typeof json.usage.completion_tokens === 'number') {
+      usage.completion_tokens = json.usage.completion_tokens;
+      hasField = true;
+    }
+    if (typeof json.usage.total_tokens === 'number') {
+      usage.total_tokens = json.usage.total_tokens;
+      hasField = true;
+    }
+    if (hasField) {
+      result.usage = usage;
+    }
```

Comment on lines +60 to +65
```js
function writeTokenUsage(record) {
  const stream = getLogStream();
  if (stream) {
    stream.write(JSON.stringify(record) + '\n');
  }
}
```

Copilot AI Apr 1, 2026


writeTokenUsage ignores the return value of stream.write(). If the log volume is slow or temporarily blocked, WriteStream will buffer in memory and could grow without bound under load. Consider handling backpressure (e.g., drop or queue writes when write() returns false, and/or add a bounded in-memory buffer).
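A bounded-queue approach along these lines might address the concern. This is a sketch only: MAX_PENDING, the drop policy, and passing the stream explicitly are illustrative choices, not the PR's implementation:

```javascript
// Sketch of a backpressure-aware JSONL writer with a bounded queue.
const MAX_PENDING = 1000;
const pending = [];
let backpressured = false;

function flush(stream) {
  while (pending.length > 0) {
    if (!stream.write(pending.shift())) {
      stream.once('drain', () => flush(stream));
      return; // still saturated; resume on next 'drain'
    }
  }
  backpressured = false;
}

function writeTokenUsage(stream, record) {
  const line = JSON.stringify(record) + '\n';
  if (backpressured) {
    if (pending.length >= MAX_PENDING) return false; // bounded: drop under sustained pressure
    pending.push(line);
    return true;
  }
  if (!stream.write(line)) {
    // write() accepted this chunk but asked us to pause; queue until 'drain'
    backpressured = true;
    stream.once('drain', () => flush(stream));
  }
  return true;
}
```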

Comment on lines +236 to +246
```js
function trackTokenUsage(proxyRes, opts) {
  const { requestId, provider, method, path: reqPath, targetHost, startTime, metrics: metricsRef } = opts;
  const streaming = isStreamingResponse(proxyRes.headers);

  // Accumulate response body for usage extraction
  const chunks = [];
  let totalBytes = 0;
  let overflow = false;

  // For streaming: accumulate usage across SSE events
  let streamingUsage = {};
```

Copilot AI Apr 1, 2026


trackTokenUsage accepts method and targetHost but never uses them, and the emitted record hard-codes request_bytes: 0 even though the caller already knows the actual request size (requestBytes in server.js). Either remove unused fields from opts/record to reduce confusion, or plumb through and log method, targetHost, and real request_bytes (and potentially reuse the already-computed responseBytes instead of counting again).

Comment on lines +283 to +301
```js
describe('trackTokenUsage', () => {
  test('extracts usage from non-streaming JSON response', (done) => {
    const proxyRes = new EventEmitter();
    proxyRes.headers = { 'content-type': 'application/json' };
    proxyRes.statusCode = 200;

    const metricsRef = {
      increment: jest.fn(),
    };

    trackTokenUsage(proxyRes, {
      requestId: 'test-123',
      provider: 'openai',
      method: 'POST',
      path: '/v1/chat/completions',
      targetHost: 'api.openai.com',
      startTime: Date.now(),
      metrics: metricsRef,
    });
```

Copilot AI Apr 1, 2026


These integration tests call trackTokenUsage, which will attempt to create/write /var/log/api-proxy/token-usage.jsonl via fs.mkdirSync/createWriteStream. On many developer/CI environments this path is not writable, which can add noisy token_log_init_error warnings to test output. Consider setting process.env.AWF_TOKEN_LOG_DIR to a temp dir for the test suite and/or mocking writeTokenUsage/getLogStream to avoid filesystem I/O in unit tests.

- Add token-tracker.js to Dockerfile COPY step
- Add try/catch require guard for graceful fallback
- Fix header comment to match actual architecture
- Guard empty usage objects from producing 0-token log entries
- Handle write backpressure in JSONL log writer
- Make closeLogStream() async to flush before process.exit
- Remove unused method/targetHost opts; drop hardcoded request_bytes
- Redirect test log output to temp dir to avoid /var/log noise

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions bot commented Apr 1, 2026

Smoke test results

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude for issue #1539


github-actions bot commented Apr 1, 2026

🤖 Smoke Test Results

| Test | Result |
|---|---|
| GitHub MCP (last 2 merged PRs) | #1534 feat: add smoke-services workflow for --allow-host-service-ports e2e testing; #1528 fix: api-proxy auth chain — trim keys, align placeholder format, add diagnostics |
| Playwright (github.com title) | ✅ "GitHub · Change is constant. GitHub keeps you ahead. · GitHub" |
| File write/read | /tmp/gh-aw/agent/smoke-test-copilot-23827810077.txt created and verified |
| Bash tool | |
Overall: PASS

PR author: @lpcox | No assignees

📰 BREAKING: Report filed by Smoke Copilot for issue #1539


github-actions bot commented Apr 1, 2026

Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
|---|---|---|---|
| Python | Python 3.12.13 | Python 3.12.3 | ❌ NO |
| Node.js | v24.14.0 | v20.20.1 | ❌ NO |
| Go | go1.22.12 | go1.22.12 | ✅ YES |

Result: ⚠️ Not all tests passed. Python and Node.js versions differ between host and chroot environments.

Tested by Smoke Chroot for issue #1539


@lpcox lpcox closed this Apr 1, 2026
@lpcox lpcox reopened this Apr 1, 2026
@lpcox lpcox enabled auto-merge (squash) April 1, 2026 02:27

github-actions bot commented Apr 1, 2026

🤖 Smoke Test Results

| Test | Status |
|---|---|
| GitHub MCP (last 2 merged PRs) | #1534 feat: add smoke-services workflow; #1528 fix: api-proxy auth chain |
| Playwright (github.com title) | ✅ "GitHub · Change is constant..." |
| File write + read | /tmp/gh-aw/agent/smoke-test-copilot-23828975666.txt |
| Bash verification | cat confirmed content |
Overall: PASS | Author: @lpcox | Assignees: none

📰 BREAKING: Report filed by Smoke Copilot for issue #1539


github-actions bot commented Apr 1, 2026

Smoke Test Results

✅ GitHub MCP: #1534 feat: add smoke-services workflow for --allow-host-service-ports e2e testing, #1530 [WIP] Fix failing GitHub Actions workflow agent
✅ Playwright: github.com title contains "GitHub"
✅ File write: /tmp/gh-aw/agent/smoke-test-claude-23828975659.txt created
✅ Bash verify: file contents confirmed

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude for issue #1539


github-actions bot commented Apr 1, 2026

Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
|---|---|---|---|
| Python | Python 3.12.13 | Python 3.12.3 | ❌ NO |
| Node.js | v24.14.0 | v20.20.1 | ❌ NO |
| Go | go1.22.12 | go1.22.12 | ✅ YES |

Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot for issue #1539


github-actions bot commented Apr 1, 2026

Smoke Test: GitHub Actions Services Connectivity ✅

All connectivity checks passed.

| Service | Check | Result |
|---|---|---|
| Redis | PING host.docker.internal:6379 | PONG |
| PostgreSQL | pg_isready host.docker.internal:5432 | ✅ accepting connections |
| PostgreSQL | SELECT 1 on smoketest db as postgres | ✅ returned 1 |

Note: redis-cli was not pre-installed; Redis was verified via raw TCP (nc + RESP protocol), which returned +PONG.

🔌 Service connectivity validated by Smoke Services


github-actions bot commented Apr 1, 2026

Smoke test results:

  • ✅ GitHub MCP (last 2 merged PRs): "feat: add smoke-services workflow for --allow-host-service-ports e2e testing"; "fix: api-proxy auth chain — trim keys, align placeholder format, add diagnostics"
  • ❌ safeinputs-gh PR query (tool not available in this runtime)
  • ❌ Playwright title check (EACCES writing MCP log file)
  • ❌ Tavily search (tool/server not available)
  • ✅ File write + cat readback
  • ❌ Discussion query/comment flow (github-discussion-query unavailable)
  • npm ci && npm run build

Overall status: FAIL

🔮 The oracle has spoken through Smoke Codex


github-actions bot commented Apr 1, 2026

🏗️ Build Test Suite Results

| Ecosystem | Project | Build/Install | Tests | Status |
|---|---|---|---|---|
| Bun | elysia | | 1/1 passed | ✅ PASS |
| Bun | hono | | 1/1 passed | ✅ PASS |
| C++ | fmt | | N/A | ✅ PASS |
| C++ | json | | N/A | ✅ PASS |
| Deno | oak | N/A | 1/1 passed | ✅ PASS |
| Deno | std | N/A | 1/1 passed | ✅ PASS |
| .NET | hello-world | | N/A | ✅ PASS |
| .NET | json-parse | | N/A | ✅ PASS |
| Go | color | | 1/1 passed | ✅ PASS |
| Go | env | | 1/1 passed | ✅ PASS |
| Go | uuid | | 1/1 passed | ✅ PASS |
| Java | gson | | 1/1 passed | ✅ PASS |
| Java | caffeine | | 1/1 passed | ✅ PASS |
| Node.js | clsx | | All passed | ✅ PASS |
| Node.js | execa | | All passed | ✅ PASS |
| Node.js | p-limit | | All passed | ✅ PASS |
| Rust | fd | | 1/1 passed | ✅ PASS |
| Rust | zoxide | | 1/1 passed | ✅ PASS |
Overall: 8/8 ecosystems passed — ✅ PASS

Note: Java required mvn -Dmaven.repo.local=/tmp/... workaround because ~/.m2 was owned by root (no write access), preventing Maven from creating its local repository at the default path. All tests passed once the local repo path was redirected.

Generated by Build Test Suite for issue #1539


Development

Successfully merging this pull request may close these issues.

feat: API proxy token usage tracking and conversation cost analysis
