
aigw: split access logs and map request headers #1797

Merged: mathetake merged 6 commits into envoyproxy:main from codefromthecrypt:pr-split-logs on Jan 22, 2026

@codefromthecrypt (Contributor) commented Jan 21, 2026

Description

Split stdout/file access logs by request type using CEL expressions over request headers (x-ai-eg-model for LLM requests, x-ai-eg-mcp-backend for MCP requests), so MCP-only fields never appear in LLM logs and vice versa. This avoids relying on /mcp paths, which are not present on backend-listener requests.

Add OTEL request-header mapping env vars for users: OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES (base mapping) and OTEL_AIGW_LOG_REQUEST_HEADER_ATTRIBUTES (log override). These are merged and wired through ext_proc and the Envoy Gateway extension server so access logs can include session.id without per-request application changes. MCP carries such values in JSON-RPC params._meta on POST requests, but GET streams have no JSON-RPC payload, so the compose examples also send them as HTTP headers for access-log mapping.
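A minimal sketch of how a base mapping and a log override could be parsed and merged. It assumes the `header:attribute` pair format seen in `agent-session-id:session.id`, comma-separated pairs, and "override wins per header" merge semantics; `parseMapping` and `mergeMappings` are hypothetical helpers, not the gateway's implementation.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// parseMapping parses comma-separated "header:attribute" pairs (assumed format).
// Header names are lowercased; empty input yields an empty mapping.
func parseMapping(s string) (map[string]string, error) {
	m := map[string]string{}
	for _, pair := range strings.Split(s, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		header, attr, ok := strings.Cut(pair, ":")
		header, attr = strings.TrimSpace(header), strings.TrimSpace(attr)
		if !ok || header == "" || attr == "" {
			return nil, fmt.Errorf("invalid mapping %q, want header:attribute", pair)
		}
		m[strings.ToLower(header)] = attr
	}
	return m, nil
}

// mergeMappings copies base, then applies override on top, so the
// log-specific entry wins for any header present in both (assumed semantics).
func mergeMappings(base, override map[string]string) map[string]string {
	out := make(map[string]string, len(base)+len(override))
	for h, a := range base {
		out[h] = a
	}
	for h, a := range override {
		out[h] = a
	}
	return out
}

func main() {
	base, err := parseMapping(os.Getenv("OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES"))
	if err != nil {
		fmt.Println(err)
		return
	}
	logOverride, err := parseMapping(os.Getenv("OTEL_AIGW_LOG_REQUEST_HEADER_ATTRIBUTES"))
	if err != nil {
		fmt.Println(err)
		return
	}
	merged := mergeMappings(base, logOverride)
	headers := make([]string, 0, len(merged))
	for h := range merged {
		headers = append(headers, h)
	}
	sort.Strings(headers) // stable output order
	for _, h := range headers {
		fmt.Printf("%s -> %s\n", h, merged[h])
	}
}
```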

Rename access-log keys to OTEL-style names (gen_ai.* and mcp.provider.name) while preserving gateway-specific metadata fields, add request.path to the common access-log fields, and update examples, fixtures, and docs accordingly. Compose examples pass user/session IDs via compose args so log output can be verified end-to-end (append --debug to the aigw command for verbose logs).

Ensure original downstream paths are preserved in access logs by setting x-envoy-original-path and x-ai-eg-original-path from the incoming request path across LLM and MCP flows.

Related Issues/PRs (if applicable)

Related: #1303

Special notes for reviewers (if applicable)

Access log samples from docker compose runs (Envoy stdout):

{"bytes_received":126,"bytes_sent":311,"connection_termination_details":null,"downstream_local_address":"172.18.0.2:1975","downstream_remote_address":"172.18.0.3:36684","duration":118,"gen_ai.provider.name":"default/openai/route/aigw-run/rule/0/ref/0","gen_ai.request.model":"qwen2.5:0.5b","gen_ai.response.model":"qwen2.5:0.5b","gen_ai.usage.input_tokens":44,"gen_ai.usage.output_tokens":6,"method":"POST","request.path":"/v1/chat/completions","response_code":200,"session.id":"session-123","start_time":"2026-01-21T01:27:54.561Z","upstream_cluster":"httproute/default/aigw-run/rule/0","upstream_host":"192.168.5.2:11434","upstream_local_address":"172.18.0.2:38068","upstream_transport_failure_reason":null,"user-agent":"curl/8.14.1","x-envoy-origin-path":"/v1/chat/completions","x-request-id":"9c58c39a-d61f-48db-80e9-befb570f51d4"}
{"bytes_received":213,"bytes_sent":13623,"connection_termination_details":null,"downstream_local_address":"127.0.0.1:10088","downstream_remote_address":"127.0.0.1:56250","duration":1274,"jsonrpc.request.id":"3","mcp.method.name":"tools/call","mcp.provider.name":"kiwi","mcp.session.id":"d0234260-8c4e-49f5-8fb6-8e359194d7cc","method":"POST","request.path":"/","response_code":200,"session.id":"session-123","start_time":"2026-01-21T01:28:12.653Z","upstream_cluster":"httproute/default/ai-eg-mcp-br-mcp-route-kiwi/rule/0","upstream_host":"151.101.195.52:443","upstream_local_address":"172.18.0.2:37250","upstream_transport_failure_reason":null,"user-agent":"Go-http-client/1.1","x-envoy-origin-path":"/mcp","x-request-id":"2e7f4f6d-1b77-4f61-94b9-bc03a960ce9b"}
{"bytes_received":0,"bytes_sent":0,"connection_termination_details":null,"downstream_local_address":"127.0.0.1:10088","downstream_remote_address":"127.0.0.1:56238","duration":2561,"jsonrpc.request.id":null,"mcp.method.name":null,"mcp.provider.name":"kiwi","mcp.session.id":"d0234260-8c4e-49f5-8fb6-8e359194d7cc","method":"GET","request.path":"/","response_code":0,"session.id":"session-123","start_time":"2026-01-21T01:28:11.396Z","upstream_cluster":"httproute/default/ai-eg-mcp-br-mcp-route-kiwi/rule/0","upstream_host":"151.101.67.52:443","upstream_local_address":"172.18.0.2:41090","upstream_transport_failure_reason":null,"user-agent":"Go-http-client/1.1","x-envoy-origin-path":"/mcp","x-request-id":"ee852e6b-be32-4947-bedc-12e4215df52f"}
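One way to check the split in samples like these is to decode each JSON line and confirm it never mixes `gen_ai.*` keys with `mcp.*`/`jsonrpc.*` keys. The sketch below is a hypothetical test helper (`checkSplit` is not part of the PR). Note that keys count even when their value is `null`, as on the GET stream line above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// checkSplit decodes one JSON access-log line and reports its kind, or an
// error if it mixes LLM fields (gen_ai.*) with MCP fields (mcp.*, jsonrpc.*).
func checkSplit(line string) (string, error) {
	var entry map[string]any
	if err := json.Unmarshal([]byte(line), &entry); err != nil {
		return "", err
	}
	var llm, mcp bool
	for k := range entry { // a key counts even if its value is null
		switch {
		case strings.HasPrefix(k, "gen_ai."):
			llm = true
		case strings.HasPrefix(k, "mcp."), strings.HasPrefix(k, "jsonrpc."):
			mcp = true
		}
	}
	switch {
	case llm && mcp:
		return "", fmt.Errorf("line mixes gen_ai.* and mcp.* fields")
	case llm:
		return "llm", nil
	case mcp:
		return "mcp", nil
	default:
		return "common", nil
	}
}

func main() {
	// Abbreviated copies of the sample lines above (fields trimmed for brevity).
	lines := []string{
		`{"gen_ai.request.model":"qwen2.5:0.5b","method":"POST","request.path":"/v1/chat/completions","session.id":"session-123"}`,
		`{"jsonrpc.request.id":null,"mcp.method.name":null,"mcp.provider.name":"kiwi","method":"GET","request.path":"/","session.id":"session-123"}`,
	}
	for _, l := range lines {
		kind, err := checkSplit(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(kind) // prints "llm", then "mcp"
	}
}
```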

Split Envoy access logs by request type (LLM vs MCP) using CEL matchers, add request-header→attribute mapping env vars for logs/spans/metrics, and update examples/tests to validate session.id propagation.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt (Contributor, Author) commented:

@mathetake @nacx there might be glitches here that I need to look at again tomorrow, but hopefully the overall direction is sensible. If not, let me know.

@codecov-commenter commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 85.10638% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.06%. Comparing base (5f98617) to head (17b07cd).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/extensionserver/header_to_metadata.go | 82.69% | 9 Missing and 9 partials ⚠️ |
| internal/extproc/processor_impl.go | 80.43% | 5 Missing and 4 partials ⚠️ |
| internal/mcpproxy/mcpproxy.go | 87.50% | 3 Missing and 3 partials ⚠️ |
| internal/extensionserver/post_translate_modify.go | 0.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1797      +/-   ##
==========================================
+ Coverage   84.04%   84.06%   +0.02%     
==========================================
  Files         117      118       +1     
  Lines       12990    13213     +223     
==========================================
+ Hits        10917    11108     +191     
- Misses       1418     1433      +15     
- Partials      655      672      +17     


@codefromthecrypt codefromthecrypt marked this pull request as ready for review January 21, 2026 21:58
@codefromthecrypt codefromthecrypt requested a review from a team as a code owner January 21, 2026 21:58
@dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Jan 21, 2026
@codefromthecrypt (Contributor, Author) commented:

@nacx @mathetake once this is in, I can add OTLP access logging from the gateway in aigw (standalone) mode without a massive PR.

@mathetake mathetake requested a review from nacx January 22, 2026 01:46
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake mathetake enabled auto-merge (squash) January 22, 2026 19:23
@mathetake (Member) commented:

/retest

@mathetake mathetake merged commit d39631a into envoyproxy:main Jan 22, 2026
76 of 80 checks passed
@codefromthecrypt codefromthecrypt deleted the pr-split-logs branch January 22, 2026 22:37
mathetake pushed a commit that referenced this pull request Jan 26, 2026
**Description**

Default the span and log request-header mappings to
`agent-session-id:session.id` so agent frameworks like Goose get session
correlation with zero config, while still allowing explicit overrides
(a different mapping, or an empty value to disable it). Metrics never
default to session IDs because session IDs are high-cardinality.

The default mapping lives in the new environment variable
`OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES`, so users who do not want the
`agent-session-id:session.id` mapping should set
`OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES=` (an empty string) to clear it.
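The opt-out works because "unset" and "set to empty" are different states. A minimal Go sketch of that rule, assuming the default is applied only when the variable is absent; `effectiveRequestHeaderAttributes` is a hypothetical helper, and metrics (which never get this default) are not modeled here.

```go
package main

import (
	"fmt"
	"os"
)

// defaultSessionMapping is the default quoted in the commit message.
const defaultSessionMapping = "agent-session-id:session.id"

// effectiveRequestHeaderAttributes returns the default when the variable is
// unset, otherwise its value verbatim, so an explicit empty string disables
// the mapping. lookup has os.LookupEnv's signature so tests can inject values.
func effectiveRequestHeaderAttributes(lookup func(string) (string, bool)) string {
	if v, ok := lookup("OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES"); ok {
		return v // set, possibly to "": respect the explicit choice
	}
	return defaultSessionMapping // unset: zero-config session correlation
}

func main() {
	fmt.Printf("%q\n", effectiveRequestHeaderAttributes(os.LookupEnv))
}
```

Using `os.LookupEnv` rather than `os.Getenv` is what distinguishes the two cases; `os.Getenv` returns `""` for both.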

Refactor request‑header mapping handling so defaults/merging live only
in extproc; `aigw` and controller/helm just pass flags through. Ordering
is normalized everywhere (request → span → metrics → log) and
docs/examples describe defaults without explicitly setting
`agent-session-id:session.id`.

**Related Issues/PRs (if applicable)**

Related: #1797

**Special notes for reviewers (if applicable)**

Ran `examples/goose` with the OTEL console env and `--debug`.

Since Goose now propagates `agent-session-id` by default, the header value
`20260123_19` shows up in the telemetry as `session.id`:

MCP span (`tools/list`) showing `session.id` set from the header:
```
{"Name":"ListTools","SpanContext":{"TraceID":"6412e470a771ede413f3318b984b65f5","SpanID":"bf7a7b2b20ce85a9","TraceFlags":"01","TraceState":"","Remote":false},"Parent":{"TraceID":"00000000000000000000000000000000","SpanID":"0000000000000000","TraceFlags":"00","TraceState":"","Remote":false},"SpanKind":3,"StartTime":"2026-01-23T16:01:20.669792+09:00","EndTime":"2026-01-23T16:01:21.313477459+09:00","Attributes":[{"Key":"mcp.protocol.version","Value":{"Type":"STRING","Value":"2025-06-18"}},{"Key":"mcp.transport","Value":{"Type":"STRING","Value":"http"}},{"Key":"mcp.request.id","Value":{"Type":"STRING","Value":"{1}"}},{"Key":"mcp.method.name","Value":{"Type":"STRING","Value":"tools/list"}},{"Key":"session.id","Value":{"Type":"STRING","Value":"20260123_19"}}],"Events":[{"Name":"route to backend","Attributes":[{"Key":"mcp.backend.name","Value":{"Type":"STRING","Value":"kiwi"}},{"Key":"mcp.session.id","Value":{"Type":"STRING","Value":"f9e80f73-bc48-4797-afae-045ef0e57e7d"}},{"Key":"mcp.session.new","Value":{"Type":"BOOL","Value":false}}],"DroppedAttributeCount":0,"Time":"2026-01-23T16:01:21.303264+09:00"}],"Links":null,"Status":{"Code":"Ok","Description":""},"DroppedAttributes":0,"DroppedEvents":0,"DroppedLinks":0,"ChildSpanCount":0,"Resource":[{"Key":"service.name","Value":{"Type":"STRING","Value":"ai-gateway"}},{"Key":"telemetry.sdk.language","Value":{"Type":"STRING","Value":"go"}},{"Key":"telemetry.sdk.name","Value":{"Type":"STRING","Value":"opentelemetry"}},{"Key":"telemetry.sdk.version","Value":{"Type":"STRING","Value":"1.39.0"}}],"InstrumentationScope":{"Name":"envoyproxy/ai-gateway","Version":"","SchemaURL":"","Attributes":null},"InstrumentationLibrary":{"Name":"envoyproxy/ai-gateway","Version":"","SchemaURL":"","Attributes":null}}
```

MCP access log showing `session.id` on a tool call:
```
{"bytes_received":341,"bytes_sent":8720,"connection_termination_details":null,"downstream_local_address":"127.0.0.1:10088","downstream_remote_address":"127.0.0.1:50643","duration":1247,"jsonrpc.request.id":"4","mcp.method.name":"tools/call","mcp.provider.name":"kiwi","mcp.session.id":"f9e80f73-bc48-4797-afae-045ef0e57e7d","method":"POST","request.path":"/","response_code":200,"session.id":"20260123_19","start_time":"2026-01-23T07:01:33.553Z","upstream_cluster":"httproute/default/ai-eg-mcp-br-mcp-route-kiwi/rule/0","upstream_host":"146.75.115.52:443","upstream_local_address":"192.168.23.60:50644","upstream_transport_failure_reason":null,"user-agent":"Go-http-client/1.1","x-envoy-origin-path":"/mcp","x-envoy-upstream-service-time":"613","x-forwarded-for":null,"x-request-id":"bd29074f-3ab0-41b3-a184-e0ec87a3809b"}
```

LLM access log showing `session.id` on a chat completion:
```
{"bytes_received":14807,"bytes_sent":47214,"connection_termination_details":null,"downstream_local_address":"127.0.0.1:1975","downstream_remote_address":"127.0.0.1:50651","duration":3560,"gen_ai.provider.name":"default/openai/route/aigw-run/rule/0/ref/0","gen_ai.request.model":"qwen3:1.7b","gen_ai.response.model":"qwen3:1.7b","gen_ai.usage.input_tokens":3227,"gen_ai.usage.output_tokens":253,"method":"POST","request.path":"/v1/chat/completions","response_code":200,"session.id":"20260123_19","start_time":"2026-01-23T07:01:29.980Z","upstream_cluster":"httproute/default/aigw-run/rule/0","upstream_host":"127.0.0.1:11434","upstream_local_address":"127.0.0.1:50653","upstream_transport_failure_reason":null,"user-agent":null,"x-envoy-origin-path":"/v1/chat/completions","x-envoy-upstream-service-time":null,"x-forwarded-for":"192.168.23.60","x-request-id":"2b430167-040d-43ef-a48e-de0ebaa0fcdc"}
```

Minor improvements:
- Normalized all example header/attribute mappings and the order of traces,
metrics, and logs
- Added `AIGW_DEBUG` so docker compose examples can actually show debug
output
- Aligned data-plane tests to use the aigw func-e download location
instead of re-downloading

---------

Signed-off-by: Adrian Cole <adrian@tetrate.io>

3 participants