aigw: default session.id request header mapping by codefromthecrypt · Pull Request #1808 · envoyproxy/ai-gateway

codefromthecrypt · 2026-01-23T07:35:16Z

Description

Default span/log request‑header mappings to agent-session-id:session.id so agent frameworks like Goose get session correlation with zero config, while still allowing explicit overrides (different mapping or empty to disable). Metrics never default to session IDs because they are high cardinality.

The default mapping lives in the new environment variable OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES, so anyone who does not want agent-session-id:session.id should set OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES= (empty string) to clear it.
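
To make the "unset means default, empty means disabled" behavior concrete, here is a minimal Go sketch. It is not the code from this PR: parseHeaderAttributes and resolveMapping are hypothetical helpers, and the comma-separated list of header:attribute pairs is an assumption about the value format. Only the variable name and the default pair come from the description above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultMapping is the default pair named in the PR description.
const defaultMapping = "agent-session-id:session.id"

// parseHeaderAttributes parses "header:attribute" pairs separated by commas.
// Hypothetical helper; the comma-separated format is an assumption.
func parseHeaderAttributes(s string) (map[string]string, error) {
	out := map[string]string{}
	for _, pair := range strings.Split(s, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // an empty value yields an empty mapping
		}
		header, attr, ok := strings.Cut(pair, ":")
		header, attr = strings.TrimSpace(header), strings.TrimSpace(attr)
		if !ok || header == "" || attr == "" {
			return nil, fmt.Errorf("invalid mapping %q, want header:attribute", pair)
		}
		// HTTP header names are case-insensitive, so normalize the key.
		out[strings.ToLower(header)] = attr
	}
	return out, nil
}

// resolveMapping applies the default only when the variable is not set at all.
// A variable that is set, even to the empty string, is taken as-is.
func resolveMapping(value string, isSet bool) (map[string]string, error) {
	if !isSet {
		return parseHeaderAttributes(defaultMapping)
	}
	return parseHeaderAttributes(value)
}

func main() {
	// os.LookupEnv distinguishes "unset" from "set to empty".
	value, isSet := os.LookupEnv("OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES")
	mapping, err := resolveMapping(value, isSet)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(mapping)
}
```

The key design point is using os.LookupEnv rather than os.Getenv, since os.Getenv alone cannot tell an unset variable from one deliberately set to the empty string.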

Refactor request‑header mapping handling so defaults/merging live only in extproc; aigw and controller/helm just pass flags through. Ordering is normalized everywhere (request → span → metrics → log) and docs/examples describe defaults without explicitly setting agent-session-id:session.id.

Related Issues/PRs (if applicable)

Related: #1797

Special notes for reviewers (if applicable)

Ran examples/goose with the OTEL console env and --debug.

Since Goose now propagates agent-session-id by default, the telemetry shows agent-session-id=20260123_19:

MCP span (tools/list) showing session.id set from the header:

{"Name":"ListTools","SpanContext":{"TraceID":"6412e470a771ede413f3318b984b65f5","SpanID":"bf7a7b2b20ce85a9","TraceFlags":"01","TraceState":"","Remote":false},"Parent":{"TraceID":"00000000000000000000000000000000","SpanID":"0000000000000000","TraceFlags":"00","TraceState":"","Remote":false},"SpanKind":3,"StartTime":"2026-01-23T16:01:20.669792+09:00","EndTime":"2026-01-23T16:01:21.313477459+09:00","Attributes":[{"Key":"mcp.protocol.version","Value":{"Type":"STRING","Value":"2025-06-18"}},{"Key":"mcp.transport","Value":{"Type":"STRING","Value":"http"}},{"Key":"mcp.request.id","Value":{"Type":"STRING","Value":"{1}"}},{"Key":"mcp.method.name","Value":{"Type":"STRING","Value":"tools/list"}},{"Key":"session.id","Value":{"Type":"STRING","Value":"20260123_19"}}],"Events":[{"Name":"route to backend","Attributes":[{"Key":"mcp.backend.name","Value":{"Type":"STRING","Value":"kiwi"}},{"Key":"mcp.session.id","Value":{"Type":"STRING","Value":"f9e80f73-bc48-4797-afae-045ef0e57e7d"}},{"Key":"mcp.session.new","Value":{"Type":"BOOL","Value":false}}],"DroppedAttributeCount":0,"Time":"2026-01-23T16:01:21.303264+09:00"}],"Links":null,"Status":{"Code":"Ok","Description":""},"DroppedAttributes":0,"DroppedEvents":0,"DroppedLinks":0,"ChildSpanCount":0,"Resource":[{"Key":"service.name","Value":{"Type":"STRING","Value":"ai-gateway"}},{"Key":"telemetry.sdk.language","Value":{"Type":"STRING","Value":"go"}},{"Key":"telemetry.sdk.name","Value":{"Type":"STRING","Value":"opentelemetry"}},{"Key":"telemetry.sdk.version","Value":{"Type":"STRING","Value":"1.39.0"}}],"InstrumentationScope":{"Name":"envoyproxy/ai-gateway","Version":"","SchemaURL":"","Attributes":null},"InstrumentationLibrary":{"Name":"envoyproxy/ai-gateway","Version":"","SchemaURL":"","Attributes":null}}

MCP access log showing session.id on a tool call:

{"bytes_received":341,"bytes_sent":8720,"connection_termination_details":null,"downstream_local_address":"127.0.0.1:10088","downstream_remote_address":"127.0.0.1:50643","duration":1247,"jsonrpc.request.id":"4","mcp.method.name":"tools/call","mcp.provider.name":"kiwi","mcp.session.id":"f9e80f73-bc48-4797-afae-045ef0e57e7d","method":"POST","request.path":"/","response_code":200,"session.id":"20260123_19","start_time":"2026-01-23T07:01:33.553Z","upstream_cluster":"httproute/default/ai-eg-mcp-br-mcp-route-kiwi/rule/0","upstream_host":"146.75.115.52:443","upstream_local_address":"192.168.23.60:50644","upstream_transport_failure_reason":null,"user-agent":"Go-http-client/1.1","x-envoy-origin-path":"/mcp","x-envoy-upstream-service-time":"613","x-forwarded-for":null,"x-request-id":"bd29074f-3ab0-41b3-a184-e0ec87a3809b"}

LLM access log showing session.id on a chat completion:

{"bytes_received":14807,"bytes_sent":47214,"connection_termination_details":null,"downstream_local_address":"127.0.0.1:1975","downstream_remote_address":"127.0.0.1:50651","duration":3560,"gen_ai.provider.name":"default/openai/route/aigw-run/rule/0/ref/0","gen_ai.request.model":"qwen3:1.7b","gen_ai.response.model":"qwen3:1.7b","gen_ai.usage.input_tokens":3227,"gen_ai.usage.output_tokens":253,"method":"POST","request.path":"/v1/chat/completions","response_code":200,"session.id":"20260123_19","start_time":"2026-01-23T07:01:29.980Z","upstream_cluster":"httproute/default/aigw-run/rule/0","upstream_host":"127.0.0.1:11434","upstream_local_address":"127.0.0.1:50653","upstream_transport_failure_reason":null,"user-agent":null,"x-envoy-origin-path":"/v1/chat/completions","x-envoy-upstream-service-time":null,"x-forwarded-for":"192.168.23.60","x-request-id":"2b430167-040d-43ef-a48e-de0ebaa0fcdc"}

Minor improvements:

Normalize all example headers/attributes and the order of traces, metrics and logs
Add AIGW_DEBUG so docker compose examples can actually show debug output
Align data‑plane tests to use the aigw func‑e download location instead of re-downloading

codefromthecrypt · 2026-01-23T07:47:16Z

After this I would like to switch to OTLP, hopefully once EG finishes merging my outstanding PR 🤞

codecov-commenter · 2026-01-23T08:11:24Z

Codecov Report

❌ Patch coverage is 96.72131% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.12%. Comparing base (6c351f0) to head (084c668).

Files with missing lines	Patch %	Lines
internal/extensionserver/extensionserver.go	66.66%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1808      +/-   ##
==========================================
+ Coverage   84.08%   84.12%   +0.04%     
==========================================
  Files         118      119       +1     
  Lines       13235    13283      +48     
==========================================
+ Hits        11128    11174      +46     
- Misses       1434     1435       +1     
- Partials      673      674       +1


codefromthecrypt

notes

api/v1alpha1/ai_gateway_route.go

codefromthecrypt · 2026-01-23T08:23:36Z

cmd/aigw/docker-compose-otel.yaml

+      - AIGW_DEBUG
      # session.id is used in logs and traces (not metrics; high-cardinality)
-      - OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES=x-user-id:user.id
-      - OTEL_AIGW_LOG_REQUEST_HEADER_ATTRIBUTES=x-session-id:session.id


Spans and logs are fine with request-scoped attributes, which is why session.id makes sense there.

codefromthecrypt · 2026-01-23T08:24:17Z

cmd/aigw/main.go

 	// cmdRun corresponds to `aigw run` command.
 	cmdRun struct {
-		Debug     bool   `help:"Enable debug logging emitted to stderr."`
+		Debug     bool   `env:"AIGW_DEBUG" help:"Enable debug logging emitted to stderr."`


docker compose up cannot add args, so our instructions were broken. An env variable is the easy way out.
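
As an illustration of the env-backed flag idea, here is a small standard-library sketch. It is not how aigw implements it (the diff above relies on the env struct tag of its CLI parser); parseDebug is a hypothetical helper and the flag set name is illustrative. Only the AIGW_DEBUG name and the help text come from the diff.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
	"strconv"
)

// parseDebug returns the effective debug setting. The environment value
// supplies the default, and an explicit -debug flag overrides it.
// Hypothetical helper for illustration only.
func parseDebug(args []string, envValue string) (bool, error) {
	// An unparseable or empty env value falls back to false.
	envDefault, err := strconv.ParseBool(envValue)
	if err != nil {
		envDefault = false
	}
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	debug := fs.Bool("debug", envDefault, "Enable debug logging emitted to stderr.")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *debug, nil
}

func main() {
	debug, err := parseDebug(os.Args[1:], os.Getenv("AIGW_DEBUG"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("debug:", debug)
}
```

With this shape, a compose file only needs to pass AIGW_DEBUG through from the host environment, and no command-line arguments have to change.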

codefromthecrypt · 2026-01-26T05:48:10Z

api/v1alpha1/ai_gateway_route.go

 	// ```
 	// Then, with the following BackendTrafficPolicy of Envoy Gateway, you can have three
-	// rate limit buckets for each unique x-user-id header value. One bucket is for the input token,
+	// rate limit buckets for each unique x-tenant-id header value. One bucket is for the input token,


Our examples used a mix of x-user-id, x-team-id and x-tenant. The most common attribute is x-tenant-id, so I settled on it for a coarse-grained example.

mathetake

Not following on why the pointer to String is necessary but harmless it seems

mathetake · 2026-01-26T21:12:55Z

E2E rate limit failure seems legit

model","gen_ai_token_type":"cached_input"},"value":[1769407055.349,"20"]},{"metric":{"gen_ai_request_model":"rate-limit-funky-model","gen_ai_token_type":"input"},"value":[1769407055.349,"10000"]},{"metric":{"gen_ai_request_model":"rate-limit-funky-model","gen_ai_token_type":"output"},"value":[1769407055.349,"10020"]}]}}
2026-01-26T05:57:35.3561114Z     token_ratelimit_test.go:196: 
2026-01-26T05:57:35.3562199Z         	Error Trace:	/home/runner/work/ai-gateway/ai-gateway/tests/e2e/token_ratelimit_test.go:196
2026-01-26T05:57:35.3564076Z         	            				/opt/hostedtoolcache/go/1.25.6/x64/src/runtime/asm_amd64.s:1693
2026-01-26T05:57:35.3564840Z         	Error:      	Should be true
2026-01-26T05:57:35.3565581Z         	Test:       	Test_Examples_TokenRateLimit
2026-01-26T05:57:35.3566340Z         	Messages:   	team_id should be present in the metric
2026-01-26T05:59:35.1544412Z     token_ratelimit_test.go:153: 
2026-01-26T05:59:35.1547945Z         	Error Trace:	/home/runner/work/ai-gateway/ai-gateway/tests/e2e/token_ratelimit_test.go:153
2026-01-26T05:59:35.1556398Z         	Error:      	Condition never satisfied
2026-01-26T05:59:35.1560438Z         	Test:       	Test_Examples_TokenRateLimit
2026-01-26T05:59:35.2244768Z Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
2026-01-26T05:59:35.2391895Z namespace "redis-system" force deleted
2026-01-26T05:59:35.2494692Z service "redis" force deleted from redis-system namespace
2026-01-26T05:59:35.2533523Z deployment.apps "redis" force deleted from redis-system namespace
2026-01-26T05:59:40.4901923Z Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
2026-01-26T05:59:40.5063737Z gatewayclass.gateway.networking.k8s.io "envoy-ai-gateway-token-ratelimit" force deleted
2026-01-26T05:59:40.5155514Z gateway.gateway.networking.k8s.io "envoy-ai-gateway-token-ratelimit" force deleted from default namespace
2026-01-26T05:59:40.5247621Z aigatewayroute.aigateway.envoyproxy.io "envoy-ai-gateway-token-ratelimit" force deleted from default namespace
2026-01-26T05:59:40.5356794Z aiservicebackend.aigateway.envoyproxy.io "envoy-ai-gateway-token-ratelimit-testupstream" force deleted from default namespace
2026-01-26T05:59:40.5484656Z backend.gateway.envoyproxy.io "envoy-ai-gateway-token-ratelimit-testupstream" force deleted from default namespace
2026-01-26T05:59:40.5612382Z backendtrafficpolicy.gateway.envoyproxy.io "envoy-ai-gateway-token-ratelimit-policy" force deleted from default namespace
2026-01-26T05:59:40.5733996Z deployment.apps "envoy-ai-gateway-token-ratelimit-tesetupstream" force deleted from default namespace
2026-01-26T05:59:40.6086493Z service "envoy-ai-gateway-token-ratelimit-tesetupstream" force deleted from default namespace
2026-01-26T05:59:40.6154408Z envoyproxy.gateway.envoyproxy.io "envoy-ai-gateway-token-ratelimit" force deleted from default namespace
2026-01-26T05:59:40.6533064Z --- FAIL: Test_Examples_TokenRateLimit (171.31s)

codefromthecrypt · 2026-01-26T21:43:07Z

Not following on why the pointer to String is necessary but harmless it seems

It is to tell the difference between not set and set. For example, if you want no defaults, you set the header mapping to empty. Without a tri-state value (unset, empty, non-empty) it would be hard to unset everything.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
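
A minimal sketch of the pointer-to-string idea discussed above, under stated assumptions: the options struct, the extprocArgs function and the -requestHeaderAttributes flag name are all hypothetical and only illustrate how a nil pointer lets a pass-through layer leave defaulting to the downstream process.

```go
package main

import "fmt"

// options is a hypothetical pass-through config.
// A nil pointer means "the user did not set this at all".
type options struct {
	RequestHeaderAttributes *string
}

// extprocArgs forwards the flag only when the user set it, so the
// downstream process can apply its own default when the flag is absent.
// An empty string is still forwarded, which is how defaults get cleared.
func extprocArgs(o options) []string {
	if o.RequestHeaderAttributes == nil {
		return nil
	}
	// The flag name here is illustrative, not taken from the PR.
	return []string{"-requestHeaderAttributes", *o.RequestHeaderAttributes}
}

func main() {
	empty, custom := "", "x-tenant-id:tenant.id"
	fmt.Printf("unset:  %q\n", extprocArgs(options{}))
	fmt.Printf("empty:  %q\n", extprocArgs(options{RequestHeaderAttributes: &empty}))
	fmt.Printf("custom: %q\n", extprocArgs(options{RequestHeaderAttributes: &custom}))
}
```

With a plain string field, the unset and empty cases would collapse into one, and the pass-through layer could not express "disable the default".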

codefromthecrypt · 2026-01-26T21:46:35Z

E2E rate limit failure seems legit

yep sorry about that, missed a find/replace

codefromthecrypt · 2026-01-26T21:49:22Z

updated the PR desc on how to clear the default (via OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES= empty string)

codefromthecrypt force-pushed the normalize-req-attrs branch from f5e8e58 to 8c3307e Compare January 23, 2026 07:46

codefromthecrypt mentioned this pull request Jan 23, 2026

feat(telemetry): add resourceAttributes to OTLP backends envoyproxy/gateway#7972

Merged

codefromthecrypt force-pushed the normalize-req-attrs branch from 8c3307e to fe73225 Compare January 23, 2026 08:08

codefromthecrypt marked this pull request as ready for review January 23, 2026 08:21

codefromthecrypt requested a review from a team as a code owner January 23, 2026 08:21

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jan 23, 2026

codefromthecrypt force-pushed the normalize-req-attrs branch from fe73225 to 1faa0a6 Compare January 23, 2026 08:42

dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Jan 23, 2026

codefromthecrypt force-pushed the normalize-req-attrs branch 3 times, most recently from 95bd3b8 to 6dc63e3 Compare January 26, 2026 05:47

codefromthecrypt commented Jan 26, 2026

mathetake approved these changes Jan 26, 2026

codefromthecrypt added 2 commits January 27, 2026 06:46

aigw: default session.id request header mapping

87303d5

Signed-off-by: Adrian Cole <adrian@tetrate.io>

fix-e2e

084c668

Signed-off-by: Adrian Cole <adrian@tetrate.io>

codefromthecrypt force-pushed the normalize-req-attrs branch from 6dc63e3 to 084c668 Compare January 26, 2026 21:46

mathetake enabled auto-merge (squash) January 26, 2026 21:49

mathetake merged commit 3b16648 into envoyproxy:main Jan 26, 2026
36 checks passed
