[Feature] Add API lifecycle events and request attributes (PR #8)#16

Merged
sriumcp merged 2 commits into main from pr8ofjourney
Jan 27, 2026

sriumcp commented Jan 27, 2026

Summary

Implements journey tracing PR #8, adding API lifecycle events and request attributes to span instrumentation.

Changes

New Events:

  • HANDOFF_TO_CORE: Emitted after submitting request to engine (engine.generate())
  • FIRST_RESPONSE_FROM_CORE: Emitted when engine first responds (both streaming and non-streaming paths)
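
A minimal sketch of the intended event ordering, not the code from this PR. `FakeSpan`, `fake_generate`, `serve`, and the `"event.ts.monotonic"` key string are hypothetical stand-ins for the real span, `engine.generate()`, the serving handler, and the `EVENT_TS_MONOTONIC` value; only the two event names come from the description above.

```python
import asyncio
import time

# Assumed key string; the PR only names the EVENT_TS_MONOTONIC constant.
EVENT_TS_MONOTONIC = "event.ts.monotonic"


class FakeSpan:
    """Hypothetical stand-in for a tracing span; stores events in a list."""

    def __init__(self):
        self.events = []

    def add_event(self, name, attributes=None):
        self.events.append((name, dict(attributes or {})))


async def fake_generate(chunks):
    """Hypothetical stand-in for engine.generate(): an async generator."""
    for chunk in chunks:
        await asyncio.sleep(0)
        yield chunk


async def serve(span, chunks):
    # Submit the request to the engine, then record the handoff.
    stream = fake_generate(chunks)
    span.add_event("HANDOFF_TO_CORE", {EVENT_TS_MONOTONIC: time.monotonic()})

    first_seen = False
    outputs = []
    async for chunk in stream:
        if not first_seen:
            # Emit for the first output only, never for later ones.
            first_seen = True
            span.add_event(
                "FIRST_RESPONSE_FROM_CORE",
                {EVENT_TS_MONOTONIC: time.monotonic()},
            )
        outputs.append(chunk)
    return outputs


span = FakeSpan()
outputs = asyncio.run(serve(span, ["a", "b", "c"]))
event_names = [name for name, _ in span.events]
```

The same flag-guarded emission would apply to a non-streaming path, where the first (and only) result plays the role of the first chunk.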

Request Attributes:

  • Model name (gen_ai.response.model)
  • Prompt token count (gen_ai.usage.prompt_tokens)
  • Sampling parameters: temperature, top_p, max_tokens, n (when non-None)
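
One way the "when non-None" rule could look, as a sketch under assumptions: `build_request_attributes` is a hypothetical helper, and the `gen_ai.request.*` keys for the sampling parameters are assumed names. Only `gen_ai.response.model` and `gen_ai.usage.prompt_tokens` are taken from the list above.

```python
def build_request_attributes(model, prompt_tokens, temperature=None,
                             top_p=None, max_tokens=None, n=None):
    """Return the attributes to set on the API span (hypothetical helper)."""
    attrs = {
        "gen_ai.response.model": model,
        "gen_ai.usage.prompt_tokens": prompt_tokens,
    }
    # Assumed key names for the sampling parameters.
    sampling = {
        "gen_ai.request.temperature": temperature,
        "gen_ai.request.top_p": top_p,
        "gen_ai.request.max_tokens": max_tokens,
        "gen_ai.request.n": n,
    }
    # Compare against None rather than truthiness so that a legitimate
    # value such as temperature=0.0 is still recorded.
    attrs.update({k: v for k, v in sampling.items() if v is not None})
    return attrs


attrs = build_request_attributes("my-model", 42, temperature=0.0, n=1)
```

The explicit `is not None` check matters here: a plain truthiness test would silently drop `temperature=0.0`.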

Infrastructure:

  • Added EVENT_TS_MONOTONIC attribute for monotonic timestamps on API events
  • Added _update_first_response_time() helper to track first response timing in _api_spans tuple

Guarantees

G1: EVENT_TS_MONOTONIC attribute defined
G2: HANDOFF_TO_CORE emitted after engine.generate() (verified by code inspection)
G3: FIRST_RESPONSE_FROM_CORE emitted exactly once per request
G4: First response time persisted in _api_spans tuple
G5: Request attributes set on API span (conditionally for non-None values)
G6: Early exit when span not recording (zero overhead)
G7: All span operations wrapped defensively (failures never break requests)
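
G6 and G7 can be illustrated with a small wrapper. This is a sketch, not the PR's code: `safe_set_attributes`, `BrokenSpan`, and `IdleSpan` are hypothetical names, while `is_recording()` and `set_attribute()` mirror the OpenTelemetry span interface.

```python
class BrokenSpan:
    """Hypothetical span whose writes always fail."""

    def is_recording(self):
        return True

    def set_attribute(self, key, value):
        raise RuntimeError("exporter unavailable")


class IdleSpan:
    """Hypothetical span that is not recording; counts write attempts."""

    def __init__(self):
        self.calls = 0

    def is_recording(self):
        return False

    def set_attribute(self, key, value):
        self.calls += 1


def safe_set_attributes(span, attributes):
    """Set attributes on the span; return how many were written."""
    # G6: exit before doing any span work when nothing is recording.
    if span is None or not span.is_recording():
        return 0
    written = 0
    try:
        for key, value in attributes.items():
            span.set_attribute(key, value)
            written += 1
    except Exception:
        # G7: a tracing failure must never propagate into the request path.
        pass
    return written


idle = IdleSpan()
idle_result = safe_set_attributes(idle, {"a": 1})
broken_result = safe_set_attributes(BrokenSpan(), {"a": 1})
none_result = safe_set_attributes(None, {"a": 1})
```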

Testing

  • 12 behavioral tests in test_api_additional_events.py
  • Tests cover G1, G3-G7 (G2 verified by code inspection)
  • All existing API span lifecycle tests pass (17/17)
  • No regressions

Files Changed

  • vllm/tracing.py: Added EVENT_TS_MONOTONIC attribute
  • vllm/entrypoints/openai/engine/serving.py: Added _update_first_response_time() helper
  • vllm/entrypoints/openai/chat_completion/serving.py: Event emissions and attribute setting
  • tests/entrypoints/openai/test_api_additional_events.py: Test coverage (new file)

🤖 Generated with Claude Code

sriumcp and others added 2 commits January 27, 2026 18:50
Implements journey tracing PR #8:
- Add EVENT_TS_MONOTONIC attribute for API event timestamps
- Emit HANDOFF_TO_CORE event after engine.generate()
- Emit FIRST_RESPONSE_FROM_CORE event on first response (streaming and non-streaming)
- Set request attributes on API spans (model, prompt tokens, sampling params)
- Add _update_first_response_time() helper to track first response timing
- All span operations wrapped defensively (G7 compliance)
- Zero overhead when span not recording (G6 compliance)
- 12 behavioral tests covering G1, G3-G7 (G2 verified by code inspection)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp merged commit 959dd77 into main Jan 27, 2026