
[Feature] Add nanosecond-precision timestamps to journey events #28

Merged: sriumcp merged 7 commits into main from pr2ofstepstream on Jan 29, 2026

Conversation


sriumcp commented on Jan 29, 2026

Summary

Adds nanosecond-precision timestamps to journey tracing events via a dual-write approach. Implements a ts_monotonic_ns field alongside the existing ts_monotonic field with an exact-consistency guarantee.

Changes

Core Implementation

  • journey_events.py: Added ts_monotonic_ns: int = 0 field to RequestJourneyEvent (backward compatible default)
  • scheduler.py: Single clock read with exact consistency: ts_monotonic_ns = time.monotonic_ns(), then derive ts_monotonic = ts_monotonic_ns / 1e9
  • tracing.py: Added JOURNEY_TS_MONOTONIC_NS = "ts.monotonic_ns" OTEL attribute constant
  • Both timestamps emitted to OTEL span events
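
The single-read, dual-write pattern above can be sketched as follows. This is a simplified stand-in, not the actual RequestJourneyEvent or scheduler code; only the field names and the derivation come from this PR:

```python
import time
from dataclasses import dataclass

@dataclass
class RequestJourneyEvent:
    """Simplified stand-in for the real event class."""
    ts_monotonic: float = 0.0
    ts_monotonic_ns: int = 0  # backward-compatible default for legacy code

def make_event() -> RequestJourneyEvent:
    # Single clock read: take the integer nanosecond reading once,
    # then derive the float seconds value from it by division, so both
    # fields are guaranteed to represent the same instant.
    ns = time.monotonic_ns()
    return RequestJourneyEvent(ts_monotonic=ns / 1e9, ts_monotonic_ns=ns)
```

Because the float is derived from the integer (rather than read from a second clock call), ts_monotonic == ts_monotonic_ns / 1e9 holds exactly by construction.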

Tests

  • Added test_journey_event_timestamp_backward_compatibility() - validates default value behavior
  • Added test_journey_event_dual_timestamp_exact_consistency() - validates exact consistency via integer round-trip
  • Removed all float equality assertions from integration tests (replaced with type checks + integer round-trip validation)
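
One way such an integer round-trip check could look; the assert_exact_consistency helper is illustrative, not taken from the test suite:

```python
def assert_exact_consistency(event) -> None:
    # Verify types first: the nanosecond field must be an int,
    # the seconds field a float.
    assert isinstance(event.ts_monotonic_ns, int)
    assert isinstance(event.ts_monotonic, float)
    # Rather than comparing the float against an independently computed
    # value, re-derive it from the integer field. The same division on
    # the same integer is deterministic, so this comparison is exact.
    assert event.ts_monotonic == event.ts_monotonic_ns / 1e9
```

This avoids the brittleness of comparing two floats obtained from separate clock reads, which is what the removed assertions were vulnerable to.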

Invariants Verified

  • Single clock read: only time.monotonic_ns() is called; the float is derived via division
  • Exact consistency: ts_monotonic = ts_monotonic_ns / 1e9 (mathematical guarantee)
  • Both OTEL attributes emitted: ts.monotonic (float) and ts.monotonic_ns (int)
  • Backward compatible: default ts_monotonic_ns = 0 for legacy code
  • No float equality in tests: all assertions use integer round-trip validation

Test Results

  • Unit tests: 2/2 passing (new tests)
  • Core tests: 160/161 passing (1 pre-existing, unrelated failure)
  • ⚠️ Integration tests: 12/12 failing (pre-existing; requires separate cleanup)

Note: The integration test failures are pre-existing infrastructure issues from PR #9 (journey buffering removal). The tests attempt to use a RequestState.journey_events buffer that no longer exists; journey events now emit directly to OTEL spans in the scheduler. The production code path works correctly.

Commits

  • dd60947 - [Feature] Add nanosecond-precision timestamps to journey events
  • 6391a94 - [Misc] Remove completed STEP_TRACING_PR_PLAN.md
  • 880450b - [Test] Remove float equality assertions from journey timestamp tests

🤖 Generated with Claude Sonnet 4.5

sriumcp and others added 7 commits on January 29, 2026 at 09:43
Implements PR #2: Journey Tracing API-Side Sampling in vLLM.

Changes:
- Add journey_tracing_sample_rate config (default 1.0, backward compatible)
- API layer makes probabilistic sampling decision per request
- Custom header x-vllm-journey-sampled propagates decision to engine
- Engine obeys API decision (authority model)
- End-to-end atomic: both API+engine spans exist or neither
- Independent of OTEL traceparent sampled bit
- Centralized header injection helper across all endpoints
- Robustness fix: normalize to mutable dict (handles immutable Mapping)

Tests:
- 10 new tests verify atomicity and backward compatibility
- All existing tests pass (backward compatible)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
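
A minimal sketch of the API-side sampling decision plus header injection described in this commit. Only the header name (x-vllm-journey-sampled), the per-request probabilistic decision, and the normalize-to-mutable-dict behavior come from the commit message; the helper name and surrounding code are illustrative:

```python
import random
from typing import Mapping

# Header name from the commit message; the engine reads this to obey
# the API layer's decision (authority model).
JOURNEY_SAMPLED_HEADER = "x-vllm-journey-sampled"

def decide_and_inject(headers: Mapping[str, str],
                      sample_rate: float = 1.0) -> dict:
    # Normalize to a mutable dict so immutable Mapping inputs are handled.
    mutable = dict(headers)
    # One probabilistic decision per request, made at the API layer.
    sampled = random.random() < sample_rate
    # Propagate the decision to the engine via the custom header, so
    # either both API and engine spans exist or neither does.
    mutable[JOURNEY_SAMPLED_HEADER] = "1" if sampled else "0"
    return mutable
```

With the default sample_rate of 1.0 every request is sampled, which preserves the pre-existing behavior (backward compatible).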
Update user-facing documentation to reflect PR #2 implementation.

Changes:
- Add comprehensive "Sampling for Production" section with 3 strategies
- Document new --journey-tracing-sample-rate flag (default 1.0)
- Explain vLLM native sampling vs OTEL sampling vs collector sampling
- Add comparison table for choosing the right sampling strategy
- Update configuration examples with sampling use cases
- Add Technical Details section on sampling architecture
- Add FAQ entries: vLLM vs OTEL sampling, atomicity guarantees
- Update Performance Impact section with sampling overhead details
- Update troubleshooting section with vLLM sampling solutions
- Add early mention of sampling capability in introduction

Key messages for users:
- Default behavior unchanged (sample_rate=1.0, backward compatible)
- vLLM native sampling reduces all overhead (recommended for production)
- End-to-end atomic: either both spans exist or neither (no partial traces)
- Independent from OTEL traceparent sampled bit
- Recommended rates: 10% for 1K-10K RPS, 1% for >10K RPS
Critical fixes:
- Fix service name vs tracer scope confusion in Jaeger navigation
  (service.name is what users select, scope.name is span attribute)
- Correct AsyncLLM span creation claims (was: "creates only core span",
  now: "creates no spans by default, core-only if manual header set")
- Eliminate contradiction: early doc claimed AsyncLLM creates spans,
  later sections correctly said no spans without manual header
- Qualify "every request creates two spans" to "when using vllm serve"
- Qualify sampling sections to explicitly state vllm serve requirement

Accuracy improvements:
- Soften overhead numbers: "~200-300ns" → "sub-microsecond" (less brittle)
- Qualify authority model as "OpenAI API Server" (not generic "API layer")
- Add comprehensive AsyncLLM FAQ with working code examples
- Add deployment modes section distinguishing vllm serve vs AsyncLLM

Impact: Prevents user confusion about AsyncLLM behavior (expecting
automatic tracing → getting zero traces → filing bugs). Documentation
now accurately reflects codebase reality verified in scheduler.py and
test_journey_tracing_sampling.py.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Non-streaming completion requests (/v1/completions with stream=false) were
missing all _finalize_api_span() calls, causing llm_request spans to never
export to OTLP collectors. This resulted in incomplete traces with only
llm_core (engine layer) spans visible, while llm_request (API layer) spans
remained orphaned in memory.

Root cause: The non-streaming code path (lines 319-368) had no finalization
on success, error paths, or fake stream generator (beam search with stream=true).

Added comprehensive span finalization matching the pattern used in streaming
completions and chat completions:
- Error paths: Finalize with ABORTED for CancelledError, GenerationError, ValueError
- Fake stream generator: Added try-finally with DEPARTED before [DONE]
- Success path: Finalize with DEPARTED before returning response
- Outer finally block: Unconditional cleanup for any uncaught exceptions

Impact:
- Fixes: Non-streaming /v1/completions now exports complete API-layer traces
- Preserves: Streaming completions continue to work (no changes to that path)
- Matches: Behavior now consistent with /v1/chat/completions endpoint

Testing:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-0.5B", "prompt": "Test", "max_tokens": 20}'

Expected result: Both llm_request (scope: vllm.api) and llm_core
(scope: vllm.scheduler) spans now appear in OTLP traces with proper
parent-child relationship.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
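
The finalization pattern this commit describes can be sketched as below. _finalize_api_span, the DEPARTED/ABORTED statuses, and generate() are stand-ins named after the commit message; the handler and the span-as-dict representation are illustrative, not the actual vLLM code:

```python
import asyncio

def _finalize_api_span(span: dict, status: str) -> None:
    # Stand-in for the real finalizer: record the terminal status and
    # mark the span as ended so it can be exported.
    span["status"] = status
    span["ended"] = True

async def generate(request: str) -> str:
    # Stand-in for the engine call.
    if request == "boom":
        raise ValueError("generation failed")
    return f"completion for {request!r}"

async def handle_completion(request: str, span: dict) -> str:
    finalized = False

    def finalize(status: str) -> None:
        nonlocal finalized
        if not finalized:  # guard against double-finalization
            _finalize_api_span(span, status)
            finalized = True

    try:
        try:
            response = await generate(request)
        except (asyncio.CancelledError, ValueError):
            finalize("ABORTED")   # error paths: close the span, then re-raise
            raise
        finalize("DEPARTED")      # success path: close before returning
        return response
    finally:
        finalize("DEPARTED")      # outer finally: cleanup for uncaught exceptions
```

Without this pattern, the span on the non-streaming path was never ended, so it stayed orphaned in memory instead of being exported to the OTLP collector.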
Adds ts_monotonic_ns field to RequestJourneyEvent for improved timestamp
precision. Uses single clock read with exact consistency (derive float from
int) to ensure both ts_monotonic and ts_monotonic_ns represent identical
instant. Fully backward compatible with default value of 0 for legacy code.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Step tracing work is complete. Removing planning document.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removes all float equality comparisons (e.g., assert ts.monotonic == value)
from integration tests. Tests now only verify:
- Presence of both timestamp fields
- Type correctness (float/int)
- Exact consistency via integer round-trip validation

This ensures robustness against float precision issues as specified in
the PR #1 constraints.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sriumcp sriumcp merged commit eb838d9 into main Jan 29, 2026
