Skip to content

[CI] Add Docker build and push workflow#9

Merged
sriumcp merged 5 commits intomainfrom
ci
Jan 26, 2026
Merged

[CI] Add Docker build and push workflow#9
sriumcp merged 5 commits intomainfrom
ci

Conversation

@sriumcp
Copy link
Copy Markdown

@sriumcp sriumcp commented Jan 26, 2026

Summary

Add GitHub Actions workflow to automatically build and push Docker images to GitHub Container Registry (ghcr.io).

Changes

  • Add .github/workflows/docker-build-push.yml - GitHub Actions workflow for building and pushing Docker images
  • Add README-DOCKER.md - Documentation for Docker build and deployment

Workflow Triggers

  • Push to main: Creates latest and main-<sha> tags
  • Version tags (v)*: Creates versioned releases (e.g., v1.0.0)
  • Pull requests: Builds for validation (does not push)

Build Configuration

  • CUDA: 12.9.1 (H100 optimized)
  • Python: 3.12
  • GPU Architectures: 7.0, 7.5, 8.0, 8.9, 9.0, 10.0, 12.0
    • Architecture 9.0 = H100/H200 GPUs
  • Target: vllm-openai (production server)

Before Merging

Ensure GitHub Actions permissions are enabled:

  1. Go to Settings → Actions → General
  2. Select "Read and write permissions"
  3. Check "Allow GitHub Actions to create and approve pull requests"
  4. Save

Test Plan

  • Workflow YAML is valid
  • Build completes successfully (will verify in CI)
  • Images are pushed to ghcr.io (after merge to main)

🤖 Generated with Claude Code

sriumcp and others added 5 commits January 26, 2026 11:54
Add GitHub Actions workflow to automatically build and push Docker images
to GitHub Container Registry (ghcr.io). Workflow triggers on:
- Push to main branch (creates latest and main-<sha> tags)
- Version tags (v*) for releases
- Pull requests (validation only, no push)

Images are optimized for H100 GPUs with CUDA 12.9.1 and architecture 9.0.
Build configuration mirrors vLLM's production setup with layer caching
for faster subsequent builds.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Change sha tag prefix from {{branch}}- to sha- to avoid invalid tag
format when branch name is empty or evaluates incorrectly.

Fixes: ERROR: invalid tag "ghcr.io/inference-sim/vllm:-a951e86"

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add id-token and attestations write permissions to workflow.
This ensures the workflow has all necessary permissions even when
repository settings are inherited from organization level.

Permissions now explicitly include:
- contents: read - for checking out code
- packages: write - for pushing to ghcr.io
- id-token: write - for build attestations
- attestations: write - for build provenance

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Reduce max_jobs from 16 to 2 and nvcc_threads from 8 to 1 to prevent
GitHub Actions runners from running out of memory during CUDA compilation.

Standard GitHub runners have 7GB RAM and 2 CPU cores, which is
insufficient for highly parallel CUDA compilation. This will make
builds slower (~60-90 min) but prevent runner crashes.

Fixes: The hosted runner lost communication with the server

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Only run Docker builds on:
- Push to main branch (creates latest + main-<sha> tags)
- Version tags (v*) (creates versioned releases)
- Manual workflow_dispatch trigger

This avoids expensive 60-90 minute builds on every PR and saves
GitHub Actions minutes. Docker builds can still be manually triggered
if needed for testing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sriumcp sriumcp merged commit 18f0d2f into main Jan 26, 2026
1 of 2 checks passed
@sriumcp sriumcp deleted the ci branch January 26, 2026 18:21
sriumcp added a commit that referenced this pull request Jan 27, 2026
Updates to reflect PR #7 completion:
- PR sequence table: Mark #7 as COMPLETED with 12 tests
- Dependency chain: Mark #6 and #7 as COMPLETED
- PR #7 section: Add completion status with commit hashes
- Document deliverables: inject_trace_context(), tests, guarantees

Remaining: PRs #8 (API events), #9 (remove buffering)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 27, 2026
…/9) (#15)

* [Feature] Add API↔Engine context propagation for journey tracing (PR #7/9)

This PR implements W3C Trace Context propagation from API spans to core spans,
enabling parent-child linkage in distributed traces. Completes the handshake
between PR #6 (API span lifecycle) and PR #2 (core span lifecycle).

Changes:
- Add inject_trace_context() helper to vllm/tracing.py
- Inject API span context into trace_headers after span creation
- Context flows to engine.generate() and scheduler for parent-child linkage
- Defensive error handling: injection failures never break requests
- Zero overhead when tracing disabled (early return)

Behavioral guarantees verified by tests:
- G1: Trace ID continuity (API and core spans share same trace_id)
- G2: W3C Trace Context format (traceparent header valid)
- G3: Trace continuation (trace_id preserved through Client→API→Core)
- G4: Graceful degradation (request continues on injection failure)
- G5: No exception propagation (injection failures caught)
- G6: Conditional injection (only when API span exists)

Invariants:
- I1: Backward compatibility (early return when tracing disabled)
- I2: Zero overhead when disabled (no propagator/allocation access)
- I3: No resource leaks (only modifies existing trace_headers dict)

Test coverage:
- 12 new tests (100% pass) covering all unit-testable properties
- 17 existing API span lifecycle tests pass (no regressions)
- Tests focus on behavioral properties, not implementation details

Safety properties:
- Zero new resources (only modifies existing dict)
- No cleanup obligations (dict managed by request lifecycle)
- Stateless transformation (span context → headers)
- Single injection point (strict ordering preserved)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Polish] Improve inject_trace_context docstring and strengthen test

Two quality improvements following code review:

1. Clarify inject_trace_context() docstring:
   - Previous: "or None if injection failed" (misleading)
   - Now: Explicitly documents when carrier is returned unchanged
   - Details all three early-return paths (OTEL unavailable, span None, exception)

2. Strengthen test_trace_id_preserved_through_chain():
   - Mock propagator now actually reads span.get_span_context()
   - Extracts trace_id and span_id from span context
   - Generates traceparent using those values (simulates real OTEL behavior)
   - Asserts get_span_context() was called
   - Better proves G1/G3 guarantees without requiring real OTLP exporter

Test results: All 29 tests pass (12 context propagation + 17 lifecycle)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Mark PR #7 as completed in journey tracing plan

Updates to reflect PR #7 completion:
- PR sequence table: Mark #7 as COMPLETED with 12 tests
- Dependency chain: Mark #6 and #7 as COMPLETED
- PR #7 section: Add completion status with commit hashes
- Document deliverables: inject_trace_context(), tests, guarantees

Remaining: PRs #8 (API events), #9 (remove buffering)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* removing PR7_summary

Signed-off-by: Srinivasan Parthasarathy <spartha@us.ibm.com>

---------

Signed-off-by: Srinivasan Parthasarathy <spartha@us.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 28, 2026
Completes migration to OTEL-based journey tracing by removing all intermediate
buffering and export mechanisms. Journey events are now emitted exclusively as
OTEL spans in real-time, while Prometheus metrics capture timestamps directly
on Request objects using monotonic time.

Changes:
- Remove journey event buffer dictionary and flushing logic from scheduler
- Remove journey event export from output processor
- Add direct timestamp capture (queued_ts, scheduled_ts) to Request
- Preserve backward compatibility with deprecated journey_events parameters
- Add 16 comprehensive tests verifying no buffering, span infrastructure,
  metrics independence, and backward compatibility

All 16 PR #9 tests pass. All existing scheduler tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 28, 2026
…tats

Update plan document to reflect actual implementation results vs estimates:

Changes:
- Update total line counts: ~7,528 added / ~1,116 removed (was ~618/~280)
- Update PR #9 stats: 16 tests, ~478 added / ~389 removed (was 4-5 tests)
- Update total test count: 27+ journey tracing tests (was 77)
- Add implementation timeline: Jan 23-27, 2026
- Add "Implementation Status" section with all completed PRs
- Update PR #0 description to clarify metrics restoration evolution
- Add timestamp propagation path diagram for PR #9
- Clarify that journey event buffering removed in PR #9

All stats now match actual git history and test counts.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 28, 2026
Additional documentation improvements:

- Add "Implementation Status" section with all completed PRs (PR #0-9)
  with commit hashes and PR numbers
- Add timestamp propagation path diagram showing Request → EngineCoreOutput
  → OutputProcessor → req_state.stats flow
- Update PR #0 description to clarify metrics restoration evolution
  (journey events were interim, replaced by direct capture in PR #9)
- Clarify timeline and completion status

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 28, 2026
* [Feature] Remove journey event buffering (PR #9/9)

Completes migration to OTEL-based journey tracing by removing all intermediate
buffering and export mechanisms. Journey events are now emitted exclusively as
OTEL spans in real-time, while Prometheus metrics capture timestamps directly
on Request objects using monotonic time.

Changes:
- Remove journey event buffer dictionary and flushing logic from scheduler
- Remove journey event export from output processor
- Add direct timestamp capture (queued_ts, scheduled_ts) to Request
- Preserve backward compatibility with deprecated journey_events parameters
- Add 16 comprehensive tests verifying no buffering, span infrastructure,
  metrics independence, and backward compatibility

All 16 PR #9 tests pass. All existing scheduler tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Update JOURNEY_TRACING_PR_PLAN.md with actual implementation stats

Update plan document to reflect actual implementation results vs estimates:

Changes:
- Update total line counts: ~7,528 added / ~1,116 removed (was ~618/~280)
- Update PR #9 stats: 16 tests, ~478 added / ~389 removed (was 4-5 tests)
- Update total test count: 27+ journey tracing tests (was 77)
- Add implementation timeline: Jan 23-27, 2026
- Add "Implementation Status" section with all completed PRs
- Update PR #0 description to clarify metrics restoration evolution
- Add timestamp propagation path diagram for PR #9
- Clarify that journey event buffering removed in PR #9

All stats now match actual git history and test counts.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Add implementation status and timestamp propagation to plan

Additional documentation improvements:

- Add "Implementation Status" section with all completed PRs (PR #0-9)
  with commit hashes and PR numbers
- Add timestamp propagation path diagram showing Request → EngineCoreOutput
  → OutputProcessor → req_state.stats flow
- Update PR #0 description to clarify metrics restoration evolution
  (journey events were interim, replaced by direct capture in PR #9)
- Clarify timeline and completion status

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 28, 2026
Address review feedback on journey tracing documentation:

- Fix PR count: clarify 10 PRs total (PR #0 prerequisite + PRs #1-#9)
- Correct test counts: 88 new tests (was inconsistently stated as 27+/45+)
- Add event naming clarification (api.ARRIVED, journey.QUEUED prefixes)
- Fix PR #6 streaming snippet to show finalize before yield [DONE]
- Label overhead numbers as ballpark estimates
- Clarify time domain usage (monotonic vs epoch, seconds vs nanoseconds)
- Explain trace context propagation (HTTP headers vs internal dict)
- Document error flow edge cases (truncated core events on early abort)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 28, 2026
Remove two failing tests that reference the legacy journey event buffering
system removed in PR #9 (commit 1d9b9f3):

- test_no_events_when_span_none: Referenced _journey_events_buffer_by_client
- test_legacy_buffering_still_works: Tested parallel buffering (no longer exists)

These tests validated the legacy buffering pathway that was intentionally
removed. Comprehensive coverage of the new span-based tracing exists in
tests/v1/core/test_pr9_no_buffering.py (16 tests, 337 lines).

Add REGRESSION_AUDIT_REPORT.md documenting comprehensive regression analysis
from v0.0.1 to HEAD:
- 42 files changed analyzed (10,824 insertions, 1,074 deletions)
- All production code paths verified safe
- Zero regressions to existing functionality
- Proper backward compatibility maintained
- OTEL imports optional and safe
- Metrics work independently of tracing

Test Results: 99 passed (all non-journey scheduler tests)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 28, 2026
…it (#18)

* [Docs] Fix journey tracing documentation inconsistencies

Address review feedback on journey tracing documentation:

- Fix PR count: clarify 10 PRs total (PR #0 prerequisite + PRs #1-#9)
- Correct test counts: 88 new tests (was inconsistently stated as 27+/45+)
- Add event naming clarification (api.ARRIVED, journey.QUEUED prefixes)
- Fix PR #6 streaming snippet to show finalize before yield [DONE]
- Label overhead numbers as ballpark estimates
- Clarify time domain usage (monotonic vs epoch, seconds vs nanoseconds)
- Explain trace context propagation (HTTP headers vs internal dict)
- Document error flow edge cases (truncated core events on early abort)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Tests] Remove obsolete journey buffering tests and add regression audit

Remove two failing tests that reference the legacy journey event buffering
system removed in PR #9 (commit 1d9b9f3):

- test_no_events_when_span_none: Referenced _journey_events_buffer_by_client
- test_legacy_buffering_still_works: Tested parallel buffering (no longer exists)

These tests validated the legacy buffering pathway that was intentionally
removed. Comprehensive coverage of the new span-based tracing exists in
tests/v1/core/test_pr9_no_buffering.py (16 tests, 337 lines).

Add REGRESSION_AUDIT_REPORT.md documenting comprehensive regression analysis
from v0.0.1 to HEAD:
- 42 files changed analyzed (10,824 insertions, 1,074 deletions)
- All production code paths verified safe
- Zero regressions to existing functionality
- Proper backward compatibility maintained
- OTEL imports optional and safe
- Metrics work independently of tracing

Test Results: 99 passed (all non-journey scheduler tests)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 30, 2026
…ests

This commit fixes failing integration tests by removing tests for obsolete
buffering architecture and adding 2 NEW focused integration tests that verify
the post-PR#9 cross-component contracts.

CHANGES:
1. Deleted tests/v1/engine/test_journey_tracing_integration.py (12 tests)
   - All tests were testing removed buffering architecture (RequestState.journey_events)

2. Updated test_journey_events_populate_prometheus_timestamps
   - Changed to use EngineCoreOutput timestamp fields (new architecture)
   - Removed deprecated journey_events parameter usage

3. Added 2 NEW integration tests (focused, high-signal):
   a) test_timestamp_propagation_batch_processing
      - Verifies timestamps propagate correctly for multiple concurrent requests
      - Tests cross-component wiring: EngineCoreOutput → OutputProcessor → stats
      - Why integration: Tests batch coordination and per-request isolation

   b) test_request_finish_metrics_completeness
      - Verifies RequestOutput.metrics is complete when request finishes
      - Tests finish flow: EngineCore → OutputProcessor → API
      - Why integration: Tests finish trigger mechanism across components

ROOT CAUSE:
PR #9 removed journey event buffering architecture. Journey events now emit
directly to OTEL spans in the scheduler. Metrics timestamps flow through
EngineCoreOutput fields, not journey_events parameter.

NEW INTEGRATION CONTRACTS (Post-PR#9):
1. Timestamp Propagation: EngineCoreOutput → OutputProcessor → RequestState.stats
   - Works for batch processing (multiple concurrent requests)
   - Maintains per-request isolation
   - Independent of journey tracing

2. Metrics Completeness: RequestOutput.metrics fully populated on finish
   - Timing metrics (arrival, queued, scheduled, first/last token)
   - Token count metrics
   - Finish reason propagated correctly

3. Deprecated API Inertness: journey_events parameter ignored (backward compat)
   - Already tested by updated test_journey_events_populate_prometheus_timestamps

WHY THESE TESTS (Not 1:1 Porting):
- Deleted tests verified buffering (intentionally removed)
- New tests verify cross-component wiring (post-PR#9 architecture)
- Focused on stable public contracts, not internal implementation
- Complement existing coverage:
  * test_pr9_no_buffering.py: Scheduler-level behavior (16 tests)
  * New integration tests: Cross-component wiring (2 tests)

MAINTAINER NARRATIVE:
Why integration tests instead of unit tests?
- Timestamp batch processing: Unit tests verify single request, integration
  verifies batch coordination and per-request isolation
- Finish metrics: Unit tests verify individual calculations, integration
  verifies finish trigger mechanism across EngineCore→OutputProcessor→API
- These test WIRING, not just individual component behavior

Test Results:
- test_timestamp_propagation_batch_processing: PASSED
- test_request_finish_metrics_completeness: PASSED
- test_pr9_no_buffering.py (16 tests): PASSED
- test_journey_events_populate_prometheus_timestamps: PASSED

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 30, 2026
…used post-PR#9 contracts (#30)

* [Tests] Remove obsolete tests and add focused post-PR#9 integration tests

This commit fixes failing integration tests by removing tests for obsolete
buffering architecture and adding 2 NEW focused integration tests that verify
the post-PR#9 cross-component contracts.

CHANGES:
1. Deleted tests/v1/engine/test_journey_tracing_integration.py (12 tests)
   - All tests were testing removed buffering architecture (RequestState.journey_events)

2. Updated test_journey_events_populate_prometheus_timestamps
   - Changed to use EngineCoreOutput timestamp fields (new architecture)
   - Removed deprecated journey_events parameter usage

3. Added 2 NEW integration tests (focused, high-signal):
   a) test_timestamp_propagation_batch_processing
      - Verifies timestamps propagate correctly for multiple concurrent requests
      - Tests cross-component wiring: EngineCoreOutput → OutputProcessor → stats
      - Why integration: Tests batch coordination and per-request isolation

   b) test_request_finish_metrics_completeness
      - Verifies RequestOutput.metrics is complete when request finishes
      - Tests finish flow: EngineCore → OutputProcessor → API
      - Why integration: Tests finish trigger mechanism across components

ROOT CAUSE:
PR #9 removed journey event buffering architecture. Journey events now emit
directly to OTEL spans in the scheduler. Metrics timestamps flow through
EngineCoreOutput fields, not journey_events parameter.

NEW INTEGRATION CONTRACTS (Post-PR#9):
1. Timestamp Propagation: EngineCoreOutput → OutputProcessor → RequestState.stats
   - Works for batch processing (multiple concurrent requests)
   - Maintains per-request isolation
   - Independent of journey tracing

2. Metrics Completeness: RequestOutput.metrics fully populated on finish
   - Timing metrics (arrival, queued, scheduled, first/last token)
   - Token count metrics
   - Finish reason propagated correctly

3. Deprecated API Inertness: journey_events parameter ignored (backward compat)
   - Already tested by updated test_journey_events_populate_prometheus_timestamps

WHY THESE TESTS (Not 1:1 Porting):
- Deleted tests verified buffering (intentionally removed)
- New tests verify cross-component wiring (post-PR#9 architecture)
- Focused on stable public contracts, not internal implementation
- Complement existing coverage:
  * test_pr9_no_buffering.py: Scheduler-level behavior (16 tests)
  * New integration tests: Cross-component wiring (2 tests)

MAINTAINER NARRATIVE:
Why integration tests instead of unit tests?
- Timestamp batch processing: Unit tests verify single request, integration
  verifies batch coordination and per-request isolation
- Finish metrics: Unit tests verify individual calculations, integration
  verifies finish trigger mechanism across EngineCore→OutputProcessor→API
- These test WIRING, not just individual component behavior

Test Results:
- test_timestamp_propagation_batch_processing: PASSED
- test_request_finish_metrics_completeness: PASSED
- test_pr9_no_buffering.py (16 tests): PASSED
- test_journey_events_populate_prometheus_timestamps: PASSED

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Tests] Strengthen integration test assertions and documentation

Improvements based on rigorous maintainer review and code verification:

P0 - CRITICAL (Missing Coverage):
- Add finish_reason propagation validation in test_request_finish_metrics_completeness
  * Verify FinishReason.LENGTH → "length" string conversion (via __str__)
  * Assert finish_reason appears in CompletionOutput.finish_reason field
  * Assert RequestOutput.finished reflects completion state
  * Verified: FinishReason.LENGTH stringifies to "length" via FINISH_REASON_STRINGS[1]

P1 - HIGH (Strengthen Weak Assertions):
- Replace `> 0.0` checks with exact equality for first_token_ts and last_token_ts
  * first_token_ts == 1100.0 (set on first update_from_output while is_prefilling=True)
  * last_token_ts == 1170.0 (set from engine_core_timestamp in final step)
  * Verified: Both are direct assignments from engine_core_timestamp (no transformation)
- Add timing invariants: queued_ts <= scheduled_ts, first_token_ts <= last_token_ts
  * Removed scheduled_ts <= first_token_ts (not guaranteed in this test scenario)

P2 - MEDIUM (Documentation):
- Enhanced test_journey_events_populate_prometheus_timestamps docstring
  * Document first-write-wins conditions: source > 0.0 AND dest == 0.0
  * Reference exact code locations (output_processor.py:535,537)
  * Add comment explaining copy condition sensitivity
- Enhanced test_timestamp_propagation_batch_processing docstring
  * Define "isolation": each request's stats contain only its own timestamps
  * Clarify no cross-contamination between requests in batch
- Add clock domain documentation to test_request_finish_metrics_completeness
  * arrival_time: wall-clock (time.time()) - verified from input_processor.py:477
  * queued_ts/scheduled_ts/first/last_token_ts: monotonic - verified from scheduler.py
  * Warning: DO NOT compare across domains in production code

P3 - LOW (Internal State Check):
- Label request_states cleanup check as "leak-guard invariant"
  * Document guarantee: output_processor.py:587 unconditional pop on finish
  * Valid contract verification, not just implementation detail

All changes verified against actual code behavior. No assumptions or speculation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Tests] Clarify first_token_ts semantic confusion in test documentation

CRITICAL CLARIFICATION based on code verification:

The field name "first_token_ts" is semantically misleading. It does NOT mean
"timestamp when first token is generated" but rather "timestamp of first
prefill update" (even if no tokens generated).

Code Evidence (stats.py:288-289):
```python
if is_prefilling:
    req_stats.first_token_ts = engine_core_timestamp
```

Condition: ONLY checks `is_prefilling` flag (no check on token emission)

Order of Operations (output_processor.py:541-552):
1. _update_stats_from_output() called (sets first_token_ts)
2. THEN is_prefilling = False

Test Execution:
- Step 1 (new_token_ids=[], timestamp=1100.0): first_token_ts = 1100.0 ✓
- Step 2 (new_token_ids=[42], timestamp=1150.0): first_token_ts unchanged (is_prefilling=False)

Assertion is CORRECT (1100.0), but semantic meaning needed clarification.

Changes:
- Enhanced inline comment explaining condition (stats.py:288 reference)
- Note semantic confusion: "first_token_ts" ≠ "first token generated"
- Clarify it means "first prefill update timestamp"
- Add docstring section explaining this behavior
- Reference exact test steps to avoid future confusion

No behavior changes. Documentation only.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Tests] Use maintainer-neutral language in first_token_ts documentation

Revised wording to be neutral and professional:
- "Semantic confusion" → "Note on first_token_ts definition"
- "doesn't mean" → "Currently defined as"
- "Despite the name" → "Currently defined as"
- "Does NOT check" → "Condition checks ... only"
- "NOT 1150.0" → "not step 2"

Preserves all technical accuracy while avoiding judgmental language.

No behavioral changes. Documentation wording only.

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant