[V1][Spec Decode] Fix greedy temperature detection after sampler refactor by Pradyun92 · Pull Request #27077 · vllm-project/vllm

Pradyun92 · 2025-10-17T05:54:38Z

Description

Fixes division by zero and negative acceptance values in speculative decoding caused by incomplete temperature migration in commit a676e66.

Issue

Commit a676e66 changed greedy sampling temperature from -1.0 to 0.0 in gpu_input_batch.py, but several components still checking for temperature == -1 were not updated, causing:

Division by zero in rejection sampler
NaN/Inf logits
Empty generated_token_ids
Negative acceptance values (len([]) - 1 = -1)
Prometheus metrics crash: "ValueError: Gauge metric cannot decrease"

Root Cause

Multiple components were not updated after the temperature refactor:

TPU input batch (tpu_input_batch.py): Still using -1.0 sentinel value
Eagle (eagle.py): Hardcoded temperature == -1 check in unused function
Rejection sampler (rejection_sampler.py): GREEDY_TEMPERATURE constant set to -1

The rejection sampler's expand_batch_to_tokens() function replaces GREEDY_TEMPERATURE (-1) with 1.0 before division, but when actual temperature is 0.0, the replacement doesn't happen, causing division by zero.

Impact

Affects ALL models using speculative decoding with rejection sampling:

Eagle/Eagle3
deepseek_mtp
ernie_mtp
qwen3_next_mtp
longcat_flash_mtp

Solution

Complete the temperature migration consistently across all components:

tpu_input_batch.py (line 219): Change temperature = -1.0 → 0.0 for consistency with GPU
eagle.py (lines 40, 1144-1151):
- Import _SAMPLING_EPS from sampler.py
- Use epsilon comparison: temperature < _SAMPLING_EPS instead of == -1
- Add if not all_random: check before protection (consistent with sampler.py)
rejection_sampler.py (line 18): Change GREEDY_TEMPERATURE constant from -1 → 0

Testing

Verified Eagle speculation works correctly with temperature 0.0
No longer produces negative acceptance errors
Prometheus metrics no longer crash

Code Review

This pull request addresses a critical bug in speculative decoding that could lead to division-by-zero errors. The issue stemmed from an incomplete migration of the greedy sampling temperature representation from -1.0 to 0.0 in a previous refactor. Your changes correctly propagate this update across all affected components:

In rejection_sampler.py, GREEDY_TEMPERATURE is correctly updated to 0.
In eagle.py, the check for greedy sampling is updated to use an epsilon comparison, which is more robust and consistent with the rest of the codebase.
In tpu_input_batch.py, the greedy temperature is set to 0.0, aligning it with the GPU implementation.

These changes are well-targeted, correct, and crucial for fixing the reported crashes and incorrect behavior. The pull request is well-documented and the solution is sound.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR.

@codex fix this CI failure
@codex address that feedback

Pradyun92 · 2025-10-17T06:04:58Z

Thank you for the detailed review! You're absolutely correct - I've now fixed the TPU sampler division by zero issue.

Changes Made

Updated the commit to include:

vllm/v1/sample/tpu/sampler.py - Added epsilon guard to apply_temperature():

def apply_temperature(self, logits, temp, all_random=False):
    # Avoid division by zero for greedy sampling (temperature ~ 0.0).
    if not all_random:
        temp = torch.where(temp < _SAMPLING_EPS, 1.0, temp)
    return logits.div_(temp.unsqueeze(dim=1))

vllm/v1/sample/tpu/metadata.py - Added all_random property to TPUSupportedSamplingMetadata so the sampler can conditionally apply the epsilon guard.

This follows the same pattern as vllm/v1/sample/sampler.py (lines 127-131) which already has this protection.

The fix ensures that:

Greedy TPU requests clamp temperature to 1.0 before division (avoiding NaN/Inf logits)
Random sampling requests pass through unchanged
Consistent behavior across GPU and TPU code paths

All ruff checks pass. Let me know if you'd like any other changes!

njhill

Thanks @Pradyun92 for catching and fixing this. The fixes look great.

Could you rebase on latest main? That should hopefully fix the doc build error.

…ctor ## Issue Commit a676e66 changed greedy sampling temperature from -1.0 to 0.0, but several components checking for temperature == -1 were not updated: - TPU input batch (tpu_input_batch.py) still using -1.0 - Eagle's unused function (eagle.py) with hardcoded temperature == -1 - Rejection sampler (rejection_sampler.py) GREEDY_TEMPERATURE constant = -1 - TPU sampler (tpu/sampler.py) missing epsilon guard causing division by zero ## Impact Affects ALL models using speculative decoding with rejection sampling (Eagle/Eagle3, deepseek_mtp, ernie_mtp, qwen3_next_mtp, longcat_flash_mtp): - Division by zero in rejection sampler and TPU sampler - NaN/Inf logits - Empty generated_token_ids - Negative acceptance values (len([]) - 1 = -1) - Prometheus metrics crash ## Solution Complete the temperature migration consistently: 1. tpu_input_batch.py: Change temperature -1.0 → 0.0 2. eagle.py: Use epsilon comparison (temperature < _SAMPLING_EPS) 3. rejection_sampler.py: Change GREEDY_TEMPERATURE constant -1 → 0 4. tpu/sampler.py: Add epsilon guard before division 5. tpu/metadata.py: Add all_random property for sampler Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>

Issue: TPU sampler and Eagle code had two separate but related issues: 1. TPU sampler divides by zero for greedy requests (temperature=0.0) 2. Eagle code triggers mypy type errors due to missing None check Root Cause: - TPU sampler's apply_temperature() method lacks epsilon guard to prevent division by zero when temperature=0.0 (greedy sampling) - Eagle's compute_probs_and_sample_next_token() uses temperature without asserting it's not None, causing mypy type errors Impact: - TPU: Division by zero produces NaN/Inf logits, breaking speculative decoding on TPU platforms for all models using Eagle/rejection sampling - Eagle: mypy type checking failures prevent pre-commit hooks from passing Fix: 1. TPU Sampler (vllm/v1/sample/tpu/sampler.py): - Add all_random parameter to apply_temperature() method - Add epsilon guard: if not all_random: temp = torch.where(temp < _SAMPLING_EPS, 1.0, temp) - Update call site to pass sampling_metadata.all_random 2. TPU Metadata (vllm/v1/sample/tpu/metadata.py): - Add all_random property to TPUSupportedSamplingMetadata - Populate all_random from input_batch in from_input_batch() 3. Eagle (vllm/v1/spec_decode/eagle.py): - Add assert sampling_metadata.temperature is not None after all_greedy early return - Matches sampler.py pattern (line 162) for type safety Files Modified: - vllm/v1/sample/tpu/sampler.py: Epsilon guard in apply_temperature() - vllm/v1/sample/tpu/metadata.py: Added all_random property - vllm/v1/spec_decode/eagle.py: Added temperature None assertion - CLAUDE.md: Updated modification vllm-project#11 to document fixes This addresses PR vllm-project#27077 reviewer feedback and resolves mypy type errors. Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>

benchislett

LGTM

njhill · 2025-10-17T16:08:41Z

Side note is that we clearly have some CI gaps here.

…ctor (vllm-project#27077) Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com> Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>

…ctor (vllm-project#27077) Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com> Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

…ctor (vllm-project#27077) Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com> Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>

### What this PR does / why we need it? fix greedy temperature detection from vllm-project/vllm#27077 - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@81786c8 --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com>

### What this PR does / why we need it? fix greedy temperature detection from vllm-project/vllm#27077 - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@81786c8 --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

### What this PR does / why we need it? fix greedy temperature detection from vllm-project/vllm#27077 - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@81786c8 --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com>

### What this PR does / why we need it? fix greedy temperature detection from vllm-project/vllm#27077 - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@81786c8 --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

Pradyun92 requested review from 22quinn, NickLucche, benchislett, houseroad, luccafong and njhill as code owners October 17, 2025 05:54

mergify bot added speculative-decoding v1 tpu Related to Google TPUs labels Oct 17, 2025

gemini-code-assist bot reviewed Oct 17, 2025

View reviewed changes

chatgpt-codex-connector bot reviewed Oct 17, 2025

View reviewed changes

Comment thread vllm/v1/worker/tpu_input_batch.py

Pradyun92 force-pushed the fix-eagle-temperature-detection branch from c5669f8 to bacd60b Compare October 17, 2025 06:04

Pradyun92 force-pushed the fix-eagle-temperature-detection branch from fb38090 to 3afb34a Compare October 17, 2025 06:15

njhill approved these changes Oct 17, 2025

View reviewed changes

njhill added the bug Something isn't working label Oct 17, 2025

njhill added this to the v0.11.1 milestone Oct 17, 2025

Pradyun Ramadorai added 2 commits October 17, 2025 10:41

Pradyun92 force-pushed the fix-eagle-temperature-detection branch from 3afb34a to fbf572f Compare October 17, 2025 14:42

mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 17, 2025

benchislett approved these changes Oct 17, 2025

View reviewed changes

Merge branch 'main' into fix-eagle-temperature-detection

565f20e

njhill merged commit acedc74 into vllm-project:main Oct 17, 2025
46 checks passed

Pradyun92 mentioned this pull request Oct 23, 2025

[Bug]: vLLM engine fails with error: 'ValueError: Counters can only be incremented by non-negative amounts.' #26372

Closed

1 task

This was referenced Dec 27, 2025

[Bugfix] fix greedy temperature detection vllm-project/vllm-ascend#5417

Merged

[0.11.0]fix greedy temp detection vllm-project/vllm-ascend#5418

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[V1][Spec Decode] Fix greedy temperature detection after sampler refactor#27077

[V1][Spec Decode] Fix greedy temperature detection after sampler refactor#27077
njhill merged 3 commits intovllm-project:mainfrom
Pradyun92:fix-eagle-temperature-detection

Pradyun92 commented Oct 17, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Pradyun92 commented Oct 17, 2025

Uh oh!

njhill left a comment

Uh oh!

benchislett left a comment

Uh oh!

njhill commented Oct 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

Pradyun92 commented Oct 17, 2025

Description

Issue

Root Cause

Impact

Solution

Testing

Related

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Pradyun92 commented Oct 17, 2025

Changes Made

Uh oh!

njhill left a comment

Choose a reason for hiding this comment

Uh oh!

benchislett left a comment

Choose a reason for hiding this comment

Uh oh!

njhill commented Oct 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants