[Model Runner V2] Multiple prompt logprobs support#39937

Merged
yewentao256 merged 6 commits into
mainfrom
wentao-prompt-logprobs-support
Apr 21, 2026

@yewentao256

Member

Purpose

Part of #39337.

Adds support for multiple prompt logprobs in Model Runner V2.

Test

VLLM_USE_V2_MODEL_RUNNER=1 pytest tests/v1/sample/test_logprobs.py -k prompt_logprobs_with_chunking_and_preemption

Originally

__________________________________ test_prompt_logprobs_with_chunking_and_preemption ___________________________________

    def test_prompt_logprobs_with_chunking_and_preemption():
        """Test that prompt logprobs are correctly returned when using
        both chunked prefill and preemption.
    
        This test ensures that the num_prompt_logprobs tracking persists
        across preemptions and prefill chunks.
        """
    
        # Create prompts that will trigger chunking and preemption
        prompts = [
            "The following numbers of the sequence "
            + ", ".join(str(i) for i in range(10))
            + " are:",
            "In one word, the capital of France is ",
        ] + [f"Tell me about the number {i}: " for i in range(32)]
    
        sampling_params = SamplingParams(
            temperature=0.0,
            max_tokens=40,
            min_tokens=20,
            prompt_logprobs=2,  # Request prompt logprobs
        )
    
        with VllmRunner(
            "Qwen/Qwen3-0.6B",
            max_model_len=512,
            enable_chunked_prefill=True,
            max_num_batched_tokens=48,  # Force prefill chunking
            num_gpu_blocks_override=32,  # Force preemptions
            disable_log_stats=False,
            gpu_memory_utilization=0.25,
        ) as vllm_model:
            metrics_before = vllm_model.llm.get_metrics()
    
            # Generate with prompt logprobs using generate_w_logprobs which
            # returns (output_ids, output_str, output_logprobs, prompt_logprobs)
            outputs = vllm_model.generate_w_logprobs(
                prompts, sampling_params=sampling_params, include_prompt_token_ids=True
            )
    
            # Verify that all outputs have prompt logprobs
            for i, output in enumerate(outputs):
                _, _, _, prompt_token_ids, prompt_logprobs = output
                assert prompt_logprobs is not None and len(prompt_logprobs) > 0, (
                    f"Output {i} missing prompt logprobs"
                )
                assert len(prompt_logprobs) == len(prompt_token_ids), (
                    "Unexpected number of prompt logprob positions"
                )
    
                # Each position should have the requested number of logprobs
                for pos, logprobs_dict in enumerate(prompt_logprobs):
                    if logprobs_dict is not None:  # First token may be None
>                       assert (
                            sampling_params.prompt_logprobs
                            <= len(logprobs_dict)
                            <= sampling_params.prompt_logprobs + 1
                        ), (
                            f"Output {i} position {pos} has {len(logprobs_dict)} "
                            f"logprobs, expected {sampling_params.prompt_logprobs}"
                        )
E                       AssertionError: Output 0 position 1 has 1 logprobs, expected 2
E                       assert 2 <= 1
E                        +  where 2 = SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, t...mpt_logprobs=2, skip_special_tokens=True, spaces_between_special_tokens=True, structured_outputs=None, extra_args=None).prompt_logprobs
E                        +  and   1 = len({2701: Logprob(logprob=-10.656400680541992, rank=5307, decoded_token=' following')})

tests/v1/sample/test_logprobs.py:1216: AssertionError

Now

======================= 1 passed, 52 deselected, 17 warnings in 14.63s =======================
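
For context on the assertion bounds: in the failing run the dict held only the prompt token itself (' following', rank 5307) and no top-k entries. Below is a minimal sketch of why each position is expected to carry between k and k + 1 entries, assuming a position reports the top-k tokens plus the actual prompt token. `position_logprobs` is a hypothetical helper for illustration, not vLLM code.

```python
def position_logprobs(
    vocab_logprobs: dict[int, float], prompt_token: int, k: int
) -> dict[int, float]:
    """Hypothetical helper: top-k token logprobs plus the actual prompt token."""
    # Rank all tokens by descending logprob and keep the best k.
    top_k = sorted(vocab_logprobs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    result = dict(top_k)
    # The actual prompt token is always reported, even if it is not in the top-k.
    result[prompt_token] = vocab_logprobs[prompt_token]
    return result


vocab = {10: -0.1, 11: -1.5, 12: -4.0, 13: -9.0}

in_top_k = position_logprobs(vocab, prompt_token=11, k=2)      # token 11 is in the top-2
outside_top_k = position_logprobs(vocab, prompt_token=13, k=2)  # token 13 is not
```

Under that assumption, `in_top_k` has k entries and `outside_top_k` has k + 1, which matches the `k <= len(logprobs_dict) <= k + 1` check in the test.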

CC @WoosukKwon

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Apr 15, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request enhances the prompt logprob computation logic to handle varying logprob request counts within a single batch. It introduces a mechanism to track the number of logprobs per request and utilizes the batch's maximum requested count during the chunked computation process. Feedback indicates that in mixed batches, requests currently receive the batch-wide maximum number of logprobs rather than their specific requested amount, which may cause assertion failures. A code suggestion was provided to slice the resulting tensors to match each request's individual requirements.
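
The approach described in this review (compute with the batch-wide maximum, then trim each request to its own count) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in `prompt_logprob.py`: `topk_per_request` and its arguments are hypothetical names, and lists stand in for the tensors a real batched top-k call would operate on with a single `k`.

```python
def topk_per_request(
    logprob_rows: list[list[float]], requested_counts: list[int]
) -> list[list[tuple[int, float]]]:
    """Hypothetical sketch: one top-k at the batch maximum, sliced per request."""
    # A batched top-k takes a single k, so use the largest count in the batch.
    max_k = max(requested_counts)
    results = []
    for row, k in zip(logprob_rows, requested_counts):
        # Token ids ranked by descending logprob, truncated to the batch maximum.
        ranked = sorted(range(len(row)), key=lambda tok: row[tok], reverse=True)[:max_k]
        # Trim to this request's own count so it never receives extra entries.
        results.append([(tok, row[tok]) for tok in ranked[:k]])
    return results


rows = [
    [-1.0, -0.5, -3.0, -2.0],  # request 0 asks for 1 logprob
    [-0.1, -4.0, -2.0, -3.0],  # request 1 asks for 3 logprobs
]
per_request = topk_per_request(rows, requested_counts=[1, 3])
```

Without the final slice, request 0 would receive three entries (the batch maximum) rather than the one it asked for, which is the mixed-batch concern raised above.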

Comment thread vllm/v1/worker/gpu/sample/prompt_logprob.py

@njhill njhill left a comment

Member

Thanks @yewentao256, this looks good to me, just minor simplifications.

Comment thread vllm/v1/worker/gpu/sample/prompt_logprob.py Outdated
Comment thread vllm/v1/worker/gpu/sample/prompt_logprob.py Outdated
@@ -17,13 +17,18 @@ def __init__(self, max_num_reqs: int):
        self.max_num_reqs = max_num_reqs

        self.uses_prompt_logprobs = np.zeros(self.max_num_reqs, dtype=bool)

Member

Do we still need this if we are introducing the counts?

Member Author

We still need uses_prompt_logprobs because prompt_logprobs=0 is a valid enabled case.
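
A small sketch of why a count alone is ambiguous, assuming a per-request count array that defaults to 0. `PromptLogprobState`, `num_prompt_logprobs`, and `add_request` are hypothetical names used for illustration, and plain lists stand in for the NumPy arrays in the actual class.

```python
from typing import Optional


class PromptLogprobState:
    """Hypothetical sketch: a per-request enabled flag alongside a requested count."""

    def __init__(self, max_num_reqs: int):
        self.uses_prompt_logprobs = [False] * max_num_reqs
        self.num_prompt_logprobs = [0] * max_num_reqs

    def add_request(self, req_idx: int, prompt_logprobs: Optional[int]) -> None:
        # None means the feature is off; 0 is a valid, enabled request.
        self.uses_prompt_logprobs[req_idx] = prompt_logprobs is not None
        self.num_prompt_logprobs[req_idx] = prompt_logprobs or 0


state = PromptLogprobState(max_num_reqs=3)
state.add_request(0, None)  # prompt logprobs disabled
state.add_request(1, 0)     # enabled with a count of 0
state.add_request(2, 2)     # enabled with a count of 2
```

Requests 0 and 1 both store a count of 0, so only the boolean flag tells them apart.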

Comment thread vllm/v1/worker/gpu/sample/prompt_logprob.py Outdated
yewentao256 and others added 3 commits April 20, 2026 21:36
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
@vllm-project vllm-project deleted a comment from mergify Bot Apr 20, 2026

@njhill njhill left a comment

Member

Thanks @yewentao256, just thought of one more small simplification.

Comment thread vllm/v1/worker/gpu/sample/prompt_logprob.py Outdated
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
@yewentao256 yewentao256 enabled auto-merge (squash) April 20, 2026 23:38
Signed-off-by: yewentao256 <zhyanwentao@126.com>
@vllm-project vllm-project deleted a comment from mergify Bot Apr 20, 2026
@yewentao256 yewentao256 merged commit 66cc3fa into main Apr 21, 2026
61 checks passed
@yewentao256 yewentao256 deleted the wentao-prompt-logprobs-support branch April 21, 2026 15:49
@njhill njhill added the v2 label May 20, 2026
Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1, v2
