[Model Runner V2] Multiple prompt logprobs support by yewentao256 · Pull Request #39937 · vllm-project/vllm

yewentao256 · 2026-04-15T19:06:53Z

Purpose

Part of #39337

Support requesting multiple prompt logprobs (prompt_logprobs > 0) in Model Runner V2.

Test

VLLM_USE_V2_MODEL_RUNNER=1 pytest tests/v1/sample/test_logprobs.py -k prompt_logprobs_with_chunking_and_preemption

Originally

__________________________________ test_prompt_logprobs_with_chunking_and_preemption ___________________________________

    def test_prompt_logprobs_with_chunking_and_preemption():
        """Test that prompt logprobs are correctly returned when using
        both chunked prefill and preemption.
    
        This test ensures that the num_prompt_logprobs tracking persists
        across preemptions and prefill chunks.
        """
    
        # Create prompts that will trigger chunking and preemption
        prompts = [
            "The following numbers of the sequence "
            + ", ".join(str(i) for i in range(10))
            + " are:",
            "In one word, the capital of France is ",
        ] + [f"Tell me about the number {i}: " for i in range(32)]
    
        sampling_params = SamplingParams(
            temperature=0.0,
            max_tokens=40,
            min_tokens=20,
            prompt_logprobs=2,  # Request prompt logprobs
        )
    
        with VllmRunner(
            "Qwen/Qwen3-0.6B",
            max_model_len=512,
            enable_chunked_prefill=True,
            max_num_batched_tokens=48,  # Force prefill chunking
            num_gpu_blocks_override=32,  # Force preemptions
            disable_log_stats=False,
            gpu_memory_utilization=0.25,
        ) as vllm_model:
            metrics_before = vllm_model.llm.get_metrics()
    
            # Generate with prompt logprobs using generate_w_logprobs which
            # returns (output_ids, output_str, output_logprobs, prompt_logprobs)
            outputs = vllm_model.generate_w_logprobs(
                prompts, sampling_params=sampling_params, include_prompt_token_ids=True
            )
    
            # Verify that all outputs have prompt logprobs
            for i, output in enumerate(outputs):
                _, _, _, prompt_token_ids, prompt_logprobs = output
                assert prompt_logprobs is not None and len(prompt_logprobs) > 0, (
                    f"Output {i} missing prompt logprobs"
                )
                assert len(prompt_logprobs) == len(prompt_token_ids), (
                    "Unexpected number of prompt logprob positions"
                )
    
                # Each position should have the requested number of logprobs
                for pos, logprobs_dict in enumerate(prompt_logprobs):
                    if logprobs_dict is not None:  # First token may be None
>                       assert (
                            sampling_params.prompt_logprobs
                            <= len(logprobs_dict)
                            <= sampling_params.prompt_logprobs + 1
                        ), (
                            f"Output {i} position {pos} has {len(logprobs_dict)} "
                            f"logprobs, expected {sampling_params.prompt_logprobs}"
                        )
E                       AssertionError: Output 0 position 1 has 1 logprobs, expected 2
E                       assert 2 <= 1
E                        +  where 2 = SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, t...mpt_logprobs=2, skip_special_tokens=True, spaces_between_special_tokens=True, structured_outputs=None, extra_args=None).prompt_logprobs
E                        +  and   1 = len({2701: Logprob(logprob=-10.656400680541992, rank=5307, decoded_token=' following')})

tests/v1/sample/test_logprobs.py:1216: AssertionError

Now

======================= 1 passed, 52 deselected, 17 warnings in 14.63s =======================
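The test accepts either prompt_logprobs or prompt_logprobs + 1 entries per position. This is consistent with how vLLM's logprobs output is built: each position's dict holds the top-k tokens plus the actual prompt token whenever that token falls outside the top-k. The failing run returned only the prompt token itself (rank 5307), so it behaved as if k were 0. A minimal NumPy sketch of that dict construction follows. The helper position_logprobs_sketch is hypothetical and is not vLLM code:

```python
import numpy as np


def position_logprobs_sketch(logits: np.ndarray, prompt_token: int, k: int) -> dict[int, float]:
    """Hypothetical helper: the logprobs dict for a single prompt position.

    Contains the top-k tokens plus the actual prompt token, so it has k entries
    when the prompt token is already in the top-k and k + 1 entries otherwise.
    """
    # Numerically stable log-softmax over the vocabulary.
    shifted = logits - logits.max()
    logprobs = shifted - np.log(np.exp(shifted).sum())
    # Token ids of the k highest logprobs (k == 0 selects nothing).
    topk_ids = np.argsort(-logprobs)[:k]
    result = {int(t): float(logprobs[t]) for t in topk_ids}
    # The actual prompt token is always included, whatever its rank.
    result[prompt_token] = float(logprobs[prompt_token])
    return result


logits = np.array([3.0, 1.0, 0.5, -2.0])
print(len(position_logprobs_sketch(logits, prompt_token=0, k=2)))  # 2: token 0 is already in the top-2
print(len(position_logprobs_sketch(logits, prompt_token=3, k=2)))  # 3: token 3 is added to the top-2
```

With k=0 the dict has exactly one entry, which matches the single-entry dict in the original failure.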

CC @WoosukKwon

Signed-off-by: yewentao256 <zhyanwentao@126.com>

gemini-code-assist

Code Review

This pull request enhances the prompt logprob computation logic to handle varying logprob request counts within a single batch. It introduces a mechanism to track the number of logprobs per request and utilizes the batch's maximum requested count during the chunked computation process. Feedback indicates that in mixed batches, requests currently receive the batch-wide maximum number of logprobs rather than their specific requested amount, which may cause assertion failures. A code suggestion was provided to slice the resulting tensors to match each request's individual requirements.
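The review describes two steps: a single batched top-k using the largest count requested in the batch, followed by slicing each row down to its own request's count. The minimal NumPy sketch below assumes that every row is one prompt position tagged with its request index. The function batched_prompt_topk_sketch and its parameters are hypothetical and do not reproduce the code in prompt_logprob.py:

```python
import numpy as np


def batched_prompt_topk_sketch(
    logprobs: np.ndarray,      # [num_positions, vocab_size] log-probabilities
    row_to_req: np.ndarray,    # [num_positions] request index of each row
    num_logprobs: np.ndarray,  # [num_reqs] prompt_logprobs requested per request
) -> list[np.ndarray]:
    """Hypothetical sketch: one top-k with the batch-wide max k, then per-row slicing."""
    max_k = int(num_logprobs[row_to_req].max()) if len(row_to_req) else 0
    # One batched top-k (descending order) using the largest requested k.
    topk_ids = np.argsort(-logprobs, axis=-1, kind="stable")[:, :max_k]
    # Trim each row to the count its own request asked for, so a request with
    # prompt_logprobs=1 does not receive the batch-wide maximum.
    return [topk_ids[row, : num_logprobs[req]] for row, req in enumerate(row_to_req)]


logprobs = np.log(np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]))
rows = batched_prompt_topk_sketch(logprobs, np.array([0, 0, 1]), np.array([2, 1]))
print([r.tolist() for r in rows])  # [[0, 1], [1, 2], [2]]
```

Running top-k once with the maximum k keeps the work as a single batched operation. The per-row slice is inexpensive, and it is the step that prevents mixed batches from over-returning logprobs.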

njhill

Thanks @yewentao256, this looks good to me, just minor simplifications.

njhill · 2026-04-20T21:19:19Z

@@ -17,13 +17,18 @@ def __init__(self, max_num_reqs: int):
        self.max_num_reqs = max_num_reqs

        self.uses_prompt_logprobs = np.zeros(self.max_num_reqs, dtype=bool)


Do we still need this if we are introducing the counts?

We still need uses_prompt_logprobs because prompt_logprobs=0 is a valid enabled case.
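The reply explains that a count array alone cannot distinguish "prompt logprobs disabled" from "prompt_logprobs=0", since both would store 0. The minimal sketch below keeps both arrays. uses_prompt_logprobs mirrors the field shown in the diff. The class name, num_prompt_logprobs, add_request, and max_num_logprobs are hypothetical illustrations:

```python
from typing import Optional, Sequence

import numpy as np


class PromptLogprobsStateSketch:
    """Hypothetical per-request state, not vLLM's actual class."""

    def __init__(self, max_num_reqs: int):
        self.max_num_reqs = max_num_reqs
        # Whether each request asked for prompt logprobs at all.
        self.uses_prompt_logprobs = np.zeros(max_num_reqs, dtype=bool)
        # How many top-k prompt logprobs it asked for; 0 is a valid enabled value.
        self.num_prompt_logprobs = np.zeros(max_num_reqs, dtype=np.int32)

    def add_request(self, req_idx: int, prompt_logprobs: Optional[int]) -> None:
        self.uses_prompt_logprobs[req_idx] = prompt_logprobs is not None
        self.num_prompt_logprobs[req_idx] = prompt_logprobs or 0

    def max_num_logprobs(self, req_indices: Sequence[int]) -> int:
        # Largest k among the enabled requests; callers still consult the mask,
        # because 0 here can mean "enabled with k=0" or "none enabled".
        idx = np.asarray(req_indices, dtype=np.int64)
        enabled = idx[self.uses_prompt_logprobs[idx]]
        return int(self.num_prompt_logprobs[enabled].max()) if enabled.size else 0


state = PromptLogprobsStateSketch(max_num_reqs=4)
state.add_request(0, 0)     # enabled: only the prompt token's own logprob
state.add_request(1, None)  # disabled
state.add_request(2, 2)     # enabled: top-2 plus the prompt token
print(state.uses_prompt_logprobs[:3].tolist())  # [True, False, True]
print(state.num_prompt_logprobs[:3].tolist())   # [0, 0, 2]
```

Requests 0 and 1 have the same count, so only the boolean mask tells them apart.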

Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Signed-off-by: yewentao256 <zhyanwentao@126.com>

njhill

Thanks @yewentao256, just thought of one more small simplification.

Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Signed-off-by: yewentao256 <zhyanwentao@126.com>

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Yifan <yzong@redhat.com>

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Adrian <info@zzit.ch>


Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

multiple prompt logprobs support

9a6e5ea

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from WoosukKwon and njhill as code owners April 15, 2026 19:06

yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 15, 2026

gemini-code-assist Bot reviewed Apr 15, 2026

Comment thread vllm/v1/worker/gpu/sample/prompt_logprob.py

mergify Bot added the v1 label Apr 15, 2026

yewentao256 mentioned this pull request Apr 15, 2026

[Model Runner v2] Oracle for model runner v2 - qwen3 dense model by default [1/N] #39337

Merged

njhill reviewed Apr 20, 2026

yewentao256 and others added 3 commits April 20, 2026 21:36

Merge branch 'main' into wentao-prompt-logprobs-support

680471a

Update vllm/v1/worker/gpu/sample/prompt_logprob.py

294e15b

Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

update

229d6c9

Signed-off-by: yewentao256 <zhyanwentao@126.com>

vllm-project deleted a comment from mergify Bot Apr 20, 2026

njhill approved these changes Apr 20, 2026

Comment thread vllm/v1/worker/gpu/sample/prompt_logprob.py Outdated

Update vllm/v1/worker/gpu/sample/prompt_logprob.py

efe54e4

Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

yewentao256 enabled auto-merge (squash) April 20, 2026 23:38

fix precommit

db91a9b

Signed-off-by: yewentao256 <zhyanwentao@126.com>

vllm-project deleted a comment from mergify Bot Apr 20, 2026

yewentao256 merged commit 66cc3fa into main Apr 21, 2026
61 checks passed

yewentao256 deleted the wentao-prompt-logprobs-support branch April 21, 2026 15:49

yewentao256 mentioned this pull request Apr 29, 2026

[Feature]: Migration from Model Runner v1 to Model Runner v2 #41286

Open

30 tasks

njhill added the v2 label May 20, 2026

yewentao256 merged 6 commits into main from wentao-prompt-logprobs-support