[Model Runner V2] Fix `openai.InternalServerError: Error code: 500 - 'list index out of range'` by yewentao256 · Pull Request #45467 · vllm-project/vllm

yewentao256 · 2026-06-12T22:36:31Z

Purpose

Running this test with Model Runner V2 enabled:

VLLM_USE_V2_MODEL_RUNNER=1 pytest tests/entrypoints/openai/chat_completion/test_chat_echo.py::test_top_logprobs -xvs

raises the following error:

>       completion = await client.chat.completions.create(
            model=MODEL_NAME,
            messages=messages,
            max_tokens=1,
            extra_body={
                "top_logprobs": -1,
                "logprobs": "true",
            },
        )

tests/entrypoints/openai/chat_completion/test_chat_echo.py:117: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../.venv/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py:2814: in create
    return await self._post(
../.venv/lib/python3.12/site-packages/openai/_base_client.py:1931: in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <openai.AsyncOpenAI object at 0x7f9947390fb0>
cast_to = <class 'openai.types.chat.chat_completion.ChatCompletion'>
options = FinalRequestOptions(method='post', url='/chat/completions', params={}, headers=NOT_GIVEN, max_retries=NOT_GIVEN, timeo...ountry?'}], 'model': 'Qwen/Qwen2-1.5B-Instruct', 'max_tokens': 1}, extra_json={'top_logprobs': -1, 'logprobs': 'true'})

    async def request(
        self,
        cast_to: Type[ResponseT],
        options: FinalRequestOptions,
        *,
        stream: bool = False,
        stream_cls: type[_AsyncStreamT] | None = None,
    ) -> ResponseT | _AsyncStreamT:
        if self._platform is None:
            # `get_platform` can make blocking IO calls so we
            # execute it earlier while we are in an async context
            self._platform = await asyncify(get_platform)()
    
        cast_to = self._maybe_override_cast_to(cast_to, options)
    
        # create a copy of the options we were given so that if the
        # options are mutated later & we then retry, the retries are
        # given the original options
        input_options = model_copy(options)
        if input_options.idempotency_key is None and input_options.method.lower() != "get":
            # ensure the idempotency key is reused between requests
            input_options.idempotency_key = self._idempotency_key()
    
        response: httpx.Response | None = None
        max_retries = input_options.get_max_retries(self.max_retries)
    
        retries_taken = 0
        for retries_taken in range(max_retries + 1):
            options = model_copy(input_options)
            options = await self._prepare_options(options)
    
            remaining_retries = max_retries - retries_taken
            request = self._build_request(options, retries_taken=retries_taken)
            await self._prepare_request(request)
    
            kwargs: HttpxSendArgs = {}
            if self.custom_auth is not None:
                kwargs["auth"] = self.custom_auth
    
            if options.follow_redirects is not None:
                kwargs["follow_redirects"] = options.follow_redirects
    
            log.debug("Sending HTTP Request: %s %s", request.method, request.url)
    
            response = None
            try:
                response = await self._send_request(
                    request,
                    stream=stream or self._should_stream_response_body(request=request),
                    **kwargs,
                )
            except httpx.TimeoutException as err:
                log.debug("Encountered httpx.TimeoutException", exc_info=True)
    
                if remaining_retries > 0:
                    await self._sleep_for_retry(
                        retries_taken=retries_taken,
                        max_retries=max_retries,
                        options=input_options,
                        response=None,
                    )
                    continue
    
                log.debug("Raising timeout error")
                raise APITimeoutError(request=request) from err
            except OpenAIError as err:
                # Propagate OpenAIErrors as-is, without retrying or wrapping in APIConnectionError
                raise err
            except Exception as err:
                log.debug("Encountered Exception", exc_info=True)
    
                if remaining_retries > 0:
                    await self._sleep_for_retry(
                        retries_taken=retries_taken,
                        max_retries=max_retries,
                        options=input_options,
                        response=None,
                    )
                    continue
    
                log.debug("Raising connection error")
                raise APIConnectionError(request=request) from err
    
            log.debug(
                'HTTP Response: %s %s "%i %s" %s',
                request.method,
                request.url,
                response.status_code,
                response.reason_phrase,
                response.headers,
            )
            log.debug("request_id: %s", response.headers.get("x-request-id"))
    
            try:
                response.raise_for_status()
            except httpx.HTTPStatusError as err:  # thrown on 4xx and 5xx status code
                log.debug("Encountered httpx.HTTPStatusError", exc_info=True)
    
                if remaining_retries > 0 and self._should_retry(err.response):
                    await err.response.aclose()
                    await self._sleep_for_retry(
                        retries_taken=retries_taken,
                        max_retries=max_retries,
                        options=input_options,
                        response=response,
                    )
                    continue
    
                # If the response is streamed then we need to explicitly read the response
                # to completion before attempting to access the response text.
                if not err.response.is_closed:
                    await err.response.aread()
    
                log.debug("Re-raising status error")
>               raise self._make_status_error_from_response(err.response) from None
E               openai.InternalServerError: Error code: 500 - {'error': {'message': 'list index out of range', 'type': 'InternalServerError', 'param': None, 'code': 500}}

../.venv/lib/python3.12/site-packages/openai/_base_client.py:1716: InternalServerError

This happens because a logprobs value of -1 means "all" rather than None. This PR fixes the bug.
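As an illustration only (this is not the diff from this PR), here is a minimal sketch of how a -1 sentinel could be normalized before it is used to size or index per-token logprob lists. The helper name `resolve_num_logprobs` and the `vocab_size` parameter are hypothetical and do not refer to vLLM's actual code.

```python
from typing import Optional


def resolve_num_logprobs(num_logprobs: Optional[int], vocab_size: int) -> Optional[int]:
    """Hypothetical helper: turn the user-facing logprobs count into a concrete count.

    None -> logprobs were not requested
    -1   -> sentinel meaning "all", resolved here to the full vocabulary size
    n>=0 -> returned unchanged
    """
    if num_logprobs is None:
        return None
    if num_logprobs == -1:
        # Treat -1 as "all" instead of letting it fall through as if it were None.
        return vocab_size
    if num_logprobs < 0:
        raise ValueError(f"invalid num_logprobs: {num_logprobs}")
    return num_logprobs
```

The point of the sketch is that None and -1 are distinct cases: code that treats -1 like None (or like an ordinary count) can end up with fewer entries than later indexing expects.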

With this PR, the test passes:

=============================== 1 passed, 16 warnings in 38.45s ===============================

Signed-off-by: yewentao256 <zhyanwentao@126.com>

fix num_logprobs==-1

61beacd

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from WoosukKwon and njhill as code owners June 12, 2026 22:36

yewentao256 changed the title ~~[Model Runner V2] Fix num_logprobs==-1~~ [Model Runner V2] Fix openai.InternalServerError: Error code: 500 - 'list index out of range' Jun 12, 2026

mergify Bot added the v1 label Jun 12, 2026

yewentao256 mentioned this pull request Jun 12, 2026

[Feature]: Migration from Model Runner v1 to Model Runner v2 #41286

Open

30 tasks

yewentao256 added the ready label Jun 12, 2026

njhill approved these changes Jun 13, 2026

njhill added the v2 label Jun 13, 2026

mgoin approved these changes Jun 13, 2026

vllm-bot merged commit 2ecf7d0 into main Jun 13, 2026
82 of 84 checks passed

vllm-bot deleted the wentao-fix-num_logprobs-minus1 branch June 13, 2026 08:44

Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026

[Model Runner V2] Fix `openai.InternalServerError: Error code: 500 - …

8d7f3fa

…'list index out of range'` (vllm-project#45467) Signed-off-by: yewentao256 <zhyanwentao@126.com>
