[Model Runner V2] Fix openai.InternalServerError: Error code: 500 - 'list index out of range' #45467

Merged: vllm-bot merged 1 commit into main from wentao-fix-num_logprobs-minus1 on Jun 13, 2026
@yewentao256

Member

Purpose

VLLM_USE_V2_MODEL_RUNNER=1 pytest tests/entrypoints/openai/chat_completion/test_chat_echo.py::test_top_logprobs -xvs

This currently raises the following error:

>       completion = await client.chat.completions.create(
            model=MODEL_NAME,
            messages=messages,
            max_tokens=1,
            extra_body={
                "top_logprobs": -1,
                "logprobs": "true",
            },
        )

tests/entrypoints/openai/chat_completion/test_chat_echo.py:117: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../.venv/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py:2814: in create
    return await self._post(
../.venv/lib/python3.12/site-packages/openai/_base_client.py:1931: in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <openai.AsyncOpenAI object at 0x7f9947390fb0>
cast_to = <class 'openai.types.chat.chat_completion.ChatCompletion'>
options = FinalRequestOptions(method='post', url='/chat/completions', params={}, headers=NOT_GIVEN, max_retries=NOT_GIVEN, timeo...ountry?'}], 'model': 'Qwen/Qwen2-1.5B-Instruct', 'max_tokens': 1}, extra_json={'top_logprobs': -1, 'logprobs': 'true'})

    async def request(
        self,
        cast_to: Type[ResponseT],
        options: FinalRequestOptions,
        *,
        stream: bool = False,
        stream_cls: type[_AsyncStreamT] | None = None,
    ) -> ResponseT | _AsyncStreamT:
        if self._platform is None:
            # `get_platform` can make blocking IO calls so we
            # execute it earlier while we are in an async context
            self._platform = await asyncify(get_platform)()
    
        cast_to = self._maybe_override_cast_to(cast_to, options)
    
        # create a copy of the options we were given so that if the
        # options are mutated later & we then retry, the retries are
        # given the original options
        input_options = model_copy(options)
        if input_options.idempotency_key is None and input_options.method.lower() != "get":
            # ensure the idempotency key is reused between requests
            input_options.idempotency_key = self._idempotency_key()
    
        response: httpx.Response | None = None
        max_retries = input_options.get_max_retries(self.max_retries)
    
        retries_taken = 0
        for retries_taken in range(max_retries + 1):
            options = model_copy(input_options)
            options = await self._prepare_options(options)
    
            remaining_retries = max_retries - retries_taken
            request = self._build_request(options, retries_taken=retries_taken)
            await self._prepare_request(request)
    
            kwargs: HttpxSendArgs = {}
            if self.custom_auth is not None:
                kwargs["auth"] = self.custom_auth
    
            if options.follow_redirects is not None:
                kwargs["follow_redirects"] = options.follow_redirects
    
            log.debug("Sending HTTP Request: %s %s", request.method, request.url)
    
            response = None
            try:
                response = await self._send_request(
                    request,
                    stream=stream or self._should_stream_response_body(request=request),
                    **kwargs,
                )
            except httpx.TimeoutException as err:
                log.debug("Encountered httpx.TimeoutException", exc_info=True)
    
                if remaining_retries > 0:
                    await self._sleep_for_retry(
                        retries_taken=retries_taken,
                        max_retries=max_retries,
                        options=input_options,
                        response=None,
                    )
                    continue
    
                log.debug("Raising timeout error")
                raise APITimeoutError(request=request) from err
            except OpenAIError as err:
                # Propagate OpenAIErrors as-is, without retrying or wrapping in APIConnectionError
                raise err
            except Exception as err:
                log.debug("Encountered Exception", exc_info=True)
    
                if remaining_retries > 0:
                    await self._sleep_for_retry(
                        retries_taken=retries_taken,
                        max_retries=max_retries,
                        options=input_options,
                        response=None,
                    )
                    continue
    
                log.debug("Raising connection error")
                raise APIConnectionError(request=request) from err
    
            log.debug(
                'HTTP Response: %s %s "%i %s" %s',
                request.method,
                request.url,
                response.status_code,
                response.reason_phrase,
                response.headers,
            )
            log.debug("request_id: %s", response.headers.get("x-request-id"))
    
            try:
                response.raise_for_status()
            except httpx.HTTPStatusError as err:  # thrown on 4xx and 5xx status code
                log.debug("Encountered httpx.HTTPStatusError", exc_info=True)
    
                if remaining_retries > 0 and self._should_retry(err.response):
                    await err.response.aclose()
                    await self._sleep_for_retry(
                        retries_taken=retries_taken,
                        max_retries=max_retries,
                        options=input_options,
                        response=response,
                    )
                    continue
    
                # If the response is streamed then we need to explicitly read the response
                # to completion before attempting to access the response text.
                if not err.response.is_closed:
                    await err.response.aread()
    
                log.debug("Re-raising status error")
>               raise self._make_status_error_from_response(err.response) from None
E               openai.InternalServerError: Error code: 500 - {'error': {'message': 'list index out of range', 'type': 'InternalServerError', 'param': None, 'code': 500}}

../.venv/lib/python3.12/site-packages/openai/_base_client.py:1716: InternalServerError

This is because a value of -1 for logprobs means "all" rather than None, and that case was not handled. This PR fixes the bug.
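As a rough illustration of why a -1 sentinel needs explicit handling, here is a minimal sketch. It is not the code changed in this PR: `resolve_num_logprobs` and `top_logprob_ids` are hypothetical names, and the clamping behavior is an assumption made for the example.

```python
from typing import Optional


def resolve_num_logprobs(num_logprobs: Optional[int], vocab_size: int) -> int:
    """Hypothetical helper: turn a request-level value into a concrete count.

    Assumed convention for this sketch:
      None -> logprobs were not requested (count 0)
      -1   -> "all", i.e. one entry per vocabulary token
      n>=0 -> the top n entries, clamped to the vocabulary size
    """
    if num_logprobs is None:
        return 0
    if num_logprobs == -1:
        # -1 is a sentinel, not a size; using it directly as a length or
        # index is the kind of mistake that can surface as an IndexError.
        return vocab_size
    if num_logprobs < -1:
        raise ValueError(f"invalid num_logprobs: {num_logprobs}")
    return min(num_logprobs, vocab_size)


def top_logprob_ids(logprobs: list[float], num_logprobs: Optional[int]) -> list[int]:
    """Return token ids sorted by descending logprob, truncated to the resolved count."""
    k = resolve_num_logprobs(num_logprobs, len(logprobs))
    ranked = sorted(range(len(logprobs)), key=lambda i: logprobs[i], reverse=True)
    return ranked[:k]
```

With a toy three-token vocabulary, `top_logprob_ids([-2.0, -0.5, -1.0], -1)` ranks every token, whereas slicing with the raw sentinel (`ranked[:-1]`) would silently drop the last entry.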

With this fix, the test passes:

=============================== 1 passed, 16 warnings in 38.45s ===============================

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256 changed the title from "[Model Runner V2] Fix num_logprobs==-1" to "[Model Runner V2] Fix openai.InternalServerError: Error code: 500 - 'list index out of range'" on Jun 12, 2026
@mergify bot added the v1 label on Jun 12, 2026
@yewentao256 added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jun 12, 2026
@njhill added the v2 label on Jun 13, 2026
@vllm-bot merged commit 2ecf7d0 into main on Jun 13, 2026 (82 of 84 checks passed)
@vllm-bot deleted the wentao-fix-num_logprobs-minus1 branch on June 13, 2026 08:44
