
[feature suggestion for vllm/vllm benchmark_serving] #357

@asb

Description


The various request functions in backend_request_func.py set output.success = False if they don't get an HTTP 200 status code back for a request. There is no logic to retry a refused request, and metrics are calculated skipping any failed requests. This means an overloaded server will score better on this benchmark for metrics like E2E latency and TTFT if it refuses requests rather than accepting them and serving them slowly. As the number of failed requests isn't included in the results JSON, it isn't easy to tell whether this is a factor in any given benchmark run.
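A minimal sketch of the bias described above (illustrative only; the field names mirror the `success`/`latency` attributes on the benchmark's request outputs, but this is not vLLM's actual code):

```python
def mean_e2e_latency(outputs):
    # Failed requests are skipped, mirroring the behavior described above.
    latencies = [o["latency"] for o in outputs if o["success"]]
    return sum(latencies) / len(latencies)

# A server that accepts everything but serves slowly under load:
accepting = [{"success": True, "latency": 5.0}] * 10

# A server that refuses half its requests and serves the rest quickly:
refusing = (
    [{"success": True, "latency": 1.0}] * 5
    + [{"success": False, "latency": 0.0}] * 5
)

# The refusing server reports a far better mean E2E latency (1.0 vs 5.0)
# despite completing half as many requests.
```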

If one setup refuses requests under load and another accepts them, there doesn't seem to be a fair way to compare these metrics directly. Hopefully this isn't actually happening; adding the failure rate to the results output would make it possible to check, and to investigate if it does happen.
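The suggested change could look something like the following (a hypothetical sketch; the `summarize` helper and the result keys are illustrative, not vLLM's existing API):

```python
def summarize(outputs):
    """Count successes and failures and report the failure rate
    alongside the other benchmark results."""
    completed = sum(1 for o in outputs if o["success"])
    failed = len(outputs) - completed
    return {
        "completed": completed,
        "failed": failed,
        "failure_rate": failed / len(outputs) if outputs else 0.0,
    }

results = summarize(
    [{"success": True}, {"success": True}, {"success": False}]
)
```

With `failed` and `failure_rate` in the results JSON, anyone comparing two runs can immediately see whether one of them is benefiting from refused requests.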
