Benchmarked ISL and OSL averages 0.9*target_length meaning results are over-optimistic #356

@asb

One of the arguments always passed to the underlying bench_serving script is --random-range-ratio, which is supplied by the wrapper in benchmark_lib.sh. It is set to 0.8 by default in benchmark-tmpl.yml and benchmark-multinode-tmpl.yml and is not overridden elsewhere. The argument is ultimately used in sample_random_requests to sample input and output lengths between range_ratio * {input,output}_len and {input,output}_len.

As a result, the average lengths will be ~90% of the advertised figure, since 0.9 is the midpoint of the 0.8 to 1.0 range. For example, a workload advertised as having 8k input or output tokens (8192) will actually average ~7373 tokens. The throughput figures are calculated from the actual input and output sequence lengths, so they do match what was observed. But the overall consequences are:

  • The reported end-to-end latency is misleading, as it represents the latency for shorter sequence lengths than advertised.
  • Because the cost of serving a query doesn't necessarily scale linearly (O(n)) with sequence length, the tokens/second and throughput figures are also going to be slightly better than they would be if the input and output sequence lengths had averaged the reported number of tokens for the workload.
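
To make the ~90% figure concrete, here is a minimal sketch of the sampling behaviour described above. It assumes lengths are drawn uniformly as integers between range_ratio * target_len and target_len; the real sample_random_requests may differ in detail, and the helper names here (sample_lengths, expected_mean) are hypothetical, not taken from bench_serving.

```python
import random


def sample_lengths(target_len, range_ratio, num_requests, seed=0):
    """Draw integer lengths uniformly from [range_ratio * target_len, target_len].

    Assumption: uniform integer sampling, as a stand-in for what
    sample_random_requests is described as doing in this issue.
    """
    rng = random.Random(seed)
    low = int(target_len * range_ratio)
    return [rng.randint(low, target_len) for _ in range(num_requests)]


def expected_mean(target_len, range_ratio):
    """Mean of a uniform distribution over [range_ratio * target_len, target_len]."""
    return target_len * (1 + range_ratio) / 2


target_len = 8192
range_ratio = 0.8

lengths = sample_lengths(target_len, range_ratio, num_requests=100_000)
observed_mean = sum(lengths) / len(lengths)

print(f"advertised length: {target_len}")
print(f"expected mean:     {expected_mean(target_len, range_ratio):.1f}")
print(f"observed mean:     {observed_mean:.1f}")
print(f"fraction of target: {observed_mean / target_len:.3f}")
```

Under the same assumption, setting range_ratio to 1.0 in this sketch makes every sampled length equal to target_len, so the average would match the advertised figure.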
