Benchmarked ISL and OSL average 0.9 * target_length, meaning results are over-optimistic #356

The `--random-range-ratio` argument is always passed to the underlying bench_serving script by the [wrapper in `benchmark_lib.sh`](https://github.com/InferenceMAX/InferenceMAX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/benchmark_lib.sh#L107). It defaults to 0.8 in [benchmark-tmpl.yml](https://github.com/InferenceMAX/InferenceMAX/blob/84320a0aadacae1114265b553830f48b56231817/.github/workflows/benchmark-tmpl.yml#L56) and [benchmark-multinode-tmpl.yml](https://github.com/InferenceMAX/InferenceMAX/blob/84320a0aadacae1114265b553830f48b56231817/.github/workflows/benchmark-multinode-tmpl.yml#L49) and is not overridden elsewhere. The argument is ultimately used in [sample_random_requests](https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L366) to sample input and output lengths between `range_ratio * {input,output}_len` and `{input,output}_len`.
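To make the effect concrete, here is a minimal sketch of that sampling step. It is not the code from bench_serving: the helper name `sample_lengths` is hypothetical, and it assumes lengths are drawn uniformly as integers from the range described above.

```python
import numpy as np

def sample_lengths(target_len, range_ratio, num_requests, seed=0):
    """Draw integer lengths uniformly from [range_ratio * target_len, target_len].

    Hypothetical helper for illustration; uniform sampling is an assumption.
    """
    rng = np.random.default_rng(seed)
    low = int(target_len * range_ratio)
    # The upper bound of rng.integers is exclusive, so add 1 to include target_len.
    return rng.integers(low, target_len + 1, size=num_requests)

lengths = sample_lengths(target_len=8192, range_ratio=0.8, num_requests=100_000)
mean_ratio = lengths.mean() / 8192
print(f"mean length: {lengths.mean():.0f} ({mean_ratio:.3f} of target)")
```

Under these assumptions the sample mean lands close to 0.9 of the target length, and no sampled request exceeds the target.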

As a result, the average lengths are ~90% of the advertised figure (the midpoint of the sampled range). For example, a workload advertised as having 8k input or output tokens (8192) will actually average ~7373 tokens. The throughput figures are [calculated using the actual input and output sequence lengths](https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L498), so they do match what was observed. But the overall consequence is:
* The reported end-to-end latency is misleading, as it represents the latency for shorter sequence lengths than advertised.
* As the cost of serving a query doesn't necessarily scale linearly (O(n)) with sequence length, the tokens/second and throughput figures will also be slightly better than if the input and output sequence lengths had averaged the advertised number of tokens for the workload.
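
The second point can be illustrated with a rough calculation. The sketch below assumes lengths are uniform on `[range_ratio * L, L]` (continuous approximation) and that some part of the serving cost grows as a power of the sequence length. A purely quadratic term is used only as an illustration, not as a measured cost model, and `mean_power_ratio` is a hypothetical helper.

```python
def mean_power_ratio(range_ratio, exponent):
    """Return E[n**exponent] / L**exponent for n uniform on [range_ratio * L, L].

    Closed form of the integral of x**k over [r, 1], divided by (1 - r).
    """
    r, k = range_ratio, exponent
    return (1 - r ** (k + 1)) / ((k + 1) * (1 - r))

linear = mean_power_ratio(0.8, 1)     # average token count relative to advertised
quadratic = mean_power_ratio(0.8, 2)  # average of an assumed n**2 cost term
# Tokens processed per unit of quadratic cost, relative to every request being length L.
relative_efficiency = linear / quadratic

print(f"tokens:              {linear:.3f} of advertised")
print(f"quadratic cost term: {quadratic:.3f} of advertised")
print(f"tokens per unit of quadratic cost: {relative_efficiency:.3f}x")
```

Under these assumptions the tokens processed drop to 0.9 of the advertised amount, while a quadratic cost term drops to roughly 0.81, so any superlinear component of the cost makes per-token figures look better than they would at the full advertised lengths.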
