
Token throughput per MW is described as reflecting generated tokens but is actually processed+generated tokens #293

@asb

Description


The only reference that provides clarity on the exact meaning of the reported results is the linked article introducing InferenceMAX.

The article states "Throughput is the rate at which each GPU can generate tokens (tok/s/gpu)".

In line with this, the performance-per-MW section consistently describes figures in terms of the GPU generating that number of tokens (i.e. output):

  • "...the MI355X is able to generate 2,550,000 token/s per all in provisioned MW"
  • "...H100 can generate 900,000 token/s per MW while a B200 can generate 2.8M token/s per MW"
  • "...GB200 NVL72 delivers an ~8x improvement in token/s generated per all-in provisioned MW"

These all indicate that the throughput results correspond to token generation alone, as opposed to the sum of token generation and prompt processing. But cross-referencing the JSON from a workflow run against the presented results, the figure being displayed appears to be tput_per_gpu (i.e. combined input and output throughput), which conflicts with what is documented. Personally I think updating the graphs to show the documented metric (generated tokens) would be most helpful and intuitive, but fixing the documentation is of course an alternative fix.
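For concreteness, here is a minimal Python sketch of the two possible readings of "token/s per provisioned MW". The field names and numbers are hypothetical, not InferenceMAX's actual schema; the only identifier taken from the real output is tput_per_gpu, which appears to be the combined figure:

```python
# Minimal sketch (not InferenceMAX code) of the two metrics at issue.
# Field names below are hypothetical; the real workflow JSON exposes
# tput_per_gpu, which appears to combine input and output throughput.

def tokens_per_mw(result: dict, gpus_per_mw: float) -> dict:
    """Compute both readings of 'token/s per all-in provisioned MW'."""
    generated = result["output_tok_per_s_per_gpu"]  # decode (generation) only
    processed = result["input_tok_per_s_per_gpu"]   # prefill (prompt processing)
    combined = generated + processed                # analogous to tput_per_gpu
    return {
        # What the article's wording describes:
        "generated_tok_per_s_per_mw": generated * gpus_per_mw,
        # What the graphs appear to actually plot:
        "combined_tok_per_s_per_mw": combined * gpus_per_mw,
    }

# Example with made-up numbers: 300 GPUs per provisioned MW.
print(tokens_per_mw(
    {"output_tok_per_s_per_gpu": 3000.0, "input_tok_per_s_per_gpu": 9000.0},
    gpus_per_mw=300.0,
))
```

With a long prompt and short completion, the two numbers can differ by several multiples, which is why the ambiguity matters when comparing hardware.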

In addition, I think it would be worth adding a glossary of terms with a short description of each metric on the page, both to avoid confusion and to keep the information needed to interpret the graphs close to where they are presented.
