Describe the bug
The input/output token throughput per GPU reported for disaggregated setups isn't wrong as such, but I don't think it's the right figure to plot alongside the same metric for standard multi-GPU setups. For disaggregated setups, the reported input and output throughput appear to be per prefill GPU and per decode GPU respectively, which is why the total throughput per GPU isn't the sum of the output and input figures. The graph tooltip also doesn't say how many GPUs are dedicated to prefill vs. decode.

The value that would be appropriate to plot alongside the non-disaggregated setups is the per-GPU output/input throughput obtained by summing the throughput of the respective workers and dividing by the total GPU count, i.e. the output/input throughput per GPU averaged across the whole cluster rather than only over the GPUs dedicated to prefill or decode. If I'm trying to compare which setup is better (and by how much), it's the per-GPU figure averaged over the whole GPU count that matters.
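To make that concrete, here is a minimal sketch of the calculation I have in mind. The function name is hypothetical, and it assumes (per the behaviour described above) that the currently reported figures are per decode GPU for output and per prefill GPU for input:

```python
def cluster_avg_per_gpu(output_per_decode_gpu: float,
                        input_per_prefill_gpu: float,
                        n_decode_gpus: int,
                        n_prefill_gpus: int) -> dict:
    """Re-average disaggregated per-GPU throughput over the whole cluster."""
    total_gpus = n_decode_gpus + n_prefill_gpus
    # Recover each worker pool's total throughput, then divide by the full GPU
    # count so the metric is comparable to non-disaggregated setups.
    output_per_gpu = output_per_decode_gpu * n_decode_gpus / total_gpus
    input_per_gpu = input_per_prefill_gpu * n_prefill_gpus / total_gpus
    return {
        "output_per_gpu": output_per_gpu,
        "input_per_gpu": input_per_gpu,
        "total_per_gpu": output_per_gpu + input_per_gpu,
    }
```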
To Reproduce
Steps to reproduce the behavior:
- Go to inferencemax.semianalysis.com
- View results including a disaggregated config, e.g. R1 FP8 1k/1k
- Switch the y-axis metric between the different token throughput metrics
Expected behavior
See the issue description above.
Specifically, looking at the R1 FP8 GB200 NVL72 conc=4096 1k/1k results, we currently get:
- total per GPU: 3822.27
- output per GPU: 2867.344
- input per GPU: 5732.173
I believe the more comparable metric would be:
- total per GPU: 3822.27
- output per GPU: 2867.344 * (48/72) = 1911.56
- input per GPU: 5732.173 * (24/72) = 1910.72
Now, just as with the non-disaggregated setups, the total throughput per GPU is the sum of the output and input per-GPU figures, and it is representative of the average across the whole cluster. Ideally, the tooltip for the input/output per GPU figures on disaggregated datapoints would include a footnote clarifying how the value was averaged, such as:
Input token throughput per GPU*: 1911.56
*: Averaged across the whole 72-GPU cluster
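Plugging the GB200 NVL72 numbers above into the sketch from the bug description (assuming 48 decode GPUs and 24 prefill GPUs out of 72 total, which is how I read the 48/72 and 24/72 factors):

```python
metrics = cluster_avg_per_gpu(output_per_decode_gpu=2867.344,
                              input_per_prefill_gpu=5732.173,
                              n_decode_gpus=48,
                              n_prefill_gpus=24)
# metrics["output_per_gpu"] ≈ 1911.56
# metrics["input_per_gpu"]  ≈ 1910.72
# metrics["total_per_gpu"]  ≈ 3822.29, matching the reported total per GPU up to rounding
```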
For completeness, let's look at a datapoint from a non-disaggregated setup. Taking the mi325x conc=64 figures for example:
- mi325x conc=64
  - total per GPU: 412.874 (= sum of output and input per GPU)
  - output per GPU: 206.483
  - input per GPU: 206.39
Screenshots
Desktop (please complete the following information):
N/A