Open
Description
process_result.py calculates interactivity metrics from the time per output token (tpot) metrics:
for key, value in bmk_result.items():
    if key.endswith('ms'):
        data[key.replace('_ms', '')] = float(value) / 1000.0
    if 'tpot' in key:
        data[key.replace('_ms', '').replace('tpot', 'intvty')] = 1000.0 / float(value)

This is incorrect for the standard deviation: the reciprocal of a standard deviation is not the standard deviation of the reciprocal. When the tpot standard deviation is low, we'll produce a huge standard deviation of interactivity. Here's an example from a recent run, where std_intvty comes out roughly ten times larger than mean_intvty:
{
  "hw": "gb200",
  "tp": 36,
  "ep": 1,
  "dp_attention": "false",
  "conc": 512,
  "model": "/mnt/lustre01/models/deepseek-r1-0528-fp4-v2",
  "framework": "dynamo-trtllm",
  "precision": "fp4",
  "isl": 1024,
  "osl": 1024,
  "tput_per_gpu": 2423.747715243624,
  "output_tput_per_gpu": 1363.5939611144877,
  "input_tput_per_gpu": 10904.977748276717,
  "disagg": true,
  "num_prefill_gpu": 4,
  "num_decode_gpu": 32,
  "mtp": "on",
  "mean_ttft": 1.2919637345420085,
  "median_ttft": 0.4378697440261021,
  "std_ttft": 2.1574805073166776,
  "p99_ttft": 9.712858405703447,
  "mean_tpot": 0.009590339660491956,
  "mean_intvty": 104.27159364538116,
  "median_tpot": 0.009667698638864516,
  "median_intvty": 103.43723334320352,
  "std_tpot": 0.0009682636562182605,
  "std_intvty": 1032.7765516942895,
  "p99_tpot": 0.011633036075231341,
  "p99_intvty": 85.96208191335067,
  "mean_itl": 0.4402040011098412,
  "median_itl": 0.4408050079946406,
  "std_itl": 0.0884514747228438,
  "p99_itl": 0.8289496807963588,
  "mean_e2el": 10.116033398757338,
  "median_e2el": 9.590883667988237,
  "std_e2el": 2.3359472701507817,
  "p99_e2el": 18.67461088597775
},
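To illustrate the size of the error, here is a minimal sketch on synthetic data. It assumes per-request tpot values are roughly normally distributed around the mean and standard deviation from the run above. That distribution is an assumption made for illustration, not something the benchmark harness reports. The sketch compares the reciprocal currently emitted with the standard deviation of the per-request interactivity values, and with a first-order (delta method) approximation, std(1000 / x) ≈ 1000 * std(x) / mean(x)^2.

```python
import random
import statistics

random.seed(0)

# Synthetic per-request tpot samples in milliseconds (assumed normal;
# the real per-request distribution is not available in the issue).
tpot_ms = [random.gauss(9.59, 0.97) for _ in range(10_000)]

# Per-request interactivity in tokens per second.
intvty = [1000.0 / t for t in tpot_ms]

mean_tpot = statistics.fmean(tpot_ms)
std_tpot = statistics.pstdev(tpot_ms)

# What the current loop emits as std_intvty: the reciprocal of std_tpot.
naive_std_intvty = 1000.0 / std_tpot

# Standard deviation of the interactivity samples themselves.
actual_std_intvty = statistics.pstdev(intvty)

# First-order approximation from the tpot summary statistics alone.
approx_std_intvty = 1000.0 * std_tpot / mean_tpot ** 2

print(f"naive:  {naive_std_intvty:.1f}")
print(f"actual: {actual_std_intvty:.1f}")
print(f"approx: {approx_std_intvty:.1f}")
```

Under these assumptions the naive value is far larger than the other two, which agree closely with each other. The approximation is only reasonable when the spread is small relative to the mean, so it is not a substitute for a figure computed by the harness from the raw samples.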
Unless you're able to get the figure from the benchmark harness, the simplest fix is probably to avoid calculating and emitting that field altogether.
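A minimal sketch of that fix, assuming the input keys look like `std_tpot_ms` (inferred from the output field names above). The function name `convert_metrics` and the sample dictionary are hypothetical and only wrap the loop from the description so it can run on its own.

```python
def convert_metrics(bmk_result):
    """Hypothetical wrapper around the loop quoted in the description."""
    data = {}
    for key, value in bmk_result.items():
        if key.endswith('ms'):
            data[key.replace('_ms', '')] = float(value) / 1000.0
        # Skip std_* keys: the reciprocal of a standard deviation is not
        # the standard deviation of interactivity, so emit nothing for it.
        if 'tpot' in key and not key.startswith('std_'):
            data[key.replace('_ms', '').replace('tpot', 'intvty')] = 1000.0 / float(value)
    return data


# Sample values in milliseconds, derived from the run quoted above.
sample = {
    "mean_tpot_ms": 9.590339660491956,
    "std_tpot_ms": 0.9682636562182605,
}
result = convert_metrics(sample)
print(sorted(result))
```

With this guard, `std_tpot` is still converted to seconds, but no `std_intvty` field is produced.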