[2025-04-08 23:08:18 TP0] Prefill batch. #new-seq: 1, #new-token: 3296, #cached-token: 1, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-08 23:08:22 TP0] Decode batch. #running-req: 1, #token: 3330, token usage: 0.00, gen throughput (token/s): 1.28, #queue-req: 0,
[2025-04-08 23:08:46 TP0] Prefill batch. #new-seq: 1, #new-token: 6958, #cached-token: 3292, token usage: 0.00, #running-req: 0, #queue-req: 0,
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [20,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: operator(): block: [20: block: [20,0,0,0,0], thread: [10], thread: [57,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [58], thread: [11,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [59], thread: [12,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [60], thread: [13,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [14,0,0,0], thread: [61] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [20:93,0: operator(),0: block: [20], thread: [15,0,0,0,0], thread: [62] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0,0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(): block: [20,0,0,0,0], thread: [16], thread: [63,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0,0], thread: [64], thread: [17,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [65], thread: [18,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [66], thread: [19,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [67], thread: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [21], thread: [68,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [22,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [23,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0,0: block: [20], thread: [24,0,0,0,0], thread: [71] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [20:93,0: operator(),0: block: [20], thread: [25,0,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [26,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [20: operator(),0: block: [20,0,0], thread: [27,0,0], thread: [74,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [28], thread: [75,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [29], thread: [76,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [30], thread: [77,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [20,0,0,0,0], thread: [31], thread: [78,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [20,0,0], thread: [64,0,0], thread: [79,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0], thread: [65: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [80` failed.
,0,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [20], thread: [66,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [81` failed.
,0,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0], thread: [67: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [82` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [68: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [83` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [69: block: [20,0,0,0,0], thread: [84] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21,0: operator(),0: block: [20], thread: [70,0,0,0,0], thread: [85] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [20], thread: [71,0,0,0,0], thread: [86] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [20], thread: [72,0,0,0,0], thread: [87] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0], thread: [73: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [88` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [74: block: [20,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [89` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [75: block: [20,0,0,0,0], thread: [90] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [20], thread: [76,0,0,0,0], thread: [91] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [20,0,0], thread: [77,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): operator(): block: [21: block: [20,0,0,0,0], thread: [78,0], thread: [93,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [21: block: [20,0,0,0,0], thread: [94], thread: [79,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93:93: operator(): operator(): block: [20: block: [21,0,0,0,0], thread: [95], thread: [80,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [81,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [82,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(): block: [23,0,0,0,0], thread: [83], thread: [98,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: operator(): block: [21: block: [23,0,0,0,0], thread: [84], thread: [99,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [85,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [86,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [87,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [88,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu` failed.
:93../aten/src/ATen/native/cuda/IndexKernel.cu: operator():93: block: [21: operator(),0: block: [23,0,0], thread: [89,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"../aten/src/ATen/native/cuda/IndexKernel.cu:93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [23], thread: [90,0,0,0,0], thread: [105] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds":93` failed.
: operator()../aten/src/ATen/native/cuda/IndexKernel.cu: block: [21:93,0: operator(),0: block: [23], thread: [91,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [106` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [92: block: [23,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds",0` failed.
], thread: [107,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
: block: [21../aten/src/ATen/native/cuda/IndexKernel.cu,0:93,0: operator()], thread: [93: block: [23,0,0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"], thread: [108` failed.
,0../aten/src/ATen/native/cuda/IndexKernel.cu,0:93] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds": operator()` failed.
...
Error log is too long so I just included the first few lines. Let me know if more info is needed!
This server can handle short prompts around 3k tokens without any problems, but once I send a request of 10k input tokens, it crashes immediately.
Checklist
Describe the bug
I built the latest SGLang docker image from head and used it to serve the
meta-llama/Llama-4-Scout-17B-16E-Instructmodel on an 8xH100 machine, sending requests to the/chat/completionsendpoint. It handles requests of 3k input tokens without any problems, but once I send a request of ~10k input tokens, it crashes with a CUDA assertion error. Once I remove the--attention-backend=fa3flag, the long requests can be served successfully.Error log is too long so I just included the first few lines. Let me know if more info is needed!
Reproduction
Build a docker from head and run the server:
This server can handle short prompts around 3k tokens without any problems, but once I send a request of 10k input tokens, it crashes immediately.
Environment