Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 184, in exposed_step
    self.forward_step()
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 211, in forward_step
    self.forward_decode_batch(self.running_batch)
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 505, in forward_decode_batch
    next_token_ids, _ = batch.sample(logits)
                        ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/infer_batch.py", line 476, in sample
    sampled_index = torch.multinomial(probs_sort, num_samples=1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
I am unsure whether this is a compatibility issue in sglang or in flashinfer 0.0.3.
Using flashinfer 0.0.3 requires the one-line change in #282, but there is a compatibility issue: the same model runs fine on 0.0.2, while under 0.0.3 sglang throws the traceback above in an infinite loop.
@merrymercy @yzh119