From CUDA Graph Crash to Tensor Core: Tracing a vLLM INT8 Bug
01. Intro

While deploying an INT8-quantized LLM with vLLM on an NVIDIA A100 GPU, I ran into the following error during startup (tracked as pytorch/ao#2376): RuntimeError: self.size(0) needs to be greater than 16, but g...