We need more tests for this (it's a 5090).
- Understand what is being tested right now.
- What do we need to add?

Ideas (E2E) (small models):
- Attention (MLA, MHA, GDN models), MoE, quantization (FP8, NVFP4)
- https://huggingface.co/nvidia/Llama-3.1-8B-Instruct-NVFP4