[CI] Add more tests to `stage-b-test-small-1-gpu` (SM120)

This stage needs more test coverage (the runner is an RTX 5090).

- Understand what is being tested right now
- Identify what needs to be added

Ideas for E2E tests (small models):
- Attention (MLA, MHA, GDN models), MoE, quantization (FP8, NVFP4)
- NVFP4 candidate: https://huggingface.co/nvidia/Llama-3.1-8B-Instruct-NVFP4
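
To make the gaps easier to track, the ideas above could be written down as a small coverage matrix. This is only a sketch: `E2ECase`, `CASES`, and `missing_models` are hypothetical names, not part of the existing CI, and the only model ID filled in is the NVFP4 one linked above. Every other entry is left as `None` until a small model is picked.

```python
from dataclasses import dataclass
from typing import Optional

# Attention variants taken from the ideas list above.
ATTENTION = ("MLA", "MHA", "GDN")


@dataclass(frozen=True)
class E2ECase:
    feature: str           # e.g. "attention:MLA"
    model: Optional[str]   # Hugging Face model ID; None means "not chosen yet"


# Hypothetical registry. Only the NVFP4 model comes from this issue;
# the rest are placeholders, not recommendations.
CASES = [
    *(E2ECase(f"attention:{name}", None) for name in ATTENTION),
    E2ECase("moe", None),
    E2ECase("quantization:FP8", None),
    E2ECase("quantization:NVFP4", "nvidia/Llama-3.1-8B-Instruct-NVFP4"),
]


def missing_models(cases):
    """Return the features that still need a small model assigned."""
    return [case.feature for case in cases if case.model is None]


if __name__ == "__main__":
    print("Still need a model for:", ", ".join(missing_models(CASES)))
```

Once each entry has a model, the same list could drive a parametrized test in `stage-b-test-small-1-gpu`, assuming the chosen models fit in the 5090's memory.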