probably with lm-eval
- create a PoC to experiment with gsm8k on H100
- write a design doc on how to integrate eval quality cleanly into the codebase and how to support all GPU SKUs, keeping in mind that we want to support multiple types of evals (for this issue, just integrate gsm8k)
- implement the eval quality workflows
@Oseltamivir to work on this, @cquil11 to help