add GSM8k eval quality benchmark CI #123

@functionstackx

Probably using lm-eval.

  1. Create a PoC to experiment with GSM8K on H100.
  2. Write a design doc on how to integrate eval quality cleanly into the codebase. Keep in mind that we want to support multiple types of evals, but for this issue, just integrate GSM8K. The doc should also cover how to support all GPU SKUs.
  3. Implement the eval quality workflows.

@Oseltamivir to work on this, @cquil11 to help
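
As a starting point for the PoC and design doc, here is a rough sketch of how an eval could be described once and reused across evals and GPU SKUs. Everything in it is an assumption for discussion, not a decided design: `EvalSpec`, `EVALS`, `build_lm_eval_command`, and `check_results` are hypothetical names, the `0.80` threshold is a placeholder, the `hf` backend is just one option, and the `exact_match,strict-match` metric key is what recent lm-eval versions appear to report for `gsm8k` (worth confirming against the PoC output).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSpec:
    """Describes one eval quality benchmark (all values are illustrative)."""
    task: str             # lm-eval task name
    metric: str           # key inside lm-eval's per-task results dict (assumed)
    min_score: float      # placeholder pass threshold, not a measured baseline
    num_fewshot: int = 5


# Hypothetical registry: adding another eval later means adding one entry here.
EVALS = {
    "gsm8k": EvalSpec(
        task="gsm8k",
        metric="exact_match,strict-match",
        min_score=0.80,
    ),
}


def build_lm_eval_command(spec, model_args, output_path, backend="hf"):
    """Build the lm_eval CLI invocation as an argument list for subprocess.run."""
    return [
        "lm_eval",
        "--model", backend,
        "--model_args", model_args,
        "--tasks", spec.task,
        "--num_fewshot", str(spec.num_fewshot),
        "--batch_size", "auto",
        "--output_path", output_path,
    ]


def check_results(spec, results):
    """Return (passed, score) for a parsed lm-eval results dict."""
    score = results["results"][spec.task][spec.metric]
    return score >= spec.min_score, score
```

A CI job per GPU SKU could then run the same command on its own runner and call `check_results` on the parsed output, so the only SKU-specific part is which runner the job lands on (and possibly a per-SKU threshold, if the PoC shows that scores differ).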
