
Re-implement and refactor the verifier. #7724

Merged
ysiraichi merged 16 commits into master from ysiraichi/refactor-verifier on Jul 24, 2024



@ysiraichi
Collaborator

This PR re-implements the existing verifier, making the following improvements:

  • Enabling verification of more models in inference
    • Previously, the verifier expected models to return a single output tensor
  • Enabling verification of training
    • Fix a plumbing bug in TorchBenchModel.train
    • Run training for a few iterations
  • Model cleanup
    • Delete each model after use so that we don't run out of memory on large models
  • Use PyTorch functions to check whether the accuracy is acceptable
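For illustration only, here is a minimal sketch of how nested model outputs (tuples, lists, dicts of tensors) could be compared using PyTorch functions instead of assuming a single output tensor. `outputs_match`, its default tolerance, and the `1 - tol` cosine-similarity threshold are assumptions for this sketch, not the code in this PR; the PR configures the per-model values through `tolerance()` and `use_cosine_similarity`.

```python
import torch
import torch.nn.functional as F


def outputs_match(expected, actual, tol=1e-4, use_cosine_similarity=False):
    """Hypothetical helper: recursively compare (possibly nested) model outputs."""
    if isinstance(expected, torch.Tensor):
        if not isinstance(actual, torch.Tensor) or expected.shape != actual.shape:
            return False
        if expected.numel() == 0:
            return True
        # Compare in fp64 so an fp64 eager reference does not lose precision.
        expected = expected.detach().to(torch.float64)
        actual = actual.detach().to(torch.float64)
        if use_cosine_similarity:
            sim = F.cosine_similarity(expected.flatten(), actual.flatten(), dim=0)
            return bool(sim >= 1 - tol)
        return torch.allclose(expected, actual, rtol=tol, atol=tol)
    if isinstance(expected, (list, tuple)):
        return (
            isinstance(actual, (list, tuple))
            and len(expected) == len(actual)
            and all(
                outputs_match(e, a, tol, use_cosine_similarity)
                for e, a in zip(expected, actual)
            )
        )
    if isinstance(expected, dict):
        return (
            isinstance(actual, dict)
            and expected.keys() == actual.keys()
            and all(
                outputs_match(expected[k], actual[k], tol, use_cosine_similarity)
                for k in expected
            )
        )
    # Non-tensor leaves (None, ints, strings) must match exactly.
    return expected == actual
```

Recursing over containers is what lets models with several outputs be verified; the cosine-similarity path is a looser check that tolerates small elementwise drift as long as the outputs point in the same direction.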

In order to do so, here's a summary of the changes:

  • Introduce BenchmarkModel methods: tolerance(), use_cosine_similarity(), and skip_verifier()
    • The logic is taken from PyTorch's benchmark scripts, which read the same torchbench.yaml
  • Introduce force_dtype parameter when loading a model
    • So that we can run models on eager fp64
  • More meaningful verification codes
  • Move reset_rng_state and cleanup to util.py
  • Change how we access the YAML configuration file, replacing the raw strings
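A rough sketch of the kind of helpers described above: resetting RNG state, freeing memory after a model is deleted, and building an eager fp64 reference. The names `reset_rng_state` and `cleanup` mirror the helpers moved to `util.py`, but their bodies here, the seed value, and `fp64_copy` (a hypothetical stand-in for loading with `force_dtype=torch.float64`) are assumptions, not the PR's implementation.

```python
import copy
import gc
import random

import torch


def reset_rng_state(seed: int = 1337) -> None:
    """Seed Python and PyTorch RNGs so repeated runs see identical inputs and dropout masks.

    The seed value is an arbitrary assumption.
    """
    random.seed(seed)
    torch.manual_seed(seed)  # seeds the RNG on all devices, CUDA included


def cleanup(cuda: bool) -> None:
    """Reclaim memory from models the caller has already deleted with `del`."""
    gc.collect()
    if cuda and torch.cuda.is_available():
        # Release unoccupied cached memory held by the CUDA caching allocator.
        torch.cuda.empty_cache()


def fp64_copy(model: torch.nn.Module) -> torch.nn.Module:
    """Hypothetical stand-in for force_dtype=torch.float64: an eager fp64 reference copy."""
    return copy.deepcopy(model).to(torch.float64)
```

Calling `reset_rng_state()` before each run makes the eager reference and the run under test draw the same random inputs and dropout masks, so any remaining difference comes from the backend rather than from randomness.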

cc @miladm @zpcore

ysiraichi requested review from miladm and zpcore on July 23, 2024 00:27
ysiraichi force-pushed the ysiraichi/refactor-verifier branch from 25942f7 to 97402f6 on July 23, 2024 00:30
@zpcore
Member

zpcore commented Jul 23, 2024

Thanks for adding the verifier!

