Compare optimized models vs. transformers models #194
mfuntowicz merged 31 commits into huggingface:main from
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@lhoestq jfyi, about the latency/throughput measurement, here's what I got: https://github.com/fxmarty/optimum/blob/a111cfee49afc9bed68e18865442f2454d2556c3/optimum/runs_base.py#L172-L242 . Borrowed from https://github.com/huggingface/tune
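For readers who don't want to follow the permalink, here is a minimal sketch of a latency/throughput loop along the lines of the one linked above; `model` and `inputs` are placeholders (e.g. an `ORTModel` and a batch of tensors), and the names are illustrative rather than the actual `optimum/runs_base.py` implementation.

```python
import time

def benchmark(model, inputs, warmup=5, runs=50):
    """Measure average latency (ms) and throughput (inferences/s)."""
    for _ in range(warmup):  # warmup passes, not timed
        model(inputs)
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        model(inputs)
        latencies.append(time.perf_counter() - start)
    avg_latency_ms = 1000 * sum(latencies) / len(latencies)
    throughput = len(latencies) / sum(latencies)  # inferences per second
    return avg_latency_ms, throughput
```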
(force-pushed from bf9bd2e to 2a120e4)
docs/source/benchmark.mdx (outdated):

    ## RunConfig

    [[autodoc]] optimum.utils.runs.RunConfig
To generate these docs, you'll need to:
- add `pydantic` to the required deps
- update the `__init__.py` file under `utils`

For the second point, this works (if you also exclude `optimum.runs_base.Run`):

    from .runs import RunConfig, Calibration, DatasetArgs, TaskArgs
    from .preprocessing.base import DatasetProcessing
Actually the first point was enough, and the doc for `optimum.runs_base.Run` is generated correctly as well.
One doubt, though: is adding an additional dependency a good idea? I think keeping `install_requires` to the bare minimum is best?
Generally, we try to keep external dependencies to a minimum, so ideally we would drop `pydantic` as a requirement if possible.
If not, you could add a new extras dep, e.g. `benchmarks`, that users can install with `pip install optimum[benchmarks]`.
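For illustration, an extras group like this is declared in `setup.py` via `extras_require`; the package lists below are hypothetical, not the actual optimum configuration.

```python
# Hypothetical setup.py fragment: core deps stay minimal, benchmark-only
# deps move into an optional extras group.
install_requires = ["transformers"]

extras_require = {
    # installed with: pip install optimum[benchmarks]
    "benchmarks": ["pydantic", "datasets"],
}
```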
Understood. For now I went with your latter suggestion due to time constraints; in a follow-up PR I will replace `pydantic` with dataclasses altogether.
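To make the planned migration concrete, here is a sketch of what a dataclass-based config could look like once `pydantic` is dropped; the field names are illustrative, not the real `RunConfig` schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RunConfig:
    # hypothetical fields for illustration only
    model_name_or_path: str
    task: str
    batch_sizes: List[int] = field(default_factory=lambda: [8])
    max_length: Optional[int] = None

cfg = RunConfig(model_name_or_path="bert-base-uncased", task="text-classification")
```

Unlike `pydantic`, stdlib dataclasses do no runtime type validation, but they remove the external dependency entirely.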
Feedback welcome, notably on the design, code quality, etc.
This PR aims at introducing a unified way to benchmark transformers vs. optimized models that is backend-independent (any backend can be plugged in for inference and evaluation) and code-free (the user does not need to write code to start runs and evaluate them).
The main contribution is to introduce helper classes and methods for data preprocessing, inference, and evaluation, spread over several files:

* `optimum/runs_base.py`: general methods; this should be backend-agnostic.
* `optimum/utils/preprocessing/`: handles loading and preprocessing datasets, running inference with pipelines, and running evaluation. This should be backend-agnostic.
* `optimum/onnxruntime/runs/`: ONNX Runtime specific methods.

For now, dataset preprocessing and evaluation are task-specific; the supported tasks are:

* `text-classification`
* `token-classification`
* `question-answering`

As for evaluation of transformers models, I believe there is some duplicate work with what exists in the AutoTrain backend and what is being done in https://github.com/huggingface/evaluate. However, since my understanding is that it is not a priority to support Optimum-based inference within AutoTrain, it makes sense to me to have a common implementation to evaluate transformers/optimized models so that they are comparable. I hope we can minimize duplicate efforts.
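The backend-agnostic/backend-specific split described above can be sketched roughly as follows; this is an illustrative simplification, not the actual optimum API, and all class and method names here are hypothetical.

```python
from abc import ABC, abstractmethod

class Run(ABC):
    """Backend-agnostic run: owns the config and the evaluation loop."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def _load_model(self):
        """Each backend plugs in its own model loading/inference here."""

    def evaluate(self, examples):
        # generic evaluation loop over whatever backend was plugged in
        model = self._load_model()
        return [model(x) for x in examples]

class OnnxRuntimeRun(Run):
    def _load_model(self):
        # a real backend would create an ONNX Runtime InferenceSession here;
        # the identity function stands in for a model for illustration
        return lambda x: x
```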
I used pipelines for inference for the general metrics, and `ORTModel.forward()` to measure latency/throughput.

Tasks before (or after) merge:

* `Trainer.evaluate()` instead of an explicit loop for evaluation --> I think it doesn't, there is a lot of abstraction in pipelines already, we should make use of it.
* `train-eval-index` metadata from datasets to auto-infer data, label columns (see e.g. Autoeval config datasets#4234) (left to next PR)
* optimum/onnxruntime/modeling_ort.py, lines 323 to 327 in cf91bd7
This, with some additional work, should close #128.