Now that tight Hub integration is coming via #113, it could be useful to implement a simple benchmarking suite that allows users to (a rough sketch of the workflow follows the list):
- Select a dataset on the Hub
- Select a metric on the Hub
- Select N models (these could already be optimised models)
- Optimise the models (if needed)
- Report a table of results comparing the gains in latency and impact on the model metric
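Purely to make the workflow concrete, here is a minimal self-contained sketch of what the user-facing surface could look like. None of these names (`BenchmarkConfig`, `run_benchmark`) exist in the library today; they are hypothetical placeholders for the steps above.

```python
# Hypothetical sketch only — these names are illustrative, not an existing API.
from dataclasses import dataclass, field


@dataclass
class BenchmarkConfig:
    dataset_id: str                      # dataset on the Hub
    metric_id: str                       # metric on the Hub
    model_ids: list = field(default_factory=list)  # N models, possibly already optimised


def run_benchmark(config: BenchmarkConfig) -> list:
    """Optimise each model if needed, evaluate it, and collect one result row per model."""
    rows = []
    for model_id in config.model_ids:
        # Placeholder values — the real suite would measure latency and the metric here.
        rows.append({"model": model_id, "latency_ms": None, "metric": None})
    return rows


results = run_benchmark(
    BenchmarkConfig(dataset_id="glue", metric_id="accuracy",
                    model_ids=["model-a", "model-b"])
)
for row in results:
    print(row)
```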
As a first step, we might simply benchmark latency with dummy inputs at various sequence lengths, etc.
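For that first step, something along these lines could work — a minimal latency measurement over dummy inputs. The model name, warmup/run counts, and sequence lengths here are illustrative choices, not part of the proposal.

```python
# Minimal latency benchmark with dummy inputs at several sequence lengths.
import time

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # illustrative model choice
model.eval()

batch_size = 1
for seq_len in (8, 32, 128, 512):
    dummy = torch.randint(1000, (batch_size, seq_len))  # random token ids
    with torch.no_grad():
        # Warmup runs so one-time costs don't skew the measurement.
        for _ in range(3):
            model(dummy)
        runs = 10
        start = time.perf_counter()
        for _ in range(runs):
            model(dummy)
        latency_ms = (time.perf_counter() - start) / runs * 1000
    print(f"seq_len={seq_len:>4}: {latency_ms:.1f} ms/forward")
```

The same loop could later be extended to compare an optimised model against its baseline, which would feed directly into the results table described above.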