Motivation

This project aims to decompose MLPerf inference benchmarking into a database of reusable, portable, customizable and deterministic scripts with a unified CLI, a common Python API and extensible JSON/YAML meta descriptions, using the 2nd generation of the CK framework.
The first goal is to simplify the development of this benchmark, make it easier to extend and run across continuously changing ML tasks, models, data sets, engines, software and hardware, and automate all the manual steps of the submission process.
The second goal is to enable automatic and continuous design space exploration of ML systems across all ML tasks, models, data sets, engines, libraries and platforms based on MLPerf loadgen, and the selection of Pareto-optimal configurations based on user constraints (latency, throughput, accuracy, energy, model size, memory usage, device cost, etc.).
The third goal is to show researchers and engineers that they can reuse portable ML scripts (to detect, download and install models, data sets, engines, libraries and tools) in their own research projects to avoid reinventing the wheel and to benefit from the solid MLPerf benchmarking methodology.
Technology

This project is based on the CK2 automation framework and on our practical experience reproducing 150+ ML and systems papers and automating MLPerf inference submissions:
The CM framework (the 2nd generation of the CK framework, aka CK2) is used to organize ML projects as a database of reusable and portable components (tasks, models, datasets, engines, libraries, hardware descriptions): GitHub, motivation paper.
The CM automation called "script" is used to wrap native scripts with a unified CLI, Python API and JSON/YAML meta descriptions carrying a unique ID, a list of tags, dependencies on other CM scripts and any other information required to make an ad-hoc script reusable, portable, customizable and deterministic: Python automation code. A minimal usage sketch follows this list.
CM scripts automate the detection, download, installation and pre/post-processing of all ML artifacts required to run any ML task on any platform, natively or inside containers (models, data sets, engines, libraries, tools, ...): GitHub with current scripts (under community development).
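For illustration, here is a minimal sketch of driving the same CM script from the unified CLI and from the Python API. It assumes the cmind package is installed (e.g. via pip); the exact argument keys may differ slightly across CM versions, so treat this as a sketch rather than a definitive interface.

```python
# Unified CLI form of the same action (shell command):
#   cm run script --tags=app,image-classification,onnx,python --quiet
#
# Equivalent call through the CM Python API (sketch; key names assume
# a recent cmind release and may change between versions):
import cmind

r = cmind.access({
    'action': 'run',                                  # CM automation action
    'automation': 'script',                           # the "script" automation
    'tags': 'app,image-classification,onnx,python',   # tags selecting the CM script
    'quiet': True                                     # accept default choices
})

if r['return'] > 0:
    # CM returns a dictionary; a non-zero 'return' code carries an error message
    raise RuntimeError(r.get('error', 'CM script failed'))
```

The same tags identify a script from both interfaces, which is what lets its JSON/YAML meta description (unique ID, tags, dependencies) serve as the single source of truth for reuse and portability.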
See the CM tutorials to learn more about reusable CM scripts and the CM database format for ML projects. This project is part of our CM (CK2) roadmap development for 2022.

Feedback

Please add yourself or get in touch if you would like to provide your feedback!
Tasks and timeline
Q3 2022
Develop and stabilize CM core to treat R&D projects as a database of components and automations
Develop CM scripts to detect and/or install all ML artifacts (platform description, OS scripts, ML frameworks, models, data sets, libraries, pre-/post-processing scripts, benchmarks, etc.): List
Replicate the CK-based MLPerf inference v1.1 submission with out-of-the-box image classification using ImageNet, ONNX and a cloud platform via CM scripts: original study
Prepare the GCP n2-standard-80 platform that we used for the previous submission
Test the out-of-the-box CK2 (CM) workflow to run image classification: cm run script --tags=app,image-classification,onnx,python --quiet
Convert the outdated CK automations for preparing and submitting MLPerf inference benchmark results into CM scripts: GitHub with CK automations
Prepare a dummy reproducibility report similar to the one above
Describe how to install CK with minimal system dependencies
Add CM scripts to detect host and target platforms
Add CM scripts to install system dependencies
Add CM scripts to prepare Python virtual env
Add CM scripts to activate Python virtual env
Add CM scripts to get and build MLPerf inference src
Add CM scripts to get and build MLPerf loadgen
Add CM scripts to get MLPerf inference v2.1 submission repo
Add CM scripts to detect/install ImageNet
Add CM scripts to install ResNet-50
Add CM scripts to run the Offline image classification scenario (a minimal loadgen harness sketch follows this task list)
Accuracy
Performance
Add CM scripts to describe submitter
Add CM scripts to describe platform
Add CM scripts to run full benchmark
Add CM scripts to validate submission
Add CM scripts to truncate results
Add CM scripts to pack results
Add CM scripts to run 3 other scenarios
Replicate the CK-based MLPerf inference v1.1 submission with out-of-the-box image classification using ImageNet, TVM and a cloud platform via CM scripts: original study
Add/test CM script to build LLVM with dependencies
Add/test CM script to build DNNL
Add/test CM script to build TVM with the required LLVM and DNNL
Add/test the TVM backend for MLPerf image classification (see the TVM compilation sketch after this task list)
Reproduce results from MLPerf v1.1 and test with MLPerf v2.1
Prepare test submission to MLPerf inference v2.1 to evaluate the use of the CM (CK2) automation framework for the MLPerf inference benchmark
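To make the accuracy and performance runs above concrete, here is a minimal sketch of the kind of loadgen-driven harness that the CM scripts wrap. It assumes the mlperf_loadgen Python bindings built by the loadgen step above; the API names follow the reference Python harness in the MLPerf inference repository and may differ slightly between loadgen versions, and predict_batch is a hypothetical stand-in for the actual model backend.

```python
import mlperf_loadgen as lg

TOTAL_SAMPLES = 50000   # e.g. ImageNet validation set size
PERF_SAMPLES = 1024     # samples kept in memory for performance runs

def load_samples(indices):      # called by loadgen before issuing queries
    pass                        # e.g. preload and preprocess images into RAM

def unload_samples(indices):    # called by loadgen after the run
    pass

def issue_queries(query_samples):
    # Run inference for each sample and report completions back to loadgen.
    responses = []
    for qs in query_samples:
        # predict_batch() is a hypothetical backend call (ONNX Runtime, TVM, ...)
        # result = predict_batch(qs.index)
        responses.append(lg.QuerySampleResponse(qs.id, 0, 0))
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline   # one of the MLPerf scenarios
settings.mode = lg.TestMode.PerformanceOnly   # or lg.TestMode.AccuracyOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(TOTAL_SAMPLES, PERF_SAMPLES, load_samples, unload_samples)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```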
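For the TVM-based replication, the backend essentially compiles the ONNX model into a TVM module before it is plugged into a harness like the one above. A rough sketch, assuming a TVM build with LLVM enabled as in the tasks above; the model path, input tensor name and shapes are illustrative placeholders rather than the actual values used in the submission:

```python
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Illustrative placeholders: model file and input name depend on the actual model
onnx_model = onnx.load("resnet50.onnx")
shape_dict = {"input": (1, 3, 224, 224)}

# Import the ONNX graph into Relay and compile it for the local CPU via LLVM
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Run one inference through the compiled module
dev = tvm.cpu()
module = graph_executor.GraphModule(lib["default"](dev))
# module.set_input("input", preprocessed_image)   # NCHW float32 batch
module.run()
output = module.get_output(0).numpy()
```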
Q4 2022 / Q1 2023