crsegerie/automl

 
 


Kili AutoML

AutoML is a lightweight library to create ML models in a data-centric AI way:

  1. Label on Kili
  2. Train a model with AutoML and evaluate its performance in one line of code
  3. Push predictions to Kili to accelerate the labeling in one line of code
  4. Prioritize labeling on Kili so that the data most likely to improve your model is labeled first

Iterate.

Once you are satisfied with the performance, serve the model in one line of code and monitor it while keeping a human in the loop with Kili.

Installation

git clone https://github.com/kili-technology/automl.git
cd automl
git submodule update --init

Then install the dependencies:

pip install -r requirements.txt -r utils/ultralytics/yolov5/requirements.txt

Usage

We made AutoML very simple to use. The main methods are:

Train a model

python train.py \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID

Retrieve the annotated data from the project and, for each task, specialize the best model among the following:

  • Hugging Face (NER, Text Classification)
  • YOLOv5 (Object Detection)
  • spaCy (coming soon)
  • Simple Transformers (coming soon)
  • Catalyst (coming soon)
  • XGBoost & LightGBM (coming soon)

Compute model loss to infer when you can stop labeling.
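The README does not say how the loss is turned into a stopping signal. As a minimal sketch of one possible rule, the function below compares the average evaluation loss over the last few labeling iterations with the average over the iterations before them, and reports a plateau when the relative improvement is small. The function name, the window size, and the 1% threshold are illustrative assumptions, not part of AutoML.

```python
from typing import Sequence


def has_plateaued(
    losses: Sequence[float],
    window: int = 3,            # assumed: iterations averaged on each side
    min_rel_gain: float = 0.01  # assumed: below 1% improvement counts as a plateau
) -> bool:
    """Return True when the recent loss improvement falls below min_rel_gain.

    `losses` holds one evaluation loss per labeling/training iteration,
    oldest first. This is a hypothetical helper, not an AutoML API.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous <= 0:
        return True  # loss is already zero, nothing left to gain
    rel_gain = (previous - recent) / previous
    return rel_gain < min_rel_gain
```

A rule like this only suggests that more labels of the same kind bring little benefit; whether the resulting model is good enough remains a judgment call.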

Push predictions to Kili

python predict.py \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID

Use trained models to push pre-annotations onto unlabeled assets. Typically speeds up labeling by 10% with each iteration.

Prioritize labeling on Kili

Where is the model confident or confused today?

python prioritize.py \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID \
    --sampling uncertainty \
    --method least-confidence-sampling
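To make the idea behind least-confidence sampling concrete, here is a minimal sketch. It scores each unlabeled asset by how far the model's top predicted probability is from 1, rescaled so that a uniform prediction scores 1, and ranks the most uncertain assets first. The function names and the input format (one list of class probabilities per asset) are assumptions for illustration; the internals of `prioritize.py` may differ.

```python
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty in [0, 1]: 0 = fully confident, 1 = uniform prediction."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two classes")
    # (1 - top probability), normalized by its maximum value (n - 1) / n
    return (1.0 - max(probs)) * n / (n - 1)


def rank_by_uncertainty(predictions: Sequence[Sequence[float]]) -> List[int]:
    """Return asset indices ordered from most to least uncertain."""
    scores = [least_confidence(p) for p in predictions]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical softmax outputs for three unlabeled assets
predictions = [
    [0.90, 0.05, 0.05],  # confident
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.60, 0.30, 0.10],  # in between
]
priority = rank_by_uncertainty(predictions)
```

The resulting order is the one in which assets would be sent to labelers under this scheme.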

How can we sample the optimal unlabeled data points for human review?

python prioritize.py \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID \
    --sampling diversity \
    --method model-based-outlier
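One formulation of model-based outlier sampling found in the active learning literature looks at a hidden layer's activations: an unlabeled asset whose activations are low compared with those seen on validation data is something the model has little information about. The sketch below implements that idea under simplifying assumptions (activations are given as plain lists of floats, one row per asset and one column per neuron). It is not taken from `prioritize.py`, and all names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def outlier_scores(
    validation_acts: Sequence[Sequence[float]],
    unlabeled_acts: Sequence[Sequence[float]],
) -> List[float]:
    """Score each unlabeled asset in [0, 1]; higher means more outlying.

    For every neuron, compute the fraction of validation activations that
    are strictly lower than the asset's activation (its percentile rank),
    then average over neurons and invert.
    """
    n_val = len(validation_acts)
    n_neurons = len(validation_acts[0])
    # Sort each neuron's validation activations once, for fast rank lookup
    sorted_cols = [
        sorted(row[j] for row in validation_acts) for j in range(n_neurons)
    ]
    scores = []
    for item in unlabeled_acts:
        ranks = [
            bisect_left(sorted_cols[j], item[j]) / n_val
            for j in range(n_neurons)
        ]
        scores.append(1.0 - sum(ranks) / n_neurons)
    return scores


# Hypothetical activations: 4 validation assets, 3 unlabeled assets, 2 neurons
validation = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
unlabeled = [[0.0, 0.0], [5.0, 6.0], [2.5, 3.5]]
scores = outlier_scores(validation, unlabeled)
```

Sorting assets by descending score puts the ones least like the validation data at the front of the labeling queue.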

Label errors on Kili

Note: for image classification projects only.

python label_errors.py \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID
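The README does not describe how label errors are detected. As an illustration only, the sketch below uses a simplified heuristic loosely modeled on confident learning: each class gets a threshold equal to the average probability the model assigns to that class on assets carrying that label, and an asset is flagged when the model is confident in a class other than its given label. It assumes out-of-sample predicted probabilities are available; the function name and data layout are hypothetical.

```python
from typing import List, Sequence


def find_suspect_labels(
    labels: Sequence[int],
    probs: Sequence[Sequence[float]],
) -> List[int]:
    """Return indices of assets whose given label looks inconsistent."""
    n_classes = len(probs[0])
    # Per-class threshold: mean predicted probability of class k
    # over the assets currently labeled k
    thresholds = []
    for k in range(n_classes):
        own = [p[k] for y, p in zip(labels, probs) if y == k]
        thresholds.append(sum(own) / len(own) if own else 1.0)

    suspects = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes the model is confident about for this asset
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            if best != y:
                suspects.append(i)
    return suspects


# Hypothetical labels and predicted probabilities for six images
labels = [0, 0, 0, 1, 1, 1]
probs = [
    [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],  # third asset looks mislabeled
    [0.2, 0.8], [0.1, 0.9], [0.3, 0.7],
]
suspects = find_suspect_labels(labels, probs)
```

Flagged assets are candidates for human review rather than confirmed errors.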

Serve a model (coming soon)

python serve.py \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID

Serve trained models while pushing assets and predictions to Kili for continuous labeling. This makes it possible to monitor model drift.

Disclaimer

AutoML is a utility library that trains and serves models. It is your responsibility to determine whether the model performance is high enough or not.

Don't hesitate to contribute!
