[Discussion][Machine Learning] Support AI task and the open source project about MLops #9725
Search before asking
- I had searched in the issues and found no similar feature requirement.
Description
I have seen a Machine Learning Platform post on Medium. The post talks about the Lizhi Machine Learning Platform and Apache DolphinScheduler:
https://medium.com/@DolphinScheduler/a-formidable-combination-of-lizhi-machine-learning-platform-dolphinscheduler-creates-new-paradigm-e445938f1af
Inspired by it, I tried to build something similar using MLflow, scikit-learn, LightGBM, XGBoost, and DolphinScheduler.
Figure 1 shows the training workflow startup screen

In this workflow, I implemented four algorithms (SVM, LR, LGBM, XGBoost) using the APIs of scikit-learn, LightGBM, and XGBoost.
Each algorithm's hyperparameters are passed as the value of the "params" key. In this case, the LightGBM parameters are "n_estimators=200;num_leaves=20".
Experiment tracking is handled by MLflow. The picture below shows the report of the experiment.
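A "params" string such as "n_estimators=200;num_leaves=20" has to be turned into estimator keyword arguments before training. A minimal sketch of that step (the `parse_params` helper and its numeric coercion are my own illustration, not code from the workflow):

```python
def parse_params(params: str) -> dict:
    """Parse a "key=value;key=value" params string into keyword arguments.

    Values are coerced to int, then float, then left as strings.
    This helper is illustrative; the workflow passes the raw string
    through to the MLflow project entry point.
    """
    kwargs = {}
    for pair in filter(None, params.split(";")):
        key, _, value = pair.partition("=")
        try:
            kwargs[key.strip()] = int(value)
        except ValueError:
            try:
                kwargs[key.strip()] = float(value)
            except ValueError:
                kwargs[key.strip()] = value.strip()
    return kwargs

print(parse_params("n_estimators=200;num_leaves=20"))
# → {'n_estimators': 200, 'num_leaves': 20}
```

The resulting dict could then be splatted into the chosen estimator, e.g. `LGBMClassifier(**kwargs)`.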

I register the model in the MLflow Model Registry every time the workflow runs.

Once the model is trained, we run the deployment workflow, like this:
We can deploy version 2 of the model to the Kubernetes cluster,
and then see the resulting Deployment and Pods.

At the same time, we can access the service through its HTTP interface.
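The container built with `--enable-mlserver` serves the standard MLflow scoring endpoint. A sketch of how a client request could be built, assuming the MLflow 2.x `dataframe_split` payload format; the column names, port, and the helper itself are illustrative, not part of the workflow:

```python
import json
from urllib import request

def build_invocation_request(url: str, columns: list, rows: list) -> request.Request:
    """Build an HTTP POST for an MLflow scoring server's /invocations endpoint.

    Uses the "dataframe_split" payload format (an MLflow 2.x convention);
    this helper is my own illustration, not code from the workflow.
    """
    body = json.dumps({"dataframe_split": {"columns": columns, "data": rows}})
    return request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical call against the port-forwarded service:
req = build_invocation_request(
    "http://localhost:8080/invocations",
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    rows=[[5.1, 3.5, 1.4, 0.2]],
)
# request.urlopen(req) would return the model's predictions as JSON.
```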

By the way, we can also connect the training workflow to the deployment workflow as a sub-workflow, like this.
The training workflow contains one task. Its code is as follows:
```shell
data_path=${data_path}
export MLFLOW_TRACKING_URI=${MLFLOW_TRACKING_URI}
echo $data_path
repo=https://github.com/jieguangzhou/mlflow_sklearn_gallery.git
mlflow run $repo -P algorithm=${algorithm} -P data_path=$data_path -P params="${params}" -P param_file=${param_file} -P model_name=${model_name} --experiment-name=${experiment_name}
echo "training finish"
```
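For context, `mlflow run` resolves the `-P` flags against the parameters declared in the project's MLproject file. A hypothetical sketch of what such a file could look like for the flags used above (the file contents and `train.py` entry point are my assumptions, not taken from the linked repository):

```yaml
name: mlflow_sklearn_gallery
entry_points:
  main:
    parameters:
      algorithm: {type: string, default: "lgbm"}
      data_path: {type: string}
      params: {type: string, default: ""}
      param_file: {type: string, default: ""}
      model_name: {type: string}
    command: "python train.py --algorithm {algorithm} --data_path {data_path} --params '{params}' --param_file {param_file} --model_name {model_name}"
```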
The deployment workflow contains two tasks.

The code of the "build docker" task is as follows:
```shell
eval $(minikube -p minikube docker-env)
export MLFLOW_TRACKING_URI=${MLFLOW_TRACKING_URI}
image_name=mlflow/${model_name}:${version}
echo $image_name
mlflow models build-docker -m "models:/${model_name}/${version}" -n $image_name --enable-mlserver
```
The code of the "create deployment" task, which deploys the model to the Kubernetes cluster, is as follows:
```shell
version_lower=$(echo "${version}" | tr '[:upper:]' '[:lower:]')
kubectl apply -f - << END
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-${model_name}-$version_lower
spec:
  selector:
    matchLabels:
      app: mlflow
  replicas: 3 # tells deployment to run 3 pods matching the template
  template:
    metadata:
      labels:
        app: mlflow
    spec:
      containers:
      - name: mlflow-iris
        image: mlflow/${model_name}:${version}
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-${model_name}-$version_lower
spec:
  ports:
  - port: 8080
    targetPort: 8080
  selector:
    app: mlflow
END
sleep 5s
kubectl port-forward deployment/mlflow-${model_name}-$version_lower ${deployment_port}:8080
```
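The script lowercases `${version}` before templating because Kubernetes object names must be valid DNS-1123 labels (lowercase alphanumerics and hyphens, at most 63 characters). A Python sketch of that naming rule (the helper name and the extra character replacement are my additions; the shell above only lowercases with `tr`):

```python
import re

def k8s_resource_name(model_name: str, version: str, prefix: str = "mlflow") -> str:
    """Build the mlflow-<model_name>-<version> name used for the Deployment
    and Service, coerced to a valid DNS-1123 label: lowercase, alphanumerics
    and '-' only, at most 63 characters.
    """
    raw = f"{prefix}-{model_name}-{version}".lower()
    raw = re.sub(r"[^a-z0-9-]", "-", raw)  # replace invalid characters
    return raw.strip("-")[:63]

print(k8s_resource_name("iris_model", "Version2"))
# → mlflow-iris-model-version2
```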
The workflows above are built on the Shell task, but that is too complex for ML engineers. I hope to write new task types that make these steps easier for users.
Future work:
- Add new types of tasks, such as data management, feature engineering, and monitoring
- Support more open source projects, such as Kubeflow, ClearML, BentoML, Seldon Core, etc.
- Support AI cloud providers like SageMaker and Vertex AI.
Use case
No response
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct

