Use HyperGBM with Command Line
HyperGBM provides a command line tool, hypergbm, for model training, evaluation, and prediction. To view the command line help, run:
hypergbm -h
usage: hypergbm [-h] [--log-level LOG_LEVEL] [-error] [-warn] [-info] [-debug]
[--verbose VERBOSE] [-v] [--enable-dask ENABLE_DASK] [-dask]
[--overload OVERLOAD]
{train,evaluate,predict} ...
The tool has three subcommands: train, evaluate, and predict. For details on any of them, run hypergbm <command> -h, for example:
hypergbm train -h
usage: hypergbm train [-h] --train-data TRAIN_DATA [--eval-data EVAL_DATA]
[--test-data TEST_DATA]
[--train-test-split-strategy {None,adversarial_validation}]
[--target TARGET]
[--task {binary,multiclass,regression}]
[--max-trials MAX_TRIALS] [--reward-metric METRIC]
[--cv CV] [-cv] [-cv-] [--cv-num-folds NUM_FOLDS]
[--pos-label POS_LABEL]
...
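When training is launched from a script rather than typed by hand, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of HyperGBM: build_train_command is a hypothetical helper, the file names are placeholders, and it only uses flags that appear in this document (--model-file comes from the training example in the "Train the Model" section). shlex.join requires Python 3.8 or later.

```python
import shlex

def build_train_command(train_data, target, model_file,
                        max_trials=None, reward_metric=None):
    """Assemble the argument list for `hypergbm train`.

    Hypothetical helper: it only strings together flags shown in this
    document and does not validate them against the installed CLI.
    """
    cmd = ["hypergbm", "train",
           "--train-data", train_data,
           "--target", target,
           "--model-file", model_file]
    # Optional flags are appended only when a value is supplied.
    if max_trials is not None:
        cmd += ["--max-trials", str(max_trials)]
    if reward_metric is not None:
        cmd += ["--reward-metric", reward_metric]
    return cmd

cmd = build_train_command("train.csv", "y", "model.pkl", max_trials=10)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where hypergbm is installed; passing a list rather than a single string avoids shell quoting problems with file names.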
Prepare the Data
When training a model from the command line, the training data must be stored in a CSV or Parquet file. The trained model is saved as a pickle file with the .pkl extension.
For example, the Bank Marketing dataset can be prepared as follows:
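from hypernets.tabular.datasets import dsutils
from sklearn.model_selection import train_test_split
# Load the first 10,000 rows of the Bank Marketing dataset
df = dsutils.load_bank().head(10000)
# Hold out 30% of the rows for evaluation
df_train, df_test = train_test_split(df, test_size=0.3, random_state=9527)
# Write both splits without the DataFrame index
df_train.to_csv('bank_train.csv', index=False)
df_test.to_csv('bank_eval.csv', index=False)
# Drop the target column 'y' to produce the data to predict on
df_test.pop('y')
df_test.to_csv('bank_to_pred.csv', index=False)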
where
bank_train.csv is used for training the model
bank_eval.csv is used for evaluating the model
bank_to_pred.csv contains the same rows as bank_eval.csv without the target column and is used for prediction
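Before moving on, it can be worth confirming that the prediction file really lacks the target column and still carries every training feature. The following is a minimal sketch using only the standard library; read_header and missing_features are hypothetical helper names, and the in-memory rows are made-up stand-ins for the real files.

```python
import csv
import io

def read_header(csv_file):
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(csv_file))

def missing_features(train_file, pred_file, target="y"):
    """List training features that are absent from the prediction file.

    Raises ValueError if the prediction file still contains the target.
    """
    train_cols = read_header(train_file)
    pred_cols = read_header(pred_file)
    if target in pred_cols:
        raise ValueError(f"prediction data still contains target column {target!r}")
    return [c for c in train_cols if c != target and c not in pred_cols]

# In-memory stand-ins for bank_train.csv and bank_to_pred.csv (made-up rows).
train_csv = io.StringIO("id,age,job,y\n1,30,admin.,no\n")
pred_csv = io.StringIO("id,age,job\n2,41,services\n")
print(missing_features(train_csv, pred_csv))  # prints []
```

With the real files, the same check would be called with open('bank_train.csv', newline='') and open('bank_to_pred.csv', newline='') as the two arguments.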
Train the Model
After preparing the data, train a model from the command line:
hypergbm train --train-data bank_train.csv --target y --model-file model.pkl
After training finishes, the model file model.pkl appears in the working directory:
ls -l model.pkl
-rw-rw-r-- 1 xx xx 9154959 17:09 model.pkl
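Because the model is stored as a pickle file, it can in principle be loaded back into a Python session with the standard pickle module. The following is a minimal sketch: load_model is a hypothetical helper, and the demo round-trips a placeholder dictionary instead of a real model so that it runs anywhere. What object model.pkl actually contains, and which methods it exposes, is not stated in this document.

```python
import io
import pickle

def load_model(model_file):
    """Unpickle and return the object stored in an open binary file."""
    return pickle.load(model_file)

# Round-trip demo with a stand-in object. A real run would instead use
#     with open("model.pkl", "rb") as f:
#         model = load_model(f)
# in an environment where the libraries used for training are importable.
stand_in = {"note": "placeholder, not a real HyperGBM model"}
buffer = io.BytesIO(pickle.dumps(stand_in))
restored = load_model(buffer)
print(type(restored).__name__)  # prints dict
```

Unpickling executes code embedded in the file, so only load model files from a trusted source.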
Evaluate the Model
The trained model can be evaluated with the evaluation data:
hypergbm evaluate --model model.pkl --data bank_eval.csv --metric f1 recall auc
{'f1': 0.7993779160186626, 'recall': 0.7099447513812155, 'auc': 0.9705420982746849}
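To make the reported numbers easier to interpret, the sketch below computes recall and F1 from their standard definitions on a tiny made-up example. It is an illustration only and not how hypergbm computes its metrics; the choice of "yes" as the positive label is an assumption, and AUC is omitted because it needs predicted scores rather than labels.

```python
def recall_and_f1(y_true, y_pred, pos_label="yes"):
    """Compute recall and F1 for one positive class from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)  # true positives
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)  # false positives
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)  # false negatives
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "f1": f1}

# Made-up labels: 2 true positives, 1 false negative, 1 false positive.
y_true = ["yes", "yes", "no", "no", "yes"]
y_pred = ["yes", "no", "no", "yes", "yes"]
scores = recall_and_f1(y_true, y_pred)
print(scores)
```

Here recall and precision are both 2/3, so F1 is 2/3 as well. A recall noticeably lower than F1, as in the output above, indicates that precision is higher than recall.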
Predict the Test Data
The trained model can be used to make predictions on new data:
hypergbm predict --model model.pkl --data bank_to_pred.csv --output bank_output.csv
The predictions are saved to bank_output.csv.
To copy columns from the input data into the output file, name them with the --with-data option, for example the id column:
hypergbm predict --model model.pkl --data bank_to_pred.csv --output bank_output.csv --with-data id
head bank_output.csv
id,y
1563,no
124,no
218,no
463,no
...
To include all columns of the input data alongside the predictions in bank_output.csv, set --with-data to '*':
hypergbm predict --model model.pkl --data bank_to_pred.csv --output bank_output.csv --with-data '*'
head bank_output.csv
id,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
1563,55,entrepreneur,married,secondary,no,204,no,no,cellular,14,jul,455,13,-1,0,unknown,no
124,51,management,single,tertiary,yes,-55,yes,no,cellular,11,may,281,2,266,6,failure,no
218,49,blue-collar,married,primary,no,305,yes,yes,telephone,10,jul,834,10,-1,0,unknown,no
463,35,blue-collar,divorced,secondary,no,3102,yes,no,cellular,20,nov,138,1,-1,0,unknown,no
2058,50,management,divorced,tertiary,no,201,yes,no,cellular,24,jul,248,1,-1,0,unknown,no
...
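Once the output file exists, a quick tally of the predicted labels is a useful first check. The following is a minimal sketch using the standard library; count_predictions is a hypothetical helper, and the in-memory sample reuses the four id,y rows shown earlier in this section rather than reading the real file.

```python
import csv
import io
from collections import Counter

def count_predictions(csv_file, target="y"):
    """Count how often each predicted label occurs in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[target] for row in reader)

# Stand-in for bank_output.csv, using the rows shown in this section.
sample = io.StringIO("id,y\n1563,no\n124,no\n218,no\n463,no\n")
print(count_predictions(sample))  # prints Counter({'no': 4})
```

For the real output, pass open('bank_output.csv', newline='') instead of the in-memory sample. The function works the same way whether the file holds only id and y or all input columns, because it reads rows by column name.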