Description
The DecisionTreeClassifier User Guide says that
"scikit-learn uses an optimised version of the CART algorithm"
but the original CART algorithm distinguishes between categorical and continuous variables. This is explained in Breiman's original definition of CART, Classification and Regression Trees,
and also on Wikipedia.
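To make the distinction concrete, here is a rough sketch of how the candidate splits differ for the two variable types in the CART formulation: ordered thresholds for a continuous variable, and subset membership for a categorical one. The function names `continuous_splits` and `categorical_splits` are hypothetical, chosen for illustration; this is not scikit-learn code.

```python
from itertools import combinations


def continuous_splits(values):
    """Candidate thresholds for a continuous variable.

    Uses the midpoints between consecutive distinct sorted values,
    so each candidate is a question of the form "x <= t?".
    """
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def categorical_splits(values):
    """Candidate binary partitions for a categorical variable.

    Each candidate is a question of the form "x in S?". To list every
    unordered partition once, the first category always stays on the left.
    """
    cats = sorted(set(values))
    if len(cats) < 2:
        return []
    first, rest = cats[0], cats[1:]
    splits = []
    # r = number of other categories joining `first` on the left;
    # r < len(rest) keeps the right side non-empty.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(cats) - left
            splits.append((left, right))
    return splits


thresholds = continuous_splits([1.0, 2.0, 2.0, 4.0])
partitions = categorical_splits(["red", "green", "blue"])
print(thresholds)
print(len(partitions))
```

With k categories this enumeration yields 2^(k-1) - 1 partitions, none of which depend on any ordering of the categories. A threshold on a numeric encoding cannot express most of them.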
According to the documentation, the training input samples are converted to np.float32:
"The training input samples. Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csc_matrix."
So the implementation clearly does not support categorical variables.
I don't know whether this is a problem with the algorithm implementation or the intended behaviour, in which case the documentation should be changed.
Steps/Code to Reproduce
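A minimal sketch of the behaviour described above, assuming scikit-learn and NumPy are installed. The toy data and variable names are illustrative only. It checks two things: a string-valued feature is rejected outright, and an integer-encoded one is accepted but treated as an ordered number.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data (illustrative): one categorical feature stored as strings.
X_str = np.array([["red"], ["green"], ["blue"], ["green"]], dtype=object)
y = np.array([0, 1, 0, 1])

try:
    DecisionTreeClassifier(random_state=0).fit(X_str, y)
    fit_error = None
except ValueError as exc:
    # The strings cannot be converted to floating point, so fit() fails.
    fit_error = exc

print(type(fit_error).__name__)

# Same feature with an arbitrary integer encoding: red=0, green=1, blue=2.
X_int = np.array([[0], [1], [2], [1]])
clf = DecisionTreeClassifier(random_state=0).fit(X_int, y)

# The encoded values are split by numeric thresholds. Because "green" (1)
# lies between the other two codes, isolating it takes two threshold splits,
# whereas a categorical split {green} vs {red, blue} would need only one.
depth = clf.tree_.max_depth
print(depth)
```

The depth depends on the integer codes that happen to be assigned, which is the practical consequence of treating categories as ordered values.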
Expected Results
Either the documentation should not say that
"scikit-learn uses an optimised version of the CART algorithm"
or the implementation should distinguish between categorical and numerical variables.
Actual Results
The documentation states that CART is used, but that is not true.
Versions
pip: 9.0.1
setuptools: 40.4.3
sklearn: 0.20.0
numpy: 1.15.2
scipy: 1.1.0
Cython: None
pandas: 0.23.4