BUG: DecisionTreeRegressor: invalid impurity for criterion="poisson" with missing values #32870
Closed
Description
Describe the bug
When missing values are present in X, DecisionTreeRegressor(criterion="poisson", ...) sometimes computes invalid impurities.
According to the documentation, the impurity should match the half Poisson deviance, and it does when there are no missing values. When missing values are present, however, something goes wrong, and the impurity can even be negative.
I found this bug thanks to the large test in draft PR #32193. The same test shows that the impurity does match sklearn.metrics.mean_poisson_deviance(...) / 2 when no missing values are present.
Steps/Code to Reproduce
import numpy as np
from sklearn.tree import DecisionTreeRegressor, plot_tree
# Single feature; the first and last samples have a missing value
X = np.array([np.nan, 1, 2, 3, np.nan]).reshape(-1, 1)
y = [0.49, 0.5, 0.7, 1.5, 0.8]
tree = DecisionTreeRegressor(criterion='poisson', max_depth=1)
tree.fit(X, y)
plot_tree(tree);
Expected Results
Impurities should be non-negative. The impurity of the right child should be 0, as it contains only one sample.
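For reference, here is a minimal sketch of the quantity the impurity is expected to match, assuming each node predicts the mean of its targets and all samples have equal weight. `half_poisson_deviance` is a hypothetical helper written for illustration; it is not a scikit-learn function and does not reproduce the tree's internal code.

```python
import math


def half_poisson_deviance(y):
    """Half mean Poisson deviance of y around its own mean.

    Hypothetical helper for illustration only. Assumes equal sample
    weights, non-negative targets, and a strictly positive mean.
    """
    y_mean = sum(y) / len(y)
    total = 0.0
    for value in y:
        # Convention: 0 * log(0) is treated as 0.
        if value > 0:
            total += value * math.log(value / y_mean)
        # These terms sum to zero over the node, but keeping them makes
        # every per-sample contribution non-negative.
        total += y_mean - value
    return total / len(y)


# Targets from the reproducer above.
y = [0.49, 0.5, 0.7, 1.5, 0.8]
root = half_poisson_deviance(y)         # all five samples
single = half_poisson_deviance([1.5])   # a node holding one sample
print(root, single)
```

Under these assumptions the value can never be negative, and a node holding a single sample evaluates to exactly 0, which is why the negative impurities reported below indicate a bug.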
Actual Results
Some impurities are negative:
Versions
System:
python: 3.12.11 (main, Aug 18 2025, 19:19:11) [Clang 20.1.4 ]
executable: /home/arthur/dev-perso/scikit-learn/sklearn-env/bin/python
machine: Linux-6.14.0-36-generic-x86_64-with-glibc2.39
Python dependencies:
sklearn: 1.9.dev0
pip: None
setuptools: 80.9.0
numpy: 2.3.5
scipy: 1.16.3
Cython: 3.2.1
pandas: 2.3.3
matplotlib: 3.10.7
joblib: 1.5.2
threadpoolctl: 3.6.0
Built with OpenMP: True
threadpoolctl info:
user_api: blas
internal_api: openblas
num_threads: 16
prefix: libscipy_openblas
filepath: /home/arthur/dev-perso/scikit-learn/sklearn-env/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-fdde5778.so
version: 0.3.30
threading_layer: pthreads
architecture: Haswell
user_api: blas
internal_api: openblas
num_threads: 16
prefix: libscipy_openblas
filepath: /home/arthur/dev-perso/scikit-learn/sklearn-env/lib/python3.12/site-packages/scipy.libs/libscipy_openblas-b75cc656.so
version: 0.3.29.dev
threading_layer: pthreads
architecture: Haswell
user_api: openmp
internal_api: openmp
num_threads: 16
prefix: libgomp
filepath: /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0
version: None