
BUG: DecisionTreeRegressor: invalid impurity for criterion="poisson" with missing values #32870

@cakedev0


Describe the bug

When missing values are present in X, DecisionTreeRegressor(criterion="poisson", ...) sometimes computes invalid impurities.

According to the documentation, the impurity should match the half Poisson deviance. This is indeed the case when there are no missing values. But when missing values are present, something goes wrong, and impurities can even be negative.
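For reference, here is a short derivation of why the documented impurity can never be negative. The notation is ours, not the documentation's: $y_1, \dots, y_n$ are the targets of the samples in a node and $\bar y$ is their mean.

```latex
\[
H = \frac{1}{n}\sum_{i=1}^{n}\Bigl[\,y_i \log\frac{y_i}{\bar y} - y_i + \bar y\,\Bigr],
\qquad \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i
\]
% For y > 0, g(\mu) = y\log(y/\mu) - y + \mu has g'(\mu) = 1 - y/\mu,
% so its minimum over \mu > 0 is g(y) = 0; for y = 0 the term is just \mu \ge 0.
% Every summand is therefore non-negative, so H \ge 0, and H = 0 when all y_i
% are equal (in particular for a single-sample node).
\[
\sum_{i=1}^{n} (\bar y - y_i) = 0
\quad\Longrightarrow\quad
H = \frac{1}{n}\sum_{i=1}^{n} y_i \log\frac{y_i}{\bar y}
\]
```

The simplified form on the last line relies on $\bar y$ being the mean of exactly the samples in the node. With any other value of $\bar y$, the dropped term $\frac{1}{n}\sum_i (\bar y - y_i)$ no longer vanishes, and the simplified expression is no longer guaranteed to be non-negative.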

I found this bug thanks to the large test from my draft PR #32193. The same test shows that the impurity does match sklearn.metrics.mean_poisson_deviance(...) / 2 when no missing values are present.
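A much smaller check in the same spirit might look like the sketch below. It is not the test from #32193: `max_impurity_mismatch` is a hypothetical helper, and the data is synthetic with no missing values. For every node, it collects the training samples routed there with `decision_path` and compares `tree_.impurity` with `mean_poisson_deviance(...) / 2`, evaluated with the node mean as the prediction.

```python
import numpy as np
from sklearn.metrics import mean_poisson_deviance
from sklearn.tree import DecisionTreeRegressor


def max_impurity_mismatch(tree, X, y):
    """Hypothetical helper: largest |reported - expected| impurity over all nodes.

    X and y must be the training data (fitted without sample weights), so that
    the samples reaching each node are the ones the criterion saw during fit.
    """
    y = np.asarray(y, dtype=float)
    # Sparse (n_samples, n_nodes) indicator: entry (i, j) is non-zero
    # if sample i passes through node j.
    node_indicator = tree.decision_path(X).tocsc()
    worst = 0.0
    for node_id in range(tree.tree_.node_count):
        y_node = y[node_indicator[:, node_id].nonzero()[0]]
        # Half Poisson deviance of the node when predicting its mean everywhere.
        y_pred = np.full_like(y_node, y_node.mean())
        expected = mean_poisson_deviance(y_node, y_pred) / 2
        worst = max(worst, abs(tree.tree_.impurity[node_id] - expected))
    return worst


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Strictly positive targets keep every node mean > 0, as mean_poisson_deviance requires.
y = rng.uniform(0.1, 2.0, size=200)

tree = DecisionTreeRegressor(criterion="poisson", max_depth=4, random_state=0).fit(X, y)
# Should be close to 0 if the impurities match the documented definition.
print(max_impurity_mismatch(tree, X, y))
```

Using the training samples per node, rather than only the stored `value`, keeps the check independent of how the criterion accumulates its sums internally.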

Steps/Code to Reproduce

import numpy as np
from sklearn.tree import DecisionTreeRegressor, plot_tree

X = np.array([np.nan, 1, 2, 3, np.nan]).reshape(-1, 1)
y = [0.49, 0.5, 0.7, 1.5, 0.8]

tree = DecisionTreeRegressor(criterion='poisson', max_depth=1)
tree.fit(X, y)

# The plot shows the impurity of each node
plot_tree(tree);

Expected Results

Impurities should be non-negative. The impurity of the right child should be 0, as it contains only one sample.

Actual Results

Some impurities shown in the plot_tree output are negative.

Versions

System:
    python: 3.12.11 (main, Aug 18 2025, 19:19:11) [Clang 20.1.4 ]
executable: /home/arthur/dev-perso/scikit-learn/sklearn-env/bin/python
   machine: Linux-6.14.0-36-generic-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.9.dev0
          pip: None
   setuptools: 80.9.0
        numpy: 2.3.5
        scipy: 1.16.3
       Cython: 3.2.1
       pandas: 2.3.3
   matplotlib: 3.10.7
       joblib: 1.5.2
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /home/arthur/dev-perso/scikit-learn/sklearn-env/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-fdde5778.so
        version: 0.3.30
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /home/arthur/dev-perso/scikit-learn/sklearn-env/lib/python3.12/site-packages/scipy.libs/libscipy_openblas-b75cc656.so
        version: 0.3.29.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libgomp
       filepath: /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0
        version: None
