Decision/Extra trees & missing values: split ignored when NaNs + single unique value in feature #32272

@cakedev0

Describe the bug

When a feature's values consist only of NaNs and a single other value, the feature is treated as constant by the following if statement:

        if (
            # All values for this feature are missing, or
            end_non_missing == start or
            # This feature is considered constant (max - min <= FEATURE_THRESHOLD)
            feature_values[end_non_missing - 1] <= feature_values[start] + FEATURE_THRESHOLD
        ):

and is therefore not considered for splitting. This check treats [0 0 0 NaN NaN] as constant, which is wrong: a split between NaNs and non-NaNs is possible, and such a split can make a lot of sense in some datasets.

The same problem exists in node_split_random, which is used for Extra trees.
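
To make the failure mode concrete, here is a minimal pure-Python sketch that mimics the check above on a single feature. It is not the actual Cython code: the helper names is_skipped_as_constant and is_constant_with_missing are hypothetical, the FEATURE_THRESHOLD value of 1e-7 is an assumption, and the second helper is only one possible way to account for missing values, not necessarily the right fix.

```python
import math

# Assumed tolerance; the real constant lives in the Cython splitter code.
FEATURE_THRESHOLD = 1e-7


def is_skipped_as_constant(values):
    """Mimic the current check: only the non-missing values are compared."""
    # Sorted non-missing values come first; NaNs conceptually sit after them.
    non_missing = sorted(v for v in values if not math.isnan(v))
    start = 0
    end_non_missing = start + len(non_missing)
    if end_non_missing == start:
        # All values for this feature are missing.
        return True
    # max - min <= FEATURE_THRESHOLD over the non-missing values only.
    return non_missing[end_non_missing - 1] <= non_missing[start] + FEATURE_THRESHOLD


def is_constant_with_missing(values):
    """Hypothetical stricter check: NaNs next to real values are not constant."""
    n_missing = sum(1 for v in values if math.isnan(v))
    mixed = 0 < n_missing < len(values)
    return is_skipped_as_constant(values) and not mixed


feature = [0.0, 0.0, 0.0, math.nan, math.nan]
print(is_skipped_as_constant(feature))    # the current check skips this feature
print(is_constant_with_missing(feature))  # the stricter sketch would keep it
```

With the current logic, the NaNs never take part in the comparison, so a feature with one distinct non-missing value is indistinguishable from a truly constant one.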

Steps/Code to Reproduce

import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = [0, 0, 0, np.nan, np.nan]
y = [0, 0, 0, 1     , 1     ]
X = np.array(X).reshape(-1, 1)
tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict(X))  
# prints: [0 0 0 0 0]
print(tree.tree_.node_count)
# prints: 1

Expected Results

A split is made: the tree has 3 nodes (2 leaves) and predicts [0 0 0 1 1].
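
As a sanity check on why this split is worth making, the sketch below computes the Gini impurity decrease of a "missing vs. non-missing" split on the reproduction data by hand. It is independent of scikit-learn; the helper names gini and missing_split_gain are hypothetical and the code only illustrates the expected behaviour, assuming the default Gini criterion.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def missing_split_gain(x, y):
    """Impurity decrease when non-missing samples go left and NaNs go right."""
    left = [yi for xi, yi in zip(x, y) if not math.isnan(xi)]
    right = [yi for xi, yi in zip(x, y) if math.isnan(xi)]
    n = len(y)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(y) - weighted_children


x = [0, 0, 0, math.nan, math.nan]
y = [0, 0, 0, 1, 1]
print(missing_split_gain(x, y))  # roughly 0.48: both children are pure
```

Both children are pure here, so the whole parent impurity (about 0.48) is removed by the split that is currently being skipped.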

Actual Results

No split was made, the tree is a single node.

Versions

System:
    python: 3.12.11 (main, Aug 18 2025, 19:19:11) [Clang 20.1.4 ]
executable: /home/arthur/dev-perso/scikit-learn/sklearn-env/bin/python
   machine: Linux-6.14.0-29-generic-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.8.dev0
          pip: None
   setuptools: 80.9.0
        numpy: 2.3.3
        scipy: 1.16.2
       Cython: 3.1.4
       pandas: 2.3.2
   matplotlib: 3.10.6
       joblib: 1.5.2
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /home/arthur/dev-perso/scikit-learn/sklearn-env/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-8fb3d286.so
        version: 0.3.30
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /home/arthur/dev-perso/scikit-learn/sklearn-env/lib/python3.12/site-packages/scipy.libs/libscipy_openblas-b75cc656.so
        version: 0.3.29.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libgomp
       filepath: /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0
        version: None
