Should train_test_split warn or error out on single sample array? #11028

@vivekk0903

Description

When the input contains a single sample, train_test_split returns a train part with 0 samples and a test part holding that one sample. Setting test_size to a different value does not change this behaviour.

Steps/Code to Reproduce

import numpy as np
from sklearn.model_selection import train_test_split

# A single sample with 100 features
data = np.random.normal(0, 1, [1, 100])
print(data.shape)
# Output: (1, 100)

data_train, data_test = train_test_split(data)
print(data_train.shape, data_test.shape)
# Output: ((0, 100), (1, 100))
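
A minimal sketch of why the shapes above come out this way, assuming the split sizes are obtained by rounding the test fraction up and the train fraction down. That rounding rule is my reading of the observed behaviour, not a quote from the library source, and `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Hypothetical helper: sizes under ceil-for-test, floor-for-train rounding."""
    # Assumption: the test count is rounded up, the train count is rounded down.
    n_test = int(math.ceil(test_size * n_samples))
    n_train = int(math.floor((1.0 - test_size) * n_samples))
    return n_train, n_test


# With a single sample, every fraction strictly between 0 and 1 rounds the same way.
for fraction in (0.1, 0.25, 0.5, 0.9):
    print(fraction, split_sizes(1, fraction))
```

Under this assumption, any fractional test_size gives the single sample to the test part, which would match the report that changing test_size has no effect.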

Expected Results

I am not sure what the expected result should be, as this looks like unintended usage. Still, I think at least a warning (if not an error) should be raised when splitting.
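
To make the warning-or-error suggestion concrete, here is a rough sketch of a pre-check that could run before the split. `check_split` and its `strict` flag are hypothetical names invented for illustration; they are not scikit-learn API, and the size arithmetic assumes the test count is rounded up and the train count is rounded down.

```python
import math
import warnings


def check_split(n_samples, test_size=0.25, strict=False):
    """Hypothetical pre-check: warn or raise if the train part would be empty."""
    # Assumption: test count rounded up, train count rounded down.
    n_test = int(math.ceil(test_size * n_samples))
    n_train = int(math.floor((1.0 - test_size) * n_samples))
    if n_train <= 0:
        message = (
            "With n_samples=%d and test_size=%s the train set would be empty."
            % (n_samples, test_size)
        )
        if strict:
            # Error variant of the suggestion.
            raise ValueError(message)
        # Warning variant of the suggestion.
        warnings.warn(message, UserWarning)
    return n_train, n_test


# A larger input passes the check silently.
print(check_split(4))
```

With `strict=False` a single-sample input only triggers a UserWarning; with `strict=True` the same input raises ValueError, so the caller chooses how loudly the problem is reported.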

Versions

Linux-3.16.0-77-generic-x86_64-with-Ubuntu-14.04-trusty
('Python', '2.7.6 (default, Nov 23 2017, 15:49:48) \n[GCC 4.8.4]')
('NumPy', '1.14.2')
('SciPy', '1.0.1')
('Scikit-Learn', '0.19.1')

I am sorry if this is a duplicate. I searched for similar issues but could not find any (even though I expected this to have been discussed somewhere).
