Learner._check_input_formatting() does not work for dense featuresets #656

@desilinguist

This method is called by Learner._train_setup() and checks that regression labels are not strings and that feature values (for both classification and regression) are not strings. However, it raises a NotImplementedError if the featureset is read in as dense rather than sparse. Here's a minimal test case:

>>> from skll.data import NDJReader
>>> from skll.learner import Learner
>>> fs1 = NDJReader.for_path("examples/boston/train/example_boston_features.jsonlines", sparse=False).read()
>>> l1 = Learner('LinearRegression')
>>> fs2 = NDJReader.for_path("examples/iris/train/example_iris_features.jsonlines", sparse=False).read()
>>> l2 = Learner('LogisticRegression')
>>> l1.train(fs1, grid_search=False)
...
~/work/skll/skll/learner/__init__.py in _check_input_formatting(self, examples)
    664         # make sure that feature values are not strings
    665         # we need to check this for both sparse and dense arrays
--> 666         for val in examples.features.data:
    667             if isinstance(val, str):
    668                 raise TypeError("You have feature values that are strings.  "

NotImplementedError: multi-dimensional sub-views are not implemented

>>> l2.train(fs2, grid_search=False)
....
~/work/skll/skll/learner/__init__.py in _check_input_formatting(self, examples)
    664         # make sure that feature values are not strings
    665         # we need to check this for both sparse and dense arrays
--> 666         for val in examples.features.data:
    667             if isinstance(val, str):
    668                 raise TypeError("You have feature values that are strings.  "

NotImplementedError: multi-dimensional sub-views are not implemented

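For a dense NumPy array, .data is a memoryview over the 2-dimensional buffer, and iterating over a multi-dimensional memoryview is not implemented, hence the error. The solution is to explicitly reshape the dense feature array into a 1-dimensional array before iterating over the .data attribute.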
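
A minimal sketch of the idea, outside of SKLL: the helper name has_string_values is hypothetical and this is not the actual patch. It assumes the features are either a SciPy CSR matrix or a NumPy array, and it iterates over the flattened array itself rather than over .data, which is a small deviation from the wording above.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def has_string_values(features):
    """Return True if any feature value is a string (hypothetical helper)."""
    if issparse(features):
        # For a CSR matrix, .data is already a 1-D array of stored values.
        values = features.data
    else:
        # A dense array's .data is a memoryview over a 2-D buffer, which
        # cannot be iterated; flatten the array to 1-D and iterate that.
        values = np.asarray(features).ravel()
    return any(isinstance(val, str) for val in values)


dense = np.array([[1.0, 2.0], [3.0, 4.0]])
sparse = csr_matrix(dense)

print(has_string_values(dense))
print(has_string_values(sparse))
```

With this shape, the same check covers both sparse and dense featuresets, and the caller can raise the existing TypeError when the helper returns True.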