Closed
Description
This method (`Learner._check_input_formatting()`) is called by `Learner._train_setup()`. It checks that regression labels are not strings and that feature values (for both classification and regression) are not strings. However, it does not work as expected if the featureset is read in as a dense array rather than a sparse matrix. Here is a minimal test case:
```python
>>> from skll.data import NDJReader
>>> from skll.learner import Learner
>>> fs1 = NDJReader.for_path("examples/boston/train/example_boston_features.jsonlines", sparse=False).read()
>>> l1 = Learner('LinearRegression')
>>> fs2 = NDJReader.for_path("examples/iris/train/example_iris_features.jsonlines", sparse=False).read()
>>> l2 = Learner('LogisticRegression')
>>> l1.train(fs1, grid_search=False)
...
~/work/skll/skll/learner/__init__.py in _check_input_formatting(self, examples)
    664         # make sure that feature values are not strings
    665         # we need to check this for both sparse and dense arrays
--> 666         for val in examples.features.data:
    667             if isinstance(val, str):
    668                 raise TypeError("You have feature values that are strings. "

NotImplementedError: multi-dimensional sub-views are not implemented
>>> l2.train(fs2, grid_search=False)
...
~/work/skll/skll/learner/__init__.py in _check_input_formatting(self, examples)
    664         # make sure that feature values are not strings
    665         # we need to check this for both sparse and dense arrays
--> 666         for val in examples.features.data:
    667             if isinstance(val, str):
    668                 raise TypeError("You have feature values that are strings. "

NotImplementedError: multi-dimensional sub-views are not implemented
```

The solution is to explicitly reshape the dense feature array into a one-dimensional array before iterating over its `.data` attribute. Sparse matrices are not affected, because their `.data` attribute is already one-dimensional.
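A minimal sketch of the proposed fix, assuming the features are either a NumPy `ndarray` or a `scipy.sparse` matrix. The helper name `contains_string_values` is hypothetical and this is not SKLL's actual implementation. For a dense array, `.data` is a Python `memoryview` shaped like the array, and iterating a memoryview with more than one dimension raises the `NotImplementedError` shown in the traceback; a sparse matrix's `.data` is a 1-D array of its stored values. Rather than iterating the memoryview of the reshaped array, this sketch iterates the flattened array's elements directly.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def contains_string_values(features):
    """Return True if any value in a dense or sparse feature matrix is a str.

    Hypothetical helper sketching the proposed fix; not SKLL's actual code.
    """
    if issparse(features):
        # scipy.sparse matrices: .data is already a 1-D array holding the
        # explicitly stored values, so it can be iterated directly.
        values = features.data
    else:
        # Dense arrays: .data is a memoryview shaped like the array, and
        # iterating a 2-D memoryview raises NotImplementedError. Flatten the
        # array to 1-D and iterate its elements instead.
        values = np.asarray(features).reshape(-1)
    return any(isinstance(val, str) for val in values)


dense = np.array([[1.0, 2.0], [3.0, 4.0]])

# Iterating the 2-D memoryview directly reproduces the reported error.
try:
    for _ in dense.data:
        pass
except NotImplementedError as exc:
    print(exc)  # multi-dimensional sub-views are not implemented

print(contains_string_values(dense))              # False
print(contains_string_values(csr_matrix(dense)))  # False
print(contains_string_values(np.array([[1.0, "x"]], dtype=object)))  # True
```

For a C-contiguous array, `reshape(-1)` returns a view rather than a copy, so the dense check does not duplicate the feature matrix in memory.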