Better error message needed when accidentally supplying scalar value to parameter tuning classes #12640
Description
As with issue #12621, the hyperparameter optimizers are not very helpful when one (for example) supplies a bare string, rather than a list, as the value for a parameter in param_grid.
Steps/Code to Reproduce
Here's what did it for me:
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Parameters suggested by previous top scorers from a RandomizedSearchCV
param_grid = {'colsample_bytree': [1],
              'learning_rate': [0.05, 0.1, 0.3],
              'max_depth': [5, 8],
              'n_estimators': [100]}
tree_model = GridSearchCV(xgb.XGBRegressor(), param_grid, cv=3, verbose=100)
tree_model.fit(X_train, y_train)
# And now the linear approach
param_grid['booster'] = 'gblinear'  # bare string, not wrapped in a list: this is what raises the error
linear_model = GridSearchCV(xgb.XGBRegressor(), param_grid, cv=3, verbose=100)
linear_model.fit(X_train, y_train)
Expected Results
Something along the lines of: "Scalar value supplied in param_grid needs to be wrapped in a list or array of one element."
It doesn't seem like a huge change, but it draws direct attention to the problem rather than leaving the user to wonder why the error is happening. IIRC a fair amount of Python code accepts a scalar just as readily as a list or similar, so the current behaviour may defy user expectations. In fact, if you start typing "parameter values for parameter" into Google, at least one autocomplete result suggests that many people have hit this issue, and carrying out the search turns up a number of Stack Exchange posts. I suspect others are thinking what I did: "But booster is supposed to be a string!"
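A minimal sketch of such a check might look like the following. This is not scikit-learn's actual _check_param_grid; the function name check_param_grid_values and the exact wording of the message are made up for illustration, and NumPy arrays are left out to keep the example dependency-free.

```python
from collections.abc import Sequence


def check_param_grid_values(param_grid):
    """Hypothetical validator: reject scalar values with a pointed message."""
    for name, values in param_grid.items():
        # A str is itself a Sequence, so it has to be excluded explicitly.
        # A real check would also accept np.ndarray here (omitted in this sketch).
        is_listlike = isinstance(values, Sequence) and not isinstance(
            values, (str, bytes)
        )
        if not is_listlike:
            raise ValueError(
                "Scalar value {!r} supplied for parameter ({}) in param_grid "
                "needs to be wrapped in a list or array of one element, "
                "e.g. {{{!r}: [{!r}]}}.".format(values, name, name, values)
            )


# Passes silently: every value is a list.
check_param_grid_values({'booster': ['gblinear'], 'max_depth': [5, 8]})
```

Calling it with {'booster': 'gblinear'} raises a ValueError that names the parameter and shows the wrapped form the user probably meant.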
Alternatively, accepting the scalar value and wrapping it automatically might be even better.
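The wrapping alternative could be sketched roughly as below. The helper name wrap_scalars is hypothetical and nothing like it exists in scikit-learn as far as this report is concerned. It only recognises plain Python sequences as list-like, so a NumPy array would need an extra pass-through case in real code.

```python
from collections.abc import Sequence


def wrap_scalars(param_grid):
    """Hypothetical helper: copy param_grid, wrapping scalar values in lists."""
    wrapped = {}
    for name, values in param_grid.items():
        # Strings and bytes count as scalars here, even though they are Sequences.
        is_scalar = isinstance(values, (str, bytes)) or not isinstance(
            values, Sequence
        )
        wrapped[name] = [values] if is_scalar else values
    return wrapped


grid = wrap_scalars({'booster': 'gblinear', 'max_depth': [5, 8]})
print(grid)  # {'booster': ['gblinear'], 'max_depth': [5, 8]}
```

The drawback of wrapping silently is that a genuine mistake (say, passing a string where a list of several candidates was intended) would no longer be flagged, which is an argument for the clearer error message instead.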
Actual Results
Traceback (most recent call last):
File "predict.py", line 48, in <module>
linear_model = GridSearchCV(xgb.XGBRegressor(), param_grid, cv=3, verbose=100)
File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 1187, in __init__
_check_param_grid(param_grid)
File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 379, in _check_param_grid
" np.ndarray.".format(name))
ValueError: Parameter values for parameter (booster) need to be a sequence(but not a string) or np.ndarray.
Versions
I have now upgraded from Ubuntu's obsolete 0.19 package:
System
------
python: 3.5.2 (default, Nov 23 2017, 16:37:01) [GCC 5.4.0 20160609]
machine: Linux-4.4.0-137-generic-i686-with-Ubuntu-16.04-xenial
executable: /usr/bin/python3
BLAS
----
macros: HAVE_CBLAS=None
cblas_libs: openblas, openblas
lib_dirs: /usr/lib
Python deps
-----------
sklearn: 0.20.0
scipy: 1.0.1
pip: 10.0.0
Cython: 0.28.1
pandas: 0.23.0.dev0+708.gc4b4a81.dirty
numpy: 1.14.2
setuptools: 39.0.1