Skip to content

[MRG] Added variable_name as a parameter for check_array#8178

Closed
dalmia wants to merge 10 commits intoscikit-learn:masterfrom
dalmia:add_variable_name
Closed

[MRG] Added variable_name as a parameter for check_array#8178
dalmia wants to merge 10 commits intoscikit-learn:masterfrom
dalmia:add_variable_name

Conversation

@dalmia
Copy link
Copy Markdown
Contributor

@dalmia dalmia commented Jan 9, 2017

What does this implement/fix? Explain your changes.

This adds variable_name as a parameter for check_array to be used as a placeholder for error messages. Related files using check_array on instances passed as a parameter have also been updated.

Any other comments?

Discussed in #7996

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where variables are the result of computation, check_array is used to recast into a particular dtype or order, and/or to ensure finiteness. Perhaps having a variable name in these cases is a distraction in the code, but I'm okay with it.

Note that the alternative is to remove X from error messages (or make the fact that it's merely a placeholder clearer), and expect the user to read the traceback carefully. I personally think this is the right way to go, personally.

Arguably the same issue applies to column_or_1d.


"""
covariance = check_array(covariance)
covariance = check_array(covariance, variable_name='covariance')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value is not provided by the user, and it's a bit of a mystery to me that check_array is here at all

Copy link
Copy Markdown
Contributor Author

@dalmia dalmia Feb 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the variable_name argument from here for now.


def _check_init(A, shape, whom):
A = check_array(A)
A = check_array(A, variable_name='A')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variable name is actually included in whom (see calls to the function, below). Replace whom with variable_name.

_, _, X_offset, _, X_scale = _preprocess_data(X, y, fit_intercept,
normalize,
return_mean=True)
normalize,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually avoid fixing incidental things. It makes reviewing PRs harder.

if check_input:
precompute = check_array(precompute, dtype=X.dtype.type,
order='C')
order='C', variable_name='precompute')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precompute is not user-provided, but this is okay.

"""
weights = check_array(weights, dtype=[np.float64, np.float32],
ensure_2d=False)
ensure_2d=False, variable_name='weights')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user sees this variable as "weights_init"

"""
means = check_array(means, dtype=[np.float64, np.float32], ensure_2d=False)
means = check_array(means, dtype=[np.float64, np.float32], ensure_2d=False,
variable_name='means')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

means_init

ensure_2d=False,
allow_nd=covariance_type == 'full')
allow_nd=covariance_type == 'full',
variable_name='precisions')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precisions_init

@dalmia
Copy link
Copy Markdown
Contributor Author

dalmia commented Jan 10, 2017

I agree on making the error message independent of the variable_name. Also, better for future usages of check_array. I thought maybe something like this for check_array:

Got 1-dimensional input. Reshape your data either using .reshape(-1, 1)
if your data has a single feature or .reshape(1, -1) if it contains a single sample.

And maybe for column_or_1d:

A column-vector was passed when a 1d array was expected. Please 
change the shape of your data to (n_samples, ), for example using ravel().

@dalmia
Copy link
Copy Markdown
Contributor Author

dalmia commented Jan 12, 2017

Should we go ahead with this? @jnothman

@dalmia
Copy link
Copy Markdown
Contributor Author

dalmia commented Feb 6, 2017

ping @jnothman


dictionary = check_array(dictionary.T, order='F', dtype=np.float64,
copy=False)
copy=False, variable_name='dictionary')
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove variable_name here as it's not seen by the user.

# Check if dimensions are consistent.
transformed_labels = check_array(transformed_labels)
transformed_labels = check_array(transformed_labels,
variable_name='transformed_labels')
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user can't see this.

if (label_type == "multilabel-indicator" and
len(set(check_array(y, ['csr', 'csc', 'coo']).shape[1]
len(set(check_array(y, ['csr', 'csc', 'coo'],
variable_name='y').shape[1]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user doesn't see it as y.

@dalmia
Copy link
Copy Markdown
Contributor Author

dalmia commented Feb 18, 2017

I think that we shouldn't leave the responsibility of figuring out what the error is, to the user and that trying to come up with a general description of the error might not be the right thing to do. On the other hand, variable_name helps to identify the error faster. Not adding variable_name to any function call doesn't cause any harm in any case, but can be used in situations where the name in the error message might be confusing (#7996). Also, as you mentioned, I did the same for column_or_1d as the same argument holds.

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 18, 2017

Codecov Report

Merging #8178 into master will not change coverage.
The diff coverage is 98.95%.

@@           Coverage Diff           @@
##           master    #8178   +/-   ##
=======================================
  Coverage   94.75%   94.75%           
=======================================
  Files         342      342           
  Lines       60813    60813           
=======================================
  Hits        57621    57621           
  Misses       3192     3192
Impacted Files Coverage Δ
sklearn/metrics/cluster/supervised.py 99.23% <ø> (ø)
sklearn/utils/mocking.py 100% <ø> (ø)
sklearn/metrics/pairwise.py 97.21% <ø> (ø)
sklearn/linear_model/sag.py 93.84% <ø> (ø)
sklearn/metrics/ranking.py 98.75% <100%> (ø)
sklearn/neural_network/multilayer_perceptron.py 97.81% <100%> (ø)
sklearn/decomposition/dict_learning.py 93.45% <100%> (ø)
sklearn/linear_model/ridge.py 93.88% <100%> (ø)
sklearn/dummy.py 98.24% <100%> (ø)
sklearn/base.py 94.15% <100%> (ø)
... and 34 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f952e43...57bf914. Read the comment docs.

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Jun 8, 2017

Thanks for all the work, though I think the proposal in #8178 (comment) that we avoid mentioning the variable by name in error messages might be the better way to go.

@jnothman
Copy link
Copy Markdown
Member

Approach of #8178 (comment) adopted in #9068

@lesteve lesteve closed this Jun 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants