correcting information criterion calculation in least_angle.py #6080

Closed
mehmetbasbug wants to merge 1 commit into scikit-learn:master from mehmetbasbug:patch-1

Conversation

@mehmetbasbug

The information criterion calculation is not compatible with the original paper:

Zou, Hui, Trevor Hastie, and Robert Tibshirani. "On the “degrees of freedom” of the lasso." The Annals of Statistics 35.5 (2007): 2173-2192.
@amueller
Member

amueller commented Oct 7, 2016

@mehmetbasbug sorry for the slow reply. @agramfort @GaelVaroquaux do you have any input on this?

@agramfort
Member

sorry @mehmetbasbug for the slow reaction

can you point me to the equation you refer to?

@mehmetbasbug
Author

Eq. 2.15 and Eq. 2.16.
Basically, the error is due to taking the log of the MSE.


R = y[:, np.newaxis] - np.dot(X, coef_path_) # residuals
mean_squared_error = np.mean(R ** 2, axis=0)
sigma = np.var(y)
Member

call it sigma2

  self.alphas_ = alphas_
  with np.errstate(divide='ignore'):
-     self.criterion_ = n_samples * np.log(mean_squared_error) + K * df
+     self.criterion_ = n_samples * mean_squared_error / sigma + K * df
Member

to me it should be

mean_squared_error / sigma2 + K * df

given that K is 2 or log(n)

ideally everything should be divided by n.

Did you check how it affects the example using BIC and AIC?

Author

If you want to divide everything by n, then we should have

mean_squared_error / sigma2 + K * df / n_samples

It was a long while ago, but I remember getting the same results as the authors' R package when I used the corrected version.
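For reference, a minimal standalone sketch of the corrected criterion being discussed (hypothetical code for illustration, not the actual scikit-learn implementation; the function name `lars_ic` and the use of np.var(y) as the sigma2 estimate are assumptions taken from the snippet quoted earlier in this thread):

```python
import numpy as np

def lars_ic(X, y, coef_path_, df, criterion="aic"):
    """Information criterion along a coefficient path, in the corrected
    form discussed here (Zou et al. 2007, Eq. 2.15/2.16, sketch).

    coef_path_ has shape (n_features, n_alphas); df gives the degrees of
    freedom at each point on the path (for the lasso, the number of
    nonzero coefficients).  sigma2 is estimated as np.var(y), as in the
    quoted snippet; see the later comments for caveats on that choice.
    """
    n_samples = X.shape[0]
    K = 2.0 if criterion == "aic" else np.log(n_samples)  # AIC vs BIC factor
    R = y[:, np.newaxis] - np.dot(X, coef_path_)          # residuals
    mean_squared_error = np.mean(R ** 2, axis=0)
    sigma2 = np.var(y)
    # corrected form: mse / sigma2 plus the penalty, everything divided by n
    return mean_squared_error / sigma2 + K * df / n_samples
```

With K = 2 this is the AIC variant and with K = log(n) the BIC variant, with every term divided by n_samples as suggested above.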

Member

ok agreed. can you see why travis is not happy and add a test that compares with the R solution? thx

@yuachen Oct 18, 2016

Thanks for your reply to #7692. In general, it is hard to estimate sigma2 with a single formula for the lasso. I looked at the R implementation in selectiveInference (page 12 of its manual):

    Estimate of error standard deviation. If NULL (default), this is estimated using
    the mean squared residual of the full least squares fit when n >= 2p, and using
    the standard deviation of y when n < 2p.

So if n >= 2p, it should be mean_squared_error * n_samples / (n_samples - df - 1);
if n < 2p, it should be var(y).

For the diabetes data in the travis test, n >= 2p, so the first formula should be used. This is probably why travis is not happy.

Also, sigma2 should be moved to the second term as a multiplicative factor; dividing by sigma2 is numerically unstable.
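The quoted rule could be sketched as follows (a hypothetical helper, not the selectiveInference R code itself; the (n - p - 1) divisor here mirrors the (n_samples - df - 1) formula above, with df taken as p for the full least-squares fit):

```python
import numpy as np

def estimate_sigma2(X, y):
    """Noise-variance estimate following the rule quoted from the
    selectiveInference manual: mean squared residual of the full
    least-squares fit when n >= 2p, variance of y otherwise.  (Sketch.)
    """
    n, p = X.shape
    if n >= 2 * p:
        # full least-squares fit; normalize by the residual degrees of freedom
        coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        return np.sum(resid ** 2) / (n - p - 1)
    # too few samples for a stable full fit: fall back to var(y)
    return np.var(y)
```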

@agramfort
Member

Also, it breaks the tests. Please see the travis errors.

@agramfort
Member

@mehmetbasbug @yuachen follow up on #9022

please review if you have time

thanks

@agramfort closed this Jun 6, 2017