Maybe this should be an enhancement proposal...
So I think our current __repr__ is not that helpful.
Most construction parameters are default parameters that are never seen by a user, so reporting them is basically noise.
We don't report other important things, though, such as whether the model was fitted at all, what the training score is, or how long training took.
I know R has a very different approach, and I'm not sure their approach is good. But I think our current approach is pretty suboptimal.
I think the __repr__ has become more important because of the popularity of Jupyter notebooks. If I run fit in a notebook cell, I get the __repr__ back as the cell output, and it is just noise.
A slight improvement might be to just print the construction parameters that are not set to the default value. But we could also think about something more helpful, and maybe something more model specific. GridSearchCV for example could report the best score, and the best parameters found etc.
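To make the "only non-default parameters" idea concrete, here is a rough sketch, not a proposed implementation. It assumes every constructor argument is stored as an attribute of the same name. `ToyPCA`, `non_default_params` and `compact_repr` are made-up names for illustration, not existing scikit-learn API.

```python
import inspect


def non_default_params(obj):
    """Return constructor parameters whose current value differs from the default."""
    sig = inspect.signature(type(obj).__init__)
    changed = {}
    for name, param in sig.parameters.items():
        # Skip self and *args / **kwargs; they have no single stored value.
        if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        # Assumption: each constructor argument is stored under its own name.
        value = getattr(obj, name)
        # Required parameters (no default) are always shown.
        if param.default is inspect.Parameter.empty or value != param.default:
            changed[name] = value
    return changed


def compact_repr(obj):
    """Build a repr that lists only the parameters the user actually changed."""
    args = ", ".join(f"{k}={v!r}" for k, v in non_default_params(obj).items())
    return f"{type(obj).__name__}({args})"


class ToyPCA:
    # Hypothetical stand-in for an estimator; the parameters are illustrative only.
    def __init__(self, n_components=None, whiten=False, tol=0.0, max_iter=100):
        self.n_components = n_components
        self.whiten = whiten
        self.tol = tol
        self.max_iter = max_iter

    def __repr__(self):
        return compact_repr(self)


print(ToyPCA())                # ToyPCA()
print(ToyPCA(n_components=2))  # ToyPCA(n_components=2)
```

The plain `!=` comparison is a simplification: parameters that hold arrays or NaN would need more careful handling. Model-specific information such as fitted state or a best score would have to be added on top of this.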
I have been thinking about this for a while, but this is somewhat inspired by looking at #5299. Why does adding a faster solver change the __repr__ of PCA? That seems really weird to me. PCA is one of the simplest and most commonly used methods in ML, especially in courses. Now people will constantly see something about tol and the number of iterations that is probably not relevant to them at all.