Pprett/gradient boosting#6
|
@glouppe some of the tests fail due to numerical issues (a consequence of changing the dtype). I fixed those, but I noticed a performance regression for the following benchmark:: it goes from:: to:: |
|
Hmm... I think I tracked it down:: This takes 4 times the usual time because y and y_pred have different dtypes. |
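For readers who want to check the mixed-dtype effect on their own machine, here is a minimal sketch. The array size, the `squared_error` helper and the repeat count are assumptions made for illustration and are not the benchmark from this thread. The measured ratio depends on hardware and numpy version, so no particular slowdown is claimed.

```python
import timeit

import numpy as np

rng = np.random.RandomState(0)
y = rng.rand(100_000)                     # float64 targets
y_pred_64 = rng.rand(100_000)             # predictions with the same dtype as y
y_pred_32 = y_pred_64.astype(np.float32)  # predictions with a different dtype


def squared_error(y_true, y_hat):
    """Mean squared error; hypothetical stand-in for a loss evaluation."""
    return np.mean((y_true - y_hat) ** 2)


# Mixing float64 and float32 operands makes numpy upcast to float64,
# which adds a conversion step to every element-wise operation.
mixed_dtype = (y - y_pred_32).dtype

t_same = timeit.timeit(lambda: squared_error(y, y_pred_64), number=200)
t_mixed = timeit.timeit(lambda: squared_error(y, y_pred_32), number=200)

print("result dtype with mixed inputs:", mixed_dtype)
print("same dtype : %.4f s" % t_same)
print("mixed dtype: %.4f s" % t_mixed)
```

Casting one of the arrays once, outside the hot loop, avoids paying the conversion on every call.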
sklearn/tree/tree.py
Why should init_error or best_error have type DTYPE, which is the dtype of the data array? Either use np.float32 or np.float64. I tend to use np.float64 whenever possible (i.e. when memory consumption is not an issue).
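A small sketch of why a scalar accumulator does not need to follow the data dtype. It assumes `DTYPE` is `np.float32` and uses a hypothetical `running_error` helper; neither is taken from tree.py. The data stays in float32, only the running total changes precision.

```python
import numpy as np

DTYPE = np.float32  # assumption: dtype of the data array

# One million copies of float32(0.1); the data itself stays in DTYPE.
values = np.full(1_000_000, 0.1, dtype=DTYPE)


def running_error(vals, acc_dtype):
    """Left-to-right cumulative sum carried out in acc_dtype."""
    return np.cumsum(vals, dtype=acc_dtype)[-1]


error_32 = running_error(values, np.float32)  # accumulator follows the data dtype
error_64 = running_error(values, np.float64)  # accumulator kept in float64

# Reference: the stored float32 value times the number of elements,
# computed in Python floats (double precision).
reference = float(values[0]) * values.size

print("float32 accumulator:", error_32, "abs. deviation:", abs(float(error_32) - reference))
print("float64 accumulator:", error_64, "abs. deviation:", abs(float(error_64) - reference))
```

A single float64 scalar costs a few extra bytes, so memory consumption is not a reason to tie init_error or best_error to the dtype of the data array.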
|
wow... seems like 32bit floating point arithmetic in numpy is substantially slower than 64bit arithmetic:: vs 32bit:: it seems that |
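A hedged way to reproduce the float32 versus float64 comparison locally. The helper name `time_residual_update`, the array size and the timed operation are assumptions, not the snippet used in this thread. Results vary with CPU, numpy build and array size, so the ratio seen here in 2012 may not show up elsewhere.

```python
import timeit

import numpy as np


def time_residual_update(dtype, n=100_000, repeats=5, number=50):
    """Best per-call time of a squared-error computation on arrays of one dtype."""
    rng = np.random.RandomState(0)
    y = rng.rand(n).astype(dtype)
    y_pred = rng.rand(n).astype(dtype)

    def op():
        return np.mean((y - y_pred) ** 2)

    # Take the minimum over several repeats to reduce timing noise.
    return min(timeit.repeat(op, repeat=repeats, number=number)) / number


timings = {
    np.dtype(d).name: time_residual_update(d)
    for d in (np.float32, np.float64)
}

for name, seconds in timings.items():
    print("%s: %.1f microseconds per call" % (name, seconds * 1e6))
```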
|
Wow, that's huge. I was not aware of this. Actually, my machine is 32-bit, which is why I like having the option not to use float64. I will have a deeper look at it tomorrow. I'll revert my changes if I can't find a good solution. |
|
it might be slower on 64bit machines but a 6-fold increase is too 2012/3/19 Gilles Louppe
Peter Prettenhofer |
|
Gilles, I just checked the other (regression) models in sklearn, it seems that only |
|
Okay, I agree. I'll revert my changes tomorrow. On 19 March 2012 22:32, Peter Prettenhofer
|
This reverts commit 3509e16.
Conflicts:
sklearn/ensemble/gradient_boosting.py
sklearn/tree/tree.py
|
I just pushed a revert commit. |
|
@glouppe thanks - I updated |
nitpick fixes, pep8 and fix math equations
Revised text classification chapter
This is my first bunch of commits regarding your PR.
I really like how you managed to remove the "terminal" mechanisms from the Tree code :)
My changes are the following:
Most of those do not actually concern the boosting module. I still have to review the gradient_boosting.py file in more depth (later today or tomorrow).