[MRG] remove warnings in univariate feature selection #2369
ogrisel merged 1 commit into scikit-learn:master from
|
+1 |
|
👍 for removal, as long as we use a stable, non-random sort, because I want 100% reproducibility. The default sort used by argsort is quicksort, which is not stable. Should we switch to heapsort, which is stable but has the drawback of requiring p/2 work space in memory? I think the work-space requirement is not too bad, as it is in O(p) and not O(n p). |
|
What is p in this formula? According to Wikipedia, heapsort should require O(1) auxiliary space (apart from the n indices allocated by |
Number of features in the learning problem.
Correct, I made a mistake and meant mergesort rather than heapsort, which |
|
Actually there's a heapsort in NumPy master and it seems to have been there since the days of |
|
Timings: Again, with fresh random numbers: Memory usage: Without the |
So let's use mergesort. I don't find the memory-usage numbers |
|
While you're at it, I'd also like to have a stable sort in StratifiedKFold :) On Mon, Aug 19, 2013 at 2:33 PM, Gael Varoquaux
|
PR welcomed :P |
I've heard this before ;) |
These warnings are issued practically always when using frequency-valued or boolean data. Switched to a stable sort to get reproducible results.
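To illustrate the stable-sort point discussed above, here is a minimal sketch of a top-k selection that breaks ties deterministically. It is not the code from this PR: the helper name `top_k_features` and the sample scores are made up for illustration, and the only library call relied on is `np.argsort` with `kind="mergesort"`, which NumPy documents as a stable sort.

```python
import numpy as np


def top_k_features(scores, k):
    """Return the indices of the k highest scores (hypothetical helper).

    Ties are resolved by the lowest feature index, so the result does not
    depend on how an unstable sort happens to order equal values.
    """
    scores = np.asarray(scores, dtype=float)
    # A stable ascending sort of the negated scores gives descending score
    # order while keeping tied entries in their original index order.
    order = np.argsort(-scores, kind="mergesort")
    return order[:k]


# Made-up scores with many ties, as is typical for boolean features.
scores = np.array([0.5, 2.0, 0.5, 2.0, 0.5, 1.0])
selected = top_k_features(scores, 4)
print(selected)
```

With the default `kind="quicksort"`, which of the three features tied at 0.5 ends up in fourth place is an implementation detail; with a stable sort it is always the one with the lowest index.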
|
Force-pushed a new version. Time to go back to the actual experiment I was performing. @agramfort, stratified k-fold is yours :p |
|
👍 for merge. Thanks! |
|
I pushed the green button as Travis was happy. |
These warnings are practically always triggered when doing text classification or any task with lots of boolean features. I suggest simply removing them, since in those cases the warning is so confusing that it does more harm than good.
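As a rough illustration of why boolean data produces tied scores so often, the sketch below uses a toy co-occurrence count as the score. This is an assumption made purely for the example: it is not one of scikit-learn's scoring functions, and `cooccurrence_scores` and the data are hypothetical.

```python
from collections import Counter

# Tiny made-up boolean dataset: 4 samples, 4 features.
X = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
y = [1, 1, 0, 0]


def cooccurrence_scores(X, y):
    """Toy score: how often each feature is 1 together with a positive label."""
    n_features = len(X[0])
    return [
        sum(row[j] * label for row, label in zip(X, y))
        for j in range(n_features)
    ]


scores = cooccurrence_scores(X, y)
# Map each score value that occurs more than once to its number of occurrences.
tied = {s: c for s, c in Counter(scores).items() if c > 1}
print(scores, tied)
```

Features 0 and 2 are identical columns, so any univariate score gives them the same value. With only a handful of possible counts, such collisions are the norm rather than a sign of a problem.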