Conversation
…ices. Also enabled the tests. NOTE: Only tested on rectangular matrices of shape nxm such that m > n. Tests need to be expanded to test m < n.
… rows. All assignments are made, but the algorithm wants to keep going because there are some rows left.
|
Looks good thanks. Out of curiosity, are you using the Hungarian for a machine learning related task? I have been a bit slow at merging it, because the machine-learning usage that I had went away, and I hate merging code in the scikit without a clear cut usecase that fits in the big picture. |
|
On Mon, Jun 6, 2011 at 4:29 PM, GaelVaroquaux <
No, I am using it for storm cell tracking. No machine-learning here. Also, If it doesn't seem to belong in scikits, I am fine with it going somewhere Ben Root |
|
It is actually used a lot in the machine learning community; it's just that in the scikit, we don't currently have a usage for it. I use it in machine-learning related code that is not yet in the scikit. The code is fairly stand alone and under a BSD license, so it could fit in scipy too, but it is probably too specific for that. I still think that it should be integrated in the scikit-learn, I just need to find a good usecase. |
Improved error handling
changes from GaelVaroquaux
Update random forest example to demo n_jobs=2
…scikit-learn#7838) * initial commit for return_std * initial commit for return_std * adding tests, examples, ARD predict_std * adding tests, examples, ARD predict_std * a smidge more documentation * a smidge more documentation * Missed a few PEP8 issues * Changing predict_std to return_std #1 * Changing predict_std to return_std #2 * Changing predict_std to return_std #3 * Changing predict_std to return_std final * adding better plots via polynomial regression * trying to fix flake error * fix to ARD plotting issue * fixing some flakes * Two blank lines part 1 * Two blank lines part 2 * More newlines! * Even more newlines * adding info to the doc string for the two plot files * Rephrasing "polynomial" for Bayesian Ridge Regression * Updating "polynomia" for ARD * Adding more formal references * Another asked-for improvement to doc string. * Fixing flake8 errors * Cleaning up the tests a smidge. * A few more flakes * requested fixes from Andy * Mini bug fix * Final pep8 fix * pep8 fix round 2 * Fix beta_ to alpha_ in the comments
* resurrect quantile scaler * move the code in the pre-processing module * first draft * Add tests. * Fix bug in QuantileNormalizer. * Add quantile_normalizer. * Implement pickling * create a specific function for dense transform * Create a fit function for the dense case * Create a toy examples * First draft with sparse matrices * remove useless functions and non-negative sparse compatibility * fix slice call * Fix tests of QuantileNormalizer. * Fix estimator compatibility * List of functions became tuple of functions * Check X consistency at transform and inverse transform time * fix doc * Add negative ValueError tests for QuantileNormalizer. * Fix cosmetics * Fix compatibility numpy <= 1.8 * Add n_features tests and correct ValueError. * PEP8 * fix fill_value for early scipy compatibility * simplify sampling * Fix tests. * removing last pring * Change choice for permutation * cosmetics * fix remove remaining choice * DOC * Fix inconsistencies * pep8 * Add checker for init parameters. * hack bounds and make a test * FIX/TST bounds are provided by the fitting and not X at transform * PEP8 * FIX/TST axis should be <= 1 * PEP8 * ENH Add parameter ignore_implicit_zeros * ENH match output distribution * ENH clip the data to avoid infinity due to output PDF * FIX ENH restraint to uniform and norm * [MRG] ENH Add example comparing the distribution of all scaling preprocessor (#2) * ENH Add example comparing the distribution of all scaling preprocessor * Remove Jupyter notebook convert * FIX/ENH Select feat before not after; Plot interquantile data range for all * Add heatmap legend * Remove comment maybe? * Move doc from robust_scaling to plot_all_scaling; Need to update doc * Update the doc * Better aesthetics; Better spacing and plot colormap only at end * Shameless author re-ordering ;P * Use env python for she-bang * TST Validity of output_pdf * EXA Use OrderedDict; Make it easier to add more transformations * FIX PEP8 and replace scipy.stats by str in example * FIX remove useless import * COSMET change variable names * FIX change output_pdf occurence to output_distribution * FIX partial fixies from comments * COMIT change class name and code structure * COSMIT change direction to inverse * FIX factorize transform in _transform_col * PEP8 * FIX change the magic 10 * FIX add interp1d to fixes * FIX/TST allow negative entries when ignore_implicit_zeros is True * FIX use np.interp instead of sp.interpolate.interp1d * FIX/TST fix tests * DOC start checking doc * TST add test to check the behaviour of interp numpy * TST/EHN Add the possibility to add noise to compute quantile * FIX factorize quantile computation * FIX fixes issues * PEP8 * FIX/DOC correct doc * TST/DOC improve doc and add random state * EXA add examples to illustrate the use of smoothing_noise * FIX/DOC fix some grammar * DOC fix example * DOC/EXA make plot titles more succint * EXA improve explanation * EXA improve the docstring * DOC add a bit more documentation * FIX advance review * TST add subsampling test * DOC/TST better example for the docstring * DOC add ellipsis to docstring * FIX address olivier comments * FIX remove random_state in sparse.rand * FIX spelling doc * FIX cite example in user guide and docstring * FIX olivier comments * EHN improve the example comparing all the pre-processing methods * FIX/DOC remove title * FIX change the scaling of the figure * FIX plotting layout * FIX ratio w/h * Reorder and reword the plot_all_scaling example * Fix aspect ratio and better explanations in the plot_all_scaling.py example * Fix broken link and remove useless sentence * FIX fix couples of spelling * FIX comments joel * FIX/DOC address documentation comments * FIX address comments joel * FIX inline sparse and dense transform * PEP8 * TST/DOC temporary skipping test * FIX raise an error if n_quantiles > subsample * FIX wording in smoothing_noise example * EXA Denis comments * FIX rephrasing * FIX make smoothing_noise to be a boolearn and change doc * FIX address comments * FIX verbose the doc slightly more * PEP8/DOC * ENH: 2-ways interpolation to avoid smoothing_noise Simplifies also the code, examples, and documentation
removing exception handling in favor of conditional check
…y calculation (scikit-learn#11464) * Fix to allow M * Updated MAE test to consider sample_weights in calculation * Removed comment * Fixed: E501 line too long (82 > 79 characters) * syntax correction * Added fix details * Changed to use consistent datatypes during calculaions * Corrected formatting * Requested Changes * removed explicit casts * Removed unnecessary explicits * Removed unnecessary explicit casts * added additional test * updated comments * Requested changes incl additional unit test * fix mistake * formatting * removed whitespace * added test notes * formatting * Requested changes * Trailing space fix attempt * Trailing whitespace fix attempt #2 * Remove trailing whitespace
* Add averaging option to AMI and NMI Leave current behavior unchanged * Flake8 fixes * Incorporate tests of means for AMI and NMI * Add note about `average_method` in NMI * Update docs from AMI, NMI changes (#1) * Correct the NMI and AMI descriptions in docs * Update docstrings due to averaging changes - V-measure - Homogeneity - Completeness - NMI - AMI * Update documentation and remove nose tests (#2) * Update v0.20.rst * Update test_supervised.py * Update clustering.rst * Fix multiple spaces after operator * Rename all arguments * No more arbitrary values! * Improve handling of floating-point imprecision * Clearly state when the change occurs * Update AMI/NMI docs * Update v0.20.rst * Catch FutureWarnings in AMI and NMI
…13243) * Remove unused code * Squash all the PR 9040 commits initial PR commit seq_dataset.pyx generated from template seq_dataset.pyx generated from template #2 rename variables fused types consistency test for seq_dataset a sklearn/utils/tests/test_seq_dataset.py new if statement add doc sklearn/utils/seq_dataset.pyx.tp minor changes minor changes typo fix check numeric accuracy only up 5th decimal Address oliver's request for changing test name add test for make_dataset and rename a variable in test_seq_dataset * FIX tests * TST more numerically stable test_sgd.test_tol_parameter * Added benchmarks to compare SAGA 32b and 64b * Fixing gael's comments * fix * solve some issues * PEP8 * Address lesteve comments * fix merging * avoid using assert_equal * use all_close * use explicit ArrayDataset64 and CSRDataset64 * fix: remove unused import * Use parametrized to cover ArrayDaset-CSRDataset-32-64 matrix * for consistency use 32 first then 64 + add 64 suffix to variables * it would be cool if this worked !!! * more verbose version * revert SGD changes as much as possible. * Add solvers back to bench_saga * make 64 explicit in the naming * remove checking native python type + add comparison between 32 64 * Add whatsnew with everyone with commits * simplify a bit the testing * simplify the parametrize * update whatsnew * fix pep8
* initial commit * used random class * fixed failing testcases, reverted __init__.py * fixed failing testcases #2 - passed rng as parameter to ParameterSampler class - changed seed from 0 to 42 (as original) * fixed failing testcases #2 - passed rng as parameter to SparseRandomProjection class * fixed failing testcases #4 - passed rng as parameter to GaussianRandomProjection class * fixed failing test case because of flake 8
…scikit-learn#10591) * Initial add DET curve to classification metrics * Add DET to exports * Fix DET-curve doctest errors - Sample snippet in model_evaluation documentation was outdated. * Clarify wording in DET-curve computation - Align to the wording of ranking module to make it consistent. - Add correct describtion of input and outputs. - Update and fix non-existent links * Beautify DET curve documentation source - Limit line length to 80 characters. * Expand DET curve documentation - Add an example plot to show difference between ROC and DET curves. - Expand Usage Note section with background information and properties of DET curves. * Update DET-curve documentation - Fix typos and some grammar improvements. - Use named references to avoid potential conflicts with other sections. - Remove unneeded references and improved existing ones by using e.g. using versioned links. * Select relevant DET points using slice object * Remove some dubiety from DET curve doc-string * Add DET curve contributors * Add tests for DET curves * Streamline DET test by using parametrization * Increase verbosity of DET curve error handling - Explicitly sanity check input before computing a DET curve. - Add test for perfect scores. - Adapt indentation style to match the test module. * Add reference for DET curves in invariance test * Add automated invariance checks for DET curves * Resolve merge artifacts * Make doctest happy * Fix whitespaces for doctest * Revert unintended whitespace changes * Revert unintended white space changes #2 * Fix typos and grammar * Fix white space in doc * Streamline test code * Remove rebase artifacts * Fix PR link in doc * Fix test_ranking * Fix rebase errors * Fix import * Bring back newlines - Swallowed by copy/paste * Remove uncited ref link * Remove matplotlib deprecation warning * Bring back hidden reference * Add motivation to DET example * Fix lint * Add citation * Use modern matplotlib API Co-authored-by: Jeremy Karnowski <jeremy.karnowski@gmail.com> Co-authored-by: Julien Cornebise <julien@cornebise.com> Co-authored-by: Daniel Mohns <daniel.mohns@zenguard.org>
Better CSS for doc links
This should make the hungarian algorithm accept rectangular cost matrices. Also enabled the tests.
NOTE: Only tested on rectangular matrices of shape n-by-m such that m > n. Tests need to be expanded to test m < n.