Correct hungarian support for rectangular matrices by WeatherGod · Pull Request #4 · GaelVaroquaux/scikit-learn

WeatherGod · 2011-06-09T21:50:28Z

Additional testing revealed that my previous square2rect branch was wrong in the case of m > n. Therefore, this branch fixes that mistake by going the route of padding the array and recording the original matrix shape.

I also modified the hungarian test to run on the transpose of the test arrays and check that identical results occur.

As a consequence of this new approach, we do not need some of the code that was introduced by the zerohungarian branch (but we do still need to keep some of it).

WeatherGod · 2011-06-10T15:18:35Z

Heh, oddly enough... my tracking results are better with the code from zerohungarian branch. Weird...

GaelVaroquaux · 2011-06-26T21:41:45Z

Hey,

Sorry, that pull request had slipped under my radar. I was about to merge it, but I noticed your last remark, and I am not sure what to do about it. Could you comment a bit more on it? I don't see how supporting non-square matrices should affect the performance of a hungarian algorithm. But maybe I misunderstood you.

WeatherGod · 2011-06-27T15:31:07Z

No, it is not a performance issue. I am not 100% sure that the results are correct. I will be doing further analysis today.

GaelVaroquaux · 2011-06-27T15:40:07Z

No, it is not a performance issue.

OK, so you are saying that the square matrices are taking a performance
hit because of the support for non-square matrices. That's surprising, it
would seem to me that non-square matrices can simply be extended to square
matrices, and thus that square matrix code shouldn't be changed (as you
can see, I haven't reviewed your code well, and right now I am at a
conference and I can't do it).

I am not 100% sure that the results are correct. I will be doing further analysis today.

Thanks, keep me posted.

WeatherGod · 2011-06-27T15:48:20Z

OK, so you are saying that the square matrices are taking a performance hit because of the support for non-square matrices. That's surprising, it would seem to me that non-square matrices can simply be extended to square matrices, and thus that square matrix code shouldn't be changed (as you can see, I haven't reviewed your code well, and right now I am at a conference and I can't do it).

No... I am not talking at all about performance (in terms of speed or memory usage). I am only talking about the correctness of the algorithm output. I need to find some more test cases to see whether the algorithm is right or wrong.

WeatherGod · 2011-06-28T19:45:41Z

Ok, I finally figured it out. It was a combination of issues. First, we don't need to pad the cost matrix. One could also transpose it when m > n and then just make sure to switch the columns of the results before returning it.
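
To make the transpose idea concrete, here is a minimal sketch in plain Python. It is not the code from this branch: `solve_wide` and `solve_rectangular` are hypothetical names, and a brute-force search over permutations stands in for the Hungarian solver so that the example stays self-contained. It assumes the underlying solver only handles matrices with at least as many columns as rows, transposes when m > n, and swaps each result pair back into (row, column) order.

```python
from itertools import permutations


def solve_wide(cost):
    """Brute-force stand-in for the solver (NOT the Hungarian algorithm).

    Assumes len(cost) <= len(cost[0]). Returns (row, col) pairs.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    best_pairs, best_total = [], None
    # Try every way of giving each row its own distinct column.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if best_total is None or total < best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_pairs


def solve_rectangular(cost):
    """Hypothetical wrapper: transpose when m > n, then swap the columns
    of the result so every pair reads (original row, original column)."""
    if not cost or not cost[0]:
        return []  # a zero-length dimension gives an empty result
    n_rows, n_cols = len(cost), len(cost[0])
    if n_rows > n_cols:
        transposed = [list(col) for col in zip(*cost)]
        pairs = solve_wide(transposed)
        # In the transposed problem, rows are original columns and vice versa.
        return sorted((t_col, t_row) for t_row, t_col in pairs)
    return solve_wide(cost)


cost = [[9, 9], [1, 8], [7, 2]]  # 3 rows, 2 columns (m > n)
print(solve_rectangular(cost))   # [(1, 0), (2, 1)]
```

With this 3x2 matrix only two of the three rows can be assigned, and the returned pairs say which ones: rows 1 and 2, not rows 0 and 1.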

Second (and most importantly!), the functional method "hungarian()" strips out important information when the matrix is rectangular. When the input cost matrix is square, we know that there will be an association for each row. However, when m > n, there can only be n associations, which means some rows are not accounted for in the results list from H.compute(). Because hungarian() strips the first column from the results list, we typically loop over the enumeration of results, accidentally assuming that the results are for the first n rows.
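
The pitfall described above can be shown with a small hand-worked example. The pair list below is what a (row, column) result could look like for a 3x2 cost matrix; the variable names are illustrative and are not the actual return value of H.compute(). Dropping the row column and then enumerating silently reassigns the columns to rows 0 and 1.

```python
# 3 rows, 2 columns: only 2 of the 3 rows can receive an assignment.
cost = [[9, 9],
        [1, 8],
        [7, 2]]

# Optimal (row, col) pairs for this matrix, worked out by hand:
# row 1 -> col 0 (cost 1) and row 2 -> col 1 (cost 2). Row 0 is unassigned.
pairs = [(1, 0), (2, 1)]

# Stripping the first column keeps only the column indices...
cols_only = [col for _, col in pairs]

# ...and enumerating them assumes they belong to rows 0 and 1.
misread = list(enumerate(cols_only))

true_total = sum(cost[r][c] for r, c in pairs)
misread_total = sum(cost[r][c] for r, c in misread)

print(misread)        # [(0, 0), (1, 1)]
print(true_total)     # 3
print(misread_total)  # 17
```

Keeping the row index in each pair avoids this misreading, at the price of callers unpacking two values instead of one.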

Therefore, I recommend not stripping the first column from the results list in hungarian(). Also, if desired, I can make a new pull request that implements the rectangular hungarian solver without cost padding.

GaelVaroquaux · 2011-06-28T20:49:12Z

Ok, I finally figured it out. It was a combination of issues. First, we don't need to pad the cost matrix.

OK, that's good to know.

One could also transpose it when m > n and then just make sure to switch the columns of the results before returning it.

Yes, that seems the right thing to do.

Therefore, I recommend not stripping the first column from the results list in hungarian().

Fair enough. Could you just make sure that find_permutation still works?

Also, if desired, I can make a new pull request that implements the rectangular hungarian solver without cost padding.

That would be great. Thanks

GaelVaroquaux and others added 14 commits May 15, 2011 17:24

ENH: Add the hungarian algorithm

2145ef4

TEST: Increase testing of hungarian

9377307

MISC: cosmit in hungarian

8c8dd18

ENH: Speed up in hungarian

70e511b

ENH: More speedups in hungarian

54b7ae6

ENH: More speedups in hungarian

adf1d00

ENH: Still more speed ups in Hungarian

f805703

ENH: More speedups on Hungarian

562d5b3

This should make the hungarian algorithm accept rectangular cost matrices. Also enabled the tests. NOTE: Only tested on rectangular matrices of shape nxm such that m > n. Tests need to be expanded to test m < n.

1e49908

An additional check needed in case where there are fewer columns than rows. All assignments are made, but the algorithm wants to keep going because there are some rows left.

38f9a46

Merge pull request GaelVaroquaux#2 from WeatherGod/square2rect

372b125

Square2rect

Added support for hungarian assignment problems where one dimension of the cost function is zero-length. The result in such a situation should be an empty array.

a3c733d

Merge pull request GaelVaroquaux#3 from WeatherGod/zerohungarian

5070fa1

Zerohungarian

Fixing previous attempt at supporting rectangular matrices. This time, we do it by padding the array. Also modified the hungarian test to also test the transpose of the test arrays to see if identical results occur.

1010668

Restoring the check for empty arrays.

491c6d5

agramfort pushed a commit that referenced this pull request Dec 29, 2011

Merge pull request #4 from glouppe/dev-doc

382bea2

Documentation

WeatherGod closed this Jan 28, 2012

GaelVaroquaux pushed a commit that referenced this pull request Jan 18, 2014

Merged pull request #4 from larsmans/master.

f7afbcd

Update tutorial to match the current master API + typo. Thanks Lars.

Correct hungarian support for rectangular matrices #4
WeatherGod wants to merge 15 commits into GaelVaroquaux:hungarian from WeatherGod:correctRect
