Skip to content

Correct hungarian support for rectangular matrices#4

Closed
WeatherGod wants to merge 15 commits intoGaelVaroquaux:hungarianfrom
WeatherGod:correctRect
Closed

Correct hungarian support for rectangular matrices#4
WeatherGod wants to merge 15 commits intoGaelVaroquaux:hungarianfrom
WeatherGod:correctRect

Conversation

@WeatherGod
Copy link
Copy Markdown

Additional testing revealed that my previous square2rect branch was wrong in the case of m > n. Therefore, this branch fixes that mistake by going the route of padding the array and recording the original matrix shape.

Also modified the hungarian test to also test the transpose of the test arrays to see if identical results occur.

As a consequence of this new approach, we do not need some of the code that was introduced by the zerohungarian branch (but we do still need to keep some of it).

GaelVaroquaux and others added 14 commits May 15, 2011 17:24
…ices. Also enabled the tests.

NOTE: Only tested on rectangular matrices of shape nxm such that m > n.  Tests need to be expanded to test m < n.
… rows.

All assignments are made, but the algorithm wants to keep going because there are some rows left.
…f the cost function is zero-length.

The result in such a situation should be an empty array.
…, we do it by padding the array.

Also modified the hungarian test to also test the transpose of the test arrays to see if identical results occur.
@WeatherGod
Copy link
Copy Markdown
Author

Heh, oddly enough... my tracking results are better with the code from zerohungarian branch. Weird...

@GaelVaroquaux
Copy link
Copy Markdown
Owner

Hey,

Sorry, that pull request had slipped under my radar. I was about to merge it, but I noticed you last remark, and I am not sure what to do about it. Could you comment a bit more on it: I don't see how supporting non square matrix should affect the performance of a hungarian algorithm. But maybe I understood you wrong.

@WeatherGod
Copy link
Copy Markdown
Author

No, it is not a performance issue. I am not 100% sure that the results are correct. I will be doing further analysis today.

@GaelVaroquaux
Copy link
Copy Markdown
Owner

No, it is not a performance issue.

OK, so you are saying that the square matrices are taking a performance
hit because of the support for non-square matrices. That's supprising, it
would seem to me that non-square matrices can simply be extend to square
matrices, and thus that square matrix code shouldn't be changed (as you
can see, I haven't reviewed your code well, and right now I am at a
conference and I can do it).

I am not 100% sure that the results are correct. I will be doing further analysis today.

Tahnks, keep me poster.

@WeatherGod
Copy link
Copy Markdown
Author

OK, so you are saying that the square matrices are taking a performance hit because of the support for non-square matrices. That's supprising, it would seem to me that non-square matrices can simply be extend to square matrices, and thus that square matrix code shouldn't be changed (as you can see, I haven't reviewed your code well, and right now I am at a conference and I can do it).

No... I am not talking at all about performance (in terms of speed or memory usage). I am only talking about the correctness of the algorithm output. I need to find some more test cases to see how the algorithm is right or wrong.

@WeatherGod
Copy link
Copy Markdown
Author

Ok, I finally figured it out. It was a combination of issues. First, we don't need to pad the cost matrix. One could also transpose it when m > n and then just make sure to switch the columns of the results before returning it.

Second (and most importantly!), the functional method "hungarian()" strips out important information when the matrix is rectangular. When the input cost matrix is square, we know that there will be an association for each row. However, when m > n, there can only be n associations, which means some rows are not accounted for in the results list from H.compute(). Because hungarian() strips the first column from the results list, we typically loop over the enumeration of results, accidentally assuming that the results are for the first n rows.

Therefore, I recommend not stripping the first column from the results list in hungarian(). Also, if desired, I can make a new pull request that implements the rectangular hungarian solver without cost padding.

@GaelVaroquaux
Copy link
Copy Markdown
Owner

Ok, I finally figured it out. It was a combination of issues. First, we don't need to pad the cost matrix.

OK, that's good to know.

One could also transpose it when m > n and then just make sure to switch the columns of the results before returning it.

Yes, that seems the right thing to do.

Therefore, I recommend not stripping the first column from the results list in hungarian().

Fair enough. Could you just make sure that find_permutation still works.

Also, if desired, I can make a new pull request that implements the rectangular hungarian solver without cost padding.

That would be great. Thanks

agramfort pushed a commit that referenced this pull request Dec 29, 2011
@WeatherGod WeatherGod closed this Jan 28, 2012
GaelVaroquaux pushed a commit that referenced this pull request Jan 18, 2014
Update tutorial to match the current master API + typo. Thanks Lars.
GaelVaroquaux pushed a commit that referenced this pull request Mar 2, 2019
* initial commit

* used random class

* fixed failing testcases, reverted __init__.py

* fixed failing testcases #2
- passed rng as parameter to ParameterSampler class
- changed seed from 0 to 42 (as original)

* fixed failing testcases #2
- passed rng as parameter to SparseRandomProjection class

* fixed failing testcases #4
- passed rng as parameter to GaussianRandomProjection class

* fixed failing test case because of flake 8
GaelVaroquaux pushed a commit that referenced this pull request Sep 8, 2019
* ENH Adds files

* ENH Adds permutation importance

* RFC Better names

* STY Flake8

* ENH: Adds inspect module

* DOC Adds pre_dispatch

* DOC Adds permutation importance example

* Trigger CI

* BLD Adds inspect to configuration

* RFC Update to only inspect fitted model

* RFC Removes parameters

* ENH: Adds pandas support

* STY Flake8

* DOC Adds new permutation importance example

* ENH Renames module to model_inspection

* DOC Fix links

* DOC Fixes image link

* DOC Fixes image link

* DOC Spelling

* DOC

* TST Fix keyword

* Rework RF Imp vs Perm Imp example (#4)

* WIP

* WIP

* WIP

* DOC Adds multcollinear features example

* WIP

* DOC: Clean up docs

* TST Adds tests for strings

* STY Indent correction

* WIP

* ENH Uses check_X_y

* TST Adds test with strings

* STY Fix

* TST Adds column transformer to test

* CLN Address comments

* CLN Removes import

* TST Adds test with nan

* CLN Removes import

* ENH Parallel

* DOC comments

* ENH Better handling of pandas

* ENH Clear checking of pandas dataframe

* STY Formatting

* ENH Copies in parallel helper

* DOC Adds comments

* BUG Fix copying

* BUG Fix for pandas

* BUG Fix for pandas

* REV

* BLD Trigger CI

* BUG Fix

* BUG Fix

* TST Does this work

* BUG Fixes test

* BUG Fixes test

* BUG Fix

* BUG Fix

* BUG Fix

* STY Fix

* TST Fix

* TST Fix segfault

* CLN Address comments

* CLN Address comments

* ENH Returns a bunch

* STY Flake8

* CLN Renames bunch key

* DOC Updates api

* DOC Updates api

* TST Adds permutation test with linear_regression

* DOC update

* DOC Fix label cutoff

* CLN Address comments

* TST Adds test for random_state effect

* DOC Adds permutation importance

* DOC Adds ogrisel suggestion

* DOC Address guillaumes comments

* DOC Address andreas comments

* DOC Update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants