Reimplemented Graph.DataFrame with improved performance #418
ntamas merged 3 commits into igraph:master from fwitter:master
|
Great to see work in this direction. Can you please explain where the performance increase is coming from? Just making sure we're not losing generality |
|
For motivation and details, please see #419 . With my reimplementation, I tried to stick to the behavior of the current implementation. However, I propose some changes:
Moreover, I extended the unit test for |
|
Thank you again. Please explain where the performance is coming from... The issue doesn't say either |
|
Sure. I replaced the parts of the code which I identified to have the biggest impact on runtime. These are:
|
Thank you. I suspect pandas might throw weird exceptions if we do batch operations and just hope the data frames are consistent. Did you check for those cases? |
|
By consistency, do you mean that the vertex IDs / names must be consistent across the edge and vertex DataFrames? This is checked, and an exception is raised if they are inconsistent. The behavior is unchanged from the current implementation. |
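For readers following along, the kind of consistency check discussed here can be sketched roughly as below. This is not the code from this PR. It is a dependency-free illustration that assumes vertex names arrive as a plain list and edges as (source, target) tuples, and the helper name `check_edge_endpoints` is hypothetical. The point is that all edge endpoints are validated in one batch set operation before any graph is built, instead of row by row. With pandas, a comparable batch check could presumably be written with `Series.isin`.

```python
def check_edge_endpoints(vertex_names, edges):
    """Raise ValueError if any edge refers to a vertex that is not declared.

    Hypothetical helper for illustration only; not part of python-igraph.
    """
    known = set(vertex_names)
    # Collect every source and target name in one pass over the edges.
    endpoints = {name for edge in edges for name in edge[:2]}
    # One set difference finds all unknown endpoints at once.
    missing = endpoints - known
    if missing:
        raise ValueError(
            f"edges refer to unknown vertices: {sorted(missing, key=repr)}"
        )


vertex_names = ["a", "b", "c"]

# Consistent input: every endpoint is a declared vertex, so nothing is raised.
check_edge_endpoints(vertex_names, [("a", "b"), ("b", "c")])

# Inconsistent input: "d" is not a declared vertex, so a ValueError is raised.
try:
    check_edge_endpoints(vertex_names, [("a", "d")])
except ValueError as exc:
    error_message = str(exc)
```

Failing early with a single descriptive exception keeps the batch path from surfacing less readable errors later on.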
|
Thanks, sounds good. @ntamas ? |
|
I'm not that familiar with Pandas, but it looks good to me! I have a slight problem with changing the default value of |
|
Yes, I am fine with that. I reverted the change of the default value for |
|
Great, thanks! I'll wait for the CI checks to complete and then I'll squash-and-merge this. |
|
Thanks a lot! |
New implementation is ~500 times faster than the current one.