Reimplemented Graph.DataFrame with improved performance by fwitter · Pull Request #418 · igraph/python-igraph

fwitter · 2021-06-22T08:26:41Z

New implementation is ~500 times faster than the current one.

iosonofabio · 2021-06-22T08:31:12Z

Great to see work in this direction. Can you please explain where the performance increase is coming from? Just making sure we're not losing generality

fwitter · 2021-06-22T08:40:14Z

For motivation and details, please see #419.

With my reimplementation, I tried to stick to the behavior of the current implementation. However, I propose some changes:

- Additional checks for negative vertex IDs when use_vids=True
- Additional checks that the index of the vertex DataFrame contains valid vertex IDs when use_vids=True
- Setting use_vids=True as the default to align with the rest of the DataFrame API, such that one could use g_clone = Graph.DataFrame(g.get_edge_dataframe(), g.is_directed(), g.get_vertex_dataframe()) to clone a graph g
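
To make the two proposed checks concrete, here is a minimal sketch of what such validation could look like. It is not the code from this PR: the helper name check_vids, the error messages, and the exact rule for a "valid" vertex index (assumed here to be the integers 0 to n-1) are all assumptions made for illustration.

```python
from typing import Optional

import pandas as pd


def check_vids(edges: pd.DataFrame, vertices: Optional[pd.DataFrame] = None) -> None:
    """Hypothetical helper: validate inputs for the use_vids=True case."""
    # By convention the first two columns hold the source and target IDs.
    endpoints = edges.iloc[:, :2]

    # Vertex IDs must be integers ...
    if not all(pd.api.types.is_integer_dtype(dtype) for dtype in endpoints.dtypes):
        raise TypeError("source and target IDs must be integers")
    # ... and must not be negative.
    if (endpoints < 0).to_numpy().any():
        raise ValueError("source and target IDs must not be negative")

    if vertices is not None:
        n = len(vertices)
        # Assumed rule: the vertex index holds exactly the IDs 0, 1, ..., n-1.
        if set(vertices.index) != set(range(n)):
            raise ValueError("vertex index must contain the IDs 0 to n-1")
        # Every edge endpoint must refer to one of those IDs.
        if len(edges) and endpoints.to_numpy().max() >= n:
            raise ValueError("edges refer to IDs missing from the vertex DataFrame")


edges = pd.DataFrame({"source": [0, 1], "target": [1, 2], "weight": [0.5, 1.5]})
vertices = pd.DataFrame({"name": ["a", "b", "c"]})
check_vids(edges, vertices)  # consistent input: no exception is raised
```

Both checks operate on whole columns at once, so they add little overhead compared with per-row validation.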

Moreover, I extended the unit test for Graph.DataFrame.

iosonofabio · 2021-06-22T08:42:59Z

Thank you again. Please explain where the performance is coming from... The issue doesn't say either

fwitter · 2021-06-22T09:17:41Z

Sure. I replaced the parts of the code which I identified to have the biggest impact on runtime. These are:

- I replaced the call to np.setdiff1d by using Pandas DataFrame.isin (line 3495)
- I changed the nested for-loops for adding vertex attributes one by one into a single for-loop iterating over columns and transferring each attribute for all vertices in batch (lines 3500-3501)
- I changed the nested for-loops for adding edge attributes one by one into converting the edges DataFrame to a dictionary and passing the attributes directly to Graph.add_edges (lines 3505-3506)
- I replaced the creation of the edges list using Python's zip by using DataFrame.itertuples (line 3504)
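
The four changes above can be sketched with pandas alone. This is an illustration under assumptions, not the PR's actual code: the sample DataFrames and all variable names are made up, and the igraph calls are only mentioned in comments so the snippet runs without igraph installed.

```python
import numpy as np
import pandas as pd

# Made-up sample data: names in the first vertex column, endpoints in the
# first two edge columns, remaining columns are attributes.
edges = pd.DataFrame(
    {
        "source": ["a", "b", "c"],
        "target": ["b", "c", "a"],
        "weight": [1.0, 2.0, 3.0],
    }
)
vertices = pd.DataFrame({"name": ["a", "b", "c"], "color": ["red", "green", "blue"]})

endpoints = edges.iloc[:, :2]
names = vertices.iloc[:, 0].to_numpy()

# 1. Consistency check with DataFrame.isin: is every endpoint a known vertex?
#    (A plain array is passed because isin aligns on the index for a Series.)
all_known = bool(endpoints.isin(names).to_numpy().all())

#    The np.setdiff1d variant it replaces, shown only for comparison.
missing_old = np.setdiff1d(endpoints.to_numpy(), names)

# 2. Vertex attributes: one loop over columns, each column moved in one batch.
#    With igraph this would correspond to assigning g.vs[col] per column.
vertex_attrs = {col: vertices[col].tolist() for col in vertices.columns}

# 3. Edge list built with itertuples instead of zip over two columns.
edge_list = list(endpoints.itertuples(index=False, name=None))

# 4. Edge attributes as a dict of lists, ready to hand over together with
#    edge_list in a single Graph.add_edges call.
edge_attrs = edges.iloc[:, 2:].to_dict(orient="list")

print(all_known, edge_list, edge_attrs)
```

The common idea is to move whole columns at a time and avoid one Python-level call per vertex or edge attribute value.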

iosonofabio · 2021-06-22T09:25:34Z

Thank you. I suspect pandas might throw weird exceptions if we do batch operations and just hope the data frames are consistent. Did you check for those cases?

fwitter · 2021-06-22T10:45:44Z

By consistency, do you mean that the vertex IDs / names must be consistent across both the edge and vertex DataFrames? This is checked, and an exception is raised if they are inconsistent. The behavior did not change compared to the current implementation.

iosonofabio · 2021-06-22T10:54:33Z

Thanks, sounds good. @ntamas ?

ntamas · 2021-06-22T11:27:42Z

I'm not that familiar with Pandas, but it looks good to me! I have a slight problem with changing the default value of use_vids=... to True from False as it is technically a breaking change so I could not release this in a patch release as is. Would it be okay to change the default back to False temporarily and change it to True only in the next minor release?

fwitter · 2021-06-22T15:55:18Z

Yes, I am fine with that. I reverted the change of the default value for use_vids and will create a new PR for that change which can be merged before the next minor release.

ntamas · 2021-06-22T18:12:09Z

Great, thanks! I'll wait for the CI checks to complete and then I'll squash-and-merge this.

ntamas · 2021-06-22T18:37:52Z

Thanks a lot!

Reimplemented Graph.DataFrame with improved perfomance

e085896

fwitter mentioned this pull request Jun 22, 2021

Improve Performance when Creating Large Graphs from Pandas DataFrames #419

Closed

Fixed wrong import

6bcbc25

Reverted change of default value for param use_vids

9f3b4a1

ntamas added this to the 0.9.7 milestone Jun 22, 2021

ntamas merged commit 2981fe1 into igraph:master Jun 22, 2021

ntamas merged 3 commits into igraph:master from fwitter:master
