Skip to content

Reimplemented Graph.DataFrame with improved perfomance#418

Merged
ntamas merged 3 commits intoigraph:masterfrom
fwitter:master
Jun 22, 2021
Merged

Reimplemented Graph.DataFrame with improved perfomance#418
ntamas merged 3 commits intoigraph:masterfrom
fwitter:master

Conversation

@fwitter
Copy link
Copy Markdown
Contributor

@fwitter fwitter commented Jun 22, 2021

New implementation is ~500 times faster than the current one.

@iosonofabio
Copy link
Copy Markdown
Member

Great to see work in this direction. Can you please explain where the performance increase is coming from? Just making sure we're not losing generality

@fwitter
Copy link
Copy Markdown
Contributor Author

fwitter commented Jun 22, 2021

For motivation and details, please see #419 .

With my reimplementation, I tried to stick to the behavior of the current implementation. However, I propose some changes:

  • Additional checks for negative vertex IDs when use_vids=True
  • Additional checks for valid index of the vertex DataFrame containing vertex IDs when use_vids=True
  • Setting use_vids=True as the default to align with the rest of the DataFrame API, such that one could use g_clone = Graph.DataFrame(g.get_edge_dataframe(), g.is_directed(), g.get_vertex_dataframe()) to clone a graph g

Moreover, I extended the unit test for Graph.DataFrame.

@iosonofabio
Copy link
Copy Markdown
Member

Thank you again. Please explain where the performance is coming from... The issue doesn't say either

@fwitter
Copy link
Copy Markdown
Contributor Author

fwitter commented Jun 22, 2021

Sure. I replaced the parts of the code which I identified to have the biggest impact on runtime. These are:

  1. I replaced the call to np.setdiff1d by using Pandas DataFrame.isin (line 3495)
  2. I changed the nested for-loops for adding vertex attributes one by one into a single for-loop iterating over columns and transferring each attribute for all vertices in batch (lines 3500-3501)
  3. I changed the nested for-loops for adding edge attributes one by one into converting the edges DataFrame to a dictionary and passing the attributes directly to Graph.add_edges (lines 3505-3506)
  4. I replaced the creation of the edges list using Python's zip by using DataFrame.itertuples (line 3504)

@iosonofabio
Copy link
Copy Markdown
Member

Thank you. I'm suspecting pandas might throw weird exceptions is we do batch operations hoping the data frames are consistent, did you check for those cases?

@fwitter
Copy link
Copy Markdown
Contributor Author

fwitter commented Jun 22, 2021

By consistency, you mean that the vertex IDs / names must be consistent throughout both edge and vertex DataFrames? This is checked and an exception is raised in case consistency is not given. The behavior did not change compared to the current implementation.

@iosonofabio
Copy link
Copy Markdown
Member

Thanks, sounds good. @ntamas ?

@ntamas
Copy link
Copy Markdown
Member

ntamas commented Jun 22, 2021

I'm not that familiar with Pandas, but it looks good to me! I have a slight problem with changing the default value of use_vids=... to True from False as it is technically a breaking change so I could not release this in a patch release as is. Would it be okay to change the default back to False temporarily and change it to True only in the next minor release?

@fwitter
Copy link
Copy Markdown
Contributor Author

fwitter commented Jun 22, 2021

Yes, I am fine with that. I reverted the change of the default value for use_vids and will create a new PR for that change which can be merged before the next minor release.

@ntamas
Copy link
Copy Markdown
Member

ntamas commented Jun 22, 2021

Great, thanks! I'll wait for the CI checks to complete and then I'll squash-and-merge this.

@ntamas ntamas added this to the 0.9.7 milestone Jun 22, 2021
@ntamas ntamas merged commit 2981fe1 into igraph:master Jun 22, 2021
@ntamas
Copy link
Copy Markdown
Member

ntamas commented Jun 22, 2021

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants