Skip to content

Move a few imports inside functions to improve import speed of the library#4296

Merged
jarrodmillman merged 9 commits intonetworkx:masterfrom
dschult:time_imports
Nov 12, 2020
Merged

Move a few imports inside functions to improve import speed of the library#4296
jarrodmillman merged 9 commits intonetworkx:masterfrom
dschult:time_imports

Conversation

@dschult
Copy link
Copy Markdown
Member

@dschult dschult commented Oct 28, 2020

Some of the imports are at the module level still -- instead of inside a function.
This negates our efforts at lazy importing, thus increasing import time (by a very small amount but users are still complaining).

Fixes #2900

I will take another look at this in light of #4277 before it should be merged.

Feedback? @pb-cdunn

@dschult dschult added this to the networkx-2.6 milestone Oct 28, 2020
Copy link
Copy Markdown
Contributor

@Erotemic Erotemic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few comments. Hope these help.

Copy link
Copy Markdown

@pb-cdunn pb-cdunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess all I want is to avoid importing the most expensive packages (scipy and numpy?) when they are not needed.

But I think the pattern in this PR could be problematic. (See my other comment.)

@Erotemic
Copy link
Copy Markdown
Contributor

@pb-cdunn Nested imports do not import multiple times. There is a global sys.modules dictionary that maps module names to loaded modules. The second time around its as fast as a dict lookup (one of the fastest things in Python).

You are right about the not found case. This can be avoided by having some cached function like:

import functools

@functools.lru_cache(None)
def _scipy_module():
    try:
        import scipy
    except Exception:
        scipy = None
    return scipy
In [4]: with ubelt.Timer('foo'): 
   ...:     _scipy_module() 
   ...:                                                                                                                                

tic('foo')
...toc('foo')=0.0151s

In [5]: with ubelt.Timer('foo'): 
   ...:     _scipy_module() 
   ...:                                                                                                                                

tic('foo')
...toc('foo')=0.0000s

This avoids the issue with the not-found case. However, how many times are you calling this in the not found case when you don't have the ubiquitous scipy or numpy installed? And how large of graphs are those pure-python methods able to handle? Does this actually matter in the case where graph size is going to bottleneck you before file-system searches? If the answer is No, I think the current import style might be simpler and faster in the most common use case.

On numpy, I guess avoiding it is nice in some applications, but parallel computation on long lists of numbers is near-universally helpful. The numpy packages is also significantly smaller and more ubiquitous (a difficult feat to accomplish) than scipy. I typically put import scipy inside functions, but I think there is an argument for assuming numpy globally (because avoiding that dict lookup does still help in critical sections).

@jarrodmillman jarrodmillman merged commit ddf06e1 into networkx:master Nov 12, 2020
@SeanDS SeanDS mentioned this pull request Nov 12, 2020
@dschult dschult deleted the time_imports branch November 15, 2020 19:48
@pb-cdunn
Copy link
Copy Markdown

I typically put import scipy inside functions, but I think there is an argument for assuming numpy globally...

I completely disagree with that. This is a graph library, not a numerics library. Anyway, I'll look at this again someday after this is released.

@Erotemic
Copy link
Copy Markdown
Contributor

This is a graph library, not a numerics library

There are deep connections between graph theory and linear algebra. In some cases I imagine its faster to compute the answer to a graph question by converting to a matrix representation, solving the reduced problem and converting back. My point is that graphs and numerics are not mutually exclusive. I'm not saying that numpy must be assumed to exist and globally imported, but I am saying there is an argument for it.

@pb-cdunn
Copy link
Copy Markdown

True. But it's unfortunate that we are forced to load things we do not need.

The real problem was that, long ago, all components were star-imported into the top module, and many others recursively, for convenience. That was a mistake, I think inarguably. Now we're looking for work-arounds. If scipy is no longer load, that should help.

@Erotemic
Copy link
Copy Markdown
Contributor

Yeah, I agree that was a mistake. I used to do that with my libraries as well.

Sometimes, I still do like to have something close to that functionality, but I never use import * anymore. Instead I use a package I wrote called mkinit (https://github.com/Erotemic/mkinit) that autogenerates explicit __init__.py files that have a similar effect to import * (it statically introspects module attributes and includes them depending on command line arguments and special dunder variables in the __init__.py itself).

If you are looking for workarounds, you could try using mkinit to generate the __init__.py files, which would give you backwards compatible behavior that you could gradually move away from. In PR #4349 I create two new submodules and there is a comment that documents how I used mkinit to generate each of them: (e.g.

mkinit ~/code/networkx/networkx/algorithms/minors/__init__.py -w
)

MridulS pushed a commit to MridulS/networkx that referenced this pull request Feb 4, 2023
…brary (networkx#4296)

* move scipy imports into functions

* run black

* Move xml and lxml import inside functions in graphml.py

* Move imports inside functions

* Add tests to graphml.py to appease the codecov CI

* rearrange KDTree import handling

* remove global np from alg_connectivity code

* fix black

* fix black
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Long import times

4 participants