Description
Annif takes several seconds to start even when it's doing nothing but printing the version number or help text:
$ time annif --version
0.54.0.dev0
real 0m4,398s
user 0m4,322s
sys 0m0,470s
I investigated this a little using the -X importtime option (available in Python 3.7+) and the tuna tool for visualizing profiling information. Most of the time seems to be spent importing large libraries such as tensorflow, scikit-learn, optuna, connexion and nltk.
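For anyone who wants to reproduce this kind of measurement without tuna, here is a rough sketch that runs Python with -X importtime in a subprocess and lists the slowest imports by cumulative time. The helper name slowest_imports is made up for illustration, and the stdlib json module stands in for annif so that the snippet runs anywhere:

```python
import subprocess
import sys


def slowest_imports(module_name, top=5):
    """Return the `top` slowest imports as (cumulative_microseconds, name) pairs."""
    # -X importtime writes one "import time: self | cumulative | name" line
    # per imported module to stderr.
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        check=True,
    )
    prefix = "import time:"
    timings = []
    for line in result.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        # The header line has text instead of numbers; skip it.
        if len(fields) != 3 or not fields[1].strip().isdigit():
            continue
        timings.append((int(fields[1]), fields[2].strip()))
    return sorted(timings, reverse=True)[:top]


if __name__ == "__main__":
    # "json" is only a placeholder; substitute the package being profiled.
    for usec, name in slowest_imports("json"):
        print(f"{usec / 1000:8.1f} ms  {name}")
```

Cumulative time includes everything a module imports in turn, so a top-level package usually appears above its own submodules in this list.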
These libraries are all unnecessary for simple operations such as annif --help and annif --version, so it would be better to avoid importing them altogether in those cases. There are some tutorials on lazy importing (e.g. this one) and the importlib library contains (since Python 3.5) a LazyLoader utility class that could be used here.
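Independently of LazyLoader, the simplest way to keep a heavy library off the startup path is probably to move its import into the function that actually needs it. A minimal sketch of that idea follows; the function names are hypothetical and the stdlib colorsys module stands in for a heavy dependency such as tensorflow or nltk:

```python
import sys


def show_version():
    """Cheap command: needs no heavy dependencies at all."""
    return "0.54.0.dev0"


def to_hsv(red, green, blue):
    """Expensive command: imports its dependency only when called."""
    # Function-local import: the cost is paid on the first call,
    # not when this module is imported.
    import colorsys  # stand-in for a heavy library

    return colorsys.rgb_to_hsv(red, green, blue)


if __name__ == "__main__":
    print(show_version())
    print(to_hsv(1.0, 0.0, 0.0))
    print("colorsys" in sys.modules)
```

After the first call the module is cached in sys.modules, so repeated calls only pay for a dictionary lookup.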
I experimented a bit with the following lazy_import function but couldn't get it to work for nltk submodules:
    # Adapted from: https://stackoverflow.com/questions/42703908/
    import importlib.util
    import sys

    def lazy_import(fullname):
        """Lazily import a module the first time it is used."""
        try:
            return sys.modules[fullname]
        except KeyError:
            spec = importlib.util.find_spec(fullname)
            # Wrap the real loader so execution is deferred until first attribute access.
            loader = importlib.util.LazyLoader(spec.loader)
            spec.loader = loader
            module = importlib.util.module_from_spec(spec)
            # Register the module before exec_module so later imports reuse it.
            sys.modules[fullname] = module
            loader.exec_module(module)
            return module

This needs more experimentation, but for now I'm just opening the issue...
