[MRG+2] Rewrite setup.py files to handle cython dependencies#7719
[MRG+2] Rewrite setup.py files to handle cython dependencies#7719jnothman merged 12 commits intoscikit-learn:masterfrom
Conversation
|
+10000 Thanks heaps for working on this... |
2f13e88 to
ed6b671
Compare
lesteve
left a comment
There was a problem hiding this comment.
A few inline comments to help the review
sklearn/_build_utils/__init__.py
Outdated
| is_cpp_filename = filename.endswith('.cpp') | ||
|
|
||
| # files in src are .c and .cpp files that are not cython-generated | ||
| if 'src/' in filename and (is_c_filename or is_cpp_filename): |
There was a problem hiding this comment.
This is the most brittle part of the PR I think. I don't know whether there is a better way of doing it, suggestions welcome.
There was a problem hiding this comment.
I guess one possible improvement would be to add an additional argument in add_cython_extension like cython_generated_sources so that we can discriminate more easily betwen the .c files coming from cython and the ones in the repo.
sklearn/_build_utils/__init__.py
Outdated
| sources = [get_cython_source(filename) for filename in sources] | ||
| config.add_extension(name, sources, **kwargs) | ||
|
|
||
| config.ext_modules = cythonize(config.ext_modules) |
There was a problem hiding this comment.
Hmmm I need to double-check this, maybe it should be inside an if is_dev_version
sklearn/_build_utils/__init__.py
Outdated
| raise ValueError('Please install cython in order ' | ||
| 'to build a scikit-learn development version') | ||
|
|
||
| sources = [get_cython_source(filename) for filename in sources] |
There was a problem hiding this comment.
This is the part that switches between using .c{,pp} files when building a release and the .pyx when building a dev version.
By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account.
ed6b671 to
1eade7d
Compare
|
Just to note that sometimes there are inter module dependencies that needs to be recythonized too... For instance any change in the tree's cython sources needs a recythonizing of the gradient boosting... Does this PR handle that? |
It seems like it does: Excerpt of the output: |
|
Amazing |
|
This is nice ! |
sklearn/_build_utils/__init__.py
Outdated
| is_cpp_filename = filename.endswith('.cpp') | ||
|
|
||
| # files in src are .c and .cpp files that are not cython-generated | ||
| if ('src' + os.sep) in filename and (is_c_filename or is_cpp_filename): |
There was a problem hiding this comment.
This is the most brittle part of the PR I think. I don't know whether there is a better way of doing it, suggestions welcome.
I guess one possible improvement would be to add an additional argument in add_cython_extension like cython_generated_sources so that we can discriminate more easily betwen the .c files coming from cython and the ones in the repo.
There was a problem hiding this comment.
This is fine I think. Optionally you could just hardcode a list with all the non cython source paths...
There was a problem hiding this comment.
How about checking if there exists a pyx file with the same name in the same folder?
There was a problem hiding this comment.
That's a simple way of doing it indeed.
There was a problem hiding this comment.
Or we could add .pyx extension when it's a cython file, and add a function like:
import os.path
def no_cythonize(extensions, **_ignore):
for extension in extensions:
sources = []
for sfile in extension.sources:
path, ext = os.path.splitext(sfile)
if ext in ('.pyx', '.py'):
if extension.language == 'c++':
ext = '.cpp'
else:
ext = '.c'
sfile = path + ext
sources.append(sfile)
extension.sources[:] = sources
return extensions(copy-paste from cython doc)
There was a problem hiding this comment.
I think it would make more sense this way around indeed. The .pyx would be mentioned in the setup.py and you would do the switching to .c or .cpp only for releases. I have to say I didn't fully understand the no_cythonize function when I read the doc but it makes more sense now (in particular the naming is confusing because it's not really related to Cython.Build.cythonize).
sklearn/_build_utils/__init__.py
Outdated
| is_dev_version = not os.path.exists(os.path.join(top_path, 'PKG-INFO')) | ||
|
|
||
| if is_dev_version: | ||
| sources = [get_cython_source(filename) for filename in sources] |
There was a problem hiding this comment.
This is the part that switches between using .c{,pp} files when building a release and the .pyx when building a dev version.
amueller
left a comment
There was a problem hiding this comment.
This looks pretty good if it works as expected. Also, I feel like this shouldn't live in scikit-learn alone. I feel like at least numpy, pandas and scikit-image need the same. Shouldn't we share a solution?
sklearn/_build_utils/__init__.py
Outdated
| is_cpp_filename = filename.endswith('.cpp') | ||
|
|
||
| # files in src are .c and .cpp files that are not cython-generated | ||
| if ('src' + os.sep) in filename and (is_c_filename or is_cpp_filename): |
There was a problem hiding this comment.
How about checking if there exists a pyx file with the same name in the same folder?
| raise ValueError('Please install cython in order ' | ||
| 'to build a scikit-learn development version') | ||
|
|
||
| config.ext_modules = cythonize(config.ext_modules) |
There was a problem hiding this comment.
So that takes care of the caching?
There was a problem hiding this comment.
From https://cython.readthedocs.io/en/latest/src/reference/compilation.html#compiling-with-distutils:
The cythonize command also allows for multi-threaded compilation and dependency resolution. Recompilation will be skipped if the target file is up to date with its main source file and dependencies.
It does work as expected in the few tests I run like touching sklearn/tree/_criterion.pxd and making sure that all the dependents are recompiled (see #7719 (comment)). If you have more complicated scenarios that could go wrong I'd be happy to try them out.
Maybe we could have this in scikit-learn for some time so that it gets stress-tested a bit before consolidating across multiple projects? |
Yes, but we could also try reaching out to others and see what their approaches and requirements are? |
|
Why WIP? |
+:100: |
|
As I said in the issue description:
No one complained about the Travis cache being useless as a side effect of this PR so I just pushed a commit that removes the build products from theTravis cache and changed the status to MRG. |
Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis.
23ee373 to
7c731a6
Compare
More natural this way. Tweak the extensions to generate from .c and .cpp files for a release.
043f6d6 to
69c5ae1
Compare
+1. And then contribute it to Cython, IMHO. |
jnothman
left a comment
There was a problem hiding this comment.
I've not tested that this works. Otherwise, it looks good to me...
sklearn/utils/sparsetools/setup.py
Outdated
| import numpy | ||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
Interestingly flake8_diff.sh is not able to spot these. Quickly looking at it this is because we do the diff without context (--unified=0) in order to avoid noise coming from lines that the PR hasn't touched.
sklearn/_build_utils/__init__.py
Outdated
| try: | ||
| from Cython.Build import cythonize | ||
| except ImportError: | ||
| raise ValueError('Please install cython in order ' |
There was a problem hiding this comment.
Given that we keep having issues related to cython versioning, I think we should be checking for or specifying version.
sklearn/_build_utils/__init__.py
Outdated
| extension.sources = sources | ||
|
|
||
|
|
||
| def tweak_extensions(top_path, config): |
There was a problem hiding this comment.
Can we call this something less opaque: conditional_cythonize or cythonize_if_development?
sklearn/_build_utils/__init__.py
Outdated
| return cblas_libs, blas_info | ||
|
|
||
|
|
||
| def tweak_extensions_to_build_from_c_and_cpp_files(extensions): |
There was a problem hiding this comment.
maybe just rename_cython_extensions?
|
@jnothman I have tackled your comments. |
sklearn/_build_utils/__init__.py
Outdated
| tweak_extensions_to_build_from_c_and_cpp_files(config.ext_modules) | ||
| build_from_c_and_cpp_files(config.ext_modules) | ||
| else: | ||
| message = ('Please install cython with a version >= 0.23' |
There was a problem hiding this comment.
We should really pull this min version out as a constant.
|
Otherwise LGTM. |
now that cython >= 0.23 requirement is enforced in setup.py
No easy way to put comments inside multi-line command
|
You can merge after CI approves. thanks! |
|
Thank a lot for this @lesteve! |
|
Is there a benefit to having this in 0.18.1? On 2 November 2016 at 23:00, Raghav RV notifications@github.com wrote:
|
|
Having this included in the backports will help avoid travis cache based error in the backport PR #7672 :D |
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
|
Really enjoying this change, compiling is very much faster ! |
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
…cikit-learn#7719) * Rewriting of cythonization in setup.py By using Cython.Build.cythonize and switching between .c and .pyx files as appropriate cython dependencies are correctly taken into account. * Use cythonize once on the root config rather than in each subpackage * Fix for Windows * Remove caching from Travis Cython dependencies are taken care of by Cython.Build.cythonize and based on file timestamps, so .C and .so files will always be rebuild from scratch on each build in Travis. * Specify .pyx in setup.files for cython generated extensions More natural this way. Tweak the extensions to generate from .c and .cpp files for a release. * COSMIT Remove commented out code * Check cython version is greater than 0.23 * COSMIT better names for functions * flake8 fix (imported module not at top of file) * Install cython 0.23 for Python 2.6 now that cython >= 0.23 requirement is enforced in setup.py * Use module constant for minimum required cython version * Fix Travis install.sh No easy way to put comments inside multi-line command
Fixes #7094 and maybe related issues if there are other. Closes #7708.
What does this implement/fix? Explain your changes.
Basically build_tools/cythonize.py doesn't take into account cython files dependencies. For example if you change _criterion.pxd the dependents .pyx don't get recompiled, potentially yielding to confusing segmentation faults.
The thing is that cython takes into acount correctly if you write your setup.py carefull (mostly by using Cython.Build.cythonize and switching sources between .c and .pyx as appropriate). Compared to #7708 which is a work-around this PR is trying to tackle the root problem.
Upsides:
Downsides:
This is WIP because I haven't disabled caching on Travis yet. Just wanted to show the changes in setup.py files and get feed-back for now.
ping @raghavrv, @jnothman, @amueller.