Skip to content

Remove cython md5 hashing since it breaks the build process#3254

Merged
sciunto merged 3 commits intoscikit-image:masterfrom
hmaarrfk:remove_md5_hashing
Oct 12, 2018
Merged

Remove cython md5 hashing since it breaks the build process#3254
sciunto merged 3 commits intoscikit-image:masterfrom
hmaarrfk:remove_md5_hashing

Conversation

@hmaarrfk
Copy link
Copy Markdown
Member

@hmaarrfk hmaarrfk commented Jul 7, 2018

In developing #3253, I had to keep issuing commands:

rm */*/*.md5 */*/*.c 
git checkout -- skimage/restoration/unwrap_2d_ljmu.c skimage/restoration/unwrap_3d_ljmu.c

this is because the hashing doesn't consider pxd files.
This leads to the cython compiler doing really weird things and crashing.

Finally, cython will check itself if the files have changed, so this isn't necessary.

As a bonus, if you clump together c and c++ files seperately, cython will cythonize them in parallel using our infrastructure. This speeds things up greatly on multicore machines.

You have to hard code the number of threads to use, or maybe somebody knows the nproc command in python, but 8 is a safe number for now :/ Thanks @emmanuelle

Checklist

[It's fine to submit PRs which are a work in progress! But before they are merged, all PRs should provide:]

[For detailed information on these and other aspects see scikit-image contribution guidelines]

References

[If this is a bug-fix or enhancement, it closes issue # ]
[If this is a new feature, it implements the following paper: ]

For reviewers

(Don't remove the checklist below.)

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jul 8, 2018

Codecov Report

Merging #3254 into master will decrease coverage by 0.76%.
The diff coverage is 6.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3254      +/-   ##
==========================================
- Coverage   87.77%   87.01%   -0.77%     
==========================================
  Files         323      340      +17     
  Lines       27138    27430     +292     
==========================================
+ Hits        23820    23867      +47     
- Misses       3318     3563     +245
Impacted Files Coverage Δ
skimage/restoration/setup.py 26.31% <0%> (ø)
skimage/filters/setup.py 27.77% <0%> (ø)
skimage/transform/setup.py 31.25% <0%> (ø)
skimage/morphology/setup.py 27.77% <0%> (ø)
skimage/_shared/setup.py 33.33% <0%> (ø)
skimage/feature/setup.py 22.72% <0%> (ø)
skimage/measure/setup.py 27.77% <0%> (ø)
skimage/segmentation/setup.py 35.71% <0%> (ø)
skimage/graph/setup.py 33.33% <0%> (ø)
skimage/_build.py 26.66% <20%> (ø)
... and 35 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ef1d91a...7a3c154. Read the comment docs.

@hmaarrfk hmaarrfk force-pushed the remove_md5_hashing branch from 5487ae2 to e52dfe7 Compare July 8, 2018 22:18
@hmaarrfk
Copy link
Copy Markdown
Member Author

hmaarrfk commented Jul 8, 2018

Other people thinking about parallel builds: matplotlib/matplotlib#1507

@hmaarrfk
Copy link
Copy Markdown
Member Author

hmaarrfk commented Aug 6, 2018

As a note, if you setup CCACHE correctly, then recompiling skimage takes almost no time.

@hmaarrfk hmaarrfk force-pushed the remove_md5_hashing branch from d17a82c to c2e4b4a Compare August 6, 2018 00:06
Copy link
Copy Markdown
Member

@soupault soupault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

pyx_files[i] = pyxfile.replace('.pyx.in', '.pyx')

return md5_cached != md5_new.encode('utf-8')
cythonize(pyx_files, nthreads=8)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about hardcoding the number of threads. I mean this looks like a reasonable default, but I'm just not sure :).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me experiment with this number tonight. Cython might use ncores*2 automatically.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note for it.

@soupault soupault added 🔧 type: Maintenance Refactoring and maintenance of internals and removed ⚖️ Needs decision labels Aug 12, 2018
@soupault soupault added this to the 0.15 milestone Aug 12, 2018
@hmaarrfk hmaarrfk force-pushed the remove_md5_hashing branch from c2e4b4a to 5f11cb4 Compare August 13, 2018 12:46
@hmaarrfk hmaarrfk changed the title Remove cython md5 hashing as it isn't needed anymore Remove cython md5 hashing since it breaks the build process Sep 12, 2018
@sciunto
Copy link
Copy Markdown
Member

sciunto commented Sep 12, 2018

This might (partially) solve #3387 don't you think?

@sciunto
Copy link
Copy Markdown
Member

sciunto commented Sep 12, 2018

@hmaarrfk Could you rebase please? Could you indicate your command line for ccache as well?

Copy link
Copy Markdown
Member

@stefanv stefanv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new feature in Cython? How does it know not to regenerate the C file?

pyx_files[i] = pyxfile.replace('.pyx.in', '.pyx')

return md5_cached != md5_new.encode('utf-8')
# Cython doesn't automatically chose a number of threads > 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

@hmaarrfk hmaarrfk force-pushed the remove_md5_hashing branch 2 times, most recently from ccd97f9 to 0544afd Compare September 12, 2018 08:24
@hmaarrfk
Copy link
Copy Markdown
Member Author

@stefanv beats me, this might be kinda like "I'm so new to cython I do not really remember when it didn't do that"

@hmaarrfk
Copy link
Copy Markdown
Member Author

@sciunto I don't really know a reliable installation guide, can you try

sudo apt install ccache
export PATH="/usr/lib/ccache:$PATH"

type these manually to test

make clean
make
make clean 
make

@hmaarrfk
Copy link
Copy Markdown
Member Author

I have this in my ~/.profile fille

export PATH="/usr/lib/ccache:$PATH"

@hmaarrfk
Copy link
Copy Markdown
Member Author

➤ which gcc                                          
/usr/lib/ccache/gcc

➤ which g++                                          
/usr/lib/ccache/g++

@hmaarrfk
Copy link
Copy Markdown
Member Author

How did python 3.7 finish so quickly? 15 mins vs 30+ mins.

@sciunto
Copy link
Copy Markdown
Member

sciunto commented Sep 12, 2018

I gained 2 minutes (from 7 to 5) on my machine. I changed the path to reflect the difference of my distribution. (which g++ /usr/lib/ccache/bin/g++) The gain is not tremendous.

@hmaarrfk
Copy link
Copy Markdown
Member Author

ccache should make gcc commands instant...

i7 laptop:

  • Cythonizing 1 at a time, no ccache 4m 12s
  • Cythonizing 1 at a time, with ccache 58s
  • Parallel cythonizing with ccache 29s (this branch)

@hmaarrfk
Copy link
Copy Markdown
Member Author

Can you give the output of:

ccache --zero-stats 
make clean
make
ccache --show-stats

Mine is (once ccache is built of course)

cache directory                     /home/mark/.ccache
primary config                      /home/mark/.ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
cache hit (direct)                    47
cache hit (preprocessed)               0
cache miss                             0
called for link                       50
no input file                          1
files in cache                       180
cache size                         117.7 MB
max cache size                       5.0 GB

@sciunto
Copy link
Copy Markdown
Member

sciunto commented Sep 12, 2018

Thanks for sharing.

Quite weird over here. My cache size was 2GB over 5GB. Cache hits were low and only preprocessed.
I cleared the cache. Now, the hits are only direct. However, a make clean; make still takes the same duration. Anyway, it might be a specific issue on my side.

@hmaarrfk
Copy link
Copy Markdown
Member Author

it is not always perfect, and soemtimes finiky, but yeah, sometimes it works.

@stefanv
Copy link
Copy Markdown
Member

stefanv commented Sep 12, 2018 via email

@hmaarrfk
Copy link
Copy Markdown
Member Author

@stefanv but we aren't calling cythonize directly, we are calling it through python. I think it behaves differently for me:

Try this command

python -c 'from Cython.Build import cythonize; cythonize("foo.pyx")'

@hmaarrfk
Copy link
Copy Markdown
Member Author

@stefanv I think cython and cythonize are two different commands. Dam dam dam.............

cythonize

date; ls -lah foo.c; cythonize foo.pyx; ls -lah foo.c; python -c 'import Cython; print(Cython.__version__)'                                                                                                (owl) 
Wed Sep 12 14:08:09 EDT 2018
-rw-rw-r-- 1 mark mark 90K Sep 12 14:05 foo.c
-rw-rw-r-- 1 mark mark 90K Sep 12 14:05 foo.c
0.28.5

cython

date; ls -lah foo.c; cython foo.pyx; ls -lah foo.c; python -c 'import Cython; print(Cython.__version__)'                                                                                                   (owl) 
Wed Sep 12 14:10:31 EDT 2018
-rw-rw-r-- 1 mark mark 89K Sep 12 14:09 foo.c
-rw-rw-r-- 1 mark mark 89K Sep 12 14:10 foo.c
0.28.5

@stefanv
Copy link
Copy Markdown
Member

stefanv commented Sep 13, 2018 via email

@emmanuelle
Copy link
Copy Markdown
Member

For an automatic selection of the number of threads, use multiprocessing.cpu_count() (joblib code is useful to find such goodies :-))

@hmaarrfk hmaarrfk force-pushed the remove_md5_hashing branch 3 times, most recently from a566694 to 4dd5152 Compare September 30, 2018 17:38
@hmaarrfk
Copy link
Copy Markdown
Member Author

Thanks @emmanuelle tests are now failing because of #3432

Copy link
Copy Markdown
Member

@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, this now may be holding up other PRs, so I'm merging. Thanks @hmaarrfk! Based on the discussion, this looks brilliant!

@hmaarrfk
Copy link
Copy Markdown
Member Author

@jni, let me rebase.

@hmaarrfk
Copy link
Copy Markdown
Member Author

Thanks for waiting @jni, this build stuff tends to break if we wait too long between test + merge.

@sciunto
Copy link
Copy Markdown
Member

sciunto commented Oct 12, 2018

I merge for @jni.

@sciunto sciunto merged commit 9dd29ae into scikit-image:master Oct 12, 2018
@jni
Copy link
Copy Markdown
Member

jni commented Oct 12, 2018

💯

@sciunto
Copy link
Copy Markdown
Member

sciunto commented Oct 14, 2018

A comment regarding the discussion on CCACHE. I checked on a system with a i7 CPU, and it divided the duration by a factor 4. It might be that my old atom CPU is not good enough for such optimization.

@hmaarrfk
Copy link
Copy Markdown
Member Author

Sorry you have an Atom 😢

@hmaarrfk hmaarrfk deleted the remove_md5_hashing branch October 14, 2018 12:55
@hmaarrfk hmaarrfk mentioned this pull request Oct 19, 2019
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🔧 type: Maintenance Refactoring and maintenance of internals

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants